Understanding safety considerations of static promotion of pointers to mutables and ZSTs


#1

Hello everyone!

I was very pleased with the announcement of Rust 1.21.0 a few days ago. One of the most important language changes listed in the release notes was the automatic promotion of certain constant expressions to the 'static lifetime. While reading about this feature, I came across the corresponding RFC. Its “Extensions” section contains the following assertions:

It would be possible to extend support to &'static mut references, as long as there is the additional constraint that the referenced type is zero sized. […] The zero-sized restriction is there because aliasing mutable references are only safe for zero sized types (since you never dereference the pointer for them).

and

There are two ways this could be taken further with zero-sized types:

  1. Remove the UnsafeCell restriction if the type of the rvalue is zero-sized.
  2. The above, but also remove the constexpr restriction, applying to any zero-sized rvalue instead.
    Both cases would work because one can’t cause memory unsafety with a reference to a zero sized value, and they would allow more safe code to compile.

(emphasis mine)

It’s not immediately obvious to me how either of these could be true.

First, I’m puzzled by the parenthetical “since you never dereference the pointer for them”. Who never dereferences such pointers? Surely I do if I write `*&()`, don’t I? Does the RFC author mean that the compiler is guaranteed to always turn a dereference of a pointer to a ZST into a no-op (yielding the type’s single value)? And even if that is the case, how does it guarantee memory safety in the presence of aliasing mutable references?

Second, I feel that “one can’t cause memory unsafety with a reference to a zero sized value” is a pretty strong statement. There is an entire chapter in the Rustonomicon on obscure soundness holes introduced by improper treatment of ZSTs. Could someone also shed some light on what this one sentence means in this context?
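For context, here is a minimal sketch of the promotion being discussed, assuming a stable toolchain newer than 1.21. Borrowing a constant expression such as `&42` yields a `'static` reference. As far as I know, the only *mutable* borrow the compiler promotes is the empty-array literal `&mut []`, which is the same shape as the RFC example used in #5. The function names `answer` and `empty` are hypothetical.

```rust
// `&42` is a constant expression, so it is promoted to a 'static value
// instead of being a temporary on this function's stack.
fn answer() -> &'static i32 {
    &42
}

// An empty array literal is also accepted mutably. It is zero-sized, so
// handing out `&'static mut` references to it cannot alias any bytes.
fn empty() -> &'static mut [i32; 0] {
    &mut []
}

fn main() {
    assert_eq!(*answer(), 42);
    assert_eq!(empty().len(), 0);
}
```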


#2

Dereferencing a pointer to a zero-sized type reads zero bytes of data. Any non-null, suitably aligned address is a valid place to read nothing from.
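Here is a minimal sketch of this point on a current stable toolchain. `NonNull::dangling()` yields a non-null, well-aligned pointer that does not point into any allocation. Reading a `()` through it is still fine, because zero bytes are read. The same holds for two `&mut ()` built from that single address: neither covers any bytes, so accesses through them cannot conflict.

```rust
use std::ptr::NonNull;

fn main() {
    // Non-null and well-aligned, but not pointing into any allocation.
    let p: NonNull<()> = NonNull::dangling();

    // Reading a `()` reads zero bytes, so no memory is actually touched.
    let value: () = unsafe { p.as_ptr().read() };
    assert_eq!(value, ());

    // Two `&mut ()` at the same address. Each covers zero bytes, so
    // "writes" through them never touch the same memory.
    let a: &mut () = unsafe { &mut *p.as_ptr() };
    let b: &mut () = unsafe { &mut *p.as_ptr() };
    assert!(std::ptr::eq(&*a, &*b));
    *a = ();
    *b = ();
}
```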


#3

Gross! This sounds like an awfully nasty edge case to introduce into the rules of safety, and to what end?

Every new landmark piece of functionality that becomes safe can retroactively introduce “bugs” into unsafe code that was formerly perfectly correct, simply by changing what “safe” is.

```rust
pub fn function<T>(a: &mut T, b: &mut T) {
    // signal something to the optimizer
    // (we obviously can't get aliasing references from safe callers,
    //  and even if we get them from an unsafe caller, that's their bug)
    if (a as *mut _) == (b as *mut _) {
        unsafe { ::std::hint::unreachable_unchecked(); }
    }
    // ...
}
```

(now this is the part where somebody shows me that my code is NOT perfectly correct)

(ahh I think it IS incorrect, hang on…)


#4

Mutable references never alias. Nothing has changed with respect to that.

EDIT: Oops, I misinterpreted that.

I guess the way to think about it is that the allocated regions of mutable references never alias. If the references are to zero-sized types, then each region has size 0, so the regions can’t overlap by definition, even when the addresses are equal.
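One way to make “regions never alias” concrete is a small sketch that treats each reference as a half-open byte range `[address, address + size)`. The helpers `region` and `overlaps` are hypothetical and exist only for illustration. With `split_at_mut(0)`, both halves start at the same address, but the empty half covers no bytes, so the ranges are disjoint.

```rust
use std::mem::size_of_val;
use std::ops::Range;

/// Hypothetical helper: the range of byte addresses a reference covers.
fn region<T: ?Sized>(r: &T) -> Range<usize> {
    // Drop any fat-pointer metadata and keep just the address.
    let start = r as *const T as *const u8 as usize;
    start..start + size_of_val(r)
}

/// Hypothetical helper: do two half-open byte ranges share at least one byte?
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

fn main() {
    let mut v = vec![1, 2, 3];
    let (a, b) = v.split_at_mut(0);
    let (ra, rb) = (region(&*a), region(&*b));

    // Both mutable slices start at the same address...
    assert_eq!(ra.start, rb.start);
    // ...but `a` covers zero bytes, so the two regions do not overlap.
    assert!(ra.is_empty());
    assert!(!overlaps(&ra, &rb));
}
```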


#5

Turns out my unsafe code is incorrect. In the RFC they have an example of “code that compiles today”, which inspired me to write this snippet:

```rust
fn get_an_alias<'a>() -> &'a mut [i32; 0] {
    &mut []
}

fn main() {
    let a: &mut _ = get_an_alias();
    let b: &mut _ = get_an_alias();
    assert_ne!(a as *mut _, b as *mut _);
}
```

which fails both with and without optimization:

```
   Compiling playground v0.0.1 (file:///playground)
    Finished release [optimized] target(s) in 0.88 secs
     Running `target/release/playground`
thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `0x559d53f1b1a4`,
 right: `0x559d53f1b1a4`', src/main.rs:9:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.
```

So it appears that this isn’t actually new ground for “surprising things that can happen in safe code.”
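Because safe code can produce two equal `&mut [i32; 0]`, the optimizer hint from #3 is unsound as written. Here is a sketch of one possible repair, assuming the goal is still only to pass a hint to the optimizer: treat equal addresses as impossible only when `T` is not zero-sized. `function` is the placeholder name from #3. `std::hint::unreachable_unchecked` stands in for the unstable intrinsic.

```rust
use std::hint::unreachable_unchecked;
use std::mem::size_of;

pub fn function<T>(a: &mut T, b: &mut T) {
    // Two live `&mut T` can only share an address when `T` is zero-sized,
    // because then neither reference covers any bytes. For any other `T`,
    // equal addresses would mean the caller already broke the aliasing rules.
    if size_of::<T>() != 0 && std::ptr::eq(&*a, &*b) {
        unsafe { unreachable_unchecked() }
    }
    // ...
}

fn get_an_alias<'a>() -> &'a mut [i32; 0] {
    &mut []
}

fn main() {
    // Zero-sized referents (possibly at the same address): the guard skips the hint.
    function(get_an_alias(), get_an_alias());
    // Non-zero-sized, distinct referents: the hint's assumption holds.
    let (mut x, mut y) = (1, 2);
    function(&mut x, &mut y);
}
```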


#6

This is also a way to produce empty slices with the same address without any static promotion:

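```rust
fn main() {
    let mut v = vec![1, 2];
    // `a` is the empty prefix and `b` is the whole vector; both start at v's buffer.
    let (a, b) = v.split_at_mut(0);
    // An empty sub-slice of `b` still starts at that same address.
    let b = &mut b[..0];
    println!("{:p} {:p}", a.as_mut_ptr(), b.as_mut_ptr());
}
```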

#7

Alright, thanks for all the insightful replies! So it seems to me that the key observation is @sfackler’s:

[…] the allocated regions of mutable references never alias.

That makes sense. So optimizations (and unsafe code, and everything else) are not permitted to assume that no two `&mut`s ever point to the same address. What they are allowed to assume is that two accesses through distinct `&mut`s, at least one of which is a write, never touch the same bytes, and ZSTs are guaranteed never to cause any reads or writes.
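Here is a small sketch of that point, reusing the `split_at_mut` trick from #6. There are two mutable empty slices with the same base address, and every “write” through them covers zero bytes, so the vector is left untouched.

```rust
fn main() {
    let mut v = vec![1, 2];
    {
        let (a, rest) = v.split_at_mut(0);
        // `b` is an empty slice starting at the same address as `a`.
        let b = &mut rest[..0];
        assert_eq!(a.as_ptr(), b.as_ptr());

        // Each of these "writes" covers zero bytes...
        a.fill(7);
        b.fill(9);
        a.copy_from_slice(b);
    }
    // ...so the vector's contents are unchanged.
    assert_eq!(v, [1, 2]);
}
```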

It was also very useful to see how it is possible to create equal &muts without static promotion. Now I’m wondering if that use of split_at_mut() is in fact sound or just an annoying edge case that the designers of this function didn’t anticipate. I would love to believe it is the former…


#8

In light of what you wrote above this paragraph, why do you think this might be unsound? Aliasing can be an issue when you have reads and writes to a location; if you prevent that, then it doesn’t really matter that the addresses might be the same. Immutable references can have equal addresses, but we don’t care about that aliasing because you can’t write through them (leaving UnsafeCell aside). Similarly, a mutable zero-length slice cannot be written through, so its base address being equal to some other base address shouldn’t matter.


#9

Yes, that’s right. I thought there might be some obscure case where more than the actual writing of data matters: for example, an optimization that really assumes two addresses are different, not only for the purpose of moving data around but for some other action that might affect observable behavior (such as the example provided by @ExpHP). But as long as that is not the case, it’s of course all right.


#10

I can totally see someone getting tripped up by code like @ExpHP’s, so it is a concern. However, they’d need to do something unsafe to introduce unsoundness in their own code; otherwise it would “merely” be a logic bug, I think.

In general, once you start examining raw pointers, you need to be certain you understand the implementation details: all the stuff like ZSTs and fat pointers (both for slices and trait objects) leaks out.
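Here is a short sketch of how those details surface as soon as you look at pointers directly. A reference to a ZST is still a full-sized thin pointer. References to slices and trait objects are fat, carrying a length or a vtable pointer. Comparing fat pointers with `std::ptr::eq` compares the metadata as well as the address.

```rust
use std::fmt::Debug;
use std::mem::size_of;

fn main() {
    // A ZST occupies no bytes, but a reference to it is an ordinary thin pointer.
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(size_of::<&()>(), size_of::<usize>());

    // Slice and trait-object references carry metadata next to the address.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&dyn Debug>(), 2 * size_of::<usize>());

    let v = [1, 2, 3];
    let whole: &[i32] = &v;
    let prefix: &[i32] = &v[..1];
    // Same address...
    assert_eq!(whole.as_ptr(), prefix.as_ptr());
    // ...but `ptr::eq` on slices also compares the lengths, so these differ.
    assert!(!std::ptr::eq(whole, prefix));
}
```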