Unsafe rust and the borrow checker (multiple mutable borrows)

Consider the following small change to your code: (Playground):

#[inline(never)]
fn set(p: &mut i32, q: &mut i32, o: i32, v: i32)
{
    while *p == o {
        *q = v
    }
}

fn main() {
    let mut i1 = 1;
    let pointer = &mut i1;
    let pointer2: &mut i32;
    unsafe {
        pointer2 = &mut *(pointer as *mut i32);
    }
    println!("Set *pointer2 = 2");
    *pointer2 = 2;
    println!("==> Value of {:<8} is {}","pointer", pointer);
    println!("==> Value of {:<8} is {}","pointer2", pointer2);
    println!("Set *pointer = 5");
    set(pointer, pointer2, 2, 5);
    *pointer = 5;
    println!("==> Value of {:<8} is {}", "pointer", pointer);
    println!("==> Value of {:<8} is {}", "pointer2", pointer2);
}

what would you expect the output of this program to be? (Hint: run with "Release" in the playground)

2 Likes

Ah, that makes sense, but I think the documentation is misleading.

I see the point (I think). The compiler assumes that *p never changes it's value because *p should be an exclusive mutable reference and therefore replaces the while loop with a loop {}. But I know that we have structures to create "multiple" mutable references by using constructs like Arc<Mutex<Box<T>>>. How is this possible if this should be against the rules? (I know only one thread at a time can access the value, but doesn't this make it possible to have multiple mutable references)?

This sounds a little bit like circular reasoning. If the compiler enforces me to always uphold the contract (it doesn't let me do unsound things) but allows me to do this in unsafe code while holding me responsible for not violating the contract at the same time, why should somebody use unsafe rust at all if he or she is not allowed todo more things than in safe rust.

So the question is what am I allowed to do in unsafe rust which I'm not allowed to do in safe rust while upholding the contract with the compiler?

1 Like

In an unsafe block you gain the following "superpowers":

Dereference a raw pointer
Call an unsafe function or method
Access or modify a mutable static variable
Implement an unsafe trait
Access fields of unions

All other rules of safe Rust are respected, but if you violate those rules via one of the above actions, it's on you and not the compiler.

2 Likes

Here are some answers that I've posted before in URLO, which address your last two posts, plus a great Quote of the Week from @alice.

2 Likes

Thank you, I will look at the posts. I'm just a little bit confused, because there is a split_at_mut(idx) function in the standard library and it looks like it violates these contracts, doing similiar things like my code, but it seems they do not break optimization:

...
fn main() {
let mut a = [1,2,3,4,5];
let (b,c) = a.split_at_mut(3);
...

You need to look at the implementation of split_at_mut() to see why this is so: here and here. This is a good example of a programmer using unsafe correctly to perform a function too complex for the Rust compiler front-end to analyze while still complying with all the input requirements of the LLVM backend.

1 Like

I've inspected the source and it takes &mut self pointer (the slice) converts it to a raw mutable pointer and calls:
https://doc.rust-lang.org/src/core/slice/raw.rs.html#136
which returns a reference which is dereferenced and then mutable referenced again. What's the difference? Why is it safe to do that in these functions but not in my code?

The soundness rules for Rust must always be followed, even in unsafe code. If they are not followed, your unsafe code is simply incorrect. However, there are cases where the borrow checker is too strict and rejects code that does follow the soundness rules.

In the case of split_at_mut, it is correct because the two halves do not overlap with each other.

2 Likes

These structures make sure that there references are not coexisting (i.e. one of them is destroyed before other is created). If they didn't, yes, they would be unsound.

It’s not forbidden to have two mutable references to the same target if one is a so-called “reborrow” of the other. This can be done in safe rust

fn main() {
    let mut i1 = 1;
    let pointer = &mut i1;
    let pointer2: &mut i32;
    pointer2 = &mut *pointer;
    println!("Set *pointer2 = 2");
    *pointer2 = 2;
    /* no good */ // println!("==> Value of {:<8} is {}","pointer", pointer); // Outputs 2
    println!("==> Value of {:<8} is {}","pointer2", pointer2); // Outputs 2
    /* but it’s okay here */ println!("==> Value of {:<8} is {}","pointer", pointer); // Outputs 2
    println!("Set *pointer = 5");
    *pointer = 5;
    println!("==> Value of {:<8} is {}", "pointer", pointer); // Outputs 5
    /* no good */ // println!("==> Value of {:<8} is {}", "pointer2", pointer2); // Outputs 5
}

the point to notice here is that pointer is not usable while pointer2 ist still alive. This is why the first commented-out line won’t work, because pointer2 is used in the following line, so it must be still alive, and also why the final commented-out line won’t work. If you try adding either of the lines in you’ll get an error from the borrow checker. Now, a mutable reference created indirectly through a pointer that you got from another mutable reference, i.e. what your original code does, is totally fine. It’s going to behave like a re-borrow. However, the intermediate step of having a pointer takes the borrow checker out of the picture. While this code

fn main() {
    let mut i1 = 1;
    let pointer = &mut i1;
    let pointer2: &mut i32;
    unsafe {
        pointer2 = &mut *(pointer as *mut i32);
    }
    println!("Set *pointer2 = 2");
    *pointer2 = 2;
    /* no good */ // println!("==> Value of {:<8} is {}","pointer", pointer); // Outputs 2
    println!("==> Value of {:<8} is {}","pointer2", pointer2); // Outputs 2
    /* but it’s okay here */ println!("==> Value of {:<8} is {}","pointer", pointer); // Outputs 2
    println!("Set *pointer = 5");
    *pointer = 5;
    println!("==> Value of {:<8} is {}", "pointer", pointer); // Outputs 5
    /* no good */ // println!("==> Value of {:<8} is {}", "pointer2", pointer2); // Outputs 5
}

is totally fine, uncommenting one of the two commented-out lines will not result in a compiler error, yet it’s still something you’re disallowed to do. [“Disallowed” means it’s UB (“undefined behavior”) i.e. the compiler can do any kinds of nonsensical “optimizations” to your code, as explained above in the thread.] You can test the above code with miri and see how it only complains once you uncomment at least one of the two commented out lines.

A way of phrasing the rules for raw pointers here is: If you use a mutable reference again (in this case the reference pointer) that was used to create a raw pointer (the pointer as *mut i32), then that pointer is invalidated. Once that raw pointer is invalidated, you must no longer use any mutable reference that was created from dereferencing it anymore either (i.e. pointer2 must not be used anymore after pointer is first accessed again after the creation of the pointer as *mut i32 raw pointer).

As you can see, rules for raw pointers (*mut T) in Rust are really complicated and not even 100% finalized yet, so using unsafe in Rust is kind-of hard and in general, especially for some high-level application programming it’s a good idea to not use it if you don’t have to (and you “have” to use it way less commonly than you might think). Also, if you’re taking the source-code of the standard library as an example for unsafe Rust code, the standard library sometimes does things that you must not do as a user. It’s not the case for this example of split_at_mut, but it’s a useful thing to know. The standard library has some extra power in the form of: It’s written by the authors of the compiler and distributed and versioned together with the compiler, so they can rely on implementation details that are subject to change in the future and write code that’s “officially” UB but the compiler doesn’t do any “bad optimizations” to it yet. (In those cases, if user code was to use the same kind of code, it could break and introduce miscompilation in future compiler versions; the code in the standard library would be replaced / adapted together with those kinds of compiler updated.)

3 Likes

Another important point is that only &mut T references must be unique. There are other types of references such as &Cell<T> or Arc<Mutex<T>> or *mut T that do not have to be unique even if they allow mutation to happen through them.

OFFTOPIC:
I think that is something that really bothers me about rust. Because in my personal opinion a language should make it possible to write an efficient Standard Library without the need for "marker traits" or special attributes. It should not be tightly coupled to the compiler (as far as I know you do not need any special handling on the compiler side to be able to compile the C std library).

To quote @alice reply, I just think this is some sort of special handling. I'm wondering why this structures aren't built in types of the language if they cannot be fully reimplemented by anyone.

I'm used to code in C and it's really easy to do worksplitting (if you know what you're doing) and makes me really stumble upon rust, that I'm not able todo work splitting without many "workarounds" (which can be unefficient or unsound) to make the compiler happy. This is some reason I think you often have to use unsafe (to write efficient code).

I think this example is well known* (and does what it should do without any data races) but is not allowed by the compiler and requires the developer to use something like scoped threads or some more boilerplate (like Arc and Mutex) to achieve the same goal. None of them is really needed to ensure
correctness and safety in this case. Sidenote: Just a minimal example, normally you would operate on disjunctive chunks of data and do heavy operations in the threads

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=bb811062dc45e970e6cb3236cf7f10a6

use std::thread;

// Multi-Threaded: does not work
fn main() {
    let mut full_work = [1,2,3,4,5];
    let mut h = vec![];
    for i in &mut full_work {
        h.push(thread::spawn(move || {
            *i = 1;
        }));
    }
    for i in h {
        i.join().unwrap();
    }
    println!("Full work: {:?}", full_work);
}

/* // Single threaded: works
fn main() {
    let mut full_work = [1,2,3,4,5];
    
    for i in &mut full_work {
        *i = 1;
    }
    
println!("{:?}", full_work);
}
*/
  • meaning I'm not the first one to stumble upon this

Well, they are needed because the compiler doesn't understand in detail what code does. It can only do “so much” with static analysis, it cannot - for example - know where in your code you're guaranteed to have called .join on every element in a vector before leaving the function. Hence it rejects the program, because sound static analysis tools need to be conservative. Note that you can add unsafe code to your example in order to promise to the compiler that you aren't doing anything wrong. (E.g. you can extend the lifetime of i.)

Using unsafe code is always tricky. Even your “well known” “and does what it should do” example above doesn't do what it should do AFAICT. (Explanation follows, please correct me if I'm wrong.) When the OS fails to create a thread, then one of the thread::spawn might fail and panic with the effect that previously spawned threads aren't joined but detached while the main thread unwinds out of the main function. Any of the detached threads still running that and accessing i would be a case of use-after-free. The example would be even more clearly wrong if this wasn't in the main function, but I believe even for main there is a point in time in the execution where the stack frame of main is dropped but unwinding has not finished and terminated the program yet.

The reason why people prefer to use things like scoped threads is because we like Rustʼs feature of “fearless concurrency” which means not having to deal with subtle bugs like the one described above. The creators of crossbeam::scope can handle all the subtle corner cases for us and we can stay within the realm of safe Rust. (Also, scoped threads are arguably even easier to use because you don't need to handle the vector fo handles and the calls to join yourself.)

That was my point. The compiler is not able to figure that out by itself and requires you to use additional data which is not necessary for the execution of the program.

I think if you do not handle a spawn failure (I admit this should be done in the above example) you're code is broken anyway. This can easily be catched with Builder in std::thread - Rust

And as far as I know crossbeam is considered to be unsound (as far as I know it used to live in the standard library as scoped_threads but was removed because of this).

As far as I know the unsound version of scoped threads that used to be in the standard library was using a struct and its destructor instead of a closure. The API that crossbeam offers is different and should be sound.

1 Like

BTW, I do find it odd that thread::spawn panics by default rather than returning a Result<>. Historical artifact? It should either be changed (breaking backwards compatibility intentionally) or deprecated.

2 Likes

Hmm... That is not so clear to me.

If your system cannot spawn threads then it cannot run. End of story, panic.

The system is already running, it just can't spawn an additional thread for whatever new thing it wants to offload. This is a relatively frequent situation. You can of course conjure up all possible scenarios, but there are many where you'd want to either continue (perhaps do some cleanup) or at least gracefully shut down the threads you have already running. And remember that Rust is a language that wants the programmer to handle situations such as lock poisoning (for which there is at least as strong of a case for panicking by default).

Of course, there'll be limited benefit if programmers simply add an .unwrap(), but that seems to be a problem that Rust's error handling strategy faces in general.