Overzealous borrow-checker (IMHO)

I've been struggling with understanding this particular issue with the borrow checker:

I have this code, which works (simplified version):

use std::iter;
use rand;

fn main() {
    let blocksize = 16;
    let mut data = iter::repeat_with(rand::random::<u8>).take(48).collect::<Vec<u8>>();
    for chunk_index in 0..data.len()/16 {
        data[chunk_index * blocksize + 3] = 0;
        println!("data is {:?}", data); 
    }
}

playground link

This works, but is kind of ugly. What I'd like to do is something more like this:

use std::iter;
use rand;

fn main() {
    let mut data = iter::repeat_with(rand::random::<u8>).take(48).collect::<Vec<u8>>();
    for chunk in data.chunks_mut(16) {
        chunk[3] = 0;
        println!("data is {:?}", data);
    }
}

playground link

which is nicer, but gets the following compiler error:

error[E0502]: cannot borrow `data` as immutable because it is also borrowed as mutable
 --> src/main.rs:8:34
  |
6 |     for chunk in data.chunks_mut(16) {
  |                  -------------------
  |                  |
  |                  mutable borrow occurs here
  |                  mutable borrow used here, in later iteration of loop
7 |         chunk[3] = 0;
8 |         println!("data is {:?}", data);
  |                                  ^^^^ immutable borrow occurs here

Now, I understand that I'm violating the borrow rules here by borrowing mutably and then immutably, but it seems like the borrow checker rules don't make much sense here, since a) the two code snippets are doing, AFAICT, exactly the same thing, and b) I really can't see where a data race could occur here since the print has an immutable reference, and neither it nor the enclosing code can modify the data while the function has the reference.

Could someone please enlighten me as to what I'm missing here? It's really driving me nuts.

-Jack

I think you just need to think eviler -- chunks_mut has exclusive access to data, so could uphold its contract by rotating the slice each time, and fixing it when the iterator is dropped. (Obviously that would be a silly way to implement it, so it doesn't, but.)

Basically, you know it would be ok here because you know what the function does. The compiler has to assume the worst possible case for what it might have done.

1 Like

Thanks for the response! That's certainly interesting, but I guess I still don't understand how that would cause a race condition (or maybe I'm misunderstanding your point here).

Even if chunks_mut, say, completely scrambled the data on each iteration, my function (println! in this case, but it could be any function that takes an immutable reference) would still have read-only access to the data, and while it has that access chunks_mut couldn't modify it (unless we're assuming it's doing Evil Things in another thread or something), so I still don't see the race condition the borrow checker is trying to avoid.

Again, it's entirely possible I'm missing something obvious. ¯\_(ツ)_/¯

-Jack

Mutable borrows are guaranteed to be unique borrows for the duration they're held, not just when code is accessing them.

Here's some simpler (and similarly rejected) code showing the same rule:

let x = 0u32;
let y = &mut x;
*y = 5;
println!("{}", x);
*y = 3;

(playground)

This rule allows anything holding a unique borrow to some data to put the data into a bad/invalid state with reasonable assurance that as long as the invalid state is cleaned up before the unique borrow is ended, everything will be fine.

Before the crisis surrounding std::mem::forget shortly before 1.0, a structure holding a unique borrow was allowed to leave data in an unsound state as long as it had a destructor which restored valid state.

Since forget is now safe, though, I can't think of a specific reason why this can't be allowed. Structures can't leave data they uniquely/mutably borrow in an unsound state because they could be forgoten at any moment and subsequently the access would be allowed.

Without that requirement, the only three reasons I can think of that this is disallowed are a) it's a leftover rule from before the forget crisis that was never fixed, b) it's left in for consistency with some other (legitimate) rule, c) it makes it harder to misuse structures like Drain which leave data they mutably borrow in an incorrect (but sound) state.

1 Like

Not evil, per se, but in theory it could move data to another thread (eg to prefetch data), manipulate it there, and send chunk back to you via a (properly synchronized) channel. In such a case, you can hit a data race when printing it, which is unsound.

chunks_mut of course doesn’t do that, but as @scottmcm said, that’s because we know what the method does but that isn’t known to the compiler.

Separately, you can collect into a Vec<Cell<u8>> and iterate immutably, if you’re really keen on the chunks approach. Then you can manipulate the value and print the array inside the body.

4 Likes

Another thing to consider is that &mut references being unique lets the compiler generate more efficient code, like restrict in C (although &mut references in Rust are far more common than restrict is in C). Consider this similar snippet:

let mut ones = vec![1; 48];
let mut data = iter::repeat_with(rand::random::<u8>).take(48).collect::<Vec<u8>>();
for chunk in ones.chunks_mut(16) {
    chunk[3] = 0;
    println!("data is {:?}", data);
}

Because data and ones are known to be distinct, the compiler can compile this to only load data once. If messing about with ones can change data, the compiler has to load data on each iteration. By the time you get to codegen, it's likely the only way the compiler can know ones and data are distinct is because ones is mutably (exclusively) borrowed -- that's how it knows it can lift the load of data out of the loop.

&mut means exclusivity. If you don't need exclusivity, but just mutation, that's what Cell and friends are for.

1 Like

@vitalyd:
Not evil, per se, but in theory it could move data to another thread (eg to prefetch data), manipulate it there, and send chunk back to you via a (properly synchronized) channel. In such a case, you can hit a data race when printing it, which is unsound.

But, if the function can write to the data in a separate thread, then all bets are off, right? I mean, if I have this code:

let data = iter::repeat_with(rand::random::<u8>).take(48).collect::<Vec<u8>>();
{
    evil_function(&mut data);
}
println!("data is {:?}", data);

evil_function could spawn a thread and be doing all sorts of Bad Things to the data when the print is trying to access it, but the compiler will allow this code, because the &mut goes out of scope. There really doesn't seem to be any way for the compiler to prevent this (which is fine, IMHO, there's only so much it can do), so it seems like trying to prevent it in the other case is kind of useless.

You won't be able to send &mut data to another thread, at least not with the standard library (i.e. thread::spawn requires the closure given to it is 'static, which it won't be if it's capturing a reference to data) or safe code.

There are ways to allow operating on a &mut T on another thread, such as using the crossbeam crate, but those ensure (via their API and implementation) that (using your example) evil_function() doesn't return until the background work is done.

Avoiding data races like this is a core tenet of Rust, so you won't (or shouldn't!) find any violations like that.

@trentj:
&mut means exclusivity. If you don’t need exclusivity, but just mutation, that’s what Cell and friends are for.

That makes sense (thank you!!), but in that case it seems like calling it mut is misleading. Also, it kind of seems like that makes chunks_mut kind of useless - why would you want something which chunks a data structure in a mutable way when you can't access the data while iterating on it? I mean, the idea of a mutating iterator makes sense (and is present in a lot of other languages), but an iterator which controls an exclusive lock on the data, preventing even read-only access to it while iterating, seems kind of useless, unless I'm missing something.

Interesting, I wasn't aware of that, thanks! (and thank you for your patience in explaining this all to me).

I guess I still don't understand, though, in your other example, how having the function modify the data and return it in a channel would be a data race with printing it, as long as the print (or, in general, read access) happened after the data was returned in the channel (which it seems like it would have to be).

Thanks for this (and to @trentj for also mentioning it). I guess, as I mentioned to him, it seems like if all this is true, then chunks_mut seems kind of useless, if I can't access the data (in a read-only fashion) while mutating it. But, maybe that's just how it is. ¯_(ツ)_/¯

Again, thanks for everyone's patience in explaining this to me - I'm still climbing the Rust learning curve it seems. :slight_smile:

The data race would be with that background thread modifying some part(s) of the vector, and your thread trying to print it. In general, let's assume we're dealing with an opaque type rather than a Vec. In that case, we don't know what fields that type reads when its Debug impl is triggered (as part of the println!()). But while some of those fields are being read by its Debug::fmt(), the background thread is modifying them as well, leading to a data race.

In our hypothetical example, chunks_mut guarantees that the only valid thing you can touch (read or write, as it happens here) are the chunks (slices) it's returning to you - the rest of data is not accessible.

Ah, okay, I guess that makes sense.

Inconvenient, but sensible. :wink:

Thanks again!!!!

Yeah, this has been discussed to death and the two main camps seem to be:

  • It would be better named uniq or some other name, but it's impossible to change now
  • mut gives the right intuition most of the time and it's harder to explain uniqueness than mutation, so it's best the way it is.

This is a slight mischaracterization. You can access the data while iterating, you just have to do it through the exclusive borrow taken when you called chunks_mut. You can't do an end-run around an exclusive borrow by using a different reference to the original data; that would defeat the whole point. But you can access the chunk currently being iterated over, and you can access chunks from previous iterations, if you save them.

It's really a fundamental property of iterators: they only give you one thing at a time. In other languages, you may sometimes ignore this by accessing data without going through the iterator. The code you wrote would work translated into C++, for instance. But then C++ allows you to invalidate iterators, leading to data races and memory unsafety, precisely because it has nothing analogous to &mut that would stop you.

1 Like

Yep, that's fair. I guess the hardest thing for me in learning Rust is coming across these programming patterns that I'm used to that Rust disallows, and understanding exactly why they're disallowed. Mostly I get it, but occasionally I come across something like this where I need someone to explain what's wrong with it like I'm a five-year-old. :smile:

Most of us start Rust with a strong aversion to the "inconvenient" borrow checker. Eventually we learn to thank it for saving us countless hours of bug chasing, particularly in multi-threaded programs. Closing the door to most malware attacks is a not-inconsiderable additional benefit.