Question 4
Say we have a function that moves a box, like this:
fn move_a_box(b: Box<i32>) {
    // This space intentionally left blank
}
Below are four snippets which are rejected by the Rust compiler. Imagine that Rust instead allowed these snippets to compile and run. Select each snippet that would cause undefined behavior, or select "None of these snippets" if none of these snippets would cause undefined behavior.
Snippet:
1. let b = Box::new(0);
2. move_a_box(b);
3. let b2 = b;
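For reference, here is a sketch of lines 1 and 2 as they actually compile today, with line 3 commented out. The parameter is renamed _b only to silence the unused-variable warning, and snippet is a hypothetical wrapper so the result can be checked. Uncommenting line 3 should give the "use of moved value" error (E0382).

```rust
fn move_a_box(_b: Box<i32>) {
    // This space intentionally left blank: `_b` is dropped here,
    // which frees the heap allocation.
}

/// Hypothetical wrapper: lines 1 and 2 of the snippet, plus a legal line 3.
fn snippet() -> i32 {
    let b = Box::new(0); // line 1: `b` owns the Box
    move_a_box(b); // line 2: ownership moves into the parameter
    // let b2 = b; // line 3: rejected with error[E0382]: use of moved value: `b`
    let b2 = Box::new(1); // binding a *new* Box is fine
    *b2
}

fn main() {
    println!("{}", snippet());
}
```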
I only care about one of the four snippets, so I omitted the other three.
My analysis: In the above snippet,
The variable b owns the Box after line 1.
In line 2, ownership of the Box is moved to the function's parameter b. When the function returns, ownership isn't passed back, so the Box is dropped and its heap allocation is freed.
In line 3, b is not the owner of anything. Thus I believe b2 does not receive ownership of any heap-allocated Box.
The snippet ends, and a free is performed for every variable that owns heap memory. But at this point neither b2 nor b owns any heap memory.
Where is the undefined behavior in this? I don't believe there is a double free, because no variable owns any heap memory.
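The part of this analysis about line 2 can be checked in real Rust. The sketch below assumes a hypothetical Tracked type that counts its own drops (a Box<i32> gives no visible signal when it is freed). It suggests the boxed value is dropped exactly once, inside move_a_box, and that nothing more is dropped when the moved-out-of b goes out of scope.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical payload type that records its own drop.
struct Tracked {
    _value: i32,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn move_a_box(_b: Box<Tracked>) {
    // `_b` is dropped when this function returns.
}

/// Returns (drops right after the call, drops after `b` leaves scope).
fn drops_during_snippet() -> (usize, usize) {
    let start = DROPS.load(Ordering::SeqCst);
    let after_call;
    {
        let b = Box::new(Tracked { _value: 0 });
        move_a_box(b);
        after_call = DROPS.load(Ordering::SeqCst) - start;
        // `b` was moved out of, so nothing is dropped when this block ends.
    }
    let after_scope = DROPS.load(Ordering::SeqCst) - start;
    (after_call, after_scope)
}

fn main() {
    let (after_call, after_scope) = drops_during_snippet();
    println!("drops after call: {after_call}, after scope: {after_scope}");
}
```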
The Brown quizzes are, in many Rust users’ opinion, overly focused on a particular model of what it would mean to “imagine that Rust instead allowed these snippets to compile and run”. There is a sense in which there is no truly correct answer to these questions. But, I can say this about your analysis:
In line 3, let b2 = b;, the right-hand expression b must produce a value that is to be bound to b2. But it cannot, because that value was moved. Therefore, the statement is erroneous. There is no defined way to transport “lack of a value” from b to b2; there is no sense in which “b2 does not receive ownership”; that is not a thing that exists in the language.
In practice, the compiler rejects let b2 = b because b contains no value to move. If we imagined treating every value as if it were Copy, so you never see a “use of moved value” error, then b2 would contain a copy of the Box value that was in b. That gives a double free: b2 thinks it owns the Box, and hence its heap allocation, which was already deallocated in move_a_box.
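The double free itself can't be demonstrated safely, but it can be modelled. The sketch below assumes a hypothetical FakeBox whose id stands in for the heap address and whose Drop impl logs a "free" of that address. Calling clone() here plays the role of the bitwise copy the imagined everything-is-Copy dialect would perform (a real Box::clone would allocate new memory instead). Under that assumption, the log records the same address being freed twice.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a Box: `id` plays the heap address, and
/// dropping a FakeBox logs a "free" of that address.
#[derive(Clone)]
struct FakeBox {
    id: u32,
    frees: Rc<RefCell<Vec<u32>>>,
}

impl Drop for FakeBox {
    fn drop(&mut self) {
        self.frees.borrow_mut().push(self.id);
    }
}

fn move_a_box(_b: FakeBox) {
    // Dropped here: logs the first "free".
}

/// Simulates the snippet as if every type were Copy.
fn simulate() -> Vec<u32> {
    let frees = Rc::new(RefCell::new(Vec::new()));
    {
        let b = FakeBox { id: 1, frees: Rc::clone(&frees) }; // line 1
        move_a_box(b.clone()); // line 2: a bit-for-bit copy is passed; `b` keeps its bits
        let _b2 = b; // line 3: `b2` holds the same "address"
    } // `_b2` is dropped here: a second "free" of address 1
    frees.take()
}

fn main() {
    println!("{:?}", simulate());
}
```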
In addition to the excellent answer above, it may be helpful to recognize:
Since variables must be occupied and moves are effectively no-ops, the value of b must retain its previous value after move_a_box returns in this hypothetical, bad dialect of Rust.
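One way to see that a move only copies the Box's pointer is to compare heap addresses before and after moving it. In this minimal sketch (the function name is hypothetical), the pointer value survives the move unchanged. That is why, in the hypothetical dialect, an un-cleared b would still point at memory that move_a_box had already freed.

```rust
/// Moves a Box and reports whether the heap address survived the move.
fn address_survives_move() -> bool {
    let b = Box::new(0_i32);
    let before: *const i32 = &*b; // address of the heap allocation
    let b2 = b; // move: copies the pointer, not the i32 behind it
    let after: *const i32 = &*b2;
    before == after
}

fn main() {
    println!("{}", address_survives_move());
}
```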
As far as I can tell, this would not be a real issue if b held a plain number. In that case, the function parameter receives its own copy, an entirely separate item on the stack.
With a Box, however, only the pointer is copied into the parameter, and the heap data it points to is deallocated when move_a_box returns.
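For comparison, the integer version of the snippet is legal Rust, because i32 is Copy: passing b to a function copies it, and b keeps its value. A sketch, with move_an_int as a hypothetical counterpart to move_a_box:

```rust
/// Hypothetical counterpart to move_a_box that takes a Copy type.
fn move_an_int(n: i32) -> i32 {
    // `n` is a separate copy in the callee's stack frame.
    n + 1
}

fn int_snippet() -> (i32, i32) {
    let b = 0;
    let r = move_an_int(b); // `b` is copied, not moved
    let b2 = b; // fine: `b` still holds 0
    (r, b2)
}

fn main() {
    println!("{:?}", int_snippet());
}
```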
I assume that the issue with double frees is something like this:
b's pointed-to data is freed upon exiting move_a_box.
That location can now be used by other data.
Then you'd be deallocating that data in the second free.
This problem cascades down.
You'd also be able to read whatever b2 now points to (which, again, may hold new data from some other write).
Maybe that's not the main issue; I'm just sharing my interpretation.
PS: one piece that does not fit the reasoning above is that, given b was moved, b2 would be nothing, because b does not exist anymore. So it's possible that that is the issue instead.
If "Does not receive ownership" doesn't exist in this language, then I'm guessing the compiler marks b2 as the owner of its pointee, regardless of whether or not b "had prior ownership"? And then in this hypothetical situation, b2's value is a pointer to the same memory that b is pointing to. Then it makes sense that there is a 2nd free on the memory.
Could you say there's an "Ownership" system or mechanism within the Rust compiler, but after compilation, the only evidence of "ownership" is in the lack of unsafe assembly? Is Ownership something that can be seen in the source code of the Rust compiler?
Currently I'm thinking the compiler has some system for tracking ownership of the various heap objects. Or maybe ownership is tracked within each pointer variable on the stack. Either way, I feel I'd need to look at the compiler code to see what it would actually do on line 3.
It does seem like the double-free is where the undefined behavior comes in. But it's my lack of understanding of the ownership system that's the trouble here.
It seems that maybe the ownership system's behavior is itself undefined here? But certainly there's an implementation of ownership; it exists in the currently available compiler.
If we modified the compiler to allow this to compile, what would the compiler do? Perhaps it would bomb before producing inspectable assembly. But if not, what would the assembly look like?
Thanks for sharing Aquascope. It looks to be quite useful for the journey ahead!
“Ownership” per se is not actually implemented solely by the compiler; ownership is the principle that for every value, something is solely responsible for dropping that value when it is no longer needed,[1] but that thing can be the compiler (in the case of a value owned by a variable) or a library (in the case of a value owned by a data structure containing indirection). The compiler is not tracking “who is the owner of this value”; it is only tracking “at this time in the execution of the code, does this variable currently contain a value or not”, and ownership emerges from that.
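As an illustration of a library owning values: a Vec owns its heap buffer and the elements in it, and its Drop implementation drops each element exactly once. The sketch below assumes a hypothetical Counted type that bumps a shared counter when dropped.

```rust
use std::cell::Cell;

// Hypothetical helper: increments a shared counter when dropped.
struct Counted<'a>(&'a Cell<usize>);

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns how many elements were dropped when the Vec was dropped.
fn vec_drops_its_elements() -> usize {
    let drops = Cell::new(0);
    {
        // The variable `v` owns the Vec; the Vec (library code) owns the
        // heap buffer and the three elements stored in it.
        let v = vec![Counted(&drops), Counted(&drops), Counted(&drops)];
        assert_eq!(v.len(), 3);
        assert_eq!(drops.get(), 0); // nothing dropped yet
    } // dropping `v` runs Vec's Drop impl, which drops each element once
    drops.get()
}

fn main() {
    println!("{}", vec_drops_its_elements());
}
```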
The compiler’s part of tracking ownership is the tracking of whether a variable has been moved out of and cannot be accessed. So, if you suppose that you are removing the rule that a variable may not be accessed after it has been moved out of, then that is changing the implementation of ownership. If you make the change of treating every type as Copy, then you get double frees when you make use of those copies. If you make a different change, you get a different result.
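The "does this variable currently contain a value" tracking shows up in two places. If a value is moved on only some paths, it is still dropped exactly once; when the path isn't known until run time, the compiler uses a hidden drop flag. And a variable that was moved out of can be given a new value and used again. A sketch, with Counted, consume, conditional_move, and reinitialize as hypothetical names:

```rust
use std::cell::Cell;

// Hypothetical helper: increments a shared counter when dropped.
struct Counted<'a>(&'a Cell<usize>);

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Hypothetical consumer: takes ownership and drops the value on return.
fn consume(_c: Counted<'_>) {}

fn move_a_box(_b: Box<i32>) {}

/// Moves `x` only when `cond` is true; returns how many drops happened.
fn conditional_move(cond: bool) -> usize {
    let drops = Cell::new(0);
    {
        let x = Counted(&drops);
        if cond {
            consume(x); // `x` is dropped inside `consume`
        }
        // Whether `x` still holds a value is only known at run time, so it
        // is dropped here only if it was not moved (tracked by a drop flag).
    }
    drops.get()
}

/// A moved-out-of variable may be given a new value and then used again.
fn reinitialize() -> i32 {
    let mut b = Box::new(0);
    move_a_box(b); // `b` no longer holds a value
    b = Box::new(1); // now it holds one again
    let b2 = b; // allowed: `b` is initialized at this point
    *b2
}

fn main() {
    println!("{} {}", conditional_move(true), conditional_move(false));
    println!("{}", reinitialize());
}
```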
One of the most insidious things about UB is that the results are unpredictable -- "anything could happen". It's a common mistake to think that one can apply some relatively straightforward reasoning about how source code translates to asm or machine code, even in the face of UB, but compilers haven't been that simple for decades. And their algorithms are continually being updated, too.
Keep in mind that UB is defined at the language level. Compilers interpret UB situations as "this is guaranteed never to happen" and aggressively exploit that in multiple ways, which can have emergent, unpredictable results.
(A big problem with the Brown book is that they do not consider everything which is UB in Rust, and they assume that the results of "if this compiled" would be something particular and yet unspecified. The result is that you have to intuit the answer they were looking for, or the narrow point they were trying to make, in order to answer "correctly" according to them.[1] And another big problem is that such an approach gives the impression that you could do various things with unsafe which would in reality still be UB, and/or that you could properly reason about unsafe by only considering the subset of UB which they have decided to focus on.)
Note that this is not the Rust book; it's a fork of the book.
It would totally depend on how one would define the semantics of such a snippet. In Rust it's meaningless. It's like asking "imagine Rust allowed let x = 7 ^^^ 3 to compile and run, what would that result in?".