How to move a Box while holding a reference to the element inside it?

Hello.
As far as I know, Box is a unique pointer to an object allocated in the heap.
Does that mean that when the Box itself is moved, it still points to the same location in the heap? That makes sense to me, because it's not necessary to relocate the object in the heap just because we moved a pointer to it.
If that's correct, then consider the following Rust code:

fn main() {
    let orig_box = Box::new(1);
    let reference: &u32 = orig_box.as_ref();
    let moved_box = orig_box;

    println!("reference: {:?}", reference);
}

The compiler outputs the following error:

error[E0505]: cannot move out of `orig_box` because it is borrowed
 --> src/main.rs:4:21
  |
2 |     let orig_box = Box::new(1);
  |         -------- binding `orig_box` declared here
3 |     let reference: &u32 = orig_box.as_ref();
  |                           ----------------- borrow of `orig_box` occurs here
4 |     let moved_box = orig_box;
  |                     ^^^^^^^^ move out of `orig_box` occurs here
5 |
6 |     println!("reference: {:?}", reference);
  |                                 --------- borrow later used here

For more information about this error, try `rustc --explain E0505`.

In the code above, I create a Box with a number, then get a reference to the number contained in the box with as_ref, try to move the ownership of the Box and then print the referenced element.
I expect the code to work because when I move the Box, I expect that the reference to the element contained in the Box will not be changed.

Why doesn't Rust allow me to use a reference to the element contained in a Box after the Box has been moved?
Thank you!


I would suppose that Rust can't track whether moved_box is dropped (which would free the memory pointed to). So the lifetime 'a of &'a u32 lasts only until you move orig_box.

Side note: I would consider it bad practice to use AsRef::as_ref to borrow from the Box. The following works just fine:

fn main() {
    let orig_box = Box::new(1);
    let reference: &u32 = &orig_box;
    //let moved_box = orig_box;
    println!("reference: {:?}", reference);
}


Also see documentation here and/or Semantics of AsRef on IRLO for some more considerations on .as_ref().

There's nothing in the types that would allow the compiler to deduce that the memory is heap-allocated. References to heap-allocated memory have exactly the same type as references to any other memory. As far as the signature of Deref/AsRef are concerned, the returned reference might as well point directly inside the Box-as-a-struct, so the compiler doesn't stand a chance of proving that they are not invalidated.

By the way, self-referential types are not possible to usefully construct and use in safe Rust. You should not try to do that. Instead, you should redesign your data structures by separating long-term owned data from short-term borrowed data.


I think it doesn't matter if it's heap-allocated or not. I think that &orig_box as &u32 should live as long as orig_box isn't mutated or dropped (and in the given example it isn't until after the println! line).

So the explanation for the compiler to reject the example in the OP should be something else, I think.

The Box itself is alive until that line, but orig_box has been moved and thus invalidated.


I don't really see what you are getting at. The reason is what I wrote. The lifetime of the reference obtained from the box is tied to the lifetime of the box instance itself, because Deref and AsRef both have the signature (&Self) -> &Target, which would allow returning a reference to a direct component of self. So even though moving the box doesn't move the referent (because the referent lives on the heap), the compiler doesn't know that.

That's clearly wrong. A direct reference would be invalidated by moving the owner. Consider the following useless container:

use std::ops::Deref;

// Stores the value inline, as a field of the struct itself.
struct InlineBox<T> {
    value: T
}

impl<T> Deref for InlineBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.value
    }
}

By your argument, the following should be accepted, but in reality it causes a dangling reference:

let b = InlineBox { value: String::from("hi") };
let r = &*b;
let other = b; // b is moved, reference is no longer valid
dbg!(r); // BOOM
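To make the difference between the two containers visible without holding a borrow across a move, here is a minimal sketch that only compares addresses. The helper name `points_inside_owner` is made up for illustration, and `InlineBox` is repeated so the block is self-contained. Both containers are used through the same `Deref` bound, which is the point: nothing in the signature says where the returned reference points.

```rust
use std::mem::size_of;
use std::ops::Deref;

struct InlineBox<T> {
    value: T,
}

impl<T> Deref for InlineBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.value
    }
}

/// Hypothetical helper: does the reference returned by `deref` point into
/// the bytes occupied by the owner `P` itself?
fn points_inside_owner<P: Deref<Target = u64>>(owner: &P) -> bool {
    let start = owner as *const P as usize;
    let end = start + size_of::<P>();
    let target = owner.deref() as *const u64 as usize;
    (start..end).contains(&target)
}

fn main() {
    let inline = InlineBox { value: 7u64 };
    let boxed: Box<u64> = Box::new(7);

    // The inline value lives inside the struct, so moving the struct moves it.
    println!("InlineBox points inside itself: {}", points_inside_owner(&inline));
    // The boxed value lives in a separate heap allocation.
    println!("Box points inside itself: {}", points_inside_owner(&boxed));
}
```

The generic function accepts both types with one signature, so a borrow checker that reasons from signatures alone has to assume the worse case for both.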

For technical reasons, even if the address of the heap part of the box does not change, any derived references to it must be invalidated when moving the box. If you're interested in those reasons, RFC#3336 contains a summary and a plan to offer a sound but unsafe workaround.
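The first half of that statement (the heap address does not change) can be checked with raw pointers, which carry no lifetime and therefore do not conflict with the move. This is only a sketch: the helper name `heap_addr_survives_move` is invented here, and the code compares addresses without ever dereferencing the pointer taken before the move.

```rust
/// Hypothetical helper: returns true if the heap address of the boxed value
/// is the same before and after the Box is moved into a new binding.
fn heap_addr_survives_move() -> bool {
    let orig_box: Box<u32> = Box::new(1);

    // Record the address as a raw pointer; the temporary borrow ends here.
    let before: *const u32 = &*orig_box;

    // Move the Box. Only the pointer stored in the binding is copied.
    let moved_box = orig_box;
    let after: *const u32 = &*moved_box;

    before == after
}

fn main() {
    println!("same heap address after move: {}", heap_addr_survives_move());
}
```

Equal addresses do not make a reference taken before the move usable afterwards; that is the second half of the statement, and the borrow checker enforces it.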


I guess the thing I'm wondering is: why can't the Rust compiler (in theory) deduce that the Box is moved but stays alive while the function is running. This is what I tried to say here:

I would argue that in case of "simple" moves like in the example of the OP, there is no fundamental reason why the compiler couldn't figure out that the Box will be around as long as moved_box is around. After all, the move doesn't involve passing the Box to some other function; the outer, pointer part of the Box is still on the stack.

That would mean that all references would need to be actively updated. That's precisely the "move constructor" territory that Rust is trying to avoid.

Language design shouldn't be based on the "simple" cases, and features should not be added merely because it is possible to add them.

The simple cases are, tautologically enough, simple. A language is well-designed not when it lets you special-case simple things, but when it lets you do many things of arbitrary complexity with a uniform, small core toolset.

Allowing references while moving "in the simple case" would result in a special exception that would very quickly turn code into an unmaintainable mess for at least two reasons:

  1. It's an anti-pattern, along with all the dubious stuff people "need" it for, the chief example being self-referential types. Rust's reason for existence is precisely to disallow this sort of code. It's dangerous even when it is (or could technically be made) memory-safe, because it strongly correlates with an incomplete or incorrect understanding of memory management and a lack of properly designed data structures in the domain model.
  2. If the compiler allows violating the borrowing and ownership rules in the simple cases, then crossing whatever arbitrary complexity threshold the compiler is programmed to understand will suddenly start resulting in compiler errors. Those errors will then be much harder to fix, because at that point, the whole structure of the already existing code will have been organized around the hacky case, and the programmer will have to untangle it all and rewrite it properly, respecting ownership. (This is similar to the reason why Rust typechecks generics upfront instead of doing it at instantiation time.) It's much simpler and better if ownership is respected from day 0.

Not to mention that having to actively update all references in case their owner ever changes would incur an overhead on every reference, basically. Either all references would need to apply double indirection (a reference's address would need to be added to a list related to its owner in order to update it upon every move), or the owners would have to always be indirect (which basically amounts to garbage collection). Neither is a realistic option in this language.

Don't get me wrong, I didn't want to propose it should be added. I was just curious if there are fundamental reasons against it.

I guess you could say something similar about non-lexical lifetimes (NLL), but I'm not sure.
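For readers unfamiliar with NLL, a small sketch of the kind of code it accepts (the function name `push_copy_of_first` is made up for illustration). The shared borrow `first` is still in lexical scope when the vector is mutated, but it is not used again, so the borrow is considered finished.

```rust
/// Hypothetical example: mutate a Vec after the last use of a shared borrow.
fn push_copy_of_first() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    let copy = *first; // last use of `first`
    v.push(copy); // accepted: the borrow ended at its last use, not at end of scope
    v
}

fn main() {
    println!("{:?}", push_copy_of_first());
}
```

Unlike the Box case, nothing is moved here; NLL only shortens how long a borrow is considered live.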

There are comparable design errors in the language, unfortunately. For example, the temporary lifetime extension, whereby let val = &expr; is essentially special-cased to mean let ref val = expr;, and which confuses the heck out of beginners who otherwise would correctly grasp the concept of lifetimes and borrowing. It's possible and it exists, but the language would be better off without it.
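A minimal sketch of the two forms being compared, with made-up function names. Both compile; the commented-out line shows a similar-looking statement where, as far as I know, no extension applies and the compiler reports a dropped temporary.

```rust
/// `let val = &expr;` keeps the temporary String alive until the end of the block.
fn extended_len() -> usize {
    let val = &String::from("temporary");
    val.len()
}

/// The `ref` binding form: `val` is again a `&String`.
fn ref_binding_len() -> usize {
    let ref val = String::from("temporary");
    val.len()
}

fn main() {
    // Not extended (left commented out because it should not compile):
    // let s = String::from("temporary").as_str();
    // println!("{}", s);

    println!("{} {}", extended_len(), ref_binding_len());
}
```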

In those simple cases the compiler could figure out that the reference is still valid... but is that useful and worth the additional complexity in the compiler implementation? The fix is to just not move the box into a new binding, which you should already do because it's mostly useless.
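If the move into a new binding is really wanted, one way to make the original example compile is to take the reference after the move, so that it borrows from the binding that currently owns the Box. A sketch, with the function name `read_after_move` invented for illustration:

```rust
/// Hypothetical variant of the original example: move first, borrow second.
fn read_after_move() -> u32 {
    let orig_box: Box<u32> = Box::new(1);
    let moved_box = orig_box;

    // The borrow is tied to `moved_box`, which is not moved again.
    let reference: &u32 = &moved_box;
    *reference
}

fn main() {
    println!("reference: {:?}", read_after_move());
}
```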

