Moving out of a function both owned and borrowed entities properly?

I think I finally found a good solution to the original problem. Discussion.

Original problem was to implement a function returning both owned and borrowed entities. The borrowed entity is not a reference, but an object consuming a reference borrowed from the owner. As it was stated such a function produces a self-referential structure.

In the original problem were 3 enitities:

  • dst_buffer : Box<[ f32 ]>
  • dst_bytes : &mut [ u8 ]
  • dst_cursor = Cursor::new( dst_bytes )

dst_bytes are borrowed from dst_buffer and dst_cursor consumes dst_bytes:

dst_buffer <- dst_bytes <- dst_cursor

Breaking association between dst_bytes and dst_buffer fix the problem:

      let dst_bytes : &mut [ u8 ] = unsafe
      {
        let dst_slice = dst_buffer.as_mut_byte_slice();
        std::slice::from_raw_parts_mut( dst_slice.as_mut_ptr(), dst_slice.len() )
      };

So now it looks like that:

dst_buffer -x- dst_bytes <- dst_cursor

But exposing such fields may lead to dangling pointers problem if a user will reallocate memory of dst_buffer somehow. So my solution hides fields and exposes only references so that no reallocation can happen from user-land.

Solution
Note: it requires byte-slice-cast = "1.2.0"

Is something wrong with my solution?

The main problem here is that dst_buffer lets you access the heap allocation while a mutable reference to it exists. Mutable references require exclusive access, so you are not allowed to access the buffer directly until you have destroyed the Cursor.

-pub fn dst_buffer(&mut self) -> &[f32] {
-    &self.dst_buffer[..]
-}

Additionally, you should not have a lifetime on the struct. Lifetimes are for references to things outside the struct.

Another issue here regarding the aliasing rules for boxes. The short story is that a Box asserts exclusive ownership of its heap allocation whenever it is used, including when it is moved, which includes when you move the struct containing it. The long story can be found here. Usually the way to fix it is to replace the Box with a raw pointer (which you usually wrap in another struct that has the right destructor).

struct AliasedBox<T: ?Sized> {
    ptr: *mut T,
}
impl<T: ?Sized> AliasedBox<T> {
    fn new(b: Box<T>) -> Self<T> {
        let ptr = Box::into_raw(b);
        Self { ptr }
    }
    
    /// Safety: Once a reference is created using this method, it is
    /// invalid to access the contents of the box in any way (including 
    /// dropping the box) until the last use of the returned reference.
    ///
    /// Moving the aliased box does not count as a use for the purposes
    /// of the safety of this method.
    unsafe fn get_mut_ref(&mut self) -> &'static mut T {
        &mut *self.ptr
    }

    /// Calling this method counts as a use of the box for the purposes
    /// of `get_mut_ref`.
    fn into_box(self) -> Box<T> {
        let b = unsafe { Box::from_raw(self.ptr) };
        std::mem::forget(self);
        b
    }
}
impl<T: ?Sized> Drop for AliasedBox<T> {
    fn drop(&mut self) {
        unsafe {
            drop(Box::from_raw(self.ptr));
        }
    }
}

Next, fields are dropped in the order they are declared in the struct, so you are dropping the Box before you are destroying the Cursor, which has a mutable reference to the Box. This is not valid, and you should swap the order of the fields.

The next issue is that making the Cursor accessible from outside the Context, you make it possible for the user to grab the mutable reference to the aliased box and hold on to it until after the context is dropped. They can do this by calling Cursor::get_mut and using std::mem::swap to swap out the mutable reference inside the Cursor for some other static mutable reference, e.g. a mutable reference to a global variable or leaked box. To avoid this issue, you should define methods on the Context that use the cursor in only safe ways, and have the user access the cursor only through such methods.

// or perhaps implement the `Write` trait for `Context`
impl Context {
    pub fn write(&mut self, data: &[u8]) -> std::io::Result<usize> {
        std.dst_cursor.write(data)
    }
}

Finally, I assume you want some way to access the box after using the cursor. You can define a method like this:

impl Context {
    pub fn into_buffer(self) -> Box<[f32]> {
        drop(self.dst_cursor);
        self.dst_buffer.into_box()
    }
}

Note that the safety of this relies on the fact that the cursor is destroyed before the AliasedBox is accessed again.

3 Likes

The dst_cursor method is unsound because it exposes a reference with the 'a lifetime, which is choosen by the user. For example this compiles:

let buffer_used_after_free: &'static mut [u8] = {
    let mut context = context_make();
    let cursor = std::mem::take(context.dst_cursor());
    cursor.into_inner()
};
2 Likes

Thank you for the response. I am trying to digest the answer.

I know about the term "unsound" from formal logic. But it seems you use the term in a different meaning. What is it from? What is the origin of the term? Any reading?

Unsound means that it violates one of Rust's safety rules. An equivalent way to say it is that any program that is unsound has "undefined behavior".

2 Likes

What is origin of the word? Is it Rust-specific term? Any reading?

An equivalent way to say it is that any program that is unsound has "undefined behavior".

I thought it is something what could be used inappropriately.

The origin of the word is mathematical logic. It is also used in other languages such as C++. Ralf Jung's blog has several articles that are relevant to this. Here is one of them: "What The Hardware Does" is not What Your Program Does: Uninitialized Memory

2 Likes

So it's not thread safe? But ok if used in single thread?

Could you please elaborate? Any reading?

No it is not. Mutable aliasing is unsound even in a single-threaded setting.


At this point, my advice would be that you should really focus on: 1. learning safe Rust first, and 2. restructuring your problem/solution so that it does not require unsafe. As we have been, perhaps unsuccessfully, trying to demonstrate to you, using unsafe correctly for manual memory management is hard, let alone at the stage when you didn't even grasp the memory model of safe Rust yet. Don't try to eat the whole language in a single bite, because it will only cause harm rather than good – the summary of Alice's answer above could be that your solution is still grossly incorrect and full of undefined behavior. Please, spare future downstream users of your code some bugs. Stay safe.

1 Like

Hm.. Thanks

Mutable aliasing is unsound even in a single-threaded setting.

Why?

2 Likes

There are many reasons. To give you the quickest, archetypical example: "iterator invalidation". When you are iterating over a collection, and you are simultaneously trying to mutate it, its backing heap allocation may move around, and any pointers you hold to the contained elements would then be dangling.

Here's an illustrative example on the Playground. It produces the following output:

timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11:     8 Segmentation fault      timeout --signal=KILL ${timeout} "$@"
2 Likes

Regardless of any concrete example, it's UB by definition in Rust. Which means even if it happens to work in your particular circumstance today, it may not tomorrow. Rust is free to make transformations on assumptions than no aliasing occurs, and the markers it passes on to LVVM convey that it is able to optimize based on this assumption as well. And these transformations can be unpredictable.

Moreover, note that it's not only about optimization, but about semantics -- the guarantee that no one else can visit an object so long as a unique (mutable) reference is live can be crucial to the correctness of many algorithms, multi-threaded or single-threaded. For example, depth-first-search on a graph with an auxiliary interior-mutability marker field for whether or not a node has been visited.

2 Likes

I don't think unsound directly means that something has undefined behaviour. I generally use it to indicate that a set of API could lead to undefined behaviour if misused by safe code, but it doesn't always do.

  • context_make() creates a new Context<'a> where 'a is choosen by the caller, because the function is generic over the 'a lifetime, so the caller can choose which lifetime to use to instantiate it. In practice this means that the borrow checker is free to make it bigger, unless it is further restrained (but this is not the case)
  • Then context.dst_cursor() returns a &mut std::io::Cursor<&'a mut [u8]>. This exposes the 'a lifetime.
  • Then, since std::io::Cursor<&'a mut [u8]> implements Default I can replace it with its default value by using std::mem::take. The fact that it implements Default is not needed, in general there are other ways to do this, but this one was simplier for the example
  • Now I own a std::io::Cursor<&'a mut [u8]>, and I can use .into_inner() to get the inner value, which is a &'a mut [u8].
  • Finally, I didn't have to restrict the 'a lifetime, so I can easily use it as if it were 'static. This is kind of similar to choosing 'a to be 'static when I originally called make_context()

The fundamental problem with exposing 'a is that by definition it outlives the Context, but at the same time you used it for a reference that can't outlive Context, because it points to data it owns. This is in general a problem with self-referential structs and one of the reasons why they're tricky. I suggest you to avoid using unsafe for self-referential structs like this one unless you know all the subtleties involved with them and consider them when exposing any kind of API on that struct.

3 Likes

Very interesting. How to mark the field dst_cursor : std::io::Cursor< &'a mut [ u8 ] > to state it cant outlive the Context?

You can't, that is the point. There's no valid lifetime annotation in Rust that can express a self-referential type.

1 Like