It's everyone's favorite recurring topic: self-referential structs

Unsafe disclaimer: I'm not writing any code for a program or modifying code in production. I'm just hacking around to learn how memory access works under the hood. I promise not to use this information for bad.

I'm asking for a review of the code blocks below and a correction or validation of my stated assumptions.

Borrow checker won't allow construction of self-referential object

    struct Struct { view: &mut u8, stashed: u8 };
    let stashed = 1u8;
    let s = Struct { view: &mut stashed, stashed };

Lets thwart the borrow checker with a 'static lifetime trap

fn bad_self_referential_struct_moving_var_after_reference() {
    #[derive(Debug)]
    struct Struct { view: &'static mut u8, stashed: u8 };

    let mut stashed = 1u8;
    let val_ref: &'static mut u8 = unsafe { std::mem::transmute(&mut stashed) };
    let mut s = Struct { view: val_ref, stashed };

    dbg!(&s); //  Struct { view: 1, stashed: 1 }
    s.stashed += 1;
    dbg!(&s); // Struct { view: 1, stashed: 2 } !!view has wrong reference!!
}

The following bugs are what the checker was stopping

  • s.view references the wrong variable
  • stashed gets copied not moved into Struct { stashed: stashed so s.view at least doesn't reference garbage that doesn't exist
  • if we return s, then s.view will reference garbage because stashed var will get destroyed at the end of the fn

Let's try two-step initialization with the 'static lifetime trap

fn bad_self_referential_struct_with_second_step_init_for_ref() {
    #[derive(Debug)]
    struct Struct { view: Option<&'static mut u8>, stashed: u8 };

    let stashed = 1u8;
    let mut s = Struct { view: None, stashed };
    s.view = unsafe { std::mem::transmute(&mut s.stashed) };

    dbg!(&s); // Struct { view: Some(1), stashed: 1 }
    s.stashed += 1;
    dbg!(&s); // Struct { view: Some(2), stashed: 2 }
}

This appears to work with correct values printing. But we haven't moved s yet so lets test that.

    // .....
    // .....
    dbg!(&s); // Struct { view: Some(1), stashed: 1 }
    s.stashed += 1;
    dbg!(&s); // Struct { view: Some(2), stashed: 2 }

    // now lets move s
     let mut s = s;
    dbg!(&s); // Struct { view: Some(2), stashed: 2 }
    s.stashed += 1;
    dbg!(&s); // Struct { view: Some(2), stashed: 3 } // view has wrong reference

Quick notes about safe referencing with Box. To borrow the box and not borrow the inner value

  • &box
  • &mut box
    - box.borrow()
    - box.borrow_mut()

To get references to the inner box data

  • box.as_ref()
  • box.as_mut()
  • [edit+] box.borrow()
  • [edit+] box.borrow_mut()

I hope those are correct assumptions?

Lets try putting stashed on the heap so it doesn't get to move after it's referenced

fn maybe_bad_self_referential_struct_simple_with_transmute() {
    #[derive(Debug)]
    struct Struct { view: &'static mut u8, stashed: Box<u8> };

    let mut stashed = Box::new(1u8);
    let stashed_ref: &'static mut u8 = unsafe { std::mem::transmute(stashed.as_mut()) };

    let mut s = Struct { view: stashed_ref, stashed };
    dbg!(&s); // Struct { view: 1, stashed: 1 }
    *s.stashed += 1;
    dbg!(&s); // Struct { view: 2, stashed: 2 }

    let mut s = s;
    dbg!(&s); // Struct { view: 2, stashed: 2 }
    *s.stashed += 1;
    dbg!(&s); // Struct { view: 3, stashed: 3 }
}

This looks okay.. stashed_ref refers to the inner pointer of Box thanks to stashed.as_mut(). It's also moved-not-copied into s because it is a &mut T which doesn't implement Copy. stashed is also moved into Box. Drop order of the fields is correct. No garbage is being left behind. Seems like it's all fine now.

If I run Cargo +nightly miri run though

error: Undefined Behavior: trying to retag from <3054> for Unique permission at alloc1643[0x0], but that tag does not exist in the borrow stack for this location

help: <3054> was created by a Unique retag at offsets [0x0..0x1]
let mut s = Struct { view: stashed_ref, stashed };
    |                                ^^^^^^^^^^^
help: <3054> was later invalidated at offsets [0x0..0x1] by a Unique retag
let mut s = Struct { view: stashed_ref, stashed };

The the construction of a single threaded, non Sync Struct in maybe_bad_self_referential_struct_simple_with_transmute should be sound? Not worried about concurrent access yet. I make no assumptions about memory bugs like data races in a multi-threaded execution.

But miri complains because it shouldn't be sound for reasons unrelated to how program execution should occur? The issue is compilation, specifically optimization, right? If the compiler thinks you moved your referenced object, then the references to that object are allowed to be cleaned up.

Bonus: using box leak to do the same as above but with the same aliasing issue reported by miri

fn maybe_bad_self_referential_struct_simple_with_leak() {
    #[derive(Debug)]
    struct Struct { view: &'static mut u8, stashed: Box<u8> };

    let stashed = Box::new(1u8);

    let stashed_ref = Box::leak::<'static>(stashed);
    let stashed_ptr = stashed_ref as *mut _;

    let stashed = unsafe { Box::from_raw(stashed_ptr) };

    let mut s = Struct { view: stashed_ref, stashed };
    dbg!(&s); //  Struct { view: 1, stashed: 1 }
    *s.stashed += 1;
    dbg!(&s); //  Struct { view: 2, stashed: 2 }

    let mut s = s;
    dbg!(&s); //  Struct { view: 2, stashed: 2 }
    *s.stashed += 1;
    dbg!(&s); //  Struct { view: 3, stashed: 3 }
}

I looked around and discovered that there is a crate called aliasable that "addresses" the aliasing issue.

So I tried it

    struct Struct { view: &'static mut u8, stashed: AliasableBox<u8> };

    let mut stashed = AliasableBox::from_unique(Box::new(1u8));
    let stashed_ref = unsafe { std::mem::transmute(stashed.as_mut()) };

    let mut s = Struct { view: stashed_ref, stashed };

    dbg!(&s);
    s.stashed.add_assign(1);
    // dbg!(&s);

Miri wont report UB with this but if I uncomment the last line to read after write then it will report UB again. At the least, constructing the self-referential object no longer reports UB and that was the objective.

How the aliasable crate works and why miri reports UB when I mutate then read in this case is a mystery to me. Without AliasableBox, even commenting the last line in the other examples will still report UB, so AliasableBox is doing something. But I don't know what and I'm still learning so some insight would be appreciated.

Neither &box nor &mut box borrows the inner value. box.borrow(), box.borrow_mut(), box.as_ref(), and box.as_mut() are all equivalent to &*box and &mut *box, both of which borrow the inner value and invalidate existing references per the aliasing rules.

When you move the Box<u8>, you reassert that the box is a unique pointer to its underlying memory, and no other pointers from that point onward are used to access its memory. This aliasing requirement is indeed used for optimizations. The property is special to &mut references and Box. Thus, by replacing Box with AliasableBox, you remove this particular requirement, since AliasableBox is a wrapper over an ordinary raw NonNull pointer with no aliasing requirements.

(Box::from_raw() similarly reasserts uniqueness of the newly-created Box.)

When you mutate through stashed, you invalidate any unique &mut references created from it, including view. Miri no longer reports UB if you replace it with a raw *mut pointer, which is allowed to alias with other pointers:

use aliasable::boxed::AliasableBox;
use std::ptr::addr_of_mut;

fn with_raw_pointer_and_aliasable() {
    struct Struct {
        view: *mut u8,
        stashed: AliasableBox<u8>,
    }

    unsafe fn print(s: &Struct) {
        println!(
            "&s = Struct {{ view: {}, stashed: {} }}",
            *s.view, *s.stashed
        );
    }

    let mut stashed = AliasableBox::from_unique(Box::new(1u8));
    let stashed_ptr: *mut u8 = unsafe { std::mem::transmute_copy(&stashed) };

    let mut s = Struct {
        view: stashed_ptr,
        stashed,
    };

    unsafe { print(&s) };
    *s.stashed += 1;
    unsafe { print(&s) };
}
2 Likes

It does, but then you can't move or otherwise use the original object, and that includes destructors. And not all structures can be constructed thusly without unsafe. And this version couldn't be &mut. It's as limiting as a scoped API and I've never seen it seriously used.

(Given your typical forum question, it's just easier to say "Rust doesn't allow that asterisk" then to go down this path which is surely not what they want. I mention it here as, in contrast, you're exploring.)

@LegionMammal978 put it better than I would, but to emphasize, one you didn't list here and the main one in general is that aliasing something covered by a &mut (or owned by a Box) is UB. That rule is at the heart of Rust's memory and concurrency guarantees.

Alternatively put, once you "use" something, any exclusive references to it must no longer be valid.

Whether something is sound or UB or not is based on the Rust abstract machine. And at least so far, violating the &mut aliasing rule is considered instant UB.

There may be a cultural disconnect here, but I see your disclaimer and I'm not trying to discourage you from exploring at all, so I guess I'll just leave it at that.

3 Likes

I love the writeup in this thread of some important things that can go wrong with self-referential structs in Rust.

IMO the large-ish amount of things that can go wrong also supports the value of safely encapsuled macro based solutions such as ouroboros. They make sure that the borrowed field is boxed (the boxing being implicit is certainly a design decision that can be debated, but that's a minor point) so that none of the moving-related issues can occur, and they also don't allow you any mutable access to the borrowed owner field, so the issues that miri reported are avoided. The crate also avoids some subtle points about Box itself possibly asserting unique access whenever it's moved (as part of the struct being moved), by using the very aliasable same crate you mentioned.

And then of course, there's more subtle issues being addressed in the design of ouroboros that you didn't find yet, or that might only become relevant in different situations than the simple &u8 case, but need to be addressed for the crate to be sound in general.

E. g. one simple but potentially easy to miss point is that the drop order of fields in a struct is in declaration order, so putting the owning field first can be unsound if the borrowing field also has a type with a destructor. (On that note, ensuring dropping is handled correctly during construction of a panic occurs is also a relevant point to pay attention to.) Other more subtle issues can occur in the specific case where the borrowing type is both invariant and has a destructor, although that issue might also be somewhat specific to the kind of general API that ouroboros offers. In any case, the presence of subtle (and only later discovered) issued demonstrates another important advantage of using an existing crate for safely encapsulated self-refentialness, not unsafe code yourself, that is: existing (popular) crates will be better reviewed than a custom self-made solution.

2 Likes

@LegionMammal978 When you move the Box<u8>, you reassert that the box is a unique pointer to its underlying memory, and no other pointers from that point onward are used to access its memory.

I think I understand given

let boxed = Box::new(1);
let boxed_ref = &boxed;
let new_boxed = boxed;

boxed_ref is observed by the compiler to be a null reference. Borrowing rules would normally make this reference inaccessible.

@LegionMammal978 When you mutate through stashed, you invalidate any unique &mut references created from it, including view. Miri no longer reports UB if you replace it with a raw *mut pointer, which is allowed to alias with other pointers

If I understand this correctly, despite the transmute type punning, the compiler would still "observe" the null pointer and the optimizer would do the rest. There is no flag or compiler directive that would stop this for a particular reference? Messing up pointer access is fine, it can simply be reasoned about, but the compilation bit was a pretty a nasty pitfall for me to discover.

@quinedot There may be a cultural disconnect here, but I see your disclaimer and I'm not trying to discourage you from exploring at all, so I guess I'll just leave it at that.

No worries be as critical as would be helpful to teach concepts that are confusing for a beginner like me. The disclaimer and the work in the OP is to show that I am genuinely trying to learn.

@steffahn I have only skimmed that github issue so far but I'll look through it in the next couple days. I have been meaning to study the rental crate too as that appears to be one that got shutdown for soundness issues that never got resolved.

I already have such a structure that works well enough with all safe code by using Rc<RefCell>. Types don't scare me, but the world of memory and pointers is all new to me, hence this exercise outside my comfort zone.

Given what I have shown to learn in this thread and what I learned about Pinning shown in my previous thread, I came up with this.

#[repr(transparent)]
struct ViewWrapper<T> {
    view: NonNull<T>,
}

impl<T: Debug> fmt::Debug for ViewWrapper<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let value = unsafe { self.view.as_ref() };
        f.debug_tuple("ViewWrapper")
            .field(value)
            .finish()
    }
}

impl<T> Deref for ViewWrapper<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: The view field is guaranteed to point to the stashed field within
        // the same struct instance, and we guarantee that the struct will never be
        // moved with pinning
        unsafe { self.view.as_ref() }
    }
}

impl<T> DerefMut for ViewWrapper<T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: The view field is guaranteed to point to the stashed field within
        // the same struct instance, and we guarantee that the struct will never be
        // moved thanks to pinning
        unsafe { self.view.as_mut() }
    }
}

#[derive(Debug)]
struct Struct<T> {
    view: ViewWrapper<T>,
    stashed: T,
    _pinned: PhantomPinned,
}

impl<T> Struct<T> {
    fn new(t: T) -> Pin<Box<Self>> {
        let mut boxed_struct = Box::pin(Struct {
            view: ViewWrapper {
                view: NonNull::dangling(),
            },
            stashed: t,
            _pinned: PhantomPinned,
        });

        // SAFETY: We can set the view field here since we have exclusive access
        // to the struct, and we guarantee that the struct will never be moved
        // again with Pin<Box<Struct>> and !Unpin from PhantomPinned
        unsafe {
            let view = NonNull::from(&mut boxed_struct.as_mut().get_unchecked_mut().stashed);
            boxed_struct.as_mut().get_unchecked_mut().view.view = view;
        }

        boxed_struct
    }
}

fn main() {
    let mut pinned_struct = Struct::new(1u8);

    unsafe {
        println!("{:?}", pinned_struct);
        *pinned_struct.as_mut().get_unchecked_mut().view = 42;
        println!("{:?}", pinned_struct);
    }
}

No longer am I boxing the stashed field. I box the container as it's a better api. I could have need to box Struct for dynamic dispatch and if it also boxes it's stashed field then that's double the heap allocation. There are still issues with Struct such as no real pub/priv protection of the stashed field. That can never be touched until dropping. I don't know how dropping would be done safely here. ViewWrapper is just a convenience wrapper for easier deref. Current version can only construct a pinned box, but I think I can alternatively construct a simple pinned version without box and just expose an api that reflects that it is unmovable.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.