Experimental safe-to-use proc-macro-free self-referential structs in stable Rust?

Recently ran into a situation where self referential structs would have been a nice solution, and while I found a decent amount of discussion around the topic I didn't find a satisfying solution. So begrudgingly I unpacked to unsafe hammer and got to work.

https://github.com/Voultapher/once_self_cell

I'd really appreciate any feedback from API to soundness. Note, the documentation is still very rudimentary and I plan to properly document all functions and traits. But before that, it would be great if some more-experienced people than me, could take a look at the unsafe in https://github.com/Voultapher/once_self_cell/blob/main/src/once_self_cell.rs, going through the nomicon I couldn't find a way this could be used to trigger UB, and so far miri agrees with me https://github.com/Voultapher/once_self_cell/runs/1552899668.

One of the core motivations for this was that rental is heavy to compile and no longer maintained, while this is pretty minimal:

Compiling once_cell v1.5.2
Compiling once_self_cell v0.1.0
Completed once_cell v1.5.2 in 0.9s
Completed once_self_cell v0.1.0 in 0.3s

Also note, I've posted this before here Safe-to-use macro-free self-referential structs in stable Rust? - libs - Rust Internals and already incorporated their feedback.

Since not stated, it seems the conclusion from dtolnay's example is that get_or_init_dependent's signature does not enforce that the closure return value is derived from the owner.

Maybe a way to think about redesigning the API is to turn it around and instead of assuming a cooperating user (i.e a "good path"), assume the caller is actively trying to break your rules :slight_smile:, make a signature that enforces the necessary facts. I'm not sure we can enforce what you need. For references we have closures of kind Fn(&Owner) -> &T (forall lifetimes, i.e no external lifetime). Maybe start from there and generalize to your Ast if possible with GAT or an unsafe trait (??).

It also all seems a bit similar to owning_ref.

2 Likes

Just published a new version, where get_or_init_dependent only accepts function pointers, that way there ought to be no way to leak stack borrows now.

Also ran into a little hurdle while using try_build https://github.com/Voultapher/once_self_cell/runs/1558420979 the closure path slashes show up as false positive on windows.

Regarding owning_ref, I remember finding it while looking for solutions, and while it came probably closest only to rental, I already had a dozen nested types with references, so using OwningRef would have been a lot of hassle. I'm want a solution that works on stable Rust with existing lifetime annotations.

Regarding alternatives for rental: did you ever take a look at ouroboros. Not at all macro-free, but it is maintained.

Hmm, there are multiple effects at play here. Any idea how that can be prevented, short of just marking get_or_init_dependent unsafe?

One big problem here is that assert_eq!(type_name::<DependentStatic>(), type_name::<Dependent>()); is true even if DependentStatic and Dependent differ by lifetime. You need them to have the same lifetime or someone will be able to get something with a bigger lifetime or put something with a smaller lifetime.

You don't necessarily need to use it, just take inspiration

No more borrows for you :stuck_out_tongue:. Jokes aside, I just published a new version that should makes this sort of bug impossible. Please try breaking it again.

Well, the whole point is to have something expressible during struct declaration that doesn't involve an outside lifetime. The assert_eq trick is all about faking a sort of higher kinded lifetime.

FYI, Miri only looks at what you test. It doesn't magically construct a formal proof of correctness for your program. If you haven't found an example, that doesn't cause undefined behavior (UB), that doesn't mean there is no UB. It simply means, your tests didn't result in UB. Miri is also experimental and it cannot detect all kinds of UB, yet.

Soundness, that depends on unique lifetimes is extremely hard to get right. Personally, I find it easier to write sound multi-threaded unsafe APIs than dealing with lifetime-based unsafe APIs. :neutral_face:

1 Like

Fully agree with you. At the same time, the intended use case is imo quite easy to reason about and @dtolnay has to beat it with a stick to cause UB. I guess for now he's right and the function should be marked unsafe. I'd still like to attempt to find a way to expose this sort of API in a safe way, one thing that keeps showing up in his examples is calling get_or_init_dependent with 2 different make_dependent functions. If somehow we could limit passing in the function to only once. Actually I've got an idea.

I recommend you to take a look at the LCell of the qcell crate. Any other lifetime-based approach is most likely unsound.

Taking a quick look, it does not seem to meet my wish for defining a struct without 'leaking' the contained lifetime.

Thanks for all your effort. Yes I put a disclaimer at the top of the README. Please try out v 0.5.0, not claiming it's fully sound but at least you'll have to try a new approach for breaking it.

use once_self_cell::sync::OnceSelfCell;
use std::sync::Arc;

fn main() {
    let v = Arc::new([(); 1000000]);
    let v = (|| {
        #[derive(Clone)]
        struct S(Arc<[()]>);
        S(v)
    })();
    fn f<T>(x: T) -> OnceSelfCell<T, T> {
        OnceSelfCell::new(x)
    }
    let x = f(v);
    x.get_or_init_dependent(|v| v.clone());
    let v: Arc<[i32]> = (|| {
        struct S(Arc<[i32]>);
        x.get_or_init_dependent::<S>(|_| panic!())
    })()
    .0
    .clone();

    println!("{}", v[100000]);
}
[1]    22740 segmentation fault (core dumped)  cargo run

Updated to 0.5.0 API:

use once_self_cell::sync::OnceSelfCell;
use std::sync::Arc;

fn main() {
    let v = Arc::new([(); 1000000]);
    let x = (|| {
        #[derive(Clone)]
        struct S(Arc<[()]>);
        let x = OnceSelfCell::<_, S>::new(S(v), |v| v.clone());
        x.get_or_init_dependent::<S>();
        x
    })();

    let v: Arc<[i32]> = (|| {
        struct S(Arc<[i32]>);
        x.get_or_init_dependent::<S>()
    })()
    .0
    .clone();

    println!("{}", v[100000]);
}
1 Like
use once_self_cell::{custom::OnceSelfCell, OnceCellCompatible};

fn main() {
    const X: (*mut u8, fn(*mut u8)) = (std::ptr::null_mut(), |_| ());
    struct S;
    impl OnceCellCompatible<(*mut u8, fn(*mut u8))> for S {
        fn new() -> Self {
            S
        }
        fn get(&self) -> Option<&(*mut u8, fn(*mut u8))> {
            Some(&X)
        }
        fn get_or_init<F>(&self, _: F) -> &(*mut u8, fn(*mut u8))
        where
            F: FnOnce() -> (*mut u8, fn(*mut u8)),
        {
            &X
        }
        fn take(&mut self) -> Option<(*mut u8, fn(*mut u8))> {
            Some(X)
        }
    }
    let x = OnceSelfCell::<_, u8, S>::new(0u8);
    let n = x.get_or_init_dependent(|v| *v);
    println!("{}", n);
}
[1]    26858 segmentation fault (core dumped)  cargo run

Updated to 0.5.0:

use once_self_cell::{custom::OnceSelfCell, OnceCellCompatible};

fn main() {
    const X: (*mut u8, fn(*mut u8)) = (std::ptr::null_mut(), |_| ());
    struct S;
    impl OnceCellCompatible<(*mut u8, fn(*mut u8))> for S {
        fn new() -> Self {
            S
        }
        fn get(&self) -> Option<&(*mut u8, fn(*mut u8))> {
            Some(&X)
        }
        fn get_or_init<F>(&self, _: F) -> &(*mut u8, fn(*mut u8))
        where
            F: FnOnce() -> (*mut u8, fn(*mut u8)),
        {
            &X
        }
        fn take(&mut self) -> Option<(*mut u8, fn(*mut u8))> {
            Some(X)
        }
    }
    let x = OnceSelfCell::<_, u8, S>::new(0u8, |v| *v);
    let n = x.get_or_init_dependent::<u8>();
    println!("{}", n);
}

To add some explanation:

std::any::type_name really is nothing more than a debug tool, or as its documentation puts it:

This is intended for diagnostic use.

The returned string must not be considered to be a unique identifier of a type as multiple types may map to the same type name.

Both S types in my code return a type_name of the form

"name_of_the_crate::main::{{closure}}::S"

As long as that's the case someone will be able to wrongly extend the lifetime of what you put in the cell, and that's unsound. Here's a much simplier example than the others, which shows exactly this problem and how it can be abused without all the other's complex fuckery:

use once_self_cell::sync::OnceSelfCell;

fn main() {
    let v: &'static str = {
        let c: OnceSelfCell<_, &'static str> = OnceSelfCell::new(".".repeat(4000), |s| s.as_str());
        *c.get_or_init_dependent()
    };
    println!("{}", v);
}

This produces an use-after-free bug, and you can also use it to keep 'static references to dropped stack variables.

ps: I've also notices all your crate's versions are still up on crates.io. You should yank them all

6 Likes

I don't agree that versions should be yanked just because there are known bugs. In this case an explanation that it's experimental seems appropriate.