Experimental safe-to-use proc-macro-free self-referential structs in stable Rust?

Recently ran into a situation where self referential structs would have been a nice solution, and while I found a decent amount of discussion around the topic I didn't find a satisfying solution. So begrudgingly I unpacked to unsafe hammer and got to work.

I'd really appreciate any feedback from API to soundness. Note, the documentation is still very rudimentary and I plan to properly document all functions and traits. But before that, it would be great if some more-experienced people than me, could take a look at the unsafe in https://github.com/Voultapher/once_self_cell/blob/main/src/once_self_cell.rs, going through the nomicon I couldn't find a way this could be used to trigger UB, and so far miri agrees with me https://github.com/Voultapher/once_self_cell/runs/1552899668.

One of the core motivations for this was that rental is heavy to compile and no longer maintained, while this is pretty minimal:

Compiling once_cell v1.5.2
Compiling once_self_cell v0.1.0
Completed once_cell v1.5.2 in 0.9s
Completed once_self_cell v0.1.0 in 0.3s

Also note, I've posted this before here https://internals.rust-lang.org/t/safe-to-use-macro-free-self-referential-structs-in-stable-rust/13555 and already incorporated their feedback.

Something is going wrong:

use once_self_cell::sync::OnceSelfCell;

fn main() {
    let c: OnceSelfCell<(), &String> = OnceSelfCell::new(());
    let mut s = Ok(".".to_owned());
    c.get_or_init_dependent(|_| s.as_ref().unwrap());
    let uh: &&String = c.get_or_init_dependent(|_| panic!());
    let p = s.as_ref().unwrap().as_ptr();
    let _ = std::mem::replace(&mut s, Err([p; 3]));
    println!("{}", &uh[..100]);
}
$ cargo run
øbåU!hqBþQø^ÁaåU±
^[[?1;2c^[[?1;2c
$ cargo run --release
ú
 £ÊU!øÿQ q¡ÊU±
4 Likes

Since not stated, it seems the conclusion from dtolnay's example is that get_or_init_dependent's signature does not enforce that the closure return value is derived from the owner.

Maybe a way to think about redesigning the API is to turn it around and instead of assuming a cooperating user (i.e a "good path"), assume the caller is actively trying to break your rules :slight_smile:, make a signature that enforces the necessary facts. I'm not sure we can enforce what you need. For references we have closures of kind Fn(&Owner) -> &T (forall lifetimes, i.e no external lifetime). Maybe start from there and generalize to your Ast if possible with GAT or an unsafe trait (??).

It also all seems a bit similar to owning_ref.

2 Likes

Just published a new version, where get_or_init_dependent only accepts function pointers, that way there ought to be no way to leak stack borrows now.

Also ran into a little hurdle while using try_build https://github.com/Voultapher/once_self_cell/runs/1558420979 the closure path slashes show up as false positive on windows.

Regarding owning_ref, I remember finding it while looking for solutions, and while it came probably closest only to rental, I already had a dozen nested types with references, so using OwningRef would have been a lot of hassle. I'm want a solution that works on stable Rust with existing lifetime annotations.

Regarding alternatives for rental: did you ever take a look at ouroboros. Not at all macro-free, but it is maintained.

Here is another UB crash even with the new get_or_init_dependent signature:

use once_self_cell::sync::OnceSelfCell;
use std::cell::Cell;

fn main() {
    let mut s = Ok(".".to_owned());
    let c: OnceSelfCell<_, Cell<&String>> = OnceSelfCell::new(&s);
    c.get_or_init_dependent(|s| Cell::new(s.as_ref().unwrap()));
    let cell = c.get_or_init_dependent(|_| Cell::new(&STRING));
    static STRING: String = String::new();
    let uh = cell.replace(&STRING);
    drop(c);
    s = Err(());
    let _ = s;
    println!("{}", uh);
}
$ cargo run
Segmentation fault (core dumped)
2 Likes

Hmm, there are multiple effects at play here. Any idea how that can be prevented, short of just marking get_or_init_dependent unsafe?

One big problem here is that assert_eq!(type_name::<DependentStatic>(), type_name::<Dependent>()); is true even if DependentStatic and Dependent differ by lifetime. You need them to have the same lifetime or someone will be able to get something with a bigger lifetime or put something with a smaller lifetime.

You don't necessarily need to use it, just take inspiration

No more borrows for you :stuck_out_tongue:. Jokes aside, I just published a new version that should makes this sort of bug impossible. Please try breaking it again.

Well, the whole point is to have something expressible during struct declaration that doesn't involve an outside lifetime. The assert_eq trick is all about faking a sort of higher kinded lifetime.

Here is a segfault still:

use once_self_cell::sync::OnceSelfCell;

fn main() {
    type First = for<'a, 'b> fn(&'a String, &'b String) -> &'a String;
    fn f<'a>(s: &'a String, _: &String) -> &'a String { s }
    type Second = for<'a, 'b> fn(&'a String, &'b String) -> &'b String;
    fn g<'b>(_: &String, s: &'b String) -> &'b String { s }
    let c: OnceSelfCell<_, First> = OnceSelfCell::new(());
    c.get_or_init_dependent(|_| f as First);
    let first = c.get_or_init_dependent(|_| g as Second);
    let mut s = Ok(".".repeat(100));
    let empty = String::new();
    let uh = first(s.as_ref().unwrap(), &empty);
    s = Err([1u8; std::mem::size_of::<String>()]);
    println!("{}{:?}", uh, s);
}
Segmentation fault (core dumped)

I think at this point it would be good to put a disclaimer in the readme that so far we have no reason to believe this approach is sound or can be made to be sound.

4 Likes

FYI, Miri only looks at what you test. It doesn't magically construct a formal proof of correctness for your program. If you haven't found an example, that doesn't cause undefined behavior (UB), that doesn't mean there is no UB. It simply means, your tests didn't result in UB. Miri is also experimental and it cannot detect all kinds of UB, yet.

Soundness, that depends on unique lifetimes is extremely hard to get right. Personally, I find it easier to write sound multi-threaded unsafe APIs than dealing with lifetime-based unsafe APIs. :neutral_face:

1 Like

Fully agree with you. At the same time, the intended use case is imo quite easy to reason about and @dtolnay has to beat it with a stick to cause UB. I guess for now he's right and the function should be marked unsafe. I'd still like to attempt to find a way to expose this sort of API in a safe way, one thing that keeps showing up in his examples is calling get_or_init_dependent with 2 different make_dependent functions. If somehow we could limit passing in the function to only once. Actually I've got an idea.

I recommend you to take a look at the LCell of the qcell crate. Any other lifetime-based approach is most likely unsound.

Taking a quick look, it does not seem to meet my wish for defining a struct without 'leaking' the contained lifetime.

Thanks for all your effort. Yes I put a disclaimer at the top of the README. Please try out v 0.5.0, not claiming it's fully sound but at least you'll have to try a new approach for breaking it.

use once_self_cell::sync::OnceSelfCell;
use std::sync::Arc;

fn main() {
    let v = Arc::new([(); 1000000]);
    let v = (|| {
        #[derive(Clone)]
        struct S(Arc<[()]>);
        S(v)
    })();
    fn f<T>(x: T) -> OnceSelfCell<T, T> {
        OnceSelfCell::new(x)
    }
    let x = f(v);
    x.get_or_init_dependent(|v| v.clone());
    let v: Arc<[i32]> = (|| {
        struct S(Arc<[i32]>);
        x.get_or_init_dependent::<S>(|_| panic!())
    })()
    .0
    .clone();

    println!("{}", v[100000]);
}
[1]    22740 segmentation fault (core dumped)  cargo run

Updated to 0.5.0 API:

use once_self_cell::sync::OnceSelfCell;
use std::sync::Arc;

fn main() {
    let v = Arc::new([(); 1000000]);
    let x = (|| {
        #[derive(Clone)]
        struct S(Arc<[()]>);
        let x = OnceSelfCell::<_, S>::new(S(v), |v| v.clone());
        x.get_or_init_dependent::<S>();
        x
    })();

    let v: Arc<[i32]> = (|| {
        struct S(Arc<[i32]>);
        x.get_or_init_dependent::<S>()
    })()
    .0
    .clone();

    println!("{}", v[100000]);
}
1 Like
use once_self_cell::{custom::OnceSelfCell, OnceCellCompatible};

fn main() {
    const X: (*mut u8, fn(*mut u8)) = (std::ptr::null_mut(), |_| ());
    struct S;
    impl OnceCellCompatible<(*mut u8, fn(*mut u8))> for S {
        fn new() -> Self {
            S
        }
        fn get(&self) -> Option<&(*mut u8, fn(*mut u8))> {
            Some(&X)
        }
        fn get_or_init<F>(&self, _: F) -> &(*mut u8, fn(*mut u8))
        where
            F: FnOnce() -> (*mut u8, fn(*mut u8)),
        {
            &X
        }
        fn take(&mut self) -> Option<(*mut u8, fn(*mut u8))> {
            Some(X)
        }
    }
    let x = OnceSelfCell::<_, u8, S>::new(0u8);
    let n = x.get_or_init_dependent(|v| *v);
    println!("{}", n);
}
[1]    26858 segmentation fault (core dumped)  cargo run

Updated to 0.5.0:

use once_self_cell::{custom::OnceSelfCell, OnceCellCompatible};

fn main() {
    const X: (*mut u8, fn(*mut u8)) = (std::ptr::null_mut(), |_| ());
    struct S;
    impl OnceCellCompatible<(*mut u8, fn(*mut u8))> for S {
        fn new() -> Self {
            S
        }
        fn get(&self) -> Option<&(*mut u8, fn(*mut u8))> {
            Some(&X)
        }
        fn get_or_init<F>(&self, _: F) -> &(*mut u8, fn(*mut u8))
        where
            F: FnOnce() -> (*mut u8, fn(*mut u8)),
        {
            &X
        }
        fn take(&mut self) -> Option<(*mut u8, fn(*mut u8))> {
            Some(X)
        }
    }
    let x = OnceSelfCell::<_, u8, S>::new(0u8, |v| *v);
    let n = x.get_or_init_dependent::<u8>();
    println!("{}", n);
}