Experimental safe-to-use proc-macro-free self-referential structs in stable Rust?

To add some explanation:

std::any::type_name really is nothing more than a debug tool, or as its documentation puts it:

This is intended for diagnostic use.

The returned string must not be considered to be a unique identifier of a type as multiple types may map to the same type name.

Both S types in my code return a type_name of the form

"name_of_the_crate::main::{{closure}}::S"
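For anyone following along, here is a minimal sketch (not the code from the earlier post) of two unrelated local types that are both called S. The helper name two_local_types is made up for the illustration. The exact type_name strings are not specified by the standard library, so the sketch only prints them and asserts on the TypeId values, which are guaranteed to differ for distinct types.

```rust
use std::any::{type_name, TypeId};

/// Returns (type_name, TypeId) for two distinct local types, both named `S`.
/// Hypothetical helper, only for illustration.
fn two_local_types() -> [(&'static str, TypeId); 2] {
    let first = || {
        #[allow(dead_code)]
        struct S; // first local type named `S`
        (type_name::<S>(), TypeId::of::<S>())
    };
    let second = || {
        #[allow(dead_code)]
        struct S; // a different type that happens to share the name
        (type_name::<S>(), TypeId::of::<S>())
    };
    [first(), second()]
}

fn main() {
    let [(name_a, id_a), (name_b, id_b)] = two_local_types();
    // The names are diagnostic output only and may well be identical.
    println!("{name_a}\n{name_b}");
    // The TypeIds always differ, because these are two distinct types.
    assert_ne!(id_a, id_b);
}
```

If the two printed names come out identical on your toolchain, that is exactly the situation where a name-based check cannot tell the types apart.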

As long as that's the case, someone will be able to wrongly extend the lifetime of what you put in the cell, and that's unsound. Here's a much simpler example than the others, which shows exactly this problem and how it can be abused without all the others' complex fuckery:

use once_self_cell::sync::OnceSelfCell;

fn main() {
    let v: &'static str = {
        let c: OnceSelfCell<_, &'static str> = OnceSelfCell::new(".".repeat(4000), |s| s.as_str());
        *c.get_or_init_dependent()
    };
    println!("{}", v);
}

This produces a use-after-free bug, and you can also use it to keep 'static references to dropped stack variables.

PS: I've also noticed that all your crate's versions are still up on crates.io. You should yank them all.

I don't agree that versions should be yanked just because there are known bugs. In this case an explanation that it's experimental seems appropriate.

"Known bugs" is an understatement: this is a soundness hole, and a really simple one to exploit.

The fact that it's experimental doesn't mean someone won't use it by mistake. For example not everyone reads the description on crates.io. If someone stops at the search page all they'll see will be "Safe-to-use macro-free self-referential structs in stable Rust.", not exactly an "experimental unsound crate". Even worse if they're searching on docs.rs, where opening the link will lead to the docs, entirely skipping the crates.io description.

If OP 100%, definitely doesn't want to yank them, they should at least change every description they can to reflect the fact that it's experimental and unsound, and add the unsound versions to the RustSec advisory database.

Edit: didn't realize you weren't OP

@bluss @SkiFire13 personally I'm fine with yanking them, with the minuscule amount of users I doubt I'm breaking much by that. At the same time I'd appreciate it if this thread stayed focused on the original topic. I have a new idea that looks very promising on the way.

OK, here we go, attempt number 6, now without type_name hackery. @SkiFire13 I've yanked the previous versions and made it clearer that the project is still experimental.

At first look I think it should be sound (famous last words...)

Some problems/downsides I noticed:

  • Not being able to use a custom function is a bit restrictive; it makes a lot of things much more verbose. I guess since Dependent is now fixed you could take a fn(&'a Owner) -> $Dependent or possibly even an FnOnce.
  • You accidentally made the comma after the Dependent type required

Regarding taking a 'make' function in get_or_init_dependent: I'm worried someone clever will figure out a way to abuse that to make it unsound by calling it with different ones. I thought about requiring a function in the macro definition to avoid having to store it, but From and Into seemed like a natural choice here that plays nicely with other mechanisms.

  • You accidentally made the comma after the Dependent type required

You mean in the tests? No, that's required because macro_rules otherwise doesn't know when to stop parsing, since it can take 0-N meta attributes.

Since now you know the type at compile time I think using a fn(&'a $Owner) -> $Dependent or for<'b> fn(&'b $Owner) -> $Dependent (or even an impl FnOnce) should be fine.

The problem with From and Into is that you can't implement them for third party structs. For example I can't make a struct that stores a String and a str::Chars<'_> without creating newtypes and boilerplate From implementations.
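To make the From/Into problem concrete, here is a rough sketch using String and str::Chars<'_>. The names CharsOf, make_chars and build_with are made up for the example and are not part of the crate. A downstream crate can't write impl<'a> From<&'a String> for Chars<'a> because neither the trait nor the types are local (orphan rule), so it needs a newtype, while a plain function pointer needs no trait impl at all.

```rust
use std::str::Chars;

// Hypothetical newtype: required only because `From` cannot be implemented
// directly for the foreign type `Chars<'a>` from a foreign `&'a String`.
struct CharsOf<'a>(Chars<'a>);

impl<'a> From<&'a String> for CharsOf<'a> {
    fn from(owner: &'a String) -> Self {
        CharsOf(owner.chars())
    }
}

// With a function-based API, an ordinary function is enough.
fn make_chars(owner: &String) -> Chars<'_> {
    owner.chars()
}

// Hypothetical stand-in for an API that takes `fn(&'a Owner) -> Dependent`.
fn build_with<'a, D>(owner: &'a String, make: fn(&'a String) -> D) -> D {
    make(owner)
}

fn main() {
    let owner = String::from("abc");
    // Via the newtype and its boilerplate `From` impl.
    let via_from = CharsOf::from(&owner);
    // Via a plain function, no newtype needed.
    let via_fn = build_with(&owner, make_chars);
    assert_eq!(via_from.0.count(), via_fn.count());
}
```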

No, I mean that I can't write unsync_once_self_cell!(NewStructName, Owner, Dependent<'_>);, I always need to add a trailing comma after Dependent<'_>, even if I don't specify any $StructMeta.

No, I mean that I can't write unsync_once_self_cell!(NewStructName, Owner, Dependent<'_>);, I always need to add a trailing comma after Dependent<'_>, even if I don't specify any $StructMeta.

I thought that was a macro quirk. How would I define the macro so that doesn't happen?

Since now you know the type at compile time

Do I though? The docs describe '_ as 'Inferred anonymous lifetime; asks compiler to 'figure it out' if obvious.' Will that always resolve to the same lifetime, or could it resolve to different lifetimes depending on what it is called with? In my head the macro is doing a sort of text replacement, and a function with that annotation is generic over the lifetime, so it could have different implementations. At least that's my guess.

@dtolnay hi David, this last iteration seems promising. @SkiFire13 thinks it's safe to use. I'd really appreciate a second opinion before removing the experimental warnings again.

You could try with ($StructName:ident, $Owner:ty, $Dependent:ty $(, $StructMeta:meta)* $(,)?) or ($StructName:ident, $Owner:ty, $Dependent:ty $(, $($StructMeta:meta),*)?)
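As a quick check of the first pattern, here is a small sketch. The macro describe_cell and its expansion are invented for the test and have nothing to do with what the real macro generates; only the matcher is the point. It accepts the invocation with and without the trailing comma, and with zero or more meta attributes.

```rust
// Hypothetical macro: it only exercises the matcher, not the crate's real expansion.
macro_rules! describe_cell {
    ($StructName:ident, $Owner:ty, $Dependent:ty $(, $StructMeta:meta)* $(,)?) => {
        $(#[$StructMeta])*
        #[allow(dead_code)]
        struct $StructName {
            owner: $Owner,
        }

        impl $StructName {
            // Returns the dependent type as written at the call site.
            #[allow(dead_code)]
            fn dependent_type() -> &'static str {
                stringify!($Dependent)
            }
        }
    };
}

// No trailing comma, no meta attributes.
describe_cell!(NoComma, String, std::str::Chars<'_>);
// Trailing comma, no meta attributes.
describe_cell!(TrailingComma, String, std::str::Chars<'_>,);
// One meta attribute, no trailing comma.
describe_cell!(WithMeta, String, std::str::Chars<'_>, derive(Debug, Default));
// Several meta attributes plus a trailing comma.
describe_cell!(WithMetaAndComma, String, std::str::Chars<'_>, derive(Debug), derive(Default),);

fn main() {
    println!("{:?}", WithMeta::default());
    println!("{}", NoComma::dependent_type());
}
```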

It should always resolve to the same lifetime, it's just that you don't have to name it in the function definition.
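A small sketch of what that means in a plain function signature (how the macro itself uses '_ is not shown here; the names elided, named, Make and makers are made up): with a single reference argument, the '_ in the return type is just the elided form of the argument's lifetime, so both functions below have the same type and fit the same function pointer.

```rust
use std::str::Chars;

// `'_` in the return type is tied to the only input lifetime by elision.
fn elided(owner: &String) -> Chars<'_> {
    owner.chars()
}

// The same signature with the lifetime spelled out.
fn named<'a>(owner: &'a String) -> Chars<'a> {
    owner.chars()
}

// Both functions coerce to this one higher-ranked function pointer type.
type Make = for<'a> fn(&'a String) -> Chars<'a>;

fn makers() -> [Make; 2] {
    [elided, named]
}

fn main() {
    let owner = String::from("hello");
    for make in makers() {
        // Each call borrows `owner` for exactly as long as the iterator lives.
        assert_eq!(make(&owner).count(), 5);
    }
}
```

The function is still generic over the lifetime at each call site, but there is only one way the output lifetime relates to the input, so there are no "different implementations" to pick from.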

Also, on second thought, this almost looks like owning_ref, except the function doesn't have to return a reference but can return a struct, and the dependent value is calculated when the first access occurs instead of when the self-referential struct is created. You could try to aim for an owning_ref-like API, while also lifting the requirement of returning a reference through the use of macros.

Ah neat! Thanks for the help.

Regarding owning_ref, my understanding is that it requires you to design your types from the ground up with it in mind, whereas this works with existing lifetime-dependent types out of the box without leaking the lifetime. Is that what you mean by 'doesn't have to return a reference but can return a struct'?

What I mean is that owning_ref takes a FnOnce(&T) -> &U to generate the self-referential part, while your crate actually allows for a fn(&'a T) -> U<'a>, which isn't possible with generics because Rust lacks HKT (higher-kinded types).
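Roughly, the difference looks like this. It's only a sketch with made-up names (project, project_chars, as_chars), not owning_ref's or the crate's actual code. The generic version can be written once for any U, but only because the result is &U. The struct-returning version has to name a concrete type such as Chars<'a>, which is the part a macro can stamp out per type.

```rust
use std::str::Chars;

// owning_ref-style: generic over `U`, but the result must be a reference.
fn project<'o, T, U: ?Sized>(
    owner: &'o T,
    f: impl for<'a> FnOnce(&'a T) -> &'a U,
) -> &'o U {
    f(owner)
}

// A generic `impl for<'a> FnOnce(&'a T) -> U<'a>` cannot be written: `U` is a
// type parameter, not a type constructor, so it cannot take a lifetime argument.
// With a concrete dependent type the signature is fine:
fn project_chars<'o>(
    owner: &'o String,
    make: for<'a> fn(&'a String) -> Chars<'a>,
) -> Chars<'o> {
    make(owner)
}

// Example 'make' function for the concrete version.
fn as_chars(owner: &String) -> Chars<'_> {
    owner.chars()
}

fn main() {
    let owner = String::from("hello");
    // Reference-returning projection, generic over the target type.
    let s: &str = project(&owner, |o| o.as_str());
    // Struct-returning projection, written against one concrete type.
    let chars = project_chars(&owner, as_chars);
    assert_eq!(s.len(), chars.count());
}
```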