OwnedCell, self referential type

it would be great if someone who knows more about rust could tell me if this is valid: possum/src/owned_cell.rs at main · anacrolix/possum · GitHub. i made some changes to use it in a recursive type where there are recursive self references in a struct possum/src/handle.rs at 543f0591e8bf32c0938adcd86e49e6154bca7a5e · anacrolix/possum · GitHub.

OwnedCell pairs an owner and a dep that references its Deref. it ensures the dep is dropped before the owner. In the example above I recursively build dependent types, including at one point putting a value into an Rc and sharing it between different cells.

From a quick glance:

  • if you allow the dependent type to be created using a mutable reference to the owned data then it's unsound to later expose a shared reference to the owned data (your fn owner), since you have to consider the owner as mutably borrowed as long as the dependent value exists;

  • your try_make/try_make_mut are unsound, the caller can simply choose 'a = 'static (which is likely what's happening in practice when you use it!) and you'll just give safe code a reference with an invalid lifetime.

Self-referential types are extremely tricky, I would suggest you to avoid rolling your own solution and instead go for one of the battle tested crates like self_cell, yoke or ouroboros.

5 Likes

I would go further and actually suggest that you don't even use these libraries. There have been countless soundness issues discovered in them, some of which have been fixed, while others remain unaddressed.

If your code seems to require self-references, you should restructure its ownership architecture so it doesn't.

3 Likes

"Unaddressed" is incorrect. ouroboros, at least, has zero currently known soundness issues.

It's entirely reasonable to avoid this class of thing out of concern for undiscovered issues, but please don't mix that up with certainty of unfixed issues.

4 Likes

I didn't specifically say anywhere that ouroboros is the one has this sort of issues currently. These crates have a long and worrisome history of "it's unsound – we fixed it – oops well actually it's unsound in another way"; my advice was a general statement describing a general pattern.

The worrisome history indicates that the problem is hard, which applies to all such libraries. Unfixed soundness issues indicate poor maintenance, which has to do with the skill and available time of the maintainer of that particular library. It is appropriate to generalize over the former (for a given problem domain), but not the latter (unless you're considering using a different library by the same author).

That's a false dichotomy – both factors contribute. (I am not even going to address the issue of "appropriateness", it's irrelevant, as this is not an ethics or philosophy forum. If professional matters suggest refraining from a whole class of approaches, libraries, or even particular authors, then I am going to be very upfront about those. The only inappropriate behavior is papering over problems in the interest of not hurting some library author's feelings. If you put code on the internet for other people to see, and your code is full of bugs, then you can't expect people to just avert their eyes and pretend it isn't.)

Oh, and two more things:

  1. The crate history matters, as a long history of soundness issues which are fixed-but-not-really puts a strong prior on "there'll be more subtle soundness problems later on", even if some particular piece of code doesn't currently have known holes.
  2. For this reason, discussing whether bugs stem from the inherent complexity of the problem or due to "poor maintenance" (or in general, a poor solution) is unimportant. The important point is that self-reference as a pattern (however bad or good one particular implementation may be) is overall a lot more dangerous than not doing self-referencing, so avoiding it is a good way to improve the probability of correctness altogether.

Okay so if it's hard to get right, what's the alternative?

I want to package up temporaries so I can hand references off to C (and Go). I don't want to expose the entire API, and it wouldn't help either since I'd have to break lifetimes to return handles to C. How could I build the dependents when I know the inputs (owners) are safe to move around?

I did just recently add the owner access. I think it's fine in my particular case but I can remove it and work around it, thank you.

For FFI you would normally use raw pointers instead of references. And with FFI of course you're completely responsible for ensuring safety according to Rust's rules. If you haven't already, see:

https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html?highlight=FFI#calling-an-unsafe-function-or-method

Here's the main part of the implementation that's affected: possum/src/c_api/ext_fns/handle.rs at main · anacrolix/possum · GitHub

I pass raw pointers, but to minimize those pointers I combine stuff into the self-referential structs. The alternative seems to be having pointers to every single intermediate and exposing that in the C API. I don't see how that improves things, since I still need to throw away lifetimes on those pointers to export them to C.

What I'm trying to do (and correct me if there's a better way), is group up the resources where I can, and when resources pointed to from C depend on other resources that can also be pointed to externally, they are ref-counted on the Rust side.

Incidentally I also added ref counting on the Go side. I haven't figured out if you can get away with just ref counting on one side vs the other: possum/go/handle.go at main · anacrolix/possum · GitHub

I think for what it's worth I started with one of the libraries and couldn't figure out how to use it. So I looked into the internals and pulled out the useful bits (mainly just how they use StableDeref and handle the pointers).

Thank you, this was the most helpful comment so far.

I think I was able to reproduce the owner issue, I made a test case with miri at someone's suggestion.

Would a fix be to not expose owner() -> &O if a mutable reference was used to construct the dependent?

To have this exposed on a conditional basis, would I create a new type say MutOwnedCell that doesn't have owner, or can I use some kind of constraint to expose owner()?

Here's the test case possum/src/owned_cell.rs at main · anacrolix/possum · GitHub, does that properly test it (yes it fails and I think for the reason you gave, but it's not easy to read miri's output).

There is no drawback to converting to raw pointers when you know they will no longer be accessed on the Rust side. I strongly recommend against a self-referential struct simply to avoid raw pointers, since you'll need them anyway.

I don't follow. Even if they're not followed on the Rust side, on the C side I still have to do the equivalent of Drop (calling a function with the pointer then promising not to use it again), and maintaining liveness of resources in the correct order. Combining resources where I can into a single pointer on the C side minimizes the bookwork, and lets me do more checks on the Rust side.

Yes

Yes, ultimately you'll need two different types for this.


Btw a good example to not follow is the owning_ref crate. It was a popular self-referential crate that ultimately was riddled by unsound issues. You could look at those issues and see how many you can reproduce with your library.

Sorry, I misunderstood.