New crate: qcell, a statically-checked alternative to RefCell

#1

There was some discussion on this on Reddit, so I’ve put together two possible implementations, and linked to a third (GhostCell). Please raise issues if you see anything unsafe or broken.

Briefly, this allows you to statically check mutable access to data behind Rc and other similar types. Where RefCell would give a runtime panic, these types give you a compile error. The cost is that borrowing isn’t as flexible. But if you can fit the pattern that’s allowed, then it is a huge win for code safety IMHO.

This is my first crate! Finally this year I have some time to contribute something to Rust.

8 Likes

#2

Cool, I am a bit far into my current project to see if this would fit in it but I am definitely going to check it out.

0 Likes

#3

That is a really cool feature. I wonder if it is possible to improve the ergonomics of creating an passing things. Perhaps by creating a new interface and internalizing the RC? For Tcell, a macro might help.

It’s an impressive testament to the type system that a seemingly impossible task has been able to be performed three different ways at compile time.

2 Likes

#4

I had a look at some ways of improving the ergonomics for use, e.g. using owner[cell], but the operator traits just don’t seem to allow what is needed with the lifetimes. At least, I couldn’t see a way to make it work.

Yes, it would certainly be possible to make an RcTCell type for example, but there are other uses for these cells apart from Rc. So they are kind of interchangeable parts.

You could also have other specialist versions of Rc, e.g. one that allows only strong references, or one that has only one strong reference plus many weak references (impossible to create cycles). Personally I’m planning to see what patterns emerge in my projects before trying to make any special versions.

0 Likes

#5

Interesting idea, to split unique referencing and dropping logics!!

Although I still see it as a potential internal feature for stuff like Rc or other Gc stuff, as you said, since I can’t think of other use cases.

  • You should check for overflow on your monotonic id (and panic), else it would be possible within safe code to get undefined behavior after 2^32 owners.

    impl AQCellOwner {
        fn new () -> Self
        {
            static AQCELLOWNER_ID: AtomicUsize = ...;
    
            let id: usize = AQCELLOWNER_ID.fetch_add(1, Ordering::Relaxed);
            assert!(id < core::u32::MAX, "Monotonic id overflow");
            Self { id }
        }
    
  • Ordering::Relaxed should suffice for your use case (the monotonic property of the id is too restrictive; injectivity suffices, which means that the ordering does not need to be sequentially consistent);

  • Finally, within a single threaded model (e.g., when targetting Rc instead of Arc), Cell<u32> should be used instead of an AtomicUsize

    struct QCellOwner {
        id: u32,
        not_send: PhantomData<&'static Cell<u32>>,
    }
    
    impl QCellOwner {
        fn new () -> Self
        {
            thread_local! {
                static QCELLOWNER_ID: Cell<u32> = Cell::new(0);
            }
    
            let id: u32 = QCELLOWNER_ID.with(Cell::get);
            QCELLOWNER_ID.with(|slf| slf.set(
                id.checked_add(1)
                  .expect("Monotonic id overflow")
            ));
            Self { id, not_send: PhantomData }
        }
    
1 Like

#6

You need to think about how this is going to be used. Every piece of code that does a borrow from a cell must pick the right owner to use. There is going to be just one owner that can do that job. Perhaps maximum there would be a few owners active in each area of code, perhaps using a different one each different type of data. So that is going to be hard-coded, i.e. a static choice. The ID just provides a sanity check. As soon as the code has received any coverage at all, the check will be essentially redundant. In fact I was initially thinking of having a feature option to make the ID () in non-debug code, to use once it passes basic tests.

The only case where you’re likely to get anywhere near 2^32 active at once, is if you’re trying to make one owner for each cell, which isn’t how QCell is supposed to be used at all. Then you wouldn’t be storing them directly in your context structure with fixed names – they would have to be in a container. This is really not useful at all, as now you’re limited by the borrowing supported by the container, i.e. one at a time by default, or more with splitting. Also you’d need some kind of key to link the cell and the owner. Once you’ve reached that point QCell is no longer the right tool for the job, because it would be better to use that key as the owner ID instead.

So in the pattern of use QCell would have, 2^32 is plenty. Compared to an accidental misuse of RefCell, where perhaps it works fine 99.9% of the time, and then panic 0.1% of the time due to two events happening at the same time (or whatever), this instead will panic immediately if the user chose the wrong owner in the code, or at least it will in 2^32-1 times out of 2^32, i.e. it will panic in 99.99…% of the times, which is more than enough to detect the bug.

I know that theoretically it could give unsafe behaviour, but the exploit would have to come from within the source code, the very same source code that is using the QCell. So not even a library user could trigger this unsafety if the library was using QCell correctly (which if it wasn’t would panic very early in coverage testing). So I don’t think this is a practical risk. The wrapping u32 behaviour was intentional.

Shall I document all the reasoning behind that design decision in the code?

(I’ll look at the other points just now)

0 Likes

#7

This is not my strong area. Looking at the docs now, yes Relaxed would probably be okay, but since I wasn’t certain I went for the safest choice. The explanations aren’t that clear. The name “Monotonic” in LLVM suggests that this is the right choice. I’ll read up some more on it, and then I’ll probably put this change in, thanks.

I’d rather avoid thread locals because they add a lot of complexity on some platforms. There’s not a strong enough reason to use them, to limit the reach of the crate like that.

0 Likes

#8

Well, this is the problem. The very key of Rust is that no programmer should be able to write Undefined Behavior (unless they use unsafe), even when they misuse libraries (e.g. your library). When that’s the case, it is obviously okay to crash, to loop indefinitely or to give extraneous results; but only as long as there is no UB.

I still think that someone could build an high-level library built on top of QCells, and end up writing something along these lines:

use ::qcell::{
    QCell,
    QCellOwner,
};

fn entangle<T> (value: T) -> (QCellOwner, QCell<T>)
{
    let owner = QCellOwner::
        new()
    ;
    let value = QCell::new(&owner, value);
    (owner, value)
}

It might dumb and inefficient and not the way you had intended the lib to be used (I think that there could definitely be use cases for the entangle function), but since you have not marked any of the API functions as unsafe, users are allowed to write this code; any undefined behavior would not be their responsibility but yours (that’s what unsafe fn vs fn means).

And obviously once a function such as entangle becomes popular it is only a matter of time until someone writes code equivalent to the following sketchy run function:

fn run (count: usize)
{
    let (mut first_owner, value_1) = entangle(42);
    (0 .. count)
        .for_each(|_| { entangle(1); })
    ;
    let (mut second_owner, value_2) = entangle(27);
    dbg!(first_owner.get(&value_1));
    dbg!(second_owner.get(&value_2));
    swap(
        first_owner.get_mut(&value_1),
        second_owner.get_mut(&value_1), // wops, wrong copy_paste
    );
}

fn swap (x: &mut usize, y: &mut usize)
{
    let (old_x, old_y) = (*x, *y);
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
    assert!(
        *x == old_y && *y == old_x,
        "Only UNDEFINED BEHAVIOR could break this assertion",
    );
}

There is obviously an error in the code, here, since accessing a variable with two different owners is just plain dumb and absurd. But since the users have not written any unsafe code, Rust guarantees that all this will do is either panic!, loop indefinitely, or give extraneous results.

Given your get_mut implementation with the id check, the code does panic 99.99999% of the time, and that’s great … Except for that not being 100%. If count == core::u32::MAX, then the code is UB!

  • Playground (it uses u16 instead of u32 for faster demo purposes)

Now, if you feel like you wouldn’t like the overflow check, then you can add an unsafe fn new_unchecked constructor for QCellOwner, so that the responsibility of misuse falls onto the user instead of you.

That is a very good attitude and I strongly encourage that :smiley:

Now, your program’s invariant is based on the unicity of the owners’ ids, which can indeed be ensured with a (strictly) monotonic function. But when we think about it, all we need is a strictly injective function; for instance a function that randomly sampled integers and checked their not having been already sampled using a memoized set would be a valid (although stupidly slow) implementation. Monotonic counters are used for injectivity since they are the simplest implementation, but the ordering is actually not important in your case.

  1. I never said not to use AtomicUsize, I suggested you give users a choice by having two distinc QCellOwner-QCell pairs:

    • QCellOwner : !Send + !Sync (with QCell):

      • thread_local! { static Cell<u32> } in QCellOwner::new(),

      • PhantomData<&'static Cell<u32> as one of QCellOwner fields,

      • (as shown in my previous post);

    • AQCellOwner : Send + Sync (with AQCell):

      • static AtomicUsize in AQCellOwner::new(),

      • (+ a PhantomData<&'static AtomicUsize> field for consistency if you want, alhough it would give no constraints).

  2. by the way your QCell is not Sync, which means it can still not be used with an Arc to be sent across threads, you should fix that:

    • unsafe impl<T : Send + Sync> Sync for AQCell<T> {}

    • unsafe impl<T : Send> Send for AQCell<T> {} (in case you plan on adding an .into_inner() method for {A,}QCell)

1 Like

#9

What you’re talking about is protecting against malicious programmers getting unsafe code past a reviewer using QCell. If the aim is to protect against that, then yes I need to fix this hole. The chance of a non-malicious programmer finding this unsafe behaviour by accident is so close to zero it makes no difference. No actual non-malicious code would ever fall into this trap, because in your example all the count values except 2^32-1 would have caused a panic, and so someone would have fixed the bug.

Considering that we might create and destroy owners often, e.g. in a function that perhaps builds a temporary graph of Rcs and then destroys them, we could potentially exhaust 2^32 in a single run of an executable. So I think this means switching to u64 for the owner ID, and panicking on exhaustion, or else as you say keeping a set of active IDs.

So the question is whether protecting against malicious coders using QCell to sneak unsafe behaviour past a reviewer is worth this extra runtime cost (e.g. 4 bytes per cell extra). I can’t use usize because it wouldn’t be big enough on 32-bit platforms. Atomic u64 isn’t stable yet, and even then might be inefficient or unsupported on 32-bit platforms (e.g. be a mutex). Or use a set. So I don’t think there is an efficient solution available.

Considering that this hole has absolutely no impact on non-malicious coders, it seems better to just document it.

Alternatively can you suggest an efficient solution?

0 Likes

#10

You could do as @Yandros suggested and have a safe slow checked version as well as a unsafe fast unchecked version. This way if performance becomes an issue, then the user can swap over to the unsafe version. But the safe default is checked and is safe even in the presense of malicious users.

1 Like

#11

Okay, I’ll do a 100% safe but slow version. Having an unsafe version wouldn’t be a simple change-over, though, as the caller would have to wrap all the calls with unsafe{} all through their source.

What is there right now is 99.999999% safe in theory, but in practice 100% safe as soon as someone has done even a little bit of testing. Even fuzz-testing could not trigger the unsafety hole once the caller’s bugs are fixed.

However, I will do as suggested and make it 100% safe.

1 Like

#12

Using ::num-traits, I have managed to make the Id type generic over integers, so that the user can choose the size of the id field (u8, u16, u32, usize). But until AtomicU64 gets stabilized, support for 32-bit architectures will be poor.

EDIT: I completely forgot about the static, which prevents us from using generics;

0 Likes

#13

You can also add an attribute unsafe-no-monotonic-overflow-checks in your crate that disables the overflow check:

# Cargo.toml
[features]
default = []
unsafe-no-monotonic-overflow-checks = []

And then guard the overflow check in a #[cfg(not(feature = "unsafe-no-monotonic-overflow-checks"))].

As long as the feature is not there by default and contains the word unsafe in it, it is an acceptable thing to do, imho.

0 Likes

#14

Looking at the playground link, I think on a 32-bit architecture the ID would wrap undetected in this code if a u64 ID was used, so the code as written would still be unsafe.

Also, I think your code is sharing a single COUNTER between all the different generic types, so perhaps we go from 65535 to 65536 with a u32 call, and then do a u16 call and now we’ve wrapped and got into unsafe behaviour again.

If we’re going to panic on overflow, I think a u64 ID is required to avoid exhausting IDs in the most general case, so this should be the default for a fully safe version. So really what I need to do is to detect at compile-time when u64 is bigger than usize and switch to using a mutex there.

Or maybe stick with u32, panic on overflow, and suggest that they use the “unsafe wrapping” feature you mentioned in the other comment if that is exceeded.

0 Likes

#15

You’re totally right, that was my bad; we can’t use static with generics (yet) so the only option would be to explicitly create a struct for each type, which looks like a horrible idea. I have edited out that suggestion from my previous post.

There is an ::atomic64 crate out there that does exactly that! (although committing a whole Mutex makes me real sad, maybe a double AtomicUsize with manual bit-carry arithmetics and properly synced could be more efficient).

Do whatever you prefer, I agree that this 0.0001% unsafety detail is getting a little bit out of hand :sweat_smile:

0 Likes

#16

What this ends up is 3 versions of the ID-allocating code: wrapping u32, atomic u64 and mutex u64, defaulting to the u64 version, according to 64/32-bit platform. I still think the wrapping u32 version is the best balance of speed and safety, but I can certainly see the point of being 100% safe. I have some code done but it needs testing more thoroughly. I’ll get this finished in the next few days.

However, I wondered whether I was wasting my time considering that the GhostCell might be better so I had a quick look at that. I ran all the same tests as QCell and TCell against it and it works just the same. Here’s some sample code using GhostCell:

    use qcell::ghost::{Cell as GCell, Set as GCellOwner};
    use std::rc::Rc;
    GCellOwner::new(|mut owner| {
        struct Context<'id> { owner: GCellOwner<'id>, }
        let mut ct = Context { owner };
        let c1 = Rc::new(GCell::new(100u32));
        let c2 = Rc::new(GCell::new(200u32));
        fn test<'id>(ct: &mut Context<'id>, rc: &Rc<GCell<'id, u32>>) {
           *ct.owner.get_mut(rc) += 1;
        }
    
        test(&mut ct, &c1);
        let c1mutref = ct.owner.get_mut(&c1);
        *c1mutref += 2;
    });

It’s really neat the way you just create cells without mentioning the owner, and Rust figures out the owner from the first time you access it (even when there are multiple owners active). However that only works within the same function. All functions called from there need lifetime annotations. So it’s a bit noisy, but maybe worth it if you don’t want the overheads and need the flexibility. I’ve asked the source project about licenses to see if I can pull the code over (or if they want to make their own crate).

This means, though, that QCell and TCell still both have a place. So making QCell 100% safe is worth doing.

1 Like

#17

Ghost In The Cell

nuff’ said :smile:

2 Likes

#18

More seriously, I hadn’t had the time to look at GhostintheCell, it does look like a cumbersome but clever hack: I like it!

0 Likes

#19

I fail to understand the point of qcell.
Remove the cell pub struct QCell { owner: QCellOwnerID, }
Place T on pub struct QCellOwner { id: QCellOwnerID, value: T, }
Then you almost have the same thing, no unsafe code in site.

0 Likes

#20

I fail to understand the point of qcell.

See the examples on the Reddit link, or else see the docs on the crate. QCell exists for the same reason that RefCell exists. So you could read up on RefCell as well.

0 Likes