Why does Cell require Copy instead of Clone?

Hi all,

I've been dabbling with Rust in my spare time for over a year now, and I think I've come across something that might indicate some level of misunderstanding on my part.

If I am defining an interface with interior mutability, and want to use Cell as part of that, why must the contained type be Copy instead of just Clone? For example, why would it be a bad thing to store a Cell<Ref<T>>, assuming that's what you actually wanted?

I've done some initial searching about, and have only found vague references to it maybe not being safe, but I don't think I understand why. Which of the differences between Copy and Clone mean that Copy is safe in this case, whereas Clone wouldn't be?

5 Likes

The important difference between Copy and Clone in this context is that copying a Copy type doesn't run any code, whereas Clone::clone() can run arbitrary code. A demonstration:

use std::cell::RefCell;
use std::rc::Rc;
struct CloneCell<T>(RefCell<T>);
impl<T:Clone> CloneCell<T> {
    fn new(t: T) -> Self { CloneCell(RefCell::new(t)) }
    fn get(&self) -> T { self.0.borrow().clone() }
    fn set(&self, t: T) { *self.0.borrow_mut() = t; }
}
struct X(Option<Rc<CloneCell<X>>>);
impl Clone for X {
    fn clone(&self) -> X { self.0.as_ref().unwrap().set(X(None)); X(None) }
}

fn main() {
    let x = Rc::new(CloneCell::new(X(None)));
    x.set(X(Some(x.clone())));
    x.get();
}
1 Like

Cell is a tool to extend the normal Rust rule of sharing == no mutation, but being able to be mutated while shared comes with a trade-off: fewer things can be mutated. Some sort of restriction is necessary, or else one could get dangling pointers caused by doing a mutation that invalidates a reference that has previously been retrieved. Cell is designed to never give any references to its interior (the value that can be mutated while shared) to external code, and thus ensures that there's no risk of invalidation. (RefCell is designed to allow "references", aka Ref and RefMut, but it needs those fancy references so that it can track them.)

The problem with Clone is that it requires &self, that is, Cell would need to pass a reference to its interior into the external/arbitrary/user-defined code of a Clone implementation. Once this reference exists, the Clone implementation could theoretically use another path to the Cell (this is where the sharing comes in) to invalidate it. On the other hand, T: Copy is safe because it means T can be duplicated in a known, controlled way (a byte copy) without having to run any user-defined code, and thus no references to the interior can escape.

Having a concrete example is good:

use std::cell::UnsafeCell;

// version of Cell with Clone
struct Cell<T> {
    data: UnsafeCell<T>
}

impl<T: Clone> Cell<T> {
    fn new(x: T) -> Cell<T> {
        Cell { data: UnsafeCell::new(x) }
    }
    
    fn get(&self) -> T {
        unsafe {(*self.data.get()).clone()}
    }
    fn set(&self, x: T) {
        unsafe {
            *self.data.get() = x;
        }
    }
}


// the "magic" type
struct BadClone<'a> {
    data: i32,
    pointer: &'a Cell<Option<BadClone<'a>>>,
}

impl<'a> Clone for BadClone<'a> {
    fn clone(&self) -> BadClone<'a> {
        // grab a reference to our internals
        let data: &i32 = &self.data;
        // what is it?
        println!("before: {}", *data);
        
        // now clear out the cell we point to...
        self.pointer.set(None); 
        
        // print it again (should be no change!)
        println!("after: {}", *data);

        BadClone { data: self.data, pointer: self.pointer }
    }
}

fn main() {
    let cell = Cell::new(None);
    cell.set(Some(BadClone {
        data: 12345678,
        pointer: &cell,
    }));
    

    cell.get();
}

A reference like &i32 shouldn't change---while it lives the i32 is immutable---meaning printing it multiple times should always be the same (no matter what happens between them), but this code prints:

before: 12345678
after: 0

Whoops!

The problem is that the Clone implementation can kill the exact BadClone value it is being called on, by having a circular references.

This is undefined behaviour, and could easily lead to memory corruption e.g. if data included some pointers, they'll get set to an effectively random value (Option is nice enough to zero in this case, but there's no guarantee about what value will be there), meaning code could segfault---if you're lucky---or end up writing to random parts of memory.

11 Likes

Ah, I understand. Thanks. Coming from a C++ background, I hadn't taken into account that Copy can't be overridden like that. That's a subtle distinction I hadn't expected to be attached to the primary semantic difference (copy vs move) of the two traits.

Would it help matters to have a 'static restriction as well as Clone? If I'm reading it correctly, I would expect the first example from eefriedman to panic instead of corrupting data. (Which seems like a reasonable thing to be risking if you're performing an interior mutation during a clone, really...)

For my project, that might actually be "safe enough" such that I'll use it in my project with appropriate caveats in place. (I totally understand now why that's not OK for the standard library.)

'static doesn't help at all.

Yes, the CloneCell based around a RefCell has well-defined behavior. That said, you might as well just use RefCell directly.

Sorry, I wasn't clear. The use case in my project is a re-implementation of Cell restricted to Clone instead of Copy, rather than a wrapper around RefCell. At that point, it has the semantics implied by your example implementation above, but doesn't have the extra memory and performance overhead that come along with RefCell. If you're storing Copy data in it, then it has the same overhead as Cell, and it doesn't panic unless the Clone implementation does.

Oh. Panics aren't the problem; the problem is the uniqueness of &mut references. RefCell panics on a bad borrow_mut, but if you replace it with an UnsafeCell you just end up with undefined behavior.

Ok, now I'm back to confused again. If we assume CloneCell<T:Clone + 'static>, how can one end up with an extra &mut reference without unsafe code in the Clone implementor?

The semantic difference is really mainly a consequence of the Copy trait's guarantees: both a copy and a move are identical (always byte copies), so a type implementing Copy says that the source of such a "move/copy" is safe to continue using. That is, that Copy is a valid way to duplicate the value.

Unfortunately that isn't quite enough. For instance, my example fails in the same way with Rc instead of &'a:

// the "magic" type
struct BadClone {
    data: i32,
    pointer: Rc<Cell<Option<BadClone>>>,
}

impl Clone for BadClone {
    fn clone(&self) -> BadClone {
        // grab a reference to our internals
        let data: &i32 = &self.data;
        // what is it?
        println!("before: {}", *data);
        
        // now clear out the cell we point to...
        self.pointer.set(None); 
        
        // print it again (should be no change!)
        println!("after: {}", *data);

        BadClone { data: self.data, pointer: self.pointer.clone() }
    }
}

fn main() {
    let cell = Rc::new(Cell::new(None));
    cell.set(Some(BadClone {
        data: 12345678,
        pointer: cell.clone(),
    }));
    

    cell.get();
}

(In fact it is slightly worse on the playpen: it also segfaults.)

I think (but don't know, since there's no description from them) @eefriedman's example was using the panic as a proxy for "bad stuff would happen", if the RefCell was replaced with UnsafeCell (as in the real Cell or my reimplementation)---which doesn't have the dynamic checks of RefCell---the problem wouldn't have been caught in such an obvious way, leading to undefined behaviour/memory corruption as in my versions.

I see. So it basically need a mechanism for preventing a looped reference on the stack to prevent the set() call from resolving to another set() call to the same object.

Thanks for explaining that to me all the way. :slight_smile: As an experiment, I changed my CloneCell to store UnsafeCell<Option<T>> and made it expect a value before performing the clone() (enforced via set() accepting a &T instead of just T) which introduces that barrier, and got it to panic instead of segfaulting. I'm not sure if that's any more efficient than just using RefCell<T>. It might be in some cases due to some of the optimizations that Option<T> gets to take advantage of on some types, but I doubt it.

I always thought it was because Copy types cannot be Drop. Interesting that there's another reason.

In fact, I don't think Drop comes into it at all: I can't think of any problems in Cell specifically if we had types that were both Copy and Drop.