CheapClone vs ExpensiveClone?

  1. I know that I can define my own traits -- but I am wondering if there is an idiomatic way to solve this problem. I also can't rely on Copy, as the types involve Drop / destructors.

  2. One of the great things about Rust is that it is so explicit about how many resources something takes.

  3. However, when I see .clone(), it's not obvious to me whether it's cheap or expensive -- am I just cloning a few Rc's, or is there some O(n) Vec behind this? It's not explicit when I read the code statically.

  4. Is there any idiomatic solution to this? This sounds silly, but I almost want to see distinctions of .this_is_constant_time_clone() vs .there_is_a_vec_here_clone() vs .probably_dont_want_to_do_this_clone()

One pattern I've seen is to clone 'Rc-like' types using a more explicit syntax:

let bar = foo.clone(); // no
let bar = Rc::clone(&foo); // yes

This makes it a bit clearer that you're cloning a pointer type, not the actual underlying data. There's a Clippy lint for this (clone_on_ref_ptr), but it's off by default since it's a stylistic choice rather than a correctness issue.
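
As a minimal sketch of what the explicit form buys you (the helper name `share` is made up for illustration), the following shows that `Rc::clone` only bumps the reference count and that both handles point at the same allocation:

```rust
use std::rc::Rc;

/// Hypothetical helper: clones the handle with the explicit `Rc::clone`
/// form and reports the strong count afterwards.
fn share(data: &Rc<Vec<u32>>) -> (Rc<Vec<u32>>, usize) {
    let handle = Rc::clone(data); // bumps the refcount; the Vec is not copied
    let count = Rc::strong_count(data);
    (handle, count)
}

fn main() {
    let foo = Rc::new(vec![1, 2, 3]);
    let (bar, count) = share(&foo);
    // Both handles refer to the same heap allocation.
    assert!(Rc::ptr_eq(&foo, &bar));
    println!("strong count = {count}, shared len = {}", bar.len());
}
```

Writing `foo.clone()` here would do exactly the same thing; the difference is only in what the reader can tell at the call site.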

Indeed, the book also recommends this: read the paragraph just above the linked heading.

This only helps so much, though. I can understand the frustration when a struct contains Rcs in it. Oftentimes you may have to look at a struct's definition to determine how expensive it is.

My code base currently has a type that can contain up to 30 GB of data. At some point it used to look like

struct Eigenbasis3(Vec<Ket3>);

I tried to remove all clones of it, but there was one that I just couldn't get rid of. So I finally changed it to

struct Eigenbasis3(Arc<[Ket3]>);

and had to document at the remaining clone site that we aren't actually copying the data, just for my sanity's sake.
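
A rough sketch of that change, assuming a placeholder `Ket3` (the real type isn't shown here) and a few made-up helper methods, shows that cloning the `Arc<[Ket3]>` wrapper shares the storage instead of copying it:

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real Ket3 type, which the post does not show.
#[derive(Debug)]
struct Ket3(f64);

// Deriving Clone here only clones the Arc handle, never the kets themselves.
#[derive(Clone)]
struct Eigenbasis3(Arc<[Ket3]>);

impl Eigenbasis3 {
    // Converts an owned Vec into shared, immutable storage.
    fn from_vec(kets: Vec<Ket3>) -> Self {
        Eigenbasis3(Arc::from(kets))
    }

    // True when both values point at the same allocation.
    fn shares_storage_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }

    // Number of live handles to the shared kets.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

fn main() {
    let basis = Eigenbasis3::from_vec(vec![Ket3(0.0), Ket3(1.0)]);
    let alias = basis.clone(); // refcount bump only
    assert!(basis.shares_storage_with(&alias));
    println!("{} handles to {:?}", basis.handle_count(), alias.0);
}
```

The one-time conversion from `Vec` to `Arc<[T]>` still moves the elements into a new allocation, so the saving applies to the clones made afterwards.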

You can also create your own custom trait for things like this:

pub(crate) // or you can seal the trait
trait RefCounted: Clone {
    /// Clones the handle; for Rc and Arc this only bumps the reference count.
    #[inline]
    fn inc_refcount(&self) -> Self {
        self.clone()
    }
}

impl<T: ?Sized> RefCounted for ::std::rc::Rc<T> {}
impl<T: ?Sized> RefCounted for ::std::sync::Arc<T> {}

and then use x.inc_refcount()
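
For instance, a small usage sketch (repeating the trait so the snippet stands on its own; the variable names are just for illustration) might look like this. The commented-out line shows the payoff: types that aren't reference-counted simply don't have the method.

```rust
use std::rc::Rc;
use std::sync::Arc;

// Same trait as in the post, repeated so this snippet compiles on its own.
trait RefCounted: Clone {
    fn inc_refcount(&self) -> Self {
        self.clone()
    }
}

impl<T: ?Sized> RefCounted for Rc<T> {}
impl<T: ?Sized> RefCounted for Arc<T> {}

fn main() {
    let names: Rc<[String]> = Rc::from(vec!["a".to_string(), "b".to_string()]);
    let alias = names.inc_refcount();
    assert_eq!(Rc::strong_count(&names), 2);

    let shared = Arc::new(42u64);
    let other = shared.inc_refcount();
    assert_eq!(Arc::strong_count(&shared), 2);

    // vec![1, 2, 3].inc_refcount(); // does not compile: Vec is not RefCounted
    println!("{} names, value {}", alias.len(), other);
}
```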

I totally agree - my main project at the minute is a game engine, and there's no great way of exposing the fact that "hey you can clone Texture pretty much for free please don't tie yourself in knots trying to pass around references" other than adding it to the docs.

In that extreme case, it probably shouldn't implement Clone at all, but rather just have a regular method that makes the cost clear.
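
One way that could look, as a sketch only (`SampleBuffer` and `deep_copy` are hypothetical names, not from anyone's code base): the type has no Clone impl, so the only way to duplicate it is a method whose name and docs state the cost.

```rust
/// Hypothetical large buffer that deliberately does not implement `Clone`.
struct SampleBuffer {
    samples: Vec<f64>,
}

impl SampleBuffer {
    fn new(samples: Vec<f64>) -> Self {
        SampleBuffer { samples }
    }

    /// O(n): allocates a new buffer and copies every sample.
    fn deep_copy(&self) -> Self {
        SampleBuffer {
            samples: self.samples.clone(),
        }
    }

    fn len(&self) -> usize {
        self.samples.len()
    }
}

fn main() {
    let original = SampleBuffer::new(vec![0.5; 1024]);
    // let oops = original.clone(); // does not compile: no Clone impl
    let copy = original.deep_copy(); // the cost is visible at the call site
    println!("copied {} samples", copy.len());
}
```

The trade-off is that the type no longer works with generic code that requires `Clone`, which may or may not be acceptable.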

Would the general Rust community, when reading my code, be angry if I did the following:

  1. For O(1) cost clone, pass argument to function BY REFERENCE -- function calls .clone() on its own.

  2. For expensive clone, pass the argument as a CLONED argument.

I.e. something like:

pub struct CheapToCopyObj {}
pub struct ExpensiveToCopyObj {}

blahblah(&cheap_to_copy); // function itself does clone()
blah2blah(expensive_to_copy.clone()); // the calling function does the clone()

Does this make any sense, or does it just look silly?

Generally, if a function always needs to clone an input, it should just take ownership instead of taking a reference, regardless of whether the clone method is cheap.

This might sound heretical -- what is the rationale behind this? I see a far greater distinction between "constant time clone vs huge time clone" than "pass by ref vs ownership"

Because if the user already has a value that they aren't going to use anymore, then they still have to pay the cost of cloning. Whether this matters depends on context, but in the vast majority of cases it is better to pass in a value rather than force a clone.

If you take a &str and .to_owned() it immediately, it means that you're forcing a copy on the caller that they can do nothing about. If you take a String instead, then they might be able to move it in instead, and if they can't, the cost is properly attributed in the profile to the caller.
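
To make the difference concrete, here is a small sketch (the `User` type and both function names are invented for the example). Comparing buffer pointers shows that the by-value version reuses the caller's allocation, while the by-reference version always makes a new one:

```rust
// Hypothetical type that needs to own its name.
struct User {
    name: String,
}

// Always allocates and copies, even if the caller no longer needs its String.
fn user_from_borrowed(name: &str) -> User {
    User {
        name: name.to_owned(),
    }
}

// Lets the caller choose: move an existing String in, or clone explicitly.
fn user_from_owned(name: String) -> User {
    User { name }
}

fn main() {
    let name = String::from("ferris");
    let original_ptr = name.as_ptr();

    let copied = user_from_borrowed(&name);
    assert_ne!(copied.name.as_ptr(), original_ptr); // fresh allocation

    let moved = user_from_owned(name);
    assert_eq!(moved.name.as_ptr(), original_ptr); // same buffer, no copy

    println!("{} / {}", copied.name, moved.name);
}
```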

To the best of my knowledge, &str -> String is NOT O(1) ... so we wouldn't do this. I'm suggesting pass by ref + having called-function-clone only in situations where the clone() is guaranteed to be O(1).

An O(1) clone may still cost more than a move. A cheap clone is still more work than no clone.

For example, if you have a function fn foo(Arc<T>) that takes its argument by move, then the caller may do foo(existing_instance) without touching the refcount.
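
A quick sketch of that, with a made-up `Worker` type and `spawn_worker` function standing in for foo: the caller bumps the count only when it actually needs to keep its own handle.

```rust
use std::sync::Arc;

// Hypothetical consumer that stores the shared handle.
struct Worker {
    config: Arc<String>,
}

// Takes the Arc by value, so the caller decides whether to clone or move.
fn spawn_worker(config: Arc<String>) -> Worker {
    Worker { config }
}

fn main() {
    let config = Arc::new(String::from("verbose"));

    // The caller still needs `config` afterwards: one explicit refcount bump.
    let first = spawn_worker(Arc::clone(&config));
    assert_eq!(Arc::strong_count(&config), 2);

    // Last use of `config`: it is moved in and the refcount is not touched.
    let second = spawn_worker(config);
    assert_eq!(Arc::strong_count(&second.config), 2);

    println!("{} / {}", first.config, second.config);
}
```

Had `spawn_worker` taken `&Arc<String>` and cloned internally, the second call would have incremented the count and then decremented it again when the caller's handle was dropped.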

If cloning the type is so cheap you’re willing to do it unconditionally, why not just implement Copy and benefit from simpler semantics?

Copy is a memcpy; most of the time that's not what you want in your Clone.

In some cases you can't implement Copy, but you can implement Clone.

AFAIK, a single type can't implement both Copy and Drop.

After reading all this, it seems the simplest approach is to just add a new trait CheapClone, then add a procedural macro where a struct can derive CheapClone iff all members are:

  1. primitives (i.e. they implement Copy)
  2. Rc or Arc
  3. some immutable data structure from im_rc
  4. types that already implement CheapClone

FastClone probably sounds better than CheapClone.
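
Here is a hand-written sketch of that idea without the procedural macro (the trait, the `fast_clone` method, and the `Sprite` type are all hypothetical, and im_rc is left out to keep it std-only). The manual impl for `Sprite` is roughly what a derive would have to generate: it calls `fast_clone` on every field, so adding a field that isn't FastClone, such as a plain Vec, becomes a compile error.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical trait: implementors promise that cloning is O(1).
trait FastClone: Clone {
    fn fast_clone(&self) -> Self {
        self.clone()
    }
}

// Rule 1: primitives (only a few shown here).
impl FastClone for u32 {}
impl FastClone for f64 {}
impl FastClone for bool {}

// Rule 2: reference-counted pointers.
impl<T: ?Sized> FastClone for Rc<T> {}
impl<T: ?Sized> FastClone for Arc<T> {}

#[derive(Clone)]
struct Sprite {
    id: u32,
    pixels: Arc<[u8]>,
    name: Rc<str>,
}

// Rule 4: a struct is FastClone when every field is. Written by hand here;
// each field must itself have `fast_clone`, which is what enforces the rule.
impl FastClone for Sprite {
    fn fast_clone(&self) -> Self {
        Sprite {
            id: self.id.fast_clone(),
            pixels: self.pixels.fast_clone(),
            name: self.name.fast_clone(),
        }
    }
}

fn main() {
    let sprite = Sprite {
        id: 7,
        pixels: Arc::from(vec![0u8; 4]),
        name: Rc::from("hero"),
    };
    let copy = sprite.fast_clone();
    // The pixel data is shared, not duplicated.
    assert!(Arc::ptr_eq(&sprite.pixels, &copy.pixels));
    println!("sprite {} ({}) cloned cheaply", copy.id, copy.name);
}
```

Note that nothing stops someone from writing a lazy `impl FastClone for SomeVecWrapper {}` by hand; the guarantee only holds if impls go through the field-by-field form or a derive that emits it.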