Optimizing Clone of Copy Types?

Hello everyone,

I have a question about the semantics of Copy and Clone. (If there is a better topic for this, please let me know or move it.)

Consider this trivial, rather silly program:

#[derive(Copy, Clone)]
struct Point {
    x: f32,
    y: f32,
}

fn euclidean_distance(p1: &Point, p2: &Point) -> f32 {
    let p1 = p1.clone(); // unnecessary clone
    let p2 = p2.clone(); // unnecessary clone
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

fn main() {
    let p1 = Point { x: 10.0, y: 1.0 };
    let p2 = Point { x: 1.0, y: 1.0 };

    println!("The distance is {}", euclidean_distance(&p1, &p2));
}

Silly though it is, I have similar code repeated over 150 times in a code base, because it is the output of a procedural macro.

Imagine that euclidean_distance is generated on any type defined by macro inputs. That type may or may not be Copy, but it will always be Clone.

The generated function always takes references, because sometimes it can avoid copies. But when it needs copies, it must clone() them.

Now that I'm starting to use clippy more, I discovered that it has a lint that fires on all my Copy types. I was surprised: wouldn't rustc optimize out Clone on Copy types?

Well, looking at the MIR of the example, the answer is no. The explicit call to clone is still in there, even in release mode. It even uses extra stack slots just to make the call!

Is this a missed optimization, or preserving a language semantic? In other words: does deriving Copy always imply that Clone is a trivial copy?

Sorry for the slow reply over on IRLO. For category you could just use help (as in, asking for help to understand the semantics of Copy and Clone).

To address your actual question:
AFAIK, rustc doesn’t do too much optimization itself but mostly relies on LLVM. I see you tried looking at the MIR for answers, but I think the way to actually find out whether clone has any overhead (and I’d strongly suspect it doesn’t) is by compiling a version with the clone and a version without it, both with optimization, and e.g. comparing the assembly, or maybe just the performance. In this case, comparing cloning and copying, I wouldn’t be surprised if the generated assembly was identical.

In general, there is no guarantee enforced by the compiler that the Clone implementation of a type that also implements Copy does nothing but just copy the value. However, it should be very common practice, to the point that I would consider it a bug if a library provided a type where clone has observably different behavior from copying. (And I’m not counting differences like Option<T> on clone calling clone recursively (unconditionally) on the inner T vs. copying not doing such a thing, because this difference is only observable if the inner type T makes clone and copying behave differently.)

However, this means that the compiler cannot simply omit calls to clone. The way this would get optimized in LLVM is probably that, after inlining everything (recursively), the optimizer sees that what’s going on is the same as just copying.
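To illustrate the point about there being no compiler-enforced guarantee, here is a minimal sketch of a Copy type whose hand-written Clone is deliberately not a plain copy. The names Odd, CLONE_CALLS and copy_vs_clone are hypothetical and exist only for this example; no real library is being described.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, used only to make calls to `clone` observable.
static CLONE_CALLS: AtomicUsize = AtomicUsize::new(0);

// Copy is derived, but Clone is written by hand below.
#[derive(Copy, Debug, PartialEq)]
struct Odd(u32);

// A Clone impl that is NOT a plain bit copy. The compiler accepts this
// even though the type is also Copy (clippy will complain, rustc won't).
impl Clone for Odd {
    fn clone(&self) -> Self {
        CLONE_CALLS.fetch_add(1, Ordering::SeqCst);
        Odd(self.0 + 1)
    }
}

// Returns (implicitly copied value, explicitly cloned value).
fn copy_vs_clone(original: Odd) -> (Odd, Odd) {
    let copied = original; // bit copy, no user code runs
    let cloned = original.clone(); // runs the impl above
    (copied, cloned)
}
```

Calling copy_vs_clone(Odd(1)) gives back two different values, so a compiler that silently dropped the clone call would change what this program does.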

Edit: I clicked your link to the clippy lint. Its statement, “It is not always possible for the compiler to eliminate useless allocations and deallocations generated by redundant clone()s.”, makes me wonder what kind of example they had in mind where clone() does extra allocations. In particular I’m curious whether that’s possible with any types from the standard library or popular crates. In any case, for your use case, it sounds like you know/control which kind of types get cloned, and as long as nothing weird is going on with your types it should all get properly optimized, I guess.

For example, for your euclidean_distance, if you test it on godbolt.org, both versions do generate the same assembly (and then the compiler unifies them to avoid code duplication; that’s why there’s only one function in the generated assembly).
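For anyone who wants to repeat that comparison, a minimal pair could look like the sketch below. The function names distance_with_clone and distance_without_clone are made up for this example, and the items are marked pub only so that both functions show up when compiled as a library with optimizations (for instance with -O on a compiler explorer).

```rust
#[derive(Copy, Clone)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

// Version from the original post: clones both arguments first.
pub fn distance_with_clone(p1: &Point, p2: &Point) -> f32 {
    let p1 = p1.clone();
    let p2 = p2.clone();
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

// Same computation, reading the fields directly through the references.
pub fn distance_without_clone(p1: &Point, p2: &Point) -> f32 {
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}
```

Whether the two bodies end up identical depends on the compiler version and flags, so this is something to check rather than assume.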

I think the way to actually find out whether clone has any overhead (and I’d strongly suspect it doesn’t) is by compiling a version with the clone and a version without it, both with optimization, and e.g. comparing the assembly, or maybe just the performance. In this case, comparing cloning and copying, I wouldn’t be surprised if the generated assembly was identical.

On this example, and in most simple cases, LLVM does indeed optimize that to a single move when I look at the assembly, as you noted.

Perhaps I'm overthinking it, but this code is generic, so that lint made me think about more complex cases. Namely, Copy types that (a) don't fit in machine registers and/or (b) cannot be easily destructured by LLVM.

For example, something that is #[repr(C)] and has sixteen fields, including perhaps packed ones, to do some terrible FFI.

In general, there is no guarantee enforced by the compiler that the Clone implementation of a type that also implements Copy does nothing but just copy the value. However, it should be very common practice, to the point that I would consider it a bug if a library provided a type where clone has observably different behavior from copying.

I consider this the answer to my original question. Thanks.

From https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html,

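Types that are Copy should have a trivial implementation of Clone. More formally: if T: Copy, x: T, and y: &T, then let x = y.clone(); is equivalent to let x = *y;. Manual implementations should be careful to uphold this invariant; however, unsafe code must not rely on it to ensure memory safety.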

So you can't assume, for example, that calling Clone::clone won't panic just because the type is Copy.

That said, if you do have a type that's Copy, one is allowed to bitcopy it instead of calling Clone::clone -- this is such a strong guarantee that there's even an RFC explicitly allowing it, https://rust-lang.github.io/rfcs/1521-copy-clone-semantics.html

(So if you have a T: Copy where <T as Clone>::clone does weird things, you're allowed to misbehave -- garbage in garbage out -- just not be unsound because of it.)
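In generic code, that permission might look like the sketch below: with a T: Copy bound, a plain dereference can stand in for the clone call. Wide is a hypothetical FFI-style type made up for this example (large enough not to fit in a couple of registers), and via_clone and via_copy are illustrative names, not anything from the RFC.

```rust
// Hypothetical FFI-style type, too large to pass around in a register pair.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
struct Wide {
    header: u32,
    flags: u16,
    payload: [u64; 14],
}

// Works for any Clone type; calls the Clone impl explicitly.
fn via_clone<T: Clone>(value: &T) -> T {
    value.clone()
}

// Requires Copy; the dereference performs the bit copy with no call at all.
fn via_copy<T: Copy>(value: &T) -> T {
    *value
}
```

For a type with derived Copy and Clone, such as Wide here, both functions return equal values; they can only differ for a type with a hand-written Clone that breaks the invariant quoted above.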

As a sidenote, you could figure this out yourself easily in the playground.

Interesting. Thanks for the context, @scottmcm. That implies perhaps an optimization could be written someday that eliminates the calls to clone.

Either way, it still seems worth a lint, and it's something I should test for if I hit a case where I need to know.

An update: @CAD97 pointed out on the other thread that my problem was even simpler than I thought.

I removed the clone() from the macro completely, and everything still works.
