# Optimizing Clone of Copy Types?

Hello everyone,

I have a question about the semantics of `Copy` and `Clone`. (If there is a better category for this, please let me know or move it.)

Consider this trivial, rather silly program:

```rust
#[derive(Copy, Clone)]
struct Point {
    x: f32,
    y: f32,
}

fn euclidean_distance(p1: &Point, p2: &Point) -> f32 {
    let p1 = p1.clone(); // unnecessary clone
    let p2 = p2.clone(); // unnecessary clone
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

fn main() {
    let p1 = Point { x: 10.0, y: 1.0 };
    let p2 = Point { x: 1.0, y: 1.0 };

    println!("The distance is {}", euclidean_distance(&p1, &p2));
}
```

Silly though it is, I have similar code repeated over 150 times in a code base, because it is the output of a procedural macro.

Imagine that `euclidean_distance` is generated on any type defined by macro inputs. That type may or may not be `Copy`, but it will always be `Clone`.

The generated function always takes references, because sometimes it can avoid copies. But when it needs copies, it must `clone()` them.

Now that I'm starting to use clippy more, I discovered that it has a lint that fires on all my Copy types. I was surprised: wouldn't rustc optimize out `Clone` on `Copy` types?

Well, looking at the MIR of the example, the answer is no. The explicit call to clone is still in there, even in release mode. It even uses extra stack slots just to make the call!

Is this a missed optimization, or preserving a language semantic? In other words: does deriving `Copy` always imply that `Clone` is a trivial copy?

Sorry for the slow reply over on IRLO. For the category, you could just use `help` (as in, asking for help to understand the semantics of `Copy` and `Clone`).

To address your actual question:
AFAIK, rustc doesn't do much optimization itself but mostly relies on LLVM. I see you tried looking at the MIR for answers, but I think the way to actually find out whether `clone` has any overhead (and I'd strongly suspect it doesn't) is by compiling a version with the `clone` and a version without it, both with optimization, and e.g. comparing the assembly, or maybe just the performance. In this case, comparing cloning and copying, I wouldn't be surprised if the generated assembly was identical.
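A minimal sketch of that experiment, assuming the `Point` type from the question. The function names `distance_clone` and `distance_copy` are made up for illustration; `#[inline(never)]` is only there so both functions survive as separate symbols. One way to compare them is to paste this into godbolt.org with `-O`, or to build locally with `cargo rustc --release -- --emit asm` and read the `.s` file.

```rust
#[derive(Copy, Clone)]
struct Point {
    x: f32,
    y: f32,
}

// Mirrors the generated code: clone through the references.
#[inline(never)]
pub fn distance_clone(p1: &Point, p2: &Point) -> f32 {
    let p1 = p1.clone();
    let p2 = p2.clone();
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

// Same computation, but copying by dereferencing (only possible because Point: Copy).
#[inline(never)]
pub fn distance_copy(p1: &Point, p2: &Point) -> f32 {
    let p1 = *p1;
    let p2 = *p2;
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

fn main() {
    let a = Point { x: 10.0, y: 1.0 };
    let b = Point { x: 1.0, y: 1.0 };
    println!("{} {}", distance_clone(&a, &b), distance_copy(&a, &b));
}
```

If the two functions compile to the same instructions, the `clone` calls cost nothing in that build; this sketch does not by itself prove that for other types or compiler versions.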

In general, there is no guarantee enforced by the compiler that the `Clone` implementation of a type that also implements `Copy` does nothing but just copy the value. However, it should be very common practice, to the point that I would consider it a bug if a library provided a type where `clone` has observably different behavior from copying. (And I'm not counting differences like `Option<T>`'s `clone` calling `clone` recursively (unconditionally) on the inner `T` vs. copying not doing such a thing, because this difference is only observable if the inner type `T` makes `clone` and copying behave differently.)
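To make "no guarantee enforced by the compiler" concrete, here is a sketch of a type that compiles fine but violates the convention. `Weird` is a hypothetical name; the point is only that rustc accepts a `Copy` type whose manual `Clone` returns something other than a bitwise copy (clippy may warn about it, but that is a lint, not a compile error).

```rust
// Legal but ill-behaved: Copy, with a manual Clone that does not just copy.
#[derive(Copy, Debug, PartialEq)]
struct Weird(u32);

impl Clone for Weird {
    fn clone(&self) -> Self {
        // Nothing in the language forbids this; it is just a bug waiting to happen.
        Weird(self.0 + 1)
    }
}

fn main() {
    let original = Weird(1);
    let copied = original; // bitwise copy: Clone::clone is not called
    let cloned = original.clone(); // runs the manual impl above
    println!("{:?} {:?}", copied, cloned); // prints "Weird(1) Weird(2)"
}
```

This is exactly the "observably different behavior" that would count as a library bug.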

However, this means that the compiler cannot simply omit calls to `clone`. The way this would get optimized in LLVM is probably that, after inlining everything (recursively), the compiler understands that what's going on is the same as just copying.

Edit: I clicked your link to the clippy lint. Its statement, "It is not always possible for the compiler to eliminate useless allocations and deallocations generated by redundant `clone()`s.", makes me wonder what kind of example they had in mind where `clone()` does extra allocations. In particular, I'm curious whether that's possible with any types from the standard library or popular crates. In any case, for your use case, it sounds like you know/control which kinds of types get cloned, and as long as nothing weird is going on with your types, it should all get properly optimized, I guess.
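One guess at the kind of case the lint text has in mind (an assumption, not something the lint docs spell out here): cloning a heap-owning type such as `String` when the original is dropped right afterwards. A `String` clone allocates a fresh buffer; whether the optimizer can remove a redundant one is not something this sketch demonstrates. The function names are hypothetical.

```rust
// Takes ownership of `s` but clones it anyway; `s` is dropped at the end,
// so the clone (and its heap allocation) is redundant.
fn shout_redundant(s: String) -> String {
    let mut t = s.clone(); // allocates a second buffer
    t.make_ascii_uppercase();
    t
}

// Same result without the extra allocation: reuse the buffer we already own.
fn shout(mut s: String) -> String {
    s.make_ascii_uppercase();
    s
}

fn main() {
    let a = String::from("hello");
    let b = a.clone(); // a real clone: a and b own distinct heap buffers
    assert_ne!(a.as_ptr(), b.as_ptr());
    println!("{} {}", shout_redundant(a), shout(b));
}
```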

For example, for your `euclidean_distance`, if you test it on `godbolt.org`,
both versions do generate the same assembly (and then the compiler unifies them to avoid code duplication; that's why there's only one function in the generated assembly).

> I think the way to actually find out whether `clone` has any overhead (and I'd strongly suspect it doesn't) is by compiling a version with the `clone` and a version without it, both with optimization, and e.g. comparing the assembly, or maybe just the performance. In this case, comparing cloning and copying, I wouldn't be surprised if the generated assembly was identical.

In this example, and in most simple cases, LLVM does indeed optimize that to a single move when I look at the assembly, as you noted.

Perhaps I'm overthinking it, but this code is generic, so that lint made me think about more complex cases: namely, `Copy` types that (a) don't fit in machine registers and/or (b) cannot be easily destructured by LLVM.

For example, something that is `#[repr(C)]` and has sixteen fields, including perhaps packed ones, to do some terrible FFI.
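A sketch of that kind of type, under stated assumptions: the name `Header`, its fields, and its 120-byte size are invented for illustration, and packed fields are left out to keep it simple. `by_clone` mirrors what the macro generates and `by_copy` is the bitwise alternative; compiling both in release mode and comparing their assembly is one way to see whether the `clone` survives for a type this large. No claim is made here about what that comparison shows.

```rust
// Hypothetical FFI-style type: #[repr(C)], much larger than a register.
#[repr(C)]
#[derive(Copy, Clone)]
#[allow(dead_code)]
struct Header {
    magic: u32,
    version: u16,
    flags: u16,
    payload: [u64; 14], // 14 * 8 = 112 bytes, so the struct is 120 bytes total
}

// What the generated code does: clone through a reference.
#[inline(never)]
fn by_clone(h: &Header) -> Header {
    h.clone()
}

// What a Copy type also allows: a plain bitwise copy.
#[inline(never)]
fn by_copy(h: &Header) -> Header {
    *h
}

fn main() {
    let h = Header { magic: 0xFEED, version: 1, flags: 0, payload: [7; 14] };
    let a = by_clone(&h);
    let b = by_copy(&h);
    println!(
        "size = {} bytes, same payload: {}",
        std::mem::size_of::<Header>(),
        a.payload == b.payload
    );
}
```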

> In general, there is no guarantee enforced by the compiler that the `Clone` implementation of a type that also implements `Copy` does nothing but just copy the value. However, it should be very common practice, to the point that I would consider it a bug if a library provided a type where `clone` has observably different behavior from copying.

I consider this the answer to my original question. Thanks.

> Manual implementations should be careful to uphold this invariant; however, unsafe code must not rely on it to ensure memory safety.

So you can't assume, for example, that calling `Clone::clone` won't panic just because the type is `Copy`.
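A small sketch of that point, using a hypothetical type `Grumpy`: it is `Copy`, so a plain copy never runs any user code, but its manual `Clone` panics. `std::panic::catch_unwind` is used only to observe the panic (this assumes the default `panic = "unwind"` strategy).

```rust
use std::panic;

#[derive(Copy)]
struct Grumpy(u8);

impl Clone for Grumpy {
    fn clone(&self) -> Self {
        panic!("Grumpy refuses to be cloned");
    }
}

fn main() {
    let g = Grumpy(7);
    let copied = g; // bitwise copy: Clone::clone is never called
    println!("copied: {}", copied.0);

    // Calling clone explicitly runs the panicking impl.
    let result = panic::catch_unwind(|| g.clone());
    println!("clone panicked: {}", result.is_err());
}
```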

That said, if you do have a type that's `Copy`, one is allowed to bitcopy it instead of calling `Clone::clone`. This is such a strong guarantee that there's even an RFC explicitly allowing it: https://rust-lang.github.io/rfcs/1521-copy-clone-semantics.html

(So if you have a `T: Copy` where `<T as Clone>::clone` does weird things, you're allowed to misbehave, garbage in, garbage out; you just can't be unsound because of it.)
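A sketch that makes the "bitcopy instead of calling `clone`" freedom observable, assuming a hypothetical `Counted` type whose `Clone` bumps a global counter. A plain copy never touches the counter and an explicit `clone()` always does; for code that handles `T: Copy` generically (here `vec!` and `Vec::clone`), RFC 1521 lets the implementation pick either, so that final count is intentionally left unspecified.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times Clone::clone has actually run.
static CLONES: AtomicUsize = AtomicUsize::new(0);

#[derive(Copy)]
#[allow(dead_code)]
struct Counted(u32);

impl Clone for Counted {
    fn clone(&self) -> Self {
        CLONES.fetch_add(1, Ordering::Relaxed);
        *self
    }
}

fn main() {
    let x = Counted(1);
    let _a = x; // copy: the counter does not change
    let _b = x.clone(); // explicit call: the counter goes up by one
    println!("{}", CLONES.load(Ordering::Relaxed)); // prints 1

    // Library code may bitcopy Copy values instead of calling clone,
    // so do not rely on the exact number printed here.
    let v = vec![x; 4];
    let _w = v.clone();
    println!("after vec! and Vec::clone: {}", CLONES.load(Ordering::Relaxed));
}
```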

As a side note, you could easily figure this out yourself in the playground.

Interesting. Thanks for the context, @scottmcm. That implies perhaps an optimization could be written someday that eliminates the calls to `clone`.

Either way, it still seems worth a lint, and it's something I should test for if I hit a case where I need to know.

An update: @CAD97 pointed out on the other thread that my problem was even simpler than I thought.

I removed the `clone()` from the macro completely, and everything still works.
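The thread doesn't show what the macro change looked like, so this is only a sketch assuming the generated function just needs to read fields. Reading `p1.x` through a reference copies the `f32` field itself, so no `clone()` of the whole struct is needed, and the same shape also compiles for a type that is `Clone` but not `Copy`. `Named` and `named_distance` are hypothetical.

```rust
#[derive(Copy, Clone)]
struct Point {
    x: f32,
    y: f32,
}

// Reads the fields straight through the references; no clone or copy of the struct.
fn euclidean_distance(p1: &Point, p2: &Point) -> f32 {
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

// A Clone-only (non-Copy) type: the same clone-free body still compiles.
#[derive(Clone)]
#[allow(dead_code)]
struct Named {
    name: String,
    x: f32,
    y: f32,
}

fn named_distance(p1: &Named, p2: &Named) -> f32 {
    ((p2.x - p1.x).powf(2.0) + (p2.y - p1.y).powf(2.0)).sqrt()
}

fn main() {
    let p1 = Point { x: 10.0, y: 1.0 };
    let p2 = Point { x: 1.0, y: 1.0 };
    println!("The distance is {}", euclidean_distance(&p1, &p2));

    let n1 = Named { name: String::from("a"), x: 0.0, y: 0.0 };
    let n2 = Named { name: String::from("b"), x: 3.0, y: 4.0 };
    println!("The named distance is {}", named_distance(&n1, &n2));
}
```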

