Does anybody else try to avoid deriving Copy?

A bit of a philosophical question here:

I've found that I set a rule for myself to avoid deriving Copy on any structs I create, and I'm wondering if my reasoning is sound or other people have thoughts on the topic.

I guess I have two main reasons, the first is because of issues I've had with code maintainability. For example recently I had a struct that derived Copy, that was used all over the place in a client's project. Then specifications changed and I needed to stick a Vec into the struct, and I couldn't derive copy on the struct any more. (I suppose I could have replaced the Vec with a fixed length array sized for the worst case scenario, but this would have come with a huge memory penalty.) So I put the Vec in the struct, removed the derived Copy, and unfortunately this broke my code in many, many places. It wasn't terribly difficult to fix, but nevertheless seems like I would have been better off never deriving Copy in the first place.
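
To make the breakage concrete, here is a minimal sketch of that kind of change. The ConfigV1/ConfigV2 types and the total_wait functions are hypothetical stand-ins, not the actual client code.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ConfigV1 {
    retries: u32,
    timeout_ms: u64,
}

// After the spec change: a Vec field means Copy can no longer be derived.
#[derive(Debug, Clone, PartialEq)]
struct ConfigV2 {
    retries: u32,
    timeout_ms: u64,
    hosts: Vec<String>,
}

// Takes the struct by value; callers can keep using their copy afterwards.
fn total_wait_v1(cfg: ConfigV1) -> u64 {
    u64::from(cfg.retries) * cfg.timeout_ms
}

// Takes a reference, so the caller keeps ownership without cloning.
fn total_wait_v2(cfg: &ConfigV2) -> u64 {
    u64::from(cfg.retries) * cfg.timeout_ms
}

fn main() {
    let v1 = ConfigV1 { retries: 3, timeout_ms: 100 };
    let a = total_wait_v1(v1);
    let b = total_wait_v1(v1); // fine: v1 was copied, not moved

    let v2 = ConfigV2 {
        retries: 3,
        timeout_ms: 100,
        hosts: vec!["example.invalid".to_string()],
    };
    // Passing v2 by value twice would fail with "use of moved value",
    // which is the kind of error that appeared all over the code base.
    let c = total_wait_v2(&v2);
    let d = total_wait_v2(&v2);

    println!("{a} {b} {c} {d} ({} hosts)", v2.hosts.len());
}
```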

The bigger reason, though, is that I think one of the advantages of Rust is that you're prompted to think about how your program is managing memory and explicitly guide it to safe and (hopefully) efficient memory management. (A lot of people think this is a disadvantage but I think it's the opposite...) When you have to explicitly call .clone() it prompts you to think about whether some rearrangement of your code could allow the function to take ownership of the variable, or whether the function should instead accept a reference, etc. When I was refactoring the code after I couldn't derive Copy any longer, I noticed that in most places some simple changes allowed me to avoid calling .clone() altogether. Thus in hindsight I felt like the Copy construct was a crutch letting me get away with writing worse code, by making things too easy.

12 Likes

Generally I derive Copy for structs like Pixel {x: usize, y: usize, color: Rgb} or SomethingId([u8; 32]). For anything bigger or more complex I derive Clone.
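
Roughly like this, as a sketch. The fields of Rgb and the Image type are assumptions made up for the example.

```rust
// Assumed definition; the post does not say what Rgb looks like.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

// Small plain-data struct: Copy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pixel {
    x: usize,
    y: usize,
    color: Rgb,
}

// Fixed-size identifier newtype: Copy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SomethingId([u8; 32]);

// Hypothetical "bigger" type: owns a Vec, so Clone only.
#[derive(Debug, Clone, PartialEq)]
struct Image {
    width: usize,
    height: usize,
    pixels: Vec<Pixel>,
}

// Takes and returns Pixel by value; the caller's pixel stays usable.
fn shifted(p: Pixel, dx: usize) -> Pixel {
    Pixel { x: p.x + dx, ..p }
}

fn main() {
    let p = Pixel { x: 1, y: 2, color: Rgb { r: 255, g: 0, b: 0 } };
    let q = shifted(p, 10);
    println!("{:?} -> {:?} (red = {})", p, q, p.color.r);
    println!("g = {}, b = {}, y = {}", p.color.g, p.color.b, p.y);

    let id = SomethingId([7; 32]);
    let same_id = id; // copy
    println!("{}", id.0[0] == same_id.0[0]);

    let img = Image { width: 1, height: 1, pixels: vec![p] };
    let backup = img.clone(); // the copy is explicit here
    println!("{}x{}, {} pixel(s)", backup.width, backup.height, backup.pixels.len());
}
```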

14 Likes

There are sometimes people who ask why #[derive(Copy)] doesn't just happen automatically and has to be explicit. The things you describe here are exactly why. You should only derive Copy if you are confident you won't run into trouble later, because removing it is backwards incompatible.

23 Likes

I don't actively avoid deriving Copy; in fact the Rust API guidelines state that you should implement many of the often-used std traits if possible, for better interoperability, Copy included.

Furthermore, I do personally find code easier to write and less noisy if copiable types are in fact Copy. I don't worry about its performance; most types eligible to be Copy at all are small and trivial in the first place. I still do think it's a good idea that copying is opt-in and explicit, because sometimes I may not want it for reasons of correctness (e.g. "token" types that ensure unique ownership of a resource, possibly involving unsafe code).
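
As a sketch of the "token" case (safe code only; Pool and Permit are hypothetical names): Permit could technically derive Copy because it only holds a usize, but a copy would let the same slot be released twice.

```rust
// Hypothetical token type. Deliberately neither Clone nor Copy,
// so each slot has exactly one owner at a time.
struct Permit {
    slot: usize,
}

// Hypothetical pool that hands out permits for free slots.
struct Pool {
    free: Vec<usize>,
}

impl Pool {
    fn new(slots: usize) -> Self {
        Pool { free: (0..slots).collect() }
    }

    // Hands out a permit for a free slot, if any is left.
    fn acquire(&mut self) -> Option<Permit> {
        self.free.pop().map(|slot| Permit { slot })
    }

    // Consumes the permit; the caller cannot use it again afterwards.
    fn release(&mut self, permit: Permit) {
        self.free.push(permit.slot);
    }

    fn available(&self) -> usize {
        self.free.len()
    }
}

fn main() {
    let mut pool = Pool::new(2);
    let permit = pool.acquire().expect("pool has free slots");
    println!("holding slot {}, {} left", permit.slot, pool.available());
    pool.release(permit);
    // pool.release(permit); // error: use of moved value `permit`
    println!("{} left", pool.available());
}
```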

Breaking changes are, I think, unavoidable when requirements change. There is always the possibility of a major semver bump, so unless you are developing a widely-used crate, I would not worry about such changes. As you mentioned, Copy -> Clone is trivial to fix, even if it leads to some annoyance.

12 Likes

To me it sounds like the problem was not that you derived Copy for your struct but that the code which uses that struct took it by value whereas it should have taken it by borrow in the first place (assuming that taking it by value wasn't necessary).

I think never deriving it on any struct is probably too strict, but deriving it anywhere possible is also too loose.

There's a bunch of related things here. For example, do you offer a new() that takes everything needed to construct the struct? Adding a field would then also be a pain, because every one of those calls has to be updated. In that case the extra annoyance of a semver break from no longer being Copy isn't that much different, so it might as well be derived.

Whereas if you're using a builder API and making the struct #[non_exhaustive], then not deriving Copy seems like the right default because the struct is already in the "I'm not sure exactly what I need here" world.
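
A minimal sketch of that second world, with hypothetical Options/OptionsBuilder names. Note that #[non_exhaustive] only restricts code in other crates; inside the defining crate the struct can still be built literally.

```rust
// Hypothetical options struct that is expected to grow new fields.
// Clone but not Copy, so adding e.g. a String later breaks nobody.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub verbose: bool,
    pub depth: u32,
}

// Hypothetical builder; downstream crates construct Options through it.
#[derive(Debug, Default)]
pub struct OptionsBuilder {
    verbose: bool,
    depth: Option<u32>,
}

impl OptionsBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn verbose(mut self, verbose: bool) -> Self {
        self.verbose = verbose;
        self
    }

    pub fn depth(mut self, depth: u32) -> Self {
        self.depth = Some(depth);
        self
    }

    pub fn build(self) -> Options {
        Options {
            verbose: self.verbose,
            depth: self.depth.unwrap_or(1), // assumed default depth
        }
    }
}

fn main() {
    let opts = OptionsBuilder::new().verbose(true).depth(3).build();
    let copy_for_worker = opts.clone(); // duplication is explicit
    println!("{:?} {:?}", opts, copy_for_worker);
}
```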

4 Likes

Function parameters of type &T where T: Copy are considered somewhat unidiomatic (outside of generic contexts), there's even a Clippy lint that flags them.
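
The lint I have in mind is clippy::trivially_copy_pass_by_ref, which as far as I know is in the pedantic group and so is off by default. A small sketch with a hypothetical Celsius type:

```rust
// Hypothetical small Copy newtype.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Celsius(f64);

// With the pedantic lint enabled, Clippy may flag this signature:
// a small Copy type is taken by shared reference.
fn to_fahrenheit_by_ref(c: &Celsius) -> f64 {
    c.0 * 9.0 / 5.0 + 32.0
}

// The form usually considered more idiomatic: pass the value itself.
fn to_fahrenheit(c: Celsius) -> f64 {
    c.0 * 9.0 / 5.0 + 32.0
}

fn main() {
    let boiling = Celsius(100.0);
    println!("{}", to_fahrenheit_by_ref(&boiling));
    println!("{}", to_fahrenheit(boiling));
    println!("{:?}", boiling); // still usable: it was copied
}
```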

6 Likes

I don't think I'd assume that. There are a bunch of possibilities like a &[SomethingCopy] parameter where that's the right way to pass it, but where removing the Copy impl is very likely to break the code consuming it.
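
For instance (a sketch with a hypothetical Sample type), both functions below take the slice by reference as they should, yet both stop compiling if Sample loses its Copy impl:

```rust
// Hypothetical plain-data type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    value: i64,
    weight: i64,
}

// Borrowing the slice is the right signature, but `copied()` and the
// owned return value both rely on Sample: Copy.
fn heaviest(samples: &[Sample]) -> Option<Sample> {
    samples.iter().copied().max_by_key(|s| s.weight)
}

// Same story: copies the first element out of the borrowed slice.
fn first_sample(samples: &[Sample]) -> Option<Sample> {
    samples.first().copied()
}

fn main() {
    let samples = [
        Sample { value: 1, weight: 5 },
        Sample { value: 2, weight: 9 },
        Sample { value: 3, weight: 2 },
    ];
    println!("{:?}", heaviest(&samples));
    println!("{:?}", first_sample(&samples).map(|s| s.value));
}
```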

1 Like

Interesting, not sure I like that. I usually try to decide between an owned or a borrowed parameter based on semantics, not the actual low-level details of copy vs reference (the only exceptions are basic types such as u32, where I almost always use an owned parameter).

The idea is that most (though not all) Copy types are also basic "plain data" types like the built-in numeric types.

Just as an example, consider struct UUID. It's just a number with a few extra semantics; you should be passing it around by-copy because there's no reason to add the logical complexity of passing a reference for a plain data type. The same goes for things like linalg math types (e.g. Mat44), though there you start running into the physical codegen tradeoff.

I personally stand by both "if a type can be Copy, it probably should be" and "if a type is Copy, it probably shouldn't be passed by shared reference." (There are obviously always exceptions.)

The important part is how you define "can be Copy". It's not that #[derive(Copy)] works; that's just a necessary but not sufficient precondition. I'm actually concerned with a "logical Copy" first and foremost. Does identity matter? Not Copy. Does it manage a resource? Not Copy. Etc.

There is still a bit of predicting the future involved, sadly. It might happen that your plain data type actually ends up not being plain data a few months down the line, and that's just a refactor you're going to need to deal with. But there's also the possibility that with a good initial design, what you actually want is a (&Resource, Data) pair, not to make your data into a resource.
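
A rough sketch of what I mean by a (&Resource, Data) pair. FontAtlas and Glyph are made-up names: the data stays small and Copy, and the thing that owns heap memory is passed alongside it by reference.

```rust
// Hypothetical resource: owns heap data, so it is not Copy.
struct FontAtlas {
    names: Vec<String>,
}

// Hypothetical plain data: refers to the resource by index and stays Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Glyph {
    font: usize,
    code: char,
    size: u8,
}

// The resource is borrowed, the data is passed by value.
fn describe(atlas: &FontAtlas, glyph: Glyph) -> String {
    let name = atlas
        .names
        .get(glyph.font)
        .map(String::as_str)
        .unwrap_or("<unknown>");
    format!("{} {} @{}", name, glyph.code, glyph.size)
}

fn main() {
    let atlas = FontAtlas { names: vec!["Mono".to_string()] };
    let glyph = Glyph { font: 0, code: 'a', size: 12 };
    println!("{}", describe(&atlas, glyph));
    println!("{}", describe(&atlas, glyph)); // glyph was copied, still usable
}
```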

Good program engineering is difficult. At least we have the type system to guide us 🙂

17 Likes

Right, as I went through the code a lot of these cases should have passed by reference to begin with, or the code should have been rearranged to allow the function to take ownership of the variable, but that's actually the point I'm getting at--if you don't derive copy you are forced to think about it and write it correctly the first time, or else it won't compile. But if you make a new struct and the first thing you do is derive copy (like I did), you can write a bunch of code using that struct that compiles fine and runs fine but is suboptimal because there are a bunch of unnecessary copies happening.

In particular people who are newer to the language might not have a good sense of what they are asking the processor to do, and when you are forced to manually type out .clone(), it's a little reminder that maybe you have an opportunity for some more optimization. The thing I like about Rust is the principle that code with potential problems gets stopped by the compiler, so philosophically I don't like that deriving Copy can allow you to compile code that could be improved.

2 Likes

I don't see how forcing the programmer one way would be better than the other. Sometimes a copy is better. Sometimes a reference is better. Copies often optimize very well, so you're not actually asking the processor to do more work.

Note that this tradeoff can be subtler than you expect: Most Rust code is run on an operating system with virtual memory. On these systems, copying large, page-aligned values will often be quite fast: Instead of physically copying the bytes, you might get a simple page table update instead.

7 Likes

Hmm... but then what happens when you write to that new page table? Surely it would have to then copy the page and make the change, COW.

If you are not going to write to that huge thing, then why is it being passed around as a copy rather than a reference?

I'd rather not assume anything about virtual memory in general. That is totally out of my control even if it does exist on my platform.

I've never heard of a memcpy that does this. Do you have a demonstration? I'd love to play with it.

2 Likes

glibc does, for one. I'd expect most mature memcpy implementations to work similarly, if the platform supports it.

7 Likes

I feel like deriving copy for things that aren't obvious value newtypes enables me to make more logic errors with inadvertently creating copies.
On the other hand, the memory impact of the inadvertent copies doesn't worry me at all, since non-cloning moves are the exact same memcpy, and the only difference is whether the type system considers the old location invalidated.
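
The kind of logic error I mean, as a sketch with a hypothetical Counter type. Without Copy, the buggy version would be rejected, because you cannot move an element out of a borrowed slice.

```rust
// Hypothetical type that arguably should not have been Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Counter {
    hits: u32,
}

impl Counter {
    fn bump(&mut self) {
        self.hits += 1;
    }
}

// Looks like it updates the first counter, but `counters[0]` is silently
// copied into `c`, so only the temporary copy is bumped.
// Assumes a non-empty slice.
fn bump_first_buggy(counters: &mut [Counter]) {
    let mut c = counters[0];
    c.bump();
}

// Mutates the element in place. Assumes a non-empty slice.
fn bump_first(counters: &mut [Counter]) {
    counters[0].bump();
}

fn main() {
    let mut counters = vec![Counter { hits: 0 }];
    bump_first_buggy(&mut counters);
    println!("after buggy: {:?}", counters);
    bump_first(&mut counters);
    println!("after fixed: {:?}", counters);
}
```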

2 Likes

Still, anything bigger than like 2-8 usizes does begin to marginally benefit from being passed to stuff by & instead of copying, right?

1 Like

I am one of those people, but now it makes complete sense. I guess the same argument applies to all the traits that need to be derived. It makes me wonder then why we have any auto traits at all. Is there a simple explanation? Would it simply be too onerous on the dev to derive them for almost all their custom types?

No. Not necessarily. As several others have been continuously trying to explain, it is practically impossible to make meaningful, general statements of the form "passing values bigger than N bytes is faster by reference than it is by value".

This does not only depend on the size of the value. It also depends on how it is used. A copy of arbitrary size can be optimized away if it's only used in a local, limited context, and the optimizer can prove that it can be elided.

By the way, it is exceedingly rare that you can significantly speed up your code by merely changing how you pass some types. In the overwhelming majority of cases, your programs will spend most of their time either doing I/O, or doing number-crunching. If your program is not fast enough, you'll have to figure out how to use less I/O or do less computational work; fiddling with trivially copiable types likely won't help at all.

10 Likes