Is a Type That's 1024 bytes More Efficient to Reference or Copy?

I've got a struct that apparently takes 1024 bytes to store, but I can still derive Copy on it because all of the types in it are Copy. Am I right in assuming that, in general, it is more efficient to pass a reference to a struct of that size?

I know you don't usually pass references to most primitives such as floats or ints, because the overhead of the reference is more than the cost of copying, but at what size does it become more efficient to pass a reference?

The ultimate answer to this sort of question is to profile both ways. With a type that big, you could consider tagging it #[repr(align(4096))]. On most architectures, that will align it to memory pages which might reduce a copy to a single-word page table update.


Of course it depends on how often you pass that thing around, and it doesn't matter if the called functions take ages anyway, and you should try to measure on the actual platform if you want to be sure.
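
For what it's worth, here is a minimal sketch of what "profile both ways" could look like using only the standard library. The names `Big`, `sum_by_value`, `sum_by_ref` and `time_it` are made up for illustration, and the `#[inline(never)]` attributes are an assumption on my part to keep the optimizer from erasing the call boundary. A real measurement would more likely use a proper benchmarking harness and your actual workload.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for the 1024-byte struct from the question.
#[derive(Clone, Copy)]
pub struct Big(pub [u8; 1024]);

// Takes the struct by value. `inline(never)` keeps the call from being inlined away.
#[inline(never)]
pub fn sum_by_value(b: Big) -> u64 {
    b.0.iter().map(|&x| u64::from(x)).sum()
}

// Takes the struct by shared reference.
#[inline(never)]
pub fn sum_by_ref(b: &Big) -> u64 {
    b.0.iter().map(|&x| u64::from(x)).sum()
}

// Runs `f` `iters` times and returns the elapsed time plus an accumulated
// result, so the work cannot be discarded as dead code.
pub fn time_it(iters: u32, mut f: impl FnMut() -> u64) -> (Duration, u64) {
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = acc.wrapping_add(black_box(f()));
    }
    (start.elapsed(), acc)
}

fn main() {
    let big = Big([1; 1024]);
    let (t_val, a) = time_it(1_000_000, || sum_by_value(black_box(big)));
    let (t_ref, b) = time_it(1_000_000, || sum_by_ref(black_box(&big)));
    assert_eq!(a, b); // both variants must compute the same thing
    println!("by value: {t_val:?}, by reference: {t_ref:?}");
}
```

Build with `--release`, and treat the numbers as a rough hint only: they will vary with platform, optimization level, and how the functions are actually called in your program.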

But if you're looking for a gut feeling, I recently benchmarked a vaguely related situation: I complicated an already optimized algorithm so that it would move objects more efficiently, i.e., fewer times for the same end result. For your typical tiny i32, the overhead made it slower, as feared. At around a hundred bytes, saving the copies broke even. For 666-byte monsters (I found that devilishly big already), I saw a factor of improvement, as if the amount of copying was all that mattered.


It really depends on access patterns and probably also on a bunch of platform-specific stuff like ABI and cache line size. There are no guarantees.
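
To put rough numbers on that, here is a small sketch that prints how big the value is compared to a reference to it. `Payload` is a hypothetical stand-in for the struct in the question, and the 64-byte cache line is an assumption (common on current x86-64 hardware, but not universal).

```rust
use std::mem::size_of;

// Hypothetical stand-in for the struct from the question.
#[derive(Clone, Copy)]
pub struct Payload(pub [u8; 1024]);

// Assumption: 64-byte cache lines. Check your target before relying on this.
pub const ASSUMED_CACHE_LINE: usize = 64;

// Minimum number of cache lines that `bytes` contiguous bytes occupy.
pub fn cache_lines_spanned(bytes: usize) -> usize {
    (bytes + ASSUMED_CACHE_LINE - 1) / ASSUMED_CACHE_LINE
}

fn main() {
    println!("size_of::<Payload>()  = {}", size_of::<Payload>());
    println!("size_of::<&Payload>() = {}", size_of::<&Payload>());
    println!(
        "cache lines spanned (at least) = {}",
        cache_lines_spanned(size_of::<Payload>())
    );
}
```

A reference is a single pointer-sized word, while the value itself covers many cache lines, but as the examples in this thread show, that alone does not tell you how many copies the compiler actually emits.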

However, in many real-world examples it's unlikely to make a major difference, if any at all. Which of these functions makes fewer copies?

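struct Foo([u8; 1024]);

// Version 1: pass the struct by value.
pub fn with_foo(callback: fn(Foo)) {
    callback(Foo([0; 1024]))
}
// vs.
// Version 2: build the struct locally and pass a reference to it.
pub fn with_foo(callback: fn(&Foo)) {
    let foo = Foo([0; 1024]);
    callback(&foo)
}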

Trick question! These compile to the exact same assembly.

Which of the following functions makes fewer copies?

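// Version 1: return the struct by value.
pub fn get_foo() -> Foo {
    Foo([1; 1024])
}
// vs.
// Version 2: write the struct through an explicit out pointer.
pub fn get_foo(out: &mut Foo) {
    *out = Foo([1; 1024]);
}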

Trick question again! Both versions are passed an out pointer and compile to a single memset. (Due to ABI limitations, the second one is actually slightly better by an O(1) term, but not in a way that survives inlining.) Check it out on Compiler Explorer.

(Note that implementing Copy has no effect on code generation. Copy determines what kinds of code you can write, but it doesn't influence what valid code compiles to once written.)
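
A small sketch of that distinction, with hypothetical types `WithCopy` and `WithoutCopy`: the two structs have the same size and layout, and the only difference is which source code the compiler accepts.

```rust
use std::mem::size_of;

// Same contents; only one of them implements Copy.
#[derive(Clone, Copy)]
pub struct WithCopy(pub [u8; 1024]);

pub struct WithoutCopy(pub [u8; 1024]);

pub fn consume_copy(v: WithCopy) -> usize {
    v.0.len()
}

pub fn consume_move(v: WithoutCopy) -> usize {
    v.0.len()
}

fn main() {
    let a = WithCopy([0; 1024]);
    let first = consume_copy(a);
    let second = consume_copy(a); // fine: `a` is Copy, so it is still usable here
    assert_eq!(first, second);

    let b = WithoutCopy([0; 1024]);
    let moved = consume_move(b);
    // consume_move(b); // would not compile: use of moved value `b`
    assert_eq!(moved, first);

    // Copy does not change the size of the type.
    assert_eq!(size_of::<WithCopy>(), size_of::<WithoutCopy>());
}
```

Passing `b` by value is a move rather than a copy at the language level, but both can end up as the same memcpy (or no memcpy at all) in the generated code.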

In conclusion, there are no guarantees, but if you're fiddling about with references and out pointers in order to save copies, you should definitely be profiling every change to be sure the difference is what you think.


Thanks all for your replies! Sounds like I'm gonna profile it.

