The Copy trait - what does it actually copy?


#1

Today I read a great article about how ownership works in rust. Nicely written and illustrated: https://medium.com/@thomascountz/ownership-in-rust-part-1-112036b1126b

Disclaimer: I am neither the author nor a friend of his and this is not a cheap attempt to catch attention for it. The author tweeted about this article and I responded with a question. We got into a little discussion about it and the man/woman behind the rustlang twitter account advised to open a discussion here to clarify it.

The author puts a figure:

and adds the words:

We can see here, that when using a string literal, Rust is copying the value of hello into hello1

and that statement conflicts with my understanding. Maybe it’s just a wording issue. My understanding is that hello and hello1 point to the same piece of data (not sure whether it’s on the stack or the heap in this case) and the operation
let hello1 = hello;
does not produce a copy of the data (his wording “value”) but a copy of the reference to it (I’d call it the memory address).
I’d argue it’s fine having several references to the same piece of data because it is immutable, so it can be shared and it would make no sense to have it twice in memory. At least this holds true for a string literal. As a result, the illustration should be showing both green guys holding on to one “Hello World”-block, not two different ones.

In one answer to the tweet he explains what makes him think there’s really a duplication happening and it might be true for integers, but I’d think it’s a different story for a str which is probably an array of characters behind the scenes and copying that could become quite expensive (especially for large strings).

Please enlighten me (and/or the author) to make sure that, apart from wording issues, we have the right understanding of what’s happening. Also, he might want to correct his article in case there’s something wrong, so others don’t learn it wrong.


#2

You are correct, the first hello and hello1 pair have type &str, not str, so it’s only the reference that is getting copied. Actually copying the underlying characters to a new buffer is not the sort of thing implicit copies (i.e., the kind where you don’t need to write .clone()) are ever allowed to do.

You can demonstrate this by printing the address they’re pointing to:
https://play.rust-lang.org/?gist=d188ccaccb8d750a2e86cd3345999aba&version=stable&mode=debug&edition=2015

fn main() {
    let hello = "Hello, world!";
    let hello1 = hello;
    println!("{}", hello);

    println!("{:?}", hello.as_ptr());
    println!("{:?}", hello1.as_ptr());
}
Hello, world!
0x560ab60c7c40
0x560ab60c7c40

#3

In both cases the “value” is moved; the difference is what the value is. In the &str case (i.e. the “hello world” literal), the value is a (fat) pointer. In the String case, the value is the 24-byte (on 64-bit) container object that holds the pointer to the heap data, plus the length and capacity of the backing heap allocation (a String is actually a Vec<u8> internally, as it happens).

Shared references (i.e. &T) are Copy types, whereas values are not necessarily (String isn’t). A “copy” and a “move” are, mechanically, the same thing: a bitwise copy. The sole difference is that, with Copy, the source value is still usable after the copy is taken.

And as @Ixrec mentioned above, in Rust you cannot override a copy or a move operation. Instead, you implement Clone and callers explicitly request that - what it does is up to your implementation.
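
To make the move vs. clone distinction concrete, here is a small sketch. The helper name move_and_clone_addresses is made up for this illustration; it only relies on String::as_ptr and clone from the standard library. The idea is that a move leaves the heap buffer where it is, while clone allocates a new one.

```rust
/// Returns (buffer address before the move, after the move, of the clone).
/// Hypothetical helper, only for illustration.
fn move_and_clone_addresses() -> (usize, usize, usize) {
    let s1 = String::from("Hello, world!");
    let before = s1.as_ptr() as usize;

    // Move: the (ptr, len, capacity) container is copied bitwise into s2.
    // The heap buffer is untouched, and s1 can no longer be used.
    let s2 = s1;
    let after = s2.as_ptr() as usize;
    // println!("{}", s1); // would not compile: s1 has been moved

    // Clone: explicitly requested; allocates a new buffer and copies the bytes.
    let s3 = s2.clone();
    let cloned = s3.as_ptr() as usize;

    (before, after, cloned)
}

fn main() {
    let (before, after, cloned) = move_and_clone_addresses();
    println!("moved:  {:#x} -> {:#x}", before, after);
    println!("cloned: {:#x}", cloned);
}
```

The first two addresses should match and the third should differ; the exact values vary from run to run.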


#4

If it helps, it’s worth pointing out the rules here are exactly the same for integers. Copying a &i32 is also just copying a reference and doesn’t move the underlying i32 anywhere:

https://play.rust-lang.org/?gist=225486aa71ed7c3de343ee987584d45b&version=stable&mode=debug&edition=2015

fn main() {
    let a = &42;
    let b = a; // copies a reference

    let copy = *a; // copies the value to a new location

    let x = &copy;
    let y = x; // copies a reference

    println!("{:?}", a as *const i32 as usize);
    println!("{:?}", b as *const i32 as usize);

    println!("{:?}", x as *const i32 as usize);
    println!("{:?}", y as *const i32 as usize);
}
93938811677568
93938811677568
140732296324756
140732296324756

The actual difference between integers and strings is just that with integers, you normally deal with values, while with strings it’s a lot more common to deal with references. Especially since str (without the &) is a dynamically sized type, so you can’t even put one in a stack variable until https://github.com/rust-lang/rust/issues/48055 happens.


#5

Hi, all! My name is Thomas, and I’m the author of the blog post in question. Thank you so much for taking the time to help clear this up!

I want to update the post ASAP based on what @vask brought up!

Two questions. First, in the following example, does hello default to &str, as opposed to str?

...
    let hello = "Hello, world!";
...

If so, from what I now understand, we’re not copying Hello, world! itself (the actual “value”, if you will); instead, we’re copying the memory address that points to Hello, world!. In other words, the value is a memory address and Hello, world! is only stored once?

Thank you so much for all of your help!


#6

Yes. String literals are &str (specifically, &'static str), not str or String.

EDIT: might as well “prove” that one too: https://play.rust-lang.org/?gist=afe2ed1bb282664b8ce5f3393e067c57&version=stable&mode=debug&edition=2015

fn main() {
    // These compile
    let hello = "Hello, world!";
    let hello1: &str = "Hello, world!";

    // None of these compile
    let hello2: str = "Hello, world!";
    let hello3: String = "Hello, world!";
    let hello4: &String = "Hello, world!";
}

You might find the exact compile errors interesting.

And on your second question: correct. Only the address is copied, and the characters are stored once.


#7

So interesting! Thank you, thank you! And thank you, @vask for bringing this up!

Last question, are &strs stored in the stack? Or does the memory address lead to Hello, World! stored somewhere on the heap?


#8

Any &str is actually a “fat” pointer with an address and a length, roughly (*const u8, usize) internally.
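
If you want to poke at the two halves, something like this should work. The parts function is just a name I made up; as_ptr, len and size_of are the real standard library calls.

```rust
use std::mem::size_of;

/// Splits a &str into the two components of its fat pointer.
/// Hypothetical helper, only for illustration.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let hello = "Hello, world!";
    let (ptr, len) = parts(hello);
    println!("ptr = {:?}, len = {}", ptr, len);

    // A &str is two machine words wide; a thin reference such as &u8 is one.
    println!("&str: {} bytes, &u8: {} bytes", size_of::<&str>(), size_of::<&u8>());
}
```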


#9

Incidentally, the fact that references implement Copy is an interesting design choice, and one obvious downside of it is that it leads to this sort of confusion. This is why some wrapper/reference-like types such as Cell deliberately do not implement Copy.

The reference/fat pointer is entirely on the stack. That’s why copying it around is a trivial memcpy and why it’s possible for Copy to be implemented for it at all. If it was on the heap, copying it would require allocating a new place in the heap, which rules out Copy.
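
One way to check which types are Copy without reading the docs is a tiny generic function with a Copy bound. assert_copy is a made-up helper, not something from std; the call only compiles if the type really is Copy.

```rust
/// Compiles only when T implements Copy; does nothing at run time.
/// Hypothetical helper, only for illustration.
fn assert_copy<T: Copy>() {}

fn main() {
    assert_copy::<&str>();    // shared references are Copy...
    assert_copy::<&String>(); // ...regardless of what they point to
    assert_copy::<i32>();

    // Uncommenting any of these makes the build fail:
    // assert_copy::<String>();               // owns a heap buffer
    // assert_copy::<&mut i32>();             // mutable references are not Copy
    // assert_copy::<std::cell::Cell<i32>>(); // Cell does not implement Copy

    println!("all Copy checks compiled");
}
```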


#10

Ah! Yes, thank you! That makes sense! The fat pointer is on the stack, but where are the actual characters that make up the string stored?


#11

The actual “Hello, World!” data for a literal string will be in the program binary, like the .rodata section in ELF files. For an arbitrary &str though, you don’t know where the underlying memory came from. Whoever gave you that reference must know, and you only know the lifetime for how long it is valid.
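
A rough sketch of that point: the same &str type can refer to bytes in three different places. byte_len is a made-up function; its signature tells it nothing about where the bytes live.

```rust
/// Accepts any &str; it cannot tell where the underlying bytes are stored.
/// Hypothetical helper, only for illustration.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // Bytes baked into the program binary.
    let literal: &'static str = "Hello, World!";

    // Bytes in a heap allocation owned by a String.
    let owned = String::from("Hello, World!");
    let from_heap: &str = owned.as_str();

    // Bytes in an array on the stack.
    let buf: [u8; 5] = *b"Hello";
    let from_stack: &str = std::str::from_utf8(&buf).expect("valid UTF-8");

    println!(
        "{} {} {}",
        byte_len(literal),
        byte_len(from_heap),
        byte_len(from_stack)
    );
}
```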


#12

I’d say most of the confusion is specifically due to “strings” as people come from languages that don’t draw the distinction that Rust does. In addition, Rust has builtin syntax for string literals and you don’t see the &str for it.

If one was to instead use, say, Vec<i32> vs &[i32] I think the difference would be a bit more apparent, at least visually.
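
For example, a sketch of the same copy-vs-move behaviour with Vec<i32> and &[i32] (same_view is a made-up helper):

```rust
/// True when both slices start at the same address and have the same length.
/// Hypothetical helper, only for illustration.
fn same_view(a: &[i32], b: &[i32]) -> bool {
    a.as_ptr() == b.as_ptr() && a.len() == b.len()
}

fn main() {
    let owned: Vec<i32> = vec![1, 2, 3];

    // &[i32] is Copy: both bindings stay usable and see the same elements.
    let view: &[i32] = &owned;
    let view2 = view;
    println!("{}", same_view(view, view2));

    // Vec<i32> is not Copy: assignment moves ownership.
    let moved = owned;
    // println!("{:?}", owned); // would not compile: owned has been moved
    println!("{:?}", moved);
}
```

Here &[i32] plays the role of &str and Vec<i32> the role of String.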


#13

So cool.

Thank you! I’m going to update those blog posts ASAP!


#14

Does an image like this better express what’s going on? When we Copy &str, are hello and hello1 grabbing onto the same fat pointer, or is the fat pointer copied and placed on the stack?


#15

The fat ptr is copied. hello and hello1 themselves have a different location (on the stack), but they’re the same value. It’s like having 2 ints on the stack with the same value - each has its own address but same value.
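
Here is a sketch that compares both things: where the two fat pointers point, and where the two variables themselves live. compare_copies is a made-up name, and I would expect (true, false) from it, though I have only reasoned about it rather than checked every optimization level.

```rust
/// Returns (same target bytes?, same location for the two variables?).
/// Hypothetical helper, only for illustration.
fn compare_copies() -> (bool, bool) {
    let hello = "Hello, world!";
    let hello1 = hello;

    // Both fat pointers carry the same address and length.
    let same_target = hello.as_ptr() == hello1.as_ptr() && hello.len() == hello1.len();

    // The variables are separate objects, each holding its own copy
    // of the fat pointer.
    let slot: *const &str = &hello;
    let slot1: *const &str = &hello1;
    let same_slot = slot == slot1;

    (same_target, same_slot)
}

fn main() {
    println!("{:?}", compare_copies());
}
```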


#16

The fat pointer is on the stack, so yeah that’s not quite right either.

So if I grok the visual metaphor you’re going with, the pointer and the length go inside the green shape, and the characters of the string go inside the purple shape.


#17

If you want to get fancy showing the effects of fat pointers, you could try slicing:

let hw = "Hello, World!"; // the stack value is a pointer and length, e.g. (0x1234f00, 13)
let hello = &hw[..5]; // same pointer, different length: (0x1234f00, 5)
let world = &hw[7..12]; // (0x1234f07, 5)

(Note that str indexing is on byte offsets, not Unicode codepoints, but here it’s just ASCII anyway.)
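
To see the byte-offset point with non-ASCII text, here is a small sketch. byte_and_char_len is a made-up helper; is_char_boundary and get are the real str methods.

```rust
/// Returns (length in bytes, number of chars).
/// Hypothetical helper, only for illustration.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "héllo", written with an escape; 'é' takes two bytes in UTF-8.
    let s = "h\u{e9}llo";
    println!("{:?}", byte_and_char_len(s)); // (6, 5)

    println!("{}", &s[..1]); // h
    println!("{}", &s[3..]); // llo

    // Byte 2 falls inside 'é', so &s[..2] would panic at run time.
    println!("{}", s.is_char_boundary(2)); // false
    println!("{:?}", s.get(..2)); // None
}
```

The non-panicking get is handy when the offsets come from untrusted input.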


#18

Another playground that may or may not help show how much is in the &str (on the stack) versus how much is in the str (on the heap or read-only memory): https://play.rust-lang.org/?gist=b74869d847cb760faad1c8e74d38594a&version=stable&mode=debug&edition=2015

fn main() {
    let hello = "Hello, world!";
    let empty = "";
    let long = "The quick brown fox jumps over the lazy dog.";

    // size_of_val(&T) returns the size of T, so...
    
    // size_of_val(&str) returns the size of the str being referred to,
    // i.e. the underlying character buffer (typically in read-only memory)
    println!("{}", std::mem::size_of_val(hello));
    println!("{}", std::mem::size_of_val(empty));
    println!("{}", std::mem::size_of_val(long));

    // size_of_val(&&str) returns the size of the &str being referred to,
    // i.e. the fat pointer (pointer and length) on the stack.
    println!("{}", std::mem::size_of_val(&hello));
    println!("{}", std::mem::size_of_val(&empty));
    println!("{}", std::mem::size_of_val(&long));
}
13
0
44
16
16
16

#19

This is great!! Thank you!


#20

Does this image make sense?

The pointer to "Hello, World!" is stored on the stack. hello is stored on the stack with a fat pointer whose memory address points to the place in the stack that points to "Hello, World!". When we let hello1 = hello;, we copy the fat pointer (incl. the memory address), and pop that onto the stack as well.