Trying to understand again: string copy

So I am working through the Rust book, and now I am trying to understand ownership. The example used is with a string:

    let s1 = String::from("hello");
    let s2 = s1;

    println!("{}, world!", s1);

I do understand this gives an error at compile time, because s1 is no longer valid after the declaration of s2. I do understand why this makes the code memory-safe.

But what I am trying to understand is how this would be useful in a real life situation. If s1 becomes unusable as soon as s2 is declared, why would one want to declare s2 and not just keep working with s1? Why not always use clone if you want to copy, and not use the move option at all?

(I am NOT trying to be the wise guy here, I truly try to understand the real life usefulness of this)

Obviously, this is example code and not real-life code. IRL you would mostly give up ownership by passing a value to a function, not directly like this.

Again, assigning to another variable is not the only way to move. If an API needs an owned value, then there has to be a way to give away ownership, otherwise we would end up copying everything unnecessarily.
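
A minimal sketch of what "an API needs an owned value" can look like. `Registry` and `register` are made-up names for illustration, not from the book: the function takes a `String` by value and stores it, so the caller hands over the existing allocation instead of copying it.

```rust
// Hypothetical example: a registry that owns the names it stores.
struct Registry {
    names: Vec<String>,
}

impl Registry {
    // Takes `name` by value: the caller gives up the String,
    // and the heap buffer is reused rather than copied.
    fn register(&mut self, name: String) {
        self.names.push(name);
    }
}

fn main() {
    let mut registry = Registry { names: Vec::new() };
    let s1 = String::from("hello");
    registry.register(s1); // `s1` is moved into the registry here
    // println!("{s1}");   // would not compile: `s1` was moved
    println!("{}", registry.names[0]);
}
```

If the caller still needs the string afterwards, that is the place to write `s1.clone()` explicitly, so the cost of the copy is visible at the call site.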

The particular example isn't useful on its own; it's just demonstrating that once you move something that's not Copy without replacing it, the moved-from place is uninitialized / unusable. They're trying to get you to understand and register that axiom, not demonstrating code you're likely to write.

A perhaps more realistic example of where you might run into this is if you pass s1 to some function without it clicking that you're passing by value.

use std::fmt::Display;

fn some_far_away_function<T: Display>(t: T) {
    println!("{t}");
}

fn your_program(s1: String) {
    some_far_away_function(s1);
    // ...

    // Oops, I've given `s1` away
    println!("{s1}");
}

A possible fix:

-    some_far_away_function(s1);
+    // `&String` (and `&str`) also implement `Display`
+    some_far_away_function(&s1);
+    //                     ^
     // ...

-    // Oops, I've given `s1` away
    println!("{s1}");

More generally, you're going to run into situations where you need to move something in order to pass it to a function or capture it in a closure or whatever, but doing so conflicts with some other part of your code. Every time you're passing or assigning something non-Copy by value, you're moving it.
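
For the closure case, here is a small sketch. `count_chars_on_thread` is a hypothetical helper of mine: `thread::spawn` needs a closure that owns what it uses, so the `String` is moved into it with `move`.

```rust
use std::thread;

// Hypothetical helper: counts characters on another thread.
fn count_chars_on_thread(s: String) -> usize {
    // `move` transfers ownership of `s` into the closure, so the new
    // thread does not borrow anything from this stack frame.
    let handle = thread::spawn(move || s.chars().count());
    handle.join().expect("worker thread panicked")
}

fn main() {
    let s1 = String::from("hello");
    // Clone only because `s1` is still needed after the call.
    let n = count_chars_on_thread(s1.clone());
    println!("{s1} has {n} chars");
}
```

Passing `s1` itself instead of `s1.clone()` would make the final `println!` fail to compile, which is exactly the kind of conflict described above.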

Here's a look at a more involved situation, where you only have a &mut Container but the API demands you pass an owned field. You can't move the field out:

The problem here is that we need to take ownership of self.chain, but you can only take ownership of things that you own. In this case, we only have /borrowed/ access to self, because add_link is declared as &mut self.

To put this as an analogy, it is as if you had borrowed a really nifty Lego building that your friend made so you could admire it. Then, later, you are building your own Lego thing and you realize you would like to take some of the pieces from their building and put them into yours. But you can’t do that – those pieces belong to your friend, not you, and that would leave a hole in their building.

You need to understand the axiom of "moved it => old place is uninitialized" to understand why this leaves a hole in their building.
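
A rough sketch of that situation, and one common way out. The names `Container`, `chain` and `add_link` follow the quote above; the element type and the `extend_chain` function are assumptions I made up to stand in for "an API that demands an owned value".

```rust
// Hypothetical API that insists on taking the chain by value.
fn extend_chain(mut chain: Vec<u32>, link: u32) -> Vec<u32> {
    chain.push(link);
    chain
}

struct Container {
    chain: Vec<u32>,
}

impl Container {
    fn add_link(&mut self, link: u32) {
        // self.chain = extend_chain(self.chain, link);
        // ^ rejected: cannot move out of `self.chain`, because `self`
        //   is only borrowed and the move would leave a hole.

        // `mem::take` puts an empty Vec in its place, so there is
        // never a hole, and we get an owned Vec to pass along.
        let chain = std::mem::take(&mut self.chain);
        self.chain = extend_chain(chain, link);
    }
}

fn main() {
    let mut c = Container { chain: vec![1, 2] };
    c.add_link(3);
    println!("{:?}", c.chain);
}
```

In Lego terms: you may take the pieces as long as you put a (possibly empty) replacement in the same spot right away.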

(The technical or academic terms would be "move semantics" and "affine types".)

Take a look at Arc and Mutex in Rust vs C++. Or enable_shared_from_this in C++. There are a lot of pitfalls in C++ because you cannot enforce that your data is accessed from the appropriate smart pointer or when a lock is held. The Rust API is much cleaner and safer because it can enforce ownership through moves.
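
To make the Mutex point concrete, here is a small sketch (`sum_in_parallel` is just an illustrative name). The counter is moved into the `Mutex`, so the only way to reach it afterwards is through `lock()`; there is no separate variable you could touch by accident without holding the lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative helper: each worker adds its index to a shared total.
fn sum_in_parallel(workers: u32) -> u32 {
    // The Mutex owns the counter; access goes through the lock guard.
    let total = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (1..=workers)
        .map(|i| {
            // Each thread gets its own owning handle to the same Mutex.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                let mut guard = total.lock().unwrap();
                *guard += i;
                // The lock is released when `guard` is dropped here.
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", sum_in_parallel(4));
}
```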

Obviously, that is why I asked :)

I do understand the other uses you mention, I think. So if I understand you correctly, there is no obvious real life scenario for this example, this is just instructional to understand what is happening. Thanks!

Thanks, I think I understand this.

I will need some time, I think, to forget the way I understand pointers and variables in C (from an assembly perspective they are much more intuitive to understand, than from a “higher language” position, I think) as the Rust way looks a lot like the C way, but the mental image is a bit different.

Anyhow, thanks for helping me understand, very much appreciated!

The reason I skipped C++ and moved from C to Rust is exactly this, I don’t want to learn an unclean language ;)

It's an expansion of what C did, but in a very natural way. You would have needed some more adjustments if you had used C++, but the way from assembler to C to Rust is quite straightforward.

Let's look at this C example first:

#include <stdio.h>   /* puts */
#include <stdlib.h>  /* free */
#include <string.h>  /* strdup */

char* print_string_and_return(char* s) {
    puts(s);
    return s;
}

This function works perfectly for both owned and non-owned strings:

int main() {
    print_string_and_return("Hello, not owned string!");
    free(print_string_and_return(strdup("Hello, this is owned string!")));
}

But what if we don't want to return that string, but, instead, just receive it?

Now, suddenly, we have two different functions:

void print_unowned_string(char* s) {
    puts(s);
}
void print_owned_string_and_free_it(char* s) {
    puts(s);
    free(s);
}

This works perfectly as long as we don't forget which function is which:

int main() {
    print_unowned_string("Hello, not owned string!");
    print_owned_string_and_free_it(strdup("Hello, this is owned string!"));
}

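But these functions have exactly the same prototype! What happens if we mix them up? The first call frees a string literal, which is undefined behavior and typically crashes the program; the second one silently leaks memory: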

int main() {
    print_owned_string_and_free_it("Hello, not owned string!");
    print_unowned_string(strdup("Hello, this is owned string!"));
}

Enter Rust. Here we can't just write one function. We have to write two functions from the very beginning:

pub fn print_unowned_string_and_return(s: &str) -> &str {
    print_str(s);
    s
}

pub fn print_owned_string_and_return(s: Box<str>) -> Box<str> {
    print_str(&*s);
    s
}

It works, and if you look at the generated assembly, you will see that both functions are byte-for-byte identical:

pub fn main() {
    print_unowned_string_and_return("Hello, not owned string!");
    drop(print_owned_string_and_return(Box::from("Hello, this is owned string!")));
}

What's the point then? The point is that if you move the deallocation into the function itself, then there is a difference:

pub fn print_unowned_string(s: &str) {
    print_str(s);
}

pub fn print_owned_string_and_free_it(s: Box<str>) {
    print_str(&*s);
    drop(s);
}

pub fn main() {
    print_unowned_string("Hello, not owned string!");
    print_owned_string_and_free_it(Box::from("Hello, this is owned string!"));
}

If you try to mix them up… compile-time error:

pub fn main() {
    print_owned_string_and_free_it("Hello, not owned string!");
    print_unowned_string(Box::from("Hello, this is owned string!"));
}

But more importantly, if you try to swap the deallocation and the print… that doesn't compile either:

pub fn print_owned_string_and_free_it(s: Box<str>) {
    drop(s);
    print_str(&*s);
}

And that's the whole point. In assembler you had constant confusion: pointers and integers are one and the same, and the only way not to mix them up is comments explaining which register contains an address, which contains an integer, and which contains garbage.

C separates pointers from integers, so there is less confusion. Rust splits pointers into more groups: there are owning pointers (Box), non-owning pointers with the ability to mutate (&mut), and non-owning pointers without the ability to mutate (&).

Moreover, when you have an owning pointer, the compiler keeps track of its state: an owning pointer can be valid (at the beginning of print_owned_string_and_free_it) or invalid (after the call to drop).
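
A small sketch of that state tracking (`consume` and `demo` are names I made up for illustration): the same variable goes from valid to moved-from and back to valid, and the compiler only accepts uses while it is valid.

```rust
// Hypothetical function that takes ownership and frees the string
// when `s` goes out of scope at the end of the body.
fn consume(s: Box<str>) -> usize {
    s.len()
}

fn demo() -> (usize, usize) {
    let mut s: Box<str> = Box::from("hello"); // state: valid
    let first = consume(s);                   // state: moved-from
    // let oops = s.len();                    // would be rejected here
    s = Box::from("hi");                      // state: valid again
    let second = s.len();
    (first, second)
}

fn main() {
    let (first, second) = demo();
    println!("{first} {second}");
}
```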

You are doing all that tracking in your head when you work with C. Rust didn't invent ownership; it was already a thing in C.

Only in C it lived in the comments on functions, while in Rust it's part of the language.

P.S. It's not all roses and sunshine, of course. The compiler is not human; you can't write complicated ownership explanations for it (e.g. "this pointer is owning if its address is even and non-owning if it's odd"), and there are more complications, but in general the idea is to make the compiler (and not the human) responsible for keeping track of when pointers are valid and usable and when they are invalid and unusable. C++ tried to do the same, but, well… it failed. And in a pretty spectacular fashion, if you ask me.

Thanks, this helps my mind accept what I am doing :)

One of the reasons I turned from C to Rust and not to C++ was that someone told me about the willingness of Rust people to help. I am not disappointed. Really, thank you for your effort!
