String + String Behaviour

fn main() {
    let x = String::from("o");
    let y = String::from("k");
    // `x` is moved into the result; `y` is only borrowed.
    let z: String = x + &y;
    println!("{}", z);
}

Is it true that my code has to borrow &y because of Rust's rules on memory safety and efficiency?

Actually, that makes sense, but is there any other explanation? Thank you.

The + operator uses the implementation of the Add trait for the given type. In the case of String, you can see that it uses &str as its generic argument. This is why you had to use &y (which derefs to &str).
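A minimal sketch of what that means in practice. The helper name `concat_borrowed` is made up for illustration; the only std behaviour assumed is that `String` implements `Add<&str>` and that `&String` deref-coerces to `&str`.

```rust
// Hypothetical helper: the same operation as `+`, with the types spelled out.
fn concat_borrowed(left: String, right: &str) -> String {
    // `left` is consumed and extended; `right` is only read.
    left + right
}

fn main() {
    let x = String::from("o");
    let y = String::from("k");

    // `&y` is a `&String`, which coerces to the `&str` that `Add` expects.
    let z = x + &y;
    println!("{}", z); // prints "ok"

    // `y` was only borrowed, so it can be used again; `x` was moved.
    let again = concat_borrowed(z, y.as_str());
    println!("{}", again); // prints "okk"
}
```

Writing `x + y` with both operands as `String` is rejected by the compiler because no `Add<String>` implementation exists for `String`.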

Which is a bit silly; there could well be a consuming String + String overload too. I guess the reason there isn’t is that concatenating strings with + is somewhat frowned upon in Rust and the operator is there mostly because it’s familiar from other languages.

The problem is that it would consume the String, which is not what one would expect from such an API.

Hmm, I’d certainly expect it to consume in Rust. Most of the use cases I can think of have the rhs be something ephemeral anyway, not meant to be reused.

There is no advantage to consuming it because it can't reuse the allocation. One String needs to be extended and the other one copied into it with both APIs.
With the current API it is clear which String is extended. When both are String you don't really know which one is extended and which one is dropped. That could be bad when you want to reserve space before adding multiple strings.
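To illustrate the point about reserving space, here is a rough sketch. The function `concat_reserved` is hypothetical, and the pointer comparison is only there to show that the left-hand buffer is the one being extended; it assumes appending within the reserved capacity does not reallocate.

```rust
// Hypothetical helper: reserve once, then append every part with `+`.
// Returns the result and whether the buffer stayed at the same address.
fn concat_reserved(parts: &[&str]) -> (String, bool) {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut acc = String::with_capacity(total);
    let start = acc.as_ptr();

    for &part in parts {
        // The left operand is always the one that gets extended.
        acc = acc + part;
    }

    let same_buffer = acc.as_ptr() == start;
    (acc, same_buffer)
}

fn main() {
    let (s, same_buffer) = concat_reserved(&["alpha", "-", "beta"]);
    println!("{} (reused reserved buffer: {})", s, same_buffer);
}
```

Because the caller knows the left operand is the accumulator, the up-front `with_capacity` call is guaranteed to pay off.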

There is an advantage. For example if you're adding a 1-byte string to a 1000-byte string that already has an allocation for 1001 bytes, it may make more sense to shift the 1000-byte string and drop the small allocation.

Also &str + String could be implemented.

Oh right, I didn't think of letting the function decide which allocation to put the result into.
Although there is the disadvantage that pushing to the front has to move the String that already exists in that allocation. I would guess that that is also the reason that &str + String doesn't exist.

Not really a disadvantage because it also happens with String + &str in that scenario: bytes from both strings have to be moved (to a fresh third allocation) because the left allocation doesn't have enough space.

The advantage would be ergonomic, not getting an error and having to type the &. Yes, it’s a small thing, but still. I get the desire of not wanting to "waste" an allocation, but that happens with String + &String too in the (common) case where the rhs is not used for anything else. Just the fact that you have to add a sigil likely isn’t a useful pedagogical tool, either, it’s just a tiny papercut.

I think the point is that if you have a 1-byte, 1-cap string + a 1000-byte, 1001-cap string, you don't need to reallocate the right side, but you do need to move 1000 bytes to the right to fit the left string in. The existing String + &str impl is essentially equivalent to push_str(), and is therefore O(n) asymptotically in the final string length if you're doing this in a loop, while any other operation is O(n²).

I know. But that's not a problem because you will have to move those 1000 bytes no matter what you do, so that's not an extra cost.

This makes no sense. The optimization I described is not asymptotically slower than adding String + &String.

I think you're forgetting about string capacity? You only need to move if you're out of capacity when you keep the left string, while you always need to move when you keep the right string.

Yes I guess I didn't say this explicitly, but the idea behind the example was that the left string does not have enough capacity, the right string does. If the left string has sufficient capacity, use that.
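The idea being debated could be sketched roughly as below. This is not an existing std API: `concat_owned` is a hypothetical free function, and the branch order (prefer the left buffer, then the right, then fall back to a reallocating append) is just one reading of the proposal.

```rust
// Hypothetical consuming concatenation that picks which allocation to keep.
fn concat_owned(mut left: String, mut right: String) -> String {
    let left_spare = left.capacity() - left.len();
    let right_spare = right.capacity() - right.len();

    if left_spare < right.len() && right_spare >= left.len() {
        // Only the right buffer can hold the result without reallocating:
        // shift its bytes and copy `left` in front.
        right.insert_str(0, &left);
        right
    } else {
        // Either the left buffer has room, or neither does and a
        // reallocating append is needed anyway.
        left.push_str(&right);
        left
    }
}

fn main() {
    let left = String::from("a");
    let mut right = String::with_capacity(16);
    right.push_str("bcd");
    let right_buf = right.as_ptr();

    let out = concat_owned(left, right);
    println!("{}", out); // prints "abcd"
    println!("kept the right allocation: {}", out.as_ptr() == right_buf);
}
```

Note that the caller cannot tell from the call site which branch runs, which is the transparency concern raised elsewhere in the thread.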

So this is a situation where the inner loop has a tight left string and a reallocating right string everything is getting prepended to? That's O(n²) to continuously move the same characters every iteration, you're better off reversing every string, appending, then reversing again! (Better would be a Deque or other container intended for this sort of thing)

My example was a single operation: 1-byte string + 1000-byte string. There is no loop there.

I agree that in your example it's better to reverse strings.

In general any single operation is so fast on modern computers as to essentially not matter; you can only reasonably consider their cost as a factor in the processing of some large input, modeled as simply a loop of some arbitrarily large amount.

If you have an unusual situation where prepending to the right string is more efficient, then you can do so easily with right.insert_str(0, left); - it looking unusual and possibly expensive is a positive, because it is unusual and expensive!
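For reference, a small sketch of that explicit prepend, using the 1-byte and 1000-byte sizes from the example earlier in the thread. The wrapper `prepend_in_place` is a made-up name; `String::insert_str` is the real std method.

```rust
// Hypothetical wrapper around the explicit prepend.
fn prepend_in_place(mut right: String, left: &str) -> String {
    // Shifts every existing byte of `right` to make room at index 0.
    right.insert_str(0, left);
    right
}

fn main() {
    let left = String::from("a");

    // A 1000-byte string with room for exactly one more byte.
    let mut right = String::with_capacity(1001);
    right.push_str(&"b".repeat(1000));
    let buf = right.as_ptr();

    let out = prepend_in_place(right, &left);
    println!("len = {}", out.len()); // prints "len = 1001"
    println!("same allocation: {}", out.as_ptr() == buf);
}
```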

String + String would not be more expensive than String + &String. It would be at least equally efficient, and in some cases more efficient.

Another example where it can be more efficient is when the left side is empty.

No. It's just that too many people whine about using + on strings, so it wasn't added, even though String + &str exists. IMHO string_a + string_b should just work: forcing people to write string_a + &string_b doesn't make anything better, and it's sometimes strictly worse, since if string_b has enough capacity it could be better to hand over its ownership.

If you were just talking about always appending and deallocating the right string, sure. But you're arguing for adding logic that checks whether it can reuse the right allocation (the check itself may add a little cost from branch misprediction, but that's small fry), and when it can, that path is notably more expensive than reusing the left allocation, despite looking the same. (Handling an empty left side seems viable, on the other hand; it's fairly transparent to the optimizer, even, but I doubt it would move the needle much.)

Given that this only matters when there is spare capacity, and the only time there is likely to be spare capacity is when you're either appending/prepending in a loop or taking special care to precompute a capacity up front, you as the caller are nearly always going to be significantly better off with a different approach (assuming it matters). Even in the unusual cases where prepending does make sense performance-wise, it's better not to subtly depend on what's normally a footgun but isn't here, and to use the obviously different call to insert_str(0, ...) instead.

To recap, the cases that matter for String+String are:

  • You're not processing enough data for this to matter
  • You're only appending to the same string
  • You're only prepending to the same string
  • You're both appending and prepending data to a string, or, equivalently (to a single operation), separately concatenating strings in a tree-like fashion

For the first, who cares. For the second, you're making the implementation very slightly worse. For the third, you're significantly better off even with reversing the strings. And for the fourth (the case I think you're thinking of), you're already doing it terribly wrong for performance, and at worst would be better off with a proper rope structure, or the poor man's rope: Vec<String>.
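One way to read the "poor man's rope" suggestion is sketched below. The `Pieces` type and its methods are hypothetical and only illustrate the idea: keep the fragments separate, and copy each byte once at the end.

```rust
// Hypothetical "poor man's rope": fragments are stored, not concatenated,
// until the final string is requested.
struct Pieces {
    front: Vec<String>, // prepended fragments, most recent last
    back: Vec<String>,  // appended fragments, in order
}

impl Pieces {
    fn new() -> Self {
        Pieces { front: Vec::new(), back: Vec::new() }
    }

    // Both operations are a cheap push; no bytes are moved yet.
    fn prepend(&mut self, s: String) {
        self.front.push(s);
    }

    fn append(&mut self, s: String) {
        self.back.push(s);
    }

    // Allocate once and copy every fragment exactly once.
    fn finish(self) -> String {
        let total: usize = self.front.iter().chain(self.back.iter()).map(|s| s.len()).sum();
        let mut out = String::with_capacity(total);
        for s in self.front.iter().rev().chain(self.back.iter()) {
            out.push_str(s);
        }
        out
    }
}

fn main() {
    let mut p = Pieces::new();
    p.append(String::from("b"));
    p.prepend(String::from("a"));
    p.prepend(String::from("0"));
    p.append(String::from("c"));
    println!("{}", p.finish()); // prints "0abc"
}
```

The total work is linear in the final length regardless of how appends and prepends are interleaved, which is the property the mixed case needs.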

Again, the thing that's important here for Rust is that a performance issue isn't a hidden footgun, and hiding an extra factor of n behind a single character, especially only sometimes (depending on where the left and right came from), really matters.