Using += for concatenation

I wrote this little routine:

n main() {

    let ext = vec!["cbl", "cpy", "rs", "py"];
    let mut pattern = String::from("\\.");

    if ext.len() == 1 {
        pattern = pattern + ext[0];
    } else {
        pattern = pattern + "(" + &ext.join("|") + ")";
    }

   pattern = pattern + "$";
   println!("{}", pattern);

}

cargo clippy nicely suggested:

7 |         pattern = pattern + ext[0];
  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace it with: `pattern += ext[0]`
  |

So I changed everything to

fn main() {

    let ext = vec!["cbl", "cpy", "rs", "py"];
    let mut pattern = String::from("\\.");

    if ext.len() == 1 {
        pattern += ext[0];
    } else {
        pattern += "(" + &ext.join("|") + ")";
    }

   pattern += "$";
   println!("{}", pattern);
}

Now the compiler goes

9 |         pattern += "(" + &ext.join("|") + ")";
  |                    --- ^ -------------- &String
  |                    |   |
  |                    |   `+` cannot be used to concatenate two `&str` strings
  |                    &str

That does not look right to me.

pattern = pattern + "(" not being the same as pattern += "(". I'm sure there is a very good reason for this. :slight_smile: Thanks.

I think you need to use format!() to concatenate strings.

let str_a = "abc";
let str_b = "def"'
let str_c = format!("{}{}", str_a, str_b); // abcdef

Thanks. Looking at the docs they say + is a perfectly valid way to concatenate strings.

There's no Add impl for adding strs together, only types which own a string, like String implement it.

If you concatenate a bunch of string slices together, you potentially need multiple intermediate allocations to store the results. By instead requiring that there has to be an owned strings type which already manages it's own allocation be the base type, we can just accumulate all of the string slices into that single allocation (growing it if needed)

4 Likes

It's a different order of operations, and Add and AddAssign are only implemented for String on the left side and &str on the right. So if we annotate types and add parenthesis, here's the comparison:

pattern = pattern + "(" + &ext.join("|") + ")";
// String = (((String + &str) + (&String as &str)) + &str);

pattern += "(" + &ext.join("|") + ")";
// String += ((&str + &String) + &str);
8 Likes

Personally I feel Clippy is too opinionated here.


Here's a version that avoids intermediate or all-new allocations.

        pattern += "(";
        pattern.extend(itertools::intersperse(ext, "|"));
        // or:
        // pattern.extend(itertools::intersperse(ext.iter().copied(), "|"));
        pattern += ")";
8 Likes

So to make it clear, your code doesn't do pattern += "(". It does

pattern += ((a + b) + c)

which is equivalent to

pattern = (pattern + ((a + b) + c))

with parentheses added to make the order of operations clear. This is different from pattern = pattern + a + b + c which, with added parentheses, is

pattern = (((pattern + a) + b) + c).

Now, it is very arguable that both orders should Just Work (tm). After all, string concatenation is associative. But Rust has two primary "stringy" types, and &str + x is not implemented, only String + x is.

IMO this is a wart in the language and makes easy things difficult in the name of protecting the user from accidentally writing slightly inefficient code. But Rust doesn't provide an ergonomic, efficient alternative; code that avoids those extra allocations looks clumsy. (Note that using the format macro for string concatenation is also nonoptimal!)

I think this is a case where convenience trumps performance concerns, and the extra Add impls should be added. A performance warning lint could be added to Clippy to let the user decide if they really want that extra bit of oomph.

5 Likes

You shouldn't use + for your string formatting/concatenation needs, ever. It was a bad idea to add it to the language in the first place. I guess the devs wanted to appeal to Python/Java/JS/C++ devs who used + for string concatenation. But by the time Rust released, it was no longer an idiomatic or recommended way even in those languages. Nowadays all of them support better APIs, something like Python's str.format() method, or f-strings (string interpolation), or std::format in C++.

If you need to concatenate strings, use the format! macro. Like this:

format!("{}{}{}", first, second, third)

format! is strictly more powerful, efficient and ergonomic than + operator. You don't need to bother with the String vs &String vs &str distinction, you can easily format non-string objects (if they implement Display or Debug), you can set formatting options such as pretty-printing or alignment. Your example would look something like this:

let pattern = if ext.len() == 1 {
    format!("\\.{}$", ext[0])
} else {
    format!("\\.({})$", ext.join("|"))
};

In the more rare case where you would already hold a String and want to reuse its heap allocation, you would use neither + nor format!, but rather methods like String::push_str.

4 Likes

Note that if you literally just need to concatenate existing strings, then go for

[first, second, third].concat()

(https://doc.rust-lang.org/std/primitive.slice.html#method.concat)

But absolutely use format! for more complicated or less regular cases where it's helpful.


Though, personally I don't think there's anything wrong with using Rust's String like you would a StringBuilder (Java Platform SE 7 ) or StringBuilder Class (System.Text) | Microsoft Learn and +=ing a bunch of stuff into it. Like a Vec, it's designed to support that well -- it's not like a Java String or C# string where repeated appending is an anti-pattern.

3 Likes

There is nothing inherently wrong with doing s += stuff, but neither is it better than s.push_str(stuff). It's slightly shorter, but it also encourages use of the + operator on strings, which is a bad idea.

Even for in-place appending, most of the time I would prefer to use the write! macro for the formatted buffer writing into a string. Like this:

write!(s, "{}, {}!", hell_0, w0r1d).ok();

The ok() at the end is, unfortunately , required. Otherwise you'll get a warning about unused Result, even though writing into a String is infallible. Alternatively, you can use the let _ = ... idiom.

3 Likes

I hope someday it will support reporting io::ErrorKind::OutOfMemory.

3 Likes

String isn't io::Write, it's fmt::Write, which deals in a zst fmt::Error. fmt::Write only supports writing UTF-8, where io::Write supports writing arbitrary bytes and deals in the richer io::Error. If fmt::Write reports a fmt::Error when writing to io::Write via write! and Display/Debug, it currently gets translated to (io::ErrorKind::Uncategorized, "formatter error").

It's not possible for fmt::Error to carry any information to distinguish between an OOM and some other error (e.g. a Display implementation failed to format an object), so any fmt::Write errors are almost certainly going to stay as io::ErrorKind::Uncategorized.

1 Like

Thanks. Now I understand why I get the compiler error.

Thanks. It is a bit clearer if I simplify the code.

fn main() {

    let mut pattern = String::from("z");
    pattern += "a" + "b";
    println!("{}", pattern);
}
4 |     pattern += "a" + "b";
  |                --- ^ --- &str
  |                |   |
  |                |   `+` cannot be used to concatenate two `&str` strings
  |                &str
  |
  = note: string concatenation requires an owned `String` on the left
help: create an owned `String` from a string reference
  |
4 |     pattern += "a".to_owned() + "b";
  |                   +++++++++

But there is still fun to be had if one follows the compiler recommendation.

fn main() {
    let mut pattern = String::from("z");
    pattern += "a".to_owned() + "b";
    println!("{}", pattern);
}

3 |     pattern += "a".to_owned() + "b";
  |                ^^^^^^^^^^^^^^^^^^^^
  |                |
  |                expected `&str`, found struct `String`
  |                help: consider borrowing here: `&("a".to_owned() + "b")`

Then if I fix that I end up going round in circles :).

OK This bit of code needs to just stay as:

     pattern = pattern + "(" + &ext.join("|") + ")";

I shall look into the other suggestions from the other replies to improve my code.

Thanks for all the replies.

BTW Clippy never told me to change the line which now gets an error to use += I just applied the recommendation to all my lines in that function. Silly me :slight_smile:

fn main() {
    let mut pattern = String::from("z");
    pattern = pattern + "a" + "b";
    println!("{}", pattern);
}

And I noticed Clippy does not recommend you change this to use += when there is more than one string slice.

3 Likes

Thank you. I like it. I don't appreciate the comment about unwanted allocations yet.

I've had a quick read of the docs for itertools and noticed a join() as well. Multiple ways to do things is always fun. Is one better than the other in any significant way?

vuse itertools::intersperse;
use itertools::join;

fn main() {

    let ext = vec!["cbl", "cpy", "rs", "py"];
    let mut pattern = String::from("\\.");

    if ext.len() == 1 {
        pattern = pattern + ext[0];
    } else {
        pattern += "(";
        // pattern.extend(intersperse(ext, "|"));
        pattern += &join(&ext, "|");
        pattern += ")";
    }

   pattern = pattern + "$";
   println!("{}", pattern);
}
        pattern += "(";
        pattern.push_str(&ext.join("|"));
        pattern += ")";

push() did not work out but push_str() looks good! :slight_smile: .

The itertools join is going to do more or less the same thing as the slice-based join. The slice-based one might not use the iterator trait as there's an inherent way to visit slice elements in order (but I didn't check). Both create a new String.


So here:

// new `String`            vvvvvvvvvvvvv
pattern = pattern + "(" + &ext.join("|") + ")";
// temp ref to `String`   ^^^^^^^^^^^^^^

You create a new String (storing the contents in the heap), only to more or less immediately append the contents into pattern's String. The String that join created then gets dropped (deallocated).

In this version:

pattern = format!("{pattern}({})", ext.join("|"))

ext.join creates a new String like before, and format! does too. You keep the version created by format!, discard the one created by ext.join, and also discard your original pattern.

In the version I suggested, you never create a new String. You just append a bunch of &str into your existing String.


Which is best? Well, it's based on your use case and habits. I instinctively gravitate towards the non-newly-allocating version and the notionally-always-allocating ones peeve me a little, due in large part to my particular programming history no doubt. But my version might have to grow the String (reallocate) too.

Additionally the compiler may be able to optimize away some of the supposed allocations in the examples above, and even if it doesn't, it just might not be a concern in your application. In that case, might as well go with whatever you find easiest to maintain (e.g. some in this thread clearly prefer format!).

So on the whole I wouldn't suggest going out of your way for any particular approach unless some sort of measurements indicate you need to make a change. Find a comfortable default and stick with that. If performance does become a concern, don't assume which is better, measure and see.

4 Likes

Thank you quinedot for your explanation. In my little app it is just code run at start up. But if it was in a loop or used more frequently I now see the benefit of using pattern.extend(intersperse(ext, "|")); in such a scenario.

To use this method also means a dependency on itertools. So I will keep things simpler for now. I've seen itertools been mentioned several times, so now I've read the docs I'm more aware and can check it out to see if it helps with something else I'm doing in the future.

There's the option of just writing format strings directly to the string: this generalizes well (ie. it always works), avoids most unnecessary allocations, and doesn't need any 3rd-party dependencies:

let mut pattern = String::with_capacity(64);

for ext in subpatterns {
    write!(pattern, "|{}", ext).unwrap();
}

You should probably prefer this approach because it's easy to remember, works in most cases, and has good performance (insofar as string comparison can have "good" performance – std's built-in formatting is certainly not the absolute most optimal way of writing things, but it's good enough in the overwhelming majority of programs).

2 Likes