String concatenation best practices/performance?

Hello everyone
This is an often-asked question, if Google search is to be believed...

I have done some research on performance best practices in Rust for workloads that involve a lot of String/str concatenation.

Some of the material I found appears to be relatively old, in Rust age terms, 3-5+ years ago.
Perhaps it is still valid?

Some of what I found looks interesting: here is one example

One thing is for sure: there are many ways of handling String concatenation in Rust, which means there are many chances to write relatively poorly performing code that uses a lot of memory.

So, with Rust 1.50+ in mind, can anyone offer some wisdom on how to write Rust code that builds and concatenates a lot of String values, with a focus on performance and memory consumption?

A couple of use-cases I have are (1) processing very large text files - ASCII and UTF-8 and (2) building a lot of Strings to pass as parameters to calls to HTTPS/REST-API crates.
I am sure there are others too.

If I have missed a good resource to help me with this topic, please link as a reply here.

Many thanks for your wisdom

Meh. String concatenation ain't gonna be a problem if you are calling an HTTP API. The network round-trip will dominate the string building by a couple orders of magnitude.

The one thing people get wrong all the time in other languages is repeatedly adding strings using the + concatenation operator. This results in quadratic running time. Rust avoids this because + takes ownership of the LHS and appends into its existing buffer, reallocating only when the buffer runs out of capacity, so you can't accidentally achieve quadratic runtime unless you try really-really hard.
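To make the point about + concrete, here is a minimal sketch (the function name concat_with_plus is made up for illustration). It appends to an owned String with + in a loop and counts how often the capacity changes. The exact growth strategy is an implementation detail of the standard library, so the count is only expected to be small relative to the number of appends, not any specific value.

```rust
/// Appends `piece` to an owned String `n` times using `+`.
/// Returns the final string and the number of times its capacity changed,
/// which is a rough proxy for the number of reallocations.
fn concat_with_plus(piece: &str, n: usize) -> (String, usize) {
    let mut s = String::new();
    let mut capacity_changes = 0;
    let mut last_capacity = s.capacity();
    for _ in 0..n {
        // `String + &str` consumes `s`, appends into its buffer, and
        // hands the same String back, so no fresh copy is made per step.
        s = s + piece;
        if s.capacity() != last_capacity {
            capacity_changes += 1;
            last_capacity = s.capacity();
        }
    }
    (s, capacity_changes)
}
```

Calling concat_with_plus("x", 100_000) should report far fewer capacity changes than appends, which is why the loop stays roughly linear overall.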

So what remains is at most a constant factor of about 5 (29 vs 164 ns for the fastest and slowest method, if I exclude cheating by transmutation). That's probably important if you are re-writing grep, and likely completely unimportant in web development.

By the way, there aren't many great surprises or deep insights in that benchmark. The results are pretty much what I would have expected:

  • The fully general formatting mechanism is slowest, along with the "unnecessarily allocate a vector and unnecessarily make owned copies" technique.
  • Providing a pre-allocated buffer and pushing onto it is fastest.
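For readers who have not seen the techniques named in the two bullets, the following is a rough sketch of what they look like for two inputs. This is not the benchmark's code; the function names are hypothetical, and no timings are claimed here.

```rust
/// General formatting machinery: flexible, but does the most work.
fn with_format(a: &str, b: &str) -> String {
    format!("{}{}", a, b)
}

/// Allocates a Vec and an owned copy of every piece before joining,
/// neither of which is needed just to concatenate.
fn with_owned_vec(a: &str, b: &str) -> String {
    let parts: Vec<String> = vec![a.to_string(), b.to_string()];
    parts.concat()
}

/// Pre-allocates one buffer of the right size and pushes onto it.
fn with_buffer(a: &str, b: &str) -> String {
    let mut out = String::with_capacity(a.len() + b.len());
    out.push_str(a);
    out.push_str(b);
    out
}
```

All three return the same String; they differ only in how many allocations and copies happen along the way.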

thanks,

I am ruling out the HTTP API as a suspect for now and focusing on the text files.

If I have a text-file based workload (processing a lot of very large text files) and, say, I need to do a lot of String concatenation as part of it, do you suggest I simply use the "+" operator to concatenate multiple strings, or am I better off using push_str() repeatedly?

I am happy to write a prototype and benchmark it but I am curious what you think and if I have misunderstood your reply.

But that benchmark doesn't cover either of your use cases. It only tests concatenating a few small strings, which would be the least performance-critical case in this domain.

Tip: sizes in the range 10^6 to 10^9 (or 2^20 to 2^30) are considered large enough in many cases.
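A quick way to try sizes like these is a small timing helper. The sketch below is only a starting point, not a rigorous benchmark: time_push_str is a made-up name, it measures a single run with std::time::Instant, and it assumes a toolchain recent enough to have std::hint::black_box (used to keep the optimizer from discarding the work). A proper harness such as the criterion crate would give more trustworthy numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Builds a string from `n` copies of `piece` with push_str and returns
/// the elapsed wall-clock time together with the final length in bytes.
fn time_push_str(piece: &str, n: usize) -> (Duration, usize) {
    let start = Instant::now();
    let mut out = String::new();
    for _ in 0..n {
        out.push_str(black_box(piece));
    }
    let elapsed = start.elapsed();
    // black_box keeps the finished string observable to the optimizer.
    (elapsed, black_box(out).len())
}
```

Running it with n = 1 << 20 and upwards, in a release build, gives a first impression of how a method scales before investing in a full benchmark.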


If you know the total number of bytes your final string will take up, then use String::with_capacity() and push_str(). If you don't know the final size beforehand, and you can't guesstimate at least its order of magnitude, then the above is likely equivalent to repeatedly using +.
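As a minimal sketch of the known-size case (concat_exact is a hypothetical helper, and it assumes the pieces are already available as a slice): sum the byte lengths first, allocate once, then push each piece.

```rust
/// Concatenates `parts` into one String with a single up-front allocation.
fn concat_exact(parts: &[&str]) -> String {
    // len() is the length in bytes, which is what capacity is measured in.
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = String::with_capacity(total);
    let reserved = out.capacity();
    for p in parts {
        out.push_str(p);
    }
    // The buffer was large enough from the start, so it never had to grow.
    debug_assert_eq!(out.capacity(), reserved);
    out
}
```

If only an estimate is available, passing that estimate to String::with_capacity() (or calling reserve() later) still cuts down on reallocations; an underestimate just means the buffer grows as usual.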

Some people will frown upon using + at all, since string concatenation is conceptually different from addition, and using + has historically resulted in bugs in other (primarily dynamically-typed) languages.

But then again, the best thing you can do is try both ways, and see if there is any difference.

