`Arc<str>` is better than `String`? Was I using the wrong string type?

I recently watched https://www.youtube.com/watch?v=A4cKi7PTJSs and I keep thinking about it. It seems 75% of my String use should have been Arc<str> all along. I think I remember seeing people discuss this in the past, but I can't find it now, and life goes on: I just keep using String out of habit.

Am I missing something? Seems like if you're not going to be mutating, it's generally (not always, but generally) preferable to go with Arc<str>.
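To make the question concrete, here is a minimal sketch of what changes when a read-only string is cloned. The helper name `shares_allocation` is made up for illustration; everything else is plain standard library.

```rust
use std::sync::Arc;

/// Hypothetical helper: true when both handles point at the same heap allocation.
fn shares_allocation(a: &Arc<str>, b: &Arc<str>) -> bool {
    Arc::ptr_eq(a, b)
}

fn main() {
    // Cloning a String allocates a new buffer and copies the bytes.
    let owned = String::from("hello world");
    let owned_copy = owned.clone();
    assert_ne!(owned.as_ptr(), owned_copy.as_ptr());

    // Cloning an Arc<str> only bumps an atomic reference count.
    let shared: Arc<str> = Arc::from("hello world");
    let shared_copy = Arc::clone(&shared);
    assert!(shares_allocation(&shared, &shared_copy));
    assert_eq!(Arc::strong_count(&shared), 2);

    // Both deref to &str, so read-only code does not care which one it gets.
    assert_eq!(&*shared, owned.as_str());
}
```

Whether the cheaper clone pays for the atomic operations depends on the workload, so this shows the mechanics only, not a performance claim.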

Would it be a good idea to have Arc<str> aliased/newtyped in the stdlib? It seems like the biggest problem with Arc<str> is that it is not well known, doesn't have a semantic name, and is harder to type.

For the most part, it doesn't matter. String is a well-supported, easy to use type without the shenanigans of multiple ownership or needless atomicity. It's fine to just use String as the default owned string type.

The performance difference between String and Box<str> is tiny enough that it doesn't need more discoverability. Arc<str> is the counterpart to Arc<String> and should almost always replace it, except maybe if you need to unwrap the Arc<String> back into a String.

The standard library has methods for converting between String and Box<str> which is probably enough.
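For reference, a small sketch of those conversions. The function names `freeze` and `thaw` are made up for illustration; `String::into_boxed_str` and `str::into_string` are the standard library methods.

```rust
/// Hypothetical helper: turn a growable String into a fixed-size Box<str>.
fn freeze(s: String) -> Box<str> {
    // Drops any spare capacity; this may reallocate if capacity > len.
    s.into_boxed_str()
}

/// Hypothetical helper: turn a Box<str> back into a growable String.
fn thaw(b: Box<str>) -> String {
    // Takes over the existing allocation without copying the bytes.
    b.into_string()
}

fn main() {
    let mut s = String::with_capacity(64);
    s.push_str("config-key");

    let frozen = freeze(s);
    assert_eq!(&*frozen, "config-key");
    // Box<str> is a (pointer, length) pair; String also stores a capacity.
    assert!(std::mem::size_of::<Box<str>>() < std::mem::size_of::<String>());

    let mut thawed = thaw(frozen);
    thawed.push_str("-v2");
    assert_eq!(thawed, "config-key-v2");
}
```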

Nope. Without benchmarks it's really hard to say whether Arc<str>, String or compact_str is better.

Naïvely it feels as if Arc<str> should be able to easily beat the other alternatives. And that was even true. Many, many, many years ago. When CPUs had one core and the LOCK prefix was cheap.

But I know how Google jumped from gcc 2.95 to gcc 4.2 specifically to avoid Arc<str>. Well… what they wanted to avoid, actually, was the std::string provided by gcc 3 (and they switched to what became the gcc 5 std::string, which was called __gnu_cxx::__versa_string before gcc 5), but the fact remains: they saved quite a few million by using strings with a cheap small-string optimization instead of expensive atomic refcounting.

Small-string-optimized strings win if you have lots of tiny strings which you copy around often, and Arc<str> wins if your strings are long, while String is neither here nor there and makes your code a bit smaller instead.

Strings are hard; there is no silver bullet.

Have you considered using Cow<'a, str>? It's the more lenient option if you prefer borrows but may occasionally need to mutate the value as a String. It's also still Send + Sync, if that is what matters in your use of Arc.
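A minimal sketch of that pattern, with a made-up `normalize` function that only allocates when it actually has to change the input:

```rust
use std::borrow::Cow;

/// Hypothetical example: replaces tabs with spaces, allocating only when
/// the input contains a tab.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', " "))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // No tab: the result just borrows the input.
    let clean = normalize("no tabs here");
    assert!(matches!(clean, Cow::Borrowed(_)));

    // A tab forces an owned String.
    let mut fixed = normalize("a\tb");
    assert!(matches!(fixed, Cow::Owned(_)));

    // to_mut() hands out a &mut String, cloning first only if still borrowed.
    fixed.to_mut().push('!');
    assert_eq!(fixed, "a b!");
}
```

Note that, unlike Arc<str>, a borrowed Cow is tied to the lifetime of the data it points at, so it suits short-lived processing better than long-lived shared storage.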

After I got some responses I realized there are a lot of tradeoffs to consider.

One thing is small-string optimization: it's probably always good to have, but it requires bringing in a custom implementation.
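To illustrate the idea only, here is a deliberately naive sketch. The `SmallStr` type and the 22-byte inline capacity are assumptions made up for this example; real crates such as compact_str use much more compact layouts.

```rust
/// Assumed inline capacity for this sketch; real implementations differ.
const INLINE_CAP: usize = 22;

/// Hypothetical small-string type: short strings live inline,
/// longer ones fall back to a heap allocation.
enum SmallStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<str>),
}

impl SmallStr {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.into())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str, so this cannot fail.
            SmallStr::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline buffer holds valid UTF-8"),
            SmallStr::Heap(b) => &**b,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}

fn main() {
    let short = SmallStr::new("id-42");
    assert!(short.is_inline()); // no heap allocation for this one
    assert_eq!(short.as_str(), "id-42");

    let long = SmallStr::new("a string that is clearly longer than the inline buffer");
    assert!(!long.is_inline());
    assert!(long.as_str().starts_with("a string"));
}
```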

Another aspect is memory use. Arc<str> shares a single copy of the string data, which might be worthwhile if you have a lot of copies that you keep around for a longish time, especially because of cache use.

Another is the cost of cloning and dropping: an atomic counter versus an extra allocation. My bet would be that Arc<str> will win anyway, but it's just a guess.

Performance-wise, I would think that a custom type with small-string optimization is best, Arc<str> is second, and String is last.

You can't make an Arc<str> from a Display (format!) without an extra copy, because Arc stores its two counters in the same allocation as the data, and probably not without an extra allocation either, unless you're careful to format into an SSO or stack-based string type first.
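A small sketch of the straightforward route, with a made-up `label` function; the comments mark where the two allocations and the copy happen.

```rust
use std::sync::Arc;

/// Hypothetical example: build a formatted, shareable label.
fn label(id: u32) -> Arc<str> {
    // format! builds a String first (allocation #1).
    let tmp = format!("item-{id}");
    // Arc::from allocates a new block that holds the counters plus the
    // bytes (allocation #2), copies the text into it, then frees the String.
    Arc::from(tmp)
}

fn main() {
    let a = label(7);
    let b = Arc::clone(&a); // from here on, clones are just refcount bumps
    assert_eq!(&*a, "item-7");
    assert!(Arc::ptr_eq(&a, &b));
}
```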


I'm trying to use Box<str> wherever I don't need to grow the string (e.g., Error fields), but it's a PITA to use, since Rust lacks things like PartialEq or .as_str() for &Box<str>, so it ends up with ugly syntax like &**s == "wtf".
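A sketch of what that looks like in practice, using a made-up `ParseError` type. One way to keep the noise contained is a small accessor that returns &str, so the double deref is written only once.

```rust
/// Hypothetical error type whose message never grows after construction.
struct ParseError {
    message: Box<str>,
}

impl ParseError {
    /// Accessor that hides the Box: deref coercion turns &Box<str> into &str.
    fn message(&self) -> &str {
        &self.message
    }
}

fn is_eof(err: &ParseError) -> bool {
    let s: &Box<str> = &err.message;
    // &Box<str> -> Box<str> -> str, then re-borrowed as &str for the comparison.
    &**s == "unexpected end of input"
}

fn main() {
    let err = ParseError { message: "unexpected end of input".into() };
    assert!(is_eof(&err));
    // With the accessor, call sites compare plain &str values.
    assert_eq!(err.message(), "unexpected end of input");
}
```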

A string type with small-string optimization is neat if you need to return small formatted strings.

But if you need to use a lot of strings, and copy them around, you probably want string interning. 4-byte Copy "strings" beat any other string type. There's also a special magic ustr for interning global identifiers and hashmap keys.
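A minimal sketch of the interning idea, not how any particular crate does it. `Symbol` and `Interner` are names made up for this example, and for simplicity each string is stored twice, which a real interner would avoid.

```rust
use std::collections::HashMap;

/// Hypothetical 4-byte, Copy handle to an interned string.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Symbol(u32);

/// Hypothetical interner: maps each distinct string to one Symbol.
#[derive(Default)]
struct Interner {
    lookup: HashMap<Box<str>, Symbol>,
    strings: Vec<Box<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.lookup.get(s) {
            return sym; // already known: no allocation at all
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.into());
        self.lookup.insert(s.into(), sym);
        sym
    }

    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("user_id");
    let b = interner.intern("user_id");
    let c = interner.intern("session");

    assert_eq!(a, b); // same text, same symbol; comparison is an integer compare
    assert_ne!(a, c);
    assert_eq!(interner.resolve(c), "session");
    assert_eq!(std::mem::size_of::<Symbol>(), 4);
}
```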

I have tried to use Box<[T]> in place of Vec<T> in similar contexts (when I don't need to change the length of the collection) and found myself hampered by the lack of IntoIterator<Item = T>.

BTW, Vec::from(box).into_iter() is zero-cost.
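A short sketch of that workaround, with a made-up `longest` function that needs to take ownership of the elements:

```rust
/// Hypothetical example: consume a boxed slice and return its longest element.
fn longest(names: Box<[String]>) -> Option<String> {
    // Vec::from(Box<[T]>) takes over the allocation; into_iter() then
    // yields owned Strings rather than references.
    Vec::from(names).into_iter().max_by_key(|s| s.len())
}

fn main() {
    let names: Box<[String]> =
        vec!["a".to_string(), "abc".to_string(), "ab".to_string()].into_boxed_slice();
    assert_eq!(longest(names), Some("abc".to_string()));
}
```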

And Box::leak(string.into_boxed_str()) gives you a cheap, Copy, ready-to-use &'static str!

Just don't use it in a loop...
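Spelled out as a sketch, with a made-up `leak_name` helper. Note that Box::leak is an associated function, so it is called as Box::leak(b) rather than as a method on the box. The memory is never freed, which is why calling this repeatedly grows the process without bound.

```rust
/// Hypothetical helper: trade a String for a &'static str by leaking it.
fn leak_name(s: String) -> &'static str {
    // The allocation is intentionally never freed.
    Box::leak(s.into_boxed_str())
}

fn main() {
    let name: &'static str = leak_name(format!("worker-{}", 3));
    let copy = name; // &'static str is Copy: this copies a pointer and a length
    assert_eq!(name, "worker-3");
    assert_eq!(copy, name);
}
```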

Just FYI, the video creator chose to use Arc<[T]> throughout the video because it always works, but specifically mentions that Rc<[T]> should be used whenever no sending between threads is needed.

