I've recently seen https://www.youtube.com/watch?v=A4cKi7PTJSs and I keep thinking about it. It seems 75% of my String use should have been Arc<str> all along. I think I remember seeing people discuss this in the past, but I can't find it now, and life goes on, so I just keep using String out of habit.
Am I missing something? Seems like if you're not going to be mutating, it's generally (not always, but generally) preferable to go with Arc<str>.
Would it be a good idea to have Arc<str> aliased/newtyped in the stdlib? It seems like the biggest problem with Arc<str> is that it is not well known, doesn't have a semantic name, and is harder to type.
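To make the question concrete, here is a minimal sketch of what such an alias could look like and what a clone actually does. The name SharedStr and the helper share are made up for illustration; nothing like them exists in std.

```rust
use std::sync::Arc;

/// Hypothetical alias; the name `SharedStr` is an assumption, not a std type.
type SharedStr = Arc<str>;

/// Builds a shared string and hands out a second handle to it.
fn share(text: &str) -> (SharedStr, SharedStr) {
    // `Arc<str>: From<&str>` copies the bytes once into a new allocation.
    let first: SharedStr = Arc::from(text);
    // Cloning only bumps the atomic reference count; the bytes are not copied.
    let second = Arc::clone(&first);
    (first, second)
}
```

Both handles point at the same allocation, which is the whole appeal when the string is never mutated.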
For the most part, it doesn't matter. String is a well-supported, easy to use type without the shenanigans of multiple ownership or needless atomicity. It's fine to just use String as the default owned string type.
The performance difference between String and Box<str> is tiny enough that Box<str> doesn't need more discoverability. Arc<str> is the counterpart to Arc<String>, and it should pretty much always replace Arc<String>, except maybe if you plan to unwrap the Arc<String> back into a String.
The standard library has methods for converting between String and Box<str> which is probably enough.
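For reference, a small sketch of those conversions. The function names freeze and thaw are just illustrative wrappers around the std methods.

```rust
/// Freezes a `String` into a `Box<str>`, dropping any spare capacity
/// (which may shrink or move the allocation).
fn freeze(s: String) -> Box<str> {
    s.into_boxed_str()
}

/// Thaws a `Box<str>` back into a growable `String` without copying.
fn thaw(b: Box<str>) -> String {
    b.into_string()
}
```

String::from(boxed) does the same as into_string, if you prefer the trait form.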
Nope. Without benchmarks it's really hard to say whether Arc<str>, String or compact_str is better.
Naïvely it feels as if Arc<str> should be able to easily beat other alternatives. And that was even true. Many, many, many years ago. When CPUs had one core and the LOCK prefix was cheap.
But I know how Google jumped from gcc 2.95 to gcc 4.2 specifically to avoid Arc<str>. Well… what they wanted to avoid, actually, was the std::string provided by gcc 3 (and they switched to the gcc 5 std::string, which was called __gnu_cxx::__versa_string before gcc 5), but the fact remains: they saved quite a few million by using strings which used cheap small-string-optimization instead of expensive atomic refcounting.
Small-string-optimized strings win if you have lots of tiny strings which you copy around often, and Arc<str> wins if your strings are long, while String is neither here nor there and makes your code a bit smaller instead.
Have you considered using Cow<'a, str>? It's the more lenient option if you prefer borrows but may need to mutate it as a String. It's also still Send + Sync, if that is what matters for your use of Arc.
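A rough sketch of the usual Cow pattern, assuming a made-up helper expand_tabs: borrow when nothing has to change, allocate a String only when it does.

```rust
use std::borrow::Cow;

/// Replaces tabs with four spaces, allocating only when the input contains a tab.
/// (`expand_tabs` is a hypothetical helper, used here purely for illustration.)
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Needs a modified copy, so this branch owns a fresh `String`.
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // Nothing to change: hand the original borrow back, no allocation.
        Cow::Borrowed(input)
    }
}
```

The catch compared to Arc<str> is the lifetime parameter, which tends to spread through every type that stores the value.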
After I got some responses I realized there are a lot of tradeoffs to consider.
One thing is small string optimization - it's probably always good to have, but it requires bringing in a custom implementation.
Another aspect is memory use. Arc<str> will re-use the string, which, if you have a lot of copies that you keep for a longish time, might be worthwhile, especially due to cache use.
Another is the cost of cloning and dropping. Atomic counter vs extra allocation. My bet would be that Arc<str> will win anyway, but it's just a guess.
Performance-wise I would think that "custom with small-string opt." is best, Arc<str> is going to be second, and String last.
You can't make an Arc<str> from a Display (format!) without an extra copy, because the Arc stores its two counters in the same allocation as the data, and probably not without an extra allocation either, unless you're careful to format into an SSO or stack-based string type first.
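Roughly what that looks like in practice. The function label is a made-up example; the conversion itself is the std From<String> impl for Arc<str>.

```rust
use std::sync::Arc;

/// Formats into a temporary `String`, then copies the bytes into the `Arc`.
fn label(id: u32) -> Arc<str> {
    // First allocation: the `String` produced by `format!`.
    let tmp = format!("item-{id}");
    // Second allocation plus a copy: the `Arc` keeps its two counters in the
    // same allocation as the bytes, so it cannot take over the `String`'s buffer.
    Arc::from(tmp)
}
```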
I'm trying to use Box<str> wherever I don't need to grow the string (e.g. Error fields), but it's a PITA to use, since Rust lacks things like PartialEq<&str> or .as_str() for &Box<str>, so it ends up with ugly syntax like &**s == "wtf".
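A sketch of how I'd paper over it, assuming a hypothetical ParseError type: hide the deref noise behind small methods so call sites stay readable.

```rust
/// Hypothetical error type with an immutable message field.
struct ParseError {
    message: Box<str>,
}

impl ParseError {
    fn new(message: &str) -> Self {
        // `Box<str>: From<&str>` copies the text into an exactly sized allocation.
        Self { message: message.into() }
    }

    /// Hand-written stand-in for the missing `.as_str()`.
    fn message(&self) -> &str {
        // Deref coercion turns `&Box<str>` into `&str` here.
        &self.message
    }

    fn is_eof(&self) -> bool {
        // Deref the box down to `str` and re-borrow so both sides are `&str`.
        &*self.message == "unexpected end of input"
    }
}
```

With a &Box<str> in hand (as in the post above) you need one more star, hence &**s.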
String type with small-string-optimization is neat if you need to return small formatted strings.
But if you need to use a lot of strings, and copy them around, you probably want string interning. 4-byte Copy "strings" beat any other string type. There's also a special magic ustr for interning global identifiers and hashmap keys.
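A minimal sketch of the interning idea, not how ustr or any particular crate does it. Symbol and Interner are names made up here, and this naive version keeps two copies of each string (one as map key, one for lookup), which real interners avoid.

```rust
use std::collections::HashMap;

/// A 4-byte, `Copy` handle standing in for an interned string.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

/// Naive interner: maps each distinct string to a small integer and back.
#[derive(Default)]
struct Interner {
    ids: HashMap<Box<str>, Symbol>,
    strings: Vec<Box<str>>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or stores `s` and mints a new one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        // The index into `strings` doubles as the symbol value.
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.into());
        self.ids.insert(s.into(), sym);
        sym
    }

    /// Looks the text back up from its symbol.
    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Comparing, hashing and copying a Symbol never touches the string bytes; you only pay for them in intern and resolve.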
I have tried to use Box<[T]> in place of Vec<T> in similar contexts (when I don't need to change the length of the collection) and found myself hampered by the lack of IntoIterator<Item = T>.
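The workaround I know of is to go through Vec, sketched below with a made-up helper longest. I believe newer toolchains (1.80 and later) implement by-value IntoIterator for Box<[T]> directly, but this version should work either way.

```rust
/// Consumes a boxed slice and moves the longest string out of it.
fn longest(items: Box<[String]>) -> Option<String> {
    items
        // `into_vec` converts to a `Vec<String>` without copying the elements.
        .into_vec()
        // `Vec<T>` iterates by value, so each `String` is moved, not cloned.
        .into_iter()
        .max_by_key(|s| s.len())
}
```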
Just FYI, the video creator chose to use Arc<[T]> throughout the video because it always works, but specifically mentions that Rc<[T]> should be used whenever no sending between threads is needed.
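A small sketch of that distinction using str instead of [T]; both function names are made up for illustration.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Single-threaded sharing: `Rc` uses plain, non-atomic counters.
/// Returns the strong count while both handles are alive.
fn share_locally(text: &str) -> usize {
    let a: Rc<str> = Rc::from(text);
    let b = Rc::clone(&a);
    Rc::strong_count(&b)
}

/// Cross-thread sharing: `Arc<str>` is `Send`, so a clone can move into `spawn`.
/// Swapping in `Rc<str>` here would be rejected by the compiler.
fn len_on_other_thread(text: &str) -> usize {
    let shared: Arc<str> = Arc::from(text);
    let for_thread = Arc::clone(&shared);
    thread::spawn(move || for_thread.len()).join().unwrap()
}
```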