Most convenient type for immutable references to strings

Hi,

Rust has a well-developed type system for working with different types of strings.

However, when building a new Rust library from scratch that works intensively with references to immutable strings in a thread-safe way, which type do Rust conventions suggest as the most convenient?

From my research, I've found a couple of alternatives:

Arc<str>
Arc<String>
Cow<String>
Cow<str>

Cow is for when you want to mix references to pre-existing strings with new heap-allocated strings, e.g.

if random {
    return "literal";
} else {
    return format!("{}", value);
}

This doesn't work without Cow, because literals and heap-allocated strings have different types. It's not useful otherwise, because a Cow holding a reference is as limited as &str, and a Cow holding a String is as fat as String, so it's the worst of both &str and String.
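To make the Cow case concrete, here is a minimal sketch of a function that returns either a literal or a freshly formatted string. The function name `describe` and its branch condition are made up for illustration; only `std::borrow::Cow` comes from the standard library.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns a borrowed literal or an owned, formatted string.
fn describe(value: u32) -> Cow<'static, str> {
    if value == 0 {
        // No allocation: the Cow only borrows the literal.
        Cow::Borrowed("zero")
    } else {
        // Allocation: the Cow owns a freshly formatted String.
        Cow::Owned(format!("value {}", value))
    }
}

fn main() {
    let a = describe(0);
    let b = describe(7);
    // Both variants deref to &str, so callers can treat them uniformly.
    println!("{} / {}", a, b);
}
```

Callers that only read the result never need to know which variant they got; that is about the only thing Cow buys you here.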

I'd recommend Arc<str>, because it can be cloned without copying the string.
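As a rough illustration of why Arc<str> fits the thread-safe requirement, the sketch below hands clones of one string to several threads. The helper name `lengths_from_threads` is an assumption for the example; each clone only bumps a reference count and points at the same allocation.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: every thread gets its own Arc<str> handle
/// and reports the length of the shared string.
fn lengths_from_threads(text: &Arc<str>, threads: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Cloning copies the (pointer, length) pair, not the string data.
            let local = Arc::clone(text);
            thread::spawn(move || local.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let text: Arc<str> = Arc::from("shared string");
    let copy = Arc::clone(&text);
    // Both handles refer to the same heap allocation.
    assert!(Arc::ptr_eq(&text, &copy));
    println!("{:?}", lengths_from_threads(&text, 4));
}
```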

There are also plenty of string interning libraries on crates.io. Use them if you deal with lots of short, repeated strings (e.g. identifiers, property names).
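To show the idea behind interning (not how any particular crate does it), here is a deliberately naive sketch built only on the standard library. The `Interner` type and its `intern` method are hypothetical; real interning crates are more sophisticated and usually faster.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical minimal interner: equal strings share one allocation.
#[derive(Default)]
struct Interner {
    seen: HashSet<Arc<str>>,
}

impl Interner {
    /// Returns a shared handle, allocating only the first time `text` is seen.
    fn intern(&mut self, text: &str) -> Arc<str> {
        if let Some(existing) = self.seen.get(text) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(text);
        self.seen.insert(Arc::clone(&fresh));
        fresh
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("property_name");
    let b = interner.intern("property_name");
    // The second call reuses the first allocation.
    assert!(Arc::ptr_eq(&a, &b));
    println!("{} is stored once", a);
}
```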

Arc<str> is like a (pointer, length) tuple. Arc<String> is like a pointer to a (pointer, length, capacity) tuple. So if you have an Arc<String> you have to dereference it twice to get to the actual str data. Arc<str> only needs to be dereferenced once, so Arc<str> will generally be better for performance.

However, an Arc<str> handle is actually larger, because the string's length is stored in the reference itself. If you need many references to the same string, you'll be spending 2 usizes for each Arc<str> as opposed to just one usize for each Arc<String>. I'd normally expect this to be better than having to double-dereference, but there will be use cases where you want the smaller reference.
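The handle sizes can be checked with `std::mem::size_of`. This is a small sketch that reflects current rustc layouts rather than a documented guarantee; the helper name `handle_sizes_in_words` is made up for the example.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical helper: sizes of the two handle types, measured in usizes.
fn handle_sizes_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Arc<str>>() / word,    // fat pointer: data pointer + length
        size_of::<Arc<String>>() / word, // thin pointer to a heap-allocated String
    )
}

fn main() {
    let (arc_str, arc_string) = handle_sizes_in_words();
    println!("Arc<str>: {} words, Arc<String>: {} words", arc_str, arc_string);
    // The String behind an Arc<String> carries pointer, length and capacity.
    println!("String: {} bytes", size_of::<String>());
}
```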

Even cheaper than Arc<str> is just &str. If you don't need to manage the lifetimes of the strings yourself, just don't. This is the most general way to make an API that handles different types of strings and has the additional advantage that you can take cheap substrings of the same type. The drawback is that you can't pull one out of thin air; references have to come from somewhere. Whether and how you can deal with this will depend on what kind of library you're writing.
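A quick sketch of what taking &str in an API looks like in practice: one function accepts literals, Strings and Arc<str> alike, and the substring it returns borrows from the input without allocating. The function `first_word` is a made-up example.

```rust
use std::sync::Arc;

/// Hypothetical helper: returns the first whitespace-delimited word of `text`.
/// The result is a substring that borrows from the input.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let literal: &'static str = "hello world";
    let owned: String = String::from("owned string");
    let shared: Arc<str> = Arc::from("shared string");

    println!("{}", first_word(literal));
    println!("{}", first_word(&owned)); // &String coerces to &str
    println!("{}", first_word(&shared)); // &Arc<str> coerces to &str
}
```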

&'static str is super convenient because it's a &str that lasts forever, making it truly immutable. (Arc<str> is safe to mutate if you have the only reference.) You can safely leak a Box<str> to turn it into a &'static str. This is certainly not something you would want to do all the time, but it's an alternative to string interning when you don't care whether the interned strings will ever be deallocated (before program termination).
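Both points can be sketched with standard library calls: `Box::leak` turns a Box<str> into a &'static str, and `Arc::get_mut` only hands out a mutable reference while the Arc is unshared. The helper name `leak_str` is an assumption for the example.

```rust
use std::sync::Arc;

/// Hypothetical helper: leaks a String so it lives until program exit.
/// The memory is never freed, so use this sparingly.
fn leak_str(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn main() {
    let forever: &'static str = leak_str(format!("config-{}", 42));
    println!("{}", forever);

    let mut unique: Arc<str> = Arc::from("abc");
    // We hold the only handle, so mutation is allowed.
    if let Some(s) = Arc::get_mut(&mut unique) {
        s.make_ascii_uppercase();
    }
    let other = Arc::clone(&unique);
    // A second handle exists now, so mutation is refused.
    assert!(Arc::get_mut(&mut unique).is_none());
    println!("{} {}", unique, other);
}
```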

I'd have to say the most convenient type for immutable references to immutable strings is:

internment::Intern<String>

using my internment crate. :) It gives you fast comparisons for equality and fast hashing (both work on the pointer without dereferencing the string itself). An Intern is Copy as well as Clone, Sync and Send, so you can treat it like you would a usize or something. It also leaks memory (which is how it gets all the above), so it's not so great if you have a whole lot of different strings you need to work with.

That said, it's not in the standard library, which makes it less convenient...