What's the difference between &String and &str?

While we’re on the topic, I always hear the ‘resizable/heap vs fixed-size/stack’ explanations (and that’s helpful), but I’m more curious as to why Rust didn’t just go with, say, String and give the implementation a small-string optimization to avoid unnecessary heap allocations for the typical use cases?

I have my own theory, but I’m wondering if anyone knows the real reason(s)?

See here https://internals.rust-lang.org/t/small-string-optimization-remove-as-mut-vec/1320


I think of String as a specific allocation and grow policy for string data. Slice / owner separation is very powerful that way, though of course what we enjoy as power makes a hill that learners have to climb.


@bluss, yes, it’s the slice / owner separation pattern that the Programming Rust book I’m reading called out in &[T]/Vec, Path/PathBuf and one other place that escapes me at the moment. Anyway, that enabled me to see S/OS as a pattern employed throughout Rust, not just as a one-off (er, two-off) thing for &str/String and &[T]/Vec.

You’ve captured both the motivation and the cost very succinctly–that’s helpful, thank you!

This is helpful. What I am confused about is what kind of struct str (not &str) should be. Thanks!

what kind of struct should str (not &str) be

It would be a dynamically sized type looking something like this:

struct str {
  contents: [u8],
}

Notice how the contents contains a slice directly, not a reference to a slice. The nomicon explains this pretty well if you want to find out more.

Due to their lack of a statically known size, these types can only exist behind some kind of pointer. Any pointer to a DST consequently becomes a fat pointer consisting of the pointer and the information that “completes” them (more on this below).

It would be just the raw bytes of the string.

But because the length is held in the reference, not in str itself, such a type is mostly an unusable abstract concept in Rust.
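To make the "length is held in the reference" point concrete, here is a small sketch. The function names `reference_sizes` and `fat_pointer_parts` are made up for this example; only the `std` calls inside them are real.

```rust
use std::mem::size_of;

/// Returns (size of a thin reference, size of a `&str`) in bytes.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&str>())
}

/// Splits a `&str` into the two pieces of information the reference carries:
/// the address of the first byte and the length in bytes.
fn fat_pointer_parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

On current compilers `reference_sizes()` should report the `&str` as twice the size of the thin reference, since it stores both the pointer and the length, while the `str` itself is only the bytes.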


I’ve got a question related to this one: Why does a String slice have a special type &str, whereas a slice of a vec of integers doesn’t? (There the slice type is &[i32]).

What are you looking for? Why doesn’t &[T] fit the bill?

[T] is as special as you can get. [T] and str are both primitive, unsized types you can’t construct. I mean, str is basically just [u8] with a UTF-8 invariant.

And one time, long, long ago, Vec<T> and String were known as ~[T] and ~str.

If you think of str and [] as the fundamental types, their owned forms are both different than these, namely String and Vec, respectively. This seems consistent, unless I am misunderstanding your question?

Because [u8] doesn’t have to be valid UTF-8, whereas str does.
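A quick sketch of that difference, using the standard `std::str::from_utf8` check (the wrapper name `bytes_to_str` is just for illustration):

```rust
/// Tries to view raw bytes as a `&str`.
/// Any byte sequence is a valid `[u8]`, but only valid UTF-8 can become a `str`.
fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

Calling it with `b"hello"` gives back `Some("hello")`, while something like `&[0xff, 0xfe]` gives `None`, because those bytes are a perfectly fine `[u8]` but not valid UTF-8.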

Thanks for your responses. I should’ve clarified further. Here’s more context:

When I write:

let my_string = String::from("Hello");
let my_sliced_string = my_string[0..3];

the type for my_sliced_string is &str (and my_string is String).

However, when I write:

let my_integers: [u32] = vec!(1,2,3,4,5);
let my_sliced_integers = my_integers[0..3];

the type for my_sliced_integers is &[u32] (and my_integers is [u32]).

Similarly if I slice any Vec of type [T], the slice’s type is &[T]. So why does String get the special treatment? Why does a String's slice get its own alias &str?


I read through some of the discussion on the ‘remove as_mut_vec’ thread that Steve linked to but I don’t see any suggestion in that thread that &str exists because Rust needs string literals and somehow implements Short String Optimisations?

The best answer I have to my question–as Daniel pointed out–is slicing a String needs to result in a valid ‘sub-string’. Examples in the Rust book show that slicing a string can result in a run-time panic.
Whereas slicing an array/Vec of elements of type T (integers in my example) won’t ever panic (ignoring out-of-bounds indices).

So the special slice type for String makes sense since Rust is trying to distinguish between the two types and convey safety?

None of that is true. Aside from the very first line, none of that code compiles.

First of all, [T] isn’t a Vec, it’s a [T]. The type of vec!(1,2,3,4,5) is Vec<i32>. You can’t have a value of type [T]; it always has to be behind an indirection of some kind.
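For reference, here is roughly what the two snippets from the question would look like in a form that does compile. This is only a sketch; the function names are invented, and the important changes are the leading `&` on each slice expression and `Vec<u32>` instead of `[u32]`.

```rust
/// Slicing a `String` needs a leading `&`: the result is a borrowed `&str`.
fn slice_string(my_string: &String) -> &str {
    &my_string[0..3]
}

/// Slicing a `Vec<u32>` works the same way and gives a borrowed `&[u32]`.
fn slice_integers(my_integers: &Vec<u32>) -> &[u32] {
    &my_integers[0..3]
}
```

With `String::from("Hello")` the first returns `"Hel"`, and with `vec![1, 2, 3, 4, 5]` the second returns the first three integers.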

Let’s step back.

Rust wants to be able to efficiently take and pass around slices of arrays. A good way of doing this is to pass around bundles of (&T, usize): a pointer to the first element, and the number of elements. You can trivially create one of these from any contiguous storage, no matter where it comes from. For example, &[T] is a borrowed slice of some other storage. This raised the question of “if &X is a pointer to X, then what does [T] by itself mean?”

[T] then is the type of “some number of Ts stored contiguously”. It says nothing about where they’re stored, who owns them, how they’re managed, etc. Just that some number (including zero) of T exists somewhere. You want [T] because you can build on it to create other, more useful types that do have something to say about where/how those Ts are stored. &[T] says they’re borrowed and owned by someone else (possibly in the static data segment as a literal). Box<[T]> says you own them and have exclusive access to them. Rc<[T]> says ownership is shared. [T; 5] is exactly five owned Ts.
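A small sketch of that idea: one function that only cares about "some number of i32s stored contiguously", fed from several different owners. The names `sum` and `sums_from_different_owners` are made up for this example.

```rust
use std::rc::Rc;

/// Sums any borrowed slice, regardless of who owns the storage.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Calls `sum` on the same data held by four different kinds of owner.
fn sums_from_different_owners() -> [i32; 4] {
    // Lives in static data for the whole program.
    static LITERAL: [i32; 5] = [1, 2, 3, 4, 5];

    let array: [i32; 5] = [1, 2, 3, 4, 5]; // exactly five owned values
    let boxed: Box<[i32]> = Box::new(array); // uniquely owned, on the heap
    let shared: Rc<[i32]> = Rc::from(&array[..]); // shared ownership

    // Each argument coerces to `&[i32]`.
    [sum(&array), sum(&boxed), sum(&shared), sum(&LITERAL)]
}
```

All four calls go through the same `&[i32]` parameter, which is the point: the slice type says nothing about where the elements live.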

The reason Vec<T> exists is because a really common thing is to build up arrays element-by-element. It’s basically Box<[T]>, but it keeps some extra space around to make appending to it more efficient. You can slice it to get &[T] because you can slice everything that’s just contiguous storage to get &[T].
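To illustrate the "extra space" remark, here is a hedged sketch (function names invented) that builds a Vec element by element and then converts it into a `Box<[i32]>`:

```rust
/// Builds a Vec one element at a time, with room reserved up front.
fn build_vec() -> Vec<i32> {
    let mut v = Vec::with_capacity(8); // room for at least 8 elements
    for i in 0..3 {
        v.push(i); // no reallocation needed while len < capacity
    }
    v
}

/// Turns the growable Vec into a fixed-length boxed slice,
/// giving up any spare capacity.
fn freeze(v: Vec<i32>) -> Box<[i32]> {
    v.into_boxed_slice()
}
```

After `build_vec()`, `len()` is 3 while `capacity()` is at least 8. The boxed slice returned by `freeze` only knows its length.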

String and str are exactly the same story, except for the whole “must be valid UTF-8” thing.

So [T]/str are fundamental building blocks. Vec<T> and String are just things built on top of them, because [T] and str don’t do much by themselves.


You seem to be wondering about the Index operator. Like other operators, it is generic, so its usage can be overloaded. It also supplies an associated type, Output. Each structure supplies its own implementation.
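As a sketch of what "supplies its own implementation" means, here is a made-up type implementing `std::ops::Index`. `Temperatures` and `Day` are hypothetical names for illustration only; the trait and its `Output` associated type are the real thing.

```rust
use std::ops::Index;

/// Hypothetical wrapper type, used only for illustration.
struct Temperatures {
    readings: Vec<i32>,
}

/// Hypothetical key type for indexing.
#[derive(Clone, Copy)]
enum Day {
    Mon,
    Tue,
    Wed,
}

impl Index<Day> for Temperatures {
    // The type that `temps[day]` refers to.
    type Output = i32;

    fn index(&self, day: Day) -> &i32 {
        // Variants are numbered 0, 1, 2 in declaration order.
        &self.readings[day as usize]
    }
}
```

`String` and `Vec<T>` do the same thing for ranges: as far as I know, indexing a `String` by a range has `Output = str`, and indexing a `Vec<T>` by a range has `Output = [T]`, which is where the two different slice types in the question come from.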

An important point that Daniel brings up: contiguous memory.

Slices and Vecs are homogeneous: every element in them is the same type, and the same size.

String/str are different, because they use UTF-8 encoding; not all characters in UTF-8 are equally big. Basic ASCII is just one byte, but everything else is between two and four bytes: a variable-length encoding.

You could also have a Vec<char>, but that would always use 4 bytes per character, which, depending on your locale/language, is up to 4x too big (UTF-8 is quite brilliant that way).

The “different” types are needed to deal with both UTF-8 checking, and the variable item length.
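A short sketch of the variable-length point. The helper names are invented; `char::len_utf8` and `size_of::<char>()` are standard.

```rust
use std::mem::size_of;

/// Number of bytes each character of `s` occupies in UTF-8.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

/// Bytes needed to hold the characters of `s` as a `Vec<char>`
/// (every `char` is 4 bytes, whatever the character).
fn bytes_as_chars(s: &str) -> usize {
    s.chars().count() * size_of::<char>()
}
```

For a string like `"a\u{e9}\u{20ac}\u{1f600}"` (an ASCII letter, an accented letter, the euro sign and an emoji) the widths are 1, 2, 3 and 4 bytes, so the `str` is 10 bytes long while the `Vec<char>` contents would take 16. For plain ASCII such as `"hello"` it is 5 bytes versus 20.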

P.S. related reading:

That’s pretty much it. String owns the underlying buffer and is always valid UTF-8; &str is a reference (i.e. non-owning) and is also always valid UTF-8. The only difference is ownership, and the things it implies (e.g. String can be grown).

As it happens, a String internally is a Vec<u8>. But via its API, it maintains the invariant that it’s always valid UTF-8, despite internal storage just being a bunch of bytes. When you slice a String to get a &str, String will panic if the range would end up producing an invalid UTF-8 sequence - this is important to maintain the invariant that if you have a &str, you know it’s valid UTF-8.
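Here is a minimal sketch of that behaviour using the non-panicking `str::get`, which returns `None` in the cases where indexing with `[]` would panic. The wrapper name `try_slice` is made up for this example.

```rust
/// Returns the sub-string covering bytes `start..end`, or `None` if the range
/// is out of bounds or would cut a multi-byte character in half.
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

With `String::from("h\u{e9}llo")`, where the second character takes two bytes, `try_slice(&s, 0, 1)` is `Some("h")` and `try_slice(&s, 0, 3)` covers the first two characters, but `try_slice(&s, 0, 2)` is `None` because byte 2 is in the middle of a character. Writing `&s[0..2]` instead would panic at run time.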

A Vec<T> is just a bunch of Ts, which also happens to allow dynamically changing the number of them. But each T is its own thing - there’s no added semantic between the different Ts - a range of them doesn’t need to maintain any invariant (beyond being in bounds of the Vec). So slicing to yield &[T] is perfectly fine.

This is why there’s a str type.


To reiterate what @DanielKeep said more succinctly:

The way you’re seeing it:

  • a &str is a slice of a String

The reality:

  • str and [T] are fundamental parts of the very language itself.
  • Vec<T> and String are plain old library types that anyone could define:
    • a Vec<T> is a resizable, heap-allocated [T].
    • a String is a resizable, heap-allocated str.
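To show what "plain old library types that anyone could define" might look like, here is a deliberately tiny sketch of a String-like type. `MyString` is hypothetical and is not how the standard library is written; it only illustrates the idea of a resizable buffer that hands out `&str`.

```rust
/// Hypothetical minimal String-like type, for illustration only.
struct MyString {
    bytes: Vec<u8>, // invariant: always valid UTF-8
}

impl MyString {
    fn new() -> Self {
        MyString { bytes: Vec::new() }
    }

    /// Appending a `&str` keeps the invariant, because the input is
    /// already known to be valid UTF-8.
    fn push_str(&mut self, s: &str) {
        self.bytes.extend_from_slice(s.as_bytes());
    }

    /// Lends the contents out as a `&str`.
    fn as_str(&self) -> &str {
        // Every byte came from a `&str`, so this check always succeeds.
        // The real `String` avoids the re-check with unsafe code.
        std::str::from_utf8(&self.bytes).expect("invariant: valid UTF-8")
    }
}
```

Pushing `"Hel"` and then `"lo"` and calling `as_str()` gives `"Hello"`, and that `&str` can be sliced like any other. The `str` type itself is the part that user code could not define from scratch.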

Thank you all so much for explaining this (and for the links). I understand it better now.

@ExpHP Yes I suppose that was the confusion I had. Also, when you refer to [T] being a fundamental part of the language what does [T] stand for? I’m guessing not an array of type T, since arrays have fixed sizes?

[T] means a “slice” of Ts. See https://doc.rust-lang.org/book/second-edition/ch04-03-slices.html. It’s basically an array of Ts whose length is only known at runtime, i.e. it’s a “dynamically sized type”, so you can generally only use it behind a reference &[T]. The main reason it’s a fundamental part of the language is that Rust understands that &[T] needs to be a “fat pointer” that contains not only a pointer to some Ts but also the number of Ts being pointed at.
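A quick sketch of the difference between a reference to a fixed-size array and a reference to a slice (function names invented for the example):

```rust
use std::mem::size_of;

/// Returns (size of `&[i32; 3]`, size of `&[i32]`) in bytes.
/// The array length is part of the type; the slice length travels
/// with the reference.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&[i32; 3]>(), size_of::<&[i32]>())
}

/// Reads the length stored in the fat pointer at run time.
fn runtime_len(values: &[i32]) -> usize {
    values.len()
}
```

The first size should be one machine word and the second two, and `runtime_len` accepts slices of any length, for example `&v[1..4]` from a Vec or a borrowed array.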


Best explanation I have seen yet (as a beginner).