String vs &str - why?

I'm looking to understand the advantages of having 2 string types.

  • Is &str always stored on the stack?
  • Does converting from one to the other move the data from the heap to the stack and vice versa?
  • What are the use cases that only one of them can handle and the other cannot?
  • I believe I have started to understand a little of the ownership and borrowing concepts, but having these 2 string types seems to have nothing to do with them. Please correct me if I'm wrong.

I believe most of us who are new to Rust would quit as soon as we encounter a string concatenation scenario. This is confusing for me even as a seasoned developer, so I need your help to understand the logic/reason behind it.

In most places, I found explanations of how it works and how to use it, but no one explains why it was made like this.

Thanks in advance

  1. The reference part & will typically be on the stack; the actual string view str it points to can be anywhere: stack, heap, or the .data segment.
  2. Conversion to String allocates on the heap.
  3. &str is an immutable reference; maybe think of String as the StringBuilder you might be used to. Check their respective docs to see what you can do with them.
  4. &str very often is a borrow of a String.
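To make points 1, 2 and 4 concrete, here is a minimal sketch. The function names `data_addr` and `where_str_can_live` are made up for illustration; only standard library calls are used.

```rust
/// Returns the address of the bytes a `&str` points at.
fn data_addr(s: &str) -> usize {
    s.as_ptr() as usize
}

/// Builds three `&str` values backed by different storage, then copies one
/// into a `String`. Returns the copy and whether it lives at a new address.
fn where_str_can_live() -> (String, bool) {
    // 1. A string literal: the bytes are baked into the program's static data.
    let literal: &str = "hello";

    // 2. Bytes in a local array (normally on the stack), viewed as a `&str`.
    let on_stack: [u8; 5] = *b"hello";
    let from_stack: &str = std::str::from_utf8(&on_stack).expect("valid UTF-8");

    // 3. A borrow of a heap-allocated `String`.
    let owned: String = String::from("hello");
    let from_heap: &str = owned.as_str();

    // Same contents, three different storage locations.
    assert_eq!(literal, from_stack);
    assert_eq!(from_stack, from_heap);

    // Converting to `String` allocates a heap buffer and copies the bytes,
    // so the copy cannot share an address with the local array.
    let copy: String = from_stack.to_string();
    let different_address = data_addr(&copy) != data_addr(from_stack);
    (copy, different_address)
}
```

Calling `where_str_can_live()` from a `main` function should show that all three views compare equal even though none of them shares storage with the others.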

Rust's standard library has 6 string types. They exist for many reasons, too long to list; IMO the major reason is to interface safely with all the string badness out there. I also think strings are a major explanation for the famed learning curve of Rust: you actually have to learn why strings have inherited badness. Strings are complex, but they have been treated as dumb containers for close to 50 years by languages and platforms, and Rust tries not to do that.

This article is generally seen as a good introduction to Rust's viewpoint on strings:

I am new to Rust and I faced the same questions and issues. I think developers new to Rust would benefit if the community and documentation team could focus a bit more on developers coming from languages that include memory management. Strings are so simple in other languages and relatively complicated in Rust. Just because I haven't thought much about memory management in a decade or two does not mean that I am not qualified to program with Rust (I appreciate talking to the CPU almost directly, but unused memory is wasted memory...).

I would definitely spend some time reading about str slices and String. I would also spend some time writing simple practice programs that do common operations on str and String. I have found that it is often easier to use String, but str has performance advantages (stack/heap, like you wrote), though for many applications some of those advantages look almost theoretical. Once you get the difference between str and String, other ownership and even lifetime topics may make more sense.

My understandings, which may be wrong:

  1. I thought always, but I guess only typically.

  2. The data does not get moved, but copied.

  3. Use str whenever you can.

  4. Strings are owned, str are not. So this difference is related to ownership.

Once you've figured something out, it's hard to remember all of the stumbling blocks. Since I have already addressed some of my issues with strings, ownership, and lifetime (though I still don't really get it all and it feels like the compiler could be smarter sometimes), it might help if you could provide some specific scenarios and the community could provide tips.

The only other common languages that I consider comparable to Rust (performant, compiled, and suitable for "systems" programming from bare-metal embedded controllers to large-scale application development) are C, C++, Ada, and Pascal. Support for strings in these languages is very poor.

C has no string type at all. It has char arrays and pointers to char, plus a bunch of badly thought out library functions that support a simple notion of strings, which the programmer has to uphold by following convention.

C++ does a bit better. It supports C-style strings if you like, and it has a string type (std::string). But support for Unicode is miserable. Note that C++ also has string views (std::string_view), which kind of sort of do what &str does in Rust but without the lifetime safety.

Not sure about Ada and Pascal nowadays, having not used them for many years. But they are hardly in common use.

All in all, I would suggest that when it comes to proper string handling, including Unicode, Rust makes a rather simpler job of it than you would find in other languages.

Yes, Rust string mechanisms are more complex and take some getting used to but strings are a much more complicated thing than most people realise.

String is roughly equivalent to Vec<u8>, str is roughly equivalent to [u8], except String and str both enforce that the sequence of bytes is valid UTF-8 encoding. You may want to first understand regular slices like [i32] and vectors like Vec<i32> before understanding strings.
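A small sketch of that correspondence, assuming nothing beyond the standard library (the helper names `bytes_to_string` and `bytes_to_str` are hypothetical): going from bytes to a string type is the step where UTF-8 validity gets checked.

```rust
/// Tries to turn an owned byte vector into an owned `String`.
/// Returns `None` if the bytes are not valid UTF-8.
fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    // On success the existing buffer is reused; only validation happens.
    String::from_utf8(bytes).ok()
}

/// Tries to view a borrowed byte slice as a `&str` without copying.
/// Returns `None` if the bytes are not valid UTF-8.
fn bytes_to_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Going the other way never fails: every `String` is a valid byte vector.
fn string_to_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

For example, `bytes_to_str(b"abc")` succeeds, while a lone `0xC3` byte is rejected because it is an incomplete UTF-8 sequence.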

str is "string slice", a sequence of bytes that is valid UTF-8. This type is of unknown size, so you can't store it directly in a variable. You can store a reference to a str, &str, which is a "fat pointer" storing a pointer and a length.
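The "fat pointer" part can be observed directly. This is only a sketch with made-up helper names (`reference_sizes`, `parts`); it relies on the current layout where a `&str` is two machine words.

```rust
use std::mem::size_of;

/// Returns (size of a plain reference, size of a `&str`) in bytes.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&str>())
}

/// Splits a `&str` into the two halves of its fat pointer:
/// the address of the first byte and the length in bytes.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}
```

Note that the length is counted in bytes, not characters, so `parts("héllo").1` is 6 because `é` takes two bytes in UTF-8.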

A String is a handle to a sequence of UTF-8 bytes on the heap, like Vec<i32> is a handle to a sequence of integers on the heap. So, you can think of a String as manager of a str allocated on the heap that will reallocate and free the memory as necessary.

A &str can point to something owned by a String, or it can point to a sequence of bytes not owned by a String. It can point anywhere: to static memory, to the stack, or to the heap.

If you have a string in static memory, you can use a &str to point to it:

let a: &str = "abc";

If you want to hold dynamically allocated memory on the heap that stores a string, String is a good way to manage that allocation.

In Rust, str is the most basic type for strings, but it’s a bit complicated because str is just the string data itself without any indirection (a concept called an “unsized” type). Comparing with e.g. Java, Rust’s type called String is actually most similar to something like Java’s StringBuilder: a growable object that contains some string data and that can have some additional unused capacity it can grow into without re-allocating its internal buffer. This means that you can append stuff to it rather efficiently. The str type, on the other hand, can be used flexibly with all kinds of pointer types, e.g. Box<str> or Arc<str> or &str. Java doesn’t have this intricacy about ownership and memory management. Judging by the garbage-collected, immutable nature of Java’s String, I suppose it is most similar to Arc<str>, but many use cases of Java’s String type – namely those that fit the idiom of “borrowing” – would use &str in Rust.

Some rules: Owned strings are String or Box<str> or Arc<str>/Rc<str>. The last ones are shared ownership (with reference counting) and thus immutable. Box<str> is owned and can be mutated, but since it’s not a growable buffer with extra capacity, you cannot change its length without having to copy all the data every time. This is why String is most commonly used for owned strings. There are also sometimes situations where you’re pretty much only working with string literals, so it’s useful to know the type of those. The type of string literals is &'static str (but not every &'static str is a string literal), so if you’re planning on only storing string literals in some structs or wherever, working with &'static str can be a good and efficient choice.
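As a rough illustration of these rules, here is a sketch with hypothetical names (`Command`, `command_name`, `owned_flavors`) that creates each owned flavor once.

```rust
use std::sync::Arc;

/// A struct that only ever stores string literals can hold `&'static str`
/// and needs no allocation or lifetime parameter.
struct Command {
    name: &'static str,
}

fn command_name(c: &Command) -> &'static str {
    c.name
}

/// Creates one value of each owned string flavor from the same literal.
fn owned_flavors() -> (String, Box<str>, Arc<str>, Arc<str>) {
    // Growable buffer: can be appended to.
    let mut growable: String = String::from("abc");
    growable.push('d');

    // Fixed-size owned string: bytes can be changed, the length cannot.
    let mut fixed: Box<str> = Box::from("abc");
    fixed.make_ascii_uppercase();

    // Shared ownership: cloning only bumps a reference count.
    let shared: Arc<str> = Arc::from("abc");
    let shared_again = Arc::clone(&shared);

    (growable, fixed, shared, shared_again)
}
```

The two `Arc<str>` handles point at the same text, which is why the data behind them is immutable.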

If you’re working only or mostly with String, then you’ll sometimes want to pass the string as a borrow, as you do with any other type, e.g. by using the &String type. Using this type is an antipattern because it can be cheaply converted to &str. The type &str has one less level of indirection, and if you use it in your function arguments, the functions can be called not only by borrowing a String, but also by borrowing a Box<str> or Arc<str>, or using a string literal, or re-borrowing a &str you got from somewhere else, etc. So in a sense &str is “more general” than &String, which is why you should prefer it if possible. Noteworthy is that the conversion from &String to &str can happen implicitly. If you have some variable x: String and a function fn f(arg: &str), you can just call f(&x) and it works fine!

Regarding the conversion cost: the internal buffer in the String type already contains properly encoded data corresponding to the str type. Internally, a String is a triple consisting of a pointer to the buffer, an integer for the length of the str data in that buffer, and another integer indicating the total capacity of the buffer, which can be larger than the length. A reference/borrow of type &str is just a pointer and some length information, so all the conversion does is temporarily forget about the unused capacity. It’s kind of zero-cost if you will; just calling it “cheap” makes it seem more expensive than it is.
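A minimal sketch of why &str is the more general argument type. The function `shout` stands in for the `f` mentioned above, and `call_with_everything` is a made-up driver.

```rust
use std::sync::Arc;

/// Accepts any borrowed string data, wherever it is stored.
fn shout(arg: &str) -> String {
    arg.to_uppercase()
}

/// Calls `shout` with every kind of string mentioned in the post.
fn call_with_everything() -> Vec<String> {
    let owned: String = String::from("owned");
    let boxed: Box<str> = Box::from("boxed");
    let shared: Arc<str> = Arc::from("shared");
    let borrowed: &str = &owned[..3]; // the sub-slice "own"

    vec![
        shout(&owned),    // &String coerces to &str implicitly
        shout(&boxed),    // &Box<str> coerces to &str
        shout(&shared),   // &Arc<str> coerces to &str
        shout("literal"), // a literal already is a &'static str
        shout(borrowed),  // re-borrowing an existing &str
    ]
}
```

Had `shout` taken `&String` instead, only the first call would compile without first allocating a new String.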

Mutable borrows, so &mut String, are common though. Note the difference to &mut str: You can grow the string buffer through a &mut String reference, but the size of a &mut str can’t be changed… well, you can split it into multiple parts and in a sense make it smaller, but you can’t grow it in place.
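The difference between the two mutable borrows can be sketched like this (all function names here are invented for the example):

```rust
/// Growing the buffer needs access to the `String` itself.
fn add_exclamation(s: &mut String) {
    s.push('!');
}

/// Changing bytes in place without changing the length works on `&mut str`.
fn shout_in_place(s: &mut str) {
    s.make_ascii_uppercase();
}

fn mutate_both_ways() -> String {
    let mut text = String::from("hello world");

    add_exclamation(&mut text); // "hello world!"
    shout_in_place(&mut text);  // &mut String coerces to &mut str: "HELLO WORLD!"

    // A `&mut str` can be split into smaller mutable parts, but never grown.
    let (left, _right) = text.split_at_mut(5);
    left.make_ascii_lowercase(); // only "HELLO" is affected

    text
}
```

There is no way to write `add_exclamation` against `&mut str`, because a str has no spare capacity to grow into.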

As others already mentioned the situation is very much related to Vec<T> and slices. Vec<T> corresponds to String and [T] corresponds to str, so behind an indirection, it’s types like &[T], Box<[T]>, Arc<[T]>, etc. &mut Vec<T> is growable, &Vec<T> is an antipattern, etc…

In fact String is just a wrapper around Vec<u8> and str is a wrapper around [u8] but both of them make sure to always contain valid UTF-8 data.


TL;DR Comparing to Java: Rust String is like Java’s StringBuilder, Java’s String is like Arc<str> in Rust if you’re talking about strings that can be garbage-collected at runtime, or like &'static str if you’re talking about string literals (or other strings that will be valid for the duration of the whole program run), or they are like &str if you’re in a setting that fits borrowing in Rust so that you don’t need any garbage collection at all. So the reason for the distinction of String vs anything-involving str is very similar to the distinction of StringBuilder and String in Java, and the different versions of types involving str are due to Rust’s memory management.

Unlike Java where calling a function expecting String means you’ll eventually have to convert your StringBuilder into a String (which has a cost of copying the data), in Rust, if the function expects &str, you can leave your data in the buffer it was built in and just pass a pointer/reference into it. A nice win.


Side-note: Taking garbage-collection more seriously, StringBuilder would be Rc<RefCell<String>> and StringBuffer would be more like Arc<Mutex<String>> or maybe Arc<RwLock<String>>.
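For the curious, the two shared-and-mutable variants from this side note might look roughly as follows. This is an illustrative sketch with hypothetical function names, not a claim about how Java implements anything.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Single-threaded shared builder, in the spirit of a `StringBuilder`
/// that several parts of the program hold a handle to.
fn shared_single_threaded() -> String {
    let builder = Rc::new(RefCell::new(String::new()));
    let alias = Rc::clone(&builder);

    builder.borrow_mut().push_str("foo");
    alias.borrow_mut().push_str("bar"); // same underlying String

    let out = builder.borrow().clone();
    out
}

/// Thread-safe shared buffer, in the spirit of a `StringBuffer`.
fn shared_across_threads() -> String {
    let buffer = Arc::new(Mutex::new(String::new()));

    let handles: Vec<_> = (0..3)
        .map(|_| {
            let buffer = Arc::clone(&buffer);
            thread::spawn(move || buffer.lock().unwrap().push('x'))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    let out = buffer.lock().unwrap().clone();
    out
}
```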


Edit: Another thought on Box<str>: A Box<str> can be cheaply converted into a String. Converting the other way is only cheap if the String had no unused extra capacity. In this sense a function expecting String can be called with a Box<str> argument by converting it first (though this has to be an explicit conversion, e.g. using .into()), so that using String for such argument types is the “more general” and thus the preferable type. Really, the only disadvantage of using String over Box<str> is that you might end up having a string with lots of unused extra capacity. But if that ever is a problem, you can also modify the String in-place with something like .shrink_to(capacity) or .shrink_to_fit().
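The round trip described in this edit can be sketched as follows. The function name `box_str_round_trip` is hypothetical, and the exact capacities are up to the allocator, so the sketch only reports them.

```rust
/// Converts Box<str> -> String -> Box<str> and reports the capacity
/// before and after shrinking.
fn box_str_round_trip() -> (usize, usize, Box<str>) {
    let boxed: Box<str> = Box::from("abc");

    // Explicit conversion; the existing allocation is taken over.
    let mut s: String = boxed.into();

    // Give the String lots of unused capacity, then trim it again.
    s.reserve(100);
    let roomy = s.capacity();
    s.shrink_to_fit();
    let tight = s.capacity();

    // Converting back drops any remaining excess capacity.
    let back: Box<str> = s.into_boxed_str();
    (roomy, tight, back)
}
```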

While &String is an antipattern, Arc<String> is something you might encounter. Using it saves you from paying the conversion cost of String to Arc<str>, which is not “cheap”: the string data has to be copied into a new allocation that also holds the reference counts, whether or not the String had unused excess capacity.
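One way to observe the difference is to compare buffer addresses before and after wrapping. This is a sketch under the assumption that the standard library keeps behaving as it does today; `arc_conversions` is a made-up name.

```rust
use std::sync::Arc;

/// Returns (was the text buffer reused by Arc<String>?,
///          was the text buffer reused by Arc<str>?).
fn arc_conversions() -> (bool, bool) {
    // Arc<String>: the String (pointer, length, capacity) is moved into the
    // Arc, and the text stays in its original heap buffer.
    let s = String::from("shared text");
    let original = s.as_ptr() as usize;
    let wrapped: Arc<String> = Arc::new(s);
    let reused_by_arc_string = wrapped.as_ptr() as usize == original;

    // Arc<str>: the text is copied next to the reference counts.
    let t = String::from("shared text");
    let original = t.as_ptr() as usize;
    let flat: Arc<str> = Arc::from(t);
    let reused_by_arc_str = flat.as_ptr() as usize == original;

    (reused_by_arc_string, reused_by_arc_str)
}
```

The price of `Arc<String>` is the double indirection: reading the text goes through the Arc and then through the String's own pointer.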

While I’m at it throwing around string types I’ll also mention that Cow<'_, str> is a thing, but I’m not going to explain what it does in this comment.
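For readers who want a first impression anyway, here is a tiny sketch of a typical use: a function that only allocates when it actually has to change something. The function `underscores_to_spaces` is invented for the example.

```rust
use std::borrow::Cow;

/// Returns the input unchanged (borrowed) when there is nothing to replace,
/// and a freshly allocated String (owned) otherwise.
fn underscores_to_spaces(input: &str) -> Cow<'_, str> {
    if input.contains('_') {
        Cow::Owned(input.replace('_', " "))
    } else {
        Cow::Borrowed(input)
    }
}
```

Both variants dereference to `&str`, so callers can usually ignore which one they got.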

It's the contrary. str vs String has only to do with ownership and borrowing. String is an owning type: it owns the underlying byte array and destroys it when the String object goes out of scope. Meanwhile, a str doesn't own anything; it's just a pointer to a range within another string.

It doesn't matter where the pointer inside a &str points: it can be the heap, the stack, the data segment (in the case of string literals), the heap of an external memory allocator (when you obtain a string through FFI), or really, anywhere.

String does have its buffer on the Rust heap, but this definitely doesn't mean you can only create str slices that point to the heap.

This will be easier to understand if you think in terms of Vecs and slices. People tend to have trouble as soon as you mention strings and strs.

Maybe this is because of many languages that essentially treat strings like primitives?

Note that even in GC'd languages it's very common to have two types. For example, C# has String and StringBuilder. The former cannot be modified or extended (like Rust's &str) but the latter can (like Rust's String).

Their use communicates two different things.

I disagree. If this was only about ownership and borrowing, we’d be comparing &str with Box<str>. Or String with &String.

Box<str> and String are equally valid owned string types (note that <str as ToOwned>::Owned = String); the difference between them is whether the buffer is growable. Meanwhile, &String is not purely a borrowed type and is strictly less powerful than &str.
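The ToOwned relationship mentioned here can be exercised generically. In this sketch, `make_owned` is a hypothetical helper; the same function yields a String for str, a Vec for a slice, and a PathBuf for a Path.

```rust
use std::path::{Path, PathBuf};

/// Produces the "owned partner" of any borrowed value.
fn make_owned<B: ToOwned + ?Sized>(borrowed: &B) -> B::Owned {
    borrowed.to_owned()
}

fn owned_partners() -> (String, Vec<i32>, PathBuf) {
    let s: String = make_owned("abc");            // <str as ToOwned>::Owned
    let v: Vec<i32> = make_owned(&[1, 2, 3][..]); // <[i32] as ToOwned>::Owned
    let p: PathBuf = make_owned(Path::new("/tmp")); // <Path as ToOwned>::Owned
    (s, v, p)
}
```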

What does “purely a borrowed type” mean?


Are you perhaps writing str when you’re actually talking about &str?


In my view this only reflects that in Rust String and &str are the most common types for strings, in particular Box<str> and &String are both often avoided (with good reason and the latter even more than the former). Thus, String is the most common/idiomatic/basic owned string type and &str is the most common/idiomatic/basic borrowed string type. This doesn’t mean that ownership is the only difference between the two. But ownership is the most significant difference between them. The ToOwned trait then reflects this by making it easy to use these two most idiomatic types as a sort of “borrowed type”-”owned type“ pair.

In the case of &String, the object behind the reference is an owned String; this means that if you want a &String, you have to make an owning allocation at some point. There's also double indirection involved, of which you can strip off the outer layer, because a String can exist in itself; this is not the case with &str. And since you can create a reference to basically any type, calling T a "borrowed" type just because one can create &T is not helpful in my opinion.

Another pair of types like String/&str or Vec<T>/&[T] is PathBuf and &Path, which has pretty good naming IMO. If I were to re-design Rust today, I’d advocate for renaming &str into &String and naming what’s currently called String into StringBuf. Also, while we’re at it, Vec would be renamed too, because it has absolutely nothing to do with vectors; “vectors” should stay part of linear algebra. Maybe SliceBuf<T>? Or maybe do a Buf<…> type family (Haskell terminology), i.e. something like: have a trait

trait Bufferable {
    type Buffered;
}

and a

type Buf<T> = <T as Bufferable>::Buffered;

in order to be able to write Buf<String> instead of StringBuf and Buf<Path> instead of PathBuf, and Buf<[T]> instead of SliceBuf<T>, i.e. today’s Vec<T>.
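With today's type names, the idea could be tried out roughly like this. It is only a sketch of the thought experiment: the impls and the helper functions `greeting`, `numbers` and `home` are assumptions added for illustration, and `Buf<str>` here plays the role of the hypothetical `Buf<String>`.

```rust
use std::path::{Path, PathBuf};

/// Maps an unsized "view" type to its growable, owning buffer type.
trait Bufferable {
    type Buffered;
}

impl Bufferable for str {
    type Buffered = String;
}

impl Bufferable for Path {
    type Buffered = PathBuf;
}

impl<T> Bufferable for [T] {
    type Buffered = Vec<T>;
}

/// `?Sized` because the interesting inputs (str, Path, [T]) are unsized.
type Buf<T: ?Sized> = <T as Bufferable>::Buffered;

fn greeting() -> Buf<str> {
    String::from("hi")
}

fn numbers() -> Buf<[i32]> {
    vec![1, 2, 3]
}

fn home() -> Buf<Path> {
    PathBuf::from("/home")
}
```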

I like that Rust’s concepts of ownership and borrowing work independently of questions about allocations and stack vs heap. Which is why I don’t like your terminology, talking about “owning types” and “borrowed types” and apparently also stuff in between. I think these terms are hard or impossible to define properly and they can be confusing. AFAICT, you’re talking about whether owning a value of some type implies owning a heap allocation when you’re calling String an “owning type”, and you’re talking about the fact that you cannot create a value of type &String without having some part of your program previously create some heap allocation when you’re calling &String “not purely a borrowed type”. Read the side note below!

Fortunately, in Rust it very often doesn’t matter whether you’re dealing with Box<T> or T; even though their behavior typically differs w.r.t. heap allocations, their behavior w.r.t. Rust’s “ownership and borrowing” model, as far as I understand it, is the same.

Ownership is about values. You can own a value of any type T, and you can let someone borrow that value by creating a reference to it (of type &T) and passing them either that reference or any value containing that reference. You can sometimes split or refine borrows e.g. by creating references to fields or subslices. Owning a value means that you can create shared and unique borrows to it and use API that requires ownership of the value. Borrowing a value means that you can re-borrow it.

It’s always important what the subject is when talking about ownership. I can own a &T, which – because it’s a shared reference – means that I’m borrowing a T. I can own a Vec<&T>, which means I could be borrowing zero, one, or many T’s, and if you’re interested in performance, allocations, heap vs stack, then – yes – owning a Vec<&T> also means that I’m holding onto the memory in which the &T values live.
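A small sketch of that last point, with a made-up function name: the Vec below is owned (including its heap buffer of references), while the strings it refers to are only borrowed.

```rust
/// Returns (total length seen through the borrows,
///          total length of the originals after the Vec is gone).
fn own_the_vec_borrow_the_strings() -> (usize, usize) {
    let a = String::from("alpha");
    let b = String::from("beta");

    // We own this Vec and the memory holding the two references,
    // but we merely borrow `a` and `b`.
    let borrowed: Vec<&String> = vec![&a, &b];
    let total: usize = borrowed.iter().map(|s| s.len()).sum();

    // Dropping the Vec frees its buffer of references and ends the borrows;
    // the Strings themselves are untouched.
    drop(borrowed);

    (total, a.len() + b.len())
}
```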


Side note on this point:

you’re talking about the fact that you cannot create a value of type &String without having some part of your program previously create some heap allocation when you’re calling &String “not purely a borrowed type”

or as you put it,

is not even true. An empty String doesn’t involve any allocations at all, in fact String::new() is a const fn.

fn give_me_string_reference<'a>() -> &'a String {
    static EMPTY: String = String::new();
    &EMPTY
}

Interesting.

Apparently Alex Stepanov has regrets about naming Vector in the C++ Standard Template Library.

I get the point but I don't think he should fret so much. Even in mathematics common symbols take on different meanings in different contexts.

When it comes to Rust we are in the clear. In Rust there is no vector. There is Vec. A totally different thing to a linear algebra vector. One is free to define Vector however one likes.

Haha, I’ve seen that point before, it doesn’t work, read the docs carefully:

Struct std::vec::Vec

A contiguous growable array type, written as Vec<T> and pronounced ‘vector’.


With the same logic, Rust doesn’t have any of these either

  • functions
  • modules
  • trait implementations
  • public items or fields

because really it’s only

  • fns
  • mods
  • trait impls
  • pub items or fields

If there was no type named String, we would all be pronouncing str as “string” as well. Really, this kind of breaks the pattern of Rust using abbreviations to save typing and screen space without meaning something different. Well actually, I guess while str itself is already fully contained in the word “string”, you’re supposed to pronounce it “string slice”, so it’s not the same (^_^). But don’t you dare call &str a “string slice”, too!

You've just pointed out that Vec means "contiguous growable array type". That's what it means. Doesn't matter how you pronounce it. It's a vector by weak analogy -- but so are most programming terms. Did you ever navigate a sailboat by looking at a hash map?

Dang. It was a good try :)

But wait. What if the Rust documentation were translated into some other language. Klingon say. Then that might read:

'ej tera'Daq ghotpu' Vec<T>a'chuqtaHvIS"

See: English - Klingon translation | TRANSLATOR.EU

Anyway, I still stand by the fact that even in Mathematics symbols take on different meanings in different contexts.

Mathematicians deal with that, so can we.

I don't recommend this -- non-negligible chance of collision.
