Understanding when to use String vs str

Hey everyone, I'm new to Rust and just checking my high-level understanding of String vs str.

A String type is a container for a str that is stored on the heap. String keeps the ownership, str is simply a reference and will most commonly be seen as &str.

The heuristic I have in my head is that a String type should only be seen as part of the type that owns it. For example, as part of a struct definition.

When being passed into methods, or returning data from a method, then use an &str type as this will work as a reference to the original ownership.

I imagine it's a little bit more nuanced than that, but is that a good general rule of thumb?

A String type is a container for a str that is stored on the heap.

Exactly correct.

str is simply a reference and will most commonly be seen as &str.

&str is a reference, yes, but str is not. str is the actual text itself, no kind of pointer involved. The reason for all the complications is that str is intrinsically variable-sized, which makes it hard to work with, but when you do, str is definitely not a reference.
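To make the distinction concrete, here is a minimal sketch using only the standard library. The helper name `text_size` is made up for illustration; it reports how many bytes the `str` behind a reference occupies, while `size_of::<&str>()` shows that the reference itself is a pointer plus a length.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: size in bytes of the text that a `&str` points at.
fn text_size(s: &str) -> usize {
    size_of_val(s)
}

fn main() {
    let s: &str = "hello";

    // `&str` is a "fat" reference: an address plus a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // The `str` behind it is just the UTF-8 bytes, with no pointer inside.
    assert_eq!(text_size(s), 5);

    // A bare `str` cannot be a local variable because its size is not
    // known at compile time, so this line would be rejected:
    // let t: str = *s;
}
```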

(An example of where you might use neither String nor &str is Arc<str>, a reference-counted string. If you don't know what that means yet, don't worry about it.)
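For the curious, a small sketch of what `Arc<str>` looks like in practice. The function `share` is a made-up name for illustration; it copies the text into one reference-counted allocation and returns two handles to it.

```rust
use std::sync::Arc;

/// Hypothetical helper: put `text` in a shared allocation and return two
/// handles that both point at it.
fn share(text: &str) -> (Arc<str>, Arc<str>) {
    let first: Arc<str> = Arc::from(text);
    // Cloning an Arc bumps a counter; the text itself is not copied again.
    let second = Arc::clone(&first);
    (first, second)
}

fn main() {
    let (a, b) = share("shared text");
    assert!(Arc::ptr_eq(&a, &b)); // same allocation
    assert_eq!(Arc::strong_count(&a), 2); // two owners
    assert_eq!(&*a, "shared text"); // derefs to plain `str`
}
```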

The heuristic I have in my head is that a String type should only be seen as part of the type that owns it. For example, as part of a struct definition.

When being passed into methods, or returning data from a method, then use an &str type as this will work as a reference to the original ownership.

Generally, you should use &str in function arguments and returns when you can use &str, but you can't always. In particular, any function which creates a new string that did not previously exist must return String rather than &str, because in order to continue existing past the function returning, the string needs to be owned by the return value.

The way I would suggest looking at it is: use &str when you know there is an owner of the string already, and they will hold still for you to borrow it as long as you need it. If there is no existing owner, or if the owner has its own business that is incompatible with you borrowing it, then you need to use String.
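A rough sketch of that rule of thumb, with made-up function names (`first_word`, `greeting`). The first function only looks at text the caller already owns, so it can hand back a `&str`. The second creates text that did not exist before, so it returns a `String`. The `main` function also shows the "owner has its own business" case: copying the borrowed word into a `String` lets the owner be modified afterwards.

```rust
/// Hypothetical helper: borrows from its argument, so the result is a
/// slice pointing into `input`.
fn first_word(input: &str) -> &str {
    input.split_whitespace().next().unwrap_or("")
}

/// Hypothetical helper: builds new text, so the result must be owned.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let mut line = String::from("hello world");

    // Copy the word out so that `line` is no longer borrowed.
    let word: String = first_word(&line).to_owned();

    // The owner is now free to change.
    line.clear();

    assert_eq!(word, "hello");
    assert_eq!(greeting(&word), "Hello, hello!");
}
```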

Thanks for the detailed (and quick) response @kpreid, this really clears things up and actually makes a lot of sense. I've just been working through a codebase trying to optimize this and ran into exactly this:

In particular, any function which creates a new string that did not previously exist must return String rather than &str

In the function itself, I was generating a new DateTime representing the current date and ran into all sorts of issues trying to return an &str.

The way I would suggest looking at it is: use &str when you know there is an owner of the string already, and they will hold still for you to borrow it as long as you need it. If there is no existing owner, or if the owner has its own business that is incompatible with you borrowing it, then you need to use String.

This makes so much sense, thank you.

You can grow a String. @kpreid alludes to this. There are other owned types, e.g. Box<str>, but they are less common than String. You can append to a String, but you can't append to a Box<str>. There are a few in-place changes you can make to both, as long as the length in bytes stays the same (for example, make_ascii_uppercase converts the ASCII characters to uppercase), but it's pretty limited.

Despite that, I would recommend using String and Vec over the alternatives with Box even when you aren't going to grow the data because they are more ergonomic.
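A short sketch of the difference, using only standard library calls. The function name `shout` is made up for illustration. It shows the kind of in-place edit a `Box<str>` does allow, while `main` shows the growth that only `String` offers.

```rust
/// Hypothetical helper: copy `text` into a Box<str> and uppercase its ASCII
/// characters in place. The length in bytes never changes.
fn shout(text: &str) -> Box<str> {
    let mut boxed: Box<str> = Box::from(text);
    boxed.make_ascii_uppercase();
    boxed
}

fn main() {
    // A String can grow.
    let mut s = String::from("abc");
    s.push_str("def");
    assert_eq!(s, "abcdef");

    // A Box<str> has a fixed length, but can still be edited in place.
    let b = shout("abc");
    assert_eq!(&*b, "ABC");

    // There is no way to append to it, so this would not compile:
    // b.push_str("def");
}
```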

Box<str> is the direct equivalent of &str, where Box owns the data (on the heap), and &str doesn't own the data (it's borrowed from somewhere, which could be on the heap, or the stack, or the executable, or wherever).

String is conceptually (Box<str>, len), which allows amortized growth. It's almost always used as the owned version of &str, since it's more versatile thanks to its ability to be resized easily.
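As a rough illustration of how the three types relate, here is a sketch with a made-up helper `to_owned_box`. It copies borrowed text into an owned `Box<str>`, and `main` then turns that into a growable `String`. The size check shows that `Box<str>` and `&str` have the same pointer-plus-length layout and differ only in who owns the data.

```rust
use std::mem::size_of;

/// Hypothetical helper: copy borrowed text into a heap allocation that the
/// returned Box owns.
fn to_owned_box(text: &str) -> Box<str> {
    Box::from(text)
}

fn main() {
    // Borrowed: this literal lives in the executable's static data.
    let literal: &str = "some text";

    // Owned, fixed size: a heap copy that is freed when `owned` is dropped.
    let owned: Box<str> = to_owned_box(literal);

    // Same representation (pointer + length), different ownership.
    assert_eq!(size_of::<Box<str>>(), size_of::<&str>());

    // Owned and growable: convert the box into a String and append.
    let mut growable: String = String::from(owned);
    growable.push_str(" and more");
    assert_eq!(growable, "some text and more");
}
```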

Not always! This has a lot of caveats! A function can return a string without passing ownership (lending temporary permission to view a string that already existed before the function call), or it can return a string and pass ownership (making a new string and giving it to the caller).

Specifically, you can't make a new string inside a function, and return it as &str. That's because a newly-created string will be owned by a variable inside the function, and if the variable doesn't give up its ownership, the variable and the content it owns will be destroyed before the function returns. You can't return a permission to view a string that will be destroyed.
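Here is a minimal sketch of that situation. The function `make_label` and its `id` parameter are invented for illustration. The commented-out version is the one the compiler rejects, and the version below it gives up ownership instead.

```rust
// Rejected by the compiler: `s` is a local variable, so it is dropped when
// the function returns and there would be nothing left to borrow.
//
// fn make_label(id: u32) -> &str {
//     let s = format!("item-{}", id);
//     &s
// }

/// Hypothetical helper: builds new text and hands ownership to the caller.
fn make_label(id: u32) -> String {
    format!("item-{}", id)
}

fn main() {
    // The caller is now the owner of the new string.
    let label = make_label(7);

    // Borrowing from a String that outlives the borrow is fine.
    let view: &str = &label;
    assert_eq!(view, "item-7");
}
```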

To add a clarification, the len noted here refers not to the length of the represented string, which Box<str> already stores, but the capacity or string allocation size of String, which is guaranteed to be at least the length of the string and possibly a bit more to allow for amortized string growth mentioned by @kornel. So I'm more inclined to write it as (Box<str>, capacity) or (Box<str>, allocation_len).

Edit: Never mind. @kornel's metaphor was a bit different from what I took it to be. See his clarification and mine below. He intended the Box<str> to include the uninitialized bytes, so box_str.len() actually acts as the capacity, and the additional field is the length.

Reusing a related post I wrote a while ago:

And yes, until you're comfortable with everything else, don't try to store &strs in structs.
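To show why that advice helps, here is a small sketch with made-up type names (`OwnedUser`, `BorrowedUser`). The owning struct needs no annotations and can be returned freely. The borrowing one needs a lifetime parameter and cannot outlive the `String` it points into.

```rust
/// Hypothetical struct that owns its text. No lifetimes involved.
struct OwnedUser {
    name: String,
}

/// Hypothetical struct that only borrows its text. The lifetime `'a` ties
/// every value of this type to whoever owns the string.
struct BorrowedUser<'a> {
    name: &'a str,
}

/// Hypothetical helper: copies the borrowed name into an owning struct.
fn make_owned_user(name: &str) -> OwnedUser {
    OwnedUser { name: name.to_owned() }
}

fn main() {
    let user = make_owned_user("ferris");

    // This view is only valid while `user` is alive and not modified.
    let view = BorrowedUser { name: &user.name };
    assert_eq!(view.name, "ferris");
}
```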

I'll also link my favourite article about String vs str in Rust:

https://chrismorgan.info/blog/rust-fizzbuzz/

(It looks like a FizzBuzz article, but it's actually about strings.)

Thanks everyone for the responses, this has really helped to firm up my understanding of the difference and when to use both.

I'd heard the Rust community was awesome, this just proves it :tada:

No, your clarification is incorrect. The length I wrote refers to the used length of the string. Technically it's (RawVec, len): RawVec tracks the allocation together with its capacity, and the actual used length is an extra field. It can be expressed as (Box<str>, len) if you initialize the str's bytes. It would be invalid to have (Box<str>, capacity), because then the Box wouldn't carry enough bytes for the unused capacity, and Drop of the Box would report an invalid size to the allocator.
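The two numbers can be observed through the public API. This is only a sketch of the visible behaviour, not of String's internals, and the helper name `len_and_capacity` is made up. It reserves room for `extra` spare bytes and reports both the used length and the allocation size.

```rust
/// Hypothetical helper: build a String with spare room and report
/// (used length, allocated capacity), both in bytes.
fn len_and_capacity(text: &str, extra: usize) -> (usize, usize) {
    let mut s = String::with_capacity(text.len() + extra);
    s.push_str(text);
    (s.len(), s.capacity())
}

fn main() {
    let mut s = String::with_capacity(16);
    s.push_str("hi");

    // Used length and allocation size are tracked separately.
    assert_eq!(s.len(), 2);
    assert!(s.capacity() >= 16);

    // Converting to Box<str> discards the spare capacity, leaving a single
    // length that describes exactly the initialized text.
    let b: Box<str> = s.into_boxed_str();
    assert_eq!(b.len(), 2);

    let (len, cap) = len_and_capacity("hi", 14);
    assert!(len <= cap);
}
```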

Oh I see. So you're using box_str.len() as the capacity. Got it. Updated my post.
