String, &str, when to use Vec<String> or Vec<&str>

Hi,

Usage scenario: capturing a variable number of string tokens entered by users; this list is immutable and stays alive for the entire life of the application.

I understand that (from Rust: Conversion Between &str and String Types):

&str is a string slice, once bound it is an immutable reference.

String is a growable, mutable string type that is stored on the heap. It can be modified and reallocated in memory as needed.

to_string() converts a &str to a String. It creates a new String from &str.

as_str() and the & operator borrow a String as a &str. (The String is still available/valid after the conversion call.)

In the program below, str_list_1 and str_list_2 produce the same result:

fn main() {
    let str_list_1: Vec<String> = vec!["abc".to_string(), "def".to_string(), 
        "ghi".to_string(), "jkl".to_string()];

    for s in str_list_1 {
        print!("{} ", s);
    }

    println!("\n\n----------------\n");

    let str_list_2: Vec<&str> = vec!["abc", "def", "ghi", "jkl"];

    for s in str_list_2 {
        print!("{} ", s);
    }    
}

Given the usage scenario above, should I use Vec<String> or Vec<&str>, please? And what are the advantages of one over the other?

Thank you and best regards,

...behai.

The point is not really mutability. The important difference between &str and String is that &str is borrowed and String is owned.

Borrowed data can only point to something that's already owned somewhere else. In your example, you only have string literals, which are baked into the executable, and thus are "owned" for the entire duration of the program by static memory.

Therefore this isn't at all representative of your use case. If you want to accept user input, you will have to do so using a String. You can't read into a &str. Borrows are temporary views into existing data; you can't use them to "store something by reference". They don't keep their referent alive.
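
To make that concrete, here is a minimal sketch (the whitespace-separated stdin input is an assumption for illustration): each token has to be read into owned storage, so the collection ends up as Vec<String>.

use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    let mut tokens: Vec<String> = Vec::new();
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line?;
        // Each token is copied into its own owned String; a &str could only
        // borrow from `line`, which is dropped at the end of this iteration.
        tokens.extend(line.split_whitespace().map(|t| t.to_string()));
    }
    println!("{:?}", tokens);
    Ok(())
}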

18 Likes

There isn't a general answer for this, but:

  • If you're getting these individually from IO, they need to live somewhere, so if you have Vec<&str> you'll also have a separate Vec<String> somewhere to own them. Vec<String> could be better since it's only one Vec, but Vec<Box<str>> could be better than both, since it's the same layout as Vec<&str> and doesn't need a separate owner.
  • If you're able to have all the pieces in a single String, such as when doing deserialization, then it may be better to have Vec<&str> since it's only one big allocation (plus the Vec) instead of many small allocations (plus the Vec). It may even be viable to use the big String as the container, and expose the small &str pieces through an iterator, without a Vec at all (see the sketch after this list).
  • Whatever you choose, after the collection is constructed, Vec<&str> and Vec<Box<str>> should be the fastest, but by an extremely tiny amount. This is because each element of Vec<String> is larger than each element of Vec<&str>/Vec<Box<str>>, and because String needs to construct the &str from its separate parts (pointer and length). Box<str> to &str should be free, since they're both normal fat pointers at runtime.
  • If you're using string literals, then there's no downside to Vec<&'static str>, since there's no need to own them or allocate them. But then you should also put them in an array and pass around &[&str].
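
For the single-String case in particular, a minimal sketch (the input text and whitespace splitting are assumptions for illustration): one owned String backs many borrowed &str views, with Vec<Box<str>> shown for comparison.

fn main() {
    // One owned allocation holding all the tokens.
    let input: String = "abc def ghi jkl".to_string();

    // Borrowed views into `input`; no per-token allocation.
    let views: Vec<&str> = input.split_whitespace().collect();
    println!("{:?}", views);

    // Individually owned tokens whose elements are the same size as &str.
    let boxed: Vec<Box<str>> = input
        .split_whitespace()
        .map(|t| Box::<str>::from(t))
        .collect();
    println!("{:?}", boxed);
}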

You can test these different methods (Vec<String>, Vec<&str>, and Vec<Box<str>>) by making your functions generic like so:

fn f<S: AsRef<str>>(list: &[S])

and if you want to include the iterator as well:

fn f<I: IntoIterator<Item = S>, S: AsRef<str>>(list: I)
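
A hedged sketch building on the slice-based signature above (the function name print_all and the sample data are mine, not from the thread): the same generic function accepts all three element types.

fn print_all<S: AsRef<str>>(list: &[S]) {
    for s in list {
        print!("{} ", s.as_ref());
    }
    println!();
}

fn main() {
    let owned: Vec<String> = vec!["abc".to_string(), "def".to_string()];
    let borrowed: Vec<&str> = vec!["abc", "def"];
    let boxed: Vec<Box<str>> = vec!["abc".into(), "def".into()];

    // String, &str, and Box<str> all implement AsRef<str>.
    print_all(&owned);
    print_all(&borrowed);
    print_all(&boxed);
}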
3 Likes

Hi drewtato,

Thank you very much for your explanations. I will need to study your suggestions. They are quite advanced for me.

In the meantime, I will go with Vec<String>, I will refactor it when my understanding of Rust progresses.

Thank you and best regards,

...behai.

1 Like

Hi H2CO3,

You are right, I could have written a better example :slight_smile: I will use a String and hence Vec<String>.

Thank you for your help and best regards,

...behai.

My favourite article about when you'd want String vs &str:

(Cleverly disguised as being about fizzbuzz.)

8 Likes

String is growable, String is better than &str

String is growable, but &str is trivially copyable. Each type has its own pros and cons.

Also, the two types look more distinct than they really are. &str is really just a slight generalization over &String, and for most analyses of the pros and cons of String vs &str, it’s most useful to think of them as String vs. &String. By that I mean, the same kind of trade-off applies as you’d have for other owned types Foo where you can compare Foo vs &Foo.
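
A small illustration of that point (the function greet is a hypothetical example, not from the thread): a &String coerces to &str via deref coercion, so an API taking &str serves both.

fn greet(name: &str) {
    println!("hello, {}", name);
}

fn main() {
    let owned: String = "world".to_string();
    greet(&owned);  // &String coerces to &str automatically
    greet("slice"); // a string literal is already a &'static str
}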

6 Likes

Others have clearly described the various fine grained tradeoffs you are able to make with these choices, but I just want to make clear that it's perfectly fine, and often ideal, to write "dumb" Rust code that just clones all the data everywhere instead of thinking about borrows and reference counts*, and just generally performs as poorly as possible.

You'll get a lot more done in the same time, probably still beat the pants off a dynamic language implementation in speed, and if something does end up slow the simpler code often means it's easier to change the higher level design to get the large wins rather than trying to shave cycles off individual calls.

I've found a lot of Rust experience comes down to learning when the "zero cost" options are too costly to think about, in other words.

  • if you can obviously make an argument a reference without extra complexity, do so, of course! This is still guidance, not gospel.
4 Likes

Good evening Simon,

Thank you for your response. I can see your point. However, in terms of multi-threading, wouldn't making deep copies of data actually defeat the purpose of ownership? I have not got there yet, but going through the Rust introductory book, I think ownership would make multi-threaded code easier to manage. I think it is worth learning and using the ownership feature as it was designed to be used?

Please keep in mind, I am a complete newbie to Rust :slight_smile: So what I think of Rust might not be strictly correct.

Thank you and best regards,

...behai.

Follow @simonbuchan's suggestion: take it slow. What he's trying to tell you is that it's a common misconception that you have to use references (or borrows in Rust terminology) for everything, as if it was a silver bullet that works for every single use case.

As with everything, this is not the case. In fact, such a misconception usually leads to overcomplicated code due to the contagious nature of lifetimes, which results in newcomers getting irritated and complaining that Rust is hard or borderline impossible to learn.

When considering the performance of a piece of code, a simple workflow to follow when developing software goes something like this:

  1. Create the simplest and quickest solution for the problem
    1.1. Does it have good enough performance?
    Yes: Good. Leave it there and move on to the next problem.
    No: Revisit the solution and improve the performance.

The main point of the suggestion is: don't spend more time than necessary improving the performance of a piece of code unless it isn't performing as it should. The order of events matters! You first have to write something, measure it, and then decide whether to rewrite it if the results of the performance measurement are unacceptable.

As a matter of fact, this is yet another reason why I love Rust. I can write quick, idiomatic, declarative code with the confidence that it will be compiled as if it had been written imperatively (this is the case for iterators, for example).
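
A small, hedged illustration of what is meant (the sum-of-squares functions are my own example): the declarative iterator version and the hand-written loop produce the same result and typically compile to comparable code.

// Declarative: an iterator chain.
fn sum_of_squares_iter(data: &[i32]) -> i32 {
    data.iter().map(|x| x * x).sum()
}

// Imperative: an explicit loop doing the same work.
fn sum_of_squares_loop(data: &[i32]) -> i32 {
    let mut total = 0;
    for x in data {
        total += x * x;
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_of_squares_iter(&data), sum_of_squares_loop(&data));
    println!("{}", sum_of_squares_iter(&data));
}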

4 Likes

Thanks for this link, @scottmcm
I've a follow-up question here, since at the moment one of my most frequent coding errors is "temporary value dropped while borrowed".

Given the invalid snippet

for i in 1..101 {
    let result = if i % 15 == 0 {
        "FizzBuzz"
    } else if i % 5 == 0 {
        "Buzz"
    } else if i % 3 == 0 {
        "Fizz"
    } else {
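        // error: temporary value dropped while borrowed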
        &*i.to_string()
    };
    println!("{}", result);
}

and given a fixed version

for i in 1..101 {
    let x;
    let result = if i % 15 == 0 {
        "FizzBuzz"
    } else if i % 5 == 0 {
        "Buzz"
    } else if i % 3 == 0 {
        "Fizz"
    } else {
        x = i.to_string();
        &*x
    };
    println!("{}", result);
}

Wouldn't the compiler be able to parse the first snippet as the second one, that is, create a temporary storage variable on the fly to keep track of the string? I'm sure this would work (or could be made to work) in this example. I realize it could not work if the function also tried to return a reference, but that is not the case here.

1 Like

Good evening moy2010,

Thank you for the advice... I did feel what you have written above... Rust is indeed strange :slight_smile:

With NodeJS and Python (I understand they are very different from Rust), within the first few hours I could connect to and query different database servers (MySQL, PostgreSQL, MongoDB, etc.) and even write a "Hello World!" web app...

With Rust, after 20-odd days, I still had problems with the if-else statement, ha ha ha...

Best regards,

...behai.

The scopes don't match up. The non-compiling version does create a temporary place, but its lifetime is not long enough. It's simply not what that code means.

The compiler can't in general guess what you meant at the high level (and it shouldn't). Thus, for consistency, you can't expect it to guess what you meant in special cases, either. It is not the compiler's job to massage your code as much as possible until it happens to compile somehow. That would lead to an absolutely tangled, unpredictable, unmaintainable mess.

1 Like

Since the question of where a value is destructed is significant in Rust, there are consistent rules about the “drop scopes” of temporary values. These have the effect that, in the first code, the temporary value containing the result of i.to_string() is dropped at the end of the let result = …; statement.

These rules can be particularly relevant for things such as guard objects from Mutex or RefCell. If you write

let n: u32 = *my_mutex.lock().unwrap();

your code may be relying on the behavior that the RAII guard resulting from the lock() call is dropped at the end of the let statement, so that subsequent code can e.g. re-lock it without deadlocking.
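
A hedged sketch of that behavior (my_mutex is the illustrative name from above; the values are mine): the guard is a temporary dropped at the end of the let statement, so a later lock() on the same thread does not deadlock.

use std::sync::Mutex;

fn main() {
    let my_mutex = Mutex::new(5u32);

    // The MutexGuard lives only for this statement; the lock is released
    // as soon as the let statement ends.
    let n: u32 = *my_mutex.lock().unwrap();
    println!("read {}", n);

    // Because the guard was already dropped, re-locking succeeds.
    *my_mutex.lock().unwrap() += 1;
    println!("now {:?}", my_mutex.lock().unwrap());
}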

So there are not necessarily any good rules to “parse” the first code as the second whilst not making other cases worse. Changing the rules for temporary scopes/lifetimes would be a breaking change anyways; whilst perhaps possible with editions, such a change would need even more convincing reasons.

One might wonder whether there isn’t a solution that doesn’t change any existing code’s behavior. After all, the original code doesn’t compile, so maybe the rules could change only for code that would otherwise fail to compile?

However, at least if the idea is to extend temporary lifetimes for code that would otherwise fail the borrow checker, it is not necessarily a good idea. Rust is currently designed so that the borrow checker, which is a complex beast that no Rust user usually has an entirely complete and accurate mental model of anyway, never influences program behavior. Instead it just acts as a verification tool whose only effect is that a program’s compilation can fail.

This approach has at least two great advantages. On one hand, an alternative Rust compiler may decide, to save a lot of work, to skip implementing the borrow checker (whilst of course then not being exactly safe to use). (E.g. see this example.) Another advantage is that Rust may improve its borrow checker in the future. The borrow checker can become smarter, allowing more programs to be correctly identified as memory safe. If program behavior depended on whether or not the borrow checker rejects another version of the program, then such improvements would be breaking changes; whereas with the borrow checker not influencing program behavior, compiler developers can improve Rust’s borrow checker as a completely backwards-compatible affair. One effort to improve the current borrow checker in the future can be found here. Improvements in the past also exist, most notably the so-called “non-lexical lifetimes” (NLL).

Edit: And a third advantage, which I’ve already hinted at implicitly: it frees the programmer from understanding all the intricacies and corner cases of borrow checking just to figure out what their program is doing.

2 Likes

I have been wondering whether or not it would make sense to start a topic here like "What are we struggling with in Rust this week?" I am also wondering if the maintainers of the Rust Book take cues from this forum - I think it might be possible to improve that first course by elaborating on some of the topics a bit more.

1 Like

I'm sorry to say, but Rust is not a hard language if you have sufficient experience with other technologies in general. I often see people who claim a lot of experience fail to organize code in such a way that's sensible, consistent, or in generally good style. That is not the fault of the language, but a sign of lack of actual architectural knowledge. Having memorized patterns in one language for years does not make one a better programmer unless it results in actual understanding of why such patterns work.

Rust usually exposes such architectural mistakes or bad practice early, as compile-time errors. The fact that other languages don't doesn't mean that such bad code is acceptable or correct in those languages; it merely means that the compile-time checking capabilities of such languages are weaker than those of Rust.

3 Likes

The compiler would not be guessing in this case. In this particular example it detects that temporary storage is needed with a lifetime spanning the whole function (see the original error message), and it can also easily verify that this storage is only needed for the duration of this function, right? The resulting code (as in the first, invalid snippet) looks less ugly/messy and more "ergonomic" to me than the valid code in the second snippet. So I don't quite see how you can say "that is simply not what the code means", since "what it means" should be determined by the whole function, not just by that line.

I.e., it's guessing.

No, that's explicitly not how it should work. Action at a distance leads to poor maintainability and poor readability. (And global analyses sometimes cause exponential compilation times, but that's beside the point.)