Returning strings: `String` vs. `&str`

For parameters, the type &str seems like an easy decision. When it comes to returning strings, is String usually the best choice (if we create something new)? Case in point: format!()

You cannot return a &str unless it borrows from something that was passed in. If you try, you are creating a reference to a variable local to the function, and that reference would become invalid once you return from the function. This is a hard compilation error.
You'd need to return a String.
Edit: The only &str you can return regardless of the inputs is a &'static str, such as a string literal.
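
To make the three cases concrete, here is a minimal sketch. The function names (`first_word`, `unit_label`, `greeting`) are made up for illustration and are not from any library.

```rust
// Borrows from the input: the returned slice is valid as long as `text` is.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// A string literal lives in the binary, so it can be returned as `&'static str`.
fn unit_label() -> &'static str {
    "seconds"
}

// Newly built text has no owner outside the function, so return it by value.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

// This does not compile: there is no input to borrow from,
// and `s` is dropped when the function returns.
// fn dangling() -> &str {
//     let s = String::from("temporary");
//     &s
// }

fn main() {
    println!("{}", first_word("hello world"));
    println!("{}", unit_label());
    println!("{}", greeting("Ferris"));
}
```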

Usually you return String because returning &str isn’t possible. If returning &str is possible in your use-case, return the &str instead. In cases where &str can commonly but not always be returned, there is the option to use Cow<'_, str>.

  • A &str return type to a function can only be a string that was already referenced (or part of something being referenced) as input to the function. A good example would be matching a string against a regex and returning the matching part: The returned &str is a part of the input! Or a function on a struct representing – say – a person, and returning their name. The name is already part of the input. (In both cases, I assume the input is a reference, too.)
  • Typical cases where only String is possible: If the data in the string is generated by the function. E.g. a function that combines two separate strings. Or a function that generates a string containing 100 'a's…
  • A standard library example where Cow<'_, str> is returned is String::from_utf8_lossy. Here, the common case that the input &[u8] contains valid UTF-8 allows us to simply re-interpret the &[u8] as &str; in the other case, invalid byte sequences have to be replaced with '�' (U+FFFD), so the returned string is inherently newly generated, and thus a String. Cow<'_, str> is an enum with a Borrowed variant holding a &str and an Owned variant holding a String.
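
A small sketch of the Cow case, assuming a made-up helper `expand_tabs` that only allocates when it actually has to change the text. The `String::from_utf8_lossy` calls in `main` are the real standard library function mentioned above.

```rust
use std::borrow::Cow;

// Hypothetical helper: replaces each tab with four spaces.
// Input without tabs is handed back as a borrow, with no allocation.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    "))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    println!("{}", expand_tabs("no tabs here"));
    println!("{}", expand_tabs("a\tb"));

    // Valid UTF-8 is borrowed; invalid bytes force a new String.
    let valid = String::from_utf8_lossy(b"hello");
    let invalid = String::from_utf8_lossy(b"hi\xFF");
    println!("{} (borrowed: {})", valid, matches!(valid, Cow::Borrowed(_)));
    println!("{} (borrowed: {})", invalid, matches!(invalid, Cow::Borrowed(_)));
}
```

A caller who needs ownership can call `.into_owned()`, which only allocates in the borrowed case.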

I just thought of another alternative return type: some type that implements Display. This is sometimes used because creating a String is expensive, and the caller might not actually want to own the returned string directly; instead they might want to make it part of some other, even larger String, or write it directly to some output or a file. A struct implementing Display can be turned into a String, but the struct itself just gathers the information that the String would be based on. Its trait implementation contains the code that turns that information into string data, written so that an arbitrary buffer can be used, which saves allocations.
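
As a rough sketch of that idea: the type `Repeated` and the function `repeated` below are invented for illustration. The struct only stores what it needs, and the formatting happens when something asks for it.

```rust
use std::fmt;
use std::fmt::Write as _; // brings `write!` on `String` into scope

// Hypothetical type: stores the inputs, produces no text until formatted.
struct Repeated<'a> {
    piece: &'a str,
    times: usize,
}

impl fmt::Display for Repeated<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Writes straight into whatever buffer the caller is formatting into.
        for _ in 0..self.times {
            f.write_str(self.piece)?;
        }
        Ok(())
    }
}

fn repeated(piece: &str, times: usize) -> Repeated<'_> {
    Repeated { piece, times }
}

fn main() {
    // Append to an existing buffer without building an intermediate String.
    let mut out = String::from("line: ");
    write!(out, "{}", repeated("ab", 3)).unwrap();
    println!("{}", out);

    // A caller who does want a String can still get one.
    let owned: String = repeated("ab", 3).to_string();
    println!("{}", owned);
}
```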

One example in the standard library I can think of is Path::display. Another example (not from the standard library), off the top of my head, is Itertools::format, which acts as a more efficient and more customizable alternative to the otherwise quite similar Itertools::join (for full customization there’s also format_with).

Searching through std, I could also find char::to_lowercase (and its uppercase counterpart) which has a return type which is both an iterator of chars and a type implementing Display (and thus also ToString) simultaneously.
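
The two standard library examples can be tried out like this. The wrapper functions `describe` and `upper` are made up for the sketch; `Path::display`, `char::to_lowercase` and `char::to_uppercase` are the real std methods.

```rust
use std::path::Path;

// `Path::display` returns a value implementing Display, not a String.
fn describe(path: &Path) -> String {
    format!("reading {}", path.display())
}

// The uppercase mapping can yield more than one char, hence String.
fn upper(c: char) -> String {
    c.to_uppercase().to_string()
}

fn main() {
    println!("{}", describe(Path::new("data/input.txt")));

    // Used as an iterator of chars...
    for c in 'Q'.to_lowercase() {
        print!("{}", c);
    }
    println!();

    // ...or printed directly through its Display implementation.
    println!("{}", 'Q'.to_lowercase());
    println!("{}", upper('ß'));
}
```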

And then there’s one more and quite large class of such impl Display types: Errors! Since errors are usually one of finitely many cases, they commonly use enums, at least internally. Though their main use case is being printed, so logically, they’re fairly similar to simple String error messages. But by being an enum (or enum-containing struct) with a Display implementation instead, they can also be a bit more efficient.
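
A minimal sketch of such an error type. `ConfigError` and `check_port` are hypothetical names chosen for the example. The message text is only produced when the error is actually displayed.

```rust
use std::fmt;

// Hypothetical error enum: cheap to create, no String allocated up front.
#[derive(Debug)]
enum ConfigError {
    MissingKey(&'static str),
    InvalidPort(u32),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingKey(key) => write!(f, "missing key `{}`", key),
            ConfigError::InvalidPort(port) => write!(f, "port {} is out of range", port),
        }
    }
}

impl std::error::Error for ConfigError {}

// Hypothetical validation function returning the enum instead of a String message.
fn check_port(port: u32) -> Result<u16, ConfigError> {
    if port <= u32::from(u16::MAX) {
        Ok(port as u16)
    } else {
        Err(ConfigError::InvalidPort(port))
    }
}

fn main() {
    match check_port(70_000) {
        Ok(port) => println!("using port {}", port),
        Err(err) => println!("error: {}", err),
    }
    println!("error: {}", ConfigError::MissingKey("host"));
}
```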

Very helpful, thanks!

On the parameter side: if you need to take ownership of the input, String can be a better choice. If the caller already has a String they don't need to hold on to, they can give it to you. If they don't, they'll have to clone or create one. But if you take a &str and need an owned value, you'll always have to allocate a new String, even when the caller had one to spare.

Alternatively, take a generic S: ToString or S: Into<String>.

Cow<'static, str> (or S: Into<Cow<'static, str>>) is another possibility.
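
Here is a sketch of both parameter styles, assuming made-up types `User` and `Label`. The `Into` conversions used are the ones the standard library provides for &str and String.

```rust
use std::borrow::Cow;

// Hypothetical type that needs to own its name.
struct User {
    name: String,
}

impl User {
    // Accepts &str (allocates) or String (moved in, no extra allocation).
    fn new(name: impl Into<String>) -> Self {
        User { name: name.into() }
    }
}

// Hypothetical type whose text is often a literal, sometimes built at runtime.
struct Label {
    text: Cow<'static, str>,
}

impl Label {
    // A `&'static str` is stored as a borrow; a String is stored as owned.
    fn new(text: impl Into<Cow<'static, str>>) -> Self {
        Label { text: text.into() }
    }
}

fn main() {
    let a = User::new("alice");
    let b = User::new(String::from("bob"));
    println!("{} {}", a.name, b.name);

    let fixed = Label::new("OK");
    let dynamic = Label::new(format!("{} items", 3));
    println!("{} / {}", fixed.text, dynamic.text);
}
```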

Here's my Rust 101 rules for this:

  1. "&str for parameters; String for return types". This always works, and has no borrowck complications, but is sometimes inefficient.

  2. A small tweak to add: "If you always need to to_owned/to_string the &str parameter, change it to a String parameter". That's more efficient when your caller happens to already have a String, no less efficient if they don't, and still doesn't require thinking about lifetimes.

  3. if you're always returning a substring of one of the arguments, add a lifetime generic and return a &'a str, updating the corresponding parameter from &str to &'a str as well.

That covers the vast majority of cases, but there are of course more complicated scenarios that need something fancier.
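
The three rules above can be sketched as one function each. The function names are invented for illustration.

```rust
// Rule 1: &str in, String out. Always works, may allocate more than needed.
fn loud(text: &str) -> String {
    text.to_uppercase()
}

// Rule 2: the function needs an owned String anyway, so take one.
// A caller that already has a String moves it in without cloning.
fn with_suffix(mut base: String, suffix: &str) -> String {
    base.push_str(suffix);
    base
}

// Rule 3: the result is always a substring of `url`, so tie the
// return lifetime to that parameter only.
fn strip_scheme<'a>(url: &'a str, scheme: &str) -> &'a str {
    url.strip_prefix(scheme).unwrap_or(url)
}

fn main() {
    println!("{}", loud("hello"));
    println!("{}", with_suffix(String::from("report"), ".txt"));
    println!("{}", strip_scheme("https://example.com", "https://"));
}
```

In `strip_scheme`, leaving `scheme` without the `'a` lifetime means the caller can pass a short-lived scheme string and still keep the result for as long as `url` lives.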
