Idiomatic string parameter types: &str vs AsRef<str> vs Into<String>


#1

I want to accept a string that will be stored (owned) in a struct. There are a few options for what could be passed:

  1. &str
  2. String
  3. T: Into<String>
  4. T: AsRef<str>

Illustration:

pub struct Thing {
    name: String,
}

impl Thing {
    pub fn new(name: WhatTypeHere) -> Thing {
        Thing { name: name.some_conversion() }
}

Which is the idiomatic type to pass here? I was having a discussion on a PR where I tried to change from Into<String> to &str, but ended up dropping it.

Here’s what I think is relevant:

  1. &str: caller must take a reference if passing String; callee controls allocation / copy
  2. String: caller must convert to String if passing &str; caller controls allocation / copy
  3. T: Into<String>: caller can pass &str or String; if allocation/copy is necessary, it occurs at conversion time in callee
  4. T: AsRef<str>: caller can pass &str or String; callee controls if and when allocation / copy occurs
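A minimal sketch of the four options side by side, assuming the Thing struct from the illustration above. The constructor names (from_str_ref, from_string, from_into, from_as_ref) are hypothetical, chosen only so all four can coexist in one impl:

```rust
pub struct Thing {
    name: String,
}

impl Thing {
    // 1. &str: the callee always allocates a fresh String.
    pub fn from_str_ref(name: &str) -> Thing {
        Thing { name: name.to_owned() }
    }

    // 2. String: the caller hands over ownership; no copy happens here.
    pub fn from_string(name: String) -> Thing {
        Thing { name }
    }

    // 3. Into<String>: a String is moved, a &str is copied inside `into()`.
    pub fn from_into<T: Into<String>>(name: T) -> Thing {
        Thing { name: name.into() }
    }

    // 4. AsRef<str>: only a borrow is reachable, so the callee must copy.
    pub fn from_as_ref<T: AsRef<str>>(name: T) -> Thing {
        Thing { name: name.as_ref().to_owned() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

fn main() {
    let owned = String::from("owned");
    let a = Thing::from_str_ref(&owned); // caller borrows
    let b = Thing::from_string("literal".to_string()); // caller converts
    let c = Thing::from_into("literal"); // &str accepted directly
    let d = Thing::from_into(owned.clone()); // String moved in
    let e = Thing::from_as_ref(&owned); // &String works via AsRef<str>
    for t in [&a, &b, &c, &d, &e] {
        println!("{}", t.name());
    }
}
```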

I think either &str (1) or T: AsRef<str> (4) is the way to go. I once wrote a footnote in a blog post suggesting T: AsRef<str>, but got a ping from someone more experienced pointing out that accepting &str was more idiomatic. I wish I could remember who that was, but in any case I made the change.

So, what’s the idiomatic thing to do here?


#2

There’s also T: Into<Cow<'a, str>>, which can help avoid allocations in the method impl.
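A rough sketch of what that can look like, assuming a hypothetical detab helper: it accepts either a &str or a String through Into<Cow<'a, str>>, and only allocates when it actually has to modify the text.

```rust
use std::borrow::Cow;

// Hypothetical helper: replace tabs with four spaces, allocating only
// when the input actually contains a tab.
fn detab<'a, S: Into<Cow<'a, str>>>(input: S) -> Cow<'a, str> {
    let input = input.into();
    if input.contains('\t') {
        // Needs modification: build a new owned String.
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // Already fine: return what we were given, borrowed or owned.
        input
    }
}

fn main() {
    let clean = detab("no tabs here"); // stays borrowed, no allocation
    let dirty = detab(String::from("a\tb")); // rewritten into a new String
    println!("{:?} {:?}", clean, dirty);
}
```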


#3

It might be interesting to view this from a different angle: What types can users pass to Thing::new?

I want to accept a string that will be stored (owned) in a struct.

I would assume they should at least be able to pass in a String. &str might be nice as well, but it would mean that you need to do an allocation. And possibly Cow<str>. Anything else you had in mind?

I’d use Into<String> to make it nice for users: https://play.rust-lang.org/?gist=360e78818c397993ea117dce8e23d429&version=nightly&backtrace=0

BTW, given a T: AsRef<str>, how would you convert that to a String? I can only see x.as_ref().to_string() (which screams allocation), but I’m kinda distracted and could be missing something.
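One way to check this is to compare heap pointers. In the sketch below (via_as_ref and via_into are hypothetical names), the AsRef<str> path has to copy even when the caller passes a String, while Into<String> on a String is the identity conversion and keeps the same buffer:

```rust
// Only a &str is reachable through AsRef<str>, so this always copies.
fn via_as_ref<T: AsRef<str>>(s: T) -> String {
    s.as_ref().to_owned()
}

// For a String argument, `into()` is the identity conversion: no copy.
fn via_into<T: Into<String>>(s: T) -> String {
    s.into()
}

fn main() {
    let a = String::from("hello");
    let a_ptr = a.as_ptr();
    let copied = via_as_ref(a);
    println!("AsRef reused buffer: {}", copied.as_ptr() == a_ptr);

    let b = String::from("hello");
    let b_ptr = b.as_ptr();
    let moved = via_into(b);
    println!("Into reused buffer: {}", moved.as_ptr() == b_ptr);
}
```

Other spellings such as String::from(x.as_ref()) or x.as_ref().into() allocate just the same.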


#4

There are two patterns here:

  • (4) is a generalization of (1), passing by reference
  • (3) is a generalization of (2), passing by value/move (i.e. passing ownership)

I’d suggest the exact opposite.
In the end, you need an owned string, so passing ownership is more flexible for the caller.

With (1) and (4), you always go through a reference. This means that you will always have to allocate a new String.

With (2) and (3) however, the caller knows that you will take ownership and can pass you any owned String. It’s often the case that the caller has a temporary String lying around anyway that can just be moved.

My personal rules are:

  • If the function always takes ownership, pass by value.
  • If the function never takes ownership, pass by reference.
  • If the function sometimes takes ownership and sometimes not, use a Cow.
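A small sketch of the three rules, using hypothetical functions (make_user, greeting, keep_if_long). For the Cow case, into_owned() is only called on the path that keeps the value: it moves an Owned string and copies a Borrowed one.

```rust
use std::borrow::Cow;

struct User {
    name: String,
}

// Always takes ownership: pass by value.
fn make_user(name: String) -> User {
    User { name }
}

// Never takes ownership: pass by reference.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

// Sometimes takes ownership: accept a Cow and only convert when keeping it.
fn keep_if_long(list: &mut Vec<String>, word: Cow<'_, str>) {
    if word.len() > 3 {
        list.push(word.into_owned());
    }
}

fn main() {
    let user = make_user(String::from("ferris"));
    println!("{}", greeting(&user.name));

    let mut kept = Vec::new();
    keep_if_long(&mut kept, Cow::Borrowed("hi")); // rejected, nothing allocated
    keep_if_long(&mut kept, Cow::Owned(String::from("hello"))); // moved in
    println!("{:?}", kept);
}
```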

#5

Ah, this is a winning argument, thanks!

In fact, I will take a look at changing it to Cow<'static, str>.


#6

So you’ll use Into<Cow<'static, str>>?
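If that is the plan, a sketch might look like the following, assuming the name is usually a string literal. A &'static str is stored borrowed with no allocation, and a String is moved in; is_borrowed is a hypothetical helper added only to make the stored variant visible.

```rust
use std::borrow::Cow;

pub struct Thing {
    name: Cow<'static, str>,
}

impl Thing {
    // Accepts &'static str (borrowed, no allocation) or String (moved in).
    pub fn new<N: Into<Cow<'static, str>>>(name: N) -> Thing {
        Thing { name: name.into() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    // Reports which Cow variant was stored.
    pub fn is_borrowed(&self) -> bool {
        matches!(self.name, Cow::Borrowed(_))
    }
}

fn main() {
    let fixed = Thing::new("static-name");
    let dynamic = Thing::new(format!("worker-{}", 7));
    println!("{} {}", fixed.name(), fixed.is_borrowed());
    println!("{} {}", dynamic.name(), dynamic.is_borrowed());
}
```

One consequence of the 'static bound: a borrowed local String cannot be passed directly, so the caller has to hand over an owned value (for example s.clone()) instead.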


#7

I don’t think there’s a one-size-fits-all answer here. I personally like &str for its simplicity, and I’ll use it anywhere I can, even if I end up converting it to a String. Some considerations:

  1. Am I trying to write library code that is allocation free? Then the above doesn’t work.
  2. Is the allocation of the String an expensive operation compared to the rest of the work being done? If so, then the above doesn’t work.

The regex crate, for example, is not allocation free, and the computation and memory required for compiling a regex dwarf the overhead of creating a copy of the pattern string itself. In this case, that copy is marginal and nearly immeasurable, so it’s not worth it (IMO) to complicate the type signature of Regex::new.


#8

How does String complicate the type signature any more than &str?


#9

It doesn’t, but now it’s less ergonomic: e.g., Regex::new("pattern".to_string()) instead of Regex::new("pattern").


#10

I’ll see what it looks like, at least. 🙂


#11

This is a good point. Perhaps Into<String> is enough for this use case, as it’s more or less only used at construction time for long-lived structs that are rarely constructed after startup.


#12

Thanks all for this thread, it’s helped clarify some of the tradeoffs here!


#13

Not sure if this argument has been mentioned, but I personally use Into<String> for cases where I end up taking ownership of a parameter, and AsRef<str> for other cases.

E.g.:

struct X {
    name: String,
}

impl X {
    fn new<T: Into<String>>(name: T) -> X {
        X { name: name.into() }
    }
}

fn get_chars_stat<T: AsRef<str>>(x: T) -> CharsStat {
    // ...
}

The argument is simple: you will need an owned string anyway in the first case, and you will never need an owned string in the second.
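A runnable version of the AsRef<str> half, where the fields of CharsStat are hypothetical (the post leaves the type and body elided). The function only ever borrows, so &str, &String and String are all accepted without any copy:

```rust
// Hypothetical stand-in for the CharsStat type in the post above.
#[derive(Debug, PartialEq)]
struct CharsStat {
    chars: usize,
    whitespace: usize,
}

// Never needs ownership: AsRef<str> borrows whatever the caller passes.
fn get_chars_stat<T: AsRef<str>>(x: T) -> CharsStat {
    let s = x.as_ref();
    CharsStat {
        chars: s.chars().count(),
        whitespace: s.chars().filter(|c| c.is_whitespace()).count(),
    }
}

fn main() {
    let owned = String::from("a b c");
    println!("{:?}", get_chars_stat("a b c")); // &str
    println!("{:?}", get_chars_stat(&owned)); // &String
    println!("{:?}", get_chars_stat(owned)); // String
}
```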


#14

A third option (and an alternative to Cow) is T: Into<String> + AsRef<str>. The former alone is best if you will always take ownership, and the latter alone if you never will; the combination lets you avoid taking ownership whenever possible, and when ownership is needed and the caller passed a String, it is simply moved.
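A sketch of the combined bound, using a hypothetical insert_if_absent helper: it inspects the value through AsRef<str> and only calls into() when the value is actually stored, so a duplicate never allocates and a passed-in String is moved rather than copied.

```rust
use std::collections::HashSet;

fn insert_if_absent<T>(set: &mut HashSet<String>, value: T) -> bool
where
    T: Into<String> + AsRef<str>,
{
    if set.contains(value.as_ref()) {
        false // already present: nothing moved, nothing allocated
    } else {
        set.insert(value.into()); // a String is moved, a &str is copied once
        true
    }
}

fn main() {
    let mut names = HashSet::new();
    insert_if_absent(&mut names, "ferris"); // &str: copied on insert
    insert_if_absent(&mut names, String::from("corro")); // String: moved
    insert_if_absent(&mut names, "ferris"); // duplicate: no allocation
    println!("{} names", names.len());
}
```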


#15

What happens if the passed-in String has a lot of excess capacity?


#16

That remains unless the caller or callee uses shrink_to_fit.
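A small sketch of both sides of that answer, with a hypothetical Thing. Moving a String keeps its entire allocation, while a callee that cares can call shrink_to_fit after taking ownership; the resulting capacity is only guaranteed to be at least the length, with the exact value left to the allocator.

```rust
struct Thing {
    name: String,
}

impl Thing {
    // Keeps whatever capacity the caller's String had.
    fn new<T: Into<String>>(name: T) -> Thing {
        Thing { name: name.into() }
    }

    // Callee-side option: trim the buffer after taking ownership.
    fn new_compact<T: Into<String>>(name: T) -> Thing {
        let mut name = name.into();
        name.shrink_to_fit();
        Thing { name }
    }
}

fn main() {
    let mut buf = String::with_capacity(1024);
    buf.push_str("short");
    let kept = Thing::new(buf);
    println!("len {}, capacity {}", kept.name.len(), kept.name.capacity());

    let mut buf = String::with_capacity(1024);
    buf.push_str("short");
    let compact = Thing::new_compact(buf);
    println!("len {}, capacity {}", compact.name.len(), compact.name.capacity());
}
```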
IMO it’s ok to leave this to the caller.


#17

We’ll see. The Java situation was slightly different (substring used to return a string sharing the original’s backing array, which kept the whole original alive), and it did cause a lot of confusion (libraries had to add defensive copies). Excess capacity in a non-shared string is a different case, though, and its impact will differ.


#18

I used to do this as well, but I recently switched to being more explicit about ownership: I pass &str (or &T for anything, really) only when the function will not clone that data. This way callers understand that they are responsible for allocating or cloning where necessary. I hear you on the ergonomics, though…

What it prevents, though, are chains of calls where a &str is passed in, converted to a String, passed on as a &str again, and then allocated into yet another String.
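A sketch of that chain with hypothetical two-layer functions. The borrowing version allocates twice and never reuses the caller’s buffer; the owning version modifies and moves one buffer all the way down.

```rust
// Borrowing all the way down: each layer that needs ownership copies again.
fn save_ref(name: &str) -> String {
    name.to_owned() // second allocation
}

fn register_ref(name: &str) -> String {
    let mut owned = name.to_owned(); // first allocation
    owned.make_ascii_lowercase();
    save_ref(&owned) // passes a &str on, so it gets copied once more
}

// Owning all the way down: the same buffer is moved through each layer.
fn save_owned(name: String) -> String {
    name
}

fn register_owned(mut name: String) -> String {
    name.make_ascii_lowercase(); // in place, no new allocation
    save_owned(name)
}

fn main() {
    let a = String::from("Ferris");
    let a_ptr = a.as_ptr();
    let r = register_ref(&a);
    println!("{} reused caller buffer: {}", r, r.as_ptr() == a_ptr);

    let b = String::from("Ferris");
    let b_ptr = b.as_ptr();
    let o = register_owned(b);
    println!("{} reused caller buffer: {}", o, o.as_ptr() == b_ptr);
}
```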


#19

Right. I think I brought up that point a little later in my comment. In particular, if the allocation matters, then I agree &str is probably a bad choice. But if the extra allocation is an order of magnitude cheaper than the allocation you’ll be doing anyway, then the extra clone probably doesn’t matter, and it’s better to take the ergonomic win.