Best practices for string argument types

I'd like to understand better how to make my function APIs which take string arguments as user-friendly as possible.

The standard advice for filepath arguments is to prefer AsRef<Path>, allowing the caller to provide an argument of any type which has implemented AsRef<Path> — most notably, String, &str, PathBuf, OsString and OsStr.

It seems to me that there's a caveat which isn't always passed along: if the function needs to take ownership of the path argument, that should be clear from the function signature, as advocated in the Rust API Guideline C-CALLER-CONTROL. Don't take a reference argument and then clone(), since that hides the allocation cost. I think that means using either Into<PathBuf> or Into<Box<Path>>.
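To make the ownership caveat concrete, here is a minimal sketch. The `Config` type and its method names are hypothetical, invented only for illustration. The first constructor shows ownership in its signature; the second looks like a cheap borrow but always allocates.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical type that stores a path, so it needs an owned `PathBuf`.
pub struct Config {
    path: PathBuf,
}

impl Config {
    /// Ownership is visible in the signature: the caller can move an
    /// existing `PathBuf` or `String` in without an extra copy.
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Config { path: path.into() }
    }

    /// Discouraged variant: the signature suggests a borrow, but the body
    /// always allocates a fresh `PathBuf`.
    pub fn new_hidden_alloc(path: impl AsRef<Path>) -> Self {
        Config { path: path.as_ref().to_path_buf() }
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}
```

With `Config::new`, a caller who already holds a `PathBuf` hands it over as-is, while a caller holding only a `&str` pays for the allocation knowingly.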

In any case, I would like to be similarly liberal in accepting ordinary string arguments, which might imply accepting AsRef<str> for reference arguments and Into<String> for owned arguments.

But is there really any compelling reason to take AsRef<str> and Into<String> instead of just &str and String?

Drawbacks of AsRef<str> and Into<String>

  • Using generics adds some clutter to the function signature, as lamented in the Rust API Guideline C-GENERIC. (It also adds a bit of noise to the function body as you need to call as_ref().)
  • If you have multiple arguments, you need to give each of them its own type parameter, because otherwise they are constrained to be the same type — which can cause some baffling errors at call sites.

It was actually difficulty debugging the latter problem which inspired me to write up this post. (playground)
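A small sketch of the shared-type-parameter problem, using hypothetical function names. With a single parameter `S`, both arguments must have the same concrete type; giving each argument its own parameter lifts that restriction.

```rust
/// One shared type parameter: both arguments must have the same type.
fn join_shared<S: AsRef<str>>(a: S, b: S) -> String {
    format!("{}{}", a.as_ref(), b.as_ref())
}

/// One type parameter per argument: each argument may have its own type.
fn join_separate<A: AsRef<str>, B: AsRef<str>>(a: A, b: B) -> String {
    format!("{}{}", a.as_ref(), b.as_ref())
}

/// Hypothetical call sites showing which combinations compile.
fn demo_shared_param() -> (String, String) {
    let owned = String::from("foo");
    // join_shared(owned.clone(), "bar"); // does not compile: mismatched types
    let same = join_shared("foo", "bar"); // both arguments are &str
    let mixed = join_separate(owned, "bar"); // String and &str together
    (same, mixed)
}
```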

Advantages of AsRef<str> and Into<String>

  • In certain cases, the caller is not obligated to invoke as_ref() or into() at the call site — those calls move into the callee's function body, making the API more flexible and easier to use. But for string arguments, although that applies for Into<String> (playground), it doesn't apply to AsRef<str> because you can just prepend an ampersand on a String to pass it to a function expecting &str. (playground)
  • This is trivial, and arguably not an advantage but a drawback: The caller can leave off the ampersand when passing a String to a function which takes AsRef<str>, whereas the ampersand is required for functions which take &str. The quirk is that if you leave off the ampersand, your invocation has move semantics and your String gets consumed for no good reason.
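The move-semantics quirk from the last bullet can be sketched as follows. The function names are hypothetical; the point is only that passing a String by value to an AsRef<str> function compiles but consumes it.

```rust
fn len_generic(s: impl AsRef<str>) -> usize {
    s.as_ref().len()
}

fn len_plain(s: &str) -> usize {
    s.len()
}

fn demo_move() -> (usize, usize, usize) {
    let name = String::from("ferris");
    let a = len_plain(&name); // &String coerces to &str
    let b = len_generic(&name); // &String also implements AsRef<str>
    let c = len_generic(name); // compiles, but moves `name`
    // len_plain(&name); // would not compile: `name` was moved above
    (a, b, c)
}
```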

Summing Up

  • I wish there was a best practices document which provided easy-to-follow, definitive guidance about how to design APIs for functions which accept string arguments. I understand why one size may not fit all, but I wish it was easier to learn how to avoid sub-optimal API design.
  • It might also be nice to have a similar resource for filepath arguments.
  • The AsRef documentation uses AsRef<str> as an example. But should it really, since AsRef<str> seems worse than useless?

I personally think that &str is cleaner than AsRef<str>, and I agree with the observation that the latter has no real advantages. Reading the point about a String being consumed when the ampersand is left off makes me wonder whether APIs such as fs::read_to_string or File::open would've been better if they accepted <P: ?Sized + AsRef<Path>>(path: &P) instead of <P: AsRef<Path>>(path: P), because this would make callers more aware of when they're passing an owned value, perhaps even an explicitly cloned one [1], when it's unnecessary. (Of course changing them now is not an option.)

[1] E.g. I can totally imagine someone having a local String variable and passing it to File::open as a path. Later, they add more code that re-uses the same string, resulting in a "value used after it has been moved" error, which then quickly gets "fixed" by adding a clone() instead of just taking a reference, which would've been the reasonable approach.
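For illustration, a sketch of what such a by-reference signature looks like on a hypothetical helper (this is not how the std functions are declared). Because the parameter is `&P`, an owned String cannot be passed by accident.

```rust
use std::path::Path;

/// Hypothetical helper using the by-reference signature discussed above.
fn file_name_of<P: ?Sized + AsRef<Path>>(path: &P) -> Option<String> {
    path.as_ref()
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
}

fn demo_by_ref() -> (Option<String>, Option<String>) {
    let owned = String::from("/tmp/notes.txt");
    // file_name_of(owned); // does not compile: a reference is expected
    let first = file_name_of(&owned); // P = String
    let second = file_name_of("/tmp/notes.txt"); // P = str, allowed by ?Sized
    (first, second) // `owned` was only borrowed and is still usable here
}
```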

Great observation. If further discussion here indeed concludes that AsRef<str> is a bad practice, make sure to open an issue about this documentation being suboptimal; or perhaps even a PR, with the goal to replace it with a more realistic use-case e.g. involving AsRef<Path> - or, if that's considered too complex of an example (because Path might be unfamiliar to the reader), at least a note should be added that &str-argument API can also be called with &String without need for generic AsRef arguments.


I also prefer &str directly

You can do impl AsRef<str> to get around this:

fn foo(s: impl AsRef<str>) { /* ... */ }

Though, getting rid of one drawback, I must add another to keep things balanced! Generics are guaranteed to be monomorphized, so there ends up being a copy of the function for every combination of types used to call it.
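One common way to limit that duplication, offered here as a sketch rather than something proposed in this thread, is to keep the generic function as a thin shim that forwards to a non-generic inner function. The name `count_words` is hypothetical.

```rust
/// Thin generic wrapper: only this small shim is duplicated per caller type.
pub fn count_words(s: impl AsRef<str>) -> usize {
    // The real work lives in a non-generic function that is compiled once.
    fn inner(s: &str) -> usize {
        s.split_whitespace().count()
    }
    inner(s.as_ref())
}
```

This keeps the flexible call site while most of the body is compiled a single time.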


For a top-level argument type, AsRef<str> is so useless that I'd like Clippy to lint against ever using it. &str is simpler, doesn't cause generics bloat, and essentially does the exact same thing, only on the caller side (where it's cheap to do).

However, AsRef<str> may be useful in nested types. Consider:

fn take_slice(strings: &[impl AsRef<str>])

There's no cheap way to convert &[String] to &[&str], so in this case the abstraction helps.
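A minimal sketch of the nested case, with a hypothetical function name and body. The same function accepts a slice of String and a slice of &str without the caller building a new collection.

```rust
/// Sums the byte lengths of all strings in the slice.
fn total_len(strings: &[impl AsRef<str>]) -> usize {
    strings.iter().map(|s| s.as_ref().len()).sum()
}

fn demo_slices() -> (usize, usize) {
    let owned: Vec<String> = vec![String::from("ab"), String::from("cde")];
    let borrowed: [&str; 2] = ["ab", "cde"];
    (total_len(&owned), total_len(&borrowed))
}
```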


This is a great idea

Generally just take String, &mut String, or &str. The extra compile-time hit of the other options isn't really worth it much of the time, IMHO. It's just not that hard for the caller to do foo(&s) instead of foo(s).

I'd rather use lints for that, personally. I wish it was more generally allowed to give away ownership if you don't care -- so long as foo(my_string) makes it inaccessible, I don't really care if it happened to only need &str. (Assuming good warnings for "this was moved here, but you could borrow it instead" if I try to re-use it later.)


Personally, I wouldn't like that. I'm pretty annoyed by 3rd-party crates that do this, too. (The same holds for Read/Write or Serialize/Deserialize etc. impls as well.) The non-reference signature is strictly more general, and I don't see any reason to prevent passing owned values. It basically only causes unneeded friction and an extra debug-recompile cycle or two.


Thanks, everyone! There seems to be consensus about preferring &str and avoiding AsRef<str> in most cases, which was the most confusing issue. With that out of the way, things start to fall into place and it's easier to reason about tradeoffs.

My sense is that the default types for top-level arguments should be:

  • &str for immutably borrowed strings
  • String for owned strings, or Into<String> when ergonomically justified
  • AsRef<Path> for immutably borrowed filepaths
  • Into<PathBuf> for owned filepaths

In addition, although mutably borrowed string arguments are not very common, for the sake of completeness these seem like decent defaults:

  • &mut String for mutably borrowed strings
  • &mut PathBuf for mutably borrowed filepaths
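The mutable-borrow defaults could look like this in practice. Both functions are hypothetical examples; they modify the caller's buffer in place instead of returning a new value.

```rust
use std::path::PathBuf;

/// Appends a newline to the caller's String if it does not end with one.
fn ensure_trailing_newline(text: &mut String) {
    if !text.ends_with('\n') {
        text.push('\n');
    }
}

/// Extends the caller's PathBuf with a "logs" component.
fn push_log_dir(path: &mut PathBuf) {
    path.push("logs");
}
```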

With regard to preferring Into<String> over String when ergonomically justified: Sometimes it's nice to spare the user from having to call to_string/to_owned/into. For instance, if builder-pattern methods take Into<String>, you get client code like this...

let thing = ThingBuilder::new()
    .foo("hello world")
    .bar("greetings, earthlings")
    .build();

... instead of like this:

let thing = ThingBuilder::new()
    .foo("hello world".to_string())
    .bar("greetings, earthlings".to_string())
    .build();
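For reference, a minimal sketch of a builder that supports the first call style. The fields and the `Thing` type are assumptions made up for this example; only the method names come from the snippets above.

```rust
/// Hypothetical product type with two owned string fields.
#[derive(Debug, PartialEq)]
pub struct Thing {
    pub foo: String,
    pub bar: String,
}

/// Hypothetical builder whose setters accept anything convertible to String.
#[derive(Default)]
pub struct ThingBuilder {
    foo: String,
    bar: String,
}

impl ThingBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn foo(mut self, value: impl Into<String>) -> Self {
        self.foo = value.into();
        self
    }

    pub fn bar(mut self, value: impl Into<String>) -> Self {
        self.bar = value.into();
        self
    }

    pub fn build(self) -> Thing {
        Thing { foo: self.foo, bar: self.bar }
    }
}
```

A caller who already owns a String can still pass it directly, and it is moved in without an extra allocation.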

With regard to taking Into<PathBuf> for owned filepaths: It looks like String, &str, OsString, &OsStr, and &Path all implement it, so the same rationale applies as with AsRef<Path>. (playground)
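A quick sketch that exercises those conversions through a hypothetical `own_path` function. Note that the unsized types are passed by reference (&OsStr, &Path), since Into<PathBuf> is implemented for the references.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Hypothetical function that needs an owned path.
fn own_path(path: impl Into<PathBuf>) -> PathBuf {
    path.into()
}

fn demo_into_pathbuf() -> Vec<PathBuf> {
    vec![
        own_path(String::from("a.txt")),
        own_path("a.txt"),
        own_path(OsString::from("a.txt")),
        own_path(OsStr::new("a.txt")),
        own_path(Path::new("a.txt")),
        own_path(PathBuf::from("a.txt")),
    ]
}
```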

There may also be circumstances where the optimal argument type is Cow, OsString, Box<String>, or something else, but that's outside the scope of this inquiry.

