Owned and borrowed types when performance is less important

Let's talk about those types that implement ToOwned, although I'll use str and String as my example. Also, I assume here that values are not modified anymore after construction (this is a common pattern), so it really is only about ownership and I could equally well use Box<str> instead of String here.

When writing methods, I usually take Into<String> when I need to own the data somewhere in the method, and AsRef<str> otherwise. After doing this in a larger project for a while, I noticed the following issues:

  • When I change the code so that it does not require owned data anymore (or the other way around), I usually don't change the type signature. Mostly because I forget/don't think about it (it is really subtle), but sometimes this would also be a breaking API change.
  • I end up with a method that takes an AsRef<str>, which calls one that wants an Into<String>, which in turn calls one that wants an AsRef<str>. There are a lot of unnecessary clones along the way, and the ownership model is generally all over the place.
  • This adds inconsistency across my APIs.
  • More importantly though, it increases my mental load while programming, because I have to think about how the value might be used in the future.
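To make the second point concrete, here is a minimal sketch of such a call chain. All function names are made up for illustration; only the shape of the signatures matters. The caller hands over a String, but because the outer function only asks for AsRef<str>, the inner function has to allocate a fresh copy anyway.

```rust
// Hypothetical functions; only the shape of the call chain matters.

// Only needs to read the text, so it asks for AsRef<str>.
fn log_name(name: impl AsRef<str>) -> usize {
    name.as_ref().len()
}

// Needs ownership, so it asks for Into<String>.
fn store_name(name: impl Into<String>) -> String {
    let name: String = name.into();
    log_name(&name); // borrows again further down
    name
}

// Borrows at the top, although something further down wants to own.
fn register(name: impl AsRef<str>) -> String {
    // Going from &str to String always allocates a fresh copy,
    // even if the caller passed in an owned String.
    store_name(name.as_ref())
}

fn main() {
    let original = String::from("alice");
    let original_ptr = original.as_ptr();

    // `original` is moved into `register`, but its buffer is not reused.
    let stored = register(original);

    assert_eq!(stored, "alice");
    assert_ne!(original_ptr, stored.as_ptr());
}
```

Changing register to take Into<String> would avoid the copy, but that is exactly the kind of signature change that is easy to forget.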

Especially when performance/memory is not really critical, that mental load is more of a burden than a help. Of course I could also always own all values and liberally clone them, but I am looking for a better compromise between both extremes, one that isn't wasteful but is still ergonomic. What I tried so far and how it went:

I tried using Rc<str> everywhere, and it wasn't all that great. Most notably, the Rc type has no special support for values that implement ToOwned. For example, I'd like to convert an Rc to an owned value (or at least a Box containing it), but such that it only gets cloned when it has other references. Furthermore, if I put Into<Rc<str>> as method arguments to automatically wrap the Rc if needed, this no longer accepts the same argument types as Into<String> does. The difference is subtle, but it is really obvious with Path types, since they have a lot of auto conversions from string and OS strings that suddenly must be made explicit. (We have str -> Path and str -> Rc<str> and Path -> Rc<Path>, but not str -> Rc<Path>.)
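A small sketch of the conversion gap mentioned in the parentheses. The two functions are hypothetical; the From impls they rely on are the ones in the standard library.

```rust
use std::path::{Path, PathBuf};
use std::rc::Rc;

// Hypothetical functions for illustration.
fn open(path: impl Into<PathBuf>) -> PathBuf {
    path.into()
}

fn open_shared(path: impl Into<Rc<Path>>) -> Rc<Path> {
    path.into()
}

fn main() {
    // &str -> PathBuf exists, so a string literal just works.
    let plain = open("data/config.toml");

    // &str -> Rc<str> exists as well.
    let name: Rc<str> = Rc::from("config");

    // &Path -> Rc<Path> exists, but the Path has to be spelled out.
    let shared = open_shared(Path::new("data/config.toml"));

    // There is no &str -> Rc<Path>, so this line would not compile:
    // let shared = open_shared("data/config.toml");

    assert_eq!(plain.as_path(), &*shared);
    assert_eq!(&*name, "config");
}
```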

I have also thought about using Cow<'static, str> everywhere instead, but in my previous experience, using Cow to abstract over ownership wasn't very ergonomic.
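For reference, this is roughly what the Cow<'static, str> approach looks like. Label is a hypothetical type, and this is only a sketch of the idea, not a recommendation.

```rust
use std::borrow::Cow;

// Hypothetical type for illustration.
struct Label {
    text: Cow<'static, str>,
}

impl Label {
    fn new(text: impl Into<Cow<'static, str>>) -> Self {
        Label { text: text.into() }
    }

    fn is_borrowed(&self) -> bool {
        matches!(self.text, Cow::Borrowed(_))
    }
}

fn main() {
    // A string literal is stored as a borrow, without allocating.
    let fixed = Label::new("ok");
    // An owned String is moved in and keeps its allocation.
    let dynamic = Label::new(format!("id-{}", 7));

    assert!(fixed.is_borrowed());
    assert!(!dynamic.is_borrowed());
    assert_eq!(&*dynamic.text, "id-7");
}
```

One ergonomic cost shows up quickly: a &str borrowed from a local String is not 'static, so callers have to write .to_owned() themselves before passing it in.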

This is not the first time that I've stumbled upon issues with abstracting over ownership in Rust, but I'd like to hear your take on this specific problem that I face right now.

AsRef<str> is mostly pointless and only gives you ambiguity at the call site, which isn't necessarily a good thing. Because it is generic, the function gets compiled separately for every argument type, which slows down compilation and causes code bloat. Don't use it. Use &str instead. It does the same thing, but faster and more explicitly.
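A quick sketch of the &str version, with a hypothetical function. Thanks to deref coercion, callers holding a String or Box<str> only need to add an &.

```rust
// Hypothetical function; it takes a plain &str, so it is not generic.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned = String::from("a b c");
    let boxed: Box<str> = "d e".into();

    assert_eq!(count_words("one two"), 2); // string literal
    assert_eq!(count_words(&owned), 3); // &String coerces to &str
    assert_eq!(count_words(&boxed), 2); // &Box<str> coerces to &str
}
```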

Box<str> sometimes makes sense if you need your structs smaller, and don't plan to modify the strings (there are also a few small string libraries that do even better than this).
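To put a number on "smaller": with current compilers a String is three machine words (pointer, capacity, length) and a Box<str> is two (pointer, length). The exact layout is not a documented guarantee as far as I know, so treat the assertions below as an observation rather than a promise.

```rust
use std::mem::size_of;

fn main() {
    let s = String::from("hello");
    // Drops any spare capacity and gives up the ability to grow.
    let b: Box<str> = s.into_boxed_str();
    assert_eq!(&*b, "hello");

    // Observed layout on current compilers, not a language guarantee.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());
}
```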

Rc<str> makes objects single-threaded. If you want to share memory, prefer Arc<str>. If you have lots of strings to share, pick a string interning library.
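A minimal sketch of sharing one Arc<str> across threads; the function name is made up. Swapping Arc for Rc here would fail to compile, because Rc is not Send.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical function: every worker reads the same shared string.
fn total_len_across_threads(name: Arc<str>, workers: usize) -> usize {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Bumps the reference count; the string data is not copied.
            let name = Arc::clone(&name);
            thread::spawn(move || name.len())
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let name: Arc<str> = Arc::from("shared");
    // 4 workers, each reporting a length of 6.
    assert_eq!(total_len_across_threads(name, 4), 24);
}
```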

Into<String> (and Into<Box<str>>, etc.) is fine, as it allows reuse of heap allocation. Although, if the caller accidentally calls it with &string, it will wastefully reallocate. Taking String explicitly prevents this, but may be annoying to callers that have &str.
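The difference can be observed by comparing buffer addresses. This is a sketch with a hypothetical function: an owned String is moved through unchanged, while a reference forces a new allocation.

```rust
// Hypothetical function for illustration.
fn keep(name: impl Into<String>) -> String {
    name.into()
}

fn main() {
    // Passing an owned String moves it; the heap buffer is reused.
    let s = String::from("hello");
    let ptr = s.as_ptr();
    let moved = keep(s);
    assert_eq!(moved.as_ptr(), ptr);

    // Passing &String compiles too, but it silently clones.
    let t = String::from("world");
    let copied = keep(&t);
    assert_ne!(copied.as_ptr(), t.as_ptr());
    assert_eq!(copied, t);

    // Passing &str allocates a new String as well.
    let from_str = keep("literal");
    assert_eq!(from_str, "literal");
}
```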

This seems a bit funny to me. Normally I'll accept impl AsRef<str> or impl Into<String> at the edge of my public API because it makes the code easier to consume. Then, for internal functionality I'll skip the generics and pass around a &str or String directly.

Even if I'm calling code which will internally accept a generic (e.g. std::fs::read_to_string() accepts impl AsRef<Path>), I'll typically get rid of the generics in the first line and pass down the concrete type.

use std::path::Path;

use anyhow::{Context, Error};

fn do_stuff_with_file(filename: impl AsRef<Path>) -> Result<(), Error> {
  // Get rid of the generic right away and work with a plain &Path from here on.
  let filename = filename.as_ref();

  let text = std::fs::read_to_string(filename)
    .with_context(|| format!("Unable to read \"{}\"", filename.display()))?;

  ...
}

My rule of thumb is to pass references (&str) as function arguments and store owned values (String). It sometimes makes sense for a struct to store a &str or &[u8] (e.g. in a temporary iterator or "view" object), but there's no need to tie yourself in knots with Rc<str> or Cow<'static, str> trying to avoid a string copy[1].
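As a sketch of that rule of thumb (the types and names here are made up for illustration): the struct stores a String, methods take &str, and a short-lived view borrows from the owner.

```rust
// Hypothetical types for illustration.
struct User {
    name: String, // stored values are owned
}

impl User {
    // Arguments are borrowed; the one copy happens where we store it.
    fn new(name: &str) -> Self {
        User { name: name.to_owned() }
    }

    fn greeting(&self, salutation: &str) -> String {
        format!("{}, {}!", salutation, self.name)
    }

    // A temporary "view" object that borrows from the owner.
    fn initials(&self) -> Initials<'_> {
        Initials { source: &self.name }
    }
}

struct Initials<'a> {
    source: &'a str,
}

impl Initials<'_> {
    fn first(&self) -> Option<char> {
        self.source.chars().next()
    }
}

fn main() {
    let user = User::new("Ada");
    assert_eq!(user.greeting("Hello"), "Hello, Ada!");
    assert_eq!(user.initials().first(), Some('A'));
}
```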

If it makes you feel any better, every use of + to concatenate strings in languages like Java or JavaScript will create a new string. Don't overthink it 🙂


  1. Unless you know it'll have a tangible effect on performance anyway. About the only such example I've experienced first hand is an ML application that processes large image tensors at 60 Hz, but even then you'll often have bigger bottlenecks (e.g. the Inception model will routinely take a couple hundred milliseconds to run on my computer).

Arc<str> also specializes equality checks to be pointer checks, so it can have other benefits... though I admit my use of Arc<str> as a lazy programmer's interner has been entirely instinctual (e.g. porting Hash-heavy String-processing scripts to Rust) and not reactive (measured).
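If you want the pointer comparison to be explicit rather than relying on what == does internally, Arc::ptr_eq is available. A small sketch:

```rust
use std::sync::Arc;

fn main() {
    let a: Arc<str> = Arc::from("token");
    let b = Arc::clone(&a); // same allocation as `a`
    let c: Arc<str> = Arc::from("token"); // equal contents, separate allocation

    // Explicit identity check: compares addresses, never the string contents.
    assert!(Arc::ptr_eq(&a, &b));
    assert!(!Arc::ptr_eq(&a, &c));

    // Value equality still holds for the separately allocated copy.
    assert_eq!(a, c);
}
```

Note that this only helps as a "lazy interner" if equal strings actually share an allocation, that is, if they were cloned from the same Arc.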

Thanks, that's good to know. But does this advice also apply to Path? I'd guess no, because it has more useful type conversion and the standard library makes liberal use of it.

My library is single threaded at the moment, but of course I'd use Arc otherwise. But this is a good point, maybe I should use Arc right away just in case threading gets added later on …

I generally try to do this as well, but sometimes I expose multiple layers of API that call each other, and what counts as public API may change over time. Also, most of the problems that I described still apply when not using the type conversion traits: you can still have a method call chain that does String -> &str -> String.

It doesn't really, because I am specifically concerned with the common case where the data is not modified anymore after creation and only passed around. Garbage-collected languages provide a good trade-off between performance and usability here. In Rust, it feels like I'm forced to choose between "micro-manage the ownership by hand" and "liberally clone everything" with no middle ground.

AsRef<Path> is useful, because it allows use of string literals.
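For example, with a hypothetical helper that takes impl AsRef<Path>, callers can pass a literal, a String, a PathBuf or a &Path without any conversion on their side:

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper for illustration.
fn extension_of(path: impl AsRef<Path>) -> Option<String> {
    path.as_ref()
        .extension()
        .and_then(|ext| ext.to_str())
        .map(str::to_owned)
}

fn main() {
    assert_eq!(extension_of("notes.txt").as_deref(), Some("txt"));
    assert_eq!(extension_of(String::from("a/b.rs")).as_deref(), Some("rs"));
    assert_eq!(extension_of(PathBuf::from("archive.tar.gz")).as_deref(), Some("gz"));
    assert_eq!(extension_of(Path::new("Makefile")), None);
}
```

With a plain &Path parameter, the first two calls would need an explicit Path::new(...) at the call site.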