How passing impl trait avoids cloning

There have always been two hurdles that made me give up on using references and pass by value (from the heap) instead:

  • Pushing items into a vec within the scope of a for loop (vec lifetime extends beyond the loop)
  • Where it gets interesting: When an API seemingly requires it, e.g. PathBuf instead of &PathBuf (see below):
fn append_to_path(p: PathBuf, s: &str) -> PathBuf {
    let mut p = p.into_os_string();
    p.push(s);
    p.into()
}

Until today, I would have just given up and called it like this:

let my_path: &PathBuf = todo!();
append_to_path(my_path.to_owned(), "meh");

But then I found this recent SO solution that blew my mind:

fn append_to_path(p: impl Into<OsString>, s: impl AsRef<OsStr>) -> PathBuf {
    let mut p = p.into();
    p.push(s);
    p.into()
}

So are my lessons learned correct, or am I still missing the bigger picture?

  1. Rust APIs are commonly broken into traits for this exact reason - to be able to make use of methods that wouldn't normally be available to just a reference of a concrete type?
  2. I should always check Rust docs if an underlying trait exists for a concrete type that could allow me to still make the desired API call on only a reference of that type (before resorting to cloning it)?
  3. Passing the impl type in this way is not a general solution to avoiding cloning; e.g. my earlier mention of pushing into a Vec some data that can only exist within the scope of a for loop must require the Vec containing owned types of that data in order for that Vec to extend beyond the lifetime of that for loop. Or am I missing a design pattern here too that avoids heap allocation?
  4. Rust analyzer cannot automatically detect where passing an impl type would avoid a heap allocation, as this is largely just a performance optimization and may be subjective (I'm not sure, but I think I've read somewhere that heap allocations are not guaranteed to be slower in all cases...)?

There’s no cloning being avoided. The relevant implementation of the .into() method (via the From trait) simply does the cloning for you.

So if append_to_path is called with a reference, then the let mut p = p.into(); line will clone the string, not in the sense that an actual method called “clone” is called, but in the sense that the same operation of allocating new owned memory and copying all the data is performed.

The only thing that is avoided is the caller having to do the cloning explicitly themselves.

A possible downside to this API is that the caller might actually have an owned PathBuf they are willing to give away, but since nothing about the call append_to_path(&foo, &bar) gives away the cloning, they might not easily notice that they introduced an unnecessary extra clone by passing by reference.
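To make the hidden copy concrete, here is a minimal sketch built on the generic signature from the question. The two caller functions (call_with_reference, call_with_owned) and the ".bak" suffix are hypothetical names chosen only for illustration.

```rust
use std::ffi::{OsStr, OsString};
use std::path::PathBuf;

// The generic signature from the question, unchanged.
fn append_to_path(p: impl Into<OsString>, s: impl AsRef<OsStr>) -> PathBuf {
    let mut p = p.into();
    p.push(s);
    p.into()
}

// Hypothetical caller that only has a reference.
fn call_with_reference(original: &PathBuf) -> PathBuf {
    // `&PathBuf -> OsString` goes through `From<&T> for OsString`, which
    // allocates a fresh buffer and copies the bytes, much like `to_owned()`.
    append_to_path(original, ".bak")
}

// Hypothetical caller that owns the path and is willing to give it away.
fn call_with_owned(original: PathBuf) -> PathBuf {
    // `PathBuf -> OsString` reuses the existing buffer; the conversion
    // itself does not copy (the later `push` may still grow the buffer).
    append_to_path(original, ".bak")
}
```

Both calls look almost identical at the call site, which is exactly why the extra copy in the by-reference version is easy to overlook.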

Finally, note that given the signature fn(p: PathBuf, s: &str) -> PathBuf, some experienced Rust users might find fn(p: &mut PathBuf, s: &str) nicer or more idiomatic. This can be implemented with the method PathBuf::as_mut_os_string.

fn append_to_path(p: &mut PathBuf, s: &str) {
    p.as_mut_os_string().push(s);
}

There are downsides, as this kind of API can be a bit more tedious to call in some use cases (requiring you to define a mutable variable, then mutate, and then use the result, in separate statements), but on the other hand the other API is not usable at all in cases where you only have &mut PathBuf-reference access to the PathBuf (well… at least without resorting to mem::take).
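As a rough sketch of the trade-off, the two API shapes can sit side by side, together with the mem::take workaround for callers that only hold a &mut PathBuf. The function names append_by_value, append_in_place and append_via_take are made up for this example, and as_mut_os_string needs a reasonably recent toolchain (Rust 1.70 or later).

```rust
use std::path::PathBuf;

// By-value API: consumes the path and returns the extended one.
fn append_by_value(p: PathBuf, s: &str) -> PathBuf {
    let mut p = p.into_os_string();
    p.push(s);
    p.into()
}

// In-place API: mutates the caller's path directly.
fn append_in_place(p: &mut PathBuf, s: &str) {
    p.as_mut_os_string().push(s);
}

// Using the by-value API when only `&mut PathBuf` is available:
// `mem::take` swaps in an empty `PathBuf` (its `Default`) and hands us
// the original by value, which we then write back.
fn append_via_take(p: &mut PathBuf, s: &str) {
    let owned = std::mem::take(p);
    *p = append_by_value(owned, s);
}
```

With the in-place version the caller needs a mutable binding and a separate statement, while the by-value version chains more easily in expressions.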


wow thank goodness for this forum. I nearly invented my own reality =)


It’s understandable. The explicitness of cloning is one of Rust’s strengths in general, and a great improvement over C++; however not every deep copy of data is spelled “clone”, as other custom methods can do the same. Having it spelled “into” is arguably one of the most subtle ways.

At least it’s still in line with the conversion method naming conventions, which suggest that methods with an “into…”-based name can (but don’t have to) be expensive, where copying a lot of data to a new allocation is a typical example of “expensive”.
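For reference, a small sketch of how the as_/to_/into_ prefixes show up on the path types in std. The wrapper function conventions is hypothetical; the three methods it calls are the real ones.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

fn conventions(owned: PathBuf) -> (PathBuf, OsString) {
    // `as_`: a cheap borrowed view, no allocation.
    let view: &Path = owned.as_path();
    // `to_`: borrowed -> owned, allocates and copies the data.
    let copy: PathBuf = view.to_path_buf();
    // `into_`: consumes `owned`; for this particular method the
    // existing buffer is reused rather than copied.
    let raw: OsString = owned.into_os_string();
    (copy, raw)
}
```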

(Arguably, the Into::into method might be a special case[1] as its more versatile implementations including many borrowed->owned conversions mean that it’s slightly missing the standard naming as that’d usually be a “to_”-based method.)


  1. that also maybe technically doesn’t count anyway, since it’s not literally starting with into_ (no underscore)


In general, beyond the precise rules of how Rust ensures there is no UB, try to keep an eye on why your code doesn't have UB, i.e., not just why the compiler isn't happy, but why the compiler is right not to be happy. If you have a reference with a lifetime that is local to a for loop, it means the value is allocated for the duration of one iteration of the for loop. You will never (in safe Rust) be able to push such a reference to a Vec that has a larger lifetime, since the reference would become dangling at the end of the iteration, causing a use after free when reading it afterwards.
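A minimal illustration of that rule, with made-up names (collect_owned, names, name). The rejected version is kept as a comment so the block still compiles; the accepted version moves each value into the Vec, which transfers ownership without cloning.

```rust
// Rejected by the borrow checker:
//
// let mut names: Vec<&str> = Vec::new();
// for i in 0..3 {
//     let name = format!("file{i}"); // lives for one iteration only
//     names.push(&name);             // error[E0597]: `name` does not live long enough
// }

// Accepted: the Vec owns the strings, so they outlive the loop.
fn collect_owned() -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    for i in 0..3 {
        let name = format!("file{i}");
        names.push(name); // a move, not a clone
    }
    names
}
```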


exactly, and it's a shame every time I find myself writing code that generates data in a for loop, because I know I'm not much better theoretically than a garbage collector probably ;)

but... it is a common pattern that I never seem to find a way around. For instance, keeping with my example of parsing paths:

let mut my_vec: Vec<String> = Vec::new();
//somehow all paths magically exist here...       <--------------------------------
for file in files {              //                                                |
        let path = file?.path(); // if only this could be defined up here instead _|
        my_vec.push(stuff(path));
}
// do stuff with my_vec....

I'm not sure I fully understand that pseudocode, but in some cases where you want something to be owned at a higher-up level/scope (and it's in a loop, so a bunch of things, not just a single one), you could actually achieve such functionality using an arena type such as Arena from the typed_arena crate.

For things like strings or paths which are, in the grand scheme of things, tiny amounts of data, pulling such tools is probably unnecessary. But in any case it's an interesting kind of API to take a look at, if nothing else then as a learning experience to see what's possible in Rust.

(Note that while the Arena::alloc method returns a mutable reference for maximal flexibility, you can coerce that into an immutable one if you need to share the data.)


I don't understand the problem either. What is the type of files? Is it an iterator of DirEntry values? If so, the path() method gives you an owned path, a PathBuf, not a &PathBuf reference, so you don't have to copy it.

Note that under the hood, path() concatenates the path passed to std::fs::read_dir to the name of the entry, and this involves a copy of both. As @steffahn noted, cloning paths is inconsequential since they are normally very small.


I've seen Arena before but it never made sense to me why that much complexity was ever needed. I now get it and it's simple - lifetimes extending past the for loops they were created in!

And yes... sorry, those are DirEntry values I'm calling path() on...

It's weird to think back in my java days how much of this I was missing out on. It's like computers are computers again and before I was programming magical elves to crank cogs for me.

While arenas are useful in complex cases, you can often just collect everything you produce in the loop into a Vec and use references into that.
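A small sketch of that pattern, assuming the owned values are PathBufs collected earlier (for example from DirEntry::path()). The function name file_names is hypothetical.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

// `paths` is the Vec that owns everything produced in the first loop.
// Because it outlives this function's loop, borrowing from it is fine:
// the returned references are tied to the lifetime of `paths`.
fn file_names(paths: &[PathBuf]) -> Vec<&OsStr> {
    let mut names = Vec::new();
    for path in paths {
        // `file_name()` borrows from `path`; nothing is copied here.
        if let Some(name) = path.file_name() {
            names.push(name);
        }
    }
    names
}
```

The owning Vec has to stay alive, and must not be modified, for as long as the borrowed Vec is in use.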
