How to work with multiple `Cow`s?

The snippet below tries to call multiple functions that return Cows inside a control-flow function (in other words, chaining Cows). If the processing functions f1, f2, f3, ... rarely return an owned value, f should be able to return the borrowed input.

How can I fix this issue? Any suggestions for improving this workflow are welcome. Thanks.

use std::{borrow::Cow, ops::Deref};

fn f1(s: &str) -> Cow<'_, str> {
    if s == "hello" {
        Cow::Owned(s.to_uppercase())
    } else {
        Cow::Borrowed(s)
    }
}

fn f2(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Or maybe more...
// fn f3(s: &str) -> Cow<'_, str> {
//     ...
// }

fn f(s: &str) -> Cow<'_, str> {
    let s = f1(s);
    let s = f2(s.as_ref());     // let s = f2(s.deref());  won't work
    // ...
    // let s = f3(s.as_ref());

    s // cannot return value referencing local variable `s`
}



fn main() {
    let s = "hello world";
    f(s);

    let s = "abc";
    f(s);
}

This works (Rust Playground):

fn f2(s: Cow<'_, str>) -> Cow<'_, str>
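A minimal sketch of what the whole chain could look like with that signature, assuming f1 is rewritten the same way (the bodies are adapted from the question, not taken from the linked playground). Each step takes the Cow by value, so an untouched value is simply handed on:

```rust
use std::borrow::Cow;

// Assumption: f1 gets the same Cow -> Cow treatment as f2.
fn f1(s: Cow<'_, str>) -> Cow<'_, str> {
    if &*s == "hello" {
        Cow::Owned(s.to_uppercase())
    } else {
        // Nothing to change: pass the value on, borrowed or owned.
        s
    }
}

fn f2(s: Cow<'_, str>) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        s
    }
}

fn f(s: &str) -> Cow<'_, str> {
    // The input starts out borrowed and stays borrowed unless a step rewrites it.
    let s = f1(Cow::Borrowed(s));
    f2(s)
}
```

With this shape, f("abc") comes back as a borrowed value, and no local variable is ever borrowed, so the lifetime error from the question does not come up.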

The reason s.deref() won't work is the same as what is explained in this post:

  • s.deref() desugars to Deref::deref<'tmp>(&'tmp Cow<'f, str>) -> &'tmp str, where 'f, the lifetime imposed by the signature of f, strictly outlives 'tmp
  • then f2(s.deref()) is f2(s: &'tmp str) -> Cow<'tmp, str>, so you get a lifetime shorter than 'f, hence the error

The trick is to notice that there can't be a single function like as_ref() that hands out a &'long T from a &'short Cow<'long, T>. The Cow can be owned, in which case it would return a reference pointing inside itself, which obviously can't outlive the Cow instance itself.
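To illustrate the point, here is a small sketch of what can be written instead. The helper name long_borrow is hypothetical; it only hands out the long-lived reference in the borrowed case, which is exactly why it has to return an Option:

```rust
use std::borrow::Cow;

// Hypothetical helper: recover the long-lived `&'a str` from a short-lived
// reference to the Cow. This is only possible for the Borrowed variant;
// in the Owned case the data lives inside the Cow itself.
fn long_borrow<'a>(c: &Cow<'a, str>) -> Option<&'a str> {
    match c {
        // `s` is a `&&'a str` here; copying the inner reference keeps `'a`.
        Cow::Borrowed(s) => Some(*s),
        Cow::Owned(_) => None,
    }
}
```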

A hand-written way of chaining f1 and f2 that I could come up with looks like this:

fn f(s: &str) -> Cow<'_, str> {
    match f1(s) {
        Cow::Owned(s) => Cow::Owned(f2(&s).into_owned()),
        Cow::Borrowed(s) => f2(s),
    }
}

For 3 functions, you could avoid one cloning step like this:

fn f(s: &str) -> Cow<'_, str> {
    match f1(s) {
        Cow::Owned(s) => Cow::Owned(f3(&f2(&s)).into_owned()),
        Cow::Borrowed(s) => match f2(s) {
            Cow::Owned(s) => Cow::Owned(f3(&s).into_owned()),
            Cow::Borrowed(s) => f3(s),
        },
    }
}

i.e. here, if f1 already returned an owned value, the subsequent f2 and f3 can be called directly in succession, and only one into_owned needs to happen at the end.

I’m not sure what the best approach to generalizing this or making it less tedious would be. E.g. if you wrote a simple function that can turn Fn(&str) -> Cow<'_, str> into Fn(Cow<'_, str>) -> Cow<'_, str>, then such intermediate copies could not be avoided easily.
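A rough sketch of such a simple adapter, with f1 and f2 taken from the question. The name apply is made up for illustration. Note the into_owned call in the owned branch: that is the intermediate copy that cannot easily be avoided with this approach.

```rust
use std::borrow::Cow;

fn f1(s: &str) -> Cow<'_, str> {
    if s == "hello" {
        Cow::Owned(s.to_uppercase())
    } else {
        Cow::Borrowed(s)
    }
}

fn f2(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Hypothetical adapter: runs a `&str -> Cow` step on a `Cow` input.
fn apply<'a, G>(input: Cow<'a, str>, g: G) -> Cow<'a, str>
where
    G: for<'x> Fn(&'x str) -> Cow<'x, str>,
{
    match input {
        // Borrowed input: the step's result can keep the original lifetime.
        Cow::Borrowed(b) => g(b),
        // Owned input: the result borrows from the local String, so it must
        // be turned into an owned value, even if `g` changed nothing.
        Cow::Owned(o) => Cow::Owned(g(&o).into_owned()),
    }
}

fn f(s: &str) -> Cow<'_, str> {
    apply(f1(s), f2)
}
```

A third step would be added as apply(apply(f1(s), f2), f3), which copies once per step after the first owned result, unlike the nested match above.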


However, either version above still has a downside. If f1 returns an owned string and f2 needs no further modification, f2 could in principle return the same owned String unaltered. But if it only offers a Fn(&str) -> Cow<'_, str> interface to begin with, there is no way to preserve the owned String, and at least one additional copy of the string is necessary at the end of the chain. So if the functions are rewritten as @vague suggests, there is more potential for improvement in some cases, and chaining them becomes trivial anyway ^^, at the cost that writing functions like f2 gets a bit harder.

Thanks for the hand-written example, and for pointing out the downside of taking &str. I didn't think about it before I posted the question.

That's helpful to figure out the problem in this case. Thanks!

As you're saying the common case is that the references are maintained, the overhead of copying strings probably does not matter at all. Especially when the operations involve reading the whole string anyways. Reading the whole thing or copying it isn't too much of a difference, I suppose.

So I'd say you can actually create the simple function mentioned above, the one that turns Fn(&str) -> Cow<'_, str> into Fn(Cow<'_, str>) -> Cow<'_, str>, for maximal convenience (only in case your use case doesn't already allow a straightforward Cow -> Cow function definition), ensuring that your common case, where none of the functions creates a new owned value, is as good as possible.

I need some time to dig into it and try to get it to work. :laughing:

Just abstracting the hand-written match pattern above in a (impl Fn...) -> impl Fn... function should work ^^
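One possible shape for such a combinator, as a sketch only. The name chain is hypothetical, and f1 and f2 are copied from the question so the block stands alone:

```rust
use std::borrow::Cow;

fn f1(s: &str) -> Cow<'_, str> {
    if s == "hello" {
        Cow::Owned(s.to_uppercase())
    } else {
        Cow::Borrowed(s)
    }
}

fn f2(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Hypothetical combinator: builds one `&str -> Cow` step out of two.
fn chain<F, G>(first: F, second: G) -> impl for<'a> Fn(&'a str) -> Cow<'a, str>
where
    F: for<'a> Fn(&'a str) -> Cow<'a, str>,
    G: for<'a> Fn(&'a str) -> Cow<'a, str>,
{
    move |s| match first(s) {
        // The intermediate String is local, so the final result must be owned.
        Cow::Owned(o) => Cow::Owned(second(&o).into_owned()),
        // Still borrowing from the input: the second step may borrow as well.
        Cow::Borrowed(b) => second(b),
    }
}

fn f(s: &str) -> Cow<'_, str> {
    chain(f1, f2)(s)
}
```

Because the result has the same shape as its inputs, more steps can be added by nesting, e.g. chain(chain(f1, f2), f3), at the price of one copy per step once a value has become owned.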

fn(Cow<'_, str>) -> Cow<'_, str> is awkward to use since you have to write the borrowed branch anyway.

fn(&mut Cow<'_, str>) is a little nicer, but still not idiomatic in Rust.

As long as one more heap allocation is acceptable, why not start from String in the first place and use &mut String along the way? (Rust Playground)
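A sketch of how that could look, assuming the same two processing steps as in the question (this is not the code from the linked playground). The input is copied once up front, and every step then edits the String in place:

```rust
// Each step mutates the String in place and returns nothing.
fn f1(s: &mut String) {
    if s.as_str() == "hello" {
        *s = s.to_uppercase();
    }
}

fn f2(s: &mut String) {
    if s.contains(' ') {
        // Drop every space without allocating a new String.
        s.retain(|c| c != ' ');
    }
}

fn f(s: &str) -> String {
    // The one unconditional allocation mentioned above.
    let mut s = s.to_owned();
    f1(&mut s);
    f2(&mut s);
    s
}
```

The trade-off is that the common case, where nothing changes, now pays for one copy, but there are no lifetimes to thread through the chain.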

Yes, that simplifies the implementation quite a bit and is more ergonomic in this case.
