When would one receive `Cow`s as parameters?

minimum · May 22, 2023, 3:54am

I read the documentation for Cow and some posts I found through Google. It seems that Cows are generally used as the return value of a function or as fields of a struct.

I searched std for Cow, filtering by parameters, but the results seem inaccurate.

In the case of functions, when would one receive Cows as parameters? One such case is the solution in this post.

Are there other general usages or design patterns of receiving Cows? Thanks.

steffahn · May 22, 2023, 4:14am

Edit: Ah, I missed you were asking for Cow as function arguments in particular. The text I wrote below is an example of Cow as a return value, so I somewhat missed your question, sorry

Functions like String::from_utf8_lossy are a common case: The idea is that in many cases when calling this function on a &[u8], the input is going to be valid UTF-8 already, and you can cheaply re-interpret the &[u8] as a &str without any additional copying overhead. But if any invalid UTF-8 has to be replaced by '�' characters, a new owned buffer is required to write into, and thus a String needs to be produced. Use-cases that only need the string for a short time can work with the resulting Cow<'_, str> easily, by dereferencing it into a (short-lived) &str. Use-cases that need an owned string after all can still convert it using into_owned(), so it's the user's choice whether they want the overhead of always copying vs. copying only in cases where a modification happened.
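To make the two outcomes concrete, here is a small sketch. The `describe` helper is a made-up name for illustration; only `String::from_utf8_lossy` comes from std.

```rust
use std::borrow::Cow;

// Hypothetical helper: reports which Cow variant from_utf8_lossy produced.
fn describe(bytes: &[u8]) -> String {
    match String::from_utf8_lossy(bytes) {
        // Valid UTF-8: the bytes are reinterpreted in place, nothing is copied.
        Cow::Borrowed(s) => format!("borrowed: {s}"),
        // Invalid sequences were replaced, so a new String had to be built.
        Cow::Owned(s) => format!("owned: {s}"),
    }
}
```

Calling it with `b"hello"` takes the borrowed path, while input containing a stray `0xFF` byte takes the owned path.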

2e71828 · May 22, 2023, 5:22am

In most cases, it’s better to take either an & reference or an owned value depending on what the body of the function needs to do with the value. The only situation that I can think of for accepting Cow<T> as an argument is when all of these are true at the same time:

T is expensive to clone.
The function sometimes needs to make destructive changes, but other times can do its job with just a reference.
The caller needs access to an unmodified copy of T after the function call.
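A minimal sketch of a function that fits all three conditions, assuming a hypothetical `sorted` helper (not a std API): the input is only cloned when it actually has to be reordered, and the caller's original data is left untouched either way.

```rust
use std::borrow::Cow;

// Hypothetical helper: returns the values in ascending order.
fn sorted<'a>(mut values: Cow<'a, [u32]>) -> Cow<'a, [u32]> {
    let already_sorted = values.windows(2).all(|pair| pair[0] <= pair[1]);
    if !already_sorted {
        // to_mut() clones a borrowed slice into a Vec only on this path;
        // an already-owned Vec is sorted in place without cloning.
        values.to_mut().sort_unstable();
    }
    values
}
```

A caller that still needs the unsorted data passes `Cow::Borrowed(&data[..])`; a caller that is done with it passes `Cow::Owned(data)` and never pays for a clone.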

jongiddy · May 22, 2023, 6:39am

If a function might reasonably be chained with a function that returns a Cow, including itself, then it should accept a Cow. For example, it's unfortunate that Regex::replace takes a slice instead of a Cow because that makes it difficult to chain:

fn replace_twice(s: &str) -> Cow<'_, str> {
    let temp = REGEX_1.replace(s, "A"); // makes change, returns a Cow::Owned
    REGEX_2.replace(&temp, "B")  // no change, returns Cow::Borrowed(&temp)
    // error: temp dropped but is also in return value
}

If Regex::replace accepted a Cow for the text parameter and returned it on no change, then this code would work without further matching.
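For comparison, this is roughly what the "further matching" looks like when the replace function only accepts a &str. The sketch uses a hypothetical `replace_if_found` built on `str::replace` as a stand-in with the same shape; it is not the regex crate's API.

```rust
use std::borrow::Cow;

// Hypothetical stand-in with the shape `&str -> Cow<str>`:
// it borrows the input when nothing matches.
fn replace_if_found<'a>(text: &'a str, from: &str, to: &str) -> Cow<'a, str> {
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        Cow::Borrowed(text)
    }
}

fn replace_twice(s: &str) -> Cow<'_, str> {
    match replace_if_found(s, "a", "A") {
        // Still borrowing from `s`, so the second result may borrow from `s` too.
        Cow::Borrowed(unchanged) => replace_if_found(unchanged, "b", "B"),
        // The first pass allocated, so the result must not borrow from `first`.
        Cow::Owned(first) => {
            let second = match replace_if_found(&first, "b", "B") {
                Cow::Owned(changed) => Some(changed),
                Cow::Borrowed(_) => None,
            };
            // Reuse the first allocation if the second pass changed nothing.
            Cow::Owned(second.unwrap_or(first))
        }
    }
}
```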

jbe · May 22, 2023, 8:35am

A constructor function might take a Cow<'static, str> if the returned value contains that string but should avoid memory allocations for static str slices and yet support dynamically generated Strings in certain (run-time determined) cases. However, this comes at a slight (maybe negligible?) run-time cost when accessing the Cow. Also, ergonomics might be slightly worse.

I did something similar here, though instead of accepting a Cow, I provided two constructors: one taking a &str and one taking a String (and the fact that the type uses a Cow internally is a hidden implementation detail).
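A rough sketch of that two-constructor pattern, with made-up names (`Label`, `from_static`, `from_string`) and assuming the borrowed constructor takes a &'static str:

```rust
use std::borrow::Cow;

// Hypothetical type; the Cow field is a hidden implementation detail.
struct Label {
    text: Cow<'static, str>,
}

impl Label {
    /// No allocation: the static slice is stored as-is.
    fn from_static(text: &'static str) -> Self {
        Label { text: Cow::Borrowed(text) }
    }

    /// Takes ownership of a string built at run time.
    fn from_string(text: String) -> Self {
        Label { text: Cow::Owned(text) }
    }

    /// Callers only ever see a &str, never the Cow.
    fn as_str(&self) -> &str {
        &self.text
    }
}
```

Since Cow never appears in the public signatures, the field could later be swapped for another representation without breaking callers.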

BurntSushi · May 22, 2023, 10:52am

Perhaps. But if it accepted a Cow it would be far more annoying to call. Unacceptably annoying IMO. Returning a Cow is annoying enough as it is, and I sometimes regret that API choice just to avoid an allocation+copy in the case that there are no matches.

steffahn · May 22, 2023, 11:40am

Makes me think: How is impl Into<Cow<'a, str>> as an argument type? Looking at the existing implementations, that looks fairly usable / not-annoying, accepting &str, &String, String and Cow<'_, str>. I'm not necessarily suggesting to actually do this for Regex, but if someone likes the flexibility in their API, that'd seem potentially nice to me.

BurntSushi · May 22, 2023, 11:57am

Does that work for the chained use case though? I'm not sure the lifetimes will match up.

(I haven't tried it. If it does work, then technically accepting an impl Into<Cow<'a, str>> would be an acceptable backwards compatible change with respect to the API Evolution RFC. But in practice it's likely to lead to too much breakage due to inference failures. It's unclear how annoying the inference failures would be in practice if I had started with impl Into<Cow<'a, str>>. I know they are annoying enough to not accept an impl Into<String> for Regex::new though.)

steffahn · May 22, 2023, 12:36pm

It should work, no? I’m imagining fn(impl Into<Cow<'a, str>>) -> Cow<'a, str> with matching lifetimes. It’s of course also a bit non-trivial to decide how the owned case should be handled in the first place. I imagine, the String could possibly, maybe, potentially… or maybe not… be re-used for the output String even if some replacing happens… though certainly at least in the no-changes-made case it could be retained as-is. (Such things would have to be documented ^^.)
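A small sketch of that signature, using a hypothetical `replace_if_found` built on `str::replace` rather than anything from the regex crate. With the input and output lifetimes tied together, the chained call type-checks, because the intermediate Cow is moved into the second call instead of being borrowed.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a replace API; not the regex crate's signature.
fn replace_if_found<'a>(text: impl Into<Cow<'a, str>>, from: &str, to: &str) -> Cow<'a, str> {
    let text = text.into();
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        // No match: return the input unchanged, whether borrowed or owned.
        text
    }
}

fn replace_twice(s: &str) -> Cow<'_, str> {
    let temp = replace_if_found(s, "a", "A");
    // `temp` is passed by value, so nothing borrows from a local.
    replace_if_found(temp, "b", "B")
}
```

The same function can be called with a &str, a &String, a String, or a Cow<'_, str>, since std provides the corresponding From implementations for Cow<'a, str>.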

A fn(impl Into<Cow<'a, str>>) -> Cow<'a, str> (or equivalently, apart from the convenience, a fn(Cow<'a, str>) -> Cow<'a, str>) function is a bit like two functions: one fn(&'a str) -> Cow<'a, str> function plus one fn(String) -> Cow<'a, str> function (though the latter has no input to borrow from, so commonly it would be like a fn(String) -> String function, except maybe when &'static str return values are an option for certain cases). Offering two such functions distinctly would allow callers to wrap them up into a single fn(Cow<'a, str>) -> Cow<'a, str> function themselves; and if the only optimized case of the fn(String) -> String one is to keep the String untouched when no changes are being made, then by inspecting the result of the fn(String) -> Cow<'a, str> function, a user can write their own fn(Cow<'a, str>) -> Cow<'a, str> wrapper easily.
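Sketched out with hypothetical names, the "two functions plus a user-written wrapper" idea could look like this. The tab-stripping behaviour is only a placeholder for whatever transformation the API performs.

```rust
use std::borrow::Cow;

// Borrowed entry point: can hand the input back without allocating.
fn strip_tabs_borrowed(text: &str) -> Cow<'_, str> {
    if text.contains('\t') {
        Cow::Owned(text.replace('\t', " "))
    } else {
        Cow::Borrowed(text)
    }
}

// Owned entry point: keeps the String untouched when nothing changes.
fn strip_tabs_owned(text: String) -> String {
    if text.contains('\t') {
        text.replace('\t', " ")
    } else {
        text
    }
}

// Wrapper a caller could write themselves on top of the two functions.
fn strip_tabs<'a>(text: Cow<'a, str>) -> Cow<'a, str> {
    match text {
        Cow::Borrowed(borrowed) => strip_tabs_borrowed(borrowed),
        Cow::Owned(owned) => Cow::Owned(strip_tabs_owned(owned)),
    }
}
```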

In fact, it looks like Regex::replace only ever returns exactly the original &str in the Cow::Borrowed case, so it might just as well be a fn(&str) -> Result<String, NoMatchesFound> kind of function, leaving it to the caller to keep the &str around. Of course, the benefit of not doing that can be increased convenience for the user out of the box. But the more complex the type signature, the more possible smart things or optimizations (such as the above-mentioned idea that maybe replacing within a String could happen in-place, using existing String capacity if available) that users might (incorrectly) expect need to be ruled out in the documentation.
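As a sketch of that alternative shape, with `try_replace` and `replace_or_borrow` as hypothetical names and `str::replace` standing in for the actual matching: the library function reports "no match" as an error, and the Cow-returning convenience becomes a few lines on the caller's side.

```rust
use std::borrow::Cow;

#[derive(Debug, PartialEq)]
struct NoMatchesFound;

// Hypothetical API: only allocates and returns a String when something matched.
fn try_replace(text: &str, from: &str, to: &str) -> Result<String, NoMatchesFound> {
    if text.contains(from) {
        Ok(text.replace(from, to))
    } else {
        Err(NoMatchesFound)
    }
}

// The caller still holds `text`, so rebuilding the Cow is straightforward.
fn replace_or_borrow<'a>(text: &'a str, from: &str, to: &str) -> Cow<'a, str> {
    match try_replace(text, from, to) {
        Ok(replaced) => Cow::Owned(replaced),
        Err(NoMatchesFound) => Cow::Borrowed(text),
    }
}
```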

This whole discussion also makes me wonder about – something I have never tried to think too deeply about before yet – what the “ideal” API for HashMap would look like, where at the moment HashMap::entry has the unfortunate API design of always requiring an owned key. And sometimes, say for a HashMap<String, T>, you already have an owned String you no longer need afterwards, anyways; but if you only have a &str, unconditionally cloning the thing seems “wasteful”. I don’t know how closely or loosely related such an entry API would be to the things discussed here, but it’s not entirely different, I believe.
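For reference, one common way to sidestep the clone today is to skip `entry` and look up by &str first, allocating the key only on a miss. This is a sketch with a hypothetical `bump` counter function; it trades the clone for a second hash lookup when the key is absent.

```rust
use std::collections::HashMap;

// Hypothetical helper: increments the count for `key`.
fn bump(counts: &mut HashMap<String, u32>, key: &str) {
    if let Some(count) = counts.get_mut(key) {
        // Key already present: no String is allocated.
        *count += 1;
    } else {
        // Key missing: allocate the owned key only now.
        counts.insert(key.to_owned(), 1);
    }
}
```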

BurntSushi · May 22, 2023, 1:28pm

Yes, that's what I'm asking. Like I said, I just haven't tried it.

And yes, there are lots of different API choices for a replacement API. Oodles of them. Something that sometimes gets lost in the noise is that implementing your own replacement routine is not a lot of code. And the code isn't particularly tricky either. The thing about replacements is that there is a giant hairball of intersecting concerns: allocs, copying, whether a match exists or not, replacement interpolation syntax and so on. It's tricky to satisfy all of them adequately. IIRC, the replacement APIs were one reason why I went with a bytes sub-module for searching on &[u8] instead of trying to write one API that generalizes over &str and &[u8]. (Because I couldn't find a satisfactory way to generalize it. Years ago anyway.)

I have not spent a ton of time trying to figure out what the right API design is. My general opinion is that the one provided by the regex crate should be as easy to use as possible. More complicated type signatures, IMO, subtract from "ease of use" in the common case of replacements specifically.

Anyway, we're probably getting a bit off topic here. I'd welcome more API discussion focused on the regex crate in the regex repo as a Discussion. Although note that it will probably just be for fun. I'm unlikely to change the existing APIs or add new ones any time soon.

steffahn · May 22, 2023, 2:35pm

I think the current API is fine, too. I was mainly discussing these things here out of personal curiosity, and also as it is – in my opinion – very on topic in this topic to discuss the possibilities, and pros vs. cons, etc, of using (or allowing) Cow as a parameter type for some concrete API example.

BurntSushi · May 22, 2023, 2:46pm

Yeah what I meant is that the replacement API, generally, has more constraints than just Cow stuff.

tuffy · May 22, 2023, 3:14pm

I imagine a trivial text replacement API would look something like:

fn maybe_replace<'s, S: AsRef<str> + Into<Cow<'s, str>>>(
    input: S,
    pattern: &str,
    replacement: &str,
) -> Cow<'s, str> {
    if input.as_ref().contains(pattern) {
        input.as_ref().replace(pattern, replacement).into()
    } else {
        input.into()
    }
}

The idea being that if the input was already borrowed or owned (or already a Cow), and no replacement was performed, no additional allocations would be performed either.

Though for a Regex, naturally this would be some method of a precompiled pattern rather than a loose function.

steffahn · May 22, 2023, 3:18pm

I see little benefit of the S: AsRef<str> + Into<Cow<'s, str>> approach, compared to unconditionally calling .into() on a S: Into<Cow<'s, str>> value and working with the result. Am I missing anything?

tuffy · May 22, 2023, 3:26pm

Probably not missing anything. That's just what happens when I whip up something without thinking it all the way through.

(Quick edit: my quick-and-dirty replacement API would now look more like:

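use std::borrow::Cow;

fn maybe_replace<'s, S: Into<Cow<'s, str>>>(
    input: S,
    pattern: &str,
    replacement: &str,
) -> Cow<'s, str> {
    // Convert once, up front; &str, String and Cow<str> all implement this.
    let input = input.into();
    if input.as_ref().contains(pattern) {
        // A match exists, so a new String is allocated for the result.
        input.as_ref().replace(pattern, replacement).into()
    } else {
        // No match: hand the input back unchanged, borrowed or owned.
        input
    }
}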

)

steffahn · May 22, 2023, 3:28pm

No worries. I thought of AsRef initially, too, when I wrote my replies above. As for benefits the other way, besides the simplicity of having fewer trait bounds: Avoiding the AsRef<str> + Into<Cow<'s, str>> approach also has the benefit of avoiding the question of “how weirdly can this misbehave if the AsRef and Into implementations return inconsistent things”.
