How to deal with Cow<str> in a HashMap/Vector?

I have another question regarding the QueryString struct.

I have extended the original code to actually decode the URL encoding (percent-encoding) of the query parameters. For this purpose I use the urlencoding::decode() function.

Now, I understand that I can no longer store &str in the HashMap and in the Value, because the string that comes in as &str needs to be modified when it gets decoded, so we now need an owned String.

Changing HashMap and Value to store Strings is easy enough.

Here is what confuses me:
The urlencoding::decode() function returns a value of type Cow<str>, where I'd expect a String. I'm not exactly sure what the purpose of Cow<str> is here. To my understanding, Cow<str> is a sort of string that will be cloned automatically as soon as it is modified ("copy on write").

Anyway, I can convert Cow<str> into a "normal" String by calling .into_owned(). But is this the "proper" thing to do here? I assume that this triggers a clone() right away, correct? Is there a better alternative?

(after all, I could wrap a Cow<str> into the HashMap and in the Value, instead of a "normal" String, I think)

Thank you for any suggestions!

I think the point of Cow<str> is that the decoding process might be able to just return the original data, but if any characters are escaped it needs to create a new string.
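
To illustrate that idea, here is a minimal sketch of a decoder that borrows when nothing is escaped and allocates only when it has to. It is a hypothetical stand-in (decode_sketch and hex_val are made-up names), not the urlencoding crate's actual implementation, and it assumes that falling back to the raw input on invalid UTF-8 is acceptable:

```rust
use std::borrow::Cow;

/// Converts one ASCII hex digit to its numeric value (hypothetical helper).
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Sketch of a percent-decoder that returns `Cow::Borrowed` when the input
/// contains no '%' at all, and builds a new String otherwise.
fn decode_sketch(input: &str) -> Cow<'_, str> {
    if !input.contains('%') {
        // Nothing to decode: hand back the original slice, no allocation.
        return Cow::Borrowed(input);
    }
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape is '%' followed by exactly two hex digits.
        let pair = match (bytes.get(i + 1), bytes.get(i + 2)) {
            (Some(&hi), Some(&lo)) if bytes[i] == b'%' => hex_val(hi).zip(hex_val(lo)),
            _ => None,
        };
        match pair {
            Some((hi, lo)) => {
                out.push(hi * 16 + lo);
                i += 3;
            }
            None => {
                // Not an escape (or a malformed one): copy the byte as-is.
                out.push(bytes[i]);
                i += 1;
            }
        }
    }
    match String::from_utf8(out) {
        Ok(decoded) => Cow::Owned(decoded),
        // Assumption: on invalid UTF-8, fall back to the undecoded input.
        Err(_) => Cow::Borrowed(input),
    }
}
```

With this sketch, decode_sketch("plain") yields a Cow::Borrowed, while decode_sketch("a%20b") yields a Cow::Owned holding "a b".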

It will only clone() if it’s storing an &str; if it’s already holding a String, you’ll be handed that without any additional allocation.
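
A small sketch that makes this observable by comparing buffer addresses before and after the conversion. The helper name reuses_buffer is made up for illustration:

```rust
use std::borrow::Cow;

/// Returns true if `into_owned` handed back the same heap buffer the Cow
/// was already pointing at, i.e. no new allocation was made.
fn reuses_buffer(cow: Cow<'_, str>) -> bool {
    let before = cow.as_ptr();
    let owned: String = cow.into_owned();
    owned.as_ptr() == before
}
```

Calling reuses_buffer(Cow::Owned(String::from("hello"))) returns true, because the String is simply moved out. Calling reuses_buffer(Cow::Borrowed("hello")) returns false, because a fresh String has to be allocated and the bytes copied into it.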

Okay, this makes sense!

Still, I need all strings to be the same type, so that they can be used as key in a HashMap or stored in a Value. What should I do with the Cow<str> objects? Should I convert them all to String right away?

The alternative would be to make the HashMap and Value store Cow<str> elements. I think this would avoid having to allocate an actual String in the cases where the Cow<str> wraps the original &str. But then how do I implement my QueryString::get() -> Option<&Value> function?

(After all, we don't want to expose Cow<str> to the user of QueryString, but &str or maybe String)
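
One way to keep Cow<str> out of the public API, as a rough sketch: Cow<str> dereferences to str, so an accessor can return a plain &str. The Params type below is a simplified, hypothetical stand-in with one value per key, not the QueryString from the course:

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Simplified stand-in: one value per key, both stored as Cow<str>.
pub struct Params<'buf> {
    data: HashMap<Cow<'buf, str>, Cow<'buf, str>>,
}

impl<'buf> Params<'buf> {
    /// Looks up `key` and exposes the value as &str, hiding the Cow.
    /// Lookup by &str works because Cow<str> implements Borrow<str>.
    pub fn get(&self, key: &str) -> Option<&str> {
        // `v` is a &Cow<str>; `&**v` goes through Deref to reach the str.
        self.data.get(key).map(|v| &**v)
    }
}
```

The caller never sees whether a given value was borrowed from the request buffer or allocated during decoding.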

I would convert them to String. Otherwise your HashMap has a lifetime parameter.

Wouldn't this be fine, because the struct QueryString has a lifetime parameter anyway?

In that case it's fine; it's just easier to deal with owned values.

Okay, I have now implemented everything with Cow<str> :wink:

Does this seem reasonable?

use std::borrow::Cow;
use std::collections::{HashMap, hash_map::Entry};
use std::mem::take;
use urlencoding::decode as url_decode;

#[derive(Debug)]
pub struct QueryString<'buf> {
    data: HashMap<Cow<'buf, str>, Value<'buf>>,
}

#[derive(Debug)]
pub enum Value<'buf> {
    Single(Cow<'buf, str>),
    Multiple(Vec<Cow<'buf, str>>),
}

impl<'buf> QueryString<'buf> {
    pub fn get(&self, key: &str) -> Option<&Value> {
        self.data.get(key)
    }
}

impl<'buf> From<&'buf str> for QueryString<'buf> {
    fn from(s: &'buf str) -> Self {
        let mut data = HashMap::new();

        for sub_str in s.split('&') {
            let mut parts = sub_str.splitn(2, '=');
            let key = decode(parts.next().unwrap_or_default());
            let val = decode(parts.next().unwrap_or_default());

            match data.entry(key) {
                Entry::Occupied(mut entry) => {
                    let existing = entry.get_mut();
                    match existing {
                        Value::Single(prev_val) => {
                            *existing = Value::Multiple(vec![take(prev_val), val]);
                        }
                        Value::Multiple(vec) => vec.push(val),
                    };
                },
                Entry::Vacant(entry) => {
                    entry.insert(Value::Single(val));
                }
            };
        }

        QueryString { data }
    }
}

// Percent-decode `input`; on a decoding error, fall back to the raw, undecoded text.
fn decode(input: &str) -> Cow<'_, str> {
    match url_decode(input) {
        Ok(decoded) => decoded,
        Err(_) => Cow::Borrowed(input),
    }
}

I think calling .into_owned() is also fine. It will allocate a new String from Cow::Borrowed, but if the code isn't performance critical, you can save yourself a lot of wrangling with lifetime annotations.
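
For comparison, a sketch of what the owned variant might look like. The decode function here is a deliberately tiny stand-in that only handles "%20" (an assumption made to keep the example free of external crates), and parse_owned is a hypothetical name:

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Stand-in for a real percent-decoder (assumption): handles only "%20".
fn decode(s: &str) -> Cow<'_, str> {
    if s.contains("%20") {
        Cow::Owned(s.replace("%20", " "))
    } else {
        Cow::Borrowed(s)
    }
}

/// Parses a query string into owned Strings. The result has no lifetime
/// parameter, at the cost of one allocation per key and per value.
fn parse_owned(query: &str) -> HashMap<String, Vec<String>> {
    let mut data: HashMap<String, Vec<String>> = HashMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        // A pair without '=' is treated as a key with an empty value.
        let (key, val) = pair.split_once('=').unwrap_or((pair, ""));
        data.entry(decode(key).into_owned())
            .or_default()
            .push(decode(val).into_owned());
    }
    data
}
```

The returned map can outlive the input buffer, which is the main practical difference from the Cow-based version.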

This looks fine. Perhaps this came up on a previous thread or is outside the scope of your question, but why are you using a variant for Value? Some weird things can be constructed (namely a Value::Multiple with fewer than two items in its Vec). It seems like the right type is (Cow<str>, Vec<Cow<str>>) (the head and the tail stored separately; this guarantees you always have at least one element).
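
A rough sketch of that head-and-tail shape, written as a small struct instead of a bare tuple. The name Values and its methods are hypothetical:

```rust
use std::borrow::Cow;

/// Non-empty list of values: `head` always exists, `tail` may be empty.
#[derive(Debug)]
pub struct Values<'buf> {
    head: Cow<'buf, str>,
    tail: Vec<Cow<'buf, str>>,
}

impl<'buf> Values<'buf> {
    /// The only constructor requires a first value, so "zero values"
    /// cannot be represented.
    pub fn new(first: Cow<'buf, str>) -> Self {
        Values { head: first, tail: Vec::new() }
    }

    pub fn push(&mut self, val: Cow<'buf, str>) {
        self.tail.push(val);
    }

    pub fn first(&self) -> &str {
        &self.head
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }

    /// All values in insertion order, exposed as plain &str.
    pub fn all(&self) -> Vec<&str> {
        std::iter::once(&self.head)
            .chain(self.tail.iter())
            .map(|v| &**v)
            .collect()
    }
}
```

Adding a repeated key then becomes a plain push, with no switch from one enum variant to another.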

Also, I think you can avoid the take, but it might be a little hairy. You can use std::mem::replace to swap in a Multiple with an empty vector, then push the previous value and the new one. The take leaves a default value behind, which for Cow<str> is an empty owned String; creating that does not allocate, so the cost is small either way.
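
Something along these lines, as a sketch of the replace approach. The Value enum is repeated so the example stands alone, and push_value is a hypothetical helper name:

```rust
use std::borrow::Cow;
use std::mem::replace;

#[derive(Debug, PartialEq)]
pub enum Value<'buf> {
    Single(Cow<'buf, str>),
    Multiple(Vec<Cow<'buf, str>>),
}

/// Appends `val` to `existing`, upgrading Single to Multiple if needed.
fn push_value<'buf>(existing: &mut Value<'buf>, val: Cow<'buf, str>) {
    // Swap in a placeholder so the old value is owned here.
    // Vec::new() does not allocate, so the placeholder is cheap.
    let old = replace(existing, Value::Multiple(Vec::new()));
    let mut items = match old {
        Value::Single(prev) => vec![prev],
        Value::Multiple(items) => items,
    };
    items.push(val);
    *existing = Value::Multiple(items);
}
```

Inside the Entry::Occupied arm this would be called as push_value(entry.get_mut(), val).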

This looks fine. Perhaps this came up on a previous thread or is outside the scope of your question but why are you using a variant for Value?

Well, because I'm following the example code from the Rust course and tried to extend it a little :grinning:

I think they use the variant to illustrate the concepts. But with pure &str it was a lot simpler, of course.