Using serde to deserialize to an owned `Cow`

If I have the following struct:

#[derive(Clone, Debug, Deserialize)]
struct Foo<'a> {
    #[serde(borrow)]
    pub bar: Cow<'a, str>,
}

I can read it from a file like this:

    let file = File::open("some.json").unwrap();
    let mut reader = BufReader::new(file);
    let mut buffer = Vec::new();
    reader.read_to_end(&mut buffer).unwrap();

    let foo: Foo = serde_json::from_slice(&buffer).unwrap();
    assert!(matches!(foo.bar, Cow::Borrowed(_)));
    println!("from file {:?}", foo);

In this case bar will be borrowed. However, if I move all that code into a dedicated function, it no longer works:

fn from_file<'a>() -> Foo<'a> {
    let file = File::open("some.json").unwrap();
    let mut reader = BufReader::new(file);
    let mut buffer = Vec::new();
    reader.read_to_end(&mut buffer).unwrap();

    let foo: Foo = serde_json::from_slice(&buffer).unwrap();
    assert!(matches!(foo.bar, Cow::Borrowed(_)));
    foo
}

This won't compile, and for good reason: the data in bar is owned by buffer, not by foo. In these cases I don't mind if bar is actually owned, so doing this works:

fn from_file<'a>() -> Foo<'a> {
    let file = File::open("some.json").unwrap();
    let mut reader = BufReader::new(file);
    let mut buffer = Vec::new();
    reader.read_to_end(&mut buffer).unwrap();

    let foo: Foo = serde_json::from_slice(&buffer).unwrap();
    assert!(matches!(foo.bar, Cow::Borrowed(_)));
    
    let foo = Foo { bar: foo.bar.to_string().into() };
    assert!(matches!(foo.bar, Cow::Owned(_)));

    foo
}

However, if I move that conversion into a method, it no longer works:

impl Foo<'_> {
    pub fn to_owned(self) -> Self {
        let foo = Self { bar: self.bar.to_string().into() };
        assert!(matches!(foo.bar, Cow::Owned(_)));
        foo
    }
}

The problem being:

30 |     let foo: Foo = serde_json::from_slice(&buffer).unwrap();
   |                                           ------- `buffer` is borrowed here
...
36 |     foo.to_owned()
   |     ^^^^^^^^^^^^^^ returns a value referencing data owned by the current function

What am I missing here? How can I tell the compiler that I'm no longer referencing buffer?

This means that you're deserializing to the borrowed Cow, not the owned one. Why do you need this?

If I remove #[serde(borrow)], I always get an Owned string in bar.

I noticed that sometimes I keep the original slice around at least as long as I'm using the struct, so I was thinking that in those cases there's no need to allocate new Strings and I can just use &str. That isn't true in all cases, though, so I wanted to support both, and that's how I ended up with a Cow there. This was more of a "hmm, could I do that?" kind of thing, so I'm curious whether it is actually possible and why the borrow checker isn't happy in this case. As far as I understand it, I'm no longer referencing buffer at that point. Is that a false positive, or am I not understanding things correctly?

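The to_owned function does not tell the compiler that Foo no longer borrows data. Because it returns Self, the result has the same lifetime as the input, which is still tied to buffer. You want to indicate that the result is independent by returning Foo<'static>. You can use a Foo<'static> value everywhere you need a Foo<'a>.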

impl Foo<'_> {
    pub fn to_owned(self) -> Foo<'static> {
        let foo = Foo { bar: self.bar.to_string().into() };
        assert!(matches!(foo.bar, Cow::Owned(_)));
        foo
    }
}
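As a possible variation on the method above, here is a minimal sketch that uses `Cow::into_owned` instead of `to_string().into()`, so an already-owned string is reused rather than copied. It leaves out serde to stay self-contained; the method name `into_owned` on `Foo` and the helper `from_local_buffer` (a stand-in for `from_file`) are made up for illustration.

```rust
use std::borrow::Cow;

#[derive(Debug)]
struct Foo<'a> {
    bar: Cow<'a, str>,
}

impl<'a> Foo<'a> {
    /// Converts any `Foo<'a>` into a `Foo<'static>` that owns its string.
    fn into_owned(self) -> Foo<'static> {
        // `Cow::into_owned` reuses the existing `String` if the value is
        // already owned and allocates only when it is borrowed.
        Foo { bar: Cow::Owned(self.bar.into_owned()) }
    }
}

// Hypothetical stand-in for `from_file`: the buffer is local, so the
// returned value must not borrow from it.
fn from_local_buffer() -> Foo<'static> {
    let buffer = String::from("hello");
    let borrowed = Foo { bar: Cow::Borrowed(buffer.as_str()) };
    borrowed.into_owned()
}

fn main() {
    let foo = from_local_buffer();
    assert!(matches!(foo.bar, Cow::Owned(_)));
    println!("{:?}", foo);
}
```

The `'static` in the return type is what lets the borrow checker see that the result no longer depends on `buffer`.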

I don't know if your asserts are just for understanding what is going on or if you really want to use them as is, but assert!(matches!(foo.bar, Cow::Borrowed(_))); will not work reliably. JSON allows and even requires escape sequences for certain characters, like " or \n or Unicode escapes \u0111. In these cases you still get an Owned string.
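To illustrate why escapes force an owned string, here is a simplified sketch of the decision a deserializer has to make. The `unescape` function is hypothetical and is not serde_json's implementation; it handles only `\"`, `\\` and `\n`, and takes the text between the quotes as input.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not part of serde_json: decodes a small subset of
/// JSON string escapes from the text between the quotes.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // No escapes: the decoded text is identical to the input,
        // so it can be borrowed without allocating.
        return Cow::Borrowed(raw);
    }
    // Escapes present: the decoded text differs from the input,
    // so a new `String` has to be built.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            // Anything else is kept verbatim in this simplified sketch.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}

fn main() {
    assert!(matches!(unescape("plain"), Cow::Borrowed(_)));
    assert!(matches!(unescape(r"line\nbreak"), Cow::Owned(_)));
    assert_eq!(&*unescape(r"line\nbreak"), "line\nbreak");
}
```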


I used the assertions just to confirm my assumptions, the data I used for testing this has no escapes.

So, to verify that I understand what happens there: the '_ lifetime on the impl Foo<'_> is there because I don't care what lifetime goes there, and the 'static lifetime on the return type is there so that the compiler knows I'm no longer referencing any data owned by someone else, and everything is owned by me?

Was that inferred in the case in which I added let foo = Foo { bar: foo.bar.to_string().into() }; directly to from_file?

As a reference lifetime, 'static indicates that the data pointed to by the reference lives for the entire lifetime of the running program. It can still be coerced to a shorter lifetime.

The memory allocated for bar will still be freed when it gets out of scope, right?

Yes, a '_ means that you have an unnamed lifetime. They occur when you do not need to explicitly name them but instead can rely on Rust inferring the correct values. 'static is the longest possible lifetime. On a reference it means that the data is valid until the end of the program; on a type such as Foo<'static> it means that the value contains no borrow that could expire earlier. In this case, this means that you own the data, and do not borrow from something shorter like the temporary buffer.
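The coercion to a shorter lifetime can be seen in a small sketch. The `describe` function is made up for this example, and serde is left out to keep it self-contained.

```rust
use std::borrow::Cow;

#[derive(Debug)]
struct Foo<'a> {
    bar: Cow<'a, str>,
}

// Hypothetical function: expects a `Foo` tied to the same lifetime as `prefix`.
fn describe<'a>(prefix: &'a str, foo: Foo<'a>) -> String {
    format!("{}{}", prefix, foo.bar)
}

fn main() {
    let owned: Foo<'static> = Foo { bar: Cow::Owned(String::from("owned")) };
    let local = String::from("bar = ");
    // `local` lives only until the end of `main`, yet the `Foo<'static>` is
    // accepted: it is coerced to the shorter lifetime.
    let text = describe(&local, owned);
    assert_eq!(text, "bar = owned");
}
```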

Yes, the memory will still be freed. That is part of the Drop implementation of Cow and String.
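One way to convince yourself is a small sketch with a wrapper type that counts its drops. `Tracked`, `DROPS` and `drops_after_scope` are names invented for this example; the point is only that a value whose type mentions `'static` is still dropped when it goes out of scope.

```rust
use std::borrow::Cow;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical wrapper that owns its string, like a `Foo<'static>` would.
struct Tracked {
    _bar: Cow<'static, str>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns how many drops happened while the inner scope ran.
fn drops_after_scope() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _t = Tracked { _bar: Cow::Owned(String::from("heap data")) };
        // `_t` goes out of scope here; its `String` is freed along with it.
    }
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    assert_eq!(drops_after_scope(), 1);
}
```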