Deserializing to an owned Cow in a trait method

I have a problem similar to https://users.rust-lang.org/t/using-serde-to-desrialize-to-a-owned-cow/78929.

Because my data structure has different "entry point" structs, I want to provide a trait with a default method to read in JSON data (the main format for the data structure) from a reader:

use serde::Deserialize;
use std::borrow::Cow;
use std::{error::Error, io::Read};


trait FromJson<'a>: 'a + Sized + Deserialize<'a> {
    fn from_reader<R>(mut reader: R) -> Result<Self::Owned, Box<dyn Error>>
    where
        R: Read,
        Self: CanOwn,
    {
        let mut buf = vec![];
        reader.read_to_end(&mut buf)?;
        let res: Result<Self, Box<dyn Error>> = serde_json::from_slice(&buf).map_err(|e| e.into());
        res.map(|r| r.to_owned())
    }
}

trait CanOwn {
    type Owned: 'static;
    fn to_owned(self) -> Self::Owned;
}

// One of many structs...
#[derive(Debug, Deserialize)]
pub struct Manifest<'a> {
    #[serde(borrow)]
    pub attribution: Option<Cow<'a, str>>,
}

impl CanOwn for Manifest<'_> {
    type Owned = Manifest<'static>;

    fn to_owned(self) -> Manifest<'static> {
        Manifest {
            attribution: self.attribution.map(|a| a.to_string().into()),
        }
    }
}

However, this doesn't work, as the compiler assumes that there is still borrowed data from buf by the end of the from_reader function:

error[E0597]: `buf` does not live long enough
  --> src/lib.rs:14:72
   |
6  | trait FromJson<'a>: 'a + Sized + Deserialize<'a> {
   |                -- lifetime `'a` defined here
...
12 |         let mut buf = vec![];
   |             ------- binding `buf` declared here
13 |         reader.read_to_end(&mut buf)?;
14 |         let res: Result<Self, Box<dyn Error>> = serde_json::from_slice(&buf).map_err(|e| e.into());
   |                                                 -----------------------^^^^-
   |                                                 |                      |
   |                                                 |                      borrowed value does not live long enough
   |                                                 argument requires that `buf` is borrowed for `'a`
15 |         res.map(|r| r.to_owned())
16 |     }
   |     - `buf` dropped here while still borrowed

How can I convince the compiler that the data is in fact owned by the struct?

Maybe by using DeserializeOwned as a supertrait, rather than Deserialize<'a>?

use serde::Deserialize;
use serde::de::DeserializeOwned;
use std::borrow::Cow;
use std::{error::Error, io::Read};

trait FromJson: CanOwn + Sized + DeserializeOwned {
    fn from_reader<R>(mut reader: R) -> Result<Self::Owned, Box<dyn Error>>
    where
        R: Read,
    {
        let mut buf = vec![];
        reader.read_to_end(&mut buf)?;
        let res: Self = serde_json::from_slice(&buf)?;
        Ok(res.to_owned())
    }
}

trait CanOwn {
    type Owned: 'static;
    fn to_owned(self) -> Self::Owned;
}

// One of many structs...
#[derive(Debug, Deserialize)]
pub struct Manifest<'a> {
    #[serde(borrow)]
    pub attribution: Option<Cow<'a, str>>,
}

impl CanOwn for Manifest<'_> {
    type Owned = Manifest<'static>;

    fn to_owned(self) -> Manifest<'static> {
        Manifest {
            attribution: self.attribution.map(|a| a.to_string().into()),
        }
    }
}

Playground.
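
For readers wondering why changing the bound helps: DeserializeOwned is serde's shorthand for `for<'de> Deserialize<'de>`, so the implementor must be deserializable from input of any lifetime, including a buffer that is local to from_reader. With `Deserialize<'a>` on a `trait FromJson<'a>`, the caller picks `'a`, and no local buffer can outlive it. Below is a minimal std-only sketch of the same idea. `Parse<'de>`, `OwnedManifest` and `parse_from_reader` are hypothetical stand-ins for illustration, not serde API.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for `serde::Deserialize<'de>`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Self;
}

/// Owns its data, so it can be parsed from input of any lifetime.
#[derive(Debug, PartialEq)]
struct OwnedManifest {
    attribution: Option<String>,
}

impl<'de> Parse<'de> for OwnedManifest {
    fn parse(input: &'de str) -> Self {
        let text = input.trim();
        OwnedManifest {
            // Copy the text out of the input instead of borrowing from it.
            attribution: if text.is_empty() {
                None
            } else {
                Some(text.to_owned())
            },
        }
    }
}

/// `for<'de> Parse<'de>` plays the role of `DeserializeOwned`: the bound
/// holds for every lifetime, including the one of the local `buf`.
fn parse_from_reader<T, R>(mut reader: R) -> io::Result<T>
where
    T: for<'de> Parse<'de>,
    R: Read,
{
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    // The borrow of `buf` ends with this call, because `T` itself cannot
    // mention the lifetime of the input.
    Ok(T::parse(buf.as_str()))
}
```

The higher-ranked bound is what lets the function body choose the lifetime, rather than the caller. The price is that only types that do not borrow from the input can satisfy it.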

Thanks very much for your help, @jofas. I didn't expect the solution to be this straightforward. :slight_smile:

As far as I’m aware… well, just became aware actually… putting #[serde(borrow)] onto Cow in particular is relatively nonsensical, as the standard implementation of Deserialize on it will always produce Cow::Owned values, anyways.

This can now also use serde_json::from_reader. Also, if DeserializeOwned is required anyways, then the whole to_owned step is a bit redundant, as Self::Owned would typically – as is the case for Manifest<'static> then – implement DeserializeOwned directly, too. So the whole from_reader function becomes redundant, and one can use serde_json::from_reader directly.

Also, unless the #[serde(borrow)] is removed, I think Manifest<'a> would never implement DeserializeOwned, though the code example lacks any test case that makes sure FromJson can be implemented for it in the first place.

Hmm… maybe I’m mistaken on this point, actually. Still investigating.

Ah, there we go, it doesn’t work on Option containing a Cow.

The freaking derive macro has a magic special-case for things named Cow. :man_shrugging:

Indeed, we should be able to get rid of the whole CanOwn trait (if we remove the #[serde(borrow)]) and simplify the code to this. I think it does what I expect it to do, namely parse an Option<Cow<'a, str>> from a JSON reader into a Cow::Owned(...) instance, though it may miss some intricacies OP is facing in their real code base:

use serde::Deserialize;
use serde::de::DeserializeOwned;
use std::borrow::Cow;
use std::{error::Error, io::Read};

trait FromJson: DeserializeOwned {
    fn from_reader<R>(mut reader: R) -> Result<Self, Box<dyn Error>>
    where
        R: Read,
    {
        Ok(serde_json::from_reader(&mut reader)?)
    }
}

impl<T: DeserializeOwned> FromJson for T {}

#[derive(Debug, Deserialize)]
pub struct Manifest<'a> {
    pub attribution: Option<Cow<'a, str>>,
}

#[test]
fn deserialize_manifest() {
    let json = r#"{"attribution": "some string"}"#;

    let res = Manifest::from_reader(json.as_bytes()).unwrap();
    
    assert_eq!(res.attribution, Some(Cow::Owned("some string".to_owned())));
}

Playground.


OK, I have no idea how to do it with #[serde(borrow)] without resorting to some very unsafe transmute shenanigans that extend the buffer's lifetime to satisfy 'a. I'm not sure whether my reasoning about the safety of this is correct at all (Miri doesn't complain, though):

use serde::Deserialize;
use std::borrow::Cow;
use std::{error::Error, io::Read};
use std::mem::transmute;

trait FromJson<'a>: CanOwn + Deserialize<'a> {
    fn from_reader<R>(mut reader: R) -> Result<Self::Owned, Box<dyn Error>>
    where
        R: Read,
    {
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        
        let buf: &[u8] = &buf;
        
        // SAFETY:
        //
        // Our buffer `buf` lives only till the end of the scope, for some
        // unnamable lifetime `'1`. 
        // Given our type `Self<'a>`, according to our interface, the buffer 
        // must be alive for `'a`.
        // To satisfy the borrow checker here, we extend the lifetime of our
        // buffer to `'a`.
        // This is wildly unsafe as this would allow us to create dangling 
        // pointers, as our buffer is dropped at the end of the scope, even 
        // though we pretend like it lives for `'a`.
        // We can do that, because we immediately transform the data we
        // borrow from our buffer into owned data using `CanOwn::to_owned`.
        // So even though we extend the lifetime of the slice beyond where it
        // is actually valid, we won't ever leak it from this function.
        //
        let buf: &'a [u8] = unsafe { transmute(buf) };
        
        Ok(serde_json::from_slice::<Self>(buf)?.to_owned())
    }
}

impl<'a, T: CanOwn + Deserialize<'a>> FromJson<'a> for T {}

#[derive(Debug, Deserialize)]
pub struct Manifest<'a> {
    #[serde(borrow)]
    pub attribution: Option<Cow<'a, str>>,
}

trait CanOwn {
    type Owned: 'static;
    fn to_owned(self) -> Self::Owned;
}

impl CanOwn for Manifest<'_> {
    type Owned = Manifest<'static>;

    fn to_owned(self) -> Manifest<'static> {
        Manifest {
            attribution: self.attribution.map(|a| a.to_string().into()),
        }
    }
}

fn deserialize_manifest() {
    let json = r#"{"attribution": "some string"}"#;

    let res = Manifest::from_reader(json.as_bytes()).unwrap();
    
    assert_eq!(res.attribution, Some(Cow::Owned("some string".to_owned())));
}

fn deserialize_manifest_no_attribution() {
    let json = r#"{"attribution": null}"#;

    let res = Manifest::from_reader(json.as_bytes()).unwrap();
    
    assert_eq!(res.attribution, None);
}

fn deserialize_manifest_wrong_data_type() {
    let json = r#"{"attribution": true}"#;

    let res = Manifest::from_reader(json.as_bytes());
    
    assert!(res.is_err());
}

fn main() {
    deserialize_manifest();
    deserialize_manifest_no_attribution();
    deserialize_manifest_wrong_data_type();
}

Playground.

Thank you for the clarification, @steffahn and @jofas.

The reason why I use Cow<'a, str> in the first place (and the #[serde(borrow)] attribute) has been my (possibly naive) hope that I can, on the one hand, cater for different ways to read in data (from a reader, from a string and from a slice, analogous to what serde_json provides) and, on the other, avoid an extra allocation if possible. The latter of course can't be achieved when a reader is used, but should be doable with a String or a Vec<u8> (at least I thought).
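
To illustrate the allocation-avoidance idea in isolation: a parser that returns Cow can hand back a slice of its input when the text needs no transformation, and allocate only when it does (for instance, when escape sequences have to be resolved). The following is a minimal std-only sketch. `unescape` is a hypothetical helper that only knows a couple of escapes. It is not how serde_json is implemented.

```rust
use std::borrow::Cow;

/// Hypothetical helper: resolves backslash escapes in `raw`.
/// `\n` becomes a newline; any other escaped character is copied literally.
fn unescape(raw: &str) -> Cow<'_, str> {
    // Fast path: nothing to rewrite, so borrow straight from the input.
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }

    // Slow path: build a new String with the escapes resolved.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') => out.push('\n'),
                Some(other) => out.push(other),
                // A trailing backslash is kept as-is.
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```

With a reader, the input buffer does not outlive the function, so the borrowed variant cannot be returned to the caller. With a caller-provided &str or &[u8], it can.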

But of course, if the standard implementation of Deserialize forcibly leads to Cow::Owned values, it calls the entire use case into question, and I guess I could simply go for an Option<String> as the attribution's type for simplicity's sake.

Here’s some code somewhat similar to your original that might do what you’re looking for:

/*
[dependencies]
serde = { version = "1", features = [ "derive" ] }
serde_with = "3"
serde_json = "1"
*/

use serde::Deserialize;
use std::borrow::Cow;
use std::{error::Error, io::Read};
use serde_with::{BorrowCow, serde_as};

trait FromJson: Sized {
    fn from_reader<R>(mut reader: R) -> Result<Self, Box<dyn Error>>
    where
        R: Read,
        Self: CanBorrow,
        for<'a> Self::Borrowed<'a>: Deserialize<'a>,
    {
        let mut buf = vec![];
        reader.read_to_end(&mut buf)?;
        let res: Result<Self::Borrowed<'_>, Box<dyn Error>> = serde_json::from_slice(&buf).map_err(|e| e.into());
        res.map(|r| Self::from_borrowed(r))
    }
}

trait CanBorrow {
    type Borrowed<'a>;
    fn from_borrowed(borrowed: Self::Borrowed<'_>) -> Self;
}

// One of many structs...
#[serde_as]
#[derive(Debug, Deserialize)]
pub struct Manifest<'a> {
    #[serde_as(as = "Option<BorrowCow>")]
    pub attribution: Option<Cow<'a, str>>,
}

impl CanBorrow for Manifest<'_> {
    type Borrowed<'a> = Manifest<'a>;

    fn from_borrowed(borrowed: Self::Borrowed<'_>) -> Self {
        Manifest {
            attribution: borrowed.attribution.map(|a| a.to_string().into()),
        }
    }
}

impl FromJson for Manifest<'_> {}

Your code does indeed work as expected, @steffahn. Many thanks again for your help. For the sake of documentation, here's a fleshed-out example of using the code:

use serde::Deserialize;
use std::{error::Error, io::Read};

pub(crate) trait FromJsonOwned: Sized {
    fn from_reader<R>(mut reader: R) -> Result<Self, Box<dyn Error>>
    where
        R: Read,
        Self: FromBorrow,
        for<'a> Self::Borrowed<'a>: Deserialize<'a>,
    {
        let mut buf = vec![];
        reader.read_to_end(&mut buf)?;
        let res: Result<Self::Borrowed<'_>, Box<dyn Error>> =
            serde_json::from_slice(&buf).map_err(|e| e.into());
        res.map(|r| Self::from_borrowed(r))
    }
}

pub(crate) trait FromBorrow {
    type Borrowed<'a>;
    fn from_borrowed(borrowed: Self::Borrowed<'_>) -> Self;
}

pub(crate) trait FromJson<'a>: 'a + Sized + Deserialize<'a> {
    fn from_slice(s: &'a [u8]) -> Result<Self, Box<dyn Error>> {
        serde_json::from_slice(s).map_err(|e| e.into())
    }

    fn from_str(s: &'a str) -> Result<Self, Box<dyn Error>> {
        serde_json::from_str(s).map_err(|e| e.into())
    }
}

#[test]
fn test_borrow() {
    use serde_with::{serde_as, BorrowCow};
    use std::borrow::Cow;

    // Order is important: #[serde_as] must come before #[derive(...)].
    #[serde_as]
    #[derive(Deserialize, Debug, PartialEq, Eq)]
    struct Shepherd<'a> {
        #[serde_as(as = "BorrowCow")]
        bovine: Cow<'a, str>,
        #[serde_as(as = "Option<BorrowCow>")]
        calf: Option<Cow<'a, str>>,
    }

    impl FromJsonOwned for Shepherd<'_> {}
    impl<'a> FromJson<'a> for Shepherd<'a> {}
    impl FromBorrow for Shepherd<'_> {
        type Borrowed<'a> = Shepherd<'a>;

        fn from_borrowed(borrowed: Self::Borrowed<'_>) -> Self {
            Shepherd {
                bovine: borrowed.bovine.to_string().into(),
                calf: borrowed.calf.map(|c| c.to_string().into()),
            }
        }
    }

    let test_str = r#"{"calf": "a cow", "bovine": "another cow"}"#;
    let test_bytes = test_str.as_bytes();
    let from_str = Shepherd::from_str(test_str).unwrap();
    let from_reader = Shepherd::from_reader(test_bytes).unwrap();
    let from_slice = Shepherd::from_slice(test_bytes).unwrap();

    assert!(matches!(
        (from_str.bovine, from_str.calf),
        (Cow::Borrowed(_), Some(Cow::Borrowed(_)))
    ));
    assert!(matches!(
        (from_slice.bovine, from_slice.calf),
        (Cow::Borrowed(_), Some(Cow::Borrowed(_)))
    ));
    assert!(matches!(
        (from_reader.bovine, from_reader.calf),
        (Cow::Owned(_), Some(Cow::Owned(_)))
    ));
}

Let me try to explain this for the benefit of anyone else who sees this thread and struggles to comprehend it like I did.

You're deserializing a struct with lifetimes, specifically due to Cow<'a, str>.

It's nice to get serde to do the deserializing while keeping Cow in its borrowed form (Cow::Borrowed) to avoid allocation, for performance reasons.

This is quite difficult to do, though. Fortunately, there are some utilities to help with this. The one that really seemed to work here was serde_as, which is an improved version of serde's with attribute. serde_as comes from a separate crate, serde_with, which specifically supports Cows via BorrowCow.

The CanBorrow / FromBorrow trait is yours, not part of serde at all, and is used within from_reader, your own trait method analogous to serde_json's from_reader.

In your from_reader method you've used serde_json's from_slice which gives you a borrowed form, borrowing from the owned data read inside the function. That can't be returned because it's about to be dropped at the end of the function, so from_borrowed gets you an owned version (that's what from_borrowed does).
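
The shape of that pattern can be sketched without serde at all. In the following std-only sketch, `Parse<'de>` is a hypothetical stand-in for `Deserialize<'de>`, `parse_from_reader` stands in for the thread's from_reader, and the "parsing" is just trimming whitespace. Only the trait layout mirrors the code above.

```rust
use std::borrow::Cow;
use std::io::{self, Read};

/// Hypothetical stand-in for `serde::Deserialize<'de>`.
trait Parse<'de>: Sized {
    fn parse(input: &'de str) -> Self;
}

/// Same role as the thread's `CanBorrow` / `FromBorrow` trait.
trait FromBorrow: Sized {
    /// The same type, but allowed to borrow from input living for `'a`.
    type Borrowed<'a>;
    fn from_borrowed(borrowed: Self::Borrowed<'_>) -> Self;
}

#[derive(Debug)]
struct Manifest<'a> {
    attribution: Cow<'a, str>,
}

impl<'de> Parse<'de> for Manifest<'de> {
    fn parse(input: &'de str) -> Self {
        // Borrows from the input: no allocation here.
        Manifest { attribution: Cow::Borrowed(input.trim()) }
    }
}

impl FromBorrow for Manifest<'_> {
    type Borrowed<'a> = Manifest<'a>;

    fn from_borrowed(borrowed: Self::Borrowed<'_>) -> Self {
        // Detach from the input by copying the text into an owned String.
        Manifest { attribution: Cow::Owned(borrowed.attribution.into_owned()) }
    }
}

fn parse_from_reader<T, R>(mut reader: R) -> io::Result<T>
where
    T: FromBorrow,
    for<'a> T::Borrowed<'a>: Parse<'a>,
    R: Read,
{
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    // Step 1: parse into the form that borrows from the local `buf`.
    let borrowed: T::Borrowed<'_> = Parse::parse(buf.as_str());
    // Step 2: convert to the caller's type before `buf` is dropped.
    Ok(T::from_borrowed(borrowed))
}
```

The generic associated type lets the function body pick the lifetime of the intermediate value, while the caller still decides the lifetime of the returned one. Parsing directly from a caller-provided &str via `Manifest::parse` keeps the Cow::Borrowed form.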

All of this lets you use Cow efficiently. A naive serde impl would actually end up making owned Cows.
