How do I return a deserialized serde struct?

I want something like the following code, which returns a struct deserialized from an input file. The goal is to hide the deserialization implementation from the API client.

let a_struct = a_module::from_file::<AStruct>("/configuration/file.path")?;

The following code demonstrates the problem that I am having.

trait MyTrait {
    fn do_something(&self) -> String;
}

#[derive(serde::Deserialize)]
struct MyStruct<'a> {
    field: &'a str,
}

impl MyTrait for MyStruct<'_> {
    fn do_something(&self) -> String {
        self.field.to_owned()
    }
}

fn from_str<S, T>(s: S) -> T 
where S: AsRef<str>,
      T: MyTrait + serde::de::DeserializeOwned
{
    serde_yaml::from_str::<T>(s.as_ref()).unwrap()
}

fn main() {
    let a = from_str::<MyStruct>("a: A String");
    //let a = from_str::<_, MyStruct>("a: A String");
    dbg!(a.do_something());
}

When I try to use from_str::<MyStruct>("a: A String"); I get:

error[E0107]: this function takes 2 generic arguments but 1 generic argument was supplied
  --> src/main.rs:24:13
   |
24 |     let a = from_str::<MyStruct>("a: A String");
   |             ^^^^^^^^   -------- supplied 1 generic argument
   |             |
   |             expected 2 generic arguments
   |
note: function defined here, with 2 generic parameters: `S`, `T`
  --> src/main.rs:16:4
   |
16 | fn from_str<S, T>(s: S) -> T 
   |    ^^^^^^^^ -  -
help: add missing generic argument
   |
24 |     let a = from_str::<MyStruct, T>("a: A String");
   |                                +++

When I try to use from_str::<_, MyStruct>("a: A String"); I get:

error: implementation of `Deserialize` is not general enough
  --> src/main.rs:25:13
   |
25 |     let a = from_str::<_, MyStruct>("a: A String");
   |             ^^^^^^^^^^^^^^^^^^^^^^^ implementation of `Deserialize` is not general enough
   |
   = note: `MyStruct<'_>` must implement `Deserialize<'0>`, for any lifetime `'0`...
   = note: ...but `MyStruct<'_>` actually implements `Deserialize<'1>`, for some specific lifetime `'1`

I don't understand why I need to pass the _ generic parameter in the second attempt, but that error looks like it is getting closer to the root cause. I have tried all sorts of changes to the lifetimes and trait bounds of the function, to no avail. If I read the file, deserialize it, and use the struct all in the main function, it works perfectly...

How can I do this?

From serde's docs:

[DeserializeOwned] means "T can be deserialized from any lifetime." [...] It means T owns all the data that gets deserialized.

This isn't the case for your struct, so using DeserializeOwned won't work. After we fix the signature to have a lifetime parameter...

fn from_str<'de, S, T>(s: S) -> T
where
    S: AsRef<str> + 'de,
    T: serde::de::Deserialize<'de>,
{
    serde_yaml::from_str(s.as_ref()).unwrap()
}

...we're met with another error.

error[E0597]: `s` does not live long enough
  --> src/main.rs:11:26
   |
6  | fn from_str<'de, S, T>(s: S) -> T
   |             --- lifetime `'de` defined here
...
11 |     serde_yaml::from_str(s.as_ref()).unwrap()
   |     ---------------------^^^^^^^^^^-
   |     |                    |
   |     |                    borrowed value does not live long enough
   |     argument requires that `s` is borrowed for `'de`
12 | }
   | - `s` dropped here while still borrowed

Using AsRef isn't going to work here. The signature accepts Strings, for example. According to the signature, this is a perfectly valid way to use the function:

from_str::<_, MyStruct>(String::from("hi!"));

but our implementation can't actually handle that, because the string is dropped as soon as the function exits, which would leave us with a MyStruct.field that points nowhere.
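
To make the lifetime problem concrete without pulling in serde, here is a minimal sketch using only the standard library. parse_field is a hypothetical stand-in for serde_yaml::from_str that only understands a single `field: value` line. The point is that the returned struct points into the input, so the input has to be a `&'de str` that the caller keeps alive, not a value the function consumes.

```rust
/// Borrows its data from the text it was parsed from.
#[derive(Debug, PartialEq)]
struct MyStruct<'a> {
    field: &'a str,
}

/// Hypothetical stand-in for a zero-copy deserializer: it handles exactly
/// one `field: value` line and returns a struct that points into `s`.
fn parse_field<'de>(s: &'de str) -> Option<MyStruct<'de>> {
    let value = s.strip_prefix("field:")?;
    Some(MyStruct { field: value.trim() })
}

/// The caller owns the text, so the borrow stays valid for as long as the
/// caller keeps `text` around.
fn field_len(text: &str) -> Option<usize> {
    let parsed = parse_field(text)?;
    Some(parsed.field.len())
}

// This variant cannot compile: `s` is dropped when the function returns,
// but the result would still point into it.
//
// fn parse_owned<'de>(s: String) -> Option<MyStruct<'de>> {
//     parse_field(&s)
// }
```

The commented-out parse_owned has the same shape as the AsRef version when it is handed a String, which is why the compiler rejects it.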

The goal is to hide the deserialization implementation from the api client.

If that's the case and the config example reflects your use case, you definitely don't want to have borrows in the struct you're deserializing to. The data it's borrowing needs to be stored somewhere. In this example that would be somewhere before the from_file call, making it the client's problem. It's much easier and cleaner for everyone involved if the struct owns the data:

#[derive(serde::Deserialize)]
struct MyStruct {
    field: String,
}

fn from_str<S, T>(s: S) -> T
where
    S: AsRef<str>,
    T: serde::de::DeserializeOwned,
{
    serde_yaml::from_str::<T>(s.as_ref()).unwrap()
}

fn main() {
    let a = from_str::<_, MyStruct>("field: A String");
}

Thanks... I had tried all of these options, and this is the situation I was trying to avoid. But it works.

I guess I'm playing tug of war with myself between 2 different and possibly competing goals!! On one hand I want the client api to be clean, and on the other I want to keep the program as lean as possible.

I was trying to use references right up until the final client structs of the application. In this case, those are what the deserialized struct's methods return.

The struct in this example is a deserialized configuration file (which can get large) so I was trying to keep everything as references. I had a similar problem that I couldn't solve with dynamically deserializing trait objects without moving everything to use Strings and I was trying to avoid it for the whole configuration file. With the new code, I have a whole bunch of clone()s in the code to create the final client structs...

  1. Would Cow be of any use in the configuration file struct to avoid using String everywhere?
  2. Is there any way to get rid of the ignored generic _ in the method call so it becomes module::from_file::<MyStruct>("/path/to/config.yaml")? I don't really understand why it's necessary when we have the actual path. Is there any way to use AsRef as the type without declaring it as a function generic?

Edit: I answered question 2 myself. The following signature does the job:

fn from_file<T>(path: impl AsRef<Path>) -> Result<T>
where
    T: Provider + DeserializeOwned,
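
For anyone wondering why this works: an `impl Trait` argument is an anonymous generic parameter, so it never shows up in the turbofish list and only T is left to name. Below is a small std-only sketch of the same shape. parse_two, parse_as and the use of FromStr are illustrative stand-ins for from_file and DeserializeOwned, not the real code. As far as I know, combining a turbofish with `impl Trait` arguments needs Rust 1.63 or newer.

```rust
use std::str::FromStr;

/// Two named generics: callers must write `parse_two::<_, u32>(..)`.
fn parse_two<S, T>(s: S) -> Option<T>
where
    S: AsRef<str>,
    T: FromStr,
{
    s.as_ref().trim().parse().ok()
}

/// `impl AsRef<str>` is an anonymous generic, so only `T` is spelled out:
/// `parse_as::<u32>(..)`.
fn parse_as<T>(s: impl AsRef<str>) -> Option<T>
where
    T: FromStr,
{
    s.as_ref().trim().parse().ok()
}
```

Both versions accept &str and String alike. The only difference is how many type arguments the caller has to write.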

If the data you're deserializing from is out of your control, you have to be careful about deserializing into plain references, because many data formats support things like escape characters that mean values can't always be borrowed directly.

I tweaked your code so it would work without DeserializeOwned to illustrate this:

trait MyTrait {
    fn do_something(&self) -> String;
}

#[derive(serde::Deserialize)]
struct MyStruct<'a> {
    field: &'a str,
}

impl MyTrait for MyStruct<'_> {
    fn do_something(&self) -> String {
        self.field.to_owned()
    }
}

fn from_str<'de, T>(s: &'de str) -> T
where
    T: MyTrait + serde::de::Deserialize<'de>,
{
    serde_yaml::from_str::<T>(s).unwrap()
}

fn main() {
    let a = from_str::<MyStruct>("field: A String");
    dbg!(a.do_something());

    // Panics with "Error("field: invalid type: string \"A\\n String\", expected a borrowed string", line: 1, column: 8)"
    from_str::<MyStruct>(r#"field: "A\n String""#);
}

Using a Cow avoids this issue.
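
With serde that would mean a Cow<'a, str> field marked #[serde(borrow)]. To show what the Cow buys you without depending on a particular format crate, here is a rough std-only sketch. unescape is a made-up helper that only understands `\n`; it is not a real YAML routine. Input without escapes is borrowed as-is, and input with escapes is copied into a new String.

```rust
use std::borrow::Cow;

/// Hypothetical unescaper: only `\n` is handled. Returns a borrow when the
/// input can be used verbatim and an owned `String` when it had to change.
fn unescape(raw: &str) -> Cow<'_, str> {
    if raw.contains('\\') {
        Cow::Owned(raw.replace("\\n", "\n"))
    } else {
        Cow::Borrowed(raw)
    }
}

/// Reports whether the value still points into the original input.
fn is_borrowed(value: &Cow<'_, str>) -> bool {
    matches!(value, Cow::Borrowed(_))
}
```

The caller sees a string either way, so the escaped input that made the &str version panic just costs one allocation here.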

I'm not an expert on zero-copy deserialization like this, but in order to deserialize with references like this, you need to keep the entire config file in memory as a string. Depending on the details of the file and the struct you're deserializing to, this may not be particularly lean. Imagine a config struct with one string field and lots of enum fields; using String lets you copy the small relevant section from the config file and drop the rest, using &str forces you to keep the entire thing in memory even though most of it is not necessary. Of course, it's possible this doesn't apply in this case, but in general I think yaml is not particularly well suited for zero-copy deserialization because it contains a lot of unnecessary information like whitespace, punctuation, field names and such.

Also, config files are typically not something that is deserialized often, or something that you'd store a large amount of, so a small amount of potential inefficiency in their deserialization is probably nothing to worry about unless you've benchmarked it and found that there's an issue.

As for whether Cow would help avoid using String everywhere: I don't think so. The Cow::Owned case is basically the same as using a String, while Cow::Borrowed has the same problem as &str of where this data is being borrowed from.

Needing a whole bunch of clone()s to create the final client structs sounds a little strange. Do you store the same strings in multiple places in the final struct, or something like that? Without knowing the exact details here, refactoring away these clones sounds more reasonable than using borrows in these intermediate structs.


@semicoleon I had tried this trait bound, but since the actual code is reading and deserializing a yaml file (from_str is actually from_file(path)) I kept running into the error that the file content String doesn't live long enough. It bleeds into the point that @Heliozoa makes about having to keep the file contents in memory if deserializing to a reference.

This is something I had considered, but didn't think about the overall memory usage. I guess I have been trying to reduce the number of allocations to String until the final output.

The configuration struct for this trait implementation is actually a declarative web scraper. It builds a raw serde value with the CSS selector configuration, which gets serialized back into a generic struct (or a simple map). The strings need to be cloned into the raw serde value when building the output. I'll see what I can do to reduce the number of allocations when building the output. The plan is to be able to write different providers to get 'objects' from REST, GraphQL, Web, etc. consistently through configuration of different providers (e.g. the web provider).

At the moment it looks like this:

let mut items = Sequence::new();

for element in html.select(&elements) {
    let mut map = Mapping::new();
    map.insert("provider".into(), self.name.clone().into());
    map.insert("source".into(), self.source.clone().into());

    for field in &self.fields {
        let value = field.value.from_element(&element).unwrap();
        map.insert(field.name.clone().into(), value.into());
    }

    items.push(map.into());
}

Love this forum!! Always get great responses and discussion.

I see. If I'm understanding correctly, you're turning them into Values which means turning them into Strings either way, since the From<&str> for Value just calls to_string(): https://docs.rs/serde_yaml/latest/src/serde_yaml/value/from.rs.html#56-70. So even if self only contains &strs, the same amount of allocations will still be made during this step, they're just hidden inside the .into() calls.
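
A rough illustration of that hidden allocation, using a tiny hand-rolled Value enum rather than serde_yaml's actual type (the enum and both impls below are assumptions made for the sketch):

```rust
/// Minimal stand-in for a dynamically typed value; not serde_yaml's type.
#[derive(Debug, PartialEq)]
enum Value {
    String(String),
}

impl From<&str> for Value {
    fn from(s: &str) -> Self {
        // The allocation happens here, out of sight of the caller.
        Value::String(s.to_string())
    }
}

impl From<String> for Value {
    fn from(s: String) -> Self {
        // Takes ownership of the existing buffer; no new allocation.
        Value::String(s)
    }
}
```

So `"provider".into()` and `self.name.clone().into()` both end up with one freshly allocated String per inserted value. The clone is just the visible version of the same cost.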

One idea I had is to deserialize into Arc<str> (see https://serde.rs/feature-flags.html#-features-rc) and use an IndexMap<Arc<str>, Value> instead of a Mapping (since Mapping is pretty much just an IndexMap<Value, Value>). Not sure how well that would work for you, but when I need to do a lot of cloning and I don't need to mutate the data being cloned, I tend to reach for Arc.
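
A minimal std-only sketch of why Arc<str> makes those clones cheap. The Provider struct and tag_items function are hypothetical and only loosely modelled on the loop above; plain tuples stand in for the map entries.

```rust
use std::sync::Arc;

/// Hypothetical provider config holding shared, immutable strings.
struct Provider {
    name: Arc<str>,
    source: Arc<str>,
}

/// Builds one (provider, source) pair per item. Each `Arc::clone` only bumps
/// a reference count; the string bytes are not copied.
fn tag_items(provider: &Provider, count: usize) -> Vec<(Arc<str>, Arc<str>)> {
    (0..count)
        .map(|_| (Arc::clone(&provider.name), Arc::clone(&provider.source)))
        .collect()
}
```

Every pair shares the same two allocations as the Provider it came from, so the per-item cost is two reference-count increments, not two string copies.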