How do I write a generic function to deserialize multiple structs? (csv + serde)

I have been happily strolling along with csv and serde. What a great pair!
However, I've found a gaping hole in my knowledge. I am trying to write a function to deserialize any struct that implements DeserializeOwned.
You can see a public example of my non-generic function here.
In that example, the following snippet is the part I need to make generic:

    for result in reader.deserialize() {
        let _record: Reservation = result?;
        count += 1;
    }

I would like to author the function so Reservation could be any struct that implements DeserializeOwned.
And now for the slew of questions to calm my nerves.

  • How would you approach this problem?
  • Am I right in thinking I should use a macro here? I ask because I don't think I can pass a named struct as a parameter.

Thank you in advance!

This would be how to make the function generic.

use csv::Reader;

// T is only used inside the body, so the caller has to name it explicitly.
pub fn count<T: serde::de::DeserializeOwned>(bytes: &[u8]) -> Result<usize, Box<dyn std::error::Error>> {
    let mut reader = Reader::from_reader(bytes);
    let mut count = 0;
    for result in reader.deserialize() {
        let _record: T = result?;
        count += 1;
    }
    Ok(count)
}

And then call it with

count::<Reservation>(reservations.as_bytes())

Link to Playground

::<T> is called the turbofish syntax and can be used to specify a generic parameter for a function.
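To try the turbofish without pulling in csv or serde, here is a minimal std-only sketch. It assumes std::str::FromStr as a stand-in for DeserializeOwned, and count_parsed is a hypothetical name, not something from the linked example.

```rust
use std::str::FromStr;

/// Counts the lines of `text`, failing on the first line that does not parse as `T`.
/// `T` appears only inside the body, so the caller must name it with a turbofish.
fn count_parsed<T: FromStr>(text: &str) -> Result<usize, T::Err> {
    let mut count = 0;
    for line in text.lines() {
        // The annotation on `_record` tells `parse` which type to produce.
        let _record: T = line.trim().parse()?;
        count += 1;
    }
    Ok(count)
}
```

Calling count_parsed::<u32>("1\n2\n3") picks T = u32. Leaving the turbofish off would not compile, because nothing in the arguments or the return value pins T down.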


Thanks! Your adaptation of my contrived example made it stick for me.
I realized my "Real World" version of this function required 2 type bounds:

pub fn count_and_copy<W: std::io::Write + ?Sized, T: serde::de::DeserializeOwned>(
    bytes: &[u8],
) -> Result<usize, Box<dyn std::error::Error>> 

So in this case I needed a dynamic trait object as the first thing passed to the turbofish:

count_and_copy::<dyn Write, Reservation>("foo".as_bytes(), a_reservation);

Is it accurate to say dyn Write means any type that implements the std::io::Write trait and Reservation is a concrete type that implements DeserializeOwned?

Either way, thanks again. I'll add dynamic trait objects to my Rust learning goals.

Are you passing an object that implements Write? As in, why do you need the Write trait in the function? It's not obvious to me, based on what you've shown, how it's being used.

In general, you only need to specify types that the compiler can't figure out on its own. In the case of your original function, since the type was not used in the arguments or the output, there is no way for the compiler to guess what type you want to use for deserialization.

Typically when you use Write you are taking as an argument an object that implements Write. In which case, there's no need to explicitly specify a type, since the object you're passing would be known to the compiler. The signature of the function would be something like this

pub fn count_and_copy<W: std::io::Write, T: serde::de::DeserializeOwned>(
    bytes: &[u8], writer: W
) -> Result<usize, Box<dyn std::error::Error>> 

You're on the right track with respect to your last question, except that dyn indicates a trait object rather than a generic parameter. Here is some text from the Rust book that you might find helpful (Link):

A trait object points to both an instance of a type implementing our specified trait as well as a table used to look up trait methods on that type at runtime. We create a trait object by specifying some sort of pointer, such as a & reference or a Box<T> smart pointer, then the dyn keyword, and then specifying the relevant trait.

We can use trait objects in place of a generic or concrete type. Wherever we use a trait object, Rust’s type system will ensure at compile time that any value used in that context will implement the trait object’s trait. Consequently, we don’t need to know all the possible types at compile time.
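As a rough illustration of the difference, the sketch below puts a generic parameter next to a trait object, using only std. The function names write_generic, write_dynamic and demo_dispatch are hypothetical. The ?Sized bound is what allows W to be dyn Write itself, which is the situation the earlier turbofish with dyn Write ran into.

```rust
use std::io::Write;

/// Generic version: `W` is fixed at compile time for each call site.
/// `?Sized` additionally permits `W = dyn Write`.
fn write_generic<W: Write + ?Sized>(writer: &mut W, msg: &str) -> std::io::Result<()> {
    writer.write_all(msg.as_bytes())
}

/// Trait-object version: the concrete writer is looked up at runtime.
fn write_dynamic(writer: &mut dyn Write, msg: &str) -> std::io::Result<()> {
    writer.write_all(msg.as_bytes())
}

fn demo_dispatch() -> std::io::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();
    write_generic(&mut buf, "a")?; // W is inferred as Vec<u8>
    write_dynamic(&mut buf, "b")?; // &mut Vec<u8> coerces to &mut dyn Write
    let erased: &mut dyn Write = &mut buf;
    write_generic(erased, "c")?; // W is inferred as dyn Write
    Ok(buf)
}
```

In all three calls the compiler works out the writer type from the argument, so no turbofish is needed for W.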


A couple other things that I left out. To call the function with the signature I showed in the last comment, you could do the following

count_and_copy::<_, Reservation>(reservations.as_bytes(), writer)

The _ can be used to let the compiler infer the types that you don't want to specify. This only works if the compiler can determine what the type is, but in this case it would be able to since the type of writer should be known by the caller.
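Here is a small std-only sketch of the `_` placeholder in action. It assumes FromStr in place of DeserializeOwned, and count_and_copy_lines and demo_infer are hypothetical names. Unlike the csv version, it skips lines that fail to parse instead of returning an error.

```rust
use std::io::Write;
use std::str::FromStr;

/// Copies every line of `text` that parses as `T` into `writer`
/// and returns how many lines were copied.
fn count_and_copy_lines<W: Write, T: FromStr>(
    text: &str,
    mut writer: W,
) -> std::io::Result<usize> {
    let mut count = 0;
    for line in text.lines() {
        let line = line.trim();
        if line.parse::<T>().is_ok() {
            writeln!(writer, "{}", line)?;
            count += 1;
        }
    }
    Ok(count)
}

fn demo_infer() -> std::io::Result<(usize, Vec<u8>)> {
    let mut out = Vec::new();
    // `_` lets the compiler infer W = &mut Vec<u8> from the argument;
    // only T has to be spelled out.
    let n = count_and_copy_lines::<_, i32>("1\nx\n3", &mut out)?;
    Ok((n, out))
}
```

With that input, two lines parse as i32, so out ends up holding "1\n3\n".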

Another way to implement something like this, which might have a syntax you like more, would be to create a trait with a copy_and_count function. And then implement the trait for any structs you want to call copy_and_count for. The function could then be called via Reservation::copy_and_count(...). See this Playground Link for an example. If you give the trait a default definition for the function, then you still only need to define it one time.
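The trait approach might look roughly like the sketch below. This is not the code from the playground link: it uses only std, with FromStr standing in for DeserializeOwned as the supertrait, and the Count trait body, the Reservation fields and the parser are all hypothetical.

```rust
use std::str::FromStr;

/// Hypothetical trait: any `FromStr` type can opt in and reuse the default body.
trait Count: FromStr {
    /// Counts the lines of `text` that parse as `Self`.
    fn count(text: &str) -> usize {
        text.lines()
            .filter(|line| line.trim().parse::<Self>().is_ok())
            .count()
    }
}

/// Hypothetical record type with a hand-written parser.
struct Reservation {
    guests: u32,
}

impl FromStr for Reservation {
    type Err = std::num::ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(Reservation { guests: s.parse()? })
    }
}

// Empty impls: both types take the default `count` as written in the trait.
impl Count for Reservation {}
impl Count for u8 {}
```

Here the type is chosen by the path, as in Reservation::count("2\n4\nx"), so no turbofish is needed at the call site.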


Good question. In my other implementation I am passing a Vec<u8> to write transformed bytes to. That implementation creates a new CSV document by appending a column and copying the source document's fields.
I applied your tip to use _ to let the compiler infer the appropriate type.
So the clearer example is:

use csv::{ReaderBuilder, WriterBuilder};
use serde::Serialize;

/// Copy an existing CSV into a new document whilst prepending a column.
fn copy_and_append_column<T: serde::de::DeserializeOwned + Serialize, W: std::io::Write>(
    bytes: &[u8],
    writer: &mut W,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = ReaderBuilder::new().from_reader(bytes);
    let mut output = WriterBuilder::new().has_headers(false).from_writer(writer);
    for result in reader.deserialize() {
        let record: T = result?;
        // The tuple puts the new field first, followed by the record's own fields.
        output.serialize(("new col", record))?;
    }
    Ok(())
}

I do like using a trait for grouping this functionality.
trait Count: serde::de::DeserializeOwned also clicked here. Any type that implements the trait Count must also have an implementation for DeserializeOwned.

And while impl Count for Reservation {} may look odd to other readers, this is how we tell the compiler Reservation should delegate to the trait's default implementation.
