Did I solve this refactor well? (passing in something that generates a trait)

I'm a relatively new Rust programmer (< 6 months); I've mostly programmed in GC languages before...

To avoid XY issues, here's my problem. I was refactoring a function at $WORK. The point of the function is to process a CSV, using the Reader object in the csv crate. I've posted a simplified version of what the code looked like here:

The key point is that the function takes in a Reader<Read + Seek>. It does that so that it has access to the seek() method on the reader, which is only defined if the inner reader can seek().

So far, so good. But my refactoring task is to try and make it so that the function can also process GZipped CSVs. OK, no problem, there's a great crate, flate2, for handling GZipped data. Oh, but the handle flate2 returns is Read, but not Seek. Reading up on this makes it seem unlikely that you could make a Seek implementation of gzip data, which TBH makes sense to me intuitively. So, bummer.

But wait--the only reason my function does a seek() is so that it can read in the number of records, then go back to the start of the file and read in the records sequentially (if anyone is interested, the reason it is doing this is so that it preallocate some vectors). So I don't really need the ability to seek()--I just need to be able to get the number of records in the file, and then go through the records.

Ok, I say, this is easy--just read the file twice. Once to count the records, then once to process them. How do I do that? Well, my thought is, change the function to accept something that can generate the reader more than once. I'll generate one reader to count the rows, then a second one for the actual processing.

My instinct, then, is to have my function take in a closure that returns a reader. One problem: what is the type of this thing? I tried stuff like

read_csv_twice(
  g: &dyn Fn() -> Reader<Read>
) 

but I just got errors about unknown sizes. Which makes sense to me; I've read the Rust book chapters about trait objects and that, and I understand why the compiler won't allow that. I just can't tell if there is a way to actually do what I want, or if this kind of "generates a thing of this type" is not a thing that Rust allows.

Anyway, I solved the issue by making enums which handle all the types of things I might want to read. My solution looks like this:

But I'm wondering if this is the best way to solve this.

The way you handle unsized types like dyn Trait and str is to store them behind a pointer of some type -- that's (presumably) why your parameter was a &dyn Fn() and not a dyn Fn(), say.

For owned values, that often means putting them in a Box:

fn read_csv_twice(g: &dyn Fn() -> Reader<Box<dyn Read>>) { /* ... */ }

Though instead of &dyn Fn I'd probably suggest:

fn read_csv_twice<G: Fn() -> Reader<Box<dyn Read>>>(g: G) { /* ... */ }

Alternatively written (for the most part):

fn read_csv_twice(g: impl Fn() -> Reader<Box<dyn Read>>) { /* ... */ }

However, you may not even need to erase the type of the reader, and could just make it generic with a trait bound as well:

// (`fn` type parameters like `R` here are `Sized` by default)
fn read_csv_twice<R: Read>(g: impl Fn() -> Reader<R>) { /* ... */ }

All that being said, if you just need a handful of cases, an enum is fine.

As a side-note, you may want to test if preallocation and two file (decompressions and) reads is actually faster than a single one without preallocation, given representative inputs. You may be able to save yourself some code complexity and come out ahead.

3 Likes

Oh, interesting! I had tried to put a Box around the entire Reader type, but that didn't get me anywhere. It didn't occur to me to just put the inner type as a Box. I'm playing around with it, and it looks to me like that requires my generator to have to put it in a box too, like

        let c = Cursor::new(c_str.to_string().into_bytes());
        ReaderBuilder::new().has_headers(true).from_reader(Box::new(c))

I guess this works because Box<R> implements Read if R implements Read? So it wouldn't work for a generic trait Foo, unless I also implemented it for Box myself?

(BTW, I'm with you here, but the pre-allocation was already present and written by other people, and I didn't want to change too much of the system in one go)

Well, if you require that the generator returns Reader<Box<…>>, then it has to return Reader<Box<….>>. It doesn't look like to me it has anything to do with trait impls – it's simply the fact that types have to match up. If the function signature specifies Reader<Box<dyn Read>>, then you can't feed it a Reader<MyNonBoxedReader>, because MyNonBoxedReader is not the same thing as Box<dyn Read>.

Incidentally, this is why you should follow @quinedot's last suggestion and make the function generic over the reader as well. It doesn't look like you need either a box or a trait object in this case, so why force an allocation and dynamic dispatch?

Ah, sorry--my point was that the trait bound for csv::Reader is that it takes a type of Read, which I didn't clarify. But I can change my Reader<R> to Reader<Box<R>> and it just works, because Box<R> implements Read as long as R implements Read.

I was more surprised that Reader could accept a Box<dyn Read>

Yes, there's an implementation for

impl<R: Read + ?Sized> Read for Box<R>
1 Like