Rust pushes users into those approaches because of a hole in the type system.
I have no problem with generics when they're appropriate. I use them throughout my code. But this type simply isn't generic. There is only one type I: Iterator<..>
that is ever constructed.
I've considered using Box<dyn Iterator<...>>
but that has significant performance implications. Sometimes I do collect my child iterators into Vecs - but again, thats not ideal in many cases for the same reason.
It has been asked a few times why custom iterators come up so much in my code.
The program in question is diamond-types, a library for realtime collaborative text editing. We recently published an academic article on the algorithm, and that article includes benchmarks comparing to other contemporary (and at this point, very well optimised) CRDT libraries.
Performance matters. In figure 10 of the paper, the high peak memory usage for eg-walker happens as a direct result of collecting one of my iterators into a vec.
The library stores & manipulates text editing events. Humans often type in sequential runs, for example:
- Insert "T", position 100
- Insert "e", position 101
- Insert "x", position 102
- Insert "t", position 103
- ... and so on.
Editing events also have IDs (which are also mostly sequential) and "parent versions" - which again, are almost always sequential in practice. Storing & processing individual editing events one by one is much slower than dealing in runs "all at once". So we store all this data in memory (and on disk) in a run-length encoded format:
- Insert "Text", position 100-104
- Delete items 50-59
- ... etc
But even this is a simplification. We actually use a struct-of-arrays set of formats, where each "array" stores a different field. (We have one struct storing the editing positions, another storing the IDs, another storing parent versions, and so on).
I wrote a blog post going into this optimisation (and many others) in more detail on this a few years ago.
This is very efficient, but as you might guess, I've ended up with a lot of specialised traits (generically) describing items which can be losslessly combined & split. And specialised collection types which can store & process internally run-length encoded data efficiently. Eg RleVec or a custom order-statistic tree. And some specialised combinators, like rle zip.
As you can imagine, custom iterators get produced and consumed all over the place. And I basically can't use any of the combinator functions (map
, filter
, fold
, etc) because the produced iterator isn't as versatile as I usually need it to be. Performance matters (and so does wasm bundle size) - so I avoid Box<dyn Iterator<..>>
and .collect<Vec<_>>
where I can. The poor ergonomics around impl Iterator
, and the lack of coroutines (aka generators) make my code significantly more complicated than it needs to be.
Traits for splitting & merging RLE items are fabulous in rust. But iterators are, unfortunately, much easier to write in javascript because it doesn't have rust's limitations.