New iterator adaptors from existing iterator adaptors

Let's say I have some way of transforming a char using some lookup structure.

struct LookUp(Vec<char>);

impl LookUp {
    fn from(s: &str) -> LookUp {
        LookUp(s.chars().map(|c| c.to_ascii_lowercase()).collect())
    }
}

fn transform_char(c: char, look_up: &LookUp) -> char {
    if look_up.0.contains(&c.to_ascii_lowercase()) {
        c.to_ascii_uppercase()
    } else {
        c
    }
}

I want to use it to transform an Iterator<Item = char> into another Iterator<Item = char>, composing already existing adaptors. Of course I can do it with functions (here are two version, one with more implicit impl Trait and the second more explicit)

fn transform<'a>(
    input: impl Iterator<Item = char> + 'a,
    look_up: &'a LookUp,
) -> impl Iterator<Item = char> + 'a {
    input.map(move |c| transform_char(c, look_up))
}

fn explicit_transform<'a, I>(
    input: I,
    look_up: &'a LookUp,
) -> std::iter::Map<I, impl FnMut(char) -> char + 'a>
where
    I: Iterator<Item = char>,
{
    input.map(move |c| transform_char(c, look_up))
}

but then using them is a bit annoying because I cannot chain the methods chars, transform and collect (imagine the annoyance in a more involved situation):

fn main() {
    let input = "Rust is fantastic!";
    let look_up = LookUp::from("rust");

    let output: String = transform(input.chars(), &look_up).collect();
    println!("{}", output);

    let output: String = explicit_transform(input.chars(), &look_up).collect();
    println!("{}", output);
}

What I would like to do is to add a method .transform(look_up) to every Iterator<Item = char>. One possible way of doing this is with

trait Transform: Sized {
    fn transform<'a>(
        self,
        look_up: &'a LookUp,
    ) -> std::iter::Map<Self, Box<dyn FnMut(char) -> char + 'a>>;
}

impl<I> Transform for I
where
    I: Iterator<Item = char>,
{
    fn transform<'a>(
        self,
        look_up: &'a LookUp,
    ) -> std::iter::Map<Self, Box<dyn FnMut(char) -> char + 'a>> {
        self.map(Box::new(move |c| transform_char(c, look_up)))
    }
}

and the usage becomes much more comfortable because I can now chain the methods.

fn main() {
    let output: String = input.chars().transform(&look_up).collect();
    println!("{}", output);
}

The downside is that I have to use dynamic allocation with Box<dyn Trait> because impl Trait is not permitted inside traits.
Here is a link to the code so far.

The code I would like to write is more similar to this

trait Transform: Sized {
    fn transform<'a>(
        self,
        look_up: &'a LookUp,
    ) -> std::iter::Map<Self, impl FnMut(char) -> char + 'a>;
}

impl<I> Transform for I
where
    I: Iterator<Item = char>,
{
    fn transform<'a>(
        self,
        look_up: &'a LookUp,
    ) -> std::iter::Map<Self, Box<dyn FnMut(char) -> char + 'a>> {
        self.map(move |c| transform_char(c, look_up))
    }
}

where I'm forced to use impl FnMut(char) -> char because I cannot write down the explicit type of the closure move |c| transform_char(c, look_up). Unfortunately, this is not currently possible. It's not even possible to solve the problem with an associated type (as far as I can see).

The last alternative is to use the strategy adopted by Itertools. In that case, every iterator adaptor returns a new structure specifically designed for the task. The problem I have with that is the proliferation of unnecessary Iterator structures, where my resulting iterator could be instead expressed as a combination of already existing Map, Filter, etc... Also this results in much more code, because I have to replicate the logic of Map, Filter, etc. instead of simply returning them.

What is your recommendation in this regard?

Why don't you create both an extension trait and iterator wrapper type?

It is usual to define custom iterator wrapper when you cannot define a type for an iterator (because of closure).
Even if the implementation is combination of existing combinators, I believe it is the best to define custom wrapper type to hide the (internal) type of the closure.

What you suggest is precisely what I was referring to in:

Inside your impl<'a, I: Iterator<Item = char>> Iterator for TransformIter<'a, I>, the next() method (lines 43-46) replicates the logic of Map: iterator. Imagine the situation if my iterator adaptor had to return a Map<Filter<Map<Iter, some_closure>, some_closure>, some_closure>. Then in your custom TransformIter.next() I would have to replicate all this mapping and filtering.

The advantage of the first two approaches (functions and trait with Box<dyn _>) is that I can reuse all the already existing iterators such as Map, Filter... What is the benefit of having them if I cannot compose them easily and I have to rewrite the Iterator::next method every time on some custom structure?

I think the approach you suggest (that is, the one adopted by Itertools) is fine if the returned iterator has to do something substantially new (chunks, interleaving, merging, dedup, grouping...). It is not satisfactory if the returned iterator can be easily expressed in terms of already existing iterators (because you have to replicate all the logic!).

I think std::iter::Iterator combinators are quite primitive, and using them is not so different than using primitive control structures (such as ?, if, match, etc.).
I agree it would be better to use combinators when both (raw control structure and combinator) are easy, because combinators provide higher-level "meaning" to the users.
However, I think it is good to write raw control structure when combinators are hard to use.

Another aspect is that custom wrapper is more future proof.
If you wanted to change internal implementation of the iterator, bare std iterator wrappers easily lose compatibility.
For example, you cannot change iter.map(|v| foo(bar(v)) to iter.map(bar).map(foo) without changing return type signature (even if they are doing the same thing).

Types such as Filter<Map<Iter, {some_closure_type}>> is abstract enough, and users can know quite few about its internals and "meaning"s.
(Filter may change the number of elements, and Map may change the values. Then what users can expect from the type signature?)
So, I don't think it is always better to return Filter<_, _> or Map<_, _>, and I prefer to custom wrapper in some cases (including your example).

Returning impl Iterator<Item = char> works well enough if one accepts to use functions instead of methods (hence giving up the convenience of methods chaining, which I consider quite a big downside):

fn transform<'a>(
    input: impl Iterator<Item = char> + 'a,
    look_up: &'a LookUp,
) -> impl Iterator<Item = char> + 'a {
    input.map(move |c| transform_char(c, look_up))
}

The return type is opaque, so it doesn't matter to the user if it is Map<_, _> or Map<Map<Filter<Map<_, _>, _>, _>, _>, and the implementation can be easily refactored.

The point is that iterator adaptors would be extremely convenient to use in the situations I have in mind. I can express all the logic I need by combining existing iterators. I find very frustrating that they are there but cannot be combined easily in library code, to produce intermediate adaptors. They are convenient to use only for the end user.

The two options I see are:

  1. Use functions: can return impl Iterator, cannot chain methods
  2. Use extension trait: can chain methods, must use Box<dyn _> because cannot return impl Trait.

From a convenience point of view, 2 is better than 1, but I would like to get rid of the dynamic allocation.

I would like very much to not have to rewrite all the logic by hand in a next() method. Since it can be arbitrarily complicated (filtering, mapping, interleaving, deduplicating, skipping...), the task is too error prone. The adaptors from std::iter::Iterator and itertools::Itertools are there for a reason: they do complicated things and it's much better to compose them rather than rewrite all the involved logic that you get once you start composing them.

More precisely, my question can be: is it possible to have the convenience of option 2 with the performance characteristics of option 1? (Avoiding the unnecessary dynamic allocation).

I imagine it would be very easy if impl Trait inside traits were a thing, but maybe there is a workaround that I'm missing.

But most importantly, I want to avoid having to rewrite all the logic by hand. Is there a nice way to compose iterator adaptors with method syntax and without dynamic allocation?

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.