How to return either std::iter::Take or std::iter::Empty?

// Make `key` length equal to that of `s`
fn key_to_use<'a>(key: &'a str, s: &'a str) -> impl Iterator<Item = &'a u8> {
    let rem = if key.len() < s.len() {
        iter::repeat(b'a')
            .take(s.len() - key.len())
    } else {
        iter::empty::<u8>()
    };

    key.chars()
        .map(|ch| ch as u8)
        .take(cmp::min(key.len(), s.len()))
        .chain(rem)
}

One of the compilation errors is:

error[E0308]: `if` and `else` have incompatible types
  --> src/lib.rs:38:9
   |
34 |        let rem = if key.len() < s.len() {
   |   _______________-
35 |  |         iter::repeat(b'a')
   |  |_________-
36 | ||             .take(s.len() - key.len())
   | ||______________________________________- expected because of this
37 |  |     } else {
38 |  |         iter::empty::<u8>()
   |  |         ^^^^^^^^^^^^^^^^^^^ expected struct `std::iter::Take`, found struct `std::iter::Empty`
39 |  |     };
   |  |_____- `if` and `else` have incompatible types
   |
   = note: expected type `std::iter::Take<std::iter::Repeat<u8>>`
            found struct `std::iter::Empty<u8>`

itertools has a simple helper enum called Either for this. You can put one iterator in an Either::A and the other in an Either::B.

I'm aware of Either but what's the point of iterator if it can only assume one usable type.

In this specific case, you can replace iter::empty() with iter::repeat(b'a').take(0).

In general case, what do you mean by "what's the point"?

2 Likes

Some other alternatives for the general case:

fn key_to_use<'a>(key: &'a str, s: &'a str) -> impl Iterator<Item = u8> + 'a {
    let rem: Box<dyn Iterator<Item = u8>> = if key.len() < s.len() {
        Box::new(iter::repeat(b'a').take(s.len() - key.len()))
    } else {
        Box::new(iter::empty::<u8>())
    };

    key.chars()
        .map(|ch| ch as u8)
        .take(cmp::min(key.len(), s.len()))
        .chain(rem)
}
fn key_to_use<'a>(key: &'a str, s: &'a str) -> impl Iterator<Item = u8> + 'a {
    let rem = if key.len() < s.len() {
        Some(iter::repeat(b'a').take(s.len() - key.len()))
    } else {
        None
    };

    key.chars()
        .map(|ch| ch as u8)
        .take(cmp::min(key.len(), s.len()))
        .chain(rem.into_iter().flatten())
}

Also, key.chars().map(|ch| ch as u8) should really be key.bytes(), since you're comparing the lengths in bytes.

Or in this specific case, the logic can be simplified:

use std::iter;

fn key_to_use<'a>(key: &'a str, s: &str) -> impl Iterator<Item = u8> + 'a {
    key.bytes().chain(iter::repeat(b'a')).take(s.len())
}
6 Likes

This one can also be expressed as iter::repeat(b'a').take(s.len().saturating_sub(key.len())).

3 Likes

The clunkiness of itertools::Either reminds me of the equivalent for Future, futures::future::Either.. Which I used to use a lot until async functions came around. I wonder if this is a case generators will help make more ergonomic..

I expect operations on iterators will still produce iterators, until a terminal operation turns it into something else. If I perform one operation and it becomes something that is no longer an iterator type, it's not very useful for passing around.

Thank you for providing the options. As a beginner, it helps me learn. I think I like the first option the most for the general case, although it requires considerably more typing.

There is no singular iterator type, and I think that's what's tripping you up here?

Iterators in general were added to Rust to make it easy to write code using any iterator, and to have a general interface for iterating over then consuming values. It might help to know that they weren't designed for the sake of producing arbitrary iterators.

I think you're write to draw connections to Futures when talking about -> impl Iterator<...> in particular! The -> impl X feature is most used with Futures, and it -> impl Iterator has the same downside as -> impl Future (and all -> impl Trait usages) that you can only return one concrete type.

The advantage of this model is that we retain maximum efficiency - any code generated using your returned iterator is optimized specifically for its usage, and this lets Rust iterators match and exceed for loops in performance. The disadvantage is that the process of taking multiple different iterators and returning them (or setting a variable to either of them) is very unintuitive.

2 Likes

That's what impl Iterator<Item = T> means.

If you want something like you might see in a GC'd language, you might consider Box<dyn Iterator<Item = T>> which has similar advantages (type erasure) and disadvantages (allocation, virtual calls) to the ones you'll see in many other languages.

2 Likes

But that's exactly what happens. The problem is not that either or both of these aren't iterators; they both are. However, Rust is still statically typed, and iter::empty() has a different type than iter::repeat(…) so you can't mix them like that.

This has nothing to do with iterators per se, and everything to do with the language being statically typed. If you are trying to return two different types from a function, that won't work, regardless of whether they are iterators or something else.

2 Likes

That has nothing to do with the language being statically type (cue Scala), and everything to do with zero-cost abstractions and a lack of proper sum type handling in the language. At the very least Either should be a type in libcore, implementing all standard traits (Iterator, Future, Fn, etc). Ideally there should be language-level support for arbitrary anonymous sum types, so that the OP's example would work with minimal modifications.

Yeah, but it still wouldn't (shouldn't) work as-is, exactly because of static typing. A value (be it a function return value or anything else) has exactly one type, and you can't just assign differently-typed values to it. An enum effectively opts in to a level of dynamism over a closed set of types.

2 Likes

One problem is that it can't -- for example, it can't forward both IntoIterator and Iterator.

Not to mention that there are choices about how to implement the traits. For example, what's the item type for Either<A, B>? Is it Either<A::Item, B::Item>? Or is it just T, requiring A: Iterator<Item = T> & B: Iterator<Item = T>? Both of those are perfectly justifiable.

4 Likes

Both are justifiable, yet one is strictly more general than the other. Either<A, B> should return Either<A::Item, B::Item>, and if A::Item == B::Item == Item, then one could use something like .map(Either::into_inner)to convert into an iterator overItem`.

The issue with Iterator and IntoIterator is indeed unfortunately blocked on specialization. Personally I would prefer to skip the IntoIterator impl since it can be easily side-stepped with an explicit .into_iter() call, but it surely isn't as pretty as desirable.

That's true, but interestingly it's not what the either crate does: https://docs.rs/either/latest/either/enum.Either.html#impl-Iterator

So that's just more evidence that the "correct" thing to do is non-obvious.

2 Likes

While this is true from a logical point of view, Item = Either<A::Item, B::Item> still incurs a branch for every item, which might be undesirable in some cases.