How does one go about converting an iterator adaptor chain into an iterator for a struct?

Hi,

A specific solution to my specific problem is appreciated.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=0e28d5bf71f29d95d07b8c7b69cc5b07
And I'd also like to learn a bit more generally :slight_smile:

I'm fairly new to Rust, but have read the Book and many examples for Iterators.
I am aware many explanations and tutorials exist.

I have found that most explanations/tutorials for Iterators have one (often simplistic) example for which they show how to implement an iterator. For most of the steps or syntax, they don't really explain why they are necessary.
Being pretty experienced in C++ and having only a semi grasp on Rust lifetime (notations), I found it hard to write my own Iterators, because the mental model is so different in Rust.
It appears my problems never fit the simplistic tutorials or the examples were so specific that I cannot parse out the important things.
In my attempts I have often found myself in a situation where I have a working iterator adaptor chain such as

[o.a.map(|v| Options::A(v)), o.b.clone().map(|v| Options::B(v))]
.into_iter()
.flatten()

which I now want to shape into an iterator for o.

From what I have read and understand, the following factors are important for the solution of the implementation:

  • Do you introduce new "state" (here the array [...])?
  • Do you want to consume or borrow?
  • Are you ok with a Box allocation?

I also know about some tools at my disposal, which I understand in isolation:

  • Iterator, IntoIterator, Box, dyn, impl, Cow

But I don't know how to put the pieces together to reliably get from an iterator chain like above to my own iterator, because iterator types change on each chain step and some borrow and some consume.

Do you have some guidance/ resources on how to go about converting an iterator adaptor chain into an iterator implementation?
I am aware this is kinda an open ended question, which is why this is not on Stackoverflow.
I think a mapping from "if you want this" use "IntoIterator", "if you want this", use "Iterator" would be helpful.
I think I am lacking a mental model for when iterator adaptors (map, flatten, collect) are borrowing and when they are consuming.

To kickstart the conversation, I have the above example in a playground with a missing iterator implementation:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=0e28d5bf71f29d95d07b8c7b69cc5b07

Many thanks in advance.

So, let's say you have this:

impl TwoOptions {
    pub fn iter(&self) -> impl Iterator<Item=Options> + '_ {
        [self.a.map(|v| Options::A(v)), self.b.clone().map(|v| Options::B(v))]
            .into_iter()
            .flatten()
    }
}

But you wish you had a stand-alone iterator struct, etc.

pub struct TwoOptionsIterator { /* stuff */ }
impl Iterator for TwoOptionsIterator { /* ??? */ }

The compiler tells you you're missing a couple things...

impl Iterator for TwoOptionsIterator {
    type Item = Options;
    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}

OK, that's the scaffolding. What will we need to actually get the behavior? If we look at the type of our iter return, it is...

Flatten<std::array::IntoIter<Option<Options>, 2_usize>>

So in this case, you could just do this!

pub struct TwoOptionsIterator {
    inner: std::iter::Flatten<std::array::IntoIter<Option<Options>, 2_usize>>,
}

impl Iterator for TwoOptionsIterator {
    type Item = Options;
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

impl TwoOptions {
    pub fn iter(&self) -> TwoOptionsIterator {
        let inner = /* as before */;
        TwoOptionsIterator { inner }
    }
}

However, this isn't always desirable, because

  • You have to name the Item associated type in your Iterator implementation
  • Things closures (and some other things) cannot be named
  • You can often use dyn Iterator or dyn FnOnce or the like to get around this...
    • ...but those carry some costs

And I feel it's not really in the spirit of what you asked. So let's take a closer look at what's going on with this:

[self.a.map(|v| Options::A(v)), self.b.clone().map(|v| Options::B(v)) ]
    .into_iter()
    .flatten();

You've basically built up your results pre-emptively ([Option<Options>; 2]). Then, for each thing in your array (into_iter()), you either return Some(Options) or you just skip it and try the next one. Once you've tried everything (whether they returned or not), you don't return anything else.

Since we know we just have zero, one, or two items, we might as well just store them as fields and then reproduce this logic. So here's one way:

pub struct TwoOptionsIterator {
    a: Option<Options>,
    b: Option<Options>,
}

impl Iterator for TwoOptionsIterator {
    type Item = Options;
    fn next(&mut self) -> Option<Self::Item> {
        if self.a.is_some() {
            self.a.take()
        } else if self.b.is_some() {
            self.b.take()
        } else {
            None
        }
    }
}

Here I've taken advantage of the fact that you can only take an Option once. So this is actually a bit inefficient as we test a and b every time. We could instead have stored an array and kept an index as state or the like, but I'll just leave that as an excersizeTM. In general, thinking about how to keep track of where you are in the iteration is part of making an Iterator type.

I also wrote out that if-else tree for ease of clarity, but it can be made more elegant or confusing (depending on your experience) like so:

    fn next(&mut self) -> Option<Self::Item> {
        self.a.take().or_else(|| self.b.take())
    }

Alright, I think that covers your baseline question well enough. I will make an observation that this implementation is based around &self but it clones some inner data, while usually people expect iterators on &self to be borrowing and cheap, and instead something like this would be an iterator that consumes self. So here's that adjustment:

-    pub fn iter(&self) -> TwoOptionsIterator {
+    pub fn into_iter(self) -> TwoOptionsIterator {
         TwoOptionsIterator {
             a: self.a.map(|v| Options::A(v)),
-            b: self.b.clone().map(|v| Options::B(v)),
+            b: self.b.map(|v| Options::B(v)),
         }
     }

And then you probably want to

impl IntoIterator for TwoOptions { /* ??? */ }

So fill in what the compiler told you to:

impl IntoIterator for TwoOptions {
    type Item = Options;
    type IntoIter = TwoOptionsIterator;
    fn into_iter(self) -> Self::IntoIter {
        self.into_iter() // <-- calls inherent method
    }
}

And now you can do things like

    let os = TwoOptions { a: Some(1), b: Some("two".into()) };
    for o in os { ... }

(If you don't want to consume the TwoOptions, just use for o in os.clone() instead.)


That's a lot, so I'll stop here. Borrowing iterators would probably be the next sub-topic.

2 Likes

Thank you! This is a huge help.

I learned from you that .iter() is usually expected to be borrowing and cheap.
And the difference to .into_iter is also much clearer now for me, thank you.

I have some follow-up questions:

  1. You wrote

    impl TwoOptions {
        pub fn iter(&self) -> impl Iterator<Item=Options> + '_ {
              [self.a.map(|v| Options::A(v)), self.b.clone().map(|v| Options::B(v))]
                 .into_iter()
                 .flatten()
         }
    }
    

    What does this anonymous lifetime '_ mean here? Does it just mean "an equal or longer lifetime than of self"? What exactly in the iterator chain causes the need for this?

  2. Is .into_iter usually expected to not clone/ copy itself?

  3. You mentioned there would be a way to not having to name the iterator type such as std::iter::Flatten<std::array::IntoIter<Option<Options>, 2_usize>> with dyn Iterator.
    Coming from C++, I tried the impl route because it should have fewer costs than dyn and no additional cost compared to writing it out:

    pub struct TwoOptionsIterator<I: Iterator<Item = Options>> {
       inner: I
    }
    

    and got stuck. You don't need to tell me how it could be done, but if you tell me it could be done, I'll take that as homework :slight_smile:

  4. You said

    You've basically built up your results pre-emptively ([Option; 2]).

    I assume they are being built up preemptively because it first constructs the array, then iterates. Regardless of whether it makes sense in this case, can I avoid that preemptive construction in this functional style? Maybe with self.a.into_iter().map(|v| Options::A(v)).chain(self.b.into_iter().map(|v| Options::B(v)))?

  5. When would it be good to implement IntoIterator, but not .iter()?

This part was me driving on automatic! The '_ here means "the returned iterator can only last as long as the borrow of &self". The whole signature is short for:

    pub fn iter<'a>(&'a self) -> impl Iterator<Item=Options> + 'a

due to lifetime elision.

The reason I say I was driving on automatic is, this function doesn't actually need it! Because the iterator you return doesn't actually borrow Self, you could leave it off and, in this case, the returned iterator will end up being 'static instead. But with the typical borrowing iterator returned from iter(&self), you may get some error saying you have to specify it.

into_iter typically consumes self, and because it's consuming, it typically doesn't need to clone. If it has to, though, it could -- into being expensive is not so surprising, like iter would be.

See the API conventions here and also here.

Let's say we had this:

pub struct AnOption<T> { opt: Option<T>, }
impl AnOption<u32> {
    pub fn into_iter(self) -> impl Iterator<Item = u32> {
        self.opt.into_iter().map(|x| x + 42)
    }
}

That works, no problem. But what if we then wanted to do this:

impl IntoIterator for AnOption<u32> {
    type Item = u32;
    type IntoIter = (); // <-- Placeholder
    fn into_iter(self) -> Self::IntoIter {
        self.into_iter()
    }
}
/* Error:
   = note: expected unit type `()`
            found opaque type `impl Iterator`
*/

Hmm, okay, let's try...

    type IntoIter = impl Iterator<Item = Self::Item>;
/* Error: error[E0658]: `impl Trait` in type aliases is unstable */

Dang. Let's forget the opaque type then...

impl IntoIterator for AnOption<u32> {
    type Item = u32;
    type IntoIter = (); // <-- Placeholder
    fn into_iter(self) -> Self::IntoIter {
        self.opt.into_iter().map(|x| x + 42)
    }
}
/* Error:
   = note: expected unit type `()`
                 found struct `Map<std::option::IntoIter<u32>, [closure@src/lib.rs:15:34: 15:44]>`
*/

There's an unnameable type. Here is where the type-erasing dyn Trait comes in. We want to return an owned iterator here, so we have to put it in a Box. (dyn Trait always needs to be behind a pointer, like &dyn Trait or Box<dyn Trait>.)

impl IntoIterator for AnOption<u32> {
    type Item = u32;
    type IntoIter = Box<dyn Iterator<Item=u32>>;
    fn into_iter(self) -> Self::IntoIter {
        Box::new(self.into_iter())
        // also works: Box::new(self.opt.into_iter().map(|x| x + 42))
    }
}

Playground.

That illustrates how dyn Iterator can get around unnameable types. In practice though, instead of writing it like this (where dyn Iterator is in your public API), I suggest making a new type that you control which wraps up the Box<dyn Iterator>. Then, later, you may be able to write a different implementation of Iterator and IntoIterator that avoids dyn Iterator (e.g. a struct that stores 42 and adds it when you call next, or whatever), without changing your API (which is a breaking change to people using your API).

I think you mentioned fn earlier. Closures that don't capture their environment can coerce into function pointers, so sometimes that's another way to make a closure-containing iterator chain something you can name. The same advice about not exposing these implementation details in your public API applies. You're also still incurring an indirection through the fn pointer.

Yes, that all looks correct. It's also another example of ending up with unnameable closures in your Iterator type (common with lazy iteration, as that often involves a closure).

Well, when a borrowing iterator doesn't make sense, I suppose. In your OP there wasn't anything like this for example:

pub enum BorrowedOptions<'a> {
    A(i32),
    B(&'a str),
}

And if you never want or need something like this -- if your Options always need to be owned -- you may not have a need for iter in addition to into_iter().

1 Like

Very interesting. In C++, while they also have unnamable types, those can be automatically deducted.

It appears there is a RFC that intends to add some of this functionality to Rust. And a discussion about it and related topics here

Again, thanks a lot, I keep being blown away by the Rust community :slight_smile:

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.