Can somene review my learning exercise?

Here is some code that filters a sequence, just as a learning exercise. I'm sure this functionality can be found in some crate, but I need some learning. I'd appreciate some feedback. TIA

use std::collections::{HashSet};
use std::hash::Hash;

struct Distinct<'a, T>
where T: Eq + Hash + Clone
{
    keys: HashSet<T>,
    iter: &'a mut dyn Iterator<Item = T>,
}

impl<'a, T: Eq + Hash + Clone + Sized> Distinct<'a, T> {
    fn new(iter: &'a mut dyn Iterator<Item = T>)-> Distinct<'a, T> {
        Distinct { keys: HashSet::new(), iter: iter }
    }
}

impl<T: Eq + Hash + Clone> Iterator for Distinct<'_, T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        while let Some(candidate) = self.iter.next() {
            if self.keys.contains(&candidate) {
                continue;
            }
            self.keys.insert(candidate.clone());
            return Some(candidate);
        }
        return None;
     }
}

fn main() {
    let nums = [0, 1, 2, 3, 2, 1, 4, 3, 2, 1, 4, 7, 5];
 
    let mut nums_iter = nums.iter();
    let mut distinct = Distinct::new(&mut nums_iter);
    
    while let Some(item) = distinct.next() {
         println!("Item: {}", item);
    }
}


Item: 0
Item: 1
Item: 2
Item: 3
Item: 4
Item: 7
Item: 5

I could not find a way to store the Iterator in my struct, was getting compiler errors about size that is unknown at compile time. So I am storing a mutable reference. So I have to do this mut thing twice, which looks ugly

    let mut nums_iter = nums.iter();
    let mut distinct = Distinct::new(&mut nums_iter);

Is there a way to move the iterator, and store it in a struct?

1 Like

Instead of erasing the type of the iterator (i.e. using dyn Iterator), you can make Distinct generic over the iterator type. This will allow you to get rid of the lifetime parameter, too.

struct Distinct<T, I> {
    keys: HashSet<T>,
    iter: I,
}

Your implementations will naturally have to change to match.

impl<T, I: Iterator<Item=T>> Distinct<T, I> {
    fn new(iter: I) -> Distinct<T, I> {
        let keys = HashSet::new();
        Distinct { keys, iter, }
    }
}

impl<T, I> Iterator for Distinct<T, I>
where
    T: Eq + Hash + Clone,
    I: Iterator<Item=T>,
{
// ...

Then in main, you can do

    let nums_iter = nums.iter();
    let distinct = Distinct::new(nums_iter);
    
    for item in distinct {
         println!("Item: {}", item);
    }

Playground.

I've made some other preferential/stylistic changes, mostly not mentioning bounds where you don't need to [1]. But note that I used a for loop specifically, instead of a while let loop. Sometimes you need while let with an iterator, but if you don't, it's idiomatic to use for.


  1. I left a bound on new so that the parameter fully determines the generic types ↩︎

3 Likes

The use of generics rather than dyn (runtime polymorphism) is also the key for the very impressive performance of iterators: thanks to monomorphization, even long iterator chains, leading to deeply nested next() call stacks, can be aggressively inlined and then optimized in various ways that would be impossible to achieve across stack frames, besides the more obvious fact that indirect function calls via a vtable are themselves quite costly.

2 Likes

This is most helpful, thanks a lot!

I was wondering if we can peek at the source code of standard things like Filter and Map. All I was able to find so far is a thin public wrapper over some hidden implementation, like this:
where I can see a thin wrapper fn filter but not the actual Filter. Can you share a link?

I feel curious about these performance implications. Can you share some links where I can read more about it? TIA!

One more thing: can you elaborate on when exactly we "need while let with an iterator"? TIA!

Look at the top of the file, find the use item that imports Filter, and find the module from which it was imported.

1 Like

And there it is Thanks a bunch!

Sure, basically, when you need to interact with the iterator in more ways than just consuming one item per loop, like perhaps you need to call it a second time inside the loop:

    while let Some(arg) = args.next() {
        match arg.to_str() {
            Some("-h") | Some("--help") => todo!(),
            Some("--config") => {
                config = args.next().expect("--config requires a path").into();
            }
            _ => files.push(arg.into())
        }
    }

I've also needed to replace an iterator from within a loop:

    while let Some(line) = lines.next() {
        let line = line?;
        if line.starts_with("@") {
            file = File::open(&line[1..])?;
            reader = BufReader::new(file);
            lines = reader.lines();
        }
    }

(Both examples contrived for illustration.)

This last one is tangential, but there exist iterator-like patterns that Rust's Iterator trait cannot handle -- the most prominent one being a "lending iterator", that wants to hand out a reference to some internal buffer or other data on each call to next. [1] You may occasionally run across a library with [2] a data structure that has a .next() method, but isn't actually an Iterator -- it would be a LendingIterator, if we had that trait.

Because it's not an Iterator [3], you can't use the for loop, so you have to use while let Some(...). [4]


  1. When handing out a borrow of yourself, the borrow has to end before the consumer calls .next() again, so the consumer can only hold one item at a time. But Iterator can't support this -- instead it supports things like collect() where you hold on to every item at the same time. ↩︎

  2. or implement yourself ↩︎

  3. actually, because it's not an IntoIterator -- all Iterators are IntoIterator by design, and it's IntoIterator you need for for ↩︎

  4. Rust will probably get LendingIterator and IntoLendingIterator traits at some point, and hopefully for will be able to consume them too, some day. ↩︎

1 Like