Rust iterators and looking forward (peek/multipeek)


#1

Cross posted on SO Really sorry and hope I am not wasting anyones time. Just looking for a pointer if possible.

I am trying to use a pattern with iterators in Rust and falling down somewhere, apparently simple.

I would like to iterate through a container and find an element with a predicate [A] {simple}, but then look forward using another predicate and get that value [B] and use [B] to mutate [A] in some way. In this case [A] is mutable and [B] can be immutable; this makes no difference to me, only to the borrow checker (rightly).

It would help to understand this with a simple scenario, so I have added a small snippet to let folk see the issue/attempted goal. I have played with itertools and breaking into for/while loops, although I want to remain as idiomatic as possible.

Silly Example scenario

Lookup an even number, find next number that is divisible by 3 and add to the initial number.

#[allow(unused)]
fn is_div_3(num: &u8) -> bool {
    num % 3 == 0
}

fn main() {
    let mut data: Vec<u8> = (0..100).collect();

    let count = data.iter_mut()
        .map(|x| {
            if *x % 2 == 0 {
                // loop through numbers forward to next is_div_3,
                // then x = x + that number
            }
            true
        })
        .count();

    println!("data {:?}, count was {} ", data, count);
}

playground


#2

Here’s one solution using itertools. (Note: I’m not sure whether you wanted to count all of the items, or just the ones that matched the first predicate; adjust as needed.)

extern crate itertools;
use itertools::Itertools;

fn is_div_3(num: &u8) -> bool {
    num % 3 == 0
}

fn main() {
    let mut data: Vec<u8> = (0..100).collect();
    let mut count = 0;
    {
        let mut iter = data.iter_mut().multipeek();
        while let Some(x) = iter.next() {
            count += 1;
            if *x % 2 == 0 {
                while let Some(y) = iter.peek() {
                    if is_div_3(*y) {
                        *x += **y;
                        break;
                    }
                }
            }
        }
    }
    println!("data {:?}, count was {} ", data, count);
}

#3

Thanks for pointing out MultiPeek!

It allocates internally, though, and it seems that for the OP’s example, there should be a sound way to solve it without requiring intermediate allocations.

I’ve been racking my head on this with Rust’s iterator + type system and coming out a little empty. I think ideally I would want &'a [T]'s IterMut to be able to have an iterator adapter on it which iterates (&'a mut T, IterMutRest<'b>) where

  • 'a is the lifetime of the slice while
  • 'b is the lifetime of the iterator itself so that it must not outlive IterMut's invocation of next().

Thus IterMutRest<'b> would not be allowed to outlive sequential invocations of IterMut's next(). However, this is not possible with rust’s iterators since

pub trait Iterator {
    type Item;
    fn next(&'a mut self) -> Option<Self::Item>;
    // ...
}

doesn’t allow any way to associate the iterator’s lifetime with what it enumerates. Perhaps @BurntSushi’s Streamer trait might be better for this (I don’t have a good feel for it yet)?


#4

That’s a good point. For cloneable iterators this is easy, but for iterators that yield mutable references you’d need something like you describe, to prevent mutable aliasing.

In production code I’d probably just use indexing to solve this currently, if the data were already in some indexable container.


#5

Coming from C++ and Haskell, and as a fan of Stepanov’s “Elements of Programming” if find this comment confusing. Indexing is an iterator (an IndexedIterator or a RandomAccessIterator) and just as much part of the iterator hierarchy as ForwardIterator and BidirectionalIterator. I find Rust’s incomplete implementation of the available iterators to be a bit odd, and slightly Not-invented-here syndrome.

Rust’s “Iterators” are actually Ranges, and not iterators at all. On this basis a slice is an indexed range, and yet I don’t seem to be able to find the traits to implement my own indexed ranges. It all feels very patchy and inconsistent. I wish more people would read EoP.


#6

Your definition of iterator seems to be overly constrained — identified specifically with the EoP / C++ API. But if you notice on the Wikipedia page, C#'s and Java’s iterators follow a similar pattern. Specifically, Rust iterators are not necessarily ranges — instead, they know when they reach the end which is essential for a safe API. If you can design an EoP-style iterator API that can guarantee memory safety (as defined in Rust), I and (I’m sure) others would be extremely interested in that (though it would be better if it were a separate thread, as I would love to just focus on the OP’s problem here). I am particularly interested in cases like the one presented by the OP since they come about in some graph algorithms where one needs to update every node based on its neighbors.


#7

Let’s leave the naming issue to one side. Stepanov defines a hierarchy of iterators based on the abstract algebra of data access patterns. The key insight, and the foundation of generic programming is to study the algorithm not the data structure. Algorithms get classified by their data-access requirements, and iterators turn those classifications into APIs.

So lets look at the algorithm, it requires access to two different locations at the same time. This brings us to the semantic difference between a plain “Iterator” and a “ForwardIterator”. A plain Iterator models something like a stream, you cannot go backwards, and you cannot hold a reference to any element except the current element (because the underlying stream advances on call to ‘next’), technically this means the successor fuction of an Iterator is not a "Regular Procedure/Function’ (It does not always give the same output for the same input), A ForwardIterator is multi-pass, you can go through the collection multiple times, and that means you can copy the iterator and it’s successor function is a regular procedure.

Does Rust distinguish between Iterators and ForwardIterators? If it does not it is not capable of expressing the semantic difference between these two cases. This is important because the algorithms for a ForwardIterator are different from those for a plain Iterator. To use some code from my Rust translation of EoP to illustrate, here is the “find_adjacent_mismatch” algorithm, a simpler algorithm with the same requirement to ‘peek’ at the current elements neighbour. Probably the important bits to read this code are that ‘next’ is split into:

fn successor(self) -> Self // from Iterator trait
fn source(&self) -> &Self::ValueType' // from Readable trait

However I don’t think these differences change the fundamental point, and the distinction between plain Iterators and ForwardIterators could be made with Rust style iterators.

pub fn find_adjacent_mismatch<I, R>(mut f : I, l : &I, mut r : R) -> I
where I : Iterator + Readable, R : FnMut(&I::ValueType, &I::ValueType) -> bool {
    // Precondition: readable_bounded_range(f, l)
    if f != *l {
        let mut x = (*f.source()).clone();
        f = f.successor();
        while (f != *l) && r(&x, f.source()) {
            x = (*f.source()).clone();
            f = f.successor();
        }
    }
    f
}

The above is for a plain Iterator (using Stepanov style iterators, you can find the definitions in the rust translation of EoP above if you are interested). Note, it has to clone the element because it cannot copy the iterator (and Rust allows this semantics to be enforced, the Iterator is neither ‘Copy’ nor ‘Clone’ and ‘successor’ consumes the old iterator to produce the next one). Contrast with:

pub fn find_adjacent_mismatch_forward<I, R>(mut f : I, l : &I, mut r : R) -> I
where I : ForwardIterator + Readable, R : FnMut(&I::ValueType, &I::ValueType) -> bool {
    // Precondition: readable_bounded_range(f, l)
    if f != *l {
        let mut t = f.clone();
        f = f.successor();
        while f != *l && r(t.source(), f.source()) {
            t = t.successor();
            f = f.successor();
        }
    }
    f
}

This is a different algorithm, that requires a ForwardIterator because it needs to copy (Clone) the iterator in order to access the previous element.

So in general there is no solution to this problem, unless you distinguish between plain Iterators and ForwardIterators, otherwise if you allow iterators to be copied/cloned some algorithms will go-wrong on some collections (streams / channels / event-queues etc), or you are restricted to only one class of algorithms (the algorithms definable on plain iterators). Also it is worth noting that a BidirectionalIterator is by necessity a ForwardIterator, so it looks to me like Rust is missing a class of iterator between the plain and the bidirectional. Rather than just creating iterators ad-hoc, it is better to study the classes of algorithms and define iterators for those classifications. Fortunately Stepanov has already done this for us.

Edit: Unfortunately a Rust DoubleEndedIterator is not semantically the same as a bidirectional iterator. A bidirectional iterator can go left/right/left etc, it does not consume input as such. This makes it clear there is a real semantic difference between what rust calls an iterator and what Stepanov calls an iterator. Rust iterators always consume something, whereas only Stepanov’s plain Iterator models consumption. As far as I can see the Rust standard library has no answer for the higher iterator concepts, and this severely limits the generic algorithms that can be written. However to have the generic power of higher iterator concepts appears to require the sacrifice of some safety (IE you can make an iterator go out of bounds if you make a mistake). I have some ideas about a solution, but that’s for a different thread.


#8

Sorry, to clarify: I mentioned iterators in other languages only to point out that rust does not suffer from “not-invented-here-syndrome”. Rust, as far as I have seen, has done exactly the opposite — learning as much as possible from other languages and incorporating the ideas. However, as you are well aware by now, rust brings a very subtle wrinkle with lifetimes and aliasing control in safe rust code, which is what makes nested-iterator algorithms with mutability much more challenging.

(Note, this is a rephrasing of my first post above, but in the context of your solution.)

Specifically, the solution you provide above for a different problem does not address the case where one of the iterators allows mutations. You have R: FnMut(&I::ValueType, &I::ValueType), but what the OP needs it closer to R: FnMut(&mut I::ValueType, &I) since the first value needs to be updated based on the following items. This requires the original iterator (let’s call it f as you have) to have semantics of both a source and a sink. In the solution I propose, f would be defined to enumerate pairs (&mut val, t) of type (&mut ValueType, &I + 'a), where the iterator I: ForwardIterator + Readable:

  • enumerates the items that come after (and not including) &mut ValueType, and
  • is bounded by the lifetime 'a of f so that f.successor() cannot be called while t is still alive, to avoid aliasing between the previously enumerated t and the next enumerated &mut val.

This API, catered specifically to the OP’s problem, would avoid aliasing issues, irrespective of the underlying container type (as long as the container itself is not consumed by iteration). However, it cannot be implemented with rust’s iterators, as I mentioned above, since the Iterator API does not support tying the Iterator's lifetime to what’s enumerated by the iterator.


#9

I don’t agree with your analysis, Stepanov pretty much created generic programming, why ignore the primary source and only look at derivatives that didn’t understand the concepts and missed the point?

Producing individual APIs for specific problems is not generic programming, the goal is to specify algorithms in the most general way with no loss in efficiency.

You are right of course, Rust does have a problem, because in C++ we can have the outer iterator mutable without a problem.

The issue here is the solutions require changing the algorithm, which as I said is not really a solution. We know the algorithm is not going to mutate a cell that is being read. What we need is something like the concept of sequential mutation. For example it is sound to borrow any value for reading except the current element borrowed by the mutating iterator.


#10

I “think” this is something close, but not sure. I recall seeing a video where there was work at being able to actually mutably borrow several container elements independently or similar. I assume that would mean mix of mutable / immutable borrows per iteration. Perhaps that is closer to a solution ?

I am in no way up to speed with the core though to give details of any of this.


#11

Maybe something like this?

use std::slice::IterMut;
use std::mem::transmute_copy;
trait ReborrowSliceIterMut<T> {
    // "Reborrow" the iterator; this is semantically similar to reborrowing
    // a mutable reference.
    fn reborrow(&mut self) -> IterMut<T>;
}

impl<'a, T: 'a> ReborrowSliceIterMut<T> for IterMut<'a, T> {
    fn reborrow(&mut self) -> IterMut<T> {
        unsafe { transmute_copy(self) }
    }
}

This lets you make a “clone” of an std::slice::IterMut, which solves the problem. The only downside is that it’s impossible to write a generic Reborrow trait.


#12

I don’t think anyone is missing the point of C++'s STL design here; I’ve used it for many years and I understand its benefits pretty well. The only problem is, as you agree, I’m not sure anyone knows how to make EoP work with safe Rust. As I mentioned previously, I’m extremely interested in a solution to it, and I have thought about this problem in the back of my mind since I’ve dealt a lot with graph algorithms; if you or someone else has a proposal, I would really love to hear more.

I also agree that it’s not ideal that we would have to change the algorithm – hence the solution I proposed, which is intended to fit precisely the case of this algorithm. To me, that begged the question of what generic iterator combinators would help, and @mbrubeck (thankfully) pointed one out in itertools which works, but unfortunately employs allocations underneath. Again, I’d love to know any other strategies that you may have to propose within safe Rust.

Is your solution safe? You have two copies of IterMut<> to the same data, resulting in mutable aliasing?


#13

IterMut itself is just a couple of raw pointers… so the question is whether you can call next() on the outer IterMut while the inner IterMut is live. The answer is that you can’t. The borrow checker enforces this:

fn main() {
  let mut v: Vec<u8> = vec![0; 100];
  let mut a = v.iter_mut();
  let b = a.reborrow();
  a.next(); // "cannot borrow `a` as mutable more than once at a time"
}

#14

Presumably its copying the iterator only not the data? It looks like it still has one problem and that is that it now aliases the element pointed to by the mutable iterator. In theory we don’t want read access and mutate access to the same element at the same time.

What is needed is something that takes the mutable iterator and splits it into two immutable iterators, one that holds the location of the original iterator, and one that you can advance through the rest of the collection. At the end there needs to be a way to recombine the two immutable iterators back into a single mutable iterator pointing at the original element.


#15

This is a much better description of the algorithm I am looking at. It’s the start of a pattern that may be very handy actually. Not only in this specific pattern, possibly the reverse (the first hit mutates second data item). What is interesting is the ability to safely do this and in a way that uses the minimum of allocation.