The anti-iterator pattern, does it exist?

Instead of retrieving several items of type T from a data structure, I want to "push" items to that data structure, and I would like to indicate when I have successfully finished pushing them. These operations may be fallible. Passing iterators or callbacks seem to be a bit complex/annoying in regard to error handling.

I wonder what's the idiomatic approach to solve this problem. I came up with an AntiIter trait as shown below. Is such a thing commonly used? What are your solutions when wanting to push a couple of items while

  • being able to indicate completion,
  • being flexible with control flow,
  • handling errors properly.

The use case is writing items into a database. I don't want to keep all to-be-written items inside a Vec as that would mean that all items have to be in memory at the same time). Yet I want to be able to rollback the whole transaction when not all items have been successfully retrieved and stored.

#[derive(Debug)]
enum PushError {} // infallible in this example, but may be used in other cases

// This seems complex:

impl<T> Foo<T> {
    fn push_iter<E, I>(&mut self, iterator: I) -> Result<Result<(), PushError>, E>
    where
        I: Iterator<Item = Result<T, E>>,
    {
        for res in iterator {
            self.data.push(res?);
        }
        Ok(Ok(())) // double-wrapped `Result`, not nice :-(
    }
}

// How about using a trait like this:

trait AntiIter {
    type Item;
    type Error;
    fn next(&mut self, item: Self::Item) -> Result<(), Self::Error>;
    fn commit(self) -> Result<(), Self::Error>;
}

(Full Playground with example)

I called this "Anti-Iterator" because it's like an Iterator, but instead of returning items in the next method, it will consume items.

I think you're looking for the Extend trait and a fallible equivalent of it. (surely a fallible equivalent is being considered in the Rust in Linux project?)

3 Likes

Very interesting. I think there's another substantial difference between Extend and AntiIter though:

In my "anti-iterator" approach, I can signal completion (like an Iterator can indicate completion by returning None from the Iterator::next method). Instead of making AntiIter::next take an Option<T>, I decided to make a separate commit function, as that function can consume self.

I can't use the Extend trait to signal completion, if I understand it right.


Another difference is that the item type is a type paramter instead of an associated type. I think using a type parameter for the Item makes more sense here.

Looking at the Extend trait and using a type parameter instead of an associate type for the item, I came up with the following revised trait:

(Playground) edit: revised, see below:

trait FallibleExtend<T> {
    type Error;
    fn extend<I>(&mut self, iterator: I) -> Result<(), Self::Error>
    where
        I: IntoIterator<Item = T>,
    {
        for item in iterator {
            self.extend_one(item)?
        }
        Ok(())
    }
    fn extend_one(&mut self, item: T) -> Result<(), Self::Error>;
}

trait FallibleExtendAndCommit: FallibleExtend<Self::Item> + Sized {
    type Item;
    fn commit(self) -> Result<(), Self::Error> {
        Ok(())
    }
}

The extend_reserve method could be added too, but not sure about its return type (should it be fallible too?).

Usage is like this:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut foo = Foo::<i32> { data: vec![] };
    foo.extender().extend([1, 2, 3])?; // passing an (Into)Iterator is possible
    let mut extender = foo.extender();
    for x in [4, 5, 6] {
        // but flexible control flow too, e.g. I could return if I want to
        extender.extend_one(x)?;
    }
    extender.commit().unwrap(); // indicating completion
    assert_eq!(foo.data, vec![1, 2, 3, 4, 5, 6]);
    Ok(())
}

(Playground)

Is there something existing like that? Or maybe a combination of traits I could use? Or should I define my own trait? Or is it not worth to provide such an abstraction?

Reminds me very slightly of some traits that rayon uses internally, but that might be merely due to the fact that I’ve read its “plumbing” documentation somewhat recently (an interesting read by the way…), and the use-case might be completely different :slight_smile:

1 Like

It's indeed very similar, with the following differences:

  • Folder::consume not being fallible (but returning Self for method chaining),
  • FallibleExtendAndCommit::commit returning (), but Folder::complete returning an associated Folder::Result type,
  • the additional Folder::full method, which I don't need for my use case,
  • Folder::complete being generic in regard to the Item type (it's a type parameter) while FallibleExtendAndCommit fixes that type parameter (of FallibleExtend) to an associated type.[1]

But still very similar in my opinion.


  1. I feel like providing a commit method which consumes self should fix the item type since it doesn't make sense to push values of different types because you can only call commit for one item type. ↩︎

Interestingly enough, Folder kind-of supports fallibility, too. E.g. look at this module implementing the try_reduce operation.

The way it works is that failure is signaled by the fact that future calls to full return true and further items are ignored. And the error value can then get back out as part of the Result value.

I mean, obviously the approach is different, but IMO it’s at least an interesting thing to look at.

1 Like

I feel like the part with commit doesn't deserve a trait, as it's highly use case specific.

So how about making a FallibleExtend:

pub trait FallibleExtend<T> {
    type Error;
    fn try_extend<I>(&mut self, iter: I) -> Result<(), Self::Error>
    where
        I: IntoIterator<Item = T>;
    fn try_extend_one(&mut self, item: T) -> Result<(), Self::Error> {
        self.try_extend(Some(item))
    }
}

And then implementing the commit method as inherent methods where needed (or as part of another, unrelated trait, in cases there's some abstraction needed):

impl<T> FooExtender<'_, T> {
    fn commit(self) -> Result<(), PushError> {
        // no-op in this example, but may do something in other cases
        Ok(())
    }
}

Usage is still very flexible then:


    foo.extender().try_extend([1, 2, 3])?; // passing an (Into)Iterator is possible
    let mut extender = foo.extender();
    for x in [4, 5, 6] {
        // but flexible control flow too, e.g. I could return if I want to
        extender.try_extend_one(x)?;
    }
    extender.commit()?; // indicating completion

(Playground)

So I feel like the trait I miss is FallibleExtend and nothing more. But it doesn't exist in std and or a commonly used crate, right?


I see how this provides fallability, but I feel like in practice it would be a bit awkward to use. Interesting though.


Rethinking about this, I wonder if FallibleExtend would really be useful. The error type would be specific to the use case too (as well as properly handling it). So perhaps it's not really worth to use a trait for this. Instead, i will probably just use inherent methods:

impl<T> FooExtender<'_, T> {
    fn try_extend<I>(&mut self, iter: I) -> Result<(), PushError>
    where
        I: IntoIterator<Item = T>,
    {
        self.inner.data.extend(iter);
        Ok(())
    }
    fn try_extend_one(&mut self, item: T) -> Result<(), PushError> {
        self.try_extend(Some(item))
    }
    fn commit(self) -> Result<(), PushError> {
        // no-op in this example, but may do something in other cases
        Ok(())
    }
}

(Playground)

But I do like the idea to provide two variants: one accepting an IntoIterator<Item = T> and one accepting a T. It's a bit of boilerplate but makes usage easier.

P.S.: The try_extend method as defined in my last example doesn't provide fallability for the Iterator though. If an item might not be retrieved, I'd have to use the try_extend_one method anyway. If I would want to fix this, I would have to give try_extend a similar signature as the push_iter method in my OP, which I found complex/confusing. And then I'm back to my original problem. :face_with_diagonal_mouth:

P.P.S.: I guess the try_extend method as defined above is still useful, e.g. it allows me to simply provide items through an array or Vec. In cases where I fetch each item using I/O (which could fail), I still can use try_extend_one. The try_extend method could also be used, for example, if I fetch items from I/O in chunks, and errors may only happen in between of each chunk received.

I think whatever I'll do in the end, the cruicial part is that I need some sort of "extender" value, which is similar to an Iterator, but will consume items instead of returning them. I feel like a trait covering everything might not be very useful. So maybe I'd answer my questions as follows:

Indeed.

Return a value (an "extender") which provides methods (inherent or through a trait) to add the items.

This could be a method accepting one item at a time, or several items (by accepting an IntoIterator), or both.

Completion would be indicated by calling a method on that "extender", which consumes self.

1 Like

This reminds me of the Iterator-Observable-Duality, just the Exception value is opposite http://csl.stanford.edu/~christos/pldi2010.fit/meijer.duality.pdf

1 Like

In the async ecosystem, the Iterator-AntiIterator pair that you propose is called Stream and Sink. They are conceptually dual: a Stream is an iterator where the next function is async, while a Sink is a consumer of values. However, if you look at the API of Sink you'll notice that it's overfitted to be an abstraction over async IO write calls, rather than a generic consumer API. For example, its operations are always fallible, and the buffer flushing operation is built in as part of the trait.

I'm not aware of any established sync equivalent. The Extend trait is probably the closest, although it can't be made fallible.

2 Likes

There's also a relevant IRLO thread: Fallible alternative to Extend - language design - Rust Internals, in which several crates are mentioned that have a trait like this. It's mostly called TryExtend, analogous to TryFrom. "fallible extend" is also a good keyword to search for.

1 Like

It's not too mature, but I feel like RxRust supports this?


I'm most familiar with RxJS, but it's a push-based iterator (plus more, I suppose. It's meant to deal with notifications that may arrive over time) with completion and error notifications. You can build an observable from just about anything. Looks like the linked library hasn't implemented some key operators (like retry), but it seems like an opportunity to contribute rather than start from scratch.

At it's most generalized, the "anti iterator" pattern you describe is just internal iteration.

External iteration is Iterator::next; you call a mutation method and get the next iterated element. Internal iteration is Iterator::for_each; you give some method a callback and it invokes the callback for each iterated element.

There's obviously more to what you describe — fallibility of iteration, which isn't handled well by Rust currently for Iterator either (instead iterating over Result with an Err understood to be the last item in the stream); completion signal, which is difficult (but possible) to attach to an impl FnMut's drop glue — but the general shape is internal iteration.

Another name you might see is "output iterator;" any type which is designed to receive instead of produce an iterated stream of elements.

In today's Rust, I'd say it's decently well represented by Extend. There's no "Extend views" currently defined in std, but you can fairly simply imagine a bridge wrapper around a reference which forwards (or buffers!) Extend and commits any finalization on Drop. This is much more common with Write implementations, but applies identically to Extend.

But, disclaimer: I've not seen anyone actually use Extend in this way, so it's potentially novel. Internal iteration tends to just take FnMut and push elements one-by-one (or maybe give FnMut some single concrete iterator type; likely due to the ease of custom FnMut closures) rather than take Extend objects/adaptors (which are much more heavyweight to define) for the potential benefit of providing multiple iterators at the same time. (For the analogy to Write, &[u8] is the typical push unit, an write_vectored (taking roughly &[&[u8]]) is rarely used/specialized in practice, unfortunately.)

There just isn't any std-endorsed way for handling iterators where iteration itself can fail. The closest to fallible iteration in std is ReadDir for iterating over the files in a directory, but it actually could potentially continue after returning an error (e.g. if that error was in reading an entry's metadata, rather than the metadata of the iterated directory).

Fallible iteration is rare in practice. The closest would be "infinite" iterators running into resource limits (e.g. integer overflow), which conventionally is considered a panicking error in Rust. The consumer of internal iteration failing, though, is supported; see the previous links w.r.t. TryExtend.

1 Like

Imo fn parse_next(&mut self) -> Result<Option<Element>, Error> is the most natural interface for an incremental parser that reads a sequence of elements one-at-a-time.

I write a lot of parsers. (I'm stuck in writing-parsers hell, I can't escape to write the things I want to use the parsers for.)

Good parsers don't stop on the first error; they're able to do some amount of error recovery so that multiple errors can be reported at the same time. As such, parsers fit the next(&mut self) -> Option<Result<_, _>> shape more than next(&mut self) -> Result<Option<_>, _>. And yes, I know that most parsers don't do this (e.g. nom's IResult).

IMHO, the platonic ideal of a parser is fn parse(tokens: /*nonempty*/ &[Token], diagnostics: &mut impl Extend<Diagnostic>) -> { len: usize, res: Data }. Obviously, real-world needs and practicalities quickly disguise this optimal interface, but the "soul" remains somewhere underneath.

2 Likes

I think error recovery can make sense for parsing languages that are written and read by humans (though it has its own problems and is rarely helpful to me these days), but for e.g. parsing a binary file format I think the right strategy is most often to stop at the first error, since:

  • You're not going to be generating suggested fixes or diagnostics beyond "this input is broken"
  • It's much more likely that the input you're parsing was generated by faulty code or in an unrecognized format, or that some kind of unpredictable corruption has occurred -- all of which imply that recovery is not going to be fruitful
  • On the other hand, it's unlikely that the source of the problem is "human suffering from temporary butterfingers, the rest of the input might be fine"

My own approach in sndjvu_format has been to generally halt on error, but provide facilities for skipping (not ever parsing) any part of the input that you don't care about, so problems in those regions don't affect your ability to parse the rest.

2 Likes

(ah, yes, I agree on parsers for binary formats; I just work with so much text parsing I default to that.)

(but do note that Data is implied to support carrying partial, ill-formed, and/or no actual data in my "soul" there, in addition to properly constructed data, so skipping would be e.g. Data::Invalid; with binary files it can make sense to not actually parse the file in "source" order)

1 Like

My try_extend would then correspond to SinkExt::send and the commit method would correspond to SinkExt::close.

In my case, I need an explicit commit, and not one which automatically fires on drop. That is because if an error happens, I want the whole transaction to abort.

Yes, I wouldn't want/need the flushing part.

But isn't the "for each" approach not just switching the direction but also restricting control flow? The problem is that for_each won't return until all elements have been pushed. With the AntiIter trait in my OP, it's up to the caller to do things intermediately, e.g. pushing items to several AntiIters, or returning with an error. So I feel like using callbacks and providing an "anti iterator" isn't the same. But maybe I'm misunderstanding.

I think there are two aspects of fallibility: the sender or the receiver could fail. In case of the normal Iterator, the receiver can fail and simply not invoke next anymore. In case of std::iter::Extend::extend_one (unstable yet, but you could call Extend::extend with a limited number of items), the sender can fail and simply not invoke the method anymore. In my use case, I wanted both the sender and the receiver being able to signal an error. Extend (as defined in std) cannot achieve this, because the receiver of the items cannot report an error other than panicking. (Though I guess it could refuse to retrieve further items from the iterator and report the error otherwise.)

Regarding std::iter::Iterator::try_for_each, it would support errors from either side (if the Item is a Result, but I still feel like control flow is restricted as I cannot, for example, zip two iterators, right?

The for_each argument is an "anti iterator," in that it takes the sequence of elements by calling it multiple times. I'm not saying that for_each is what you're looking for, but for_each is the shape which you are implementing. Your thing which wants to push items to the "anti iterators" is an internal iterator.