Current state of the art regarding streaming/lending iterators

I'd like to know what's currently state of the art when wanting to have iterators that borrow instead of returning owned items. Reading into it, I found an article on the web and also a crate: streaming-iterator.

The linked article, "Solving the Generalized Streaming Iterator Problem without GATs", seems to address the problem from a pre-GAT p.o.v. I meanwhile use GATs a lot by using Rust nightly and #![feature(generic_associated_types)], so using those wouldn't be a big issue.

The downside of using a streaming_iterator::StreamingIterator seems to be that the API is not compatible to an Iterator.

As for my current problem, I have an Arc<Vec<String>> and would like to get an iterator that yields &str, so I can avoid cloning the Strings.

This is what I do now:

use std::sync::Arc;

pub struct StrIterOwner(Arc<Vec<String>>);

impl<'a> IntoIterator for &'a StrIterOwner {
    type Item = &'a str;
    type IntoIter = StrIter<'a>;
    fn into_iter(self) -> Self::IntoIter {
        StrIter {
            vec: &self.0,
            idx: 0,
        }
    }
}

pub struct StrIter<'a> {
    vec: &'a Arc<Vec<String>>,
    idx: usize,
}

impl<'a> Iterator for StrIter<'a> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        if self.idx < self.vec.len() {
            let result = Some(&*self.vec[self.idx]);
            self.idx += 1;
            result
        } else {
            None
        }
    }
}

fn main() {
    let arc = Arc::new(vec!["Hello".to_string(), "World".to_string()]);
    let iter_owner = StrIterOwner(arc.clone());
    for item in iter_owner.into_iter() {
        println!("{}", item);
    }
    for item in iter_owner.into_iter() {
        println!("{}", item);
    }
}

(Playground)

Output:

Hello
World
Hello
World

My solution isn't generic (it's specific to String/str), but it seems to work as I want. It doesn't even require GATs. Is there any better way to do it that's more idiomatic. One downside is that I had to implement Iterator::next on my own, and my implementation isn't double ended.

What do other people do when wanting to iterate over values that can only be borrowed but not moved?

P.S.: Of course, I'd like to hear feedback on whether my solution above is a good thing to do or can be done better or should be done differently.

That’s not a setting where you need a streaming iterator, implementing IntoIter for a reference is the best solution here. The important feature of streaming/lending iterators is that only return short borrows on next, a requirement that’s most commonly important because you want to overwrite / re-use the same memory location between calls to next.

E.g. think of an iterator that returns lines of a File as &str, but each of this &str item points into the same (iterator-internal) String buffer that get’s re-used and cleared for each line.

Or an iterator adapter like group_by but that never uses allocations, because the kind of access-pattern that the itertools implementation must allow for (and does allow, using allocations) is not possible for a streaming iterator.

6 Likes

If you’re on nightly anyways, you could also just do something like

#![feature(type_alias_impl_trait)]

use std::sync::Arc;

pub struct StrIterOwner(Arc<Vec<String>>);

impl<'a> IntoIterator for &'a StrIterOwner {
    type Item = &'a str;
    type IntoIter = impl DoubleEndedIterator<Item = Self::Item> + ExactSizeIterator;
    fn into_iter(self) -> Self::IntoIter {
        self.0.iter().map(|s| s.as_str())
    }
}
1 Like

@steffahn: Thank you very very much. This iterator/borrowing issue has caused me headache for days :hot_face:. And your solution for it is very elegant! :smiley:

A lot of my code uses type_alias_impl_trait.

Here's the Playground link in case someone else wants to try it.

Ah okay, then I misunderstood the concept of streaming iterators. Indeed, I have my strings in an Arc, so the memory won't be touched while I have a clone of the Arc.

It would be nicer if the the Arc was owned by the iterator itself (so I can have functions directly returning an Iterator which owns the Arc, instead of returning a value whose borrow must be turned into an iterator), but I assume that's not possible without self-referential data structures?

Right… under that aspect: it would indeed be possible to implement a streaming iterator trait directly on the datastructure that owns the Arc. But IMO, the disadvantages of only offering such a streaming iterator implementation are too great, after all streaming iterators are less capable; you cannot e.g. collect their items, you cannot handle multiple items concurrently, etc…

Self-referencing types would not help with an Iterator implementation for StrIterOwner, but they would help users of a StrIterOwner that need to keep an owned StrIterOwner next to the corresponding iterator in a struct.

Arguably, you could of course even offer both, a streaming iterator implementation for an owned Arc<Vec<String>>-containing struct, and the Iterator implementation for the borrowed version.

I think this would require some kind of Arc projection so that each of the yielded items is capable of keeping the backing store alive on its own.

Just a drive-by note on terminology: although these are called streaming iterators in a lot of places like the GAT RFC, these days I prefer to call them lending iterators to avoid confusion with the Stream trait for async iterators. (Or with the eventual lending streams...)

2 Likes

Isn't it then possible to implement the Iterator trait on the struct which contains the StrIterOwner and the Iterator by forwarding the next method to the inner iterator? :thinking: Maybe that doesn't work because next works on &mut self and I'd actually end up with a streaming lending iterator.

Yes, I understand. I would lose functionality when using the lending iterator interface. This makes me feel more comfortable with my approach of implementing IntoIterator for &StrIterOwner. And your improvement with "impl Trait" in the type alias makes it relatively easy to write down. I won't even need a crate for it. (I'm really happy about the improved approach!)

Yes, exactly, you could only define a lending iterator, even using a self-referencing struct.

1 Like

I tried to write an example, but it ICEs on cargo build :sob:. Most likely a duplicate of an existing issue, but I don’t feel like minimizing it right now…

// ouroboros = "0.14.0"

#![feature(type_alias_impl_trait)]

use std::sync::Arc;

pub struct StrIterOwner(Arc<Vec<String>>);

impl<'a> IntoIterator for &'a StrIterOwner {
    type Item = &'a str;
    type IntoIter = impl DoubleEndedIterator<Item = Self::Item> + ExactSizeIterator;
    fn into_iter(self) -> Self::IntoIter {
        self.0.iter().map(|s| s.as_str())
    }
}

type StrIter<'a> = <&'a StrIterOwner as IntoIterator>::IntoIter;

use ouroboros::self_referencing;

fn process_two_strings(a: &str, b: &str) -> i32 {
    (a.len() + b.len()) as _
}

#[self_referencing]
struct ProcessWindows {
    owner: StrIterOwner,
    #[not_covariant]
    #[borrows(owner)]
    iter_and_previous: (StrIter<'this>, Option<&'this str>),
}

fn process_windows(owner: StrIterOwner) -> ProcessWindows {
    ProcessWindowsBuilder {
        owner,
        iter_and_previous_builder: |owner| {
            let mut iter = owner.into_iter();
            let previous = iter.next();
            (iter, previous)
        },
    }
    .build()
}

impl Iterator for ProcessWindows {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        self.with_iter_and_previous_mut(|(iter, previous)| {
            let first = previous.take()?;
            let second = iter.next()?;
            *previous = Some(second);
            Some(process_two_strings(first, second))
        })
    }
}


fn main() {
    let x = StrIterOwner(Arc::new(vec![".".into(), "..".into(), "...".into()]));
    let x = process_windows(x);
    assert_eq!(&x.collect::<Vec<i32>>(), &[3, 5]);
}

I wonder: Is it possible to keep the type StrIterOwner private and have a function that returns an impl Trait with &'a Trait implementing IntoIterator<Item = &'a str>? I.e. a function that returns an opaque type which doesn't implement a trait but a borrow of that type implements a trait?

Like fn foo() -> for<'a> &'a impl IntoIterator<Item = &'a str> (syntax made up). Perhaps it's possible with a trait alias?

(Hope it's clear what I mean. Edit: See comments in this Playground for a draft that doesn't work.)

You can do that in some sense on stable, but it's not actually useful in that form due to the lack of implied bounds.

You could do it indirectly with RPITIT, but I don't think you want the indirection.

impl<T> RefIntoIters
where
    for<'any> &'any T: IntoIterator<Item=&'any str>,
{
    fn iter(&self) -> impl IntoIterator<Item=&'_ str> + '_  {
        self
    }
}

// Elsewhere: have to go through `RefIntoIters`
for s in rii.iter() { /* ... */ }
// Errors: for s in &rii { /* ... */ }

I think this is basically the same as your playground idea, but without nameable GATs. Here's your playground bumped forward using associated_type_bounds. The errors it produces if you uncomment the code in main look like more "no implied bounds" problems. Maybe this is good enough for you.

If the indirection is good enough, you can approximate it on stable with dyn by returning a boxed iterator.

There might be a better way than any of these.

1 Like

Thank you for these examples. I experimented a bit on my own, based on your examples (Playground).

If I understand right, then the lack of "implied bounds" is what make it impossible to write for _ in &foo, but instead requiring me to use a method call here.

I don't really need the type to be private, so all the effort is likely not worth it. Instead, I'll go with @steffahn's approach here. But it would be nice if (some day in future) there might be a way to return &impl Trait with an easy syntax (not sure about the downsides of making something like that possible though).

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.