How to break up an iterator into chunks?

HongtaoYang · January 20, 2023, 11:09am

Hi, I'm new to Rust and I'm coming from Python. I'm having trouble figuring out how to convert an iterator with items of type T into another iterator with items of type Vec<T>.

For example, let's say we have the infinite series of natural numbers 0, 1, 2, ..., and I want to split it into chunks of size 4: [0,1,2,3], [4,5,6,7], [8,9,10,11], .... So I need a function that lazily converts an iterator of usize into an iterator of Vec<usize>.

To make things more concrete, I want to recreate the following Python function in Rust:

from collections.abc import Iterator

def chunked(a: Iterator[int], chunk_size: int) -> Iterator[list[int]]:
    chunk = []
    for i, item in enumerate(a):
        chunk.append(item)
        if (i+1) % chunk_size == 0:
            yield chunk
            chunk = []
    
    yield chunk

Appreciate any help, thanks!

jofas · January 20, 2023, 11:13am

I think you are looking for the chunks or chunks_exact methods.

Note, though, that the chunks method is defined on slices ([T]), not on iterators. A slice already holds all of its elements, so these methods cannot chunk a lazy or infinite sequence. There is the array_chunks method on Iterator, but it is nightly-only.

Example:

fn main() {
    let v = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
    
    let chunks: Vec<Vec<i32>> = v.chunks(3).map(|c| c.to_vec()).collect();
    
    assert_eq!(chunks, vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]]);
}
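Since both chunks and chunks_exact were mentioned, here is a small sketch of how they differ when the length does not divide evenly. The helper names with_chunks and with_chunks_exact are made up for this illustration; both slice methods panic if the chunk size is 0.

```rust
/// Splits a slice with `chunks`, which keeps a shorter final chunk.
fn with_chunks(v: &[i32], size: usize) -> Vec<Vec<i32>> {
    v.chunks(size).map(|c| c.to_vec()).collect()
}

/// Splits a slice with `chunks_exact`, which yields only full chunks.
/// The leftover items are returned separately via `remainder()`.
fn with_chunks_exact(v: &[i32], size: usize) -> (Vec<Vec<i32>>, Vec<i32>) {
    let iter = v.chunks_exact(size);
    // `remainder()` borrows from the original slice, not from the iterator.
    let remainder = iter.remainder().to_vec();
    (iter.map(|c| c.to_vec()).collect(), remainder)
}
```

With ten items and a chunk size of 4, chunks gives two full chunks plus a final chunk of two items, while chunks_exact gives only the two full chunks and reports the last two items as the remainder.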


HongtaoYang · January 20, 2023, 11:23am

Thanks @jofas, I'm wondering how we can do this lazily. In your example, what if v is an infinite series, e.g. let v = 1..;?

HongtaoYang · January 20, 2023, 11:26am

array_chunks looks promising, thanks! Although as a learning experience, I'd like to see how array_chunks was implemented.

steffahn · January 20, 2023, 11:27am

If the goal is just to get chunking behavior, there are existing methods, e.g. in the itertools crate. Those come with further optimizations, such as avoiding the need to allocate and create Vecs entirely if the items are consumed in the right order.
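To illustrate the idea of not allocating a Vec per chunk, here is a rough std-only sketch. It is not the itertools API; chunk_sums is a hypothetical helper that consumes each chunk in order through by_ref().take(..) and only keeps one number per chunk.

```rust
/// Sums each run of `chunk_size` items without collecting the run into a Vec.
/// (Only the per-chunk results are stored.)
fn chunk_sums(items: impl IntoIterator<Item = u32>, chunk_size: usize) -> Vec<u32> {
    let mut items = items.into_iter();
    let mut sums = Vec::new();
    loop {
        // Borrow the underlying iterator for at most `chunk_size` items.
        let mut chunk = items.by_ref().take(chunk_size).peekable();
        // An empty chunk means the input is exhausted (or chunk_size is 0).
        if chunk.peek().is_none() {
            break;
        }
        sums.push(chunk.sum());
    }
    sums
}
```

This only works because each chunk is fully consumed before the next one starts, which is the "right order" condition mentioned above.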

If the goal is to learn how to write iterators yourself, there are two basic approaches.

One approach is to use existing general iterator adapters / constructors and put the necessary logic in a closure. One of the most general existing ways of constructing iterators in Rust is the std::iter::from_fn function.

E.g.

fn chunked<I>(a: impl IntoIterator<Item = I>, chunk_size: usize) -> impl Iterator<Item = Vec<I>> {
    let mut a = a.into_iter();
    std::iter::from_fn(move || {
        Some(a.by_ref().take(chunk_size).collect()).filter(|chunk: &Vec<_>| !chunk.is_empty())
    })
}

fn main() {
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }
}
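To check that this version really is lazy, it can be fed an unbounded range, as in the original question. The sketch below repeats the chunked function from above; first_natural_chunks is a hypothetical helper added only for the demonstration.

```rust
fn chunked<I>(a: impl IntoIterator<Item = I>, chunk_size: usize) -> impl Iterator<Item = Vec<I>> {
    let mut a = a.into_iter();
    std::iter::from_fn(move || {
        Some(a.by_ref().take(chunk_size).collect()).filter(|chunk: &Vec<_>| !chunk.is_empty())
    })
}

/// Takes the first `n` chunks of the natural numbers 0, 1, 2, ...
/// The range `0usize..` is unbounded, so this only terminates
/// because chunks are produced on demand.
fn first_natural_chunks(chunk_size: usize, n: usize) -> Vec<Vec<usize>> {
    chunked(0usize.., chunk_size).take(n).collect()
}
```

Calling first_natural_chunks(4, 3) pulls exactly twelve numbers from the range and stops.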

The (mostly equivalent) alternative is to write your own iterator adapter type manually. The benefits are that you can

- add further optimizations, e.g. implement size_hint
- have a named type for the iterator

The downside is more boilerplate. You manually need to list the stuff the closure above captured automatically. But it’s not too bad either; a minimal translation would be

struct Chunked<I> {
    iterator: I,
    chunk_size: usize,
}

fn chunked<Collection>(a: Collection, chunk_size: usize) -> Chunked<Collection::IntoIter>
where
    Collection: IntoIterator,
{
    let iterator = a.into_iter();
    Chunked {
        iterator,
        chunk_size,
    }
}

impl<I: Iterator> Iterator for Chunked<I> {
    type Item = Vec<I::Item>;
    fn next(&mut self) -> Option<Self::Item> {
        Some(self.iterator.by_ref().take(self.chunk_size).collect())
            .filter(|chunk: &Vec<_>| !chunk.is_empty())
    }
}

fn main() {
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }
}
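As for the size_hint optimization mentioned earlier, one possible sketch follows. It assumes the inner iterator's own size_hint is accurate and rounds the item counts up to whole chunks; the chunk_size == 0 branch reflects that next() never yields anything in that case.

```rust
struct Chunked<I> {
    iterator: I,
    chunk_size: usize,
}

fn chunked<C: IntoIterator>(a: C, chunk_size: usize) -> Chunked<C::IntoIter> {
    Chunked { iterator: a.into_iter(), chunk_size }
}

impl<I: Iterator> Iterator for Chunked<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        Some(self.iterator.by_ref().take(self.chunk_size).collect())
            .filter(|chunk: &Vec<_>| !chunk.is_empty())
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // With a chunk size of 0, `next` always returns None.
        if self.chunk_size == 0 {
            return (0, Some(0));
        }
        let size = self.chunk_size;
        // Number of chunks needed for `items` items, rounded up.
        let chunks_for = |items: usize| items / size + usize::from(items % size != 0);
        let (lower, upper) = self.iterator.size_hint();
        (chunks_for(lower), upper.map(chunks_for))
    }
}
```

With a hint like this, consumers such as collect can reserve space for the right number of chunks up front.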

jofas · January 20, 2023, 11:27am

Like this maybe? It requires you to use the nightly toolchain though (because the method is not considered stable yet):

#![feature(iter_array_chunks)]

fn main() {
    for [a, b, c] in (1..).array_chunks() {
        println!("{a} {b} {c}");
        
        if c == 9 {
            break;
        }
    }
}
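For stable Rust, something similar can be approximated with const generics. This is only a sketch and not how the standard library implements array_chunks: array_chunked is a made-up name, it goes through an intermediate Vec, and it silently drops any leftover items that do not fill a whole array.

```rust
use std::convert::TryFrom;

/// Yields fixed-size arrays of `N` items from any iterator.
/// Items that do not fill a final array are discarded.
fn array_chunked<I, const N: usize>(items: I) -> impl Iterator<Item = [I::Item; N]>
where
    I: IntoIterator,
{
    // With N == 0 every (empty) chunk would convert successfully, forever.
    assert!(N > 0, "chunk size must be non-zero");
    let mut items = items.into_iter();
    std::iter::from_fn(move || {
        let chunk: Vec<I::Item> = items.by_ref().take(N).collect();
        // The conversion fails (giving None) when fewer than N items were left.
        <[I::Item; N]>::try_from(chunk).ok()
    })
}
```

The array length can usually be inferred from a destructuring pattern, e.g. for [a, b, c] in array_chunked(1..), or given explicitly as array_chunked::<_, 3>(1..).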


HongtaoYang · January 20, 2023, 11:39am

Thanks a lot @steffahn , this is exactly what I'm looking for. And you are right, I want to learn to write iterators myself.

steffahn · January 20, 2023, 11:50am

More of a show-off… but with some external crate magic, we can get pretty close to the Python version syntactically. Of course, rewriting this into a from_fn iterator is so trivial that the overhead of additional dependencies and boxing isn’t worth it in this particular case.

/*
[dependencies]
next-gen = "0.1.1"
*/

use next_gen::prelude::*;

#[generator(yield(Vec<T>))]
fn chunked_gen<T>(a: impl IntoIterator<Item = T>, chunk_size: usize) {
    let mut chunk = vec![];
    for (i, item) in a.into_iter().enumerate() {
        chunk.push(item);
        if (i + 1) % chunk_size == 0 {
            yield_!(chunk);
            chunk = vec![];
        }
    }
    yield_!(chunk);
}

// wrapper to make it an ordinary iterator
fn chunked<'a, T: 'a>(
    a: impl IntoIterator<Item = T> + 'a,
    chunk_size: usize,
) -> impl Iterator<Item = Vec<T>> + 'a {
    chunked_gen.call_boxed((a, chunk_size))
}

fn main() {
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }

    // by the way, this implementation would always yield an empty extra vec
    // if the number divides evenly
    println!("--------");
    
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5) {
        println!("{chunk:?}");
    }
}


HongtaoYang · January 20, 2023, 12:06pm

I'm blown away, great thanks @steffahn !

