How to break up an iterator into chunks?

Hi, I'm new to Rust and coming from Python. I'm having trouble figuring out how to convert an iterator with item type T into another iterator with item type Vec<T>.

For example, let's say we have an infinite series of natural numbers 0, 1, 2, ..., and I want to split them into chunks of size 4: [0,1,2,3], [4,5,6,7], [8,9,10,11], ... So I need a function that lazily converts an iterator of usize into an iterator of Vec<usize>.

To make things more concrete, I want to recreate the following Python function in Rust:

from typing import Iterator

def chunked(a: Iterator[int], chunk_size: int) -> Iterator[list[int]]:
    chunk = []
    for i, item in enumerate(a):
        chunk.append(item)
        if (i+1) % chunk_size == 0:
            yield chunk
            chunk = []
    
    yield chunk

Appreciate any help, thanks!

I think you are looking for the chunks or chunks_exact methods.

Note though that the chunks method is defined on slices ([T]), not iterators. A slice refers to data that already sits in memory and has a fixed length, so it cannot be produced lazily. There is the array_chunks method on Iterator, but this is nightly-only.

Example:

fn main() {
    let v = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
    
    let chunks: Vec<Vec<i32>> = v.chunks(3).map(|c| c.to_vec()).collect();
    
    assert_eq!(chunks, vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]]);
}
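
Since both chunks and chunks_exact were mentioned, here is a small sketch of how they differ when the length is not a multiple of the chunk size. The helper names with_chunks and with_chunks_exact are made up for this illustration; only the slice methods chunks, chunks_exact and remainder come from the standard library. Both methods panic if the chunk size is 0.

```rust
/// Splits a slice with `chunks`: the last chunk may be shorter than `n`.
/// (Hypothetical helper, only for illustration.)
fn with_chunks(v: &[i32], n: usize) -> Vec<Vec<i32>> {
    v.chunks(n).map(|c| c.to_vec()).collect()
}

/// Splits a slice with `chunks_exact`: only full chunks are yielded,
/// and the leftover elements are available through `remainder`.
/// (Hypothetical helper, only for illustration.)
fn with_chunks_exact(v: &[i32], n: usize) -> (Vec<Vec<i32>>, Vec<i32>) {
    let iter = v.chunks_exact(n);
    // `remainder` can be called before the iterator is consumed.
    let rest = iter.remainder().to_vec();
    (iter.map(|c| c.to_vec()).collect(), rest)
}

fn main() {
    let v = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    // Three chunks, the last one holds the two leftover elements.
    println!("{:?}", with_chunks(&v, 4));

    // Two full chunks, plus the leftover elements as a separate value.
    let (full, rest) = with_chunks_exact(&v, 4);
    println!("{full:?} remainder: {rest:?}");
}
```

Either way the input has to be a slice, so this does not help with a lazy or infinite source.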

Thanks @jofas. I'm wondering how we can do this lazily. In your example, what if v is an infinite series, e.g. let v = 1..;?

array_chunks looks promising, thanks! Although as a learning experience, I'd like to see how array_chunks was implemented.

If the goal is to just get chunking behavior, then there are existing methods e.g. in the itertools crate, with further optimizations, e.g. the ability to avoid the need to allocate and create Vecs entirely, if the items are consumed in the right order.

If the goal is to learn how to write iterators yourself, there are two basic approaches.

One approach is to use existing general iterator adapters / constructors, and implement the necessary logic in a closure, appropriately. One of the most general existing ways of constructing iterators in Rust is via the std::iter::from_fn function.

E.g.

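fn chunked<I>(a: impl IntoIterator<Item = I>, chunk_size: usize) -> impl Iterator<Item = Vec<I>> {
    let mut a = a.into_iter();
    // Each call of the closure produces the next item of the new iterator.
    std::iter::from_fn(move || {
        // `by_ref` only borrows `a`, so `take` consumes at most `chunk_size`
        // items and leaves the rest for the next call. An empty chunk means
        // `a` is exhausted, and `filter` turns it into `None`.
        Some(a.by_ref().take(chunk_size).collect()).filter(|chunk: &Vec<_>| !chunk.is_empty())
    })
}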

fn main() {
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }
}
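
To check that this approach is lazy, here is a sketch that feeds it the infinite range 0.. from the original question and only asks for the first three chunks. The chunked function is the same idea as above, written with an explicit if instead of Option::filter; that rewrite is my own choice for readability, not something the approach requires.

```rust
fn chunked<I>(a: impl IntoIterator<Item = I>, chunk_size: usize) -> impl Iterator<Item = Vec<I>> {
    let mut a = a.into_iter();
    std::iter::from_fn(move || {
        // Pull at most `chunk_size` items out of the underlying iterator.
        let chunk: Vec<I> = a.by_ref().take(chunk_size).collect();
        // An empty chunk means the source is exhausted.
        if chunk.is_empty() {
            None
        } else {
            Some(chunk)
        }
    })
}

fn main() {
    // `0usize..` never ends, but nothing is computed until a chunk is requested,
    // so `take(3)` stops after exactly three chunks.
    let first_three: Vec<Vec<usize>> = chunked(0usize.., 4).take(3).collect();
    println!("{first_three:?}");
}
```

Only twelve numbers are ever pulled from the range here, which is the lazy behavior asked about.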

The (mostly equivalent) alternative is then to write your own iterator adapter type manually. The benefits are that you can

  • add further optimizations, e.g. implement size_hint
  • have a named type for the iterator

The downside is more boilerplate. You manually need to list the stuff the closure above captured automatically. But it’s not too bad either; a minimal translation would be

struct Chunked<I> {
    iterator: I,
    chunk_size: usize,
}

fn chunked<Collection>(a: Collection, chunk_size: usize) -> Chunked<Collection::IntoIter>
where
    Collection: IntoIterator,
{
    let iterator = a.into_iter();
    Chunked {
        iterator,
        chunk_size,
    }
}

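impl<I: Iterator> Iterator for Chunked<I> {
    type Item = Vec<I::Item>;
    fn next(&mut self) -> Option<Self::Item> {
        // Collect up to `chunk_size` items; an empty chunk means the
        // underlying iterator is exhausted, so return `None`.
        Some(self.iterator.by_ref().take(self.chunk_size).collect())
            .filter(|chunk: &Vec<_>| !chunk.is_empty())
    }
}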

fn main() {
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }
}
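
As for the size_hint optimization mentioned earlier, one possible sketch is below. It is an assumption on my part about how one might do it, not a canonical implementation: the bounds of the inner iterator are divided by chunk_size and rounded up, and a chunk_size of 0 is treated as "yields nothing", which matches what next does in that case.

```rust
struct Chunked<I> {
    iterator: I,
    chunk_size: usize,
}

fn chunked<C: IntoIterator>(a: C, chunk_size: usize) -> Chunked<C::IntoIter> {
    Chunked {
        iterator: a.into_iter(),
        chunk_size,
    }
}

impl<I: Iterator> Iterator for Chunked<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        let chunk: Vec<_> = self.iterator.by_ref().take(self.chunk_size).collect();
        if chunk.is_empty() {
            None
        } else {
            Some(chunk)
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let size = self.chunk_size;
        if size == 0 {
            // `take(0)` always gives an empty chunk, so `next` returns `None`.
            return (0, Some(0));
        }
        // Number of chunks needed for `n` items, rounded up.
        let chunks = move |n: usize| n / size + usize::from(n % size != 0);
        let (lower, upper) = self.iterator.size_hint();
        (chunks(lower), upper.map(chunks))
    }
}

fn main() {
    let mut it = chunked(1..=10, 4);
    println!("{:?}", it.size_hint());
    it.next();
    println!("{:?}", it.size_hint());
}
```

With an accurate hint, a collect into a Vec can reserve space for the outer vector up front.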

Like this maybe? It requires you to use the nightly toolchain though (because the method is not considered stable yet):

#![feature(iter_array_chunks)]

fn main() {
    for [a, b, c] in (1..).array_chunks() {
        println!("{a} {b} {c}");
        
        if c == 9 {
            break;
        }
    }
}
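
If you want something similar on the stable toolchain as a learning exercise, here is a rough sketch. To be clear, this is not how the standard library implements array_chunks; it is a simplified stand-in that collects each chunk into a Vec and then converts it into an array with TryFrom. The names ArrayChunks and array_chunks below are my own free-standing versions, and leftover items that do not fill a whole array are simply discarded.

```rust
use std::convert::TryFrom;

/// Hypothetical stable stand-in for the nightly adapter.
struct ArrayChunks<I, const N: usize> {
    iter: I,
}

fn array_chunks<I: Iterator, const N: usize>(iter: I) -> ArrayChunks<I, N> {
    // With N == 0 every chunk would be an empty array and iteration would never end.
    assert!(N != 0, "chunk size must be non-zero");
    ArrayChunks { iter }
}

impl<I: Iterator, const N: usize> Iterator for ArrayChunks<I, N> {
    type Item = [I::Item; N];

    fn next(&mut self) -> Option<Self::Item> {
        // Collect up to N items, then try to turn the Vec into an array.
        // The conversion fails (and we return `None`) if fewer than N items were left.
        let buf: Vec<I::Item> = self.iter.by_ref().take(N).collect();
        <[I::Item; N]>::try_from(buf).ok()
    }
}

fn main() {
    for [a, b, c] in array_chunks::<_, 3>(1..) {
        println!("{a} {b} {c}");

        if c == 9 {
            break;
        }
    }
}
```

The intermediate Vec costs an allocation per chunk, so treat this as a way to see the shape of the adapter rather than as a replacement for it.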

Thanks a lot @steffahn , this is exactly what I'm looking for. And you are right, I want to learn to write iterators myself.

More of a show-off… but with some external crate magic, we can get pretty close to the Python version syntactically. Of course, rewriting this into a from_fn iterator is so trivial that it isn’t worth the overhead of additional dependencies and the boxing in this particular case.

/*
[dependencies]
next-gen = "0.1.1"
*/

use next_gen::prelude::*;

#[generator(yield(Vec<T>))]
fn chunked_gen<T>(a: impl IntoIterator<Item = T>, chunk_size: usize) {
    let mut chunk = vec![];
    for (i, item) in a.into_iter().enumerate() {
        chunk.push(item);
        if (i + 1) % chunk_size == 0 {
            yield_!(chunk);
            chunk = vec![];
        }
    }
    yield_!(chunk);
}

// wrapper to make it an ordinary iterator
fn chunked<'a, T: 'a>(
    a: impl IntoIterator<Item = T> + 'a,
    chunk_size: usize,
) -> impl Iterator<Item = Vec<T>> + 'a {
    chunked_gen.call_boxed((a, chunk_size))
}

fn main() {
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }

    // by the way, this implementation would always yield an empty extra vec
    // if the number divides evenly
    println!("--------");
    
    for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5) {
        println!("{chunk:?}");
    }
}
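
For comparison, here is a sketch of what a from_fn version that keeps the Python behavior (including the extra empty chunk when the length divides evenly) might look like without any dependencies. The name chunked_like_python and the done flag are made up for this illustration, and the assertion on chunk_size is my own addition, since the Python version would fail on a modulo by zero.

```rust
fn chunked_like_python<T>(
    a: impl IntoIterator<Item = T>,
    chunk_size: usize,
) -> impl Iterator<Item = Vec<T>> {
    // Assumption: reject 0 up front, otherwise this would yield empty chunks forever.
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut a = a.into_iter();
    let mut done = false;
    std::iter::from_fn(move || {
        if done {
            return None;
        }
        let chunk: Vec<T> = a.by_ref().take(chunk_size).collect();
        // A short (possibly empty) chunk plays the role of Python's final `yield chunk`.
        if chunk.len() < chunk_size {
            done = true;
        }
        Some(chunk)
    })
}

fn main() {
    for chunk in chunked_like_python([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4) {
        println!("{chunk:?}");
    }

    println!("--------");

    // The length divides evenly here, so the last chunk is empty.
    for chunk in chunked_like_python([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5) {
        println!("{chunk:?}");
    }
}
```

Dropping the trailing empty chunk is then a matter of returning None instead of an empty Vec, as in the earlier from_fn version.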

I'm blown away, great thanks @steffahn !
