Custom chunk in Rust

Hi Rust experts,

I found that std has a chunks method, but I need to chunk elements based on a custom function instead of a fixed number of elements.

For example, I want to chunk a vector of strings so that a chunk is closed as soon as the total string length in it exceeds 2.

Maybe something like this:

fn main() {
    let vec = vec!["a", "bbb", "ccc", "d", "e"];
    
    
    // start a new chunk once the total string length in the current chunk exceeds 2
    let chunks: Vec<Vec<&str>> = vec
        .my_chunk(...)
        .collect::<Vec<_>>();
    
    assert_eq!(chunks[0], ["a", "bbb"]);
    assert_eq!(chunks[1], ["ccc"]);
    assert_eq!(chunks[2], ["d", "e"]);
}

Or what is the more idiomatic Rust way to implement this kind of function?

The full code is here; I am looking for a more idiomatic way to implement it:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=90a8839a21e6fddc1bbef57d0fdf0fb7

fn main() {
    let vec = vec!["a", "bbb", "ccc", "d", "e"];
    let mut chunks = vec![];

    let mut buffer = vec![];
    let mut total_size = 0;
    for s in vec {
        if total_size > 2 {
            chunks.push(buffer.clone());
            buffer.clear();
            total_size = 0;
        }
        total_size += s.len();
        buffer.push(s);
    }
    if !buffer.is_empty() {
        chunks.push(buffer);
    }
    assert_eq!(chunks[0], vec!["a", "bbb"]);
    assert_eq!(chunks[1], vec!["ccc"]);
    assert_eq!(chunks[2], vec!["d", "e"]);
}

You could write your own iterator like this, but I'm not sure it's better.

use std::mem;

struct MyChunker<I> {
    iter: I,
    chunk: Vec<String>,
    total_size: usize,
    done: bool,
}

impl<I> Iterator for MyChunker<I>
where
    I: Iterator<Item = String>,
{
    type Item = Vec<String>;
    
    fn next(&mut self) -> Option<Vec<String>> {
        if self.done {
            return None;
        }
        loop {
            match self.iter.next() {
                Some(item) => {
                    self.total_size += item.len();
                    self.chunk.push(item);
                    if self.total_size > 2 {
                        self.total_size = 0;
                        return Some(mem::take(&mut self.chunk));
                    }
                }
                None if self.chunk.is_empty() => {
                    self.done = true;
                    return None;
                }
                None => {
                    self.done = true;
                    return Some(mem::take(&mut self.chunk));
                }
            }
        }
    }
}

Note that this uses std::mem::take, which replaces the vector with an empty vector and returns the old vector. This avoids cloning the contents of the vector.
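To make the std::mem::take behaviour concrete, here is a minimal sketch in isolation. The helper name flush is hypothetical and only used for illustration; the point is that the old vector is moved out and an empty one is left in its place, so no element is cloned.

```rust
use std::mem;

/// Hypothetical helper for illustration: hands the buffered items to the
/// caller and leaves an empty Vec behind in `buffer`.
fn flush(buffer: &mut Vec<String>) -> Vec<String> {
    // `mem::take` swaps in `Vec::default()` (an empty Vec) and returns the
    // previous value, so the strings are moved, not cloned.
    mem::take(buffer)
}

fn main() {
    let mut buffer = vec![String::from("a"), String::from("bbb")];

    let chunk = flush(&mut buffer);
    assert_eq!(chunk, vec!["a", "bbb"]);

    // The buffer is empty afterwards and can be reused for the next chunk.
    assert!(buffer.is_empty());
    buffer.push(String::from("ccc"));
    assert_eq!(buffer, vec!["ccc"]);
}
```

Compared with buffer.clone() followed by buffer.clear(), this avoids copying every element just to throw the originals away.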

This seems to be what I want; I'll try it. Thanks a lot!

This can be simplified and generalized slightly (playground):

use std::mem;

struct MyChunker<I: Iterator> {
    iter: I,
    chunk: Vec<I::Item>,
    max_total_size: usize,
    total_size: usize,
}

impl<I> Iterator for MyChunker<I>
where
    I: Iterator,
    I::Item: AsRef<str>,
{
    type Item = Vec<I::Item>;
    
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.iter.next() {
                Some(item) => {
                    self.total_size += item.as_ref().len();
                    self.chunk.push(item);
                    if self.total_size > self.max_total_size {
                        self.total_size = 0;
                        return Some(mem::take(&mut self.chunk))
                    }
                }
                None => return if self.chunk.is_empty() {
                    None
                } else {
                    Some(mem::take(&mut self.chunk))
                }
            }
        }
    }
}
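For completeness, here is one possible way to get the .my_chunk(...) call style from the original question. This is only a sketch: the extension trait MyChunkExt and its my_chunk method are hypothetical names, not something from std, and the struct is repeated so that the example compiles on its own.

```rust
use std::mem;

struct MyChunker<I: Iterator> {
    iter: I,
    chunk: Vec<I::Item>,
    max_total_size: usize,
    total_size: usize,
}

impl<I> Iterator for MyChunker<I>
where
    I: Iterator,
    I::Item: AsRef<str>,
{
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(item) = self.iter.next() {
            self.total_size += item.as_ref().len();
            self.chunk.push(item);
            // Close the chunk once its total length exceeds the limit.
            if self.total_size > self.max_total_size {
                self.total_size = 0;
                return Some(mem::take(&mut self.chunk));
            }
        }
        // Input exhausted: emit the leftover items, if there are any.
        if self.chunk.is_empty() {
            None
        } else {
            Some(mem::take(&mut self.chunk))
        }
    }
}

/// Hypothetical extension trait so the adapter can be called as a method.
trait MyChunkExt: Iterator + Sized {
    fn my_chunk(self, max_total_size: usize) -> MyChunker<Self> {
        MyChunker {
            iter: self,
            chunk: Vec::new(),
            max_total_size,
            total_size: 0,
        }
    }
}

// Blanket impl: every iterator gets `my_chunk`.
impl<I: Iterator> MyChunkExt for I {}

fn main() {
    let vec = vec!["a", "bbb", "ccc", "d", "e"];
    let chunks: Vec<Vec<&str>> = vec.into_iter().my_chunk(2).collect();
    assert_eq!(chunks, vec![vec!["a", "bbb"], vec!["ccc"], vec!["d", "e"]]);
}
```

Because the bound is AsRef<str>, the same adapter should work for iterators of &str as well as String.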