Extending vector with multiple elements generated from each input element

Hi community, I've been playing a lot with rust in the past days, and I am trying to improve the performance of a piece of code using rayon.

It follows a very simplified version of the code I have. In the real code, I am operating on big vectors (hundreds of thousands) of a more complex type, and instead of using iterate().take(), I have a custom iterator with deterministic behaviour. But this simplified version looks like the real one.

fn main() {
    use itertools::iterate;
    
    let mut v: Vec<u32> = vec![];
    
    let m: Vec<u32> = vec![1, 2, 3];
    
    // I want to replace this call by a single v.extend(<Soomething here!>);
    // (Actually to par_extend(), from rayon, as it can be done in parallel)
    m.into_iter().for_each(|a| {
        // this is just an example of "finite" iterator.
        // In the real, I am not using take(), but a custom iterator that
        // generates many items from a givem "a"
        let it = iterate(a, |&i| i * 10 ).take(2);
        
        v.extend(it);
    });
    
    // I actually don't care with the order of the elements, provided they are all there
    assert_eq!(v, vec![1, 10, 2, 20, 3, 30]);
}

We can see that for each element in "m", two elements are being created and those two elements are being added to "v".

The issue is that this code looks ugly and does not scale to multiple CPUs, although it can clearly be done in parallel. I am currently using rayon with par_extend() in other parts of the code, with good results, but I have not been able to transform this code to fit rayon parallel iterators.

Can this be accomplished? If yes, please show me how. If not, how can I improve this code?

Thank you in advance.

EDIT: I forgot to mention that I don't care with the order the elements are inserted in the final vector.

1 Like

I'm on mobile so I can't check if it works, but something along the lines of

fn main ()
{
    use ::itertools::iterate;
    use ::rayon::prelude::*;
    
    let m: Vec<u32> = vec![1, 2, 3];

    // /* or */ let mut v = Vec::from_par_iter(
    let mut v = vec![];
    v.par_extend(
        m   .into_par_iter()
            .flat_map(|a| {
                iterate(a, |&x| 10 * x)
                    .take(2)
                    .collect::<Vec<_>>() // may be needed to have a parallel iterator
            })
    );
    
    // I actually don't care with the order of the elements, provided they are all there
    v.sort();
    assert_eq!(v, vec![1, 2, 3, 10, 20, 30]);
}

should do the trick.

1 Like

I did not know about flat_map, and actually the .collect() is not even needed, as returning take() also works. Another day with rust, another lesson learned :slight_smile:

Unfortunately using rayon on the real code is not very easy, as the iterator has to implement ParallelIterator, which does not seem that simple to implement.

Collecting into Vec<> works, but does some extra memory allocation that made the code quite slow. Probably due heap allocations. As the max size of the generated array is known, I am now trying to use SmallVec on it.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.