Pool of Reusable Vectors

Hi everyone! I know a thing or two about Rust, but I've run into a problem, and I have the feeling I'm missing something, so I'm asking for your help. :slight_smile:

I'm trying to port some C# code to Rust. Rust looks like a really fast language, and after working with it for a while, it seems like a good choice to speed up my program. But I can't find a way to create a pool of reusable vectors. In C#, I have this algorithm:

// C# Source Code
var popLength = 1024;

// This will result in an array with 1024 null items
var arr = ArrayPool<Individual>.Shared.Rent(popLength);

// Do a lot of operations with each item of the array,
// e.g. initialize each individual, do some operations with each item, etc.

// After all is done, I clean the array
for (var i = 0; i < arr.Length; i++) {
   // This helps the GC identify objects that can be discarded.
    arr[i] = null;
}

// And return it to the pool, to be used again
ArrayPool<Individual>.Shared.Return(arr, true);

In Rust I tried to mimic this behavior with a struct:

// Rust Code
use std::collections::VecDeque;

/// Structure to hold a lot of reusable vectors
pub struct VecPool<T> {
    pool: VecDeque<Vec<T>>,
}

impl<T> VecPool<T> {
    /// Create a new pool
    pub fn new() -> Self {
        Self {
            pool: VecDeque::new(),
        }
    }

    /// Rent a new vector
    pub fn rent(&mut self, size: usize) -> Vec<T> {
        match self.pool.pop_front() {
            Some(mut p) => {
                p.clear();
                p
            }
            None => Vec::with_capacity(size),
        }
    }

    /// Return a vector to the pool
    pub fn put_back(&mut self, mut vec: Vec<T>) {
        vec.clear();
        self.pool.push_back(vec);
    }
}

The problem occurs when I try to return the vector to the pool. For example, when I execute an operation like this:

/// Execute an operation on the population
pub fn execute(
    pool: &mut VecPool<Individual>,
    population: &mut Vec<Individual>,
) {
    let mut temp = pool.rent(population.len());
    
    while temp.len() < population.len() {
        // Execute a lot of operations with each item of the population.
        // This is just an example, not the exact source code that I have.
        let new_ind = do_stuff(&population);
        temp.push(new_ind);
    }

    population.clear();
    population.extend(temp);
    // This does not compile, because `temp` was moved into `extend` above
    pool.put_back(temp);
}

The problem I'm facing is that I can't find a way to move all the contents from the temp vector into the population and, after that, send the temp vector back to the pool to be reused.

Is there some way to accomplish this?

Thank you in advance!

You have to clone temp, extend population with the cloned data, and then store temp back in your pool. population and pool store owned data, and data can only have one owner (unless you put the data behind a shared-ownership smart pointer like Rc).

You can use .extend(temp.drain(..)).
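
For example, here is a minimal, self-contained sketch of that fix, using i32 in place of Individual just for illustration:

fn main() {
    let mut temp: Vec<i32> = (0..5).collect();
    let mut population: Vec<i32> = Vec::new();

    // drain(..) moves the elements out of `temp` without giving up its allocation...
    population.extend(temp.drain(..));

    // ...so `temp` is still owned (and keeps its capacity) and can go back to the pool.
    assert!(temp.is_empty());
    assert!(temp.capacity() >= 5);
    assert_eq!(population, [0, 1, 2, 3, 4]);
}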

7 Likes

I understand that the data must have only one owner. But if I understand your proposal correctly, a clone will be created for each new item generated, and more memory will be allocated than if I could move the items from the first vector to the second. Is that true?
And when I call vec.clear() on the temp vector, each element will be dropped... which means a lot of drops will be called too, right? Is that what will happen?

Yes, sorry, Vec::drain makes a lot more sense in your setting.

Note that in your Rust port, it might actually be better to leave VecPool<T> out. All else being equal, it's better to avoid deallocating and reallocating a container you could reuse instead, but in this case, achieving that requires an additional data structure. The memory allocator is itself a data structure whose job is to efficiently give you memory to reuse, so your own pool won't necessarily be faster than what it already offers.

Once your program is working, write a benchmark for it, and compare the performance of your program using VecPool to your program using Vec::with_capacity() with a good capacity guess (VecPool is effectively guessing the capacity as "the capacity used before"). Or if you don't care enough about performance to write the benchmark, then get rid of VecPool as unnecessary complexity.
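
For instance, the non-pooled version could be as simple as this sketch (it uses a hypothetical process function over i32s, with the input length as the capacity guess):

fn process(input: &[i32]) -> Vec<i32> {
    // A fresh Vec with a good capacity guess: one allocation, no pool needed.
    let mut temp = Vec::with_capacity(input.len());
    temp.extend(input.iter().map(|x| x * 2));
    temp
}

fn main() {
    let population: Vec<i32> = (0..1024).collect();
    let doubled = process(&population);
    assert!(doubled.capacity() >= population.len());
    assert_eq!(doubled.len(), population.len());
}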

Also, if you do continue using VecPool, it would probably be better for VecPool to return the most recently inserted vector (like a stack) rather than the oldest (like a queue), to keep the working set slightly smaller.
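
A sketch of that stack-like variant, keeping the same rent/put_back interface as the VecPool in the original post:

/// Stack-like pool: the most recently returned vector is handed out first,
/// which keeps the working set slightly smaller than a FIFO queue would.
pub struct VecPool<T> {
    pool: Vec<Vec<T>>, // a plain Vec used as a stack instead of a VecDeque
}

impl<T> VecPool<T> {
    pub fn new() -> Self {
        Self { pool: Vec::new() }
    }

    /// Rent a vector: reuse the most recently returned one, or allocate a new one.
    pub fn rent(&mut self, size: usize) -> Vec<T> {
        // Pooled vectors were already cleared in put_back.
        self.pool.pop().unwrap_or_else(|| Vec::with_capacity(size))
    }

    /// Return a vector to the pool.
    pub fn put_back(&mut self, mut vec: Vec<T>) {
        vec.clear();
        self.pool.push(vec);
    }
}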

1 Like

I just ran a test with Vec::drain(..) and it worked very well. Thanks @jofas and @steffahn for the answers.

The test that I ran:

println!("# Before Drain");
println!("a1 capacity: {}", a1.capacity());
println!("a1 length: {}", a1.len());
println!("a1 elements: {:?}", &a1);
println!("a2 capacity: {}", a2.capacity());
println!("a2 length: {}", a2.len());
println!("a2 elements: {:?}", &a2);

a2.extend(a1.drain(..));

println!("# After Drain");
println!("a1 capacity: {}", a1.capacity());
println!("a1 length: {}", a1.len());
println!("a1 elements: {:?}", &a1);
println!("a2 capacity: {}", a2.capacity());
println!("a2 length: {}", a2.len());
println!("a2 elements: {:?}", &a2);

The output:

# Before Drain
a1 capacity: 5
a1 length: 5
a1 elements: [1, 2, 3, 4, 5]
a2 capacity: 0
a2 length: 0
a2 elements: []


# After Drain
a1 capacity: 5
a1 length: 0
a1 elements: []
a2 capacity: 5
a2 length: 5
a2 elements: [1, 2, 3, 4, 5]

Thanks for the input! The reason I'm trying to mimic the ArrayPool from my C# code in my Rust code is performance (or at least, it's one of the things that I know to be a performance bottleneck).

My original C# code is a genetic algorithm that generates a ton of small and large arrays. Some hold simple ints; others are a bit more complex, like objects and strings.

In my first implementation attempt (in C#), I didn't use the array pool, and the application was slow. Really, really slow. When I ran the benchmark, I saw that most of the app's time was spent allocating and deallocating the arrays. When I refactored the app to use the pool of arrays, the speedup was very significant.

I implemented the app in Rust. Right now, I'm not using the pool strategy, and my Rust app is really slow, something like 10 times slower than my C# implementation. So now I will try to implement the same pool strategy and see whether I get the same kind of gains that I got in C#. I'm hoping the Rust implementation will end up faster than my C# code.

I am making this recommendation for two reasons:

  • C# implementations use a garbage collector, which has very different performance characteristics around allocation than Rust does. You should not assume that because the pool was a benefit in C#, it will also be a benefit in Rust.
  • I have been working on performance-sensitive code for quite a while, and I have found that doing additional work in order to reuse allocations is often not beneficial. Reuse allocations, yes, but do it when it's a simple matter of replacing clones with moves or clearing a vector and using it a second time in the same function (see the sketch just below), not when it demands its own extra data structure. Of course, the behavior of your application may be different. That's why one must write and use a benchmark.
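
A minimal sketch of that kind of in-function reuse: no extra data structure, just clear() between uses of the same buffer:

fn main() {
    let mut scratch: Vec<u64> = Vec::new();

    for generation in 0..3u64 {
        // clear() drops the old elements but keeps the allocation,
        // so every iteration after the first reuses the same buffer.
        scratch.clear();
        scratch.extend((0..1024).map(|i| i * generation));
        println!("generation {generation}: {} items", scratch.len());
    }
}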

I'm not saying that you should not use the pool. I'm saying that you should measure whether the pool is in fact better, before deciding whether to use it.

1 Like

Minor comment: Clippy recommends changing this to append because it is faster and more concise:

a2.append(&mut a1);

Using Clippy is very worthwhile.

2 Likes

Just to be sure: Are you building in release mode?

4 Likes

Also, if you're doing lots of allocations and deallocations, you can consider using a different allocator -- lots of them will do size-pooling for you.

The default allocator is the system one, which is the best for code size, but not necessarily for performance. Try mimalloc, or jemalloc (https://jemalloc.net/), or ...
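
For example, switching the global allocator to mimalloc takes only a few lines (a sketch, assuming the mimalloc crate has been added as a dependency in Cargo.toml):

// Assumes `mimalloc` is listed as a dependency in Cargo.toml.
use mimalloc::MiMalloc;

// From here on, every Vec, Box, String, etc. allocates through mimalloc.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    let v: Vec<u64> = (0..1_000_000).collect();
    println!("allocated {} elements", v.len());
}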

5 Likes

Just a wild guess: Could it be that the issue isn’t with the arrays themselves, but with the number of individuals being copied repeatedly?

Consider a sequence like this:

let mut temp = Vec::new();

let new_ind = do_stuff(...);
temp.push(new_ind);

population.extend(temp);

In this example, the individual new_ind is copied twice—first into the internal array of temp, and then again into the internal array of population.

Normally, copying is quick, but with large elements or a large number of them, this can start to add up. Sometimes, this copying even leads to extra overhead due to CPU caching and similar factors.

As a potential solution, you might consider putting the individuals inside a Box<Individual>, so that effectively only pointers are being moved around.

Additionally, you could then optimize the allocation and deallocation of these individuals using a simple free-list, like this one:

use std::{
    alloc::{alloc, dealloc, Layout},
    mem,
    ptr,
};

#[derive(Default)]
pub struct FreeList<T>(Vec<*mut T>);

impl<T> FreeList<T> {
    pub fn new() -> Self {
        FreeList(Vec::new())
    }
    pub fn with_preallocated(size: usize) -> Self {
        let layout = Layout::new::<T>();
        let mut chunks = Vec::new();
        chunks.resize_with(size, || unsafe { alloc(layout).cast() });
        Self(chunks)
    }
    /// Hand out a Box, reusing a pooled allocation when one is available.
    pub fn make_box(&mut self, value: T) -> Box<T> {
        if let Some(ptr) = self.0.pop() {
            unsafe {
                // The pooled memory is uninitialized: write the value in place,
                // then give ownership of the allocation to a Box.
                ptr.write(value);
                Box::from_raw(ptr)
            }
        } else {
            Box::new(value)
        }
    }
    /// Take a Box back, drop its contents, and keep the raw allocation for reuse.
    pub fn recycle(&mut self, value: Box<T>) {
        let ptr = Box::into_raw(value);
        unsafe { ptr::drop_in_place(ptr) };
        self.0.push(ptr);
    }
    pub fn clear(&mut self) {
        let layout = Layout::new::<T>();
        for ptr in mem::take(&mut self.0) {
            unsafe { dealloc(ptr.cast(), layout) }
        }
    }
    pub fn resize(&mut self, new_len: usize) {
        if new_len == 0 {
            self.clear();
        } else {
            let layout = Layout::new::<T>();

            if new_len > self.0.len() {
                self.0.resize_with(new_len, || unsafe { alloc(layout).cast() });
            } else {
                for ptr in self.0.drain(new_len..) {
                    unsafe { dealloc(ptr.cast(), layout) }
                }
            }
        }
    }
}
impl<T> Drop for FreeList<T> {
    fn drop(&mut self) {
        let layout = Layout::new::<T>();
        for ptr in self.0.iter().copied() {
            unsafe { dealloc(ptr.cast(), layout) }
        }
    }
}

Another option might be for the constructor of an individual to write directly into raw memory to avoid any unnecessary copying, though this would require some use of unsafe code.

Alternatively, you could equip individuals with a method like reset() and construct them using a builder method.
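
A minimal sketch of that reset() idea, with a made-up Individual type just for illustration (the real fields will differ):

// Hypothetical Individual, just for illustration.
pub struct Individual {
    genes: Vec<f64>,
    fitness: f64,
}

impl Individual {
    /// Reinitialize this individual in place, reusing the `genes` allocation.
    pub fn reset(&mut self, gene_count: usize) {
        self.genes.clear();
        self.genes.resize(gene_count, 0.0);
        self.fitness = 0.0;
    }
}

fn main() {
    let mut ind = Individual { genes: vec![1.0; 8], fitness: 0.42 };
    // Instead of building a brand-new Individual, recycle this one.
    ind.reset(8);
    assert_eq!(ind.fitness, 0.0);
    assert!(ind.genes.iter().all(|&g| g == 0.0));
}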

The goal here is to eliminate as much unnecessary copying as possible.

1 Like

That's a great idea!

I'd like to know more about the Individual structs: how large they are and whether they contain allocated items. I think what you suggest makes sense for largish structs.

So do you have a pool per type of individual?

Extend / pointers / ... This sounds more like a job for swap:

std::mem::swap(&mut temp, population);
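
For example, a small self-contained sketch of swapping the buffers and keeping both allocations (using i32 in place of Individual):

fn main() {
    let mut population = vec![1, 2, 3]; // the old generation
    let mut temp = vec![10, 20, 30];    // the newly computed generation

    // Swap the two buffers: no elements are copied, only the Vec headers.
    std::mem::swap(&mut temp, &mut population);

    assert_eq!(population, [10, 20, 30]); // population now holds the new data
    assert_eq!(temp, [1, 2, 3]);          // temp holds the old buffer, ready to recycle
}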
1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.