Iterators of Vector, performance consideration

siscia · July 27, 2019, 12:15pm

Hi folks,

I have an iterator whose item is a Vector of enums.

struct SQLiteResultIterator<'s> {
    num_columns: i32,
    previous_status: i32,
    stmt: &'s crate::community_statement::Statement,
}

impl<'s> Iterator for SQLiteResultIterator<'s> {
    type Item = Vec<sql::Entity>;
    fn next(&mut self) -> Option<Self::Item> {
        if self.previous_status != sql::ffi::SQLITE_ROW {
            return None;
        }
        let mut row = Vec::with_capacity(self.num_columns as usize);
        for i in 0..self.num_columns {
            let entity_value = sql::Entity::new(self.stmt, i);
            row.push(entity_value);
        }
        unsafe {
            self.previous_status =
                sql::ffi::sqlite3_step(self.stmt.as_ptr());
        };
        Some(row)
    }
}

At every iteration I allocate a new vector, and it seems very wasteful.

Is that true that I will end up allocating a new vector at any iteration? I might be lucky and having the memory allocator being smart enough to don't call the OS, but I am still performing an allocation, correct?

Is there a way to avoid multiple allocations in cases like this one? How?

HadrienG · July 27, 2019, 12:39pm

Yes, this will allocate lots of Vecs. Instead of trusting the system allocator to do the right thing, you may want to look into streaming iterators.

The general idea of these is that you keep the Vec inside of your iterator. On every call to next(), you clear the inner vec, fill it again, and return a reference to it (or a slice spanning its entire content) to the user. This won't allocate on every iteration because Vec's backing storage is kept around when clearing.

A caveat of this design, however, is that such an iterator cannot implement Rust's standard Iterator trait, because that trait requires being able to collect all iterator outputs into a collection, whereas here the user must drop the previous output before it's allowed to call the next() method again.

siscia · July 27, 2019, 1:25pm

Thanks for confirming my doubts! Really appreciate!

I thought about this idea of streaming iterators as well, but eventually, I want to have all the result together in memory. So this is not going to cut it.

The other option I am considering is to pass a vector of vector as input and having the function fill the outside vector. It seems like a better idea...

RustyYato · July 27, 2019, 1:29pm

Another option is to yield iterators, then your users can do whatever they want with those. But this may not always be possible, and may be hard to do.

siscia · July 27, 2019, 1:32pm

I am myself the user of this API fortunately

droundy · July 27, 2019, 2:22pm

If you always will want all the rows, why make an Iterator at all? Just write a function that returns a Vec, and you can organize the allocation however you care to, including iterating over the rows.

siscia · July 27, 2019, 2:34pm

Wait! I am saying something stupid.

If I start doing something like:

impl<'s> SQLiteResultIterator<'s> {
    fn fill(&mut self) -> Vec<Vec<sql::Entity>> {
        let result = Vec::with_capacity(20)
        ...
    }
}

result will be now a pre-allocated vector of 20 .... pointers... Which is completely pointless. I need to pre-allocate the 20 inner vectors.

Is there a simple way to do it?

The only thing I am coming up at the moment is a

siscia · July 27, 2019, 2:35pm

How do I pre-allocate the internal vector? That is the main question...

Yes you are right though

Yandros · July 27, 2019, 3:02pm

const EXPECTED_NUM_ROWS: usize = 20;

let mut result: Vec<Vec<_>> =
    (0 .. EXPECTED_NUM_ROWS)
        .map(|_| Vec::with_capacity(self.num_columns))
        .collect()
;

siscia · July 27, 2019, 3:08pm

This will make one allocation or 20, one after the other?

Anyway thanks, is going to be faster than what I am doing anyway!

RustyYato · July 27, 2019, 3:09pm

That will make 21, 1 for each of the 20 inner vecs and 1 for the vec of vecs

TomP · July 27, 2019, 3:15pm

If each inner vector can contain a different number of elements, you have no choice except to allocate each of them.

Yandros · July 27, 2019, 3:16pm

The other option being to have a huge Vec that you subslice to interpret it as a matrix.

siscia · July 27, 2019, 3:16pm

No each inner vector will contain the same number of elements! Are columns from a SQL query.

siscia · July 27, 2019, 3:23pm

This is indeed another interesting approach!

For reference I believe that the method Yandros is refering is Vec::chunks_mut()

TomP · July 27, 2019, 3:24pm

That's one approach, but doesn't use the type system as well as it could. You really want an extensible 2-D matrix, as @Yandros pointed out, so define a type for a row and make a vector of that type.

siscia · July 27, 2019, 3:26pm

But how? The row is an ordered list of an arbitrary (definitely non-constant) number of elements. To define a type, I will need to have such number know at compile time.

TomP · July 27, 2019, 3:33pm

Apparently I misinterpreted your earlier reply. From what you've just written, it appears that all rows in a given instance are the same length, but that length is not knowable until runtime. The clean way to implement that is to provide runtime functions (or macros) to get and set a slice, using Vec::chunks() and Vec::chunks_mut(), respectively, together with the 1-dimensional index of the slice.

siscia · July 27, 2019, 3:36pm

Sorry! My bad! Each query result will have different numbers of columns. But inside one query result, all the row will have the same number of columns.

Yes, I believe we have converged into the best solution together!

Cheers and thanks for helping!

Greedeuh · July 31, 2019, 8:04am

What if you let your iterator have the row Vec :

struct SQLiteResultIterator<'s> {
    num_columns: usize,
    previous_status: i32,
    stmt: &'s Statement,
    row: Vec<Entity>,
}

Then when you choose what's next() return by replacing each columns.

for i in 0..self.num_columns {
    let entity_value = Entity {};
    mem::replace(&mut self.row[i], entity_value);
}

And then return a reference to the row Vec.

Actually it does not compile because of the lifetime of Item. But there is a crate that allow you easily to break this wall : streaming_iterator.
You just have to divide your impl : the for loop in advance() and the return in get().

For detail look at impl :

Topic		Replies	Views
Fill vector with elements with increasing values	7	3876	January 20, 2021
Which style do you prefer? And reason? help	5	583	December 12, 2019
Iterator best practice - please comment on my code code review	7	494	September 15, 2023
Iterators for dyn traits (lifetime problems) help	12	1282	December 7, 2020
Custom cyclic iterator over vector of vectors and mutability woes	4	373	December 12, 2022

Iterators of Vector, performance consideration

Related Topics