Iterators of Vector, performance consideration

Hi folks,

I have an iterator whose item is a Vector of enums.

struct SQLiteResultIterator<'s> {
    num_columns: i32,
    previous_status: i32,
    stmt: &'s crate::community_statement::Statement,
}

impl<'s> Iterator for SQLiteResultIterator<'s> {
    type Item = Vec<sql::Entity>;
    fn next(&mut self) -> Option<Self::Item> {
        if self.previous_status != sql::ffi::SQLITE_ROW {
            return None;
        }
        let mut row = Vec::with_capacity(self.num_columns as usize);
        for i in 0..self.num_columns {
            let entity_value = sql::Entity::new(self.stmt, i);
            row.push(entity_value);
        }
        unsafe {
            self.previous_status =
                sql::ffi::sqlite3_step(self.stmt.as_ptr());
        };
        Some(row)
    }
}

At every iteration I allocate a new vector, and it seems very wasteful.

Is that true that I will end up allocating a new vector at any iteration? I might be lucky and having the memory allocator being smart enough to don't call the OS, but I am still performing an allocation, correct?

Is there a way to avoid multiple allocations in cases like this one? How?

Yes, this will allocate lots of Vecs. Instead of trusting the system allocator to do the right thing, you may want to look into streaming iterators.

The general idea of these is that you keep the Vec inside of your iterator. On every call to next(), you clear the inner vec, fill it again, and return a reference to it (or a slice spanning its entire content) to the user. This won't allocate on every iteration because Vec's backing storage is kept around when clearing.

A caveat of this design, however, is that such an iterator cannot implement Rust's standard Iterator trait, because that trait requires being able to collect all iterator outputs into a collection, whereas here the user must drop the previous output before it's allowed to call the next() method again.

1 Like

Thanks for confirming my doubts! Really appreciate!

I thought about this idea of streaming iterators as well, but eventually, I want to have all the result together in memory. So this is not going to cut it.

The other option I am considering is to pass a vector of vector as input and having the function fill the outside vector. It seems like a better idea...

Another option is to yield iterators, then your users can do whatever they want with those. But this may not always be possible, and may be hard to do.

I am myself the user of this API fortunately :slight_smile:

If you always will want all the rows, why make an Iterator at all? Just write a function that returns a Vec, and you can organize the allocation however you care to, including iterating over the rows.

Wait! I am saying something stupid.

If I start doing something like:

impl<'s> SQLiteResultIterator<'s> {
    fn fill(&mut self) -> Vec<Vec<sql::Entity>> {
        let result = Vec::with_capacity(20)
        ...
    }
}

result will be now a pre-allocated vector of 20 .... pointers... Which is completely pointless. I need to pre-allocate the 20 inner vectors.

Is there a simple way to do it?

The only thing I am coming up at the moment is a

How do I pre-allocate the internal vector? That is the main question...

Yes you are right though

const EXPECTED_NUM_ROWS: usize = 20;

let mut result: Vec<Vec<_>> =
    (0 .. EXPECTED_NUM_ROWS)
        .map(|_| Vec::with_capacity(self.num_columns))
        .collect()
;
1 Like

This will make one allocation or 20, one after the other?

Anyway thanks, is going to be faster than what I am doing anyway!

That will make 21, 1 for each of the 20 inner vecs and 1 for the vec of vecs

1 Like

If each inner vector can contain a different number of elements, you have no choice except to allocate each of them.

1 Like

The other option being to have a huge Vec that you subslice to interpret it as a matrix.

No each inner vector will contain the same number of elements! Are columns from a SQL query.

This is indeed another interesting approach!

For reference I believe that the method Yandros is refering is Vec::chunks_mut()

That's one approach, but doesn't use the type system as well as it could. You really want an extensible 2-D matrix, as @Yandros pointed out, so define a type for a row and make a vector of that type.

But how? The row is an ordered list of an arbitrary (definitely non-constant) number of elements. To define a type, I will need to have such number know at compile time.

Apparently I misinterpreted your earlier reply. From what you've just written, it appears that all rows in a given instance are the same length, but that length is not knowable until runtime. The clean way to implement that is to provide runtime functions (or macros) to get and set a slice, using Vec::chunks() and Vec::chunks_mut(), respectively, together with the 1-dimensional index of the slice.

1 Like

Sorry! My bad! Each query result will have different numbers of columns. But inside one query result, all the row will have the same number of columns.

:wink:

Yes, I believe we have converged into the best solution together!

Cheers and thanks for helping!

1 Like

What if you let your iterator have the row Vec :

struct SQLiteResultIterator<'s> {
    num_columns: usize,
    previous_status: i32,
    stmt: &'s Statement,
    row: Vec<Entity>,
}

Then when you choose what's next() return by replacing each columns.

for i in 0..self.num_columns {
    let entity_value = Entity {};
    mem::replace(&mut self.row[i], entity_value);
}

And then return a reference to the row Vec.

Actually it does not compile because of the lifetime of Item. But there is a crate that allow you easily to break this wall : streaming_iterator.
You just have to divide your impl : the for loop in advance() and the return in get().

For detail look at impl :

1 Like