Library for fast aggregations over a vector of structs?

Imagine a vector of structs like the following:

```rust
struct User {
    active: bool,
    premium: bool,
    username: String,
    email: String,
    logins: u64,
    group_id: u64,
    site_id: u64,
}
```

I'm currently iterating over the vector and building small buckets of counts using HashMaps. This lets me count users grouped by group_id or site_id, find the max logins of a user per site, etc. So far everything works and I'm happy with the code.
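A minimal sketch of the single-pass HashMap approach described above, using only the standard library. The `Aggregates` type, the `aggregate` function and the `make_user` helper are hypothetical names for illustration. Every aggregation is updated inside the same loop through the `entry` API, so the vector is traversed once and each map is hashed once per user.

```rust
use std::collections::HashMap;

// Same shape as the struct above; the bool and String fields are not
// used by these particular aggregations.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct User {
    active: bool,
    premium: bool,
    username: String,
    email: String,
    logins: u64,
    group_id: u64,
    site_id: u64,
}

/// Aggregates for one batch (hypothetical type), all filled in a single pass.
#[derive(Debug, Default)]
struct Aggregates {
    users_per_group: HashMap<u64, u64>,
    users_per_site: HashMap<u64, u64>,
    max_logins_per_site: HashMap<u64, u64>,
}

/// One pass over the batch; `entry` does a single lookup per map
/// instead of a separate get followed by an insert.
fn aggregate(users: &[User]) -> Aggregates {
    let mut agg = Aggregates::default();
    for u in users {
        *agg.users_per_group.entry(u.group_id).or_insert(0) += 1;
        *agg.users_per_site.entry(u.site_id).or_insert(0) += 1;
        let max = agg.max_logins_per_site.entry(u.site_id).or_insert(0);
        *max = (*max).max(u.logins);
    }
    agg
}

// Hypothetical helper for building sample data.
fn make_user(name: &str, logins: u64, group_id: u64, site_id: u64) -> User {
    User {
        active: true,
        premium: false,
        username: name.to_string(),
        email: format!("{name}@example.com"),
        logins,
        group_id,
        site_id,
    }
}

fn main() {
    let users = vec![
        make_user("a", 5, 1, 10),
        make_user("b", 9, 1, 20),
        make_user("c", 3, 2, 10),
    ];
    let agg = aggregate(&users);
    println!("group 1 users: {}", agg.users_per_group[&1]);
    println!("site 10 users: {}", agg.users_per_site[&10]);
    println!("site 20 max logins: {}", agg.max_logins_per_site[&20]);
}
```

Adding another grouping (e.g. premium users per site) is one more map and one more line in the loop, without a second traversal.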

However, I was wondering if there is a smarter and faster way of doing this. Do people use libraries for this kind of thing, or am I doing it the right way?

P.S.: My use case is stream analysis. Basically, my Rust code listens on a queue, and as events come through, I collect them and save them into a vector. Once I judge the vector to be large enough, I run my aggregations and save the results into a table.

If your primary purpose is aggregation of each field, why store a vector of Users at all? Why not take the incoming data and feed it directly to the accumulators?
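A minimal sketch of that suggestion, assuming the same `User` struct and the same example aggregations: each event is folded into running HashMap accumulators as it arrives, and `flush` hands over the finished window while leaving empty accumulators behind. `Accumulators`, `observe`, `flush` and `make_user` are hypothetical names, and a plain `Vec` stands in for the queue.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
struct User {
    active: bool,
    premium: bool,
    username: String,
    email: String,
    logins: u64,
    group_id: u64,
    site_id: u64,
}

/// Running aggregates for the current window (hypothetical type).
#[derive(Debug, Default)]
struct Accumulators {
    events: u64,
    users_per_group: HashMap<u64, u64>,
    max_logins_per_site: HashMap<u64, u64>,
}

impl Accumulators {
    /// Fold a single incoming event into the aggregates.
    fn observe(&mut self, u: &User) {
        self.events += 1;
        *self.users_per_group.entry(u.group_id).or_insert(0) += 1;
        let max = self.max_logins_per_site.entry(u.site_id).or_insert(0);
        *max = (*max).max(u.logins);
    }

    /// Return the finished window's aggregates and reset `self` to empty
    /// (`std::mem::take` swaps in `Accumulators::default()`).
    fn flush(&mut self) -> Accumulators {
        std::mem::take(self)
    }
}

// Hypothetical helper; only the fields used by the aggregations matter here.
fn make_user(logins: u64, group_id: u64, site_id: u64) -> User {
    User {
        active: true,
        premium: false,
        username: String::new(),
        email: String::new(),
        logins,
        group_id,
        site_id,
    }
}

fn main() {
    // Stand-in for events read from the queue.
    let incoming = vec![make_user(5, 1, 10), make_user(9, 1, 20), make_user(3, 2, 10)];
    let mut acc = Accumulators::default();
    for event in &incoming {
        acc.observe(event);
    }
    let window = acc.flush();
    println!("events in window: {}", window.events);
    println!("accumulators reset: {}", acc.events == 0);
}
```

Memory then grows with the number of distinct keys rather than the number of events; a raw buffer or log can still be kept alongside purely for durability.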


Durability. I periodically serialize that vector to a file to prevent data loss.

This small service implements a tumbling window: I fill the vector over a configured time interval aligned to the top of the hour. In this case, I aggregate all the data every 5 minutes and save it into a SQL database. But if my service dies before the aggregation runs, I still need the raw data, hence the vector.
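A sketch of the window arithmetic, assuming timestamps are Unix seconds and windows are aligned in UTC. `WINDOW_SECS`, `window_start`, `window_end` and `window_closed` are hypothetical names. Because 3600 is a multiple of 300, rounding a timestamp down to a multiple of 300 automatically lands on :00, :05, ..., :55 of the hour.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Window length in seconds (5 minutes). 3600 % 300 == 0, so window
/// boundaries stay aligned to the top of every UTC hour.
const WINDOW_SECS: u64 = 5 * 60;

/// Start (inclusive) of the tumbling window containing Unix timestamp `ts`.
fn window_start(ts: u64) -> u64 {
    ts - ts % WINDOW_SECS
}

/// End (exclusive) of that window; an event at exactly this second
/// belongs to the next window.
fn window_end(ts: u64) -> u64 {
    window_start(ts) + WINDOW_SECS
}

/// True when `ts` falls in a later window than `current_start`, i.e. the
/// current buffer is complete and can be aggregated and flushed.
fn window_closed(current_start: u64, ts: u64) -> bool {
    window_start(ts) > current_start
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs();
    let start = window_start(now);
    println!("current window: [{start}, {})", window_end(now));
    println!("closed yet? {}", window_closed(start, now));
}
```

Whether `ts` should be the event's own timestamp or the time it was received is a design choice; passing the event time keeps late-arriving events in the window they belong to.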

The reason I don't write every event to SQL is the sheer volume of data: around 700 events per second, which would mean 700 new rows per second. So I opted to aggregate on a 5-minute basis using a tumbling window, which saves a lot of space in the SQL database.
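A rough back-of-envelope for that trade-off, assuming a steady 700 events/s and that each 5-minute window writes at most one row per distinct grouping key, where $K$ (the number of distinct keys, e.g. `(group_id, site_id)` pairs) is a hypothetical quantity:

$$
\underbrace{700 \times 86{,}400 = 60{,}480{,}000}_{\text{raw rows per day}}
\qquad\text{vs.}\qquad
\underbrace{\tfrac{86{,}400}{300} \times K = 288K}_{\text{aggregated rows per day}}
$$

Each single 5-minute window holds $700 \times 300 = 210{,}000$ raw events, so aggregation pays off whenever $K$ is well below that.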