Are there any ways to use stack (not heap) when handling a sequence of elements whose length is determined at runtime (not compilation time)?


#1

I’m writing a JSON parser ( https://github.com/pikkr/pikkr ) which uses many Vectors to handle sequences of elements (sequences of byte of a JSON record).

The length of these sequences is determined at runtime (not at compilation time), so for now, I use Vectors to handle these sequences.

However, a Vector uses heap and that makes an application slower than using stack. So, I would like to use other solutions which use stack if possible.

Are there any ways to use stack when handling a sequence of elements whose length is determined at runtime? Or, do I have to use Vectors in that case?

Thanks.


#2

You may want to consider “small vector” optimization. It is implemented by smallvec crate. Here is its description:

“Small vector” optimization for Rust: store up to a small number of items on the stack


#3

Another possible strategy is to stick with heap allocation, but try to reuse the allocations that you make instead of throwing them away and starting over from scratch. There are many variations on this theme, depending on your requirements, but one recurring notion is that whenever feasible, you need to stop handing over owned heap storage (e.g. vectors) to your clients, and instead only let them borrow slices of data for the duration of their processing tasks.

The reason for this is that as soon as you’ve “given away” a heap data block, getting it back is more complicated. You basically need some garbage collection mechanism which tracks whenever the block is not used by the client anymore, and gives it back to you so that you can use it again. It’s feasible (e.g. using a modified Rc/Arc which puts back unused Vecs on a garbage stack), but if you can do without that complexity, it’s obviously better for performance and code clarity alike :slight_smile:

For “streaming” use cases where you send vectors to clients one after the other (and the client never holds more than one vector of data at once), one thing which you can do is to have a single vector, keep clearing and re-filling it whenever new data comes in, and hand over the slice of its current contents to the client. Since clearing does not reclaim storage, the vector will eventually reach an asymptotic capacity and stop reallocating. For the last few % of performance, you can take note of what the typical asymptotic capacity is on a representative input dataset and allocate your vector with that capacity right from the start.


#4

Thanks. I’ll check smallvec.


#5

Thanks for your suggestion. I applied the strategy you suggested ( https://github.com/pikkr/pikkr/pull/55/files ) and that improved my JSON parser’s performance!


#6

Along the lines of smallvec, there’s also an arrayvec crate which implements a vector-like interface but backed by an array on the stack. This is more limited in that you must define a length up front and if you go over that length it’ll error instead of reallocating to a larger buffer.