Looking for crate with append only Vec that grows without reallocations

I need to be able to grow Vecs without reallocations. I was going to implement this myself, by having an indexer vec and additional vecs with actual data, that would be allocated as needed, but thought to look around before.

For my case, I need to allocate bytes, with 100-1500MBs in len, depending on the input. It should be append only and if it can deallocate from the tail this would be nice.

When you access those bytes, will it be in the same size chunks as the input. Or will your access be needing random access to the whole stream of bytes as one slice?

The actual data are unsized structs and i'll be accessing them one at a time, so it doesn't need to be continuous. Basically the interface are 2 functions:

  • store (&struct) -> address
  • get (address) -> &struct

Did you implement this? I'm looking for something similar, basically I want to have a Vec like container that does not reallocate automatically if len() is the same with cap()

Not yet. I've settled for regular Vec for now. In my bench on Apple M1, total amount of work was done under 0.7s, and the Vec grew from 8MB to 2.95GB, so the reallocations are fast enough for now. I will do this sometime later, after we launch.

I've also found some crates with this query https://crates.io/search?q=vec%20reallocation. For me those won't work, cause I store unsized structs in the vec, but might for you.

Thanks, I will look deeper into crates.io.

The need for a Vec without re-allocations is only really there in quite specific use-cases. In general, through the “magic” of doubling the size every time, the overhead turns out to be really low. Here’s some further explanations I found via Google. The benefits one gets in turn is a simpler data structure, which – I suppose – could mean some performance benefits when doing lots of reads; and e.g. in Rust in particular, it allows the whole Vec to become a single slice (“&[T]”) and thus be used in API agnostic over whether a Vec, and array, or something else is used.

The specific use-cases where this re-allocation strategy really doesn’t work are, off the top of my head (so I might be missing other use-cases)

  • since the time analysis is only concerned with amortized costs, and the resizing operations on their own do take quite a significant amount of time at once, there’s a problem when your application is not doing one large batch-processing-style computation, but instead is supposed to react to inputs on-the-fly and in real time. For for real-time applications, avoiding the pauses for re-allocations can be beneficial.
  • if you want to use the Vec in interesting ways, re-allocation might be problematic: If you want to hold references to elements while pushing new ones, you cannot support re-allocation anymore, since that would invalidate the existing references. This is why types like this one are implemented using a non-reallocating approach, and in turn their (equivalent of a) push method can have a more flexible signature (&self, value: T) -> &mut T with a shared-reference self type; not (&mut self, value: T) -> &mut T or (&mut self, value: T).

This is not to say that re-allocations are no overhead. They certainly are. If you know the amount of data in advance, reserving it in advance is commonly recommended when using Vec, because some overhead is saved. And if a use-case of a Vec spends some (small but) measurable percentage of time on re-allocations, of course, changing the data structure and seeing whether it gives performance benefits is a very reasonable thing to do.

2 Likes

One other scenario where this could be beneficial is if most of your memory is in that one vector. With reallocation, you're using 2x memory (and 3x virtual memory).

1 Like

This crate might interest you. It's focus is on storing unsized objects continously.

That crate is nightly-only (relying on unstable features) and as far as I can tell also buggy.

Thank you for filing an issue!