Data structures for high performance code

A recently released nice talk about data structures used in LLVM:

The talk first shows three general data structures, a small vec (similar to GitHub - servo/rust-smallvec: "Small vector" optimization for Rust: store up to a small number of items on the stack ) a small set, and a small associative array, Chandler Carruth says that they are used all over the place in LLVM. Later he shows several kinds of tagged pointers, and few more data structures that are a little more complex.

The recurring theme in the talk is to use the stack as much as possible, and to pack data and bits as much as possible. In future Rust will need to encourage/help programmers more strongly to do both things. I think that if you rewrite some major piece of LLVM using Rust without data structures like those, you risk producing slower code that eats more RAM.

As the Rust community matures I think the average Rust programmers should use more data structures like those. To encourage the usage of those data structures is it worth having them in the standard library? Perhaps having them just in, plus having them used in the standard library and their usage explained in the standard documentation and common Rust books is enough :slight_smile:

Edit: perhaps it's also worth having a simple compiler option (or a tool) that instruments the binary so after a run of the program it prints how much large the small data structures need to have their stack-allocated part. This helps tune the code.


Speaking of packing data, I'm concerned about adding small overheads everywhere by overuse of usize. Because usability of all other integer types in Rust is quite bad, I end up using usize for almost everything, including values that would fit in u32 or even u8.

The combination of usize-only indexing, absolute lack of integer promotion, incomplete Into implementations and lack of type hinting for expressions all conspire to make non-usize types painful to use and require unreasonable number of unchecked as casts in the code.


Where I see performance is important, I use smaller integral numbers, and I cast to usize later.


That's probably the right way to do it but it's just annoying that whenever you end up using a u8 for indexing you have to cast it to a usize which just doesn't feel ergonomic. I am not sure if there is a better solution, I guess you could just implement indexing with u8's but while not technically unsafe it is also a really strange thing to do.

1 Like

This is a great video, thanks for sharing. Perhaps the best way to spur interest in the community is by teaching the community. I suspect that a few blog posts that compares the speed differences between these container types to common practices found in the wild can go a long way. I think there might be a good home for such discussion in the design patterns repository.

Another pattern that I personally would like to see studied are your quasi-allocator types Arena and TypedArena.

Impressive. Great video, and LLVM is not just being used with Rust, but other languages as well. In fact, I have been thinking on using LLVM myself on one particular project. Thanks.