I quite like ndarray. However, it has some wrapper overhead: it is slightly slower than a vec of vecs when creating the data structure, and slower on reads.
For reasons I can't explain yet, which I assume must be some Rust / LLVM SIMD optimization, I can loop over every element of the 1B x 60 vec of vecs in 27ns on a single core. On the surface, that seems impossible.
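A time like 27ns for that many elements is far less than it would take just to read that much memory. So one possibility worth ruling out is that LLVM removed the loop entirely because its result was never used. The following is a hypothetical check, not the benchmark from this post. It uses much smaller, made-up sizes and wraps both the input and the result in `std::hint::black_box` (stable since Rust 1.66, and documented as a best-effort hint), which makes it much harder for the optimizer to treat the summation as dead code. `sum_nested` is an illustrative name.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Sums every element of a vec of vecs (hypothetical helper for this check).
fn sum_nested(data: &[Vec<u64>]) -> u64 {
    data.iter().map(|row| row.iter().sum::<u64>()).sum()
}

fn main() {
    // Assumed sizes, far smaller than 1B x 60, so the data fits in memory.
    let rows: usize = 100_000;
    let cols: usize = 60;
    let data: Vec<Vec<u64>> = (0..rows)
        .map(|r| (0..cols).map(|c| (r * cols + c) as u64).collect())
        .collect();

    let start = Instant::now();
    // black_box hides the input from the optimizer and "uses" the result,
    // so the summation loop cannot simply be deleted.
    let total = black_box(sum_nested(black_box(data.as_slice())));
    let elapsed = start.elapsed();

    println!("sum = {total}, elapsed = {elapsed:?}");
}
```

If the measured time jumps from nanoseconds to something that grows with the element count, the original measurement was probably timing a loop that no longer existed.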
Vecs in Rust in general are crazy fast, faster than I can replicate in C. Amazing.
Using a flat indexed vec instead, the same loop slows to 13ms, which is still fast, but slower than the vec of vecs.
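The post doesn't show how the flat vec was indexed. The following minimal sketch assumes a row-major layout where element `(r, c)` lives at `r * cols + c`. The `Flat2D` type and its methods are hypothetical names for illustration. It traverses the same buffer two ways: by computing an index per element, and by iterating the underlying `Vec` directly.

```rust
/// A flat, row-major 2-D buffer: element (r, c) is stored at r * cols + c.
struct Flat2D {
    data: Vec<u64>,
    rows: usize,
    cols: usize,
}

impl Flat2D {
    fn new(rows: usize, cols: usize) -> Self {
        Flat2D { data: vec![0; rows * cols], rows, cols }
    }

    fn get(&self, r: usize, c: usize) -> u64 {
        self.data[r * self.cols + c]
    }

    fn set(&mut self, r: usize, c: usize, v: u64) {
        self.data[r * self.cols + c] = v;
    }
}

/// Indexed traversal: computes r * cols + c and bounds-checks each access.
fn sum_indexed(m: &Flat2D) -> u64 {
    let mut total = 0;
    for r in 0..m.rows {
        for c in 0..m.cols {
            total += m.get(r, c);
        }
    }
    total
}

/// Iterator traversal of the same buffer: no per-element index arithmetic.
fn sum_iter(m: &Flat2D) -> u64 {
    m.data.iter().sum()
}

fn main() {
    // Assumed sizes for illustration only.
    let (rows, cols) = (1_000, 60);
    let mut m = Flat2D::new(rows, cols);
    for r in 0..rows {
        for c in 0..cols {
            m.set(r, c, (r * cols + c) as u64);
        }
    }
    println!("{} {}", sum_indexed(&m), sum_iter(&m));
}
```

Both functions return the same sum. The indexed version does a multiply, an add, and a bounds check per element unless the compiler manages to elide them. That can be one source of extra cost compared with iterating, though whether it accounts for the 13ms here would need measuring.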