Sunday musings about arrays and indexing


(Here I call “Arrays” arrays, slices, vects and library-defined very array-like data structures).

When I use Arrays in Rust I am often unsure if a specific Array access will have bound tests. Such tests are good to catch bugs, but the presence of such tests slows down the code a little inside loops, and worse they usually stop LLVM from vectorizing the code. So I inspect the asm code manually looking for calls to core9panicking18panic_bounds_check sections.

This isn’t handy at all. The minimum the compiler should do is to list what bound tests it didn’t manage to remove (on request, using a compiler switch that also allows to specify the function name to avoid being flooded by useless output). It’s the back-end that removes the useless Array bound tests, but it’s still better for the compiler to perform this job instead of me.

Time ago I suggested to introduce two slice access functions that give a compilation error if the back-end isn’t able to remove a specific bound test. Probably this is not easy to do because the compiler front-end needs feedback from the back-end. And LLVS skills at removing a particular Array bound test are not formallly specified, so a back-end update could introduce new compilation errors… This is not great, I guess.

When I write code that has to handle many arrays and I juggle many indexes, I’d like a syntax to specify strongly typed indexes and associate them with specific Array types (this is allowed in Ada language). This saves you from bugs if you try to access an Array with the wrong indexing variable (a classic bug is to access a 2D array row with a column index. Calling them ‘r’ and ‘c’ or ‘row’ and ‘col’ helps, but I’d prefer the type system to catch all bugs at compile-time. Even if sometimes you need to cast or swap index types, like when you transpose a matrix). This idea also goes well with another Ada feature, ranged values, an index value is verified to be inside its Array bounds even when you create/modify it. So the Array access itself doesn’t need to be bounds-tested.

One of the features I’d like in a future language is it to have logic tools to allow me to write down a proof that specific critical Array accesses are inside their bounds. This allows the compiler front-end to remove a bound test even in debug builds and it bypasses all the troubles with back-end optimizations.


I think this has been experimented before:

But I don’t know what came out of it. It seems legitimately useful, although the use of phantom lifetimes (as opposed to phantom types) means that (a) error messages are atrocious and (b) spurious lifetime errors can occur (Index<'id> should really be 'static).