Match statement efficiency?


#1

How efficient is the match statement with a lot of values? I have a
table of about 350 items that map integer values to strings. The
integers are sparse with range from 254 to 51125. I started to use a
HashMap but had problems getting it initialized. In my case it would
be convenient to use the match statement, if it runs faster than O(n).


#2

Matches on enums with undeclared discriminants (such that the compiler can choose sequential values) are optimized down to a jump table, but matching on sparse integer values I’m afraid will probably degrade to a linear search unless the compiler is smart enough to make it a binary search, which I’m not sure that it is.

However, you can create a static slice of integer/string-literal pairs and use slice::binary_search_by() (only testing the integer value in the closure) to perform an actual binary search which will get you guaranteed O(log n) performance. You will have to make sure the slice is ordered correctly, however.

If you’d prefer to have a hashmap for the amortized constant-time search, you have a couple options.

The one everyone is going to suggest is lazy_static which allows you to create lazily initialized static values, even ones with destructors; unfortunately, this does mean leaking memory since there’s no safe way to free the allocation when the program exits (of course, the memory will still be reclaimed by the OS when the process is destroyed; the problem here is that tools like Valgrind will report this as a leak because the program didn’t explicitly free the allocation). Being that it’s a regular dynamic hashmap, it’s also still prone to collisions and thus will not have 100% constant-time lookup (thus “amortized”).

You do have another option, though. Check out rust-phf, which is a framework for compile-time hashmaps (and sets) with perfect hash functions, i.e. hash functions which will never collide because they can be optimized for the complete dataset all at once. The downside here is that you either have to use a compiler plugin which only works on the nightly channel, or create a build script which runs the PHF code-generator and writes the output to a new Rust source file to be imported into your crate either via include!() or as a regular module. There’s examples for both approaches on the project’s README.

I had a similar use-case for my mime_guess crate, which is basically just a wrapper around a static mapping of file extensions to MIME types. My initial release used the binary-search method, and it was satisfactory, but sensitive to both ordering, and case. After refactoring to use phf-codegen, I got a sweet little performance boost and was able to drop the strict requirement on case for the input mapping–I could drop the ordering requirement too, but I’d prefer to keep that in the interest of the maintainers’ (i.e. my) sanity.


#3

Thanks for the detailed reply and the links to more info.

Lazy initialization was what I was trying to do initially. I will study the example to learn how its done. Not freeing the memory is fine with my case.

I really like perfect hashing and that sounds like the best thing to me.

Building it into the match command would be cool for a future version, in fact it would be “perfect”.