It's sort of expected: debug mode generates naive, unoptimized code by design. I'm not sure it's worth worrying about, because without the optimizer there's not much you can do. In debug mode abstractions are not zero-cost, so the only remedy is to remove abstractions and make the code do less work.
When the application is too slow to use in debug mode, the easy way out is to enable optimizations in your Cargo.toml:
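For example (the profile sections are standard Cargo; the particular opt-level values below are just one reasonable choice, not something from this thread):

```toml
# Optimize your own code even in debug builds:
[profile.dev]
opt-level = 2

# Or, alternatively: keep your own crate easy to debug,
# but optimize all dependencies:
[profile.dev.package."*"]
opt-level = 2
```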
An interesting thing is, if I order the arms of the match statement alphabetically, the 'non-match' takes twice as long to run.
I wouldn't have expected the order of the arms inside the match statement to affect performance. I expected the compiler to magically do its best and get the same result no matter what.
I haven't understood all of this, but I see a match statement on byte slices here, and it reminded me of a similar case I've optimized. You could have a look at this commit to see how I used the byteorder crate there.
One thing there was pretty instructive for me, and pointed me to this optimization: my first version mainly contained a match statement on byte slices of length 8, all of which ended with a blank. I figured I could remove that blank, so the code would have to read one byte less. But performance was far worse! It was pointed out to me that this was because LLVM recognizes reading 8 bytes as reading a u64, but reading 7 bytes cannot be optimized that way. So I employed byteorder to exploit that even further, and maybe that's something that can help you, too.
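A minimal stdlib-only sketch of the effect described above (the tag value is invented for illustration; byteorder offers the same kind of whole-word reads when you need a fixed byte order):

```rust
// Sketch: an 8-byte prefix comparison written as a single u64 compare.
// LLVM can do this transformation automatically for an 8-byte slice
// comparison, but a 7-byte comparison cannot be widened this way.
fn is_get_tag(line: &[u8]) -> bool {
    if line.len() < 8 {
        return false;
    }
    // Read 8 bytes as one machine word. Native endianness is fine here,
    // since both sides of the comparison use the same conversion.
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&line[..8]);
    let word = u64::from_ne_bytes(bytes);
    // "GET" padded with blanks to a full 8 bytes, as in the match
    // arms described above.
    word == u64::from_ne_bytes(*b"GET     ")
}
```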
This could use a precomputed hash map for dispatch: I recommend the ::phf crate and its precomputed perfect hash functions for this:
use ::phf::{
    phf_map,
    Map,
};

fn parse_hdr<'bytes> (
    data: &'bytes [u8],
    buf: &'bytes [u8; 64],
    table: &'_ [u8; 256],
) -> Result<
    HdrName<'bytes>,
    InvalidHeaderName,
>
{
    let len = data.len();
    assert!(
        len < MAX_HEADER_NAME_LEN,
        "header name too long: max length is {}",
        MAX_HEADER_NAME_LEN,
    );
    Ok(match len {
        | 0 => {
            return Err(InvalidHeaderName::default());
        },
        | len if len > 64 => {
            HdrName::custom(data, false)
        },
        | _ => {
            use self::StandardHeader::*;
            static STANDARD_HEADERS: Map<&[u8], StandardHeader> = phf_map! {
                b"te" => Te,
                b"age" => Age,
                b"via" => Via,
                b"dnt" => Dnt,
                // ...
            };
            let buf = &buf[.. len];
            match STANDARD_HEADERS.get(buf) {
                | Some(&standard_header) => {
                    standard_header.into()
                },
                | None => {
                    if buf.contains(&b'\0') {
                        return Err(InvalidHeaderName::default());
                    } else {
                        HdrName::custom(buf, true)
                    }
                },
            }
        },
    })
}
The currently released version, 0.7.24, requires the nightly #![feature(proc_macro_hygiene)], but the repository is currently working on the next 0.8.0 release, which fixes this so that the macro can be used in stable Rust. In the meantime, you can thus use it on stable Rust with the following Cargo.toml:
[dependencies]
phf = { version = "0.7.24", features = ["macros"] }
[patch.crates-io]
phf = { git = "https://github.com/sfackler/rust-phf" }
I don’t have a ton of experience with this kind of problem, but the aho-corasick crate generates a finite state machine that I would think would work better than a simple match statement.
Edit:
After reading some more it sounds like the crate I suggested does more than you need and that a precomputed hash table as suggested may be the best bet. You could also try a BTreeMap or a radix trie if you want some other ideas of things to try.
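As a stdlib-only sketch of the precomputed-table idea (the header names and id values here are invented; with ::phf the table would instead be built at compile time):

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Dispatch through a table built once on first use, instead of a match
// statement. OnceLock (stable since Rust 1.70) is the closest stdlib
// stand-in for a compile-time-constructed phf table.
fn standard_header_id(name: &[u8]) -> Option<u32> {
    static TABLE: OnceLock<HashMap<&'static [u8], u32>> = OnceLock::new();
    let table = TABLE.get_or_init(|| {
        let mut m: HashMap<&'static [u8], u32> = HashMap::new();
        m.insert(b"te", 0);
        m.insert(b"age", 1);
        m.insert(b"via", 2);
        m.insert(b"dnt", 3);
        m
    });
    table.get(name).copied()
}
```

The lookup cost is then one hash plus one probe regardless of how many entries the table holds, which is the property a long match statement lacks.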
Does this mean the compiler is not smart enough to transform a match statement into hashed lookups on its own? Why can't this (or a similar optimization) be done automatically?
And then suddenly we are back to the "slow" / classic match statement because our newtype is not hashable, with all of this, again, happening implicitly.
Another reason would be that a hash-table lookup only becomes more efficient past a certain number of arms, a threshold that is very hard for a "smart" compiler to guess exactly. This would mean that in some cases the compiler would choose a construct slower than the optimal one, while the programmer is oblivious to all of this because it is implicit.
On the other hand, this is a classic situation where a macro can provide this behavior: using a macro would make the hash-based nature of the lookup explicit, thus, for instance, allowing compilation to fail with the above newtype example (complaining about the lack of a Hash implementation on the newtype).