Efficient parsing of byte array


#1

Say I have a byte array like

let x: &[u8; 4] = &[...];

and I want to interpret it as a big-endian u32. My approach is

let x_parsed = u32::from_be(unsafe { mem::transmute(*x) });

My question is: is there a faster way to do this, or a safe way that optimizes to the same code? I’d rather not turn off the type checker with transmute, since, for example, passing x itself (a pointer) instead of *x would be accepted on a 32-bit system, even though it would obviously be wrong.

I’ve seen things like byteorder, but I don’t know if they incur a performance penalty with unnecessary bounds checking (since I’m working on fixed-length arrays I don’t need runtime checking).


#2

Rustc is clever enough to optimize away bounds checks when it knows the length of the input array, even when you access the array through a high-level abstraction like byteorder (which sees the array as a slice, and accesses it via the Read trait).

Here’s a playground snippet that demonstrates it. If you look at the assembly of to_be_u32 in release mode, you will find it to be as simple as it gets, and certainly devoid of bounds checks.
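The playground snippet itself is not reproduced here. As a rough stand-in using only the standard library, the sketch below has a hypothetical slice-based helper, `read_be_u32` (a name made up for illustration, not byteorder's API), do a bounds-checked read, and a `to_be_u32` wrapper call it with a fixed-size array. Whether the check really disappears after inlining is something to confirm in the release-mode assembly rather than assume.

```rust
/// Hypothetical slice-based reader (an illustration, not byteorder's API).
#[inline]
fn read_be_u32(buf: &[u8]) -> u32 {
    let mut bytes = [0u8; 4];
    // `&buf[..4]` is bounds-checked and panics if `buf` holds fewer than 4 bytes.
    bytes.copy_from_slice(&buf[..4]);
    u32::from_be_bytes(bytes)
}

/// The argument is a fixed-size array, so its length is known at compile time.
pub fn to_be_u32(x: &[u8; 4]) -> u32 {
    read_be_u32(&x[..])
}
```

For instance, `to_be_u32(&[0x12, 0x34, 0x56, 0x78])` yields `0x12345678`.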


#3

I agree with @HadrienG, and anyway the proof is in the pudding (i.e. the asm output). The one thing to keep in mind is that if the byteorder functions don’t get inlined, you’ll still get length checks. They’re marked #[inline] but not #[inline(always)], so it’s somewhat up to the compiler’s whim. If this is a serious concern, I’d just inspect the assembly of the hot paths and make sure they’re free of bounds checks. Debug builds will be slower though, if that’s at all a concern.


#4

Thanks for the advice. Makes for neater code :slight_smile:

EDIT: I tried out the gist myself, and you are right: the output is exactly the same :smiley: https://play.rust-lang.org/?gist=aa1f6963a1307e1447e83af6ccf4af1d&version=stable


#5

Also note that the transmute could trigger undefined behaviour because the byte array may not be aligned enough for u32.


#6

Is that actually true? My understanding is that transmute imposes no alignment requirement; specifically, it does not require the source and destination types to have the same alignment.


#7

Ah yes, it’s the value being transmuted not the reference, my mistake.
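To make the distinction concrete, here is a minimal sketch with two hypothetical helpers (the names are mine). The first transmutes the array by value, as in the original post, so the bytes are copied into a fresh u32 and the alignment of the source does not come into it. The second reinterprets the pointer instead, which is the case where alignment would matter, and sidesteps the issue with `read_unaligned`.

```rust
use std::mem;

/// Transmute the array by value: the four bytes are copied into a new u32.
pub fn parse_by_value(x: &[u8; 4]) -> u32 {
    // SAFETY: [u8; 4] and u32 have the same size, and every bit pattern is a valid u32.
    let native: u32 = unsafe { mem::transmute::<[u8; 4], u32>(*x) };
    u32::from_be(native)
}

/// Read through a reinterpreted pointer without assuming u32 alignment.
pub fn parse_by_pointer(x: &[u8; 4]) -> u32 {
    let p = x.as_ptr() as *const u32;
    // SAFETY: `p` points to 4 readable bytes, and read_unaligned has no alignment requirement.
    let native = unsafe { p.read_unaligned() };
    u32::from_be(native)
}
```

Both helpers return the same value for the same input; dereferencing `p` directly (or casting the reference to `&u32`) is the variant that would need the array to be suitably aligned.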


#8

BTW, the as cast here may be enough, and it’s safer than transmute.
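The post doesn't spell out what it has in mind, so the following is only one possible reading: widen each byte with `as` and shift it into position. The function name `be_u32_with_as` is made up for this sketch.

```rust
/// Assemble a big-endian u32 from four bytes using only safe `as` casts.
pub fn be_u32_with_as(x: &[u8; 4]) -> u32 {
    // Widen each byte to u32, then shift it into place; x[0] is the most significant byte.
    ((x[0] as u32) << 24) | ((x[1] as u32) << 16) | ((x[2] as u32) << 8) | (x[3] as u32)
}
```

Because the array length and the indices are all known at compile time, there is nothing to check at runtime here, and no unsafe block is needed.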