Integer encoding vs tuple

I ran into a situation where some information can be encoded into a u64 integer. Then I realized that I could achieve the same functionality using a tuple (u32, u8, u8, u8, u8). To get information out of the u64, bit operations on individual bytes are performed. Should I pick the tuple representation in this case for its simplicity and readability? Is there any performance penalty to using a tuple here? How fast (in CPU cycles) is casting a u8 to larger integer types (u32, u64, etc.)?
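
For concreteness, here is a minimal sketch of the two alternatives; which byte maps to which field is an arbitrary choice for illustration, not something fixed by the problem:

// u64 version: extracting (say) one of the packed bytes needs a shift and a cast.
fn third_byte_from_u64(packed: u64) -> u8 {
    (packed >> 16) as u8
}

// Tuple version: the same information is just a field access.
fn third_byte_from_tuple(t: (u32, u8, u8, u8, u8)) -> u8 {
    t.3
}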

1 Like

There is a performance penalty to using an integer here, namely those bit operations. But don't trust humans on performance without benchmarks; human brains are notoriously bad at guessing performance.

2 Likes

The answer, to a first approximation, is "it (probably) doesn't matter". A lot of things in your code (I/O, memory allocation, random access/pointer hopping in data structures) are going to dominate the runtime, so whether you use tuples or integer operations will be lost in the noise in most cases.

And if it truly matters, then you'll need to benchmark it anyway. There's no simple rule saying how many "CPU cycles" it takes to cast a u8 to a larger type. If the cast happens at all, it can take zero cycles, or one (a sign- or zero-extension), or more than one – it depends on optimizations, the exact CPU model, what other code surrounds the operation, etc.
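
As an illustration, a widening cast in isolation is about as cheap as an operation gets; a sketch you could paste into godbolt yourself (the exact output depends on the target and the surrounding code):

// On x86-64 this typically compiles to a single zero-extending move (movzx),
// and it often disappears entirely once inlined into surrounding code.
pub fn widen(x: u8) -> u64 {
    x as u64
}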

4 Likes

Depending on your use case, the biggest difference could come from the changed alignment requirements of the type, which you can alter manually by slapping a #[repr(align(8))] on it.

I would go for readability. Depending on the calling code, the compiler might optimize it down to the same machine code anyway.
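
A quick way to see what the alignment attribute changes is to compare size_of and align_of; a small sketch (the struct name is made up for illustration):

use core::mem::{align_of, size_of};

// Same fields as the tuple, but with the alignment bumped up to match u64.
#[allow(dead_code)]
#[repr(align(8))]
struct Aligned8(u32, u8, u8, u8, u8);

fn main() {
    // With current rustc on a typical 64-bit target this prints size 8, align 4...
    println!(
        "tuple: size {}, align {}",
        size_of::<(u32, u8, u8, u8, u8)>(),
        align_of::<(u32, u8, u8, u8, u8)>()
    );
    // ...while this prints size 8, align 8, the same as u64 itself.
    println!(
        "aligned struct: size {}, align {}",
        size_of::<Aligned8>(),
        align_of::<Aligned8>()
    );
    println!("u64: size {}, align {}", size_of::<u64>(), align_of::<u64>());
}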

4 Likes

You could look to the bytemuck crate to help with converting between integers and a more-readable configuration:

use bytemuck::{self, Pod, Zeroable}; // 1.12.3

// Same size (8 bytes) and alignment as u64, with no padding,
// so bytemuck can reinterpret the bytes in either direction.
#[derive(Copy, Clone, Debug, Zeroable, Pod)]
#[repr(C, align(8))]
struct Bitfield {
    a: u32,
    b: u8,
    c: u8,
    d: u8,
    e: u8,
}

impl From<u64> for Bitfield {
    fn from(packed: u64) -> Self {
        bytemuck::cast(packed)
    }
}

impl From<Bitfield> for u64 {
    fn from(bf: Bitfield) -> Self {
        bytemuck::cast(bf)
    }
}

impl<'a> From<&'a u64> for &'a Bitfield {
    fn from(packed: &'a u64) -> Self {
        bytemuck::cast_ref(packed)
    }
}

impl<'a> From<&'a Bitfield> for &'a u64 {
    fn from(bf: &'a Bitfield) -> Self {
        bytemuck::cast_ref(bf)
    }
}

impl<'a> From<&'a mut u64> for &'a mut Bitfield {
    fn from(packed: &'a mut u64) -> Self {
        bytemuck::cast_mut(packed)
    }
}

impl<'a> From<&'a mut Bitfield> for &'a mut u64 {
    fn from(bf: &'a mut Bitfield) -> Self {
        bytemuck::cast_mut(bf)
    }
}

fn main() {
    dbg!(Bitfield::from(0x0102_0304_0000_0005_u64));

    // Mutating through the reference view writes straight into the underlying u64.
    let mut buf: u64 = 0;
    let view: &mut Bitfield = (&mut buf).into();
    view.c = 0x42;

    println!("0x{buf:x}");
}
4 Likes

If the semantics of your code correspond to a tuple, I would prefer a tuple (or rather a proper struct with named fields, if it's not a one-off type). The compiler is free to rearrange the fields of the tuple so that it's packed as tightly as a u64. But note that this is not guaranteed; other things may cause the compiler to use a different field ordering, or to insert extra padding. For example, loading data with a bigger alignment may be faster.

Generally the compiler will optimize your code for speed rather than memory usage. If you want a smaller memory footprint, you should either choose a more explicit approach (e.g. using #[repr(C)] structs with handcrafted layout), or compile with -C opt-level=s. With regards to performance, there are too many obscure factors at play to say anything certain about which approach is faster, other than trying your options and benchmarking. Generally, giving the compiler extra flexibility is better, because it may e.g. decide to store parts of your struct in registers.

If you go with the u64 approach, note that you cannot just transmute u64 into a (u32, u8, u8, u8, u8). Since the field ordering of a tuple is unspecified, even if its size is 8 bytes one cannot say which parts of u64 are which field. You need to manually implement the packing and unpacking logic, encoding fields in specific bits of u64. For this reason alone, the tuple approach is likely to be faster and more ergonomic.
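
For reference, the manual packing/unpacking might look something like this; the bit layout below is an arbitrary choice, not something the language defines for you:

// Arbitrary layout: the u32 in the high 32 bits, then the four u8s from high to low.
fn pack(a: u32, b: u8, c: u8, d: u8, e: u8) -> u64 {
    ((a as u64) << 32) | ((b as u64) << 24) | ((c as u64) << 16) | ((d as u64) << 8) | (e as u64)
}

fn unpack(packed: u64) -> (u32, u8, u8, u8, u8) {
    (
        (packed >> 32) as u32,
        (packed >> 24) as u8,
        (packed >> 16) as u8,
        (packed >> 8) as u8,
        packed as u8,
    )
}

fn main() {
    let packed = pack(0xDEAD_BEEF, 1, 2, 3, 4);
    assert_eq!(unpack(packed), (0xDEAD_BEEF, 1, 2, 3, 4));
}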

8 Likes

Unless you actually use “memory footprint” to mean “size of the compiled binary”, I would like to emphasize that opt-level=s is not a flag to minimize memory-usage at run-time, but to minimize the size of the compiled binary. While the machine code will take up memory at run-time, too, it’s usually not the most significant memory usage.

9 Likes

@steffahn does there exist a way to tell the compiler to optimize for memory footprint, or can one only update their struct layouts to minimize padding etc.?

No, AFAIK there’s no way to tell the compiler that, but…

…as far as I'm aware, the compiler generally already makes structs as small as possible by default. The point about the unstable nature of the default layout of structs and tuples in Rust is mainly that you are not supposed to rely on anything that is not absolutely guaranteed for correctness (and, even more importantly, for soundness), which is why people will point out the possibility that the compiler could – theoretically – even choose to insert extra padding or alignment into a struct. I would assume this could reasonably happen in specific cases, like optimizing under profile-guided optimization, or generally anywhere where "wasting" extra memory is universally considered a good trade-off. I am not sure whether there are any situations at all where something like this can happen in today's rustc – @afetisov [or anyone else], if you know of any code examples where rustc makes structs larger than the minimum possible size, please share them :slight_smile: – regardless, it's always important to keep code compatible with legal future compiler changes!

The TL;DR of the above discussion, of course, is: the compiler does – in practice – already make structs as small as possible by default, as far as I'm aware, so you don't need any flags. Future changes to this mean that you must not rely on it for the correctness of your code… but if performance (which includes memory consumption, I think?) were (significantly) negatively affected by a compiler change to struct layouts, that would probably qualify as a performance regression, so you should be able to rely on this behavior for "performance" reasons. Or… who knows… if there are future (or present?) optimizations where extra memory consumption is really worth it for (time) performance benefits, then it might be time to add such a flag, but as of now, I don't think we need it.
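
As a concrete illustration of that default behaviour, field reordering is easy to observe by comparing the default layout with #[repr(C)]; the sizes in the comments are what current rustc prints, not guarantees:

use core::mem::size_of;

// Default (repr(Rust)) layout: the compiler may reorder fields,
// so the two u8s can be packed next to the u32.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// repr(C) keeps the declared order, which forces padding after `a` and after `c`.
#[allow(dead_code)]
#[repr(C)]
struct DeclaredOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("{}", size_of::<Reordered>()); // 8 with current rustc
    println!("{}", size_of::<DeclaredOrder>()); // 12
}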

4 Likes

Normally there shouldn't be any performance penalty, but it depends on the precise operations you're doing. As other people have already said, you can benchmark if you care. A quick way to see what's going on at a low level is to look at the generated assembly.

For instance, adding two u8 fields generates essentially the same assembly either way: godbolt.
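
The comparison looks roughly like this (a reconstruction, not the exact code behind that link):

// Adding two of the packed bytes out of a u64...
pub fn add_from_u64(packed: u64) -> u8 {
    let c = (packed >> 16) as u8;
    let d = (packed >> 8) as u8;
    c.wrapping_add(d)
}

// ...versus adding the corresponding tuple fields.
pub fn add_from_tuple(t: (u32, u8, u8, u8, u8)) -> u8 {
    t.2.wrapping_add(t.3)
}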

3 Likes

core::mem::size_of::<(i16, i8)>() is 4, not 3. In other words: the compiler tries to make things as small as possible, but it won't go as far as unaligned ints.

Usually that's a good trade-off between memory use and speed, but sometimes you may need something else.

1 Like

Well… it’s not possible to have the i16 be unaligned. After all, you can get a &i16 reference from a &(i16, i8), and (values behind) &i16 references are guaranteed to be aligned. So I’d call size 4 the “minimum possible size” in this case; it’s nothing that different optimization behavior from an additional compiler flag could improve upon.

Of course, opting out of the ability to create references to the fields is possible, and a packed representation can then reduce memory requirements… well, I guess arguably the special case where you don't opt into a packed representation but also never take references to the fields is possibly not perfectly optimized then?

2 Likes

Why do you say so? Suppose I'm using this tuple to represent two FAT12 entries. I would expect them to be 3 bytes in size, with the 2nd one starting right after the 1st.

That would be an adequate layout for the 5150, where FAT12 was actually used, but on a modern computer the memory savings are usually not worth the introduced slowness.

Something is definitely wrong with that explanation. Here:

#[repr(packed)]
struct FOO(i16, i8);

pub fn main() { 
    println!("{}", core::mem::size_of::<FOO>());
}

The size is most definitely 3. If that's "impossible", then how does that happen?

It disallows taking references to the i16 field :slight_smile:
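
For example (hypothetical snippet), current compilers reject such a reference outright:

#[allow(dead_code)]
#[repr(packed)]
struct Foo(i16, i8);

fn main() {
    let foo = Foo(1, 2);
    // error: reference to packed field is unaligned
    // let r = &foo.0;

    // Copying the field out by value is fine, though:
    let copy = foo.0;
    println!("{copy}");
}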

That's a new thing. size_of::<(i16, i8)>() was 4 even when that was allowed.

As far as I’m aware, it was always disallowed, it was just not properly enforced. As far as I know, creating unaligned references is UB.

Edit: There we go: Behavior considered undefined - The Rust Reference

Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):

  • […]
  • A reference or Box<T> that is dangling, unaligned, or points to an invalid value.
1 Like

The language guarantees that the alignment of the type is a multiple of all alignments of its fields. In this case i16 has alignment 2, while i8 has alignment 1. This means the struct has alignment 2 (or greater). The size of a type is always a multiple of its alignment, so the size can never be 3, but can be 4.

The alignment of a type must be a multiple of the alignments of all its fields, because the field order is unspecified, so any field could end up being the first one. A pointer to the first field has the same address as a pointer to the whole value, so it must be properly aligned for that field. Also, all fields must be referenceable regardless of the address at which a value of the type is allocated. Since that address can be any multiple of the type's alignment, the alignment must be at least as large as the largest field alignment.

The size of a type must be a multiple of its alignment, because the language guarantees that the elements of an array are packed without any padding between them. Consecutive elements are offset by size_of::<T>() and must all be properly aligned, which implies that size_of::<T>() is a multiple of the alignment.
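
Both invariants are easy to check directly; a small sketch:

use core::mem::{align_of, size_of};

fn main() {
    // The size is always a multiple of the alignment...
    assert_eq!(size_of::<(i16, i8)>() % align_of::<(i16, i8)>(), 0);

    // ...and array elements are laid out back to back, exactly size_of::<T>() apart.
    let arr = [(1i16, 2i8), (3i16, 4i8)];
    let first = &arr[0] as *const (i16, i8) as usize;
    let second = &arr[1] as *const (i16, i8) as usize;
    assert_eq!(second - first, size_of::<(i16, i8)>());

    println!(
        "size {}, align {}",
        size_of::<(i16, i8)>(),
        align_of::<(i16, i8)>()
    );
}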

1 Like

Interestingly, the Rust 1.10 version of that reference section, "behavior considered undefined", doesn't list it. It's rather vague on packed, stating:

  • repr - specifies the representation to use for this struct. Takes a list of options. The currently accepted ones are C and packed, which may be combined. C will use a C ABI compatible struct layout, and packed will remove any padding between fields (note that this is very fragile and may break platforms which require aligned access).

Yeah… whatever "very fragile" and "may break" mean :sweat_smile:.

Also that note is about “platforms which require aligned access”. Most platforms that were supported back then weren't like that: ARMv7+, x86, etc… they all support unaligned access just fine.

That's why they could get away with such vague language.

1 Like

On nightly, there is the -Z randomize-layout flag, which will shuffle the field order with no regard for minimizing the size. Demo: Compiler Explorer (you can change the seed in the compilation flags for different results)
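
If you want to try it locally, something like the sketch below (struct name made up) can be compiled on nightly with -Z randomize-layout; the printed offsets can change from build to build, and the linked demo shows the extra flag used to pick a seed:

// Build with e.g.: RUSTFLAGS="-Z randomize-layout" cargo +nightly run
use core::mem::{offset_of, size_of};

#[allow(dead_code)]
struct Demo {
    a: u8,
    b: u32,
    c: u8,
    d: u16,
}

fn main() {
    println!(
        "size {}, offsets: a={} b={} c={} d={}",
        size_of::<Demo>(),
        offset_of!(Demo, a),
        offset_of!(Demo, b),
        offset_of!(Demo, c),
        offset_of!(Demo, d),
    );
}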

3 Likes