Custom u24 type

I am trying to create what is essentially a u24 (three byte integer).

I don't need it to have all the functionality of the built-in integers, just a few things:

It must be exactly three bytes (hard requirement) and support basic addition, subtraction, and multiplication, all with options for both checked and wrapping behavior.

I have it working using a struct with three fields of type u8. It has a size of 3 but my math functions all require converting into u32 to perform arithmetic. Is there a better way? Also, is my naive struct guaranteed to have the same size on all platforms, or do I need to do something else to help ensure that is true?

You can also implement math operating directly on the u8 fields, but converting to u32 is probably the easiest. There is no guarantee about the layout of Rust structs, but it probably will be three bytes. To guarantee that it is three bytes, use the #[repr(C)] annotation.
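
Here is a minimal sketch of that approach, assuming a little-endian field order. The names `U24`, `new`, and `to_u32` are made up for illustration. The `const` assertion makes the build fail if the size is ever not 3.

```rust
use std::mem::size_of;

/// A 24-bit unsigned integer stored as three bytes, least significant first.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U24 {
    b0: u8,
    b1: u8,
    b2: u8,
}

// Compile-time check that the type occupies exactly three bytes.
const _: () = assert!(size_of::<U24>() == 3);

impl U24 {
    pub const MAX: u32 = 0x00FF_FFFF;

    /// Returns `None` if `v` does not fit in 24 bits.
    pub fn new(v: u32) -> Option<U24> {
        if v > Self::MAX {
            None
        } else {
            Some(Self::truncate(v))
        }
    }

    /// Keeps the low 24 bits of `v` and discards the rest.
    fn truncate(v: u32) -> U24 {
        U24 {
            b0: v as u8,
            b1: (v >> 8) as u8,
            b2: (v >> 16) as u8,
        }
    }

    pub fn to_u32(self) -> u32 {
        (self.b0 as u32) | ((self.b1 as u32) << 8) | ((self.b2 as u32) << 16)
    }

    /// The sum of two 24-bit values always fits in a u32, so a plain `+` is safe here.
    pub fn checked_add(self, rhs: U24) -> Option<U24> {
        U24::new(self.to_u32() + rhs.to_u32())
    }

    pub fn wrapping_add(self, rhs: U24) -> U24 {
        U24::truncate(self.to_u32() + rhs.to_u32())
    }
}
```

Subtraction and multiplication can follow the same pattern using the `checked_*` and `wrapping_*` methods on u32 before narrowing.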

@m51 why not convert to u32 when loading and back to u24 by discarding the upper byte when writing?

It even compiles down to a single instruction: Compiler Explorer
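
The linked code is not reproduced in this thread, so the following is only a sketch of what an array-based version might look like, not the original snippet. It assumes a little-endian byte order inside the array, and all method names are hypothetical.

```rust
/// A 24-bit unsigned integer stored as a little-endian byte array.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U24([u8; 3]);

impl U24 {
    /// Widen to u32; the top byte of the result is always zero.
    pub fn to_u32(self) -> u32 {
        let [a, b, c] = self.0;
        u32::from_le_bytes([a, b, c, 0])
    }

    /// Narrow from u32 by discarding the upper byte.
    pub fn from_u32_wrapping(v: u32) -> U24 {
        let [a, b, c, _] = v.to_le_bytes();
        U24([a, b, c])
    }

    /// Narrow from u32, returning `None` if the upper byte is non-zero.
    pub fn checked_from_u32(v: u32) -> Option<U24> {
        if v >> 24 == 0 {
            Some(Self::from_u32_wrapping(v))
        } else {
            None
        }
    }

    /// Wrapping in u32 and then dropping the top byte equals wrapping modulo 2^24.
    pub fn wrapping_mul(self, rhs: U24) -> U24 {
        Self::from_u32_wrapping(self.to_u32().wrapping_mul(rhs.to_u32()))
    }

    pub fn checked_mul(self, rhs: U24) -> Option<U24> {
        self.to_u32()
            .checked_mul(rhs.to_u32())
            .and_then(Self::checked_from_u32)
    }

    pub fn wrapping_sub(self, rhs: U24) -> U24 {
        Self::from_u32_wrapping(self.to_u32().wrapping_sub(rhs.to_u32()))
    }

    pub fn checked_sub(self, rhs: U24) -> Option<U24> {
        self.to_u32()
            .checked_sub(rhs.to_u32())
            .and_then(Self::checked_from_u32)
    }
}
```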

Keep in mind that due to alignment restrictions, a type that has a size less than the alignment will likely be padded anyway in order to meet those restrictions.

In other words, a 3-byte type on a machine that uses 4-byte padding will lose the 4th byte to padding. Given that, it seems better to keep that byte as a functional part of your type and increase the range of representable values.
So just use a u32 for the in-memory representation. If you want to send those values over the network, you can create a newtype over u32 that serializes to a u24 in transit and deserializes back to the newtyped u32 on the receiving side.
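
A rough sketch of such a newtype, assuming big-endian (network) byte order on the wire. `Wire24`, `to_wire`, and `from_wire` are hypothetical names, not part of any library.

```rust
/// Kept as a full u32 in memory; only the low 24 bits are sent on the wire.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Wire24(u32);

impl Wire24 {
    /// Returns `None` if `v` needs more than 24 bits.
    pub fn new(v: u32) -> Option<Wire24> {
        if v <= 0x00FF_FFFF {
            Some(Wire24(v))
        } else {
            None
        }
    }

    pub fn get(self) -> u32 {
        self.0
    }

    /// Serialize to three bytes, most significant first (assumed wire format).
    pub fn to_wire(self) -> [u8; 3] {
        // The first big-endian byte is always zero because of the check in `new`.
        let [_, a, b, c] = self.0.to_be_bytes();
        [a, b, c]
    }

    /// Deserialize from three bytes, most significant first.
    pub fn from_wire(bytes: [u8; 3]) -> Wire24 {
        let [a, b, c] = bytes;
        Wire24(u32::from_be_bytes([0, a, b, c]))
    }
}
```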

s3bk,
Thanks, that may be what I need. Using an array is something I hadn't considered. Calling std::mem::size_of on your solution does yield 3, but I thought an array would need a length value stored somewhere. I guess that isn't true unless using slices as opposed to fixed-length arrays?

More detail:
Values of this struct will be a field in a larger struct which will be stored in memory in very large quantities. That larger struct has a fixed size of 8 bytes, of which 5 are already taken. If this field is more than three bytes, it throws the larger struct off. I thought of converting back and forth between the 3-byte representation while in the larger struct and a u32 when used in isolation, but I don't like that I would have two different types representing the same thing.

The length is stored in the type information.

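Arrays ([T; N]) have a fixed length N that is part of the type and known at compile time, so the compiler doesn't need to store their length in memory. Only slices ([T]) carry a length at run time, alongside the pointer.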


Indeed, storing massive amounts of that u24, where one more byte would bump the overall size up by a lot, is a valid use case.
My u24 should fit there: [u8; N] has an alignment of 1, so it introduces no padding.
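
This is easy to check with `size_of` and `align_of`. In the sketch below, `Record` and `PaddedRecord` are hypothetical stand-ins for the larger struct, with a u32 and a u8 assumed for the five bytes already taken.

```rust
use std::mem::{align_of, size_of};

/// Array-backed 24-bit value: size 3, alignment 1.
#[derive(Clone, Copy)]
pub struct U24(pub [u8; 3]);

/// Hypothetical 8-byte record: 5 bytes of other data plus the 3-byte field.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Record {
    pub id: u32,    // bytes 0..4
    pub kind: u8,   // byte 4
    pub value: U24, // bytes 5..8, no padding needed
}

/// Same record with a plain u32 instead. Where u32 is 4-byte aligned,
/// three padding bytes follow `kind` and the size typically grows to 12.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct PaddedRecord {
    pub id: u32,
    pub kind: u8,
    pub value: u32,
}

// Compile-time checks; the build fails if any of these stop holding.
const _: () = assert!(size_of::<U24>() == 3);
const _: () = assert!(align_of::<U24>() == 1);
const _: () = assert!(size_of::<Record>() == 8);
```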

In that case you can use #[repr(C)] to control the field order in the layout, including field alignment, in which case it might be more code- or cycle-efficient to store the u24 as one u16 and one u8, ordering them so that the u16 is properly aligned on a 2-byte boundary. Benchmark both representations to determine whether one offers an advantage over the other.
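
One caveat worth sketching: on targets where u16 is 2-byte aligned, a standalone struct holding a u16 and a u8 is rounded up to 4 bytes, so the two parts would have to be inlined into the outer struct. The field names and the u32 plus u8 layout of the other five bytes are assumptions for illustration.

```rust
/// Standalone pair: alignment 2 typically rounds the size up to 4, not 3.
#[repr(C)]
pub struct Split24 {
    pub lo: u16,
    pub hi: u8,
}

/// Hypothetical 8-byte record with the split parts inlined, ordered so
/// that `lo` lands on a 2-byte boundary.
#[repr(C)]
pub struct Record {
    pub id: u32,  // offset 0
    pub kind: u8, // offset 4
    pub hi: u8,   // offset 5: top 8 bits of the 24-bit value
    pub lo: u16,  // offset 6: low 16 bits of the 24-bit value
}

impl Record {
    /// Reassemble the 24-bit value as a u32.
    pub fn value(&self) -> u32 {
        (self.lo as u32) | ((self.hi as u32) << 16)
    }

    /// Store the low 24 bits of `v`, discarding the top byte.
    pub fn set_value(&mut self, v: u32) {
        self.lo = v as u16;
        self.hi = (v >> 16) as u8;
    }
}
```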

@Tom: I would assume that the array, with care taken to ensure it maps to a u32 with the highest bits unset, is the best way. The compiler can then generate an unaligned load and mask out the upper bits.

Storing my struct as a u16 and a u8 would make math operations on them as a whole more difficult, wouldn't it?

It depends on the instruction set of the processor. For example, if the item is stored as three u8s, the generated code to retrieve it could involve three loads, two 8-bit left shifts, and two logical-ORs; whereas if it's stored as u8 and u16 that would be two loads, one 8-bit or 16-bit shift, and one logical-OR. Or the best alternative for a fetch might be a 32-bit load followed by masking via an AND instruction to the lower 24 bits. Different ISAs and different memory layouts result in different sequences of optimal code.

Note that you did not specify what ISA your system is using. X86? X86-64? An ARM variant? RV64G?

Currently I am building on X86-64, but it is for a cross-platform library, so no assumptions.

I was thinking more of the Rust code to convert my type to and from u32 values for addition and multiplication. A separate u8 and u16 would both have to be converted to u32, then bit-shifted and OR-ed together. Then again, I suppose that's what the u32::from_le_bytes call is doing itself, right?

It generates u16 + u8 loads and stores and shifts the data together.

I suppose an 8-byte struct containing a “u24” can be both ergonomic and efficient if you introduce a helper struct with an actual u32. This does not seem to have any efficiency implications; take a look at this example:

Another way might be to make it a big array of u64 (or a struct wrapping a u64) and have getters/setters to pull the two values (effectively a u40 and a u24) from that value through shifting and masking. Then there are no packing or alignment worries.
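
A minimal sketch of that idea, assuming the 24-bit value lives in the low bits and the other 40 bits above it. `Packed` and its methods are hypothetical names. Note that this type has the alignment of u64 rather than 1.

```rust
/// A u64 holding a 40-bit value in the high bits and a 24-bit value in the low bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Packed(u64);

impl Packed {
    const LOW_BITS: u32 = 24;
    const LOW_MASK: u64 = (1 << 24) - 1;
    const HIGH_MASK: u64 = (1 << 40) - 1;

    /// Bits beyond 40 in `high40` and beyond 24 in `low24` are discarded.
    pub fn new(high40: u64, low24: u32) -> Packed {
        Packed(((high40 & Self::HIGH_MASK) << Self::LOW_BITS) | (low24 as u64 & Self::LOW_MASK))
    }

    pub fn low24(self) -> u32 {
        (self.0 & Self::LOW_MASK) as u32
    }

    pub fn high40(self) -> u64 {
        self.0 >> Self::LOW_BITS
    }

    /// Replace the 24-bit part, leaving the 40-bit part untouched.
    pub fn set_low24(&mut self, v: u32) {
        self.0 = (self.0 & !Self::LOW_MASK) | (v as u64 & Self::LOW_MASK);
    }
}
```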