I am trying to create what is essentially a u24 (a three-byte integer).
I don't need it to have all the functionality of the built-in integers, just a few things:
It must be exactly three bytes (hard requirement) and support basic addition, subtraction, and multiplication, all with options for both checked and wrapping behavior.
I have it working using a struct with three fields of type u8. It has a size of 3 but my math functions all require converting into u32 to perform arithmetic. Is there a better way? Also, is my naive struct guaranteed to have the same size on all platforms, or do I need to do something else to help ensure that is true?
You can also implement math operating directly on the u8 fields, but converting to u32 is probably the easiest. There is no guarantee about the layout of Rust structs, but it probably will be three bytes. To guarantee that it is three bytes, use the #[repr(C)] annotation.
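A minimal sketch of the "convert to u32" approach, assuming a little-endian byte order inside the struct. The names U24, b0..b2, new, get, and truncate are made up for illustration and are not taken from the original code:

```rust
/// Hypothetical 24-bit unsigned integer stored as three bytes.
/// `#[repr(C)]` keeps the fields in declaration order with no reordering.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U24 {
    b0: u8, // least significant byte (byte order is an assumption)
    b1: u8,
    b2: u8, // most significant byte
}

impl U24 {
    /// Largest value representable in 24 bits.
    pub const MAX: u32 = 0x00FF_FFFF;

    /// Returns `None` if `v` does not fit in 24 bits.
    pub fn new(v: u32) -> Option<U24> {
        if v <= Self::MAX { Some(Self::truncate(v)) } else { None }
    }

    /// Keeps the low 24 bits of `v` and drops the rest.
    fn truncate(v: u32) -> U24 {
        let [b0, b1, b2, _] = v.to_le_bytes();
        U24 { b0, b1, b2 }
    }

    /// Widens to u32 for arithmetic.
    pub fn get(self) -> u32 {
        u32::from_le_bytes([self.b0, self.b1, self.b2, 0])
    }

    pub fn checked_add(self, rhs: U24) -> Option<U24> {
        // Two 24-bit values cannot overflow a u32 when added.
        U24::new(self.get() + rhs.get())
    }

    pub fn wrapping_add(self, rhs: U24) -> U24 {
        Self::truncate(self.get().wrapping_add(rhs.get()))
    }

    pub fn checked_sub(self, rhs: U24) -> Option<U24> {
        self.get().checked_sub(rhs.get()).and_then(U24::new)
    }

    pub fn wrapping_sub(self, rhs: U24) -> U24 {
        Self::truncate(self.get().wrapping_sub(rhs.get()))
    }

    pub fn checked_mul(self, rhs: U24) -> Option<U24> {
        self.get().checked_mul(rhs.get()).and_then(U24::new)
    }

    pub fn wrapping_mul(self, rhs: U24) -> U24 {
        Self::truncate(self.get().wrapping_mul(rhs.get()))
    }
}
```

The wrapping variants work because wrapping modulo 2^32 and then keeping the low 24 bits gives the same result as wrapping modulo 2^24.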
Keep in mind that, because of alignment restrictions, a 3-byte type will often be followed by padding anyway when it sits next to fields with a larger alignment.
For example, a 3-byte field in a struct that is aligned to 4 bytes is likely to lose the fourth byte to padding. Given that, it seems better to keep that byte as a functional part of your type and increase the range of representable values.
In other words, just use a u32 for the RAM representation. If you want to send those values over the network, you can create a newtype over u32 that effectively de/serializes to a u24 when in transit, and back to the newtyped u32 when receiving it on the other side.
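A rough sketch of that newtype idea. The name Wire24 and the methods to_wire and from_wire are hypothetical, and the little-endian wire order is an assumption (many network protocols use big-endian instead):

```rust
/// Hypothetical newtype: a full u32 in memory, three bytes in transit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Wire24(u32);

impl Wire24 {
    /// Returns `None` if `v` does not fit in 24 bits.
    pub fn new(v: u32) -> Option<Wire24> {
        if v <= 0x00FF_FFFF { Some(Wire24(v)) } else { None }
    }

    pub fn get(self) -> u32 {
        self.0
    }

    /// Serializes to three bytes (little-endian order is an assumption).
    pub fn to_wire(self) -> [u8; 3] {
        let [b0, b1, b2, _] = self.0.to_le_bytes();
        [b0, b1, b2]
    }

    /// Rebuilds the in-memory value from three received bytes.
    pub fn from_wire(bytes: [u8; 3]) -> Wire24 {
        Wire24(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0]))
    }
}
```

Arithmetic then happens on the plain u32 inside, and the three-byte form exists only at the serialization boundary.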
Thanks, that may be what I need. Using an array is something I hadn't considered. Calling std::mem::size_of on your solution does yield 3, but I thought an array would need a length value stored somewhere. I guess that isn't true unless using slices as opposed to fixed-length arrays?
Values of this struct will be a field in a larger struct which will be stored in memory in very large quantities, and that larger struct has a fixed size of 8 bytes, of which 5 are already taken. If this field takes more than three bytes, it throws the larger struct off. I thought of converting back and forth between the 3-byte representation while in the larger struct and a u32 when used in isolation, but I don't like that I would have two different types representing the same thing.
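Arrays ([T; N]) have a fixed length N that is part of the type and known at compile time, so the compiler doesn't need to store the length in memory. Slices ([T]) are different: their length is only known at run time, so a reference to a slice carries the length alongside the pointer.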
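The difference can be checked with std::mem::size_of and std::mem::align_of. The helper function names below are made up for this sketch:

```rust
use std::mem::{align_of, size_of};

/// Size of a three-byte array: just the elements, no stored length.
pub fn array_size() -> usize {
    size_of::<[u8; 3]>()
}

/// Alignment of a byte array is the alignment of its element type.
pub fn array_align() -> usize {
    align_of::<[u8; 3]>()
}

/// A reference to a fixed-length array is a single pointer.
pub fn array_ref_size() -> usize {
    size_of::<&[u8; 3]>()
}

/// A reference to a slice is a pointer plus a length.
pub fn slice_ref_size() -> usize {
    size_of::<&[u8]>()
}
```

So the length only costs memory in the slice reference, never in the array itself.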
Indeed, storing massive numbers of that u24, in a case where one more byte would bump the overall size up considerably, is a valid use case.
My u24 should fit there, since [u8; N] has an alignment of 1.
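As a sketch of how that could look inside the larger struct: the thread does not say what the other five bytes are, so the u32 and u8 fields below (and the names Entry, key, kind, count) are pure assumptions.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical 8-byte record containing a three-byte counter.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct Entry {
    pub key: u32,       // assumed: 4 of the 5 bytes already taken
    pub kind: u8,       // assumed: the remaining byte
    pub count: [u8; 3], // the "u24"; alignment 1, so no padding is needed
}

/// Returns (size, alignment) of `Entry` on the current target.
pub fn entry_layout() -> (usize, usize) {
    (size_of::<Entry>(), align_of::<Entry>())
}
```

On common targets where u32 is 4-byte aligned this gives a size of 8 and an alignment of 4, but it is worth asserting that in a test for every platform the library supports.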
In that case you can use #[repr(C)] to control the field order in the layout, including field alignment, in which case it might be more code- or cycle-efficient to store the u24 as one u16 and one u8, ordering them so that the u16 is properly aligned on a 2-byte boundary. Benchmark both representations to determine whether one offers an advantage over the other.
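A sketch of the u16 plus u8 variant, again with made-up names (SplitEntry, key, kind) and an assumed u32 plus u8 for the other five bytes. Note that the two halves have to be direct fields of the outer struct: a standalone struct holding a u16 and a u8 would have an alignment of 2 and therefore a size of 4.

```rust
/// Hypothetical 8-byte record with the 24-bit value split into u16 + u8.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct SplitEntry {
    key: u32, // assumed field, offset 0
    lo: u16,  // low 16 bits of the value, offset 4 (2-byte aligned)
    hi: u8,   // high 8 bits of the value, offset 6
    kind: u8, // assumed field, offset 7
}

impl SplitEntry {
    /// Returns `None` if `value` does not fit in 24 bits.
    pub fn new(key: u32, kind: u8, value: u32) -> Option<SplitEntry> {
        if value > 0x00FF_FFFF {
            return None;
        }
        Some(SplitEntry {
            key,
            lo: value as u16,        // truncating cast keeps the low 16 bits
            hi: (value >> 16) as u8, // remaining high 8 bits
            kind,
        })
    }

    pub fn key(&self) -> u32 {
        self.key
    }

    pub fn kind(&self) -> u8 {
        self.kind
    }

    /// Reassembles the 24-bit value: one shift and one bitwise OR.
    pub fn value(&self) -> u32 {
        (u32::from(self.hi) << 16) | u32::from(self.lo)
    }
}
```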
It depends on the instruction set of the processor. For example, if the item is stored as three u8s, the generated code to retrieve it could involve three loads, two left shifts (by 8 and 16 bits), and two bitwise ORs; whereas if it's stored as u8 and u16 that would be two loads, one 8-bit or 16-bit shift, and one bitwise OR. Or the best alternative for a fetch might be a 32-bit load followed by masking via an AND instruction to the lower 24 bits. Different ISAs and different memory layouts result in different sequences of optimal code.
Note that you did not specify what ISA your system is using. X86? X86-64? An ARM variant? RV64G?
Currently I am building on X86-64, but it is for a cross-platform library, so no assumptions.
I was thinking more of the Rust code to convert my type to and from u32 values for addition and multiplication. Separate u8 and u16 fields would both have to be converted to u32, then shifted and OR-ed together. Then again, I suppose that's what the u32::from_le_bytes call is doing itself, right?
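In terms of the result, yes: u32::from_le_bytes builds the value from bytes in little-endian order, which is the same as shifting and OR-ing them by hand. What machine code the compiler emits for either form depends on the target. A small sketch comparing the two (the function names are made up):

```rust
/// Assembles a 24-bit value by hand with shifts and bitwise ORs.
pub fn manual(bytes: [u8; 3]) -> u32 {
    u32::from(bytes[0]) | (u32::from(bytes[1]) << 8) | (u32::from(bytes[2]) << 16)
}

/// Assembles the same value by padding to four bytes and using the
/// standard library conversion.
pub fn via_from_le_bytes(bytes: [u8; 3]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0])
}
```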
I suppose an 8-byte struct containing a “u24” can be both ergonomic and efficient if you introduce a helper struct with an actual u32. This does not seem to have any efficiency implications; take a look at this example:
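A minimal sketch of that idea (not the original poster's code; the names PackedEntry, EntryView, key, and kind are made up for illustration, and the u32 plus u8 for the other five bytes is an assumption):

```rust
/// Hypothetical compact form that is stored in bulk: 8 bytes on common targets.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedEntry {
    key: u32,       // assumed field
    kind: u8,       // assumed field
    count: [u8; 3], // the "u24", little-endian (an assumption)
}

/// Hypothetical helper used while working with one entry; holds a real u32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EntryView {
    pub key: u32,
    pub kind: u8,
    pub count: u32,
}

impl PackedEntry {
    /// Widens the three-byte counter into a u32 for arithmetic.
    pub fn unpack(self) -> EntryView {
        let [b0, b1, b2] = self.count;
        EntryView {
            key: self.key,
            kind: self.kind,
            count: u32::from_le_bytes([b0, b1, b2, 0]),
        }
    }
}

impl EntryView {
    /// Returns `None` if `count` no longer fits in 24 bits.
    pub fn pack(self) -> Option<PackedEntry> {
        if self.count > 0x00FF_FFFF {
            return None;
        }
        let [b0, b1, b2, _] = self.count.to_le_bytes();
        Some(PackedEntry { key: self.key, kind: self.kind, count: [b0, b1, b2] })
    }
}
```

Whether the pack and unpack steps really cost nothing is best confirmed by inspecting the generated code or benchmarking on the targets that matter.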
Another way might be to make it a big array of u64 (or a struct wrapping a u64) and have getters/setters to pull the two values (effectively a u40 and a u24) from that value through shifting and masking. Then there are no packing or alignment worries.
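A sketch of the shifting-and-masking variant, treating the other five bytes as a single 40-bit value as suggested. The name Packed64, the method names, and the choice to put the 24-bit value in the low bits are all assumptions:

```rust
/// Hypothetical wrapper: upper 40 bits hold one value, lower 24 bits the "u24".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Packed64(u64);

impl Packed64 {
    const LOW_BITS: u32 = 24;
    const LOW_MASK: u64 = (1 << Self::LOW_BITS) - 1;
    const HIGH_MAX: u64 = (1 << 40) - 1;

    /// Returns `None` if either part is out of range.
    pub fn new(high: u64, low: u32) -> Option<Packed64> {
        if high > Self::HIGH_MAX || u64::from(low) > Self::LOW_MASK {
            return None;
        }
        Some(Packed64((high << Self::LOW_BITS) | u64::from(low)))
    }

    /// The 24-bit part, obtained by masking.
    pub fn low(self) -> u32 {
        (self.0 & Self::LOW_MASK) as u32
    }

    /// The 40-bit part, obtained by shifting.
    pub fn high(self) -> u64 {
        self.0 >> Self::LOW_BITS
    }

    /// Returns a copy with the 24-bit part replaced, or `None` if out of range.
    pub fn with_low(self, low: u32) -> Option<Packed64> {
        Packed64::new(self.high(), low)
    }
}
```

A Vec of these has a size of 8 bytes per element with no padding to reason about, at the cost of a shift or mask on every access.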