Packet structure and endianness review

I just published a crate for representing semantic versions as integers (i32, i64, etc.) (see https://docs.rs/embedded-semver/0.1.0/embedded_semver), but wanted to make sure that there's nothing fancy going on in the docs or internals relating to the endianness (disclaimer: I haven't done much work with bit-mangling).

Packet structure in 32-bit environments:

0         2        12        22        32
├────┴────┼┴┴┴┴┴┴┴┴┴┼┴┴┴┴┴┴┴┴┴┼┴┴┴┴┴┴┴┴┴┤
│ API ver │ Major   │ Minor   │ Patch   │
│ u2      │ u10     │ u10     │ u10     │
└─────────┴─────────┴─────────┴─────────┘

api version = magic number that allows extending the crate in future for different storage representations.

So basically one can do:

let version = Semver::new(1, 0, 20);
let int_semver = version.to_i32().unwrap();
assert_eq!(int_semver, 83886081);
assert_eq!(&int_semver.to_le_bytes(), &[0b0000_0001, 0b0000_0000, 0b0000_0000, 0b0000_0101]);

What I'm concerned about: is the ordering of bytes "right" by convention here? E.g. in the slice above, &[0b00..., the first two bits (00) would be the API version (00 = 0, 01 = 1, 10 = 2, etc.).

What about the whole byte array, does the actual slice above conform to the documentation? Are bytes read from left-to-right (0-32) or right-to-left?

So if I understand correctly, these fields (api ver, major, minor, patch) are big-endian (most significant bytes first), but what about the whole 4-byte field ordering?

There's also the to_le_bytes conversion that results in the byte slice above.

Big Endian means the most significant bytes are read first in series. That would typically be left-to-right when written as an integer or array (because lower addresses are read first, which would be the start of the byte array).
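As a small illustration of the point about byte order, here is a sketch using only the standard library. The helper name byte_orders is made up for this example; the value 83886081 is the one from the question.

```rust
/// Returns the (big-endian, little-endian) byte arrays of `value`.
/// `byte_orders` is a hypothetical helper, not part of any crate.
fn byte_orders(value: i32) -> ([u8; 4], [u8; 4]) {
    // to_be_bytes puts the most significant byte at index 0,
    // to_le_bytes puts the least significant byte at index 0.
    (value.to_be_bytes(), value.to_le_bytes())
}
```

For 83886081 (0x05000001) this gives [0x05, 0x00, 0x00, 0x01] in big-endian order and [0x01, 0x00, 0x00, 0x05] in little-endian order. The integer is the same in both cases; only the order in which its bytes are laid out differs.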

It's hard to tell what is correct here because the endianness of the u32 and the endianness of your struct are conceptually independent. You're encoding Semver as a u32 with some endianness, then trying to decode the u32 into an array, using its own endianness. Which means you could get different results by converting the struct to an array directly than if you converted to a u32 on the way. You'd probably want to avoid that, but either way, only you can say if you got the correct u32. From there, the to_le_bytes et al must be correct because they came from a correct u32.

In other words, you maybe shouldn't be trying to encode your Semver as a u32, but as an integer, and then choose the u32 that appropriately represents that integer. Then you can use whatever endianness the u32 uses by default.

For example, Semver::new(1, 0, 20) could be mapped to the integer 0b00 0000000001 0000000000 0000010100, which is definitely not 83886081u32. (It is 1048596u32.) This way you practically don't have to care about the endianness at all.
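A minimal sketch of that integer mapping, assuming the field widths from the layout diagram (2 + 10 + 10 + 10 bits, API version in the top bits). The names pack and unpack are hypothetical and this is not how the crate currently encodes versions.

```rust
/// Largest value that fits in a 10-bit field.
const FIELD_MAX: u32 = (1 << 10) - 1;

/// Packs the fields into one integer: API version in the top 2 bits,
/// followed by major, minor and patch at 10 bits each.
/// Returns None if any field does not fit its width.
fn pack(api: u32, major: u32, minor: u32, patch: u32) -> Option<u32> {
    if api > 0b11 || major > FIELD_MAX || minor > FIELD_MAX || patch > FIELD_MAX {
        return None;
    }
    Some((api << 30) | (major << 20) | (minor << 10) | patch)
}

/// Splits a packed integer back into (api, major, minor, patch).
fn unpack(word: u32) -> (u32, u32, u32, u32) {
    (
        word >> 30,
        (word >> 20) & FIELD_MAX,
        (word >> 10) & FIELD_MAX,
        word & FIELD_MAX,
    )
}
```

With the full 10-bit patch field, pack(0, 1, 0, 20) yields Some(1048596), i.e. 2^20 + 20. Only shifts and masks on the integer value are involved, so byte order never enters the picture until the u32 is turned into bytes.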

But again, this is just one way to do it. There might be reasons to force a particular bit encoding, but then you probably shouldn't care which u32 it actually ends up as, because you just need the bits for masking, etc. You almost certainly won't be doing any u32 operations on it.

Thank you for a great reply! I did some further reading and testing based on that, and this is the conclusion I've come to so far. There are three different endianness considerations in this case:

  1. i32/u32 representation and conversion to internal byte array (3).
    Matters, because this is the 'api' between code internals and
    outside world. Must choose a consistent representation. I'm leaning towards BE (currently LE).
    I could also leave out the integer representation, and use only the byte array.

  2. Field-specific endianness. Relevant (internally) only if a field overlaps
    multiple bytes (as per bitvec crate documentation). Using BE.

  3. Internal byte array [0b0, 0b0, 0b0, 0b0] endianness.
    Using Msb0 (most significant bit first) feature of bitvec crate.

Longer explanation through examples:

bit positions of a 4-byte array
ASSUMPTION 1: the convention is to start from zero at the left
ASSUMPTION 2: this representation doesn't depend on system endianness

00_00_00_00    00_00_00_00    00_00_00_00    00_00_00_00
0  2  4  6     8 10 12 14    16 18 20 22    24 26 28 30
 1  3  5  7     9 11 13 15    17 19 21 23    25 27 29 31

Number 1 into BE bytes => the last bit (no. 31 in the numbering above) will be set to 1 on all systems

assert_eq!(
    1u32.to_be_bytes(),
    [0b00_00_00_00, 0b00_00_00_00, 0b00_00_00_00, 0b00_00_00_01]
);

Number 1 into LE bytes => bit no. 7 (the last bit of the first byte) will be set to 1 on all systems

assert_eq!(
    1u32.to_le_bytes(),
    [0b00_00_00_01, 0b00_00_00_00, 0b00_00_00_00, 0b00_00_00_00]
);

ASSUMPTION 1) storing [0b0, 0b0, 0b0, 0b1] into a file, or sending it through a
network will always result in the same byte array regardless of architecture

ASSUMPTION 2) doing the above asserts will always be successful regardless of platform

HENCE: one can do u32::from_be_bytes([0b0, 0b0, 0b0, 0b1]) and it will result in
1u32 on every platform regardless of endianness
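A short sketch of that conclusion using only the standard library. The function names encode, decode and native_matches_be are made up for this example. The explicit to_be_bytes / from_be_bytes pair behaves the same everywhere, while to_ne_bytes is the one conversion whose result depends on the target.

```rust
/// Serializes a value in big-endian order; same result on every target.
fn encode(value: u32) -> [u8; 4] {
    value.to_be_bytes()
}

/// Reads a value back from big-endian bytes; same result on every target.
fn decode(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// Compares the native in-memory byte order of `value` with its
/// big-endian byte order. This is the only platform-dependent part.
fn native_matches_be(value: u32) -> bool {
    value.to_ne_bytes() == value.to_be_bytes()
}
```

encode(1) is [0, 0, 0, 1] and decode([0, 0, 0, 1]) is 1 regardless of architecture. native_matches_be(1) is true only on big-endian targets, which is why the native-endian conversions are best kept away from anything written to a file or a socket.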

using bitvec crate to manipulate individual bits

let mut bv: BitArray<Msb0, [u8; 4]> = BitArray::zeroed();

storing 2u8 into first 2 bits of slice will store [1, 0] into the first bits

bv[0..2].store_be(2u8);
assert_eq!(bv.as_buffer()[0], 0b10_00_00_00);

[1, 0] because 2u8 in BE bytes equals that

assert_eq!(2u8.to_be_bytes(), [0b10]);

ASSUMPTION: big endianness here doesn't really matter, since the storage size is <8 bits
and the most significant bit is at the left side. Storing in LE format would produce the same:

bv[0..2].store_le(2u8);
assert_eq!(bv.as_buffer()[0], 0b10_00_00_00);

This is where I got a few more grey hairs:

bv[2..12].store_le(2u16);
assert_eq!(&bv.as_buffer()[..2], &[0b10_00_00_10, 0b00_00_00_00]);
                                        xx xx xx    xx xx
                                        2  4  6     8 10

notice that the 2u16 [1, 0] was stored at the end of the first byte instead of
in the second byte, so I had to use store_be

bv[2..12].store_be(2u16);
assert_eq!(&bv.as_buffer()[..2], &[0b10_00_00_00, 0b00_10_00_00]);
                                        xx xx xx    xx xx
                                        2  4  6     8 10
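For comparison, the store_be result above can be reproduced with the standard library alone, under the assumption that Msb0 numbering plus big-endian field storage is the same as shifting the field into place inside a big-endian u32. store_field is a hypothetical helper for this sketch, not a bitvec function.

```rust
/// Writes the low `len` bits of `value` into bit positions
/// `start..start + len` of `buf`, counting bits from the left (Msb0)
/// and keeping the most significant bits of the field first.
fn store_field(buf: &mut [u8; 4], start: u32, len: u32, value: u32) {
    assert!(len >= 1 && start + len <= 32);
    // Treat the whole buffer as one big-endian integer.
    let mut word = u32::from_be_bytes(*buf);
    // Distance of the field's last bit from the right edge.
    let shift = 32 - start - len;
    // `len` one-bits, moved into the field's position.
    let mask = (u32::MAX >> (32 - len)) << shift;
    // Clear the field, then write the new value into it.
    word = (word & !mask) | ((value << shift) & mask);
    *buf = word.to_be_bytes();
}
```

Storing 2 into bits 0..2 and then 2 into bits 2..12 of a zeroed buffer leaves [0b10_00_00_00, 0b00_10_00_00, 0, 0], which matches the store_be assertion. Read as a big-endian u32, that buffer is 0x80200000, so the byte array and the integer representation agree as long as the integer is converted with to_be_bytes.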
