Question about simple bit manipulation

axyswert · December 20, 2024, 12:14am

Hello, everyone! I’ve recently started learning Rust, and this is my first post on this forum. I’d greatly appreciate any tips or suggestions you might have!

I was trying to learn something about bit manipulation and the core::mem module in Rust. I started with a simple task: converting between the octet and decimal representations of IPv4 addresses. For instance, the IPv4 address 192.229.162.211 translates to 3236274899 in decimal. Similarly, 167824216 translates back to 10.0.203.88. This was relatively easy.

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv4(u32);

impl IPv4 {
    pub const fn from_octets(address: &[u8; 4]) -> Self {
        Self(u32::from_be_bytes(*address))
    }

    pub const fn to_octets(self) -> [u8; 4] {
        self.0.to_be_bytes()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ipv4_1a() {
        let input = [192, 229, 162, 211];
        let result = 3236274899;
        assert_eq!(IPv4::from_octets(&input), IPv4(result));
    }

    #[test]
    fn test_ipv4_1b() {
        let input = 167824216;
        let result = [10, 0, 203, 88];
        assert_eq!(IPv4(input).to_octets(), result);
    }
}
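The two test values above can also be checked by hand: each octet is weighted by a power of 256, most significant octet first. Here is a minimal sketch of that arithmetic. The helper name pack_octets is made up for illustration and is not part of the code above.

```rust
/// Packs four octets into a u32, most significant octet first.
/// `pack_octets` is a hypothetical helper used only for illustration.
pub const fn pack_octets(o: [u8; 4]) -> u32 {
    ((o[0] as u32) << 24) | ((o[1] as u32) << 16) | ((o[2] as u32) << 8) | (o[3] as u32)
}

// 192 * 2^24 + 229 * 2^16 + 162 * 2^8 + 211 = 3236274899
const _: () = assert!(pack_octets([192, 229, 162, 211]) == 3_236_274_899);
// The shift-based computation agrees with the standard-library conversion.
const _: () = assert!(pack_octets([10, 0, 203, 88]) == u32::from_be_bytes([10, 0, 203, 88]));
```

Both assertions are evaluated at compile time, so a wrong value would show up as a build error rather than a failing test.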

Then, I wanted to solve an analogous problem for IPv6. In this case, an address decomposes into segments rather than octets, so the previous approach doesn’t really work. In the end, I came up with the following code:

use core::mem::transmute;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv6(u128);

impl IPv6 {
    pub fn from_segments(address: &[u16; 8]) -> Self {
        Self(if cfg!(target_endian = "big") {
            unsafe { transmute::<[u16; 8], u128>(*address) }
        } else {
            unsafe { transmute::<[u16; 8], u128>(address.map(|x| x.reverse_bits())) }.reverse_bits()
        })
    }

    pub fn to_segments(self) -> [u16; 8] {
        if cfg!(target_endian = "big") {
            unsafe { transmute::<u128, [u16; 8]>(self.0) }
        } else {
            unsafe { transmute::<u128, [u16; 8]>(self.0.reverse_bits()) }.map(|x| x.reverse_bits())
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ipv6_1a() {
        let input = [0x2607, 0xf8b0, 0x4005, 0x808, 0x0, 0x0, 0x0, 0x2004];
        let result = 0x2607f8b0400508080000000000002004;
        assert_eq!(IPv6::from_segments(&input), IPv6(result));
    }

    #[test]
    fn test_ipv6_1b() {
        let input = 0x20010db8000000000000ff0000428329;
        let result = [0x2001, 0x0db8, 0x0, 0x0, 0x0, 0xff00, 0x0042, 0x8329];
        assert_eq!(IPv6(input).to_segments(), result);
    }
}

I am not really satisfied with my solution, though. Compared to the IPv4 implementation, it has several issues:

It seems overengineered to me, given how simple the problem is (just bit manipulation, after all).
I’m not sure if the code is portable: does it really work correctly on big-endian machines?
I can no longer declare the functions as const, even though converting between representations is exactly the kind of thing one would reasonably expect to be done at compile time.

How could this code be improved? Many thanks!

edit: corrected typos.

kpreid · December 20, 2024, 5:11am

The solution to all your problems — unsafe, const, and portability — is to write the code the tedious, boring way.

impl IPv6 {
    pub const fn from_segments(address: [u16; 8]) -> Self {
        Self(
            ((address[0] as u128) << 112)
                | ((address[1] as u128) << 96)
                | ((address[2] as u128) << 80)
                | ((address[3] as u128) << 64)
                | ((address[4] as u128) << 48)
                | ((address[5] as u128) << 32)
                | ((address[6] as u128) << 16)
                | address[7] as u128,
        )
    }

    pub const fn to_segments(self) -> [u16; 8] {
        let n = self.0;
        [
            (n >> 112) as u16,
            (n >> 96) as u16,
            (n >> 80) as u16,
            (n >> 64) as u16,
            (n >> 48) as u16,
            (n >> 32) as u16,
            (n >> 16) as u16,
            n as u16,
        ]
    }
}

Passes your tests, and it is certain to work the same on big-endian machines because it never touches the byte representation of anything.

These could be written as while loops for slightly less repetition, but I don't like that style here, for no particular reason.
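For comparison, the while-loop variant might look roughly like the sketch below. It assumes the same IPv6(u128) newtype as in the original post; the ADDR constant is only there to show that the functions can be evaluated at compile time.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv6(u128);

impl IPv6 {
    pub const fn from_segments(address: [u16; 8]) -> Self {
        let mut n: u128 = 0;
        let mut i = 0;
        // `for` loops are not allowed in const fn on stable Rust, hence `while`.
        while i < 8 {
            // Make room for the next segment, then append it at the bottom.
            n = (n << 16) | address[i] as u128;
            i += 1;
        }
        Self(n)
    }

    pub const fn to_segments(self) -> [u16; 8] {
        let mut segments = [0u16; 8];
        let mut i = 0;
        while i < 8 {
            // Segment 0 sits in the top 16 bits, segment 7 in the bottom 16.
            segments[i] = (self.0 >> ((112 - 16 * i) as u32)) as u16;
            i += 1;
        }
        segments
    }
}

// Evaluated at compile time; the build fails if the conversion is wrong.
const ADDR: IPv6 = IPv6::from_segments([0x2001, 0x0db8, 0, 0, 0, 0xff00, 0x0042, 0x8329]);
const _: () = assert!(ADDR.0 == 0x20010db8000000000000ff0000428329);
```

Like the shift-and-or version, this never looks at the byte representation, so it should behave identically on little-endian and big-endian targets.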

Kyllingene · December 20, 2024, 7:11am

Why not use u128::{from,to}_{le,be,ne}_bytes?

steffahn · December 20, 2024, 7:15am

I thought of these, too. It's not quite as simple here because the conversion produces u16 parts, not u8 parts.

Kyllingene · December 20, 2024, 7:17am

True, but I still think it'd be cleaner... then again, I haven't tried it.

cod10129 · December 20, 2024, 7:44am

Well I did.

OP’s code can be written like this:

impl IPv6 {
    pub const fn from_segments([a, b, c, d, e, f, g, h]: [u16; 8]) -> Self {
        let [[a, b], [c, d], [e, f], [g, h], [i, j], [k, l], [m, n], [o, p]] = [
            a.to_be_bytes(),
            b.to_be_bytes(),
            c.to_be_bytes(),
            d.to_be_bytes(),
            e.to_be_bytes(),
            f.to_be_bytes(),
            g.to_be_bytes(),
            h.to_be_bytes(),
        ];
        Self(u128::from_be_bytes([
            a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
        ]))
    }

    pub const fn to_segments(self) -> [u16; 8] {
        let [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p] = self.0.to_be_bytes();
        [
            u16::from_be_bytes([a, b]),
            u16::from_be_bytes([c, d]),
            u16::from_be_bytes([e, f]),
            u16::from_be_bytes([g, h]),
            u16::from_be_bytes([i, j]),
            u16::from_be_bytes([k, l]),
            u16::from_be_bytes([m, n]),
            u16::from_be_bytes([o, p]),
        ]
    }
}

If you don’t need const, you can use <[T; N]>::map in from_segments to shorten it a lot.
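A rough sketch of that non-const variant, assuming the same IPv6(u128) newtype as in the original post, could look like this:

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv6(u128);

impl IPv6 {
    // Not `const`: at the time of writing, `<[T; N]>::map` cannot be
    // called from a const fn on stable Rust.
    pub fn from_segments(address: [u16; 8]) -> Self {
        // Convert every segment to its two big-endian bytes in one call,
        // then destructure the nested array into sixteen bytes.
        let [[a, b], [c, d], [e, f], [g, h], [i, j], [k, l], [m, n], [o, p]] =
            address.map(u16::to_be_bytes);
        Self(u128::from_be_bytes([
            a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
        ]))
    }
}
```

The destructuring pattern is still needed because there is no stable, safe way to turn a [[u8; 2]; 8] into a [u8; 16] by value.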

You can also introduce helper functions to shorten the repeated names of to_be_bytes and from_be_bytes:

impl IPv6 {
    pub const fn from_segments([a, b, c, d, e, f, g, h]: [u16; 8]) -> Self {
        const fn tb(n: u16) -> [u8; 2] { n.to_be_bytes() }
        let [[a, b], [c, d], [e, f], [g, h], [i, j], [k, l], [m, n], [o, p]] =
            [tb(a),  tb(b),  tb(c),  tb(d),  tb(e),  tb(f),  tb(g),  tb(h)];
        Self(u128::from_be_bytes([
            a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
        ]))
    }

    pub const fn to_segments(self) -> [u16; 8] {
        const fn fb(l: u8, r: u8) -> u16 { u16::from_be_bytes([l, r]) }
        let [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p] = self.0.to_be_bytes();
        [
            fb(a, b), fb(c, d), fb(e, f), fb(g, h),
            fb(i, j), fb(k, l), fb(m, n), fb(o, p),
        ]
    }
}

Comparing your code to std::net::Ipv6Addr: you use a u128 for storage, while std uses [u8; 16]. Std’s method for converting [u16; 8] -> Ipv6Addr [1] converts all the u16s to big-endian and then transmutes. Ipv6Addr::segments [2] transmutes [u8; 16] -> [u16; 8] and then converts the u16s from big-endian. The std methods are also const, because they write out the <[T; N]>::map by hand instead of calling the function.

[1] Ipv6Addr::new source
[2] Ipv6Addr::segments source
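If a custom type is not required, the same conversions can be expressed through the standard library's From impls. This is only a sketch: the function names segments_to_u128 and u128_to_segments are made up for illustration, and these From impls are not const.

```rust
use std::net::Ipv6Addr;

/// Converts eight segments to a u128 by going through `Ipv6Addr`.
/// `segments_to_u128` is a hypothetical name used only for illustration.
pub fn segments_to_u128(segments: [u16; 8]) -> u128 {
    u128::from(Ipv6Addr::from(segments))
}

/// Converts a u128 back to eight segments by going through `Ipv6Addr`.
/// `u128_to_segments` is a hypothetical name used only for illustration.
pub fn u128_to_segments(n: u128) -> [u16; 8] {
    Ipv6Addr::from(n).segments()
}
```

The u128 conversion in std treats the address as a big-endian integer, which matches the layout used by the tests in the original post.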

axyswert · December 20, 2024, 3:12pm

Thanks for the replies. I just checked with cargo miri test --target s390x-unknown-linux-gnu that both @cod10129's and @kpreid's solutions work on big-endian architectures.
