Question about simple bit manipulation

Hello, everyone! I’ve recently started learning Rust, and this is my first post on this forum. I’d greatly appreciate any tips or suggestions you might have!

I was trying to learn something about bit manipulation and the core::mem module in Rust. I started with a simple task: converting between the octet and decimal representations of IPv4 addresses. For instance, the IPv4 address 192.229.162.211 translates to 3236274899 in decimal. Similarly, 167824216 translates back to 10.0.203.88. This was relatively easy.

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv4(u32);

impl IPv4 {
    pub const fn from_octets(address: &[u8; 4]) -> Self {
        Self(u32::from_be_bytes(*address))
    }

    pub const fn to_octets(self) -> [u8; 4] {
        self.0.to_be_bytes()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ipv4_1a() {
        let input = [192, 229, 162, 211];
        let result = 3236274899;
        assert_eq!(IPv4::from_octets(&input), IPv4(result));
    }

    #[test]
    fn test_ipv4_1b() {
        let input = 167824216;
        let result = [10, 0, 203, 88];
        assert_eq!(IPv4(input).to_octets(), result);
    }
}
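For readers who want to see the arithmetic behind from_be_bytes, here is a minimal sketch that computes the same value with explicit shifts. The name octets_to_u32 is made up for illustration and is not part of the code above.

```rust
/// Hypothetical helper: combines four octets into a u32, most significant octet first.
/// It should agree with `u32::from_be_bytes`, written out as base-256 positional arithmetic.
pub const fn octets_to_u32(o: [u8; 4]) -> u32 {
    ((o[0] as u32) << 24) | ((o[1] as u32) << 16) | ((o[2] as u32) << 8) | (o[3] as u32)
}
```

For 192.229.162.211 this works out to 192·2^24 + 229·2^16 + 162·2^8 + 211 = 3236274899, the value used in the first test.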

Then, I wanted to solve an analogous problem for IPv6. In this case, an address decomposes into segments rather than octets, so the previous approach doesn’t really work. In the end, I came up with the following code:

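use core::mem::transmute;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv6(u128);

impl IPv6 {
    pub fn from_segments(address: &[u16; 8]) -> Self {
        Self(if cfg!(target_endian = "big") {
            // Big-endian: the in-memory layout already matches the u128.
            unsafe { transmute::<[u16; 8], u128>(*address) }
        } else {
            // Little-endian: reversing the bits of each segment and then of the
            // whole u128 reverses the segment order while keeping each segment intact.
            unsafe { transmute::<[u16; 8], u128>(address.map(|x| x.reverse_bits())) }.reverse_bits()
        })
    }

    pub fn to_segments(self) -> [u16; 8] {
        if cfg!(target_endian = "big") {
            unsafe { transmute::<u128, [u16; 8]>(self.0) }
        } else {
            // Inverse of the little-endian branch in from_segments.
            unsafe { transmute::<u128, [u16; 8]>(self.0.reverse_bits()) }.map(|x| x.reverse_bits())
        }
    }
}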

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ipv6_1a() {
        let input = [0x2607, 0xf8b0, 0x4005, 0x808, 0x0, 0x0, 0x0, 0x2004];
        let result = 0x2607f8b0400508080000000000002004;
        assert_eq!(IPv6::from_segments(&input), IPv6(result));
    }

    #[test]
    fn test_ipv6_1b() {
        let input = 0x20010db8000000000000ff0000428329;
        let result = [0x2001, 0x0db8, 0x0, 0x0, 0x0, 0xff00, 0x0042, 0x8329];
        assert_eq!(IPv6(input).to_segments(), result);
    }
}

I am not really satisfied with my solution, though. Compared to the IPv4 implementation, it has several issues:

  1. It seems overengineered to me, given how simple the problem is (just bit manipulation, after all).
  2. I’m not sure if the code is portable: does it really work correctly on big-endian machines?
  3. I can no longer declare the functions as const, even though converting between representations is exactly the kind of thing one would reasonably expect to be done at compile time.

How could this code be improved? Many thanks!

edit: corrected typos.

The solution to all your problems (unsafe, const, and portability) is to write the code the tedious, boring way.

impl IPv6 {
    pub const fn from_segments(address: [u16; 8]) -> Self {
        Self(
            ((address[0] as u128) << 112)
                | ((address[1] as u128) << 96)
                | ((address[2] as u128) << 80)
                | ((address[3] as u128) << 64)
                | ((address[4] as u128) << 48)
                | ((address[5] as u128) << 32)
                | ((address[6] as u128) << 16)
                | address[7] as u128,
        )
    }

    pub const fn to_segments(self) -> [u16; 8] {
        let n = self.0;
        [
            (n >> 112) as u16,
            (n >> 96) as u16,
            (n >> 80) as u16,
            (n >> 64) as u16,
            (n >> 48) as u16,
            (n >> 32) as u16,
            (n >> 16) as u16,
            n as u16,
        ]
    }
}

This passes your tests, and it is certain to work the same on big-endian machines because it operates only on integer values and never touches the byte representation of anything.

These could be written as while loops for slightly less repetition, but I don't like that style here, for no particular reason.
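For comparison, a minimal sketch of what the while-loop variant might look like. It assumes the same IPv6(u128) wrapper as the original post (redeclared here so the snippet stands alone). It uses while because, at the time of writing, for loops are not accepted in const fn on stable Rust.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv6(u128);

impl IPv6 {
    pub const fn from_segments(address: [u16; 8]) -> Self {
        let mut n: u128 = 0;
        let mut i = 0;
        while i < 8 {
            // Shift the accumulated value up one segment and append the next one.
            n = (n << 16) | address[i] as u128;
            i += 1;
        }
        Self(n)
    }

    pub const fn to_segments(self) -> [u16; 8] {
        let mut out = [0u16; 8];
        let mut i = 0;
        while i < 8 {
            // Segment i starts at bit 16 * (7 - i); the cast keeps the low 16 bits.
            out[i] = (self.0 >> (16 * (7 - i))) as u16;
            i += 1;
        }
        out
    }
}
```

Like the unrolled version, this works purely on integer values, so it should not depend on the target's endianness.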

Why not use u128::{from,to}_{le,be,ne}_bytes?

I thought of these, too. The reason it's not quite as simple here is that the address has to be split into u16 parts, not u8 parts.

True, but I still think it'd be cleaner... then again, I haven't tried it.

Well, I did.

OP’s code can be written like this:

impl IPv6 {
    pub const fn from_segments([a, b, c, d, e, f, g, h]: [u16; 8]) -> Self {
        let [[a, b], [c, d], [e, f], [g, h], [i, j], [k, l], [m, n], [o, p]] = [
            a.to_be_bytes(),
            b.to_be_bytes(),
            c.to_be_bytes(),
            d.to_be_bytes(),
            e.to_be_bytes(),
            f.to_be_bytes(),
            g.to_be_bytes(),
            h.to_be_bytes(),
        ];
        Self(u128::from_be_bytes([
            a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
        ]))
    }

    pub const fn to_segments(self) -> [u16; 8] {
        let [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p] = self.0.to_be_bytes();
        [
            u16::from_be_bytes([a, b]),
            u16::from_be_bytes([c, d]),
            u16::from_be_bytes([e, f]),
            u16::from_be_bytes([g, h]),
            u16::from_be_bytes([i, j]),
            u16::from_be_bytes([k, l]),
            u16::from_be_bytes([m, n]),
            u16::from_be_bytes([o, p]),
        ]
    }
}

If you don’t need const, you can use <[T; N]>::map in from_segments to shorten it a lot.
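As a rough sketch of that non-const shortening, assuming the same IPv6(u128) wrapper as in the original post (redeclared here so the snippet stands alone):

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct IPv6(u128);

impl IPv6 {
    // Not `const`: `<[T; N]>::map` is not a const fn.
    pub fn from_segments(address: [u16; 8]) -> Self {
        // Convert every segment to its two big-endian bytes in one call,
        // then destructure the nested array into sixteen individual bytes.
        let [[a, b], [c, d], [e, f], [g, h], [i, j], [k, l], [m, n], [o, p]] =
            address.map(u16::to_be_bytes);
        Self(u128::from_be_bytes([
            a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
        ]))
    }
}
```

The eight explicit to_be_bytes calls collapse into a single map call. The destructuring pattern stays, since the nested array still has to be flattened into sixteen bytes.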

You can also introduce helper functions to shorten the repeated names of to_be_bytes and from_be_bytes:

impl IPv6 {
    pub const fn from_segments([a, b, c, d, e, f, g, h]: [u16; 8]) -> Self {
        const fn tb(n: u16) -> [u8; 2] { n.to_be_bytes() }
        let [[a, b], [c, d], [e, f], [g, h], [i, j], [k, l], [m, n], [o, p]] =
            [tb(a),  tb(b),  tb(c),  tb(d),  tb(e),  tb(f),  tb(g),  tb(h)];
        Self(u128::from_be_bytes([
            a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
        ]))
    }

    pub const fn to_segments(self) -> [u16; 8] {
        const fn fb(l: u8, r: u8) -> u16 { u16::from_be_bytes([l, r]) }
        let [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p] = self.0.to_be_bytes();
        [
            fb(a, b), fb(c, d), fb(e, f), fb(g, h),
            fb(i, j), fb(k, l), fb(m, n), fb(o, p),
        ]
    }
}

Comparing your code to std::net::Ipv6Addr: you use a u128 for storage, whereas std uses [u8; 16]. Std’s method for converting [u16; 8] -> Ipv6Addr[1] converts all the u16s to big-endian and then transmutes. Ipv6Addr::segments[2] transmutes [u8; 16] -> [u16; 8] and then converts the u16s from big-endian. The std methods are also const, because they write out the per-element conversion by hand instead of calling <[T; N]>::map, which is not a const fn.


  1. Ipv6Addr::new source

  2. Ipv6Addr::segments source
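To make the std-style approach described above concrete, here is a minimal sketch. It is not std’s actual source: the Addr type and its method names are hypothetical, and it only mirrors the described steps (convert each u16 to big-endian, then transmute between [u16; 8] and [u8; 16]).

```rust
use core::mem::transmute;

/// Hypothetical type that stores the address as 16 bytes in network order.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Addr([u8; 16]);

impl Addr {
    pub const fn from_segments(s: [u16; 8]) -> Self {
        // Put each segment into big-endian memory order (a no-op on big-endian targets).
        let be = [
            s[0].to_be(), s[1].to_be(), s[2].to_be(), s[3].to_be(),
            s[4].to_be(), s[5].to_be(), s[6].to_be(), s[7].to_be(),
        ];
        // SAFETY: both types are 16 bytes, and every bit pattern is valid for [u8; 16].
        Self(unsafe { transmute::<[u16; 8], [u8; 16]>(be) })
    }

    pub const fn to_segments(self) -> [u16; 8] {
        // SAFETY: both types are 16 bytes, and every bit pattern is valid for [u16; 8].
        let [a, b, c, d, e, f, g, h] = unsafe { transmute::<[u8; 16], [u16; 8]>(self.0) };
        // Each element holds big-endian bytes; convert back to native integers.
        [
            u16::from_be(a), u16::from_be(b), u16::from_be(c), u16::from_be(d),
            u16::from_be(e), u16::from_be(f), u16::from_be(g), u16::from_be(h),
        ]
    }
}
```

The transmute here only reinterprets bytes that have already been put in a fixed order by to_be, so the target's endianness should not change the result.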

Thanks for the replies. I just checked with cargo miri test --target s390x-unknown-linux-gnu that both @cod10129's and @kpreid's solutions work on big-endian architectures.
