Bitfields and unions for low-level data structures and type conversion in Rust

Hi everybody,
I am really new to Rust (just a few days in). At my startup (Pollen Robotics) we chose to switch from C to Rust, but I still have C running in my head. I will need some help to switch my way of thinking to Rust.

My first struggle is with data structures and unions. In my company we do some pretty low-level, embedded things, so I need to manage bitfield data and unions. Here is the code as I would think of it in C:

typedef struct __attribute__((__packed__)){
    union {
        struct __attribute__((__packed__)){
            unsigned short protocol : 4;
            unsigned short target : 12;
            unsigned short target_mode : 4;
            unsigned short source : 12;
            unsigned char cmd;
            unsigned char size;
        };
        unsigned char unmap[6]; // Unmapped form.
    };
}header_t;

I use this union to switch easily from a mapped to an unmapped form. I can write to header_t.protocol or header_t.source and get it back as u8 array using header_t.unmap. This switch uses no time and shares the same memory block.

I tried to do the same thing in Rust but I didn't find a clean way to do it. I succeeded in making it using two structures and a dedicated impl to switch between them:

#[allow(dead_code)]
pub struct Header {
    protocol:    u8,  // 4 bits used
    target:      u16, // 12 bits used
    target_mode: u8,  // 4 bits used
    source:      u16, // 12 bits used
    cmd:         u8,  // 8 bits used
    size:        u8,  // 8 bits used
}

#[allow(dead_code)]
pub struct UnmapHeader{
    tab:[u8; 6],
}

impl Header {
    #[allow(dead_code)]
    pub fn unmap(&self) -> UnmapHeader {
        let mut unmap_header = UnmapHeader { tab: [0; 6],};
        unmap_header.tab[0] = (self.protocol & 0b0000_1111) | (self.target << 4) as u8;
        unmap_header.tab[1] = (self.target >> 4) as u8;
        unmap_header.tab[2] = ((self.target_mode as u8) & 0b0000_1111) | (self.source << 4) as u8;
        unmap_header.tab[3] = (self.source >> 4) as u8;
        unmap_header.tab[4] = self.cmd;
        unmap_header.tab[5] = self.size;
        unmap_header
    }
}

impl UnmapHeader {
    #[allow(dead_code)]
    pub fn map(&self) -> Header {
        Header {
            protocol: self.tab[0] & 0b0000_1111,
            // Widen to u16 before shifting, and combine the two parts with | (not &).
            target: ((self.tab[0] >> 4) as u16) | ((self.tab[1] as u16) << 4),
            target_mode: self.tab[2] & 0b0000_1111,
            source: ((self.tab[2] >> 4) as u16) | ((self.tab[3] as u16) << 4),
            cmd: self.tab[4],
            size: self.tab[5],
        }
    }
}

#[test]
fn switch() {
    let header = Header {
        protocol: 0b0000_1000,
        target: 0b0000_0100_0000_0001,
        target_mode: 0b0000_0100,
        source: 0b0000_0100_0000_0001,
        cmd: 0xAA,
        size: 10,
    };
    let unmap_header = header.unmap();
    assert_eq!(unmap_header.tab[0], 0b0001_1000);
    assert_eq!(unmap_header.tab[1], 0b0100_0000);
    assert_eq!(unmap_header.tab[2], 0b0001_0100);
    assert_eq!(unmap_header.tab[3], 0b0100_0000);
    assert_eq!(unmap_header.tab[4], 0xAA);
    assert_eq!(unmap_header.tab[5], 10);
}

Is there a more rustacean solution?

In Rust there's union{} as in C, if you don't want to pay for the tag of enum{}.

And regarding bitfields you have various possibilities like: https://github.com/dzamlo/rust-bitfield
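For the union half of that suggestion, here is a minimal sketch of what an untagged union looks like in Rust. The `Word` type and the `word_bytes` helper are hypothetical names made up for illustration, not something from the original post. Reading a union field needs `unsafe`, because the compiler cannot know which field was written last.

```rust
/// Hypothetical example type: a 16-bit word overlaid with its two bytes.
/// Both fields share the same two bytes of storage, as in a C union.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Word {
    pub value: u16,
    pub bytes: [u8; 2],
}

/// Returns the raw bytes of a `Word` in native byte order.
pub fn word_bytes(w: Word) -> [u8; 2] {
    // SAFETY: both fields are plain integers covering the same two bytes,
    // so every bit pattern is valid for either field.
    unsafe { w.bytes }
}
```

The byte order you get back depends on the target's endianness, exactly as it would in C.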

Indeed, I saw that, but I don't want to design my code as a C programmer would. I need to understand how a Rust programmer would do it, and a Rust programmer probably doesn't use C-style unions...

I also tried this crate, and it is great, but it creates getters and setters, so I can't manage my fields as plain variables.

I would probably just keep your Header struct but with the packed [u8; 6] storage (instead of named fields). Then I'd add getters and setters to read/write the various components. I don't think having Header and UnmapHeader buys you much.

Alternatively, keep the Header struct unpacked with plain fields. You'll waste 2 additional bytes, which I don't know if it matters to you or not (presumably it does or else you wouldn't have been packing in the first place).

Thanks @vitalyd, your first solution seems to be the better one.

Yes it matters in this case...

So if I understand correctly, your proposal is something like:

pub struct Header {
    unmap:[u8; 6],
}
impl Header {
    pub fn get_protocol(&self) -> u8 {
        self.unmap[0] & 0b0000_1111
    }

    pub fn set_protocol(&mut self, value: u8) {
        self.unmap[0] = (self.unmap[0] & 0b1111_0000) | (value & 0b0000_1111);
    }
    ...
}
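To make the elided part concrete, here is one way the full set of accessors could look. This is only a sketch: it assumes the same byte layout as the `unmap()` function in the first post (the low nibble of byte 0 is the protocol, the high nibble holds the low 4 bits of the target, and so on), and the `from_bytes`/`as_bytes` helpers are names I made up.

```rust
/// Header kept in its packed 6-byte form; fields are read and written
/// through accessors instead of being stored separately.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Header {
    unmap: [u8; 6],
}

impl Header {
    /// Hypothetical helper: wrap an existing 6-byte buffer.
    pub fn from_bytes(bytes: [u8; 6]) -> Self {
        Header { unmap: bytes }
    }

    /// Hypothetical helper: borrow the packed form without copying.
    pub fn as_bytes(&self) -> &[u8; 6] {
        &self.unmap
    }

    pub fn get_protocol(&self) -> u8 {
        self.unmap[0] & 0x0F
    }

    pub fn set_protocol(&mut self, value: u8) {
        self.unmap[0] = (self.unmap[0] & 0xF0) | (value & 0x0F);
    }

    /// 12-bit target: low 4 bits in the high nibble of byte 0, high 8 bits in byte 1.
    pub fn get_target(&self) -> u16 {
        ((self.unmap[0] >> 4) as u16) | ((self.unmap[1] as u16) << 4)
    }

    pub fn set_target(&mut self, value: u16) {
        self.unmap[0] = (self.unmap[0] & 0x0F) | (((value & 0x000F) as u8) << 4);
        self.unmap[1] = ((value >> 4) & 0x00FF) as u8;
    }

    pub fn get_target_mode(&self) -> u8 {
        self.unmap[2] & 0x0F
    }

    pub fn set_target_mode(&mut self, value: u8) {
        self.unmap[2] = (self.unmap[2] & 0xF0) | (value & 0x0F);
    }

    /// 12-bit source: same split as the target, across bytes 2 and 3.
    pub fn get_source(&self) -> u16 {
        ((self.unmap[2] >> 4) as u16) | ((self.unmap[3] as u16) << 4)
    }

    pub fn set_source(&mut self, value: u16) {
        self.unmap[2] = (self.unmap[2] & 0x0F) | (((value & 0x000F) as u8) << 4);
        self.unmap[3] = ((value >> 4) & 0x00FF) as u8;
    }

    pub fn get_cmd(&self) -> u8 {
        self.unmap[4]
    }

    pub fn set_cmd(&mut self, value: u8) {
        self.unmap[4] = value;
    }

    pub fn get_size(&self) -> u8 {
        self.unmap[5]
    }

    pub fn set_size(&mut self, value: u8) {
        self.unmap[5] = value;
    }
}
```

Setters silently drop any bits above the field width (4 or 12 bits); whether that or an error is preferable depends on the protocol.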

OK, I figured as much but wanted to ask anyway :slight_smile:. On most CPUs, 48 vs 64 bits doesn't matter much because it's a single cache line either way. But you're probably on microcontrollers.

That's right.

Edit: by the way, cool startup! :+1:

You're right :wink:

Thanks a lot for your help and your support.

union{} is a Rust feature. You should not overuse it, but if you're on a memory-constrained system then it could be the right place to use a few untagged unions.

What would a union buy you here given there's no native bitfield support?

Indeed, in this example the most important thing to manage is the bitfield, so the solution mainly deals with the bitfield.

To explore the other side of the question, here is the next data structure (written in C) I have to create :

typedef struct __attribute__((__packed__)){
    union {
        struct __attribute__((__packed__)){
            header_t header;
            unsigned char data[MAX_DATA_MSG_SIZE];
        };
        unsigned char stream[sizeof(header_t) + MAX_DATA_MSG_SIZE];
    };
    unsigned short crc;
    volatile unsigned char ack;
}msg_t;

To explain it in words: I need to access a memory block either as (header_t + u8 array) or as a single u8 array. These two options need exactly the same space. Using C pointers (and their lack of safety), reading and converting this memory block is completely free.
There is only one level in this structure, so I can write:
msg_t.header => gives me the previous example's header_t structure
msg_t.data => gives me the data array
msg_t.stream => gives me both of the previous fields streamed into one array
msg_t.crc => gives me the u16 CRC
msg_t.ack => gives me the u8 ack

In this example we have a structure inside a union inside a structure. As I understand it, Rust doesn't allow declaring a struct inside a struct. In my C example I use an "unnamed struct" in a union to pack data together without adding an extra level.
Is this possible in Rust?

What is the best solution to convert my struct header + data into/from stream easily?

I think the closest analog would be something like:

#![feature(const_size_of, untagged_unions)]

use std::mem;

const MAX_DATA_MSG_SIZE: usize = 64;

#[derive(Debug)]
#[repr(C, packed)]
struct Header(i32, i32); // Just picking some fields here

#[repr(C, packed)]
struct HeaderAndData(Header, [u8; MAX_DATA_MSG_SIZE]);

#[repr(C, packed)]
union U {
    hd: HeaderAndData,
    stream: [u8; mem::size_of::<HeaderAndData>()]
}

This requires the nightly compiler and the two feature gates at the top.

Rust unions require unsafe to read/write their fields, which doesn't make them all that appealing in general. I think mostly they serve the purpose of matching C union layout when working with FFI.
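As far as I know, more recent stable toolchains no longer need those feature gates as long as every union field is `Copy`, and `size_of` can be used in a constant. Under that assumption, a sketch of the same overlay without nightly could look like the following. The `Frame` and `STREAM_SIZE` names are mine, and the header is reduced to its 6 raw bytes to keep the example short. Because every field is a `u8` or an array of `u8`, there is no padding and `#[repr(C)]` is enough without `packed`.

```rust
use std::mem::size_of;

const MAX_DATA_MSG_SIZE: usize = 64;

/// Header reduced to its packed 6-byte form for this sketch.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Header(pub [u8; 6]);

/// Structured view: header followed by the payload.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct HeaderAndData {
    pub header: Header,
    pub data: [u8; MAX_DATA_MSG_SIZE],
}

/// Size of the flat view, computed from the structured view at compile time.
pub const STREAM_SIZE: usize = size_of::<HeaderAndData>();

/// Hypothetical name: the two views share the same bytes.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Frame {
    pub hd: HeaderAndData,
    pub stream: [u8; STREAM_SIZE],
}

/// Reads the flat view of a frame.
pub fn frame_stream(frame: &Frame) -> [u8; STREAM_SIZE] {
    // SAFETY: both views consist only of u8 values with no padding,
    // so any bit pattern written through one view is valid for the other.
    unsafe { frame.stream }
}
```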

A less efficient approach, although safe and robust, would be to serialize your structs into binary payloads using something like the bincode crate (coupled with serde). This is now serialization rather than memory overlays, of course, so it's not the same thing.

Another approach would be to replace the union with std::mem::transmute calls when you want to "reinterpret" the struct as a byte array. That's also unsafe, so it isn't much of a gain apart from obviating the need for the union.
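A rough sketch of the transmute variant, assuming a struct made only of `u8` fields so that there is no padding and both types have the same size (the compiler rejects a `transmute` between types of different sizes). The `to_stream` and `from_stream` function names are hypothetical.

```rust
use std::mem;

const MAX_DATA_MSG_SIZE: usize = 64;
const STREAM_SIZE: usize = 6 + MAX_DATA_MSG_SIZE;

/// Structured view: 6 header bytes followed by the payload.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct HeaderAndData {
    pub header: [u8; 6],
    pub data: [u8; MAX_DATA_MSG_SIZE],
}

/// Reinterprets the struct as a flat byte array.
pub fn to_stream(hd: HeaderAndData) -> [u8; STREAM_SIZE] {
    // SAFETY: HeaderAndData is repr(C), contains only u8 values, has no
    // padding, and is exactly STREAM_SIZE bytes long.
    unsafe { mem::transmute::<HeaderAndData, [u8; STREAM_SIZE]>(hd) }
}

/// Reinterprets a flat byte array as the struct.
pub fn from_stream(stream: [u8; STREAM_SIZE]) -> HeaderAndData {
    // SAFETY: every bit pattern is a valid HeaderAndData (all fields are u8).
    unsafe { mem::transmute::<[u8; STREAM_SIZE], HeaderAndData>(stream) }
}
```

The `unsafe` stays inside these two functions, so callers only ever see a safe API.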

Maybe someone has a more elegant suggestion though.

Thanks for this example!
You managed to get approximately the same memory usage and access time, great!
This is a little more verbose than C, but I can deal with it.

The drawback I see here is that the unsafe usage propagates; I don't know if there are general contraindications about that in Rust. If I understand your idea, the std::mem::transmute approach keeps the unsafe access inside the struct's impl, so "users" are less likely to do dirty things.

In general you want to avoid/minimize unsafe usage, both internal to your impls and of course externally visible to your callers. I think you might be able to hide the union from your callers here as well by providing only safe getters/setters but hard to say for sure without knowing more context. Of course if you expose the stream directly to callers, then they can do nasty things to it, knowingly or not. So you're back to C level of safety, pretty much :).
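As an illustration of hiding the union, here is a sketch in which the union is a private field and callers only get safe accessors. All names (`Msg`, `Overlay`, `Parts`, and the methods) are invented for the example, and the header is again reduced to 6 raw bytes.

```rust
const MAX_DATA_MSG_SIZE: usize = 64;
const STREAM_SIZE: usize = 6 + MAX_DATA_MSG_SIZE;

/// Structured view: header bytes followed by the payload.
#[repr(C)]
#[derive(Clone, Copy)]
struct Parts {
    header: [u8; 6],
    data: [u8; MAX_DATA_MSG_SIZE],
}

/// Private overlay of the structured and flat views.
#[repr(C)]
#[derive(Clone, Copy)]
union Overlay {
    parts: Parts,
    stream: [u8; STREAM_SIZE],
}

/// Public message type: the union never leaves this module.
pub struct Msg {
    raw: Overlay,
}

impl Msg {
    /// Creates a zeroed message.
    pub fn new() -> Self {
        Msg { raw: Overlay { stream: [0; STREAM_SIZE] } }
    }

    // SAFETY (applies to every accessor below): both views contain only u8
    // values and have the same size, so either view is always valid to read.

    pub fn header(&self) -> &[u8; 6] {
        unsafe { &self.raw.parts.header }
    }

    pub fn header_mut(&mut self) -> &mut [u8; 6] {
        unsafe { &mut self.raw.parts.header }
    }

    pub fn data(&self) -> &[u8; MAX_DATA_MSG_SIZE] {
        unsafe { &self.raw.parts.data }
    }

    pub fn data_mut(&mut self) -> &mut [u8; MAX_DATA_MSG_SIZE] {
        unsafe { &mut self.raw.parts.data }
    }

    /// Flat view of header + data, e.g. for sending on the wire.
    pub fn stream(&self) -> &[u8; STREAM_SIZE] {
        unsafe { &self.raw.stream }
    }
}
```

Because the accessors hand out references into the same storage, switching views still costs nothing at run time, and the borrow checker prevents holding a mutable structured view and the flat view at the same time.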

I would also time the serialization (bincode + serde) to see if it's too costly. If it's not, then it's the cleanest solution, I think (and doesn't require use of unstable features, as of today).

To explore other possibilities for bitfields: would it be a solution to keep my bitfield structure in C and bind it to Rust using FFI?

As I understand it, I would have to reproduce the structure in Rust, so I can't use my C bitfield struct directly from Rust?

You could pass that C struct to Rust as an opaque pointer and expose a C API to work with that pointer that the Rust code would call. So C would own that memory and layout, and Rust just sees a pointer with a bunch of functions that can operate on it.

Personally, I’d just stick to C if this case comes up very often :slight_smile:. If, however, this is a minor part of the overall code and the rest can be done in safe Rust, then it’s a reasonable alternative.

bindgen generates Rust bindings from C header files. (Here's the result for your header.)
