Best way of dynamic arrays in structures (c-like)?

Sorry for the vague/misleading topic title but I have a question about this. I'd like to use the zerocopy crate to convert an array of bytes into a structure. The structure I'd like to convert to and from bytes looks like this in C:

typedef volatile struct tagHBA_MEM
{
	// 0x00 - 0x2B, Generic Host Control
	uint32_t cap;		// 0x00, Host capability
	uint32_t ghc;		// 0x04, Global host control
	uint32_t is;		// 0x08, Interrupt status
	uint32_t pi;		// 0x0C, Port implemented
	uint32_t vs;		// 0x10, Version
	uint32_t ccc_ctl;	// 0x14, Command completion coalescing control
	uint32_t ccc_pts;	// 0x18, Command completion coalescing ports
	uint32_t em_loc;		// 0x1C, Enclosure management location
	uint32_t em_ctl;		// 0x20, Enclosure management control
	uint32_t cap2;		// 0x24, Host capabilities extended
	uint32_t bohc;		// 0x28, BIOS/OS handoff control and status
 
	// 0x2C - 0x9F, Reserved
	uint8_t  rsv[0xA0-0x2C];
 
	// 0xA0 - 0xFF, Vendor specific registers
	uint8_t  vendor[0x100-0xA0];
 
	// 0x100 - 0x10FF, Port control registers
	HBA_PORT	ports[1];	// 1 ~ 32
} HBA_MEM;

HBA_PORT looks like this in C:

typedef volatile struct tagHBA_PORT
{
	uint32_t clb;		// 0x00, command list base address, 1K-byte aligned
	uint32_t clbu;		// 0x04, command list base address upper 32 bits
	uint32_t fb;		// 0x08, FIS base address, 256-byte aligned
	uint32_t fbu;		// 0x0C, FIS base address upper 32 bits
	uint32_t is;		// 0x10, interrupt status
	uint32_t ie;		// 0x14, interrupt enable
	uint32_t cmd;		// 0x18, command and status
	uint32_t rsv0;		// 0x1C, Reserved
	uint32_t tfd;		// 0x20, task file data
	uint32_t sig;		// 0x24, signature
	uint32_t ssts;		// 0x28, SATA status (SCR0:SStatus)
	uint32_t sctl;		// 0x2C, SATA control (SCR2:SControl)
	uint32_t serr;		// 0x30, SATA error (SCR1:SError)
	uint32_t sact;		// 0x34, SATA active (SCR3:SActive)
	uint32_t ci;		// 0x38, command issue
	uint32_t sntf;		// 0x3C, SATA notification (SCR4:SNotification)
	uint32_t fbs;		// 0x40, FIS-based switch control
	uint32_t rsv1[11];	// 0x44 ~ 0x6F, Reserved
	uint32_t vendor[4];	// 0x70 ~ 0x7F, vendor specific
} HBA_PORT;

(Taken from https://wiki.osdev.org/AHCI.) As you can see, the author defines the array to only hold a single item, however this array can hold up to 32 items. Defined in Rust, this neatly translates into:

#[repr(C)]
#[derive(FromBytes)]
pub struct HbaMem {
    pub cap: u32,
    pub ghc: u32,
    pub is: u32,
    pub pi: u32,
    pub vs: u32,
    pub ccc_ctl: u32,
    pub ccc_pts: u32,
    pub em_loc: u32,
    pub em_ctl: u32,
    pub cap2: u32,
    pub bohc: u32,
    rsv: [u8; 116],
    pub vendor: [u8; 96],
    // Problem: dynamically sized type, may not decode into a structure properly
    pub ports: &[HbaPort]
}

#[repr(C)]
#[derive(FromBytes)]
pub struct HbaPort {
    pub clb: u32,
    pub clbu: u32,
    pub fb: u32,
    pub fbu: u32,
    pub is: u32,
    pub ie: u32,
    pub cmd: u32,
    rsv0: u32,
    pub tfd: u32,
    pub sig: u32,
    pub ssts: u32,
    pub sctl: u32,
    pub serr: u32,
    pub sact: u32,
    pub ci: u32,
    pub sntf: u32,
    pub fbs: u32,
    rsv1: [u32; 11],
    pub vendor: [u32; 4],
}

The problem is that zerocopy derivation requires structs (and their fields) to implement the Sized trait. Implementation of the FromBytes trait looks like this:

pub unsafe trait FromBytes {
    // NOTE: The Self: Sized bound makes it so that FromBytes is still object
    // safe.
    #[doc(hidden)]
    fn only_derive_is_allowed_to_implement_this_trait()
    where
        Self: Sized;
}

So, my question is, do structs implement Sized even with dynamically-sized fields? Or what would be the best, most idiomatic way of dealing with this problem while making zerocopy work (hopefully)?

What is wrong with this?

#[repr(C)]
#[derive(FromBytes)]
pub struct HbaMem {
    pub cap: u32,
    pub ghc: u32,
    pub is: u32,
    pub pi: u32,
    pub vs: u32,
    pub ccc_ctl: u32,
    pub ccc_pts: u32,
    pub em_loc: u32,
    pub em_ctl: u32,
    pub cap2: u32,
    pub bohc: u32,
    rsv: [u8; 116],
    pub vendor: [u8; 96],
    pub ports: [HbaPort; 1],
}

Would that array dynamically change depending on how many bytes I read? From what I know of rust, that would enforce array bounds, and so I'd only be able to access the first port -- right?

No, my example has a fixed length of one. Can you elaborate on how the HbaMem object is constructed? In C or in Rust?

The original code is in C, however I'm porting it to rust. The author notes that the ports array can have between 1-32 items (it is hardware-dependent).

If you want to use zerocopy, you would have to call it repeatedly on each item in the array. If you are porting it, is there any reason you must keep this exact structure?

Since the final array doesn’t have a fixed size, and const generics are not stable yet, a dynamically sized type may be the best option:

I’ve not used these myself, but the pattern is similar to the C code above: fixed size data followed by a final unsized field.

1 Like

That would, in principle, be one possible way to do it. That kind of type is very hard to create right now, and can only be accessed behind a pointer type.

I suppose one way I could do it would be to reconstruct the integer bytes only, then build the array manually (by using something like heapless::Vec). I'd need to do that anyway. If I construct it using something like Vec<HbaPort, U32>, then I can just loop through each port and set it up that way -- I don't need to fill the Vec to capacity.

I think I would declare ports as a [MaybeUninit<HbaPort>; 32]. Access to its members will (rightly) be unsafe. If you have some way to encode which are initialized, you can wrap a safe accessor method around it.

That might work. I take it zerocopy wouldn't work on MaybeUninit items?

No, that's not right. &[HbaPort] is not a dynamically sized type, it is a pointer. The spiritual equivalent to the original C code would be

Whether zerocopy knows what to do with that or not I have no idea (clearly you'd have to in some way indicate the length of the array when constructing HbaMem, and you could only use it behind a pointer of some kind, just as in C).

You couldn't construct an HbaMem (or even a reference to one) in safe Rust if it were defined like above. But you can make it generic over the array type of ports:

#[repr(C)]
pub struct HbaMem<P:?Sized> {
    // ...
    pub ports: P,
}

Then create HbaMem<[HbaPort; 10]> (let's say) and unsize it via coercion to HbaMem<[HbaPort]>. This is the only way to actually create a custom unsized type in safe Rust today. Again, I don't know how zerocopy may or may not handle this.

(I am aware you're talking about creating them in C and passing to Rust... you may consider this just accessory information)

2 Likes

Thanks. I'll give that a try.
Edit: No, I am talking about reading bytes from memory in a kernel-mode environment and then parsing those into actually useful struct fields. This isn't precisely what I'm doing, but I used the C code as an example of what I will have to do later in Rust, hopefully in a safe manner. However, it doesn't seem like that's the case. Both options are possible -- use MaybeUninit<T> and initialize the array manually or try creating a generic unsized type.

I've never worked with zerocopy, so I have no idea. Honestly, I've never used MaybeUninit either, except working through some guided exercises.

The downside of this approach is that it takes up the memory of the full 32 member array regardless of the actual size. Using an unsized type with a slice at the end seems to fit most accurately with what the C code is doing, but I'd have no idea how to make it work.

You do not want to do this ([MaybeUninit<T>; MAX_LEN]) if the C code is actually using FAM. If the structure has been allocated with space for e.g. 10 elements and you create a Rust reference to [MaybeUninit<HbaPort>; 32], you have UB. (Latent UB that is unlikely to cause problems, but UB.)

How do you know how many port control registers any given codebase is going to have to handle? If it's known at compile time, one (though not completely fault tolerant) approach would be to set the size at compile time (via target triple or environment variable). If it's not discoverable at runtime, this is your only choice (beyond just making access unsafe).

[self-plug] If the size of the FAM is discoverable at runtime, I have the slice-dst crate which is designed to make working with FAMs in Rust take as little localized unsafe as possible.[/self-plug]

But, given IIUC this is a pointer into kernel memory, I think your best bet may be just to accept access to the ports as unsafe (does a port with that number exist) and do something like

#[derive(Copy, Clone)]
struct HbaMemPtr {
    raw: ptr::NonNull<c_void>,
}

impl HbaMemPtr {
    fn cap(self) -> u32 {unsafe{
        ptr::read(self.as_ptr().offset(0x00).cast())
    }}
    fn ghc(self) -> u32 {unsafe{
        ptr::read(self.as_ptr().offset(0x04).cast())
    }}
    // etc
    fn vendor(self) -> &'static [u8; 0x60] {unsafe{
        &*(self.as_ptr().offset(0xA0).cast())
    }}
    // Then either give a raw pointer to ports
    fn ports(self) -> *const HbaPort {unsafe{
        self.as_ptr().offset(0x100).cast()
    }}
    // Or do the index inline
    unsafe fn ports(self, ix: usize) -> &'static HbaPort {
        &*(self.as_ptr().offset(0x100).cast::<HbaPort>().offset(ix))
    }
}

(This assumes that the pointed memory is live for 'static, but if it isn't, you can easily add the lifetime it is valid for into HbaMemPtr.)

Alternative design:

#[derive(Copy, Clone)]
struct HbaMemPtr {
    // SAFETY: cannot be reference,
    // as we access outside the sized portion
    raw: ptr::NonNull<HbaMem>,
}

#[repr(C, align(0x100))]
struct HbaMem {
    cap: u32,
    ghc: u32,
    // etc
    vendor: [u8; 0x60],
    // no ports
}

impl Deref for HbaMemPtr {
    type Target = HbaMem;
    fn deref(&self) -> &HbaMem {
        unsafe { self.raw.as_ref() }
    }
}

impl HbaMemPtr {
    // Or give out the raw pointer like above
    unsafe fn port(&self, ix: usize) -> &HbaMem {
        &*(self.raw.as_ptr()
            .offset(1) // at the end of the prefix
            .cast::<HbaMem>()
            .offset(ix)) // at the right index
    }
}

This design is more honest than a compile-time array size, because you are still just unsafely promising only to run the code on a machine with (at least) that many ports.

2 Likes

Right, if the C code has allocated the space, you don't want to treat it as this, but I understood the original request to be about data coming in in a zero-copy serialized format, in which case the rust code would free to allocate as much space as needed to fill out the full array, and then in a safe wrapper function (assuming they have a way to identify the initialized prefix), return an appropriately-sized reference to a slice of definitely-init HbaPorts.

If there's a need to use this to reinterpret an already existing memory region that isn't long enough, or if there's a space-saving need not to allocate the full amount of memory for the [HbaPort; 32], then you'll probably need to use a reference to an unsized struct, with whatever complication that adds.

I'm not honestly sure myself. There are so many ways of handling this problem. The AHCI specification says that the ports implemented (pi) register determines how many HBA ports are available (so its determinable at runtime). I was thinking of using a heapless::Vec to store the ports, and then just pushing each port that I find into that Vec, with a max of 32. This is how I might do it:

#[repr(packed)]
#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct HBAMemory {
pub cap: u32,
pub ghc: u32,
pub is: u32,
pub pi: u32,
pub vs: u32,
pub ccc_ctl: u32,
pub ccc_ports: u32,
pub em_loc: u32,
pub em_ctl: u32,
pub cap2: u32,
pub bohc: u32
}

#[repr(packed)]
#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct HBAPort {
pub clb: u64,
pub fb: u64,
pub is: u32,
pub ie: u32,
pub cmd: u32,
pub tfd: u32,
pub sig: u32,
pub ssts: u32,
pub sctl: u32,
pub serr: u32,
pub sact: u32,
pub ci: u32,
pub sntf: u32,
pub fbs: u32,
pub devslp: u32,
}

#[repr(C)]
#[derive(Clone, Debug)]
pub struct AhciDevice {
abar: usize,
pub mem: *mut HBAMemory,
pub ports: Vec<*mut HBAPort, U32>
}

impl AhciDevice {
pub fn new (abar: usize) -> Self {
let mut dev = AhciDevice {
abar,
ports: Vec::new(),
mem: (abar as *mut HBAMemory)
};
unsafe {
for port in 0 .. (*dev.mem).pi {
let addr = abar + 0x100 + ((port as usize) * 0x80);
dev.ports.push(addr as *mut HBAPort).unwrap();
}
}
dev
}
}

Of course, this means I need to use unsafe practically everywhere because I'm dereferencing an unverifiable pointer that might be dangling. And the other problem with this implementation is that if I implement methods on the HBAPort structure itself, I don't know if using "self" in those methods, then dereferencing the pointer I have to each port and calling a method on that port, will actually update the values in memory and not just the values in the structure. So I'm not really sure how to do this right now.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.