How to write & read a vec of struct with unaligned fields?

This is a continuation of this thread. I have a struct like this:

struct MyStruct {
	other_items_offset: u32,
	foo: u32,
	bar: u8,
	baz: u8
}

It's not 32- or 64-bit aligned, I know. I just created it this way to see it with the smallest possible fields.

And I have a vector of them (about 100K items), that I want to save into a file, and then in another Rust binary in the same project, I want to read them into a vector (not mem-map, because I want to resize the vector, and for now it's just too complicated for me).

I tried to write it with Zerocopy. If a struct has just several u64's, everything is fine. But with this one, the approach won't work.

The derive macros don't work:

use zerocopy::{FromBytes, IntoBytes, Immutable};

#[derive(Debug, FromBytes, IntoBytes, Immutable)]
                           ^^^^^^^^^
`graph::VertexData` has inter-field padding
consider using `zerocopy::Unalign` to lower the alignment of individual fields
consider adding explicit fields where padding would be
consider using `#[repr(packed)]` to remove inter-field padding
the trait `PaddingFree<graph::VertexData, true>` is not implemented for `()`
but trait `PaddingFree<graph::VertexData, false>` is implemented for it
see issue #48214

When I add #[repr(packed)], my test cases fail to compile:

data_vec.iter().enumerate().for_each(|(i, v)| println!("rec {}, {}", i, v.other_items_offset));
                                                                        ^^^^^^^^^^^^^^^^^^^^
reference to packed field is unaligned
packed structs are only aligned by one byte, and many modern architectures penalize unaligned field accesses
creating a misaligned reference is undefined behavior (even if that reference is never dereferenced)
copy the field contents to a local variable, or replace the reference with a raw pointer and use `read_unaligned`/`write_unaligned` (loads and stores via `*p` must be properly aligned even when using raw pointers)

What can I do to pack it and unpack into binary files?

I could, of course, write my own pack/unpack routine, but I hope there's a better way.

So, you’ve got this struct:

struct MyStruct {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

and you’d like to save a big Vec<MyStruct> to disk, then load it back later. You tried zerocopy, but it complained about padding. And when you slapped #[repr(packed)] on it, Rust yelled about unaligned accesses. That’s the heart of the issue: padding vs. alignment.


Here’s the deal:

  1. By default, Rust inserts padding to keep your fields aligned. Your struct holds two u32s and two u8s, which is 10 bytes of data, but its alignment is 4, so Rust slips in two bytes of invisible padding to bring the size up to a multiple of 4 (12 bytes). That’s why zerocopy says, “hey, I can’t handle hidden padding.”
  2. You tried #[repr(packed)], which removes the padding. But then every field access becomes unsafe, because the CPU expects those numbers to be aligned. That’s why the compiler complains.
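To see those numbers for yourself, here is a minimal sketch that only asks the compiler for sizes and alignments. The names Padded and Packed are made up for illustration, and #[repr(C)] is my addition (the struct in the question has no repr attribute) so that the layout is guaranteed rather than left up to the compiler.

```rust
use std::mem::{align_of, size_of};

// Same fields as MyStruct; repr(C) fixes the field order and padding rules.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

// Same fields again, but packed: no padding, alignment of 1.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

fn main() {
    // 10 bytes of fields plus 2 bytes of trailing padding.
    println!("Padded: size {}, align {}", size_of::<Padded>(), align_of::<Padded>());
    // Exactly the 10 bytes of fields.
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>());
}
```

With repr(C) this should report size 12 and alignment 4 for Padded, and size 10 and alignment 1 for Packed.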

Now, how do we fix this? You’ve got a couple of paths:

  • One way is to just make the padding explicit. Add a little _pad: [u8; 2] field at the end and mark the struct #[repr(C)] so the field order is fixed. Now the struct has no hidden padding, so zerocopy is happy and you can write/read raw bytes directly. That works fine if you don’t mind those two extra bytes.
  • Another way, and the one I recommend, is to keep two versions of the struct.
    The “real” struct (MyStruct) is safe, aligned, and nice to work with.
    The “on-disk” struct (MyStructOnDisk) is marked #[repr(C, packed)], so it has exactly the layout you want to save. You only use that one when you’re writing to or reading from disk.
    Then, you just convert back and forth with a couple of From implementations. That way, you never actually do normal work with the unsafe packed type — you just treat it as bytes at the boundary.
  • And of course, the simplest option is to skip all the layout fuss and just use serde with something like bincode. That’ll serialize the struct compactly for you, though it won’t match the raw in-memory layout. For 100k items, it’ll still be pretty fast.
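If you would rather not pull in a dependency for that last option, a hand-rolled encoding is also short. To be clear, the sketch below is not serde or bincode and does not produce bincode's format; it is a stand-in that writes each field as fixed-width little-endian bytes using only the standard library. The names write_items, read_items and RECORD_LEN are hypothetical.

```rust
use std::io::{self, Read, Write};

#[derive(Debug, Clone, Copy, PartialEq)]
struct MyStruct {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

// Size of one record in this made-up format: 4 + 4 + 1 + 1 bytes.
const RECORD_LEN: usize = 10;

// Write every item as a 10-byte little-endian record.
fn write_items<W: Write>(mut out: W, items: &[MyStruct]) -> io::Result<()> {
    for it in items {
        let mut rec = [0u8; RECORD_LEN];
        rec[0..4].copy_from_slice(&it.other_items_offset.to_le_bytes());
        rec[4..8].copy_from_slice(&it.foo.to_le_bytes());
        rec[8] = it.bar;
        rec[9] = it.baz;
        out.write_all(&rec)?;
    }
    Ok(())
}

// Read the whole input and decode it record by record into an owned Vec.
fn read_items<R: Read>(mut input: R) -> io::Result<Vec<MyStruct>> {
    let mut bytes = Vec::new();
    input.read_to_end(&mut bytes)?;
    if bytes.len() % RECORD_LEN != 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated record"));
    }
    Ok(bytes
        .chunks_exact(RECORD_LEN)
        .map(|rec| MyStruct {
            other_items_offset: u32::from_le_bytes([rec[0], rec[1], rec[2], rec[3]]),
            foo: u32::from_le_bytes([rec[4], rec[5], rec[6], rec[7]]),
            bar: rec[8],
            baz: rec[9],
        })
        .collect())
}

fn main() -> io::Result<()> {
    let items = vec![
        MyStruct { other_items_offset: 1, foo: 2, bar: 3, baz: 4 },
        MyStruct { other_items_offset: 5, foo: 6, bar: 7, baz: 8 },
    ];
    // A Vec<u8> stands in for the file here.
    let mut buf: Vec<u8> = Vec::new();
    write_items(&mut buf, &items)?;
    let restored = read_items(buf.as_slice())?;
    assert_eq!(restored, items);
    println!("{} bytes for {} items", buf.len(), restored.len());
    Ok(())
}
```

With a real file you would pass a BufWriter<File> and a BufReader<File> instead of the in-memory buffer. Since the result is an ordinary Vec<MyStruct>, you can resize it freely afterwards.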

So, if I were in your shoes? I’d go with that second option:
define a packed MyStructOnDisk for the file format, keep MyStruct for your actual program, and just convert at the edges. That way you don’t trip over padding or alignment, and you can still use zerocopy safely to read and write chunks of data.
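Roughly, the two-struct idea could look like the sketch below. It only shows the conversion part and uses nothing outside the standard library; to actually turn MyStructOnDisk into bytes with zerocopy you would additionally derive that crate's traits on it, which I have left out here. Deriving Clone and Copy on the packed type is my own choice to keep the example short.

```rust
use std::mem::size_of;

// In-memory type: normal layout, fields can be borrowed freely.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MyStruct {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

// On-disk type: exactly 10 bytes, no padding, used only at the I/O boundary.
#[repr(C, packed)]
#[derive(Clone, Copy)]
struct MyStructOnDisk {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

impl From<MyStruct> for MyStructOnDisk {
    fn from(m: MyStruct) -> Self {
        MyStructOnDisk {
            other_items_offset: m.other_items_offset,
            foo: m.foo,
            bar: m.bar,
            baz: m.baz,
        }
    }
}

impl From<MyStructOnDisk> for MyStruct {
    fn from(d: MyStructOnDisk) -> Self {
        // Packed fields are read by value here, so no reference is created.
        MyStruct {
            other_items_offset: d.other_items_offset,
            foo: d.foo,
            bar: d.bar,
            baz: d.baz,
        }
    }
}

fn main() {
    let item = MyStruct { other_items_offset: 1, foo: 2, bar: 3, baz: 4 };
    let on_disk = MyStructOnDisk::from(item);
    let back = MyStruct::from(on_disk);
    assert_eq!(back, item);
    println!("on-disk record size: {} bytes", size_of::<MyStructOnDisk>());
}
```

The rest of the program only ever sees MyStruct, so the packed type's restrictions stay confined to the two From impls.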

This is not true. It is safe to copy a repr(packed) field. The catch is that println!() and other formatting macros implicitly take references. You can add a block to copy the value instead:

#[repr(packed)]
struct MyStruct {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

fn example(i: usize, v: &MyStruct) {
    println!("rec {}, {}", i, { v.other_items_offset });
}

(Also, taking a reference isn't unsafe, it’s prohibited entirely.)
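For completeness, here is a small sketch of both routes the compiler message mentions: copying the field, and reading it through a raw pointer with read_unaligned. The function names are hypothetical, and the raw-pointer version is shown only to illustrate the message; for plain Copy fields the simple copy is all you need.

```rust
use std::ptr::addr_of;

#[allow(dead_code)]
#[repr(C, packed)]
struct MyStruct {
    other_items_offset: u32,
    foo: u32,
    bar: u8,
    baz: u8,
}

// Reading the field as a value copies it; no reference is involved.
fn offset_by_copy(v: &MyStruct) -> u32 {
    v.other_items_offset
}

// Reading through a raw pointer, as the compiler message suggests.
fn offset_by_raw_pointer(v: &MyStruct) -> u32 {
    // addr_of! produces a raw pointer without an intermediate reference.
    let p = addr_of!(v.other_items_offset);
    // SAFETY: p points into a live MyStruct, and read_unaligned
    // places no alignment requirement on the pointer.
    unsafe { p.read_unaligned() }
}

fn main() {
    let v = MyStruct { other_items_offset: 42, foo: 7, bar: 1, baz: 2 };
    println!("copy: {}", offset_by_copy(&v));
    println!("raw pointer: {}", offset_by_raw_pointer(&v));
}
```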

Well, try this code.

file_data.iter().enumerate().for_each(|(i, v)| {
    let offset = v.other_items_offset; // <-- copy, no reference
    println!("rec {} {}", i, offset);
});

  • Copy types like u32 and u8 are fine to copy out of a packed struct.
  • The problem only happens when you create a reference (&v.field).
  • By writing let offset = v.other_items_offset;, you’re just copying the raw value, not borrowing it.

Well, try this code. I think this can be helpful for you.

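// Written against the zerocopy 0.7 API. In 0.8 (which the question uses)
// the traits are named FromZeros and IntoBytes, as_bytes() also needs
// Immutable, and read_from() became read_from_bytes().
use zerocopy::{AsBytes, FromBytes, FromZeroes};

// repr(C) fixes the field order; _pad makes the trailing padding explicit.
#[repr(C)]
#[derive(Debug, FromZeroes, FromBytes, AsBytes)]
struct MyStruct {
    other_items_offset: u32,
    foo: u32,
    baz: u8,
    bar: u8,
    _pad: [u8; 2], // explicit padding
}

fn main() {
    let item = MyStruct {
        other_items_offset: 1,
        foo: 2,
        baz: 3,
        bar: 4,
        _pad: [0; 2],
    };

    // Turn into bytes
    let bytes: &[u8] = item.as_bytes();
    println!("Bytes: {:?}", bytes);

    // Back into struct (read_from returns None if the length is wrong)
    let restored = MyStruct::read_from(bytes).unwrap();
    println!("Restored: {:?}", restored);
}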

The result is as follows.
Bytes: [1, 0, 0, 0, 2, 0, 0, 0, 3, 4, 0, 0]
Restored: MyStruct { other_items_offset: 1, foo: 2, baz: 3, bar: 4, _pad: [0, 0] }
