Zero-cost serialization of integer slice and re-alignment during deserialization

I'm serializing a slice of integers like this:

fn serialize(value: &[i32]) -> &[u8] {
    let size = value.len() * 4;
    unsafe {
        slice::from_raw_parts(value as *const [i32] as *const u8, size)
    }
}

Apart from the multiplication by 4, it should be zero-cost.

Now the bytes get transferred or stored and reloaded from somewhere, and I have an unaligned slice of u8's that I want to deserialize. I can't use the same trick because the slice of bytes might be misaligned in memory.

Thus, I came up with the following function:

fn deserialize(bytes: &[u8]) -> Vec<i32> {
    let len = bytes.len() / 4;
    unsafe {
        let chunks: &[[u8; 4]] = slice::from_raw_parts(
            bytes as *const [u8] as *const [u8; 4],
            len,
        );
        let mut vec: Vec<i32> = Vec::with_capacity(len);
        for i in 0..len {
            vec.push(transmute_copy::<[u8; 4], i32>(&chunks[i]));
        }
        vec
    }
}

But this feels overly complex and inefficient. Is there a better way to achieve this re-alignment?

Example below:

use std::mem::transmute_copy;
use std::slice;

fn serialize(value: &[i32]) -> &[u8] {
    /* … */
}

fn deserialize(bytes: &[u8]) -> Vec<i32> {
    /* … */
}

fn main() {
    let vec = vec![15, 11, -2, 300];
    println!("vec = {vec:?}");
    let bytes = serialize(&vec);
    println!("bytes = {bytes:?}");
    let copied_bytes = Vec::from(bytes);
    let restored = deserialize(&copied_bytes);
    println!("restored = {restored:?}");
}


Output:

vec = [15, 11, -2, 300]
bytes = [15, 0, 0, 0, 11, 0, 0, 0, 254, 255, 255, 255, 44, 1, 0, 0]
restored = [15, 11, -2, 300]

You could use ptr::read_unaligned in a loop to read your integers from the byte slice into a vector. But can't you simply read the raw data into a properly aligned buffer?
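A minimal sketch of the read_unaligned loop suggested here. The function name deserialize_unaligned is made up for illustration, and the length assertion is an assumption about how a trailing partial element should be handled:

```rust
use std::ptr;

/// Reads native-endian `i32` values out of a byte slice of any alignment.
/// (Hypothetical helper name; not from the original post.)
fn deserialize_unaligned(bytes: &[u8]) -> Vec<i32> {
    // Assumption: a length that is not a multiple of 4 is a caller bug.
    assert_eq!(bytes.len() % 4, 0, "length must be a multiple of 4");
    let len = bytes.len() / 4;
    let src = bytes.as_ptr().cast::<i32>();
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        // SAFETY: `i < len`, so `src.add(i)` points at 4 readable bytes
        // inside `bytes`; `read_unaligned` has no alignment requirement.
        out.push(unsafe { ptr::read_unaligned(src.add(i)) });
    }
    out
}
```

This drops the intermediate chunks slice and the transmute_copy call, but it is still a loop over the elements.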

Do not forget about potential endianness issues: if serialization is done on a big-endian machine and deserialization on a little-endian one, the final result will be wrong. Unless (de)serialization performance is super critical, I recommend using the byteorder crate.
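To illustrate the endianness point, here is a rough sketch that fixes the wire format to little-endian. It uses the standard library's to_le_bytes and from_le_bytes rather than the byteorder crate, and the function names are hypothetical. It is no longer zero-cost on the serialization side, since it copies into a new Vec:

```rust
/// Encodes each value as 4 little-endian bytes, regardless of host byte order.
/// (Hypothetical helper; uses std instead of the byteorder crate.)
fn serialize_le(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decodes little-endian bytes; `bytes` may have any alignment.
fn deserialize_le(bytes: &[u8]) -> Vec<i32> {
    // Assumption: a length that is not a multiple of 4 is a caller bug.
    assert_eq!(bytes.len() % 4, 0, "length must be a multiple of 4");
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

With this format, the values [15, 11, -2, 300] always produce the byte sequence shown in the question's output, on any host.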

The loop is what I would like to avoid, and instead just copy a memory region in an optimized way.

Edit: But maybe read_unaligned would let me avoid creating the chunks slice at least.

Yes, I think that's what I want to do. But how do I do it?

Yes, I'm aware of those. Endianness is not a problem in my use-case.

Since you're already using unsafe, you can just allocate a Vec with the right capacity and then memcpy (i.e. copy_nonoverlapping) the bytes into it.
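The linked playground code is not reproduced in this thread, so the following is only a sketch of what the copy_nonoverlapping approach could look like. The name deserialize_memcpy is hypothetical, and the assertion reflects the assumption that the input length must be a multiple of 4:

```rust
use std::ptr;

/// Copies the raw bytes into a freshly allocated, properly aligned Vec<i32>.
/// (Hypothetical helper; a sketch, not the code from the linked playground.)
fn deserialize_memcpy(bytes: &[u8]) -> Vec<i32> {
    // Needed for soundness of a safe fn: otherwise trailing bytes are dropped
    // silently, and callers could not rely on a lossless round trip.
    assert_eq!(bytes.len() % 4, 0, "length must be a multiple of 4");
    let len = bytes.len() / 4;
    let mut out: Vec<i32> = Vec::with_capacity(len);
    // SAFETY: `out` owns at least `len * 4 == bytes.len()` bytes of capacity,
    // the source and the new allocation cannot overlap, and every bit
    // pattern is a valid i32, so all `len` elements are initialized
    // before `set_len` exposes them.
    unsafe {
        ptr::copy_nonoverlapping(bytes.as_ptr(), out.as_mut_ptr().cast::<u8>(), bytes.len());
        out.set_len(len);
    }
    out
}
```

The copy is done byte-wise, so the alignment of the source slice does not matter. Only the destination has to be aligned, and Vec<i32> guarantees that.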

Alternatively, you can use bytemuck to avoid any unsafe code. Looking at the assembly, you'll see that the loop is mostly optimized into a memcpy call.


Yeah, thanks, I was looking for something like memcpy.

So these were the functions/methods which I was missing:

And from your example, I also learned pointer.cast (or mutpointer.cast), which is probably better than using as.

Also, you are right: I need an assertion on bytes.len() % 4 == 0 if the overall function is not unsafe.

I will probably use copy_nonoverlapping then. Thanks a lot!

Another consideration here is re-interpreting a &[u8] as a &[u32]. That is legal and fine to do so long as your &[u8] is aligned to at least a 4-byte boundary. So for example, something like this should work:

use core::mem::{align_of, size_of};

fn serialize(raw: &[u32]) -> &[u8] {
    unsafe {
        core::slice::from_raw_parts(raw.as_ptr().cast(), raw.len() * size_of::<u32>())
    }
}

fn deserialize(raw: &[u8]) -> &[u32] {
    assert_eq!(0, (raw.as_ptr() as usize) % align_of::<u32>());
    unsafe {
        core::slice::from_raw_parts(raw.as_ptr().cast(), raw.len() / size_of::<u32>())
    }
}

fn main() {
    let nums = &[899999, 23432, 325893205];
    assert_eq!(nums, deserialize(serialize(nums)));
}

Now, if you can't require your caller to deal with alignment like this, then this approach is out the window. At that point, you can do what you're doing (copy into a buffer with copy_nonoverlapping), or, if you control the wire format, you can use padding bytes to get your desired alignment.


I can't require my caller to deal with alignment. It might be correct in most of the cases, but there's no guarantee (reading stored data from LMDB). But I could check if alignment is right and then return a Cow::Borrowed. But I doubt introducing Cow would make things better. I will just go with @SkiFire13's first solution.
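For reference, the Cow idea mentioned above could be sketched roughly as follows. The name deserialize_cow is hypothetical. The sketch uses the standard slice method align_to to detect whether the input happens to be aligned, and falls back to a copy otherwise:

```rust
use std::borrow::Cow;

/// Borrows the input when it is already 4-byte aligned, copies otherwise.
/// (Hypothetical helper illustrating the Cow::Borrowed idea.)
fn deserialize_cow(bytes: &[u8]) -> Cow<'_, [i32]> {
    // Assumption: a length that is not a multiple of 4 is a caller bug.
    assert_eq!(bytes.len() % 4, 0, "length must be a multiple of 4");
    // SAFETY: every bit pattern is a valid i32, so reinterpreting the
    // aligned middle part of the byte slice as i32 values is sound.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<i32>() };
    if prefix.is_empty() && suffix.is_empty() {
        // The whole input is aligned: no copy needed.
        Cow::Borrowed(middle)
    } else {
        // Misaligned input: copy element by element into an aligned Vec.
        Cow::Owned(
            bytes
                .chunks_exact(4)
                .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}
```

Whether the extra branch and the Cow return type pay off depends on how often the stored data is actually aligned, which is the doubt raised above.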
