Typecasting [u8] to [u128]

amarao · March 12, 2021, 2:23pm

I want to work with a bytearray as a list of native-endian u128. I have a function:

fn foo(data: &mut [u8]){
    for b in data{
        *b = some_u8_funct();
    }
}

Inside that function I want to iterate over data in 'steps' of 16 bytes, modifying each step in place. I want to do it without conversion or any additional cost. (I can check the invariant that data.len() % 16 == 0.)

In C I can just cast char* to long long long* (or whatever C currently has for 128 bits), but that sounds wrong on many levels in Rust. What is the best/idiomatic way to do this?

Basically, I want something like that:

fn foo(data: &mut [u8]){
    if data.len() % 16 !=0 {
        panic!("Bad alignment");
    }
    for my_u128  in data.some_magic_function(){
        *my_u128 = some_u128_funct();
    }
}

I've tried to use for b in data as &mut [u128]{}, but Rust said it's not a primitive type for 'as' conversion.

alice · March 12, 2021, 2:34pm

This may or may not be possible since the alignment of the data reference must be large enough. Note that alignment is different from the length.

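In the cases where the alignment is large enough for the cast to be valid, the bytemuck crate can do it with bytemuck::cast_slice (or bytemuck::cast_slice_mut for a mutable slice). However, the call will panic if the address of the first byte in the slice is not a multiple of the alignment of u128 (8 on x86-64 when this was written).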
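To make the difference between length and alignment concrete, here is a minimal std-only sketch. The names is_aligned_for_u128 and AlignedBuf are made up for illustration; AlignedBuf only exists so that the starting address is known to be 16-byte aligned.

```rust
use std::mem::align_of;

/// Hypothetical helper: true if the slice starts at an address that is
/// suitably aligned for u128 on the current target.
pub fn is_aligned_for_u128(data: &[u8]) -> bool {
    (data.as_ptr() as usize) % align_of::<u128>() == 0
}

/// Hypothetical byte buffer forced to 16-byte alignment, so the behaviour of
/// the check above does not depend on where the allocator happens to put it.
#[repr(align(16))]
pub struct AlignedBuf(pub [u8; 48]);
```

With let buf = AlignedBuf([0; 48]), the slice &buf.0 passes the check, while &buf.0[1..33] fails it even though its length (32) is a multiple of 16.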

drewkett · March 12, 2021, 2:47pm

If you do have alignment issues that make bytemuck impossible to use, the next best thing is to use byteorder. This would have a bit of overhead if the data isn't aligned right, but I think it's unavoidable in that case.
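For reference, the same kind of alignment-independent decoding can also be sketched with the standard library alone, using u128::from_ne_bytes instead of byteorder. The function name read_u128s is hypothetical, and ignoring a trailing partial chunk is an assumption made here for simplicity.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Hypothetical helper: decode each 16-byte chunk as a native-endian u128.
/// Works for any alignment of `data`; trailing bytes that do not fill a
/// whole chunk are ignored.
pub fn read_u128s(data: &[u8]) -> Vec<u128> {
    data.chunks_exact(size_of::<u128>())
        .map(|chunk| {
            // chunks_exact only yields slices of exactly 16 bytes,
            // so this conversion to [u8; 16] cannot fail.
            let bytes: [u8; 16] = chunk.try_into().expect("chunk is 16 bytes");
            u128::from_ne_bytes(bytes)
        })
        .collect()
}
```

Because each chunk is copied into a [u8; 16] before being interpreted, no u128 reference into the byte buffer is ever created, so the starting address of data does not matter.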

amarao · March 12, 2021, 2:52pm

I loved it!

That's exactly what I wanted!

fn foo(data: &mut [u8]) {
    let u128_slice: &mut [u128] = bytemuck::cast_slice_mut(data);
    for b in u128_slice{
        *b = 0x112233445566778899AABBCCDDEEFF;
    }
}
fn main(){
   let mut data:Vec<u8> = vec![0;32];
   foo(& mut data);
   for b in data {
       println!("{}", b);
   }
}

2e71828 · March 12, 2021, 2:54pm

Using one of the pre-existing solutions is always best when using unsafe code. If you wanted to do it yourself, it might look something like this:

use std::marker::PhantomData;
pub struct UnalignedU128<'a>(*mut u128, PhantomData<&'a mut u128>);

impl<'a> UnalignedU128<'a> {
    pub fn read(&self)->u128 {
        // Safety: pointer validity is guaranteed by the lifetime 'a
        unsafe { std::ptr::read_unaligned(self.0) }
    }
    
    pub fn write(&mut self, data: u128) {
        // Safety: pointer validity is guaranteed by the lifetime 'a
        unsafe { std::ptr::write_unaligned(self.0, data) }
    }
}

pub fn iter_as_u128<'a>(bytes: &'a mut [u8]) -> impl Iterator<Item = UnalignedU128<'a>> {
    let base: *mut u128 = bytes.as_mut_ptr() as *mut u128;
    let len = bytes.len() / std::mem::size_of::<u128>();
    
    (0..len).map(move |idx|
        // Safety: offset is valid due to the calculation of len above
        UnalignedU128(unsafe { base.offset(idx as isize) }, PhantomData)
    )
}

fn foo(data: &mut [u8]) {
    for mut b in iter_as_u128(data) {
        b.write(0x112233445566778899AABBCCDDEEFF);
    }
}

2e71828 · March 12, 2021, 2:59pm

I don't believe that the Vec's allocation is guaranteed to be properly aligned here. You can see the error by passing a different slice from the vector:

fn foo(data: &mut [u8]) {
    let u128_slice: &mut [u128] = bytemuck::cast_slice_mut(data);
    for b in u128_slice{
        *b = 0x112233445566778899AABBCCDDEEFF;
    }
}
fn main(){
   let mut data:Vec<u8> = vec![0;35];
   foo(& mut data[1..]);
   for b in data {
       println!("{}", b);
   }
}

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.15s
     Running `target/debug/playground`
thread 'main' panicked at 'cast_slice_mut>TargetAlignmentGreaterAndInputNotAligned', /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/bytemuck-1.5.1/src/lib.rs:106:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

amarao · March 12, 2021, 3:09pm

That actually looks like platform-independent overkill. As far as I understand, Intel allows unaligned reads/writes (though it hurts your performance).

Cerber-Ursi · March 12, 2021, 3:19pm

And that's why they must not be implicit, I think.

cole-miller · March 12, 2021, 4:25pm

Creating an unaligned reference or dereferencing an unaligned raw pointer is UB at the language level even if the ISA would allow it, though.
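One std way to stay within those rules is slice::align_to_mut, which splits a byte slice into an unaligned prefix, a correctly aligned middle of u128, and a suffix. The sketch below is only an illustration (invert_bits is a made-up name), and it deliberately uses an operation that does not care where the 16-byte boundaries fall, because the aligned middle does not necessarily start at the first byte of data.

```rust
/// Hypothetical example: flip every bit in `data`, using u128-wide accesses
/// where the address allows it and plain byte accesses elsewhere.
pub fn invert_bits(data: &mut [u8]) {
    // Safety: every bit pattern is a valid u128, so viewing initialized bytes
    // as u128 is sound, and align_to_mut only puts correctly aligned elements
    // into `middle`.
    let (prefix, middle, suffix) = unsafe { data.align_to_mut::<u128>() };
    for b in prefix.iter_mut() {
        *b = !*b;
    }
    for w in middle.iter_mut() {
        *w = !*w;
    }
    for b in suffix.iter_mut() {
        *b = !*b;
    }
}
```

Calling invert_bits(&mut data[1..]) on a Vec<u8> is fine here: the misaligned leading bytes simply end up in the prefix instead of causing a panic or UB.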

2e71828 · March 12, 2021, 4:44pm

I sometimes wonder if a memory-page aligned variant of Vec would be useful. It might be able to help with efficient manipulation of large datasets. I've never run into issues along those lines, though, so I haven't looked into whether it would actually help.

mbrubeck · March 12, 2021, 5:56pm

If you know which platform(s) and allocator(s) your code will be running on, you can rely on properties of the allocator. For example, on platforms where the default allocator uses malloc, heap allocations are guaranteed to be suitably aligned for any built-in C type.

(However, if you are writing a library for use by third parties, or a binary that uses the system allocator and may run on arbitrary platforms, then you can't assume such things.)
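If you control where the buffer comes from, one way to avoid depending on the allocator at all is to allocate it as u128 and view it as bytes, since going from a larger alignment to a smaller one is always valid. A minimal sketch, with as_bytes_mut as a hypothetical helper name:

```rust
use std::mem::size_of;

/// Hypothetical helper: view a u128 slice as its underlying bytes.
pub fn as_bytes_mut(words: &mut [u128]) -> &mut [u8] {
    let len = words.len() * size_of::<u128>();
    // Safety: u8 has alignment 1, u128 has no padding bytes, and the returned
    // slice mutably borrows `words` for its whole lifetime.
    unsafe { std::slice::from_raw_parts_mut(words.as_mut_ptr().cast::<u8>(), len) }
}
```

A Vec<u128> created with vec![0u128; n] can then be handed to byte-oriented code through as_bytes_mut(&mut words), and the start of that byte slice is aligned for u128 by construction. As far as I know, bytemuck::cast_slice_mut covers this direction without any unsafe in your own code.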

SkiFire13 · March 12, 2021, 6:09pm

For a completely safe (and panic-free) version:

pub fn foo(data: &mut [u8]) {
    for b in data.chunks_exact_mut(std::mem::size_of::<u128>()) {
        b.copy_from_slice(&0x112233445566778899AABBCCDDEEFFu128.to_ne_bytes())
    }
}

It doesn't look like it generates more assembly on Compiler Explorer. (I switched the panic to std::process::exit to reduce noise in the generated assembly of the bytemuck version, since that shouldn't affect the happy path.)
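The same safe pattern extends to read-modify-write, which is closer to the original question of modifying each 16-byte step. This is only a sketch: map_u128_in_place and its closure parameter f are hypothetical names, and I have not checked what assembly it produces.

```rust
use std::mem::size_of;

/// Hypothetical helper: replace each native-endian u128 stored in `data`
/// with `f(old_value)`. Trailing bytes that do not fill a whole 16-byte
/// chunk are left untouched.
pub fn map_u128_in_place(data: &mut [u8], f: impl Fn(u128) -> u128) {
    for chunk in data.chunks_exact_mut(size_of::<u128>()) {
        // Copy the chunk into a properly typed array, so no u128 reference
        // into the (possibly unaligned) byte buffer is ever created.
        let mut bytes = [0u8; size_of::<u128>()];
        bytes.copy_from_slice(chunk);
        let old = u128::from_ne_bytes(bytes);
        chunk.copy_from_slice(&f(old).to_ne_bytes());
    }
}
```

For example, map_u128_in_place(&mut data[1..], |x| x + 1) increments every 16-byte value regardless of where the slice starts.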

scottmcm · March 12, 2021, 6:25pm

Nightly has some const-generics-using APIs that can do this elegantly, too:

#![feature(slice_as_chunks)]
pub fn foo(data: &mut [u8]) {
    let (chunks, remainder) = data.as_chunks_mut();
    assert_eq!(remainder, &[]);
    for b in chunks {
        *b = 0x112233445566778899AABBCCDDEEFFu128.to_ne_bytes();
    }
}

(I added an assert on the remainder being empty because I wasn't sure what was supposed to happen if the input wasn't a multiple of 16.)

H2CO3 · March 12, 2021, 6:46pm

FYI, you aren't allowed to do that in C either, unless the original pointer does indeed point to an allocated object of type long long long.

system · June 10, 2021, 6:46pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
