How can I reinterpret [u8] to [u64]?

Hi, how can I reinterpret a [u8] as a [u64], so that I can later iterate over each u64 in the resulting [u64]?

Basically, somewhat like in Go:

Might be related: Re-interpret slice of bytes (e.g. [u8]) as slice of [f32]

bytemuck::try_cast_slice and friends

Is there a standard way to do this without third-party crates?

You can just see how it is implemented, if you want.

Without third-party crates, you'd use unsafe code to get a *const u64 pointing to the head of the buffer and then call from_raw_parts to build the resulting slice. You'll need checks for alignment and length, at least.
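
As a rough sketch of that unsafe route (bytes_as_u64s is a hypothetical name, not a standard API), the alignment and length checks might look like the following. It returns None instead of panicking when the buffer cannot be viewed as u64s, and the values come out in native byte order:

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical helper: view `bytes` as `&[u64]` in native byte order.
/// Returns `None` if the buffer is misaligned for u64 or if its length
/// is not a multiple of the size of u64.
fn bytes_as_u64s(bytes: &[u8]) -> Option<&[u64]> {
    let ptr = bytes.as_ptr();
    // Alignment check: the address must be a multiple of align_of::<u64>().
    if (ptr as usize) % align_of::<u64>() != 0 {
        return None;
    }
    // Length check: no partial word may be left over at the end.
    if bytes.len() % size_of::<u64>() != 0 {
        return None;
    }
    let len = bytes.len() / size_of::<u64>();
    // SAFETY: the pointer is non-null and aligned for u64, `len` words cover
    // exactly the bytes of the input slice, every bit pattern is a valid u64,
    // and the returned slice borrows from `bytes`.
    Some(unsafe { slice::from_raw_parts(ptr as *const u64, len) })
}
```

A caller has to decide what to do in the None case, for example by falling back to a copying conversion.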

If you are ok with an iterator instead of a slice as output, you can do something like this (untested):

for dword in buffer.chunks_exact(8).map(|chunk| u64::from_ne_bytes(chunk.try_into().unwrap())) {
   ...
}

Since you do an operation on each of the numbers you might as well play safe and use one of from_be_bytes, from_le_bytes, or (beware the endianness bugs) from_ne_bytes:
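
For instance, a minimal sketch assuming the bytes hold little-endian values (le_u64s is a hypothetical helper name, and trailing bytes that do not fill a whole chunk are ignored):

```rust
use std::convert::TryInto;

/// Hypothetical helper: decode `bytes` as little-endian u64 values,
/// ignoring trailing bytes that do not fill a whole 8-byte chunk.
fn le_u64s(bytes: &[u8]) -> impl Iterator<Item = u64> + '_ {
    bytes
        .chunks_exact(8)
        // chunks_exact only yields 8-byte chunks, so the conversion cannot fail.
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
}
```

Swapping in from_be_bytes gives the big-endian interpretation of the same bytes. Either way the result is the same on every target, which is not true of from_ne_bytes.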

There is <[T]>::align_to() as well.
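
A small sketch of how align_to might be used here (has_high_bit is a hypothetical example function). The slice is split into an unaligned prefix, an aligned middle of u64 words, and a suffix. The code must not rely on where the split falls, so all three parts are handled:

```rust
/// Hypothetical helper: returns true if any byte has its high bit set,
/// scanning the aligned middle part eight bytes at a time.
fn has_high_bit(bytes: &[u8]) -> bool {
    // SAFETY: every bit pattern is a valid u64, so viewing initialized bytes
    // as u64 words is sound; align_to takes care of the alignment.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u64>() };
    prefix.iter().any(|b| b & 0x80 != 0)
        // The mask is the same in every byte, so endianness does not matter here.
        || words.iter().any(|w| w & 0x8080_8080_8080_8080 != 0)
        || suffix.iter().any(|b| b & 0x80 != 0)
}
```

The words in the middle slice are in native byte order, so any computation that depends on byte positions still has to handle endianness itself.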

Hi, all. I figured out the following solution:

P.S. I need a high-performance solution

    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        let bytes = s.as_bytes();
        let ptr = bytes.as_ptr() as *const u64;
        let n = bytes.len() / 8;
        for i in 0..n {
            let v = unsafe { *ptr.add(i) };
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
0x4443424164636261
0x4847464568676665
0x4c4b4a496c6b6a69

That code is wrong. You have to use read_unaligned.

let v = unsafe { std::ptr::read_unaligned(ptr.add(i)) };

If you don't have a guarantee that the array is aligned, then you should consider using a type like this:

#[derive(Copy, Clone)]
#[repr(transparent)]
struct UnalignedU64 {
    value: [u8; 8],
}

impl UnalignedU64 {
    pub fn new(value: u64) -> Self {
        Self {
            value: u64::to_ne_bytes(value)
        }
    }
    pub fn to_u64(self) -> u64 {
        u64::from_ne_bytes(self.value)
    }
    
    pub fn from_u8_array(array: &[u8]) -> &[UnalignedU64] {
        let len = array.len() / 8;
        let ptr = array.as_ptr() as *const UnalignedU64;
        unsafe {
            std::slice::from_raw_parts(ptr, len)
        }
    }
}

*ptr.add(i) gets the pointer at index i and then dereferences it. I wonder why that is incorrect?

https://doc.rust-lang.org/reference/types/pointer.html#raw-pointers-const-and-mut

If the pointer is not properly aligned, then you are not allowed to dereference it.

There are several reasons for this:

  1. The compiler will in some situations make optimizations whose math is incorrect if the pointer is not aligned.
  2. Some types of CPUs will crash if you make a read with insufficient alignment.

If you are not familiar with alignment, then it means that the address must be divisible by the alignment. In the case of u64, the alignment is typically eight, but the address of a byte buffer such as the one returned by as_bytes() is not guaranteed to be divisible by eight, so the pointer may not be sufficiently aligned.
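
To make the divisibility rule concrete, here is a minimal sketch (is_aligned_for_u64 is a hypothetical helper name) that tests whether an address is suitably aligned for a u64:

```rust
use std::mem::align_of;

/// Hypothetical helper: true if `ptr` satisfies the alignment requirement
/// of u64, i.e. its address is a multiple of align_of::<u64>().
fn is_aligned_for_u64(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<u64>() == 0
}
```

A pointer to the start of a [u64] array passes this check, while the same pointer moved forward by one byte does not.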

OK, thx. I got it.

Hi, if I replace the wrong code with let v = unsafe { ptr.add(0).read_unaligned() };, it should be correct and working then, am I right?

You probably want .add(i), but otherwise yes.

Yeah, cool! damn typo! I just need the *const u64 for my use case literally. thx anyway

FYI, final code:

#[cfg(test)]
mod tests {
    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        let bytes = s.as_bytes();
        let ptr = bytes.as_ptr() as *const u64;
        // Iterate over whole 8-byte words; `len / 8` cannot underflow on short input
        for i in 0..bytes.len() / 8 {
            // You have to use unaligned read instead of dereference directly
            // see: https://doc.rust-lang.org/nomicon/what-unsafe-does.html
            let v = unsafe { ptr.add(i).read_unaligned() };
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
}

Thank you all, very inspiring!

For the record, I still think your code would be easier to read using my struct from before:

#[cfg(test)]
mod tests {
    use super::UnalignedU64;

    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        
        for v in UnalignedU64::from_u8_array(s.as_bytes()) {
            let v = v.to_u64();
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
}

This should compile to something just as fast as something using read_unaligned.

There is no point in doing it like this. It's also dangerous since you don't take target endianness into account (where is that slice coming from anyway?).

The proper way to do this in Rust is to iterate over the chunks of the slice and convert each one explicitly. The function from your example would look like this:

pub fn check_ascii(bytes: &[u8]) -> Result<(), usize> {
    use std::mem;

    const OVERFLOW64: u64 = 0x8080808080808080;
    const OVERFLOW32: u32 = 0x80808080;
    const RUNE_SELF: u8 = 0x80;

    let mut chunk_start: usize = 0;

    while let Some(chunk_end) = chunk_start.checked_add(mem::size_of::<u64>()) {
        if let Some(chunk) = bytes.get(chunk_start..chunk_end) {
            let chunk_u64 = u64::from_le_bytes(chunk.try_into().expect("array size mismatch"));
            if OVERFLOW64 & chunk_u64 != 0 {
                return Err(chunk_start);
            }
        } else { break; }
        chunk_start += mem::size_of::<u64>();
    }

    while let Some(chunk_end) = chunk_start.checked_add(mem::size_of::<u32>()) {
        if let Some(chunk) = bytes.get(chunk_start..chunk_end) {
            let chunk_u32 = u32::from_le_bytes(chunk.try_into().expect("array size mismatch"));
            if OVERFLOW32 & chunk_u32 != 0 {
                return Err(chunk_start);
            }
        } else { break; }
        chunk_start += mem::size_of::<u32>();
    }

    while let Some(byte) = bytes.get(chunk_start) {
        if *byte >= RUNE_SELF {
            return Err(chunk_start);
        }

        chunk_start += 1;
    }

    Ok(())
}

This is essentially a direct translation of the Go algorithm. Note that we explicitly handle the possibility of overflow while incrementing chunk_start (even though it is extremely unlikely). Also, <[u8]>::len returns usize, unlike Go's len function, which returns int (isize in Rust terminology). This also means that we cannot do the bounds checks via subtraction: on short slices the subtraction would underflow, which panics in debug builds and wraps around in release builds. For the same reason we return usize rather than isize.

The return type is also Result<(), usize> rather than (bool, usize). This means that the result of calling check_ascii must be used (otherwise a warning is issued). We also cannot confuse the pass and fail cases, and we don't need to return a dummy index 0 in case of a valid string.

We explicitly convert the values from little-endian (u64::from_le_bytes, similarly for u32). Since most processors in common use today are little-endian, this is essentially a no-op on those targets.

Personally, in this specific example I would just write the function using the simple iterator, and expect the compiler to do its autovectorization magic. That's a very simple case which is extremely likely to be compiled efficiently.

pub fn check_ascii(bytes: &[u8]) -> Result<(), usize> {
    const RUNE_SELF: u8 = 0x80;

    if let Some(pos) = bytes.iter().position(|x| *x >= RUNE_SELF) {
        Err(pos)
    } else { Ok(()) }
}

Although, actually checking it with Godbolt shows that the compiler didn't emit any vectorization or word-wide iteration, even on a recent Intel processor. This is somewhat surprising. Is it a missed optimization? Did I not specify some required compiler flag?

Anyway, if you want to see a really optimized version of a function similar to the one above, check the source of <[u8]>::is_ascii() in the stdlib.
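
One way to lean on that function while keeping the same return type is sketched below (check_ascii_fast is a hypothetical name). It assumes the all-ASCII case is the common one, so the position of the offending byte is only searched for on failure:

```rust
/// Hypothetical variant of check_ascii: use the standard library's is_ascii
/// for the all-ASCII case and locate the offending byte only on failure.
fn check_ascii_fast(bytes: &[u8]) -> Result<(), usize> {
    if bytes.is_ascii() {
        return Ok(());
    }
    // At least one byte is >= 0x80 here, so position is expected to find it.
    match bytes.iter().position(|b| !b.is_ascii()) {
        Some(pos) => Err(pos),
        None => Ok(()),
    }
}
```

Unlike the chunked translation above, this reports the exact index of the first non-ASCII byte rather than the start of the chunk that contains it.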
