Hi, how can I reinterpret a [u8] as a [u64], so that I can later iterate over each u64 in that [u64]?
Basically, somewhat like in Go:
Might be related: Re-interpret slice of bytes (e.g. [u8]) as slice of [f32]
Is there any standard ways without 3rd party crates?
Without third-party crates, you'd use unsafe code to get a *const u64 pointing to the head of the buffer and then call from_raw_parts to build the resulting slice. You'll need checks for alignment and length, at least.
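A minimal sketch of that approach (my own, untested code; as_u64_slice is a name I made up, not anything from std) might look like:

```rust
use std::mem;

// Sketch of the unsafe approach described above: verify alignment and
// length, then build a &[u64] view over the same bytes.
fn as_u64_slice(bytes: &[u8]) -> Option<&[u64]> {
    let ptr = bytes.as_ptr();
    // The pointer must be aligned for u64 and the length a multiple of 8.
    if ptr.align_offset(mem::align_of::<u64>()) != 0 || bytes.len() % 8 != 0 {
        return None;
    }
    // SAFETY: alignment and length were checked above.
    Some(unsafe { std::slice::from_raw_parts(ptr as *const u64, bytes.len() / 8) })
}

fn main() {
    // Start from u64 data so the backing storage is guaranteed to be aligned.
    let data: [u64; 2] = [1, 2];
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(data.as_ptr() as *const u8, mem::size_of_val(&data))
    };
    assert_eq!(as_u64_slice(bytes), Some(&data[..]));
    println!("{:?}", as_u64_slice(bytes));
}
```

Returning Option instead of panicking lets the caller fall back to a byte-wise path when the buffer happens to be misaligned.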
If you are ok with an iterator instead of a slice as output, you can do something like this (untested):
for dword in buffer.chunks_exact(8).map(|chunk| u64::from_ne_bytes(chunk.try_into().unwrap())) {
    ...
}
Since you do an operation on each of the numbers anyway, you might as well play it safe and use one of from_be_bytes, from_le_bytes, or (beware the endianness bugs) from_ne_bytes.
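For example, a safe little-endian variant could look like this (u64s_le is a hypothetical helper of my own, not anything from the thread):

```rust
// Hypothetical helper: decode a byte buffer as little-endian u64 words.
// chunks_exact(8) silently drops any trailing bytes that don't fill a chunk.
fn u64s_le(buffer: &[u8]) -> impl Iterator<Item = u64> + '_ {
    buffer
        .chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
}

fn main() {
    let buf = [1u8, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 99];
    let words: Vec<u64> = u64s_le(&buf).collect();
    // The trailing 99 does not fill a full chunk and is skipped.
    assert_eq!(words, vec![1, 2]);
    println!("{:?}", words);
}
```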
Hi, all. I figured out the following solution:
P.S. I need a high-performance solution
#[test]
fn unaligned_u8_to_u64_access() {
    // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
    let s = "abcdABCDefghEFGHijklIJKL1234567";
    let bytes = s.as_bytes();
    let ptr = bytes.as_ptr() as *const u64;
    let n = bytes.len() / 8;
    for i in 0..n {
        let v = unsafe { *ptr.add(i) };
        // NOTE: you have to handle endianness by yourself
        println!("{:#018x}", v);
    }
}
0x4443424164636261
0x4847464568676665
0x4c4b4a496c6b6a69
That code is wrong. You have to use read_unaligned.
let v = unsafe { std::ptr::read_unaligned(ptr.add(i)) };
If you don't have a guarantee that the array is aligned, then you should consider using a type like this:
#[derive(Copy, Clone)]
#[repr(transparent)]
struct UnalignedU64 {
    value: [u8; 8],
}

impl UnalignedU64 {
    pub fn new(value: u64) -> Self {
        Self {
            value: u64::to_ne_bytes(value),
        }
    }

    pub fn to_u64(self) -> u64 {
        u64::from_ne_bytes(self.value)
    }

    pub fn from_u8_array(array: &[u8]) -> &[UnalignedU64] {
        let len = array.len() / 8;
        let ptr = array.as_ptr() as *const UnalignedU64;
        unsafe { std::slice::from_raw_parts(ptr, len) }
    }
}
*ptr.add(i) offsets the pointer by i elements and then dereferences it. I wonder why it's incorrect?
https://doc.rust-lang.org/reference/types/pointer.html#raw-pointers-const-and-mut
If the pointer is not properly aligned, then you are not allowed to dereference it.
There are several reasons for this:
If you are not familiar with alignment: it means that the address must be divisible by the alignment. In the case of u64, the alignment is eight, but an address such as 0x4443424164636261 is not divisible by eight, so such a pointer is not sufficiently aligned.
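To make this concrete, here is a small sketch of my own showing a deliberately misaligned pointer being read with read_unaligned, which is valid at any address:

```rust
fn main() {
    let bytes: [u8; 9] = [0, 1, 0, 0, 0, 0, 0, 0, 0];
    // Offsetting by one byte almost certainly misaligns the pointer for u64.
    let ptr = unsafe { bytes.as_ptr().add(1) } as *const u64;
    // Plain `*ptr` would be UB whenever `ptr` is misaligned;
    // read_unaligned is allowed at any address.
    let v = unsafe { ptr.read_unaligned() };
    assert_eq!(v, u64::from_ne_bytes([1, 0, 0, 0, 0, 0, 0, 0]));
    println!("{:#018x}", v);
}
```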
OK, thx. I got it.
Hi, if I replace the wrong code with let v = unsafe { ptr.add(0).read_unaligned() }; then it should be correct and working, am I right?
You probably want .add(i)
, but otherwise yes.
Yeah, cool! Damn typo! I literally just need the *const u64 for my use case. Thx anyway.
FYI, final code:
#[cfg(test)]
mod tests {
    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        let bytes = s.as_bytes();
        let ptr = bytes.as_ptr() as *const u64;
        for i in (0..=bytes.len() - 8).step_by(8) {
            // You have to use an unaligned read instead of dereferencing directly;
            // see: https://doc.rust-lang.org/nomicon/what-unsafe-does.html
            let v = unsafe { ptr.add(i >> 3).read_unaligned() };
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
}
Thank you all, very inspiring!
For the record, I still think your code would be easier to read using my struct from before:
#[cfg(test)]
mod tests {
    use super::UnalignedU64;

    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        for v in UnalignedU64::from_u8_array(s.as_bytes()) {
            let v = v.to_u64();
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
}
This should compile to something just as fast as a version using read_unaligned.
There is no point in doing it like this. It's also dangerous since you don't take target endianness into account (where is that slice coming from anyway?).
The proper way to do this in Rust is to explicitly iterate over the chunks of the slice and convert them explicitly. The function from your example would look like this:
pub fn check_ascii(bytes: &[u8]) -> Result<(), usize> {
    use std::mem;

    const OVERFLOW64: u64 = 0x8080808080808080;
    const OVERFLOW32: u32 = 0x80808080;
    const RUNE_SELF: u8 = 0x80;

    let mut chunk_start: usize = 0;
    while let Some(chunk_end) = chunk_start.checked_add(mem::size_of::<u64>()) {
        if let Some(chunk) = bytes.get(chunk_start..chunk_end) {
            let chunk_u64 = u64::from_le_bytes(chunk.try_into().expect("array size mismatch"));
            if OVERFLOW64 & chunk_u64 != 0 {
                return Err(chunk_start);
            }
        } else {
            break;
        }
        chunk_start += mem::size_of::<u64>();
    }
    while let Some(chunk_end) = chunk_start.checked_add(mem::size_of::<u32>()) {
        if let Some(chunk) = bytes.get(chunk_start..chunk_end) {
            let chunk_u32 = u32::from_le_bytes(chunk.try_into().expect("array size mismatch"));
            if OVERFLOW32 & chunk_u32 != 0 {
                return Err(chunk_start);
            }
        } else {
            break;
        }
        chunk_start += mem::size_of::<u32>();
    }
    while let Some(byte) = bytes.get(chunk_start) {
        if *byte >= RUNE_SELF {
            return Err(chunk_start);
        }
        chunk_start += 1;
    }
    Ok(())
}
This is essentially a direct translation of the Go algorithm. Note that we explicitly handle the possibility of overflow while incrementing chunk_start (even though it is extremely unlikely). Also, <&[u8]>::len returns usize, unlike the Go len function, which returns int (isize in Rust terminology). This also means that we cannot use the checks via subtraction (they would panic on short slices). For the same reason we return usize rather than isize.
The return type is also Result<(), usize> rather than (bool, usize). This means that the result of calling check_ascii must be used (otherwise a warning is issued). We also cannot confuse the pass and fail cases, and we don't need to return a dummy index 0 in case of a valid string.
We explicitly convert the values from little-endian (u64::from_le_bytes, and similarly for u32). Since most modern processors are little-endian, this is essentially a no-op there.
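A tiny illustration (my own) of why the requested byte order matters:

```rust
fn main() {
    let bytes = [0x01u8, 0, 0, 0, 0, 0, 0, 0];
    // The same eight bytes decode to different values depending on the
    // byte order you ask for.
    assert_eq!(u64::from_le_bytes(bytes), 0x1);
    assert_eq!(u64::from_be_bytes(bytes), 0x0100_0000_0000_0000);
    // from_ne_bytes matches whichever order the target uses, which is
    // exactly how endianness bugs slip in.
    println!("{:#x}", u64::from_ne_bytes(bytes));
}
```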
Personally, in this specific example I would just write the function using the simple iterator, and expect the compiler to do its autovectorization magic. That's a very simple case which is extremely likely to be compiled efficiently.
pub fn check_ascii(bytes: &[u8]) -> Result<(), usize> {
    const RUNE_SELF: u8 = 0x80;
    if let Some(pos) = bytes.iter().position(|x| *x >= RUNE_SELF) {
        Err(pos)
    } else {
        Ok(())
    }
}
Although, actually checking it with Godbolt shows that the compiler didn't emit any vectorization or word-wide iteration, even on a recent Intel processor. This is somewhat surprising. Is it a missed optimization? Did I not specify some required compiler flag?
Anyway, if you want to see a really optimized version of a function similar to the one above, check the <[u8]>::is_ascii() function source in the stdlib.