Hi, how can I reinterpret a [u8] as a [u64], so that I can later iterate over each u64 in that [u64]?
Basically, somewhat like in Go:
Might be related: Re-interpret slice of bytes (e.g. [u8]) as slice of [f32]
Is there any standard ways without 3rd party crates?
Without third-party crates, you'd use unsafe code to get a *const u64 pointing to the head of the buffer and then call from_raw_parts to build the resulting slice. You'll need checks for alignment and length, at least.
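A minimal sketch of that approach (my own, untested code; as_u64_slice is a name I made up, not anything from std) might look like:

```rust
use std::mem;

// Sketch of the unsafe approach described above: verify alignment and
// length, then build a &[u64] view over the same bytes.
fn as_u64_slice(bytes: &[u8]) -> Option<&[u64]> {
    let ptr = bytes.as_ptr();
    // The pointer must be aligned for u64 and the length a multiple of 8.
    if ptr.align_offset(mem::align_of::<u64>()) != 0 || bytes.len() % 8 != 0 {
        return None;
    }
    // SAFETY: alignment and length were checked above.
    Some(unsafe { std::slice::from_raw_parts(ptr as *const u64, bytes.len() / 8) })
}

fn main() {
    // Start from u64 data so the backing storage is guaranteed to be aligned.
    let data: [u64; 2] = [1, 2];
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(data.as_ptr() as *const u8, mem::size_of_val(&data))
    };
    assert_eq!(as_u64_slice(bytes), Some(&data[..]));
    println!("{:?}", as_u64_slice(bytes));
}
```

Returning Option instead of panicking lets the caller fall back to a byte-wise path when the buffer happens to be misaligned.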
If you are ok with an iterator instead of a slice as output, you can do something like this (untested):
for dword in buffer.chunks_exact(8).map(|chunk| u64::from_ne_bytes(chunk.try_into().unwrap())) {
    ...
}
Since you do an operation on each of the numbers anyway, you might as well play it safe and use one of from_be_bytes, from_le_bytes, or (beware the endianness bugs) from_ne_bytes.
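For example, a safe little-endian variant could look like this (u64s_le is a hypothetical helper of my own, not anything from the thread):

```rust
// Hypothetical helper: decode a byte buffer as little-endian u64 words.
// chunks_exact(8) silently drops any trailing bytes that don't fill a chunk.
fn u64s_le(buffer: &[u8]) -> impl Iterator<Item = u64> + '_ {
    buffer
        .chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
}

fn main() {
    let buf = [1u8, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 99];
    let words: Vec<u64> = u64s_le(&buf).collect();
    // The trailing 99 does not fill a full chunk and is skipped.
    assert_eq!(words, vec![1, 2]);
    println!("{:?}", words);
}
```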
Hi, all. I figured out the following solution:
P.S. I need a high-performance solution
#[test]
fn unaligned_u8_to_u64_access() {
    // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
    let s = "abcdABCDefghEFGHijklIJKL1234567";
    let bytes = s.as_bytes();
    let ptr = bytes.as_ptr() as *const u64;
    let n = bytes.len() / 8;
    for i in 0..n {
        let v = unsafe { *ptr.add(i) };
        // NOTE: you have to handle endianness by yourself
        println!("{:#018x}", v);
    }
}
0x4443424164636261
0x4847464568676665
0x4c4b4a496c6b6a69
That code is wrong. You have to use read_unaligned.
let v = unsafe { std::ptr::read_unaligned(ptr.add(i)) };
If you don't have a guarantee that the array is aligned, then you should consider using a type like this:
#[derive(Copy, Clone)]
#[repr(transparent)]
struct UnalignedU64 {
    value: [u8; 8],
}

impl UnalignedU64 {
    pub fn new(value: u64) -> Self {
        Self {
            value: u64::to_ne_bytes(value),
        }
    }

    pub fn to_u64(self) -> u64 {
        u64::from_ne_bytes(self.value)
    }

    pub fn from_u8_array(array: &[u8]) -> &[UnalignedU64] {
        let len = array.len() / 8;
        let ptr = array.as_ptr() as *const UnalignedU64;
        unsafe { std::slice::from_raw_parts(ptr, len) }
    }
}
*ptr.add(i) offsets the pointer by i elements and then dereferences it. I wonder why it's incorrect?
https://doc.rust-lang.org/reference/types/pointer.html#raw-pointers-const-and-mut
If the pointer is not properly aligned, then you are not allowed to dereference it.
There are several reasons for this:
If you are not familiar with alignment: it means that the address must be divisible by the alignment. In the case of u64, the alignment is eight, but an address such as 0x4443424164636261 is not divisible by eight, so such a pointer is not sufficiently aligned.
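To make this concrete, here is a small sketch of my own showing a deliberately misaligned pointer being read with read_unaligned, which is valid at any address:

```rust
fn main() {
    let bytes: [u8; 9] = [0, 1, 0, 0, 0, 0, 0, 0, 0];
    // Offsetting by one byte almost certainly misaligns the pointer for u64.
    let ptr = unsafe { bytes.as_ptr().add(1) } as *const u64;
    // Plain `*ptr` would be UB whenever `ptr` is misaligned;
    // read_unaligned is allowed at any address.
    let v = unsafe { ptr.read_unaligned() };
    assert_eq!(v, u64::from_ne_bytes([1, 0, 0, 0, 0, 0, 0, 0]));
    println!("{:#018x}", v);
}
```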
OK, thx. I got it.
Hi, if I replace the wrong code with let v = unsafe { ptr.add(0).read_unaligned() }; then it should be correct and working, am I right?
You probably want .add(i)
, but otherwise yes.
Yeah, cool! Damn typo! I literally just need the *const u64 for my use case. Thx anyway.
FYI, final code:
#[cfg(test)]
mod tests {
    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        let bytes = s.as_bytes();
        let ptr = bytes.as_ptr() as *const u64;
        for i in (0..=bytes.len() - 8).step_by(8) {
            // You have to use an unaligned read instead of dereferencing directly;
            // see: https://doc.rust-lang.org/nomicon/what-unsafe-does.html
            let v = unsafe { ptr.add(i >> 3).read_unaligned() };
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
}
Thank you all, very inspiring!
For the record, I still think your code would be easier to read using my struct from before:
#[cfg(test)]
mod tests {
    use super::UnalignedU64;

    #[test]
    fn unaligned_u8_to_u64_access() {
        // https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html#examples
        let s = "abcdABCDefghEFGHijklIJKL1234567";
        for v in UnalignedU64::from_u8_array(s.as_bytes()) {
            let v = v.to_u64();
            // NOTE: you have to handle endianness by yourself
            println!("{:#018x}", v);
        }
    }
}
This should compile to something just as fast as a version using read_unaligned.
There is no point in doing it like this. It's also dangerous since you don't take target endianness into account (where is that slice coming from anyway?).
The proper way to do this in Rust is to explicitly iterate over the chunks of the slice and convert them explicitly. The function from your example would look like this:
pub fn check_ascii(bytes: &[u8]) -> Result<(), usize> {
    use std::mem;

    const OVERFLOW64: u64 = 0x8080808080808080;
    const OVERFLOW32: u32 = 0x80808080;
    const RUNE_SELF: u8 = 0x80;

    let mut chunk_start: usize = 0;
    while let Some(chunk_end) = chunk_start.checked_add(mem::size_of::<u64>()) {
        if let Some(chunk) = bytes.get(chunk_start..chunk_end) {
            let chunk_u64 = u64::from_le_bytes(chunk.try_into().expect("array size mismatch"));
            if OVERFLOW64 & chunk_u64 != 0 {
                return Err(chunk_start);
            }
        } else {
            break;
        }
        chunk_start += mem::size_of::<u64>();
    }
    while let Some(chunk_end) = chunk_start.checked_add(mem::size_of::<u32>()) {
        if let Some(chunk) = bytes.get(chunk_start..chunk_end) {
            let chunk_u32 = u32::from_le_bytes(chunk.try_into().expect("array size mismatch"));
            if OVERFLOW32 & chunk_u32 != 0 {
                return Err(chunk_start);
            }
        } else {
            break;
        }
        chunk_start += mem::size_of::<u32>();
    }
    while let Some(byte) = bytes.get(chunk_start) {
        if *byte >= RUNE_SELF {
            return Err(chunk_start);
        }
        chunk_start += 1;
    }
    Ok(())
}
This is essentially a direct translation of the Go algorithm. Note that we explicitly handle the possibility of overflow while incrementing chunk_start (even though it is extremely unlikely). Also, <&[u8]>::len returns usize, unlike the Go len function, which returns int (isize in Rust terminology). This also means that we cannot use the checks via subtraction (they would panic on short slices). For the same reason we return usize rather than isize.
The return type is also Result<(), usize> rather than (bool, usize). This means that the result of calling check_ascii must be used (otherwise a warning is issued). We also cannot confuse the pass and fail cases, and we don't need to return a dummy index 0 in case of a valid string.
We explicitly convert the values from little-endian (u64::from_le_bytes, and similarly for u32). Since most modern processors are little-endian, this is essentially a no-op there.
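A tiny illustration (my own) of why the requested byte order matters:

```rust
fn main() {
    let bytes = [0x01u8, 0, 0, 0, 0, 0, 0, 0];
    // The same eight bytes decode to different values depending on the
    // byte order you ask for.
    assert_eq!(u64::from_le_bytes(bytes), 0x1);
    assert_eq!(u64::from_be_bytes(bytes), 0x0100_0000_0000_0000);
    // from_ne_bytes matches whichever order the target uses, which is
    // exactly how endianness bugs slip in.
    println!("{:#x}", u64::from_ne_bytes(bytes));
}
```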
Personally, in this specific example I would just write the function using the simple iterator, and expect the compiler to do its autovectorization magic. That's a very simple case which is extremely likely to be compiled efficiently.
pub fn check_ascii(bytes: &[u8]) -> Result<(), usize> {
    const RUNE_SELF: u8 = 0x80;
    if let Some(pos) = bytes.iter().position(|x| *x >= RUNE_SELF) {
        Err(pos)
    } else {
        Ok(())
    }
}
Although, actually checking it with Godbolt shows that the compiler didn't emit any vectorization or word-wide iteration, even on a recent Intel processor. This is somewhat surprising. Is it a missed optimization? Did I not specify some required compiler flag?
Anyway, if you want to see a really optimized version of a function similar to the one above, check the <[u8]>::is_ascii() function source in the stdlib.