I'm reading binary data from a file. When I print the bytes they are correct, but when I transmute them into an `i64`, the result is garbage. What am I doing wrong?
```rust
fn main() {
    let v: Vec<u8> = vec![61, 0, 0, 0, 0, 0, 0, 0];
    let data: &[u8] = v.as_slice();
    let x = unsafe { std::mem::transmute_copy::<&[u8], i64>(&data) } as i64;
    println!("{:x?}", data);
    println!("{:x?}", x);
    println!("{}", x);
    assert_eq!(x, 61);
}
```
You are transmuting a 4-byte vector and whatever follows it in memory into an 8-byte integer. You are also not taking into account whether your machine's byte order is big-endian (at some granularity) or little-endian.
Sorry, that was a mistake in this example; in the real code it's 8 bytes. Also, don't worry about endianness, it's correct: it's little-endian, or am I mistaken?
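To check which byte order the target actually uses, and to see how much the result depends on it, here is a small sketch using the standard `cfg!(target_endian = "little")` check and the `i64::from_le_bytes`/`i64::from_be_bytes` constructors on the bytes from the question:

```rust
fn main() {
    // The bytes from the question: 61 in the lowest-addressed byte.
    let bytes: [u8; 8] = [61, 0, 0, 0, 0, 0, 0, 0];

    // Compile-time check of the target's native byte order.
    if cfg!(target_endian = "little") {
        println!("this target is little-endian");
    } else {
        println!("this target is big-endian");
    }

    // The same bytes give very different values depending on the byte order.
    let le = i64::from_le_bytes(bytes);
    let be = i64::from_be_bytes(bytes);
    println!("read as little-endian: {le} ({le:#x})");
    println!("read as big-endian:    {be} ({be:#x})");
    assert_eq!(le, 61);
    assert_eq!(be, 61 << 56);
}
```

Using the explicit `from_le_bytes` form makes the intended file format independent of the machine the program runs on.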
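You are transmuting the pointer, not the value: `&data` points at the slice reference itself, and a `&[u8]` is a fat pointer (address plus length), so `transmute_copy` copies part of that pointer, in practice the address, rather than the bytes it points to. And since the element count changes (eight `u8`s versus one `i64`), I don't think a transmute would account for the length anyway.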
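A minimal sketch of reading the value instead of the reference, assuming native byte order is what you want: take the pointer to the slice's data with `as_ptr()` and read through it with `std::ptr::read_unaligned`, which does not require the pointer to be aligned. The helper name `read_first_i64` is made up for this example.

```rust
use std::mem::size_of;
use std::ptr;

/// Reads the first 8 bytes of `data` as a native-endian i64.
/// Returns None if the slice is too short.
fn read_first_i64(data: &[u8]) -> Option<i64> {
    if data.len() < size_of::<i64>() {
        return None;
    }
    // SAFETY: at least 8 bytes are readable from `data.as_ptr()`, every bit
    // pattern is a valid i64, and `read_unaligned` tolerates any alignment.
    Some(unsafe { ptr::read_unaligned(data.as_ptr() as *const i64) })
}

fn main() {
    let v: Vec<u8> = vec![61, 0, 0, 0, 0, 0, 0, 0];
    let data: &[u8] = v.as_slice();

    // `data.as_ptr()` points at the bytes, not at the slice reference.
    let x = read_first_i64(data).expect("need at least 8 bytes");
    println!("{x}");

    // read_unaligned uses the native byte order, so it matches from_ne_bytes.
    assert_eq!(x, i64::from_ne_bytes([61, 0, 0, 0, 0, 0, 0, 0]));
}
```

On a little-endian machine this yields 61; on a big-endian one it yields the big-endian interpretation of the same bytes.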
This code has an alignment issue.
Thanks for the answer. Apart from endianness, are you saying that the answer from @leudz is wrong or can produce invalid results? I took his answer and I'm currently using it.
Why would Rust make any assumptions about alignment if I'm providing a raw array of bytes?
As long as it is treated as an array of bytes, you're fine. It is when you start reading multi-byte data from it through pointer casting that the data must be suitably aligned in RAM.
IIRC, the reason is that there is hardware in the wild where unaligned memory accesses must go through either special instructions or a special codegen path that are much slower than those for aligned reads. By requiring aligned pointers, Rust makes sure that binaries can use faster aligned access instructions on those architectures.
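Specifically, the code has a `from_raw_parts` call that creates a `&[i64]` from a potentially misaligned `*const i64` pointer (obtained by casting the `*const u8` pointer). Creating a slice from a misaligned pointer is undefined behavior, even on hardware that tolerates unaligned loads.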
You're right, there are instructions that only work on aligned data; I've worked on emulators before and had to implement them. But the usual solution is to check, e.g., whether `data & 3 != 0` for 4-byte alignment, rather than to assume that everything is aligned. That would be a very expensive assumption. Rust checks array and vector bounds on every access by default, which is a small performance hit; by comparison, doing that check once before deciding whether an array is aligned is definitely not a performance hit.
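The check-first approach described here can be sketched like this: test the address once, and only call `from_raw_parts` when the pointer is aligned for `i64` and the length is a whole number of `i64`s. The function name `as_i64_slice` and the `Aligned` wrapper type are hypothetical, used only to get a buffer whose alignment is known.

```rust
use std::mem::{align_of, size_of};

/// Views `bytes` as native-endian i64 values, but only when that is sound:
/// the start address must be a multiple of align_of::<i64>() and the length
/// a whole number of i64s. Otherwise returns None so the caller can fall back.
fn as_i64_slice(bytes: &[u8]) -> Option<&[i64]> {
    let addr = bytes.as_ptr() as usize;
    // Generalization of `data & 3 != 0`: alignments are powers of two,
    // so `align - 1` masks the low address bits that must be zero.
    let misaligned = (addr & (align_of::<i64>() - 1)) != 0;
    let whole = bytes.len() % size_of::<i64>() == 0;
    if misaligned || !whole {
        return None;
    }
    // SAFETY: the pointer is aligned for i64, the slice covers exactly
    // len / 8 i64 values inside `bytes`, any bit pattern is a valid i64,
    // and the returned slice borrows `bytes`, so it cannot outlive it.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const i64, bytes.len() / size_of::<i64>())
    })
}

/// A 16-byte buffer whose start is guaranteed to be 8-byte aligned.
#[repr(C, align(8))]
struct Aligned([u8; 16]);

fn main() {
    let mut buf = Aligned([0; 16]);
    buf.0[0] = 61;

    // The buffer starts on an 8-byte boundary, so the view is allowed.
    // Values are native-endian, so the first one is 61 only on little-endian targets.
    println!("{:?}", as_i64_slice(&buf.0));

    // Starting one byte in breaks the alignment, so we get None instead of UB.
    assert!(as_i64_slice(&buf.0[1..9]).is_none());
}
```

The alignment check runs once per buffer rather than once per element; when it returns `None`, a byte-wise conversion such as `chunks_exact(8)` with `i64::from_le_bytes` is a safe fallback.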
Both of your solutions gave correct results, but yours is the one now in the code. I'm not sure whether I should prefer one over the other. Maybe, out of paranoia, I should just use his solution. Thanks to you both!
I would favor `from_le_bytes` without any use of `unsafe` (now I see it), and only change that if profiling flags it as a problem. @leudz Both `to_le` and `from_le` have a bit of a code smell, since they take and return the same type. As you say, `from_le` is likely the more appropriate one, but associated functions typically get passed over when an equivalent method exists.
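A sketch of the `from_le_bytes` approach with no `unsafe`, assuming the file holds consecutive little-endian `i64` values; `decode_le_i64s` is a hypothetical helper name. `chunks_exact(8)` yields 8-byte slices, which `try_into` converts to the `[u8; 8]` array that `from_le_bytes` expects.

```rust
use std::convert::TryInto;

/// Decodes a byte buffer as consecutive little-endian i64 values without
/// any unsafe code. Returns None if the length is not a multiple of 8.
fn decode_le_i64s(bytes: &[u8]) -> Option<Vec<i64>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    let values = bytes
        .chunks_exact(8)
        // Each chunk is exactly 8 bytes, so converting to [u8; 8] cannot fail.
        .map(|chunk| i64::from_le_bytes(chunk.try_into().unwrap()))
        .collect();
    Some(values)
}

fn main() {
    let v: Vec<u8> = vec![
        61, 0, 0, 0, 0, 0, 0, 0, // 61
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, // -1
    ];
    let values = decode_le_i64s(&v).expect("length must be a multiple of 8");
    println!("{values:?}");
    assert_eq!(values, vec![61, -1]);
}
```

This works for any alignment and on any host byte order, which is why it is a reasonable default until profiling says otherwise.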
`to_le` doesn't exist on `f32` or `f64` (there is `from_le_bytes` on nightly at the time of writing; it has since been stabilized).
What you can do is use the method above to get a `u32`/`u64` and then use `f32::from_bits`/`f64::from_bits`.
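The `from_bits` route might look like the following sketch for little-endian input; `read_le_f32` and `read_le_f64` are hypothetical helper names. The last assertion assumes a Rust version where `f64::from_le_bytes` is stable.

```rust
use std::convert::TryInto;

/// Reads a little-endian f32 from the first 4 bytes: bytes -> u32 -> f32.
fn read_le_f32(bytes: &[u8]) -> Option<f32> {
    let raw: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some(f32::from_bits(u32::from_le_bytes(raw)))
}

/// Reads a little-endian f64 from the first 8 bytes: bytes -> u64 -> f64.
fn read_le_f64(bytes: &[u8]) -> Option<f64> {
    let raw: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    Some(f64::from_bits(u64::from_le_bytes(raw)))
}

fn main() {
    // 1.0f32 has the bit pattern 0x3F80_0000; little-endian puts the low byte first.
    let f32_bytes: [u8; 4] = [0x00, 0x00, 0x80, 0x3F];
    // 1.5f64 has the bit pattern 0x3FF8_0000_0000_0000.
    let f64_bytes: [u8; 8] = [0, 0, 0, 0, 0, 0, 0xF8, 0x3F];

    assert_eq!(read_le_f32(&f32_bytes), Some(1.0));
    assert_eq!(read_le_f64(&f64_bytes), Some(1.5));

    // The float from_le_bytes constructors give the same result.
    assert_eq!(read_le_f64(&f64_bytes), Some(f64::from_le_bytes(f64_bytes)));
}
```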
Well, one of an optimizing compiler's nightmares is losing microbenchmarks against hand-written assembly just because the source language's semantics don't allow a well-known optimization to be carried out, and a branch that checks whether a pointer is aligned could get expensive if it ended up at the bottom of the wrong loop...
I'm not that worried about that kind of optimization... there's I/O in that program, which will be waaaaaaaay slower than any micro-optimizations I do in my code.