I need to load files, typically not very large, that consist of arrays of u32 numbers in platform-native endianness from disk into memory. Surprisingly, I can't find an efficient, idiomatic way to do this with the current standard library.
The most trivial and straightforward approach is to read the entire file into a Vec<u8>, then split the buffer into 4-byte chunks and reassemble them into words in a loop. However, this approach is not very efficient, since it requires allocating the same memory twice. Additionally, manually iterating over byte chunks may also introduce performance overhead.
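For reference, the straightforward approach might look like the sketch below. The function name `read_words_naive` is made up for illustration, and rejecting inputs whose length is not a multiple of 4 is an assumption about the desired behaviour.

```rust
use std::io::{self, Read};

/// Reads everything into a byte buffer, then assembles native-endian
/// u32 words from consecutive 4-byte chunks (a second allocation).
fn read_words_naive(mut reader: impl Read) -> io::Result<Vec<u32>> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;

    let chunks = bytes.chunks_exact(4);
    if !chunks.remainder().is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "input length is not a multiple of 4",
        ));
    }
    Ok(chunks
        .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

The byte buffer and the word vector are both alive at the end of the function, which is the double allocation mentioned above.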
Directly transmuting a &[u8] buffer into a &[u32] may lead to UB due to alignment issues.
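One way to make the alignment requirement explicit is `slice::align_to`, which only places correctly aligned elements in its middle slice. The helper below is a sketch with a hypothetical name (`try_as_words`). Note that the documentation of `align_to` allows it to return a shorter middle slice than strictly possible, so a `None` here does not prove the buffer was misaligned.

```rust
/// Views `bytes` as native-endian u32 words, but only if the whole
/// buffer is 4-byte aligned and its length is a multiple of 4.
fn try_as_words(bytes: &[u8]) -> Option<&[u32]> {
    // SAFETY: every bit pattern is a valid u32, and `align_to` only
    // puts correctly aligned elements into the middle slice.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u32>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(words)
    } else {
        None
    }
}
```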
An alternative approach is to allocate a Vec<u32>, reinterpret its internal buffer as &mut [u8], and attempt to load the file contents in a loop using Read::read (while ensuring the total length is divisible by 4).
Unfortunately, this approach has its own challenges:
I'm essentially reimplementing Read::read_to_end, but for u32 buffers. This becomes more complicated if the buffer needs to grow.
I could try to preallocate based on file Metadata, but I'm not sure whether OS metadata can always be trusted (e.g., the file might be a socket stream or another source without meaningful metadata).
Initializing such a buffer is also an open question. Passing Vec::spare_capacity_mut to Read::read looks fine at first glance, but formally I'm giving read/write access to uninitialized memory to an abstract 3rd-party function, which would be UB. That said, I'm not entirely sure about this, since u8/u32 slices from a Vec allocation are POD types behind a Unique pointer.
You can read your data into a Vec<u8> and then transform it into a Vec<u32> using Vec::from_raw_parts/into_raw_parts with appropriate checks and with a copying fallback in the case of insufficient alignment. In practice, allocators are likely to give you 4-byte aligned memory even for a Vec<u8>, so the fallback path should be exercised rarely, if ever.
I like your idea, and I think it probably should work in practice. But my concern is that the original allocation layout is part of the allocation key too (it is an argument of both the alloc and dealloc functions). By freeing Vec<u32> memory that was originally allocated as Vec<u8>, we are asking the allocator to free the same pointer with different layout metadata. That could potentially corrupt the allocator's internals. At least this seems risky when using custom global allocators.
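To make the layout concern concrete, the sketch below builds the two layouts that would describe the same 8-byte buffer when it is owned by a Vec<u8> versus a Vec<u32>. The function name `layouts_for_8_bytes` is hypothetical, and the 8-byte size is just an example.

```rust
use std::alloc::Layout;

/// Returns the layouts describing one 8-byte buffer: first as eight
/// u8 elements, then as two u32 elements.
fn layouts_for_8_bytes() -> (Layout, Layout) {
    let as_bytes = Layout::array::<u8>(8).unwrap();
    let as_words = Layout::array::<u32>(2).unwrap();
    (as_bytes, as_words)
}
```

Both layouts have the same size, but the alignment differs (1 for the byte view, `align_of::<u32>()` for the word view), so the layout passed to dealloc would not match the one passed to alloc.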
How much does efficiency matter in this part of your application?
If you know the size in advance and are okay with various failure modes, using bytemuck or zerocopy to convert a &mut [u32] to &mut [u8] is the way to go. But Rust is going to make you think about everything including your file having an odd number of bytes.
Hm, you are right. It should work fine in practice (especially if the allocation size is guaranteed to be a multiple of the alignment), but it technically breaks the safety contract of the dealloc method. In that case you could use the &[u8] -> &[u32] cast instead, as you tried previously, but with a copying fallback for the unaligned case.
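A borrowed cast with a copying fallback could be sketched as follows. The name `words_from_bytes` and the use of `Cow` to represent "borrowed when aligned, owned otherwise" are choices made for this illustration, not something proposed in the thread.

```rust
use std::borrow::Cow;

/// Borrows `bytes` as u32 words when the buffer happens to be aligned,
/// and copies word by word otherwise. Returns `None` if the length is
/// not a multiple of 4.
fn words_from_bytes(bytes: &[u8]) -> Option<Cow<'_, [u32]>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    // SAFETY: every bit pattern is a valid u32, and `align_to` only
    // puts correctly aligned elements into the middle slice.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u32>() };
    if prefix.is_empty() && suffix.is_empty() {
        return Some(Cow::Borrowed(words));
    }
    // Fallback: the buffer is not suitably aligned, so copy.
    let copied = bytes
        .chunks_exact(4)
        .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Some(Cow::Owned(copied))
}
```

The Vec<u8> keeps ownership of the allocation throughout, so the dealloc layout issue does not arise.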
Have you tried the simple way? Just read 4 bytes at a time, convert them, and push to a vector? If the stream is buffered, it should not be that much worse than reading to the end in one pass.
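A minimal sketch of that simple way, assuming the reader is wrapped in a BufReader so that the small reads stay cheap. The function name is hypothetical. The first byte of each word is read separately so that a clean end of input can be told apart from a truncated word.

```rust
use std::io::{self, BufReader, Read};

/// Reads one native-endian u32 at a time through a `BufReader`.
fn read_words_one_by_one(reader: impl Read) -> io::Result<Vec<u32>> {
    let mut reader = BufReader::new(reader);
    let mut words = Vec::new();
    let mut word = [0u8; 4];
    loop {
        // Zero bytes here means the input ended on a word boundary.
        match reader.read(&mut word[..1]) {
            Ok(0) => return Ok(words),
            Ok(_) => {}
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
        // A truncated word surfaces as `ErrorKind::UnexpectedEof`.
        reader.read_exact(&mut word[1..])?;
        words.push(u32::from_ne_bytes(word));
    }
}
```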
Here are some ways to load u32s directly to a Vec: playground.
They look rather ghastly to be honest, mostly due to the awkward error handling required, but I don't think they're particularly inefficient relative to the slowness of I/O in the first place. Real code should, of course, wrap stuff in a BufReader appropriately, and create the vectors using with_capacity(). Rather than reading only four bytes at a time, a larger stack buffer could of course be used – or use BufReader's own buffer via the low-level methods it provides.
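The larger-stack-buffer variant might look like the sketch below. This is not the playground code; the name `read_words_chunked` and the 4096-byte buffer size are arbitrary choices for illustration. Bytes that do not yet form a whole word are moved to the front of the buffer and completed by the next read.

```rust
use std::io::{self, Read};

/// Reads through a 4 KiB stack buffer and converts whole words per chunk.
fn read_words_chunked(mut reader: impl Read) -> io::Result<Vec<u32>> {
    let mut words = Vec::new();
    let mut buf = [0u8; 4096];
    let mut filled = 0; // bytes in `buf` that are not converted yet (0..=3)
    loop {
        let n = match reader.read(&mut buf[filled..]) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        filled += n;
        let whole = filled / 4 * 4;
        words.extend(
            buf[..whole]
                .chunks_exact(4)
                .map(|c| u32::from_ne_bytes([c[0], c[1], c[2], c[3]])),
        );
        // Keep the trailing partial word for the next iteration.
        buf.copy_within(whole..filled, 0);
        filled -= whole;
    }
    if filled != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "input length is not a multiple of 4",
        ));
    }
    Ok(words)
}
```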
Thank you! The most reasonable solution I've come up with so far is close to what @newpavlov suggested above. It doesn't read from IO directly, the user still has to load file content into a vector externally. But the function provides a fairly simple way to "transmute" the content of the vector into a slice of u32 words without allocation, with a relatively cheap memcpy fallback when needed.
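use memmap2::MmapOptions;
use std::fs::File;
use std::io::{Error, ErrorKind, Result};

fn read_file(f: File) -> Result<Vec<u32>> {
    // SAFETY: the file must not be modified while it is mapped.
    let mmap = unsafe { MmapOptions::new().map(&f)? };
    if mmap.len() % 4 != 0 {
        return Err(Error::new(ErrorKind::InvalidData, "file length is not a multiple of 4"));
    }
    let len = mmap.len() / 4;
    // SAFETY: a mapping starts on a page boundary, so the pointer is aligned
    // for u32, and `len` words fit within the mapped bytes.
    let words = unsafe { std::slice::from_raw_parts(mmap.as_ptr() as *const u32, len) };
    // Copy the words out so the result outlives the mapping.
    Ok(words.to_vec())
}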
Note that deallocating a Vec with a different alignment is UB. It may be possible to have a temporary wrongly aligned Vec that gets from_raw_parts'ed back before deallocation, but that still forbids growing the Vec (growing would deallocate the previous buffer). For some reason, miri always gave me unaligned vecs (playground), so I cannot verify.
afaik the only way to go is OP's "alternative approach":
initialize a Vec<u32>, hopefully from metadata
cast slice to &mut [u8] (note that Read doesn't like &mut [MaybeUninit<u8>])
copy into that slice
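The three steps above can be sketched with the standard library alone. The function name `load_words` is hypothetical. The sketch treats the metadata length as a hint only: `read_exact` fails if the file turns out to be shorter, and one extra read checks that nothing follows the expected data.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Sizes a zeroed Vec<u32> from file metadata and reads into its byte view.
fn load_words(path: &Path) -> io::Result<Vec<u32>> {
    let mut file = File::open(path)?;
    let byte_len = file.metadata()?.len();
    if byte_len % 4 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "file length is not a multiple of 4",
        ));
    }
    // Assumption: the word count fits in usize (true for small files).
    let mut words = vec![0u32; (byte_len / 4) as usize];

    // SAFETY: the words are initialized, u8 has alignment 1, the byte
    // length covers exactly the same memory, and any bytes form valid u32s.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(words.as_mut_ptr() as *mut u8, words.len() * 4)
    };
    file.read_exact(bytes)?;

    // The metadata may be stale, so make sure no data is left over.
    if file.read(&mut [0u8; 1])? != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "file is longer than its metadata reported",
        ));
    }
    Ok(words)
}
```

The unsafe cast could be delegated to bytemuck's `cast_slice_mut`, which performs the same reinterpretation behind a safe interface.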
So here's the slightly-less-efficient, no-unsafe version, adapted from implementing read-to-end:
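use bytemuck::cast_slice_mut; // 1.25.0
use std::io::{BufRead, Cursor, Error, ErrorKind, Result};

fn read_to_end(reader: &mut impl BufRead, dest_vec: &mut Vec<u32>) -> Result<usize> {
    let initial_vec_len = dest_vec.len();
    // Bytes of a word that straddles two `fill_buf` chunks.
    let mut carry = [0u8; 4];
    let mut carry_len = 0;
    loop {
        let mut src_buf = reader.fill_buf()?;
        if src_buf.is_empty() {
            break;
        }
        let read = src_buf.len();
        // Finish the word left incomplete by the previous chunk, if any.
        if carry_len > 0 {
            let take = (4 - carry_len).min(src_buf.len());
            carry[carry_len..carry_len + take].copy_from_slice(&src_buf[..take]);
            carry_len += take;
            src_buf = &src_buf[take..];
            if carry_len == 4 {
                dest_vec.push(u32::from_ne_bytes(carry));
                carry_len = 0;
            }
        }
        // Only whole words can be copied; a chunk need not be a multiple of 4 bytes.
        let (whole, rest) = src_buf.split_at(src_buf.len() / 4 * 4);
        let prev_len = dest_vec.len();
        // initializing the memory, so sad
        dest_vec.resize(prev_len + whole.len() / 4, 0);
        // otherwise we can't copy_from_slice and I don't like casting maybe-unaligned
        cast_slice_mut::<u32, u8>(&mut dest_vec[prev_len..]).copy_from_slice(whole);
        // Keep trailing bytes that do not form a whole word for the next chunk.
        carry[carry_len..carry_len + rest.len()].copy_from_slice(rest);
        carry_len += rest.len();
        // Consume only after the data has been copied out of the reader's buffer.
        reader.consume(read);
    }
    if carry_len != 0 {
        return Err(Error::new(ErrorKind::InvalidData, "input length is not a multiple of 4"));
    }
    Ok(dest_vec.len() - initial_vec_len)
}

fn main() -> Result<()> {
    let mut reader = Cursor::new([42u8; 42 * 4]);
    let mut dest_vec = Vec::new();
    read_to_end(&mut reader, &mut dest_vec)?;
    println!("{dest_vec:?}");
    Ok(())
}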
Thank you for your reply. I also like @mroth's trick with flipping. It uses Std's internals automatically and provides an idiomatic solution. @newpavlov's comment gave me some useful insight. Of course, directly transforming owned memory would be UB, but I had overlooked the fact that the allocator almost always returns memory that is already aligned. This opens up the possibility of a fast-path optimization. In practice, we can read the stream to the end into a normal Vec<u8> (which is assumed to be fast), reinterpret its slice as &[u32] without UB, and call it a day. See my implementation above.
You're not missing anything. Currently the standard library lacks both the ability to cast Vec types and read efficiently (without pre-zeroing memory) into anything other than Vec<u8>.
You should start by constructing Vec<u32>, because otherwise you won't get the right alignment.
vec![0u32; len] is a special case that may give you cheaper zero pages from the allocator. Then you can cast it (unsafely, or use bytemuck to delegate it) and read_exact into its byte view.
Alternatively, use try_reserve()? and spare_capacity_mut() to get properly aligned uninitialized bytes. Then cast, and use something other than io::Read (it's UB to give it uninit bytes), like direct OS calls. Then use set_len() on the portion you're sure has been written to (len is in u32 units, not bytes, which makes partial reads awkward).