How best to convert `&[u8]` to `&[u16]`

I am writing a program that parses a binary file and I have a couple of questions about how best to convert a portion of the data read into a sequence of u16 values. I have read the file into memory with:

let contents = fs::read(filename)?;

fs::read() returns a Vec<u8> (wrapped in a Result<>).

Does fs::read() or Vec<T> provide any guarantees about the alignment of the data it returns (other than the useless guarantee that it will be aligned to a u8 boundary, which is no guarantee at all)?

Assuming that I know that I have parsed through the buffer an even number of bytes, what is the best way to convert a portion of that buffer to a sequence of u16 values? One solution I have found is the safe_transmute crate:

use safe_transmute::{guard::AllOrNothingGuard, transmute_many};
let values = transmute_many::<u16, AllOrNothingGuard>(&contents[offset1..offset2]).unwrap();

This seems to me to be a perfectly adequate solution, but I would like to learn if this is the best practice for doing this sort of thing. I am a little concerned about the 0.11.1 version of this crate... I see that it was last updated 2 months ago, so perhaps it is not very well developed or stable. I also see on crates.io: safe_transmute_2 (v0.1.1, last updated 3 months ago), totally-safe-transmute (v0.0.3, last updated 1 month ago), dataview (v0.1.1, last updated 1 year ago), etc...

It also seems to me that I shouldn't have to use an external crate to tell the compiler that I simply want to reinterpret a sequence of u8 values, especially if I already know that it is perfectly legal to do so(*).

(*) Strictly speaking, I don't know that it's perfectly legal to do so, hence my first question about alignment. But, if there are no guarantees about the alignment, then safe-transmute and its friends won't be able to help me either. In that case I should allocate a new vector and manually construct the sequence of u16 values by shifting and adding the u8 values.
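
For reference, the allocate-and-construct fallback can be quite short. This is only a minimal sketch: it assumes the file stores its u16 values in little-endian order (an assumption, so use `u16::from_be_bytes` instead if the format is big-endian), and the name `bytes_to_u16_le` is made up for illustration.

```rust
/// Decode a byte slice into u16 values.
/// ASSUMPTION: the bytes are in little-endian order.
/// Returns None if the slice has an odd length.
fn bytes_to_u16_le(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            // Walk the input two bytes at a time; alignment does not matter here.
            .chunks_exact(2)
            // Equivalent to (pair[1] as u16) << 8 | pair[0] as u16.
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect(),
    )
}
```

Something like `bytes_to_u16_le(&contents[offset1..offset2])` would then work regardless of how the buffer is aligned, at the cost of one copy.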

I've probably written too much now. So I'll stop and see what I can learn from the community.

Thanks for reading this far.

--wpd

Thanks.

AFAIK a Vec<u8> can have any alignment (to u8 boundary) and this means that your call to transmute_many might actually fail (if the input slice isn’t aligned to u16). Other crates that allow safe transmutations of this kind include the bytemuck crate. I’ve seen some discussion on “safe transmutations” with compiler support and/or standard library support, so these may come to Rust one day.
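
To make the alignment concern concrete, here is a small sketch of the kind of check these crates have to perform before handing back a `&[u16]`. The helper name `is_u16_aligned` is hypothetical, and this is not taken from any of the crates mentioned. Note that even if the start of the Vec happens to be aligned, a sub-slice starting at an odd offset will not be.

```rust
use std::mem::align_of;

/// True if the slice's first byte sits at an address where a `u16` may live.
fn is_u16_aligned(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<u16>() == 0
}
```

On typical targets where `align_of::<u16>()` is 2, exactly one of `&contents[0..]` and `&contents[1..]` passes this check, so the offset you slice at matters as much as where the allocation starts.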


In this case where you have a Vec<u8>, there are no guarantees about the alignment.

I'm not sure how safe_transmute would work in this case, but I don't think this is the right solution: instead, since a File implements Read, you can put its content in a Vec<u16> buffer yourself: playground


You have to be very careful about alignment here. It is much safer to allocate the Vec<u16> and cast the &mut [u16] to &mut [u8] than to allocate a Vec<u8> and try to convert it to a Vec<u16>. You can do it like this:

use std::path::Path;
use std::fs::File;
use std::io::{self, Read};
use std::convert::TryFrom;

pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u16>> {
    let mut file = File::open(path)?;
    
    let len = file.metadata()?.len();
    let len = if len % 2 == 0 {
        usize::try_from(len / 2)
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "File is too large"))?
    } else {
        return Err(io::Error::new(io::ErrorKind::Other, "Length is odd"));
    };
    
    let mut vec = vec![0u16; len];
    
    let slice: &mut [u8] = to_u8_slice(&mut vec);
    
    file.read_exact(slice)?;
    Ok(vec)
}

fn to_u8_slice(slice: &mut [u16]) -> &mut [u8] {
    let byte_len = 2*slice.len();
    unsafe {
        std::slice::from_raw_parts_mut(
            slice.as_mut_ptr().cast::<u8>(),
            byte_len
        )
    }
}



Regarding the question of how to ensure that a file is read to a properly aligned Vec<u8> (so that it can be re-interpreted as a bunch of u16s), some straightforward solutions might be

  • looking for an existing crate that does something like this; or
  • using an extra intermediate buffer; i.e. either copying the section you’re interested in out of the Vec<u8> or using the read method on an open File directly together with a fixed-sized intermediate u8 array for writing into a Vec<u16>
    • the “copying the section you’re interested in” approach might also be a “copy the section you’re interested in only if it isn’t already properly aligned” approach. It might turn out that Rust’s current allocator does usually align stuff nicely so this might be low-overhead in practice while keeping the alternative code path in there for soundness.
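
The "copy only if it isn't already aligned" idea could look roughly like the following. This is a sketch under assumptions, not a vetted implementation: the function name `as_u16s` is invented, the values are interpreted in the machine's native byte order, and it uses the standard library's unsafe `align_to` to find out whether the whole slice can be borrowed as u16s.

```rust
use std::borrow::Cow;

/// View `bytes` as native-endian u16 values, borrowing when the slice
/// happens to be suitably aligned and copying otherwise.
/// Returns None if the length is odd.
fn as_u16s(bytes: &[u8]) -> Option<Cow<'_, [u16]>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // SAFETY: every bit pattern is a valid u16, and `align_to` only puts
    // correctly aligned elements into the middle slice.
    let (prefix, middle, suffix) = unsafe { bytes.align_to::<u16>() };
    if prefix.is_empty() && suffix.is_empty() {
        // The whole input was reinterpreted in place: no copy needed.
        Some(Cow::Borrowed(middle))
    } else {
        // Misaligned (or align_to declined to split): fall back to copying.
        Some(Cow::Owned(
            bytes
                .chunks_exact(2)
                .map(|pair| u16::from_ne_bytes([pair[0], pair[1]]))
                .collect(),
        ))
    }
}
```

The result is correct on both paths, so the alignment of the buffer only affects whether a copy happens, not whether the code is sound.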

Be aware that even if the allocation of the Vec<u8> is properly aligned, this only makes it safe to cast the allocation's &[u8] to &[u16]. You still can't cast the actual vector to Vec<u16> because Rust enforces that deallocation must specify the exact same alignment as when it was allocated.


Thank you so much for the replies. @steffahn it seems that bytemuck does the same thing as the other crates - it would attempt to perform the cast, and return an error (or panic) if the alignment isn't correct.

@arnaudgolfouse and @alice you put together examples amazingly quickly! Yes, I could allocate a buffer with the proper alignment and then cast it back to a u8 when reading from the file. I already know that the file size (by design) is a multiple of 4Kbytes, (and has lots of different data structures within it, not just u16 arrays), so I don't need to worry about a partial read.

You folks are awesome! Thank you so much.

And, yes, I could use some other crate to do this, but the main point of this exercise is to get me more familiar with Rust and the Rust way of doing things (both safe things and unsafe things).

It seems very likely that Vec<u8> will start off with the proper alignment (the same way that malloc() does for C programmers, and memalign() does for paranoid C programmers). I was mainly curious to learn if there were any guarantees of that alignment. (A question that popped into my head over the course of my exercise). So I asked, and got some amazingly detailed answers.

You folks are awesome!

--wpd

Many have weighed in, so likely you’re covered. Notwithstanding, a while back I had a related question. I recall @alice in particular being spot-on: allocate the data ahead of time with the alignment that is the most difficult to ensure (in this case u16), then cast the memory to u8. Any time you need the u16 guarantee, you should be good to go.


Yes indeed, and there isn't really anything else that a crate could do soundly. I just mentioned it because it was a crate that I knew of off the top of my head.