How to convert `&[u8]` to `u32`?

For learning purposes, I am trying to convert &[u8] into u32.

I read the compiler's errors and applied its suggestions, but could not get it to convert.

Here is a version that runs fine; the commented-out version fails, and I cannot understand why.

use num_traits::FromBytes; // 0.2.19

// fn convert<T: num_traits::FromBytes>(data:&[u8]){
//     let value:T = FromBytes::from_le_bytes(data);
//    println!("{value}")
// }

fn main() {
    let a = &[1,0,0,0];
    let value:u32 = FromBytes::from_le_bytes(a);
    println!("{value}");
    // let _ = convert::<u32>(a);
}



The one that works doesn't deal with slices; it works with &[u8; 4], a fixed-size array that's guaranteed to be the same size as u32.

A &[u8], on the other hand, can be any length, which means you need some kind of failure reporting.

You can achieve that error reporting with TryInto from &[u8] to &[u8; 4].
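A minimal sketch of that suggestion using only std (u32::from_le_bytes instead of num_traits); the function name slice_to_u32 is hypothetical. The try_into call fails with TryFromSliceError whenever the slice is not exactly 4 bytes long, and ? hands that error to the caller:

```rust
use std::array::TryFromSliceError;

/// Hypothetical helper: converts a slice into a `u32`, reporting an error
/// when the slice is not exactly four bytes long.
fn slice_to_u32(data: &[u8]) -> Result<u32, TryFromSliceError> {
    // `&[u8]` -> `&[u8; 4]` succeeds only if `data.len() == 4`.
    let array: &[u8; 4] = data.try_into()?;
    Ok(u32::from_le_bytes(*array))
}

fn main() {
    let good = [1u8, 0, 0, 0];
    let bad = [1u8, 0, 0];
    println!("{:?}", slice_to_u32(&good)); // Ok(1)
    println!("{}", slice_to_u32(&bad).is_err()); // true
}
```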


Yes, I agree with the first part, but I've tried the second approach you suggest as well, without luck.

Oh, I think I was converting it wrong.

It needs the size, so I need to convert the slice to an array or a reference to an array:

use num_traits::FromBytes; // 0.2.19

fn convert<T: num_traits::FromBytes<Bytes=[u8;4]> + std::fmt::Display>(data:&[u8;4]){
    let value = T::from_le_bytes(data);
    println!("{value}")
}

fn main() {
    let a = [1,0,0,0];
    // let value:u32 = FromBytes::from_le_bytes(&a[..]);
    // println!("{value}");
    let _ = convert::<u32>(a[0..4].try_into().unwrap());
}

I thought I could make this more flexible, but now I doubt it. I want to use it to convert byte arrays into arrays of any number type, as long as the length is a multiple of the size of the output type.

Use as_chunks and verify that the remainder is empty?

Sounds like a plan. Thanks.

You can use TryInto for &[u8] -> [u8; N] conversion:

pub fn read_u32_le(bytes: &[u8]) -> Option<u32> {
    bytes.try_into().map(u32::from_le_bytes).ok()
}

You can use unwrap if you want to panic instead of returning Option.
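One gotcha with this version: try_into requires the slice to be exactly 4 bytes, so passing a longer buffer returns None rather than reading its first 4 bytes. A sketch of reading at an offset, where read_u32_le_at is a hypothetical helper that slices first with get (which returns None instead of panicking on an out-of-range range):

```rust
pub fn read_u32_le(bytes: &[u8]) -> Option<u32> {
    bytes.try_into().map(u32::from_le_bytes).ok()
}

/// Hypothetical helper: reads a little-endian `u32` starting at `offset`
/// inside a longer buffer, returning `None` if 4 bytes are not available.
pub fn read_u32_le_at(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?; // avoid overflow on huge offsets
    read_u32_le(buf.get(offset..end)?)
}

fn main() {
    let buf = [0xAA, 1, 0, 0, 0];
    println!("{:?}", read_u32_le(&buf)); // None: 5 bytes, not 4
    println!("{:?}", read_u32_le_at(&buf, 1)); // Some(1)
    println!("{:?}", read_u32_le_at(&buf, 2)); // None: runs past the end
}
```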


The issue I'm finding with as_chunks (used so I can then read the u8s into another number type) is that, IIUC, it copies the contents into an array. I think that copy is unnecessary, at least in my case.

It seems I would need an iterator version of as_chunks, to convert "as it's read," so to speak.

There is chunks_exact, which yields an iterator but has no static chunk size (which I think I need), and its docs say:

If your chunk_size is a constant, consider using as_chunks instead, which will give references to arrays of exactly that length, rather than slices.

So it seems I would need to loop over dynamically sized chunks and call try_into on each one, or something similar.
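For comparison, a sketch of that approach with chunks_exact (the function name is hypothetical). It is lazy, so each u32 is produced as the iterator is consumed, and the try_into inside cannot fail because chunks_exact(4) only yields 4-byte slices; leftover bytes are skipped (ChunksExact::remainder exposes them if needed):

```rust
/// Hypothetical helper: lazily decodes little-endian `u32`s from `data`,
/// ignoring any trailing bytes that do not form a full chunk.
fn u32s_via_chunks_exact(data: &[u8]) -> impl Iterator<Item = u32> + '_ {
    data.chunks_exact(4).map(|chunk| {
        // Each `chunk` is a `&[u8]` of length 4, so this conversion succeeds.
        let array: [u8; 4] = chunk
            .try_into()
            .expect("chunks_exact(4) yields 4-byte slices");
        u32::from_le_bytes(array)
    })
}

fn main() {
    let data = [1, 0, 0, 0, 2, 0, 0, 0, 9];
    let values: Vec<u32> = u32s_via_chunks_exact(&data).collect();
    println!("{values:?}"); // [1, 2]; the trailing 9 is skipped
}
```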

If it's an arbitrary u8 array, then the copy is very much necessary, because the data may not be properly aligned to treat its elements as u32 values. But that's a separate issue.

No. It returns a slice of arrays, &[[u8; N]], that uses the same memory as the original slice; nothing is copied. Iterating over it yields references to arrays, and those references are usually optimized away by the compiler.

Possible, sure. Chances are high that this would produce much worse code than as_chunks, though.

It's much harder for the optimizer to optimize away an iterator; one extra reference is much easier.

My current draft is:

use std::fmt::Display;
use num_traits::FromBytes;

/// Read bytes into a `Vec<T>` where `T` is another number-type.
fn read_n<T, const N: usize>(data: &[u8], n: usize) -> Vec<T>
where
    T: FromBytes<Bytes = [u8; N]> + Display,
{
    let (items, _) = data.as_chunks::<N>();
    items.iter().take(n).map(T::from_le_bytes).collect()
}

fn main() {
    let a = [1, 0, 0, 0, 4, 5, 6, 7, 8];
    // I want to improve this call:
    println!("{:?}", read_n::<u32, 4>(&a, 2))
}

Ideally, I would like to be able to write:

let a = read_n::<u32>(&a, n);

where the u32 could also be moved to the left-hand side (let a: Vec<u32> = read_n(&a, n);).

The 4 (which is size_of::<T>()) should be available from the const function size_of, but this alternative fails:

/// Read bytes into another number type.
fn read_n<T>(data: &[u8], n: usize) -> Vec<T>
where
    T: FromBytes<Bytes = [u8; { size_of::<T>() }]> + Display,
{
    let (items, _) = data.as_chunks::<{ size_of::<T>() }>();
    items.iter().take(n).map(T::from_le_bytes).collect()
}

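because a const generic argument is not allowed to depend on a generic type parameter there (lifting that restriction requires the unstable generic_const_exprs feature). I am reading here, but it's quite long.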

Does anyone have any suggestions?

I was kind of surprised this worked, but it seems like the compiler can deduce the intermediate array size:

use std::fmt::Display;
use num_traits::FromBytes;
/// Read bytes into another number type.
fn read_n<T, const N: usize>(data: &[u8], n: usize) -> Vec<T>
where
    T: FromBytes<Bytes = [u8; N]> + Display,
{
    let (items, _) = data.as_chunks();
    items.iter().take(n).map(T::from_le_bytes).collect()
}

fn main() {
    let bytes = [0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0];
    let x: Vec<u32> = read_n(&bytes, 3);
    println!("{:?}", x);
}



Thank you, now I can keep going. This is my current draft of a BufferReader, done for learning purposes as a first project.

I think it is still quite horrible, but I am happy with how Rust works. The doc examples use an ElfReader type; I need to update that.

Draft
//! Tool for reading data from a `Vec<u8>`.
use std::{fmt::Display, string::FromUtf8Error};
use num_traits::FromBytes;

/// Index-update functions whose names start with `_`
/// do not return the new index.
pub trait BufferReader {
    /// Get the current index in the buffer.
    fn index(&self) -> usize;
    /// Reference to the underlying data.
    fn data(&self) -> &[u8];
    #[allow(dead_code)]
    /// Mutable reference to the underlying data.
    fn data_mut(&mut self) -> &mut [u8];
    /// Set our current index to `new_index` value.
    fn _set_index(&mut self, new_index: usize);
    /// Length of underlying data vector.
    fn len(&self) -> usize {
        self.data().len()
    }
    /// Update index value by `n`.
    /// Panics on overflow past `usize::MAX` in debug builds (wraps in release by default).
    fn _ahead(&mut self, n: usize) {
        self._set_index(self.index() + n);
    }
    /// Update index value by `-n`.
    /// Panics on underflow below `0` in debug builds (wraps in release by default).
    fn _back(&mut self, n: usize) {
        self._set_index(self.index() - n);
    }
    /// Set the index and return the value.
    fn set_index(&mut self, new_index: usize) -> usize {
        self._set_index(new_index);
        self.index()
    }
    /// Move `n` items ahead.
    /// Panics on overflow past `usize::MAX` in debug builds (wraps in release by default).
    fn ahead(&mut self, n: usize) -> usize {
        self._ahead(n);
        self.index()
    }
    /// Move `n` items back.
    /// Panics on underflow below `0` in debug builds (wraps in release by default).
    #[allow(dead_code)]
    fn back(&mut self, n: usize) -> usize {
        self._back(n);
        self.index()
    }
    /// Read a single byte. Updates `index`.
    fn read_byte(&mut self) -> u8 {
        match self.data().get(self.index()) {
            Some(&b) => {
                self._ahead(1);
                b
            }
            None => panic!("Out of bounds access to data."),
        }
    }
    /// Read out a single item of type `T`. Updates `index`.
    /// Example:
    /// ```rust
    /// let mut elf = ElfReader::new(vec![1, 1, 0, 0]);
    /// let result: u32 = elf.read_t();
    /// assert_eq!(result, 257);
    /// assert_eq!(elf.index(), 4);
    /// ```
    fn read_t<T, const N: usize>(&mut self) -> T
    where
        T: FromBytes<Bytes = [u8; N]> + Display,
    {
        self.slice(size_of::<T>())
            .try_into()
            .map(T::from_le_bytes)
            .unwrap()
    }

    /// Slice `n` items. Updates index.
    /// Mostly an implementation detail.
    /// Example:
    /// ```rust
    /// let mut elf = ElfReader::new(vec![1, 1, 0, 0]);
    /// let result = elf.slice(4);
    /// assert_eq!(result.len(), 4);
    /// assert_eq!(elf.index(), 4);
    /// ```
    fn slice(&mut self, n: usize) -> &[u8] {
        if n == 0 {
            panic!("n must be larger than 0.")
        }
        let old_index = self.index();
        let new_index = self.ahead(n);
        // `new_index` is exclusive, so the last byte read is at `new_index - 1`.
        // `old_index` needs no separate check, since `old_index < new_index`.
        self.assert_within_bounds(new_index - 1);
        &self.data()[old_index..new_index]
    }

    /// Read `n` bytes (expected to be ASCII) into a UTF-8 `String`.
    /// Updates the index.
    fn read_ascii(&mut self, n: usize) -> Result<String, FromUtf8Error> {
        String::from_utf8(self.slice(n).to_vec())
    }

    /// Read bytes into a `Vec<T>`.
    /// `n` is the number of items of type `T` to read,
    /// which is also the length of the returned vector.
    /// Example:
    /// ```rust
    /// let data = vec![0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0];
    /// let mut elf = ElfReader::new(data);
    /// let result: Vec<u32> = elf.read_vec(3);
    /// assert_eq!(result, vec![0, 1, 2]);
    /// assert_eq!(elf.index(), 12);
    /// ```
    #[allow(dead_code)]
    fn read_vec<T, const N: usize>(&mut self, n: usize) -> Vec<T>
    where
        T: FromBytes<Bytes = [u8; N]> + Display,
    {
        if n == 0 {
            panic!("n must be larger than 0.")
        }
        let (items, _) = self.slice(size_of::<T>() * n).as_chunks();
        items.iter().take(n).map(T::from_le_bytes).collect()
    }
    /// Ensure `new_index` is within the data span.
    fn assert_within_bounds(&self, new_index: usize) {
        let l = self.len();
        if new_index >= l {
            panic!("Index {new_index} >= length {l}.",)
        }
    }
}


I'm happy to take advice from anyone on incremental improvements (just nothing too advanced, since I won't understand it yet).

One error I am aware of is that I compute a + b (and a - b) on usize, which can overflow above usize::MAX or underflow below 0; maybe I should use saturating arithmetic or something similar. Also, I am using the code in a binary parser as a first test (before writing unit tests, oops), and it eventually panicked, so I had some bugs. Those are fixed now.
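On the overflow point: with Cargo's default profiles, usize overflow in a + b panics in debug builds but wraps silently in release builds, so the problem is only caught in debug. A sketch of checked index movement using checked_add and checked_sub, which return None instead of panicking or wrapping. Cursor is a hypothetical stand-in for a type implementing BufferReader, and returning Option is a design choice; saturating_add would instead clamp to usize::MAX, which can hide bugs:

```rust
/// Hypothetical stand-in for a type implementing `BufferReader`.
struct Cursor {
    data: Vec<u8>,
    index: usize,
}

impl Cursor {
    /// Moves `n` bytes ahead. Returns `None` (index unchanged) if that
    /// would overflow `usize` or move past the end of `data`.
    fn ahead(&mut self, n: usize) -> Option<usize> {
        let new_index = self.index.checked_add(n)?;
        if new_index > self.data.len() {
            return None;
        }
        self.index = new_index;
        Some(new_index)
    }

    /// Moves `n` bytes back. Returns `None` (index unchanged) instead of going below 0.
    fn back(&mut self, n: usize) -> Option<usize> {
        self.index = self.index.checked_sub(n)?;
        Some(self.index)
    }

    /// Returns the next `n` bytes and advances, or `None` if too few remain.
    fn slice(&mut self, n: usize) -> Option<&[u8]> {
        let start = self.index;
        let end = self.ahead(n)?;
        Some(&self.data[start..end])
    }
}

fn main() {
    let mut reader = Cursor { data: vec![1, 2, 3, 4], index: 0 };
    println!("{:?}", reader.slice(2)); // Some([1, 2])
    println!("{:?}", reader.ahead(usize::MAX)); // None; index stays 2
    println!("{:?}", reader.back(3)); // None
}
```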