Generic function for from_be_bytes

Hello,

How would I make a generic function out of the variants below?
Also, is it possible to make a generic function that covers converting 512 bits ([0u8; 64]), for example?

I want to replace loads of byteorder turbofish statements
(rdr.read_u16::<BigEndian>()?)
with something like
read_be(rdr, num_bytes)
where the reader's cursor is moved forward, as in the functions below.

fn read_be_u32(input: &mut &[u8]) -> u32 {
    let (int_bytes, rest) = input.split_at(std::mem::size_of::<u32>());
    *input = rest;
    u32::from_be_bytes(int_bytes.try_into().unwrap())
}

fn read_be_u16(input: &mut &[u8]) -> u16 {
    let (int_bytes, rest) = input.split_at(std::mem::size_of::<u16>());
    *input = rest;
    u16::from_be_bytes(int_bytes.try_into().unwrap())
}

Source: the u32::from_be_bytes example in the standard library documentation

You want one where you can pass the number of bytes as an argument? What should the return value be? It can't depend on a number determined at runtime.

The return should be the same as what read_be_u16 returns, i.e. the same bytes read and interpreted as big-endian:

u16::from_be_bytes(int_bytes.try_into().unwrap())

In most cases it'll be u16, u32, or u64 that I'll be passing to the function.
But I also want to cater for the conversion of [0u8; 64].

... so I can do something like:

let fld_32 = read_be(rdr, 32);
let fld_512 = read_be(rdr, 512);
let fld_288 = read_be(rdr, 288);

You might be able to implement that with const generics now:

let fld_32 = read_be::<32>(rdr);

or at least using an older array hack:

let fld_32 = read_be::<[_; 32]>(rdr);

or use a macro that translates it back to the byteorder calls.

let fld_32 = read_be!(rdr, 32);
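
A minimal sketch of that macro idea (assuming the byteorder crate is in scope; only a few widths are shown, and the name read_be! is just illustrative):

use byteorder::{BigEndian, ReadBytesExt};

// Each supported bit width is matched as a literal token and expanded
// to the corresponding byteorder call.
macro_rules! read_be {
    ($rdr:expr, 16) => { $rdr.read_u16::<BigEndian>() };
    ($rdr:expr, 32) => { $rdr.read_u32::<BigEndian>() };
    ($rdr:expr, 64) => { $rdr.read_u64::<BigEndian>() };
}

// e.g.  let fld_32 = read_be!(rdr, 32)?;  // expands to rdr.read_u32::<BigEndian>()?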

But I wonder what the endianness of a 64-byte type would be in your case. These aren't typical libstd types. Is it a bignum? An array of smaller integers?

You could try something like this:

trait FromBytes {
    fn from_be_bytes(a: &mut &[u8]) -> Self;
}

impl<const N: usize> FromBytes for [u8; N] {
    fn from_be_bytes(a: &mut &[u8]) -> [u8; N] {
        let (int_bytes, rest) = a.split_at(N);
        
        let mut me = [0u8; N];
        me.copy_from_slice(int_bytes);
        
        *a = rest;
        me
    }
}

impl FromBytes for u64 {
    fn from_be_bytes(a: &mut &[u8]) -> u64 {
        u64::from_be_bytes(FromBytes::from_be_bytes(a))
    }
}
impl FromBytes for u32 {
    fn from_be_bytes(a: &mut &[u8]) -> u32 {
        u32::from_be_bytes(FromBytes::from_be_bytes(a))
    }
}

fn read_be<T: FromBytes>(input: &mut &[u8]) -> T {
    T::from_be_bytes(input)
}

Here it will figure out how many bytes it should consume depending on what type it returns.
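
To illustrate (my own sketch, with made-up bytes), the annotated type is what decides how many bytes each call consumes:

fn main() {
    let mut data: &[u8] = &[0x12, 0x34, 0x56, 0x78, 0xAB, 0xCD];

    // The return type annotation drives which FromBytes impl is used.
    let a: u32 = read_be(&mut data);     // consumes 4 bytes
    let b: [u8; 2] = read_be(&mut data); // consumes 2 bytes

    assert_eq!(a, 0x1234_5678);
    assert_eq!(b, [0xAB, 0xCD]);
    assert!(data.is_empty());
}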

Oooh.. I like the simplicity of the first one.
How would I go about implementing it though?

But I wonder what the endianness of a 64-byte type would be in your case. These aren't typical libstd types. Is it a bignum? An array of smaller integers?

I'm passing a &[u8] into a function and using read_be within that function to 'consume' it and keep moving the cursor forward.
The 64-byte ones are actually char SOME_VAR[64]; in the reference C/C++ code.

They're actually strings but I need to take those bytes as a slice or an array, so that I can pass them to a codepage conversion function.

Thank you, as always, for your detailed examples.
Congrats on becoming a paid contributor to tokio recently!

I'm afraid this is about the same size as having separate functions, like in the first post,
except that your solution also caters for variable-sized arrays.
For that factor alone, I might need to use this.

In the first impl block, why are we copying int_bytes into me and returning it?
That is, why these two lines?

me.copy_from_slice(int_bytes);
me

In terms of code length, there is nothing you can do that would beat defining several separate functions.

It is just a way to go from &[u8] to [u8; N].
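
For what it's worth, the same impl could also be written with TryInto instead of the zeroed buffer and copy (just an alternative sketch; it does the same thing):

use std::convert::TryInto;

impl<const N: usize> FromBytes for [u8; N] {
    fn from_be_bytes(a: &mut &[u8]) -> [u8; N] {
        let (int_bytes, rest) = a.split_at(N);
        *a = rest;
        // split_at(N) always yields exactly N bytes, so this unwrap cannot fail.
        int_bytes.try_into().unwrap()
    }
}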

Your examples are always beautifully clean.
Incredible articulation of speech & code.

Thank you once again!

I recently built a crate for a similar purpose that implements this using const generics; you might like it.

See https://crates.io/crates/eio.

It does this using a FromBytes trait, defined like the following, which is implemented for all the standard library number types.

pub trait FromBytes<const N: usize> {
    fn from_be_bytes(bytes: [u8; N]) -> Self;
    fn from_le_bytes(bytes: [u8; N]) -> Self;
}

Using this trait you could implement your functions like this:

use std::io;
use eio::FromBytes;

fn read_be<T: FromBytes<N>, const N: usize>(rdr: &mut &[u8]) -> io::Result<T> {
    let mut buf = [0u8; N];
    io::Read::read_exact(rdr, &mut buf)?;
    Ok(T::from_be_bytes(buf))
}

But eio actually provides a ReadExt trait which does exactly this. It also works with anything that implements Read, for example a cursor:

use eio::ReadExt;

let mut data = io::Cursor::new([0x37, 0x13, 0x12, 0x34, 0x56, 0x78]);

let a: u16 = data.read_be()?;
let b: i32 = data.read_be()?;

Comparison to byteorder

eio provides a lot of the same capabilities as the popular byteorder crate but with a very different API. The advantages of eio are the following:

  • It is extensible: anyone can implement FromBytes or ToBytes for their own integer types.
  • Uses the core/std {from,to}_{le,be}_bytes functions to do the conversion for floats and integers. byteorder reimplements these.
  • Doesn't require turbofish type annotations all the time.
// byteorder
let i = rdr.read_u16::<BigEndian>()?;
// eio
let i: u16 = rdr.read_be()?;

That is again excellent.
Why not push this out as 1.0?

A couple of questions:

  • Does data have to be a cursor, or can it just be a &[u8]?
  • Will this work, then?
let c: [u8; 64] = data.read_be()?;

Why not push this out as 1.0?

Well, it's quite new and there are still some unanswered questions with regard to the API. I was hoping to get feedback from people using it first.

Does data have to be a cursor, or can it just be a &[u8]?

You need something that implements Read. A plain &[u8] on its own can't track the current position as you read; Read is implemented for &[u8] in a way that consumes bytes from the front of the slice, so you can use an &mut &[u8].
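
A small sketch of what that looks like in practice (made-up bytes):

use std::io::Read;

fn main() {
    let mut data: &[u8] = &[1, 2, 3, 4];
    let mut buf = [0u8; 2];

    // Reading through the mutable reference shrinks the slice from the front.
    data.read_exact(&mut buf).unwrap();

    assert_eq!(buf, [1, 2]);
    assert_eq!(data, &[3, 4]);
}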

Will this work, then?

let c: [u8; 64] = data.read_be()?;

Not yet, but it wouldn't be too hard to implement. By the way, you can already do this with anything implementing Read, like this:

let mut buf = [0u8; 64];
data.read_exact(&mut buf)?;

So it wouldn't be too hard to write your own function.

fn read_array<const N: usize>(rdr: &mut &[u8]) -> io::Result<[u8; N]> {
    let mut buf = [0u8; N];
    io::Read::read_exact(rdr, &mut buf)?;
    Ok(buf)
}

Also, I'm not sure that endianness is something that makes sense for an array :thinking:. What would you expect the following to return?

let mut data = io::Cursor::new([1, 2, 3, 4]);
let a: [u8; 2] = data.read_be()?;  // [1, 2] or [2, 1] ?
let b: [u8; 2] = data.read_le()?;  // [3, 4] or [4, 3] ?

Maybe the following is better, because it is unambiguous? This could be something that I could add to eio.

let c: [u8; 2] = data.read_array()?;

That's good for me, for now.

This will be good, because again, I can avoid initialising 'n' potentially different-sized arrays all over the source.

I see what you mean.
For u16, u32, u64, u128, the native from_be_bytes function will suffice, as I'll be dealing with numbers.

For larger slices like [u8; 64], I can use your read_array example to get the bytes.
These are mostly strings (char var[64] in the reference C/C++ code).
Then I pass those bytes to a codepage conversion function, where I use that array as a slice.
If a field can be processed by from_be_bytes, that handles the endian conversion.
If it's an array/slice, I don't need any endian conversion; I just pass it to the codepage conversion function, where the bytes are assumed (rightly so) to be big-endian, because the source is big-endian.

Yeah, that'll be great.

Apart from these, I have to decide whether to use a BufRead or stick with &[u8].
In the initial phase I'm loading the whole file into memory (a Vec), so BufRead wouldn't make a difference (according to the docs).
The goal is to take virtual records (varying-length records) from a binary file and pass each virtual record to a function that parses the slice.
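
For what it's worth, a rough sketch of that outer loop, assuming (purely hypothetically) that each virtual record starts with a 2-byte big-endian length prefix; the record layout and the parse_record name are placeholders:

// Hypothetical layout: a 2-byte big-endian length, then that many payload bytes.
// A real version should validate len against the remaining input instead of panicking.
fn split_records(mut data: &[u8]) -> Vec<&[u8]> {
    let mut records = Vec::new();
    while data.len() >= 2 {
        let len = u16::from_be_bytes([data[0], data[1]]) as usize;
        let (record, rest) = data[2..].split_at(len);
        records.push(record);
        data = rest;
    }
    records
}

// Each record slice can then be handed to the parsing function,
// e.g. parse_record(record), which uses read_be / read_array internally.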