Not quite getting it: ergonomic handling of slices/arrays with known fixed sizes

Hello!

This is my first post here, so I apologise if this has ended up in the wrong place. Have made an attempt to search for the answer to this question, but can't really piece it together. I am quite the beginner when it comes to Rust, so my searches may be hindered by me lacking the right terminology.

In essence, I seem to be having problems with getting slices and arrays working happily/ergonomically together. Specifically, I need to write/read a little binary blob that needs to be in a specific format, i.e. basically

struct SomeRecord {
 ... values of various types ...
}

impl SomeRecord {
  fn encode(&self) -> [u8; 32] { ... }
  fn decode(bytes: [u8; 32]) -> Self { ... }
}

I can get it to work, but I suspect I am missing something because it seems a bit clunky. Beginning from the beginning, this works fine:

  fn encode1(number: u64) -> [u8; 8] {
    number.to_le_bytes()
  }

  fn decode1(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
  }

Trying to pack more than one thing into an array, I would naïvely start with something like this:

  fn encode(...) -> [u8; 32] {
    // a zero-initialised u8 array of the right length
    let mut output: [u8; 32] = [0; 32];
    // won't compile: left side is a slice, right side is an array
    output[0..4] = self.some_u32.to_le_bytes();
    ...
    output
  }

After a few iterations, I end up with something like this instead:

  fn encode2(n1: u64, n2: u64) -> [u8; 16] {
    let mut output: [u8; 16] = [0; 16];

    output[0..8].copy_from_slice(n1.to_le_bytes().as_slice());
    output[8..16].copy_from_slice(n2.to_le_bytes().as_slice());

    output
  }

As for decoding, I have a [u8; 16] (for example) and conceptually I want to do this:

u64::from_le_bytes(bytes[8..16])

but now I have the same problem again: from_le_bytes takes an array and bytes[8..16] is a slice. Here, again it seems to me like the size should be known at compile time since 8..16 is rather constant. Anyway, what I end up with is something like:

  fn decode2(bytes: [u8; 16]) -> (u64, u64) {
    let first = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
    let second = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    (first, second)
  }

Given that I am working with fixed-size records and fixed-size data types, I do not worry about the try_into().unwrap(). But it seems like it defeats the purpose a bit: given we are working with an array and constant ranges, is there a more ergonomic way of doing this?
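For what it's worth, the closest I have come to hiding the noise is a small const-generic helper (a sketch; `subarray` is just a name I made up, and it still panics on bad indices exactly like plain indexing would):

```rust
/// Copy `N` bytes starting at `start` out of a slice into a fixed-size array.
/// Panics if `start + N` is out of bounds, just like indexing would.
fn subarray<const N: usize>(bytes: &[u8], start: usize) -> [u8; N] {
    bytes[start..start + N]
        .try_into()
        .expect("slice length equals N by construction")
}

fn decode2(bytes: [u8; 16]) -> (u64, u64) {
    // N = 8 is inferred from from_le_bytes, which wants a [u8; 8].
    (
        u64::from_le_bytes(subarray(&bytes, 0)),
        u64::from_le_bytes(subarray(&bytes, 8)),
    )
}
```

At least this way the `try_into().unwrap()` noise lives in one place instead of being repeated at every field.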

I have tried looking into the serde codebase but it is a bit overwhelming for a noob, I have to admit :sweat:

Not sure what you are trying to use Serde for, here – getting inspiration as to how to implement the serializer for a binary format? You won't find that there – Serde is generic glue between a serialization format and a data structure that wants to (de)serialize itself. It isn't concerned with the particular representation of either your own custom types, or any particular data encoding format.

Anyway, the problem here is that slicing is an indexing operation, realized by the Index trait. As such, the trait knows nothing about constant sizes – it simply wasn't designed to do that. So if you want a compile-time bounds-checked, non-panicking implementation (I'm assuming that's what you are after – "more ergonomic" in itself isn't a particularly helpful request to make), you'll have to use something else.

A silly example would be to (ab)use the fact that arrays are infallible patterns, and thus you can get the individual elements out:

fn decode2(bytes: [u8; 16]) -> (u64, u64) {
    let [
        b0, b1, b2, b3, b4, b5, b6, b7,
        b8, b9, ba, bb, bc, bd, be, bf,
    ] = bytes;
    let first = u64::from_le_bytes([b0, b1, b2, b3, b4, b5, b6, b7]);
    let second = u64::from_le_bytes([b8, b9, ba, bb, bc, bd, be, bf]);
    (first, second)
}

I would obviously not recommend doing this, but it's technically infallible. Another way to make the code shorter would be to hide the potential panics, and use the byteorder crate, which reads from slices rather than arrays:

use byteorder::{ByteOrder, LittleEndian};

fn decode2(bytes: [u8; 16]) -> (u64, u64) {
    let first = LittleEndian::read_u64(&bytes[0..8]);
    let second = LittleEndian::read_u64(&bytes[8..16]);
    (first, second)
}

This will still panic if you change your array to something that has less than 16 elements, but there's no explicit unwrap() (you might regard that as an advantage or a disadvantage).
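If you would rather surface the length check than hide it, a non-panicking variant is also possible with `slice::get`, which returns an `Option` instead of panicking on out-of-range indices. A sketch (`read_u64_le_at` is my own name for it):

```rust
/// Read a little-endian u64 at `start`, returning None instead of panicking
/// when the slice is too short.
fn read_u64_le_at(bytes: &[u8], start: usize) -> Option<u64> {
    // `get` turns an out-of-bounds range into None; `try_into` then
    // converts the length-8 slice into a [u8; 8], which cannot fail here.
    let chunk: [u8; 8] = bytes.get(start..start + 8)?.try_into().ok()?;
    Some(u64::from_le_bytes(chunk))
}
```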

Once const generics become smarter, it might become possible to write a function of the form

fn split<const N: usize, const K: usize>(bytes: [u8; {N + K}]) -> ([u8; N], [u8; K]) {
    todo!()
}

but currently this isn't a thing.
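There is a middle ground on newer toolchains, though: `split_first_chunk` (stable since Rust 1.77) peels a fixed-size array reference off the front of a slice, so only one runtime length check remains. A sketch:

```rust
fn decode2(bytes: [u8; 16]) -> (u64, u64) {
    // split_first_chunk returns Option<(&[u8; 8], &[u8])>; the unwrap
    // cannot fire because the input array is known to have 16 elements.
    let (first, rest) = bytes.split_first_chunk::<8>().unwrap();
    // rest has length 8 by construction, so this conversion succeeds too.
    let second: &[u8; 8] = rest.try_into().unwrap();
    (u64::from_le_bytes(*first), u64::from_le_bytes(*second))
}
```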

Thanks for your detailed reply!

I actually did ponder "abusing" the arrays similarly to your suggestion. It works relatively well for things like numbers, which are never longer than 16 bytes. My first attempt at that was along the lines of

  let first = u64::from_le_bytes([bytes[0], bytes[1], bytes[2], ...]);

which I am assuming will panic the first time I try it if I get it wrong, and never thereafter, since the size of bytes is known. Functionally equivalent to try_into().unwrap(), given those circumstances.
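A variation on that idea, as a sketch, is `std::array::from_fn`, which saves writing out the eight indices by hand while panicking under exactly the same conditions:

```rust
fn decode2(bytes: [u8; 16]) -> (u64, u64) {
    // The array length (8) is inferred from from_le_bytes, which wants
    // a [u8; 8]; the closure is called with i = 0..8.
    let first = u64::from_le_bytes(std::array::from_fn(|i| bytes[i]));
    let second = u64::from_le_bytes(std::array::from_fn(|i| bytes[i + 8]));
    (first, second)
}
```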

If what I am doing is close to the way it "should be done", I am satisfied. I just wanted to make sure I am not missing some puzzle pieces that could make this type of task more idiomatic and easier to read. Given how clumsy the solution seemed to me, I assumed I must be missing something.

I'm not sure what you are describing here. This will panic if any of the indices is out of bounds, and it will always panic if that's the case (and never if it's not). This is the difference between indexing and my "silly" pattern matching example – this particular pattern match never fails/panics.

Two function calls hardly counts as "clumsy".

Yes, it was the behaviour you describe that I meant. In my unclear post, "the first time" referred to during testing/development when I will notice it failing because I possibly got some indices wrong, but once they are correct I would expect it to never panic, just as you say.

Definitely acceptable :+1:


Some ideas for future alternative implementations using currently unstable APIs

#![feature(slice_flatten)]
#![feature(slice_as_chunks)]

pub fn encode(n1: u64, n2: u64) -> [u8; 16] {
    [n1, n2].map(u64::to_le_bytes).flatten().try_into().unwrap()
}

pub fn decode(bytes: [u8; 16]) -> (u64, u64) {
    let [n1, n2] = <&[_; 2]>::try_from(bytes.as_chunks().0)
        .unwrap()
        .map(u64::from_le_bytes);
    (n1, n2)
}

You get the basic idea of working with arrays and (de-)serializing values. Indeed, it's often not pretty. Unfortunately current Rust has no way to know that x[a..b] could be represented as [T; b - a], even for literal values of a and b. The reason is that indexing on slices is mostly just an ordinary function call, and has no way to specialize on constant value of the argument a..b. Even if it could, Rust's current compile-time evaluation engine and const generics aren't powerful enough to express constraints like "array of length b-a".

But that doesn't mean that you need to keep ugly error-prone casts and manual copy_from_slice calls all over your code! Indeed, your operations follow a simple pattern which is easy to abstract away. I would suggest taking a look at the Read / Write traits, and at the byteorder crate. The latter defines the ReadBytesExt / WriteBytesExt extension traits, which allow simple (de-)serialization of integers into an in-memory buffer (or, more generally, any reader/writer). Internally, the extension methods do mostly the same thing as your code, but they wrap it all in a high-level, meaningful API.
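As a std-only sketch of what such extension methods boil down to (my own `read_u64_le` helper, roughly what byteorder's `ReadBytesExt::read_u64::<LittleEndian>` does internally):

```rust
use std::io::{Cursor, Read, Result};

/// Roughly what an extension method like read_u64::<LittleEndian>() does:
/// read exactly 8 bytes from any reader and reassemble the integer.
fn read_u64_le(r: &mut impl Read) -> Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn decode2(bytes: [u8; 16]) -> Result<(u64, u64)> {
    // Cursor tracks the read position, so the offsets disappear entirely.
    let mut cur = Cursor::new(bytes);
    Ok((read_u64_le(&mut cur)?, read_u64_le(&mut cur)?))
}
```

The nice property is that the bounds bookkeeping turns into an ordinary `io::Result`, and the same helper works on files, sockets, or in-memory buffers.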


Thanks! Those are useful pointers.
