How to handle reading from bytes into different numeric types unknown at compile time?

Hi, I am new here and trying to learn some Rust concepts by writing a small library that reads a standardized file format (a simple key-array format).
This format has a header and descriptors that define a list of arrays and their types to be read from the bytes of the file.
I've handled reading the header and descriptors, but now I don't know how to read the arrays into their correct types, which are only known at run time.

I've sketched a minimal example with just two types, which should give you an idea of the problem.
I feel like this should be handled with generics and traits, but I didn't manage to get there on my own, and following the compiler's suggestions didn't get me anywhere.
Any pointers or solutions appreciated, Thanks!

fn main() {
    enum Len {
        small,
        long,
    }

    struct Item<T> {
        len: Len,
        array: Vec<T>,
    }

    struct Store<T> {
        items: Vec<Item<T>>,
    }

    trait FromBytes<T> {
        fn from_bytes(&self, bytes: &[u8]) -> Vec<T> {
            vec![]
        }
    }

    impl<T> FromBytes for Item<T> {
        fn from_bytes(&mut self, bytes: &[u8]) {
            self.array = match self.len {
                Len::small => bytes
                    .chunks(4)
                    .map(|c| i32::from_le_bytes(c.try_into().unwrap()))
                    .collect(),
                Len::long => bytes
                    .chunks(8)
                    .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
                    .collect(),
            }
        }
    }

    let mut store: Store = Store { items: vec![] };
    store.items.push(Item::<i32> {
        len: Len::small,
        array: vec![],
    });
    store.items.push(Item::<i64> {
        len: Len::long,
        array: vec![],
    });

    store.items[0].from_bytes(&[1, 2, 3, 4, 5, 6, 7, 8]);
    store.items[1].from_bytes(&[1, 2, 3, 4, 5, 6, 7, 8]);
}

Rust is statically typed, and all enum variants have the type of the enum, so there's no way to have the variant of Len dictate the type of T in something like

    struct Item<T> {
        len: Len,
        array: Vec<T>,
    }

You can either propagate the enum pattern up however many levels makes the most sense...

pub enum Value {
    Small(i32),
    Long(i64),
}

pub struct Store<T> {
    items: Vec<T>,
}

// Use `Store<i32>`, `Store<i64>`, or `Store<Value>`
// ...or...
pub enum Store {
    Small(Vec<i32>),
    Long(Vec<i64>),
}

...which may involve a lot of explicit matching/branching and use of macros to cut down on repetition. (Sketch.)
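To make the "macros to cut down on repetition" part concrete, here is a minimal sketch built on the `Store` enum above. The macro name `with_items!` and the methods `len`, `is_empty` and `first_as_i64` are hypothetical names chosen for illustration, not something from an existing crate.

```rust
// Same shape as the `Store` enum above; the derives are added for convenience.
#[derive(Debug, PartialEq)]
pub enum Store {
    Small(Vec<i32>),
    Long(Vec<i64>),
}

// Hypothetical helper macro: expands to a `match` that binds the inner Vec
// to `$v` and evaluates the same expression in every arm. The expression is
// type-checked once per variant, so it must be valid for both element types.
macro_rules! with_items {
    ($store:expr, $v:ident => $body:expr) => {
        match $store {
            Store::Small($v) => $body,
            Store::Long($v) => $body,
        }
    };
}

impl Store {
    /// Number of stored elements, whatever their type.
    pub fn len(&self) -> usize {
        with_items!(self, v => v.len())
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// First element widened to i64 (lossless for both i32 and i64).
    pub fn first_as_i64(&self) -> Option<i64> {
        with_items!(self, v => v.first().map(|&x| i64::from(x)))
    }
}
```

Adding a variant then means touching the enum and the macro, while the methods written through the macro stay unchanged.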

And/or at some level, you can try to hide that behind type erasure instead...

pub trait UsefulThings {
    fn value_at(&self, idx: usize) -> Option<Value>;
}

impl UsefulThings for Vec<i32> { ... }
impl UsefulThings for Vec<i64> { ... }

struct DynStore {
    items: Box<dyn UsefulThings + Send + Sync>,
}

impl DynStore {
    pub fn from_bytes(bytes: &[u8], len: Len) -> Self { ... }
}

...though defining the trait usefully can be a pain, especially if you're doing a lot of primitive integer operations. (Sketch.)
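One way the elided bodies above could be filled in, as a rough sketch only. It assumes little-endian data, a `Len` enum with `Small` and `Long` variants (renamed to CamelCase from the question), and it silently ignores trailing bytes that do not form a full element. The `value_at` method on `DynStore` is an extra added for illustration.

```rust
use std::convert::TryInto; // already in the prelude on edition 2021

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Small(i32),
    Long(i64),
}

// Assumed stand-in for the question's `Len` enum.
#[derive(Debug, Clone, Copy)]
pub enum Len {
    Small,
    Long,
}

pub trait UsefulThings {
    fn value_at(&self, idx: usize) -> Option<Value>;
}

impl UsefulThings for Vec<i32> {
    fn value_at(&self, idx: usize) -> Option<Value> {
        self.get(idx).copied().map(Value::Small)
    }
}

impl UsefulThings for Vec<i64> {
    fn value_at(&self, idx: usize) -> Option<Value> {
        self.get(idx).copied().map(Value::Long)
    }
}

pub struct DynStore {
    items: Box<dyn UsefulThings + Send + Sync>,
}

impl DynStore {
    /// Picks the element type at run time, then erases it behind the trait object.
    pub fn from_bytes(bytes: &[u8], len: Len) -> Self {
        match len {
            Len::Small => {
                let v: Vec<i32> = bytes
                    .chunks_exact(4)
                    .map(|c| i32::from_le_bytes(c.try_into().unwrap()))
                    .collect();
                DynStore { items: Box::new(v) }
            }
            Len::Long => {
                let v: Vec<i64> = bytes
                    .chunks_exact(8)
                    .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
                    .collect();
                DynStore { items: Box::new(v) }
            }
        }
    }

    /// Forwards to the erased container.
    pub fn value_at(&self, idx: usize) -> Option<Value> {
        self.items.value_at(idx)
    }
}
```

Callers only ever see `DynStore` and `Value`, so the branching on the element type happens once, in `from_bytes`.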


Perhaps you could find more in-depth inspiration by checking out how the image crate handles static, dynamic, and generic buffers.


What about

enum ArrayType{
    Short(Vec<i32>),
    Long(Vec<i64>)
}
struct Store{
    arrays:Vec<ArrayType>
}

Also, I recommend using chunks_exact instead of chunks: it lets you handle the remainder explicitly rather than panicking on the unwrap (and I think it optimizes a bit better).
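A small sketch of what that could look like. The function name `decode_i32_checked` and the choice to return the leftover byte count as the error are assumptions made for illustration; a real library would probably use its own error type.

```rust
use std::convert::TryInto; // already in the prelude on edition 2021

/// Decodes little-endian i32 values. If the input length is not a multiple
/// of 4, returns the number of trailing bytes as an error instead of panicking.
pub fn decode_i32_checked(bytes: &[u8]) -> Result<Vec<i32>, usize> {
    let chunks = bytes.chunks_exact(4);
    // `remainder()` exposes the bytes that did not fit into a full chunk.
    let leftover = chunks.remainder().len();
    if leftover != 0 {
        return Err(leftover);
    }
    // Every chunk is exactly 4 bytes long here, so the unwrap cannot fail.
    Ok(chunks
        .map(|c| i32::from_le_bytes(c.try_into().unwrap()))
        .collect())
}
```

With plain `chunks(4)`, a 5-byte input would yield a final 1-byte chunk and the `unwrap` would panic; here the same input returns `Err(1)`.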

Thank you both for your answers; they helped me grasp some additional things. I'll explore those solutions this week.

For now I went with @giocri's solution as this is the easiest to understand for me.
(I understand the use of chunks_exact but I don't think I would need it in my use case).

I think the byte width of each type is now implicit in the Item struct (through its Array variant), so no extra information is needed. However, I'm not sure how to avoid repeating the byte-conversion code in the fill_array function (I will need to handle 10 types in the end).

fn main() {
    #[derive(Debug, Clone, Eq, PartialEq, Ord, PartialOrd)]
    enum Array {
        I32(Vec<i32>),
        I64(Vec<i64>),
    }

    #[derive(Debug)]
    struct Item {
        array: Array,
    }

    #[derive(Debug)]
    struct Store {
        items: Vec<Item>,
    }

    impl Item {
        fn fill_array(&mut self, bytes: &[u8]) {
            self.array = match self.array {
                Array::I32(_) => Array::I32(
                    bytes
                        .chunks(4)
                        .map(|c| i32::from_le_bytes(c.try_into().unwrap()))
                        .collect(),
                ),
                Array::I64(_) => Array::I64(
                    bytes
                        .chunks(8)
                        .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
                        .collect(),
                ),
            }
        }
    }

    let mut store: Store = Store { items: vec![] };
    store.items.push(Item {
        array: Array::I32(vec![]),
    });
    store.items.push(Item {
        array: Array::I64(vec![]),
    });
    println!("{:?}", &store);

    store.items[0].fill_array(&[1, 2, 3, 4, 5, 6, 7, 8]);
    store.items[1].fill_array(&[1, 2, 3, 4, 5, 6, 7, 8]);

    println!("{:?}", &store);
}
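On the repetition in fill_array: one possible direction, sketched under assumptions, is to move the per-type conversion into a small trait and write the chunking once as a generic function. The trait `FromLeBytes`, the macro `impl_from_le_bytes!`, and the functions `decode` and `filled_from` are hypothetical names, and `f32`/`f64` are listed only to show how more types could be added.

```rust
use std::convert::TryInto; // already in the prelude on edition 2021

/// Hypothetical trait: a fixed-width value decodable from little-endian bytes.
pub trait FromLeBytes: Sized {
    const SIZE: usize;
    fn from_le(chunk: &[u8]) -> Self;
}

// Implements the trait for each listed primitive; supporting another type
// only requires adding it to the invocation below.
macro_rules! impl_from_le_bytes {
    ($($t:ty),* $(,)?) => {
        $(
            impl FromLeBytes for $t {
                const SIZE: usize = std::mem::size_of::<$t>();
                fn from_le(chunk: &[u8]) -> Self {
                    <$t>::from_le_bytes(chunk.try_into().expect("chunk has SIZE bytes"))
                }
            }
        )*
    };
}

impl_from_le_bytes!(i32, i64, f32, f64);

/// The chunking logic, written once. Trailing bytes that do not form a
/// full element are dropped by `chunks_exact`.
pub fn decode<T: FromLeBytes>(bytes: &[u8]) -> Vec<T> {
    bytes.chunks_exact(T::SIZE).map(T::from_le).collect()
}

#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    I32(Vec<i32>),
    I64(Vec<i64>),
}

impl Array {
    /// Returns a new array of the same variant, decoded from `bytes`.
    pub fn filled_from(&self, bytes: &[u8]) -> Array {
        match self {
            // The element type of `decode` is inferred from each variant.
            Array::I32(_) => Array::I32(decode(bytes)),
            Array::I64(_) => Array::I64(decode(bytes)),
        }
    }
}
```

The match still needs one arm per variant, but each arm shrinks to a single line, which should stay manageable with 10 types.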