Help: how do I package data containers of different types in a single struct while retaining access to the data stored in each container?

Hi everyone! I've been racking my brain over this one for the last couple of days and I just can't seem to figure it out.

Background

I want to create a crate for parsing a specific astronomical data format. In this format, a file may (or may not) contain multiple units composed of a header and a data section, called Header-Data-Units.

I want my crate to:

  1. recognise, based on the metadata contained in the unit's header, which kind of header data unit we are dealing with and parse the raw data into an appropriate data container
  2. wrap the whole file in a single user-facing struct that contains the Header-Data-Units in such a way that the user can access the data in its parsed form

I have already written a bunch of code to deal with parsing the headers, the data and such, but packaging everything in such a way that it is available to a user has been a failure so far.

Issue 1

The first thing I tried was representing the header data units as trait objects:

struct AstronomicalFile {
    hdus: Vec<Box<dyn HDUTrait>>, // or a hashmap or whatever
}

with the HDUTrait hiding the generic parameter in the HDU struct specifying the data type:

trait HDUTrait {}

struct HDU<T: DataContainer> {
    header: Header,
    data: T,
}

With T implementing some trait that returns the parsed data:

pub trait DataContainer {
    type DataType;
    fn get_data(self) -> Self::DataType; // returns the parsed data in a user-friendly format
}

The problem is that the user can no longer reach the get_data() function through the AstronomicalFile struct: adding a function to HDUTrait to retrieve the data (returning Self or Self::DataType) makes the compiler complain that HDUTrait can no longer be used as a trait object.
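For reference, one common way to keep the trait-object design and still reach the concrete data is downcasting through std::any::Any. The following is only a minimal sketch under assumptions: the Header field, the as_any and data methods, and example_file are hypothetical names for illustration, and the DataContainer bound from the post is left out for brevity.

```rust
use std::any::Any;

// Hypothetical stand-in for the Header type from the post.
struct Header {
    name: String,
}

// Usable as a trait object: no method takes Self by value or returns an
// associated type.
trait HDUTrait {
    fn header(&self) -> &Header;
    fn as_any(&self) -> &dyn Any;
}

struct HDU<T> {
    header: Header,
    data: T,
}

impl<T: 'static> HDUTrait for HDU<T> {
    fn header(&self) -> &Header {
        &self.header
    }

    fn as_any(&self) -> &dyn Any {
        self
    }
}

struct AstronomicalFile {
    hdus: Vec<Box<dyn HDUTrait>>,
}

impl AstronomicalFile {
    // Returns the data of the HDU at `index` if it holds a `T`, else None.
    fn data<T: 'static>(&self, index: usize) -> Option<&T> {
        let hdu = self.hdus.get(index)?;
        hdu.as_any().downcast_ref::<HDU<T>>().map(|h| &h.data)
    }
}

// Builds a small in-memory file with two HDUs of different data types.
fn example_file() -> AstronomicalFile {
    let mut hdus: Vec<Box<dyn HDUTrait>> = Vec::new();
    hdus.push(Box::new(HDU {
        header: Header { name: "PRIMARY".to_string() },
        data: vec![1.0_f32, 2.0],
    }));
    hdus.push(Box::new(HDU {
        header: Header { name: "NOTES".to_string() },
        data: String::from("table"),
    }));
    AstronomicalFile { hdus }
}
```

The trade-off is that the caller has to name the expected type, for example file.data::<Vec<f32>>(0), and gets None when the guess is wrong, so the type check moves from compile time to run time.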

Issue 2

The user should not be able to choose the data type they get when they open the file. Instead, the type of T in HDU<T: DataContainer> should be determined during parsing. I don't know how to do this either, since returning impl DataContainer only allows for returning ONE type of data container, not an arbitrary one.

Any help would be much appreciated!

Is the total number of different units (header-data-units?) fixed? In that case I think using an enum is much better than using trait objects.
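A rough sketch of what the enum approach could look like, assuming the kind of unit can be read from the header. The names HduData, Hdu, parse_hdu and the kind field are made up for illustration, and the payload types are placeholders rather than the real parsed formats.

```rust
// Hypothetical header: only the field needed to pick a variant.
struct Header {
    kind: String,
}

// One variant per kind of parsed data; the payload types are placeholders.
#[derive(Debug, PartialEq)]
enum HduData {
    Image(Vec<f32>),
    Table(Vec<Vec<String>>),
    Empty,
}

struct Hdu {
    header: Header,
    data: HduData,
}

struct AstronomicalFile {
    hdus: Vec<Hdu>,
}

// The variant is chosen while parsing, based on the header, so the caller
// never has to pick a type parameter. The decoding itself is a toy.
fn parse_hdu(header: Header, raw: &[u8]) -> Hdu {
    let data = match header.kind.as_str() {
        "IMAGE" => HduData::Image(raw.iter().map(|&b| f32::from(b)).collect()),
        "TABLE" => HduData::Table(vec![vec![String::from_utf8_lossy(raw).into_owned()]]),
        _ => HduData::Empty,
    };
    Hdu { header, data }
}
```

The user then gets at the parsed data with an ordinary match on hdu.data, and the compiler checks that every variant is handled. This covers both issues at once: no trait object is needed, and the parser decides the type.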


In principle, yes, but the number is in the thousands :frowning_face:

So how are you parsing it in? Surely you don't have thousands of branches in your parser.

Yeah, maybe my reasoning is wrong. I parse the data in a step-wise way: I start by determining the data container (only a few options, around 10). Some of those I then want to parse further. For instance, I want to represent images as ndarrays (from the ndarray crate). Ndarrays have different types if they have different numbers of axes or different element types. The images can have up to 999 axes, and u8, i16, i32, i64, f32 and f64 are allowed as element types.

I do most of the actual parsing with generics (just not specifying the number of axes or the data type) and a bunch of helper functions that match keywords from the header to determine what those generics should be (most of these parser functions return trait objects).
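If the number of axes is stored as a run-time value instead of a type parameter, the thousands of combinations shrink to one enum variant per element type. Here is a sketch of that idea, assuming big-endian raw data. DynArray is a hypothetical stand-in with a run-time shape (the ndarray crate offers ArrayD for this purpose), and parse_image and its string codes are invented for illustration.

```rust
use std::convert::TryFrom;

// Hypothetical n-dimensional array whose number of axes is a run-time value.
#[derive(Debug, PartialEq)]
struct DynArray<T> {
    shape: Vec<usize>,
    values: Vec<T>,
}

// Six variants cover every allowed element type, whatever the axis count.
#[derive(Debug, PartialEq)]
enum ImageData {
    U8(DynArray<u8>),
    I16(DynArray<i16>),
    I32(DynArray<i32>),
    I64(DynArray<i64>),
    F32(DynArray<f32>),
    F64(DynArray<f64>),
}

// Decodes `raw` as consecutive N-byte values; returns None when the byte
// count does not match the shape.
fn decode<T, const N: usize>(
    raw: &[u8],
    shape: &[usize],
    from_be: fn([u8; N]) -> T,
) -> Option<DynArray<T>> {
    let expected: usize = shape.iter().product();
    if N == 0 || raw.len() != expected * N {
        return None;
    }
    let values = raw
        .chunks_exact(N)
        .map(|chunk| from_be(<[u8; N]>::try_from(chunk).expect("chunk has N bytes")))
        .collect();
    Some(DynArray { shape: shape.to_vec(), values })
}

// Picks the variant from an element-type code taken from the header.
fn parse_image(element_type: &str, shape: &[usize], raw: &[u8]) -> Option<ImageData> {
    Some(match element_type {
        "u8" => ImageData::U8(decode::<u8, 1>(raw, shape, u8::from_be_bytes)?),
        "i16" => ImageData::I16(decode::<i16, 2>(raw, shape, i16::from_be_bytes)?),
        "i32" => ImageData::I32(decode::<i32, 4>(raw, shape, i32::from_be_bytes)?),
        "i64" => ImageData::I64(decode::<i64, 8>(raw, shape, i64::from_be_bytes)?),
        "f32" => ImageData::F32(decode::<f32, 4>(raw, shape, f32::from_be_bytes)?),
        "f64" => ImageData::F64(decode::<f64, 8>(raw, shape, f64::from_be_bytes)?),
        _ => return None,
    })
}
```

The generic helper keeps the parsing code shared across element types, while the enum is the only thing the user has to match on.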
