Creating a sparsely populated struct?


#1

I’ve got a large chunk of data in memory, but I’m only interested in relatively few locations. Is there a good way to label the memory I’m actually using while ignoring the rest? So far I’m using a struct like the following, which looks a bit awkward.

struct Data {
    field1: u32,
    reserved1: [u8; 8], // skip a few
    field2: u16,
    reserved2: [u8; 128],
    // ...
}

I guess I could do something with macros and pointers but I’d have thought that’d be more convoluted and error prone.


#2

What do you mean by “ignoring”? How about using a database and indexing the key attributes? If you want to perform in-memory searches over large data sets, a column-oriented DBMS might be a fit.


#3

It might be overkill, but look at a parser combinator lib like nom - it’ll let you parse out just the data you want, and you can describe the schema via DSL-like macros. In that sense, your Data shouldn’t even have the exact binary layout of the memory - it should have just the data you want (possibly slices referencing the memory).

Alternatively, you can use a crate like byteorder and define your Data more like this:

struct Data<'a>(&'a [u8]);

impl<'a> Data<'a> {
    fn field1(&self) -> u32 { ... use byteorder to read the u32 at the right offset }
    fn field2(&self) -> u16 { ... use byteorder to read the u16 at the right offset }
}
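
To make the accessor idea concrete, here is a minimal sketch. It uses the standard library’s from_le_bytes instead of byteorder so that it compiles without extra crates; byteorder’s readers would slot in the same way. The offsets (0 and 12) are assumptions taken from the #[repr(C)] layout of the struct in the first post, little-endian byte order is also an assumption, and the `new` constructor is a hypothetical helper.

```rust
/// Read-only view over the raw bytes; nothing is copied or modified.
pub struct Data<'a>(&'a [u8]);

// Assumed offsets, derived from the struct in the first post:
// field1 (4 bytes) at 0, then 8 skipped bytes, then field2 at 12.
const FIELD1_OFFSET: usize = 0;
const FIELD2_OFFSET: usize = 12;

impl<'a> Data<'a> {
    /// Hypothetical constructor: returns `None` if the buffer is too
    /// short to contain both fields, so the accessors cannot panic.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() >= FIELD2_OFFSET + 2 {
            Some(Data(bytes))
        } else {
            None
        }
    }

    /// Reads the u32 at FIELD1_OFFSET (little-endian is an assumption).
    pub fn field1(&self) -> u32 {
        let b = &self.0[FIELD1_OFFSET..FIELD1_OFFSET + 4];
        u32::from_le_bytes([b[0], b[1], b[2], b[3]])
    }

    /// Reads the u16 at FIELD2_OFFSET (little-endian is an assumption).
    pub fn field2(&self) -> u16 {
        let b = &self.0[FIELD2_OFFSET..FIELD2_OFFSET + 2];
        u16::from_le_bytes([b[0], b[1]])
    }
}
```

Because the view only borrows the buffer, this should work both for memory shared via an API and for a dump loaded from a file.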

One issue is that you’ll need to make sure your offsets are correct, and the layout of the memory isn’t visible here the way it is in the Data struct from your original example. However, you might be able to make this less error prone by using some macros to generate this boilerplate.
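
As a rough sketch of the macro idea: the hypothetical `accessors!` macro below generates one getter per `offset => name: type` entry, so the layout is at least visible in one place. The macro name, its syntax, the offsets and the little-endian byte order are all assumptions for illustration, and it uses only the standard library rather than byteorder.

```rust
/// Hypothetical helper: generates one little-endian accessor per
/// `offset => name: type` entry. Each accessor returns `None` when the
/// underlying slice is too short for that field.
macro_rules! accessors {
    ($($offset:expr => $name:ident : $ty:ty),* $(,)?) => {
        $(
            pub fn $name(&self) -> Option<$ty> {
                const SIZE: usize = ::std::mem::size_of::<$ty>();
                let mut raw = [0u8; SIZE];
                raw.copy_from_slice(self.0.get($offset..$offset + SIZE)?);
                Some(<$ty>::from_le_bytes(raw))
            }
        )*
    };
}

pub struct Data<'a>(pub &'a [u8]);

impl<'a> Data<'a> {
    // Offsets assumed from the struct in the original post.
    accessors! {
        0 => field1: u32,
        12 => field2: u16,
    }
}
```

Adding another field of interest is then a one-line change, and the offset table reads a bit like the original struct definition.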


#4

Can you clarify a bit about the data? Is it a single “blob” with lots of fields, or more like a table, where you are interested in only one column?

For a single blob, your “reserved” solution looks very workable. Though I would recommend naming it “ignored”, to make clearer that it is used by others but uninteresting to you. “Reserved” implies “not used by anyone (yet), reserved for future use”.
Sometimes people start doing “clever” things with reserved areas, which in this case would overwrite the (unused) data you have in there.

For a table-like thing, Frehberg has good suggestions.

It is also important if the “ignored” data should stay in memory. Do other programs need it? Otherwise, extracting only the needed parts into a new struct might be smarter (as Vitaly suggests).


#5

The data is from a C program and is either shared via an API or loaded from a file (basically a memory dump). In the first case modifying the memory arbitrarily would be an issue, but not when loading from a file.

I think @vitalyd’s solutions best fit my problem, so I’ll investigate those.


#6

Make sure you mark this #[repr(C)], otherwise you can’t assume anything about the actual layout.
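
For illustration, a sketch of the struct from the first post with #[repr(C)] applied, plus compile-time checks that the layout is what you expect. The expected values (offsets 0 and 12, total size 144) are what the C layout rules give for these field types on common targets; they are assumptions to compare against the real C header. std::mem::offset_of! needs Rust 1.77 or newer.

```rust
use std::mem::{offset_of, size_of};

// With #[repr(C)] the fields are laid out in declaration order,
// with padding inserted according to the C rules.
#[repr(C)]
pub struct Data {
    pub field1: u32,
    pub reserved1: [u8; 8], // skip a few
    pub field2: u16,
    pub reserved2: [u8; 128],
}

// Compile-time checks: the build fails if the layout ever drifts
// from the assumed offsets.
const _: () = assert!(offset_of!(Data, field1) == 0);
const _: () = assert!(offset_of!(Data, field2) == 12);
// 142 bytes of fields, rounded up to the struct's 4-byte alignment.
const _: () = assert!(size_of::<Data>() == 144);
```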


#7

Thanks for the reminder! Fortunately I do actually have that in my code but I forgot to add it to my example.


#8

As for the file reading, if you are in a position to define the binary file format, take a look at protobuf serialization/deserialization. The attributes are tagged in a compact binary fashion.
You would be able to ignore attributes while reading the files, just keeping the fraction of your structures of interest for your tool.