Smaller size of const data?

Hello, I am writing a firmware, and I'd like it to be the smallest possible. As often, in embedded programming, there is a ton of const data, because they end up being in the flash memory and do not consume precious RAM.

To summarize my problem, I have an enum with empty variants, but since it has also a big variant, its size is big as well. I have several arrays holding these enums.

My understanding is that if I use references, the same empty variant will not be duplicated but stored only once in the flash. To give you a concrete example, given the following code:

pub enum Data {
    Small,
    Big([i32; 8]),
}

pub const ARRAY_1: [Data; 4] = [Data::Small, Data::Small, Data::Big([123; 8]), Data::Small];

pub const ARRAY_2: [&Data; 4] = [&Data::Small, &Data::Small, &Data::Big([123; 8]), &Data::Small];

Which version of ARRAY takes the most space in flash: ARRAY_1 or ARRAY_2? I think that the latter is supposed to take less space in total, is that right?

I'll be grateful to be corrected if some of my assumptions are wrong :slightly_smiling_face:

This depends on your pointer size. Let's suppose it's 2 bytes (16 bits).

Data is 8 bytes, so ARRAY_1 takes 8 × 4 = 32 bytes. But ARRAY_2 contains pointers, so in addition to the data itself there is the space occupied by the pointers:

  • 2 × 4 = 8 bytes for the pointers stored in the array
  • 8 bytes for a Data::Small
  • 8 bytes for a Data::Big([123; 8])

for a total of 8 + 8 + 8 = 24 bytes. On the other hand, if your platform has 4-byte pointers, then the total is 16 + 8 + 8 = 32, the same as ARRAY_1.

If you have lots of arrays reusing the same data values, then the [&Data] version will win. If you don't, then the [Data] version could be better because you avoid storing a pointer for every value, in addition to the value itself.

And in general, if enum's size is equal to or smaller than a pointer, then not using a pointer is always better.

1 Like

My example is only an example. In reality, the enum is much bigger than a pointer (something like 24 bytes) and the arrays have a size of several dozens to hundreds. Most of the items are "empty" (variant with no data).

My main question is if this variant is stored once in the flash, or for every occurrence? From your answer, I assume the former?

Yes, you can see it in assembly:

example::ARRAY_2:
        .quad   .L__unnamed_1
        .quad   .L__unnamed_1
        .quad   .L__unnamed_2
        .quad   .L__unnamed_1

These are 4 pointers to static data, 3 of which are the same.

5 Likes

Is it possible to replace Data with something like Option<&[i32; 8]>?

Unfortunately, no :face_exhaling:

This is for a keyboard firmware, and each key can be Nothing, Transparent or Action(ActionData), so there cannot be any null pointer optimization like in Option.

Maybe it is possible to implement a hack with an union, like 0 is a variant, 1 is another one, and otherwise it's a pointer? AFAIK, 1 should be outside of the addressable memory range?

Then just replace Big([i32; 8]) with something like Big(&'static [i32; u8]). Would take place for the discriminant, but still take less space then the entire copied array.

Thought about this. Might even be just a wrapper for a raw pointer, not a union, but the question is about address 1. And, well, it depends on the hardware, the OS, the binary format you use… and while I might use a trick like that myself, I would never say that it’s perfectly sound.

If you can assure that 1 is used by something else, this might work.

1 Like

So, to recapitulate:

pub enum ActionData {
    // 24B
}

pub enum MaybeAction {
    Nothing,
    Transparent,
    Action(ActionData),
}

const LAYER: [MaybeAction; 48];

Since ActionData is an enum as well, and there is an optimization going on, every "maybe action" is 24 bytes.

pub enum MaybeAction {
    Nothing,
    Transparent,
    Action(ActionData),
}

const LAYER: [&MaybeAction; 48];

Here, it is 2 bytes (if we assume addresses are 16 bits) + 24 bytes per unique "maybe action". So, if there are mainly unique pieces of data, it makes things worse.

pub enum MaybeAction {
    Nothing,
    Transparent,
    Action(&'static ActionData),
}

const LAYER: [MaybeAction; 48];

Now, we have 2 bytes for the discriminant (I assume, if the platform is 16 bits) and 2 for the pointer, so it's 4 for the enum + 24 per unique piece of data.

The 3rd one is clearly the worse if I'm right with my calculations.

Whether the 1st or 2nd one is the best depend on the actual layout created by the user (my project is a library) so I assume that I could make my layer generic, with T: AsRef<ActionData> and the user could choose the one with the lowest size?

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.