Compact memory layout of enum with shared fields


#1

I am building a compiler intermediate representation similar to the one in librustc/mir/repr.rs, but my Instruction type needs to have some shared fields, so it looks like this:

struct Instruction {
    opcode: u32,
    result_type: u8,
    payload: Payload,
}
enum Payload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(Box<MoreData>),
}
struct MoreData { ... }

I want this type to be 16 bytes large, but the Payload discriminator gets in the way. It is stored inside the Payload type, doubling its size from 8 bytes to 16. Is there any way of storing the Payload discriminator in the padding after result_type to keep the whole Instruction type at 16 bytes?
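
For anyone who wants to check these numbers, here is a minimal sketch using std::mem::size_of. The MoreData fields are a made-up placeholder (the post elides them), and the helper name layout_sizes is hypothetical. The sizes in the comments are what rustc typically produces on a 64-bit target; the default layout is not formally guaranteed.

```rust
use std::mem::size_of;

// Hypothetical stand-in: the original post elides the fields of MoreData.
#[allow(dead_code)]
struct MoreData {
    extra: [u32; 4],
}

#[allow(dead_code)]
enum Payload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(Box<MoreData>),
}

#[allow(dead_code)]
struct Instruction {
    opcode: u32,
    result_type: u8,
    payload: Payload,
}

/// Returns (size of Payload, size of Instruction) in bytes.
/// On a typical 64-bit target this is (16, 24): the tag forces Payload
/// up to 16 bytes, and the 5 bytes of shared fields are padded to 8.
fn layout_sizes() -> (usize, usize) {
    (size_of::<Payload>(), size_of::<Instruction>())
}
```

Calling layout_sizes() from main and printing the result shows how far the current layout is from the 16-byte goal.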

Related to this, the Box<MoreData> is normally just an 8-byte pointer, but if MoreData is a dynamically sized type, it doubles in size because it also stores the MoreData size. Is there any way of storing the size of a dynamically sized type inside the object itself so references don’t double in size?
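
The pointer doubling can be seen directly with a short sketch. The function name pointer_sizes is hypothetical, and std::fmt::Debug is only used as a convenient example trait.

```rust
use std::mem::size_of;

/// Returns the sizes in bytes of:
/// a Box to a sized type, a Box to a slice, and a Box to a trait object.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<Box<u32>>(),                 // thin: just the address
        size_of::<Box<[u32]>>(),               // fat: address + length
        size_of::<Box<dyn std::fmt::Debug>>(), // fat: address + vtable pointer
    )
}
```

The two fat pointers come out at exactly twice the size of the thin one, so an enum arm holding Box<[u32]> would be one word larger than one holding Box<MoreData> with a sized MoreData.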


#2

There are some proposals to give more control over how enum tags are represented (and where they are placed). For a systems language like Rust, I think the desire for this flexibility will keep growing.

This is the “thin traits” proposal; you can search for it, or read about it here:
http://smallcultfollowing.com/babysteps/blog/2015/10/08/virtual-structs-part-4-extended-enums-and-thin-traits/

But are you sure the fat pointer is a problem for you?


#3

Thanks!

It would increase the size of an important data structure by 50%. My instructions are already twice the size of LuaJIT’s; this would make them 3x.

As a workaround, I can use a Vec<u32> inside the MoreData struct instead of a dynamically sized array. That adds an extra level of pointer chasing, but I can live with that.
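
A rough sketch of that workaround, assuming MoreData holds nothing but the variable-length operand list (the field name operands and the helper other_operand are made up for illustration):

```rust
// Hypothetical contents: the thread does not say what MoreData holds.
#[allow(dead_code)]
struct MoreData {
    // The variable-length part lives in its own heap allocation,
    // so MoreData itself is sized and Box<MoreData> stays a thin pointer.
    operands: Vec<u32>,
}

#[allow(dead_code)]
enum Payload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(Box<MoreData>),
}

/// Reads operand `i` of an `Other` payload.
/// This takes two pointer hops: Box -> MoreData, then Vec -> element buffer.
fn other_operand(payload: &Payload, i: usize) -> Option<u32> {
    match payload {
        Payload::Other(more) => more.operands.get(i).copied(),
        _ => None,
    }
}
```

The extra hop is the cost mentioned above; in exchange the Other arm stays one pointer wide.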


#4

No, because alignment. The representation of Payload inside Instruction has to be the same as its representation anywhere else, because of references; Payload must be 12 bytes in size, in order for args to be properly aligned when the discriminator is itself word-aligned. (Theoretically you could get around this by accessing the discriminator by a negative offset from the base reference, but that’d be a rather involved change to memory allocation.)
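
To illustrate the alignment point, here is a small sketch with a payload that has only the two u32-based arms (SmallPayload and small_payload_layout are hypothetical names). With 4-byte alignment the tag costs a full 4 bytes, giving the 12-byte figure. Assuming a 64-bit target, adding a Box arm raises the alignment to 8 and the size to 16, as in the original question.

```rust
use std::mem::{align_of, size_of};

// Only the arms whose fields are 4-byte aligned; no Box arm here.
#[allow(dead_code)]
enum SmallPayload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
}

/// Returns (size, alignment) of SmallPayload in bytes.
/// The tag needs only one byte, but `args` must start on a 4-byte
/// boundary, so the tag effectively occupies a whole 4-byte slot.
fn small_payload_layout() -> (usize, usize) {
    (size_of::<SmallPayload>(), align_of::<SmallPayload>())
}
```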

(At the risk of giving a very superficial answer, does opcode really need to be a u32?)

I’m guessing that the Payload discriminator is functionally determined by opcode. If you’re trying to optimize this as much as possible, you could get rid of the discriminator entirely; unions are not implemented yet, but mem::transmute is an option. Sadly the rules for transmuting references are a bit fuzzy.

  1. Because there is only one reference to the referent of a Box<>, moving the vtable pointer from the reference to the heap would in general be a wash, although in this particular case it could be useful to shorten the longest arm of the enum. It’s more useful for Rc<>.
  2. What you are describing is known as a thin trait; it’s not implemented but has been discussed quite a bit.

#5

A good union proposal should offer a way to wrap the discriminator in a safe interface. Here the discriminator of Payload could go inside Instruction.
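
For illustration only, here is what such a wrapper might look like using the union keyword and ManuallyDrop that Rust gained after this thread was written. This is a sketch under assumptions, not a recommendation: the opcode ranges, the MoreData fields, and the PayloadRef view type are all invented here, and the opcode is assumed to determine the payload kind.

```rust
use std::mem::ManuallyDrop;

// Hypothetical contents; the thread leaves MoreData unspecified.
pub struct MoreData {
    pub operands: Vec<u32>,
}

// Hypothetical opcode ranges: 0..=99 unary, 100..=199 binary, 200.. other.
const UNARY_MAX: u32 = 99;
const BINARY_MAX: u32 = 199;

// Untagged storage: 8 bytes, no discriminator of its own.
union RawPayload {
    unary: u32,
    binary: [u32; 2],
    other: ManuallyDrop<Box<MoreData>>,
}

pub struct Instruction {
    opcode: u32,
    result_type: u8,
    payload: RawPayload,
}

// Safe, borrowed view handed out to callers.
pub enum PayloadRef<'a> {
    Unary(u32),
    Binary([u32; 2]),
    Other(&'a MoreData),
}

impl Instruction {
    pub fn unary(opcode: u32, result_type: u8, arg: u32) -> Self {
        assert!(opcode <= UNARY_MAX);
        Instruction { opcode, result_type, payload: RawPayload { unary: arg } }
    }

    pub fn binary(opcode: u32, result_type: u8, args: [u32; 2]) -> Self {
        assert!(opcode > UNARY_MAX && opcode <= BINARY_MAX);
        Instruction { opcode, result_type, payload: RawPayload { binary: args } }
    }

    pub fn other(opcode: u32, result_type: u8, more: MoreData) -> Self {
        assert!(opcode > BINARY_MAX);
        let other = ManuallyDrop::new(Box::new(more));
        Instruction { opcode, result_type, payload: RawPayload { other } }
    }

    pub fn result_type(&self) -> u8 {
        self.result_type
    }

    pub fn payload(&self) -> PayloadRef<'_> {
        // SAFETY: the fields are private and the constructors above are the
        // only way to build an Instruction; each ties the opcode range to
        // the union field it initialises.
        unsafe {
            if self.opcode <= UNARY_MAX {
                PayloadRef::Unary(self.payload.unary)
            } else if self.opcode <= BINARY_MAX {
                PayloadRef::Binary(self.payload.binary)
            } else {
                PayloadRef::Other(&**self.payload.other)
            }
        }
    }
}

impl Drop for Instruction {
    fn drop(&mut self) {
        if self.opcode > BINARY_MAX {
            // SAFETY: opcodes above BINARY_MAX always hold an initialised Box.
            unsafe { ManuallyDrop::drop(&mut self.payload.other) }
        }
    }
}
```

With this layout the opcode doubles as the discriminator, so the struct fits in 16 bytes. Callers only ever see the safe PayloadRef view.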


#6

result_type will take 4 to 8 bytes as well because of alignment (even if you put it at the end, because it then becomes struct alignment padding). Note there is no explicit control over this in Rust: https://github.com/rust-lang/rfcs/issues/325

I remember I once did a project in C++ where I used bit fields, and later char fields with 1-byte alignment, and the performance was terrible. It is better to use normal alignment and process the work in chunks; most compilers write “parts” to disk. Even at 32 bytes per instruction, 1 GiB would still hold 32M instructions, and strings, especially for calls, will dwarf this…

One way is to just store it as a u16 integer and provide a getpayload method that creates the enum. So you're trading convenience, maintainability, and a small amount of performance for smaller storage. You will need set methods as well.
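
A minimal sketch of that idea, with several assumptions: the tag is stored as a u8 field named kind, the raw payload as two u32 words, and the Other case as an index into some side table instead of a Box. The post's getpayload is spelled get_payload here to follow Rust naming conventions; the other names are invented.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Payload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    // Hypothetical: index into a side table holding the MoreData values.
    Other { index: u32 },
}

const KIND_UNARY: u8 = 0;
const KIND_BINARY: u8 = 1;
const KIND_OTHER: u8 = 2;

pub struct Instruction {
    opcode: u32,
    result_type: u8,
    kind: u8,        // hand-rolled tag, sits where padding would otherwise be
    words: [u32; 2], // raw payload bits, interpreted according to `kind`
}

impl Instruction {
    pub fn new(opcode: u32, result_type: u8, payload: Payload) -> Self {
        let mut ins = Instruction { opcode, result_type, kind: KIND_UNARY, words: [0, 0] };
        ins.set_payload(payload);
        ins
    }

    pub fn opcode(&self) -> u32 {
        self.opcode
    }

    pub fn result_type(&self) -> u8 {
        self.result_type
    }

    /// Packs the enum into the compact representation.
    pub fn set_payload(&mut self, payload: Payload) {
        let (kind, words) = match payload {
            Payload::Unary { arg } => (KIND_UNARY, [arg, 0]),
            Payload::Binary { args } => (KIND_BINARY, args),
            Payload::Other { index } => (KIND_OTHER, [index, 0]),
        };
        self.kind = kind;
        self.words = words;
    }

    /// Rebuilds the enum on demand from the compact representation.
    pub fn get_payload(&self) -> Payload {
        match self.kind {
            KIND_UNARY => Payload::Unary { arg: self.words[0] },
            KIND_BINARY => Payload::Binary { args: self.words },
            _ => Payload::Other { index: self.words[0] },
        }
    }
}
```

Stored this way the struct is 16 bytes with 4-byte alignment, and callers still get to match on an ordinary enum. The price is the pack and unpack step on every access.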

At this point this is definitely premature optimization unless you have some benchmarks.