Compact memory layout of enum with shared fields

I am building a compiler intermediate representation similar to the one in librustc/mir/repr.rs, but my Instruction type needs to have some shared fields, so it looks like this:

struct Instruction {
    opcode: u32,
    result_type: u8,
    payload: Payload,
}
enum Payload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(Box<MoreData>),
}
struct MoreData { ... }

I want this type to be 16 bytes large, but the Payload discriminator gets in the way. It is stored inside the Payload type, doubling its size from 8 bytes to 16. Is there any way of storing the Payload discriminator in the padding after result_type to keep the whole Instruction type at 16 bytes?

Related to this, the Box<MoreData> is normally just an 8-byte pointer, but if MoreData is a dynamically sized type, it doubles in size because it also stores the MoreData size. Is there any way of storing the size of a dynamically sized type inside the object itself so references don't double in size?

There are some proposals to give more control over how enum tags are represented (and where they are placed). For a systems language like Rust, I think the desire for this flexibility will only grow.

This is the "thin traits" proposal; you can search for it, or read about it here:
http://smallcultfollowing.com/babysteps/blog/2015/10/08/virtual-structs-part-4-extended-enums-and-thin-traits/

But are you sure the fat pointer is a problem for you?

Thanks!

It would increase the size of an important data structure by 50%. My instructions are already twice the size of LuaJIT's, this would make them 3x.

As a workaround, I can use a Vec<u32> inside the MoreData struct instead of a dynamically sized array. That adds an extra level of pointer chasing, but I can live with that.
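To make the trade-off concrete, here is a small sketch that compares the pointer sizes involved. The `MoreData` definition with an `extra_args` field is my own placeholder (the real fields are elided in the question), and `pointer_sizes` is just a helper name for illustration.

```rust
use std::mem::size_of;

// Hypothetical stand-in for MoreData; the real fields are elided in the question.
// Because Vec<u32> is a sized type, MoreData is sized too.
struct MoreData {
    extra_args: Vec<u32>,
}

// Returns (size of a Box to a dynamically sized slice, size of a Box to MoreData).
fn pointer_sizes() -> (usize, usize) {
    (size_of::<Box<[u32]>>(), size_of::<Box<MoreData>>())
}
```

`Box<[u32]>` carries the slice length next to the pointer, so it is two words wide, while `Box<MoreData>` stays at one word because the length lives in the `Vec` header on the heap. The price is the second pointer hop mentioned above.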

No, because alignment. The representation of Payload inside Instruction has to be the same as its representation anywhere else, because of references; Payload must be 12 bytes in size, in order for args to be properly aligned when the discriminator is itself word-aligned. (Theoretically you could get around this by accessing the discriminator by a negative offset from the base reference, but that'd be a rather involved change to memory allocation.)
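If you want to see what the compiler actually picks, a sketch like the following reports size and alignment for the types from the question. `MoreData` is again a placeholder of mine and `layout_report` is a made-up helper. The exact numbers depend on the target: the `Box` pulls `Payload` up to pointer alignment, so I would expect a larger `Payload` on a 64-bit target than on a 32-bit one.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the elided MoreData type.
struct MoreData {
    operands: Vec<u32>,
}

enum Payload {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(Box<MoreData>),
}

struct Instruction {
    opcode: u32,
    result_type: u8,
    payload: Payload,
}

// Returns (type name, size in bytes, alignment in bytes) for both types.
fn layout_report() -> [(&'static str, usize, usize); 2] {
    [
        ("Payload", size_of::<Payload>(), align_of::<Payload>()),
        ("Instruction", size_of::<Instruction>(), align_of::<Instruction>()),
    ]
}
```

Whatever the target, the size of each type is a multiple of its alignment, which is why the tag cannot simply borrow a byte from the enclosing struct's padding.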

(At the risk of giving a very superficial answer, does opcode really need to be a u32?)

I'm guessing that the Payload discriminator is functionally determined by opcode. If you're trying to optimize this as much as possible, you could get rid of the discriminator entirely; unions are not implemented yet, but mem::transmute is an option. Sadly the rules for transmuting references are a bit fuzzy.

  1. Because there is only one reference to the referent of a Box<>, moving the vtable pointer from the reference to the heap would in general be a wash, although in this particular case it could be useful to shorten the longest arm of the enum. It's more useful for Rc<>.
  2. What you are describing is known as a thin trait; it's not implemented but has been discussed quite a bit.

A good union proposal should offer a way to wrap the discriminator in a safe interface. Here the discriminator of Payload could go inside Instruction.
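For illustration only, here is roughly what that could look like, assuming a compiler version where `union` is available. Everything beyond the names in the question is hypothetical: the `Kind` tag, `RawPayload`, the `PayloadRef` view, the constructors, and the `operands` field of `MoreData`.

```rust
use std::mem::ManuallyDrop;

// Hypothetical stand-in; the real MoreData fields are elided in the question.
struct MoreData {
    operands: Vec<u32>,
}

// Untagged storage. Which field is live is recorded in Instruction::kind.
union RawPayload {
    unary: u32,
    binary: [u32; 2],
    other: ManuallyDrop<Box<MoreData>>,
}

// Hypothetical one-byte tag that sits next to result_type instead of inside the payload.
#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Unary,
    Binary,
    Other,
}

struct Instruction {
    opcode: u32,
    result_type: u8,
    kind: Kind,
    raw: RawPayload,
}

// Safe, borrowed view handed out to callers.
enum PayloadRef<'a> {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(&'a MoreData),
}

impl Instruction {
    fn unary(opcode: u32, result_type: u8, arg: u32) -> Self {
        Instruction { opcode, result_type, kind: Kind::Unary, raw: RawPayload { unary: arg } }
    }

    fn binary(opcode: u32, result_type: u8, args: [u32; 2]) -> Self {
        Instruction { opcode, result_type, kind: Kind::Binary, raw: RawPayload { binary: args } }
    }

    fn other(opcode: u32, result_type: u8, data: MoreData) -> Self {
        let raw = RawPayload { other: ManuallyDrop::new(Box::new(data)) };
        Instruction { opcode, result_type, kind: Kind::Other, raw }
    }

    fn payload(&self) -> PayloadRef<'_> {
        // SAFETY: only the constructors above set `kind` and `raw`,
        // and they always initialize the union field that matches `kind`.
        unsafe {
            match self.kind {
                Kind::Unary => PayloadRef::Unary { arg: self.raw.unary },
                Kind::Binary => PayloadRef::Binary { args: self.raw.binary },
                Kind::Other => {
                    let data: &MoreData = &self.raw.other;
                    PayloadRef::Other(data)
                }
            }
        }
    }
}

impl Drop for Instruction {
    fn drop(&mut self) {
        // Only the Other variant owns heap data that needs freeing.
        if self.kind == Kind::Other {
            // SAFETY: `kind` says the `other` field is the initialized one.
            unsafe { ManuallyDrop::drop(&mut self.raw.other) }
        }
    }
}
```

The unsafe code stays inside `payload` and `drop`; callers only ever see `PayloadRef`. With the default layout I would expect this to fit in 16 bytes on common targets, though Rust does not promise a particular field order without a `repr` attribute.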

result_type will take up 4 to 8 bytes as well because of alignment (even if you put it at the end, because the padding then comes from the struct's alignment). Note that Rust has no explicit syntax for this; see "There is no syntax for struct alignment", Issue #325 in rust-lang/rfcs on GitHub.

I remember I once did a project in C++ where I used bit fields, and later char fields with 1-byte alignment, and the performance was terrible. Better to use normal alignment and chunk the work; most compilers write "parts" to disk. Even at 32 bytes each, 1 GiB would still hold 32M instructions; strings, especially for calls, will dwarf this.

One way is just to store the payload as a u16 integer and provide a getpayload method which creates the enum. So you're trading convenience, maintainability, and a small amount of performance for smaller storage. You will need set methods as well.
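A rough sketch of that getter/setter idea, with several assumptions of my own: the opcode alone decides the payload kind, the opcode constants are invented, the raw storage is two u32 words rather than a u16, and the `Other` case stores an index into a side table instead of a `Box` so that no unsafe code is needed.

```rust
// Hypothetical opcode numbering; the question does not define any opcodes.
const OP_NEG: u32 = 0; // unary
const OP_ADD: u32 = 1; // binary
const OP_CALL: u32 = 2; // extra data lives in a side table

// Hypothetical stand-in for the elided MoreData type.
#[derive(Debug, PartialEq)]
struct MoreData {
    operands: Vec<u32>,
}

// The enum is only created on demand; it is never stored.
#[derive(Debug, PartialEq)]
enum Payload<'a> {
    Unary { arg: u32 },
    Binary { args: [u32; 2] },
    Other(&'a MoreData),
}

// Compact storage: no tag, no pointer, just two raw words.
struct Instruction {
    opcode: u32,
    result_type: u8,
    raw: [u32; 2],
}

impl Instruction {
    // Builds the enum view; the opcode decides how `raw` is interpreted.
    fn payload<'a>(&self, side_table: &'a [MoreData]) -> Payload<'a> {
        match self.opcode {
            OP_NEG => Payload::Unary { arg: self.raw[0] },
            OP_ADD => Payload::Binary { args: self.raw },
            // Every other opcode treats raw[0] as an index into the side table.
            _ => Payload::Other(&side_table[self.raw[0] as usize]),
        }
    }

    // Setter for the binary case; the other cases would look the same.
    fn set_binary(&mut self, args: [u32; 2]) {
        self.opcode = OP_ADD;
        self.raw = args;
    }
}
```

Here the side table would be owned by whatever holds the instruction list (a function body, say), and `payload` panics on an out-of-range index. Whether the extra lookup is cheaper than a `Box` per instruction is something only measurement can tell.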

At this point this is definitely premature optimization, unless you have some benchmarks.