Why can't the layout be what's shown in the comments (or something similar) and fit in 1 byte, instead of taking 3 bytes?
Because you need to be able to create valid &ABC and &bool from &WholeEnum.
Can you explain further, please?
See also:
Sorry, I'm still a little confused about what this post is saying.
Although the exact layout of a #[repr(Rust)] enum is unspecified, Rust requires that you be able to get a reference to the payload of a variant from a reference to the enum. This prohibits the compiler from "merging" the discriminants.
For example:
enum Abc { A, B, C }

enum AbcTimesTwo {
    X(Abc),
    Y(Abc),
}

impl AbcTimesTwo {
    // If the discriminants were merged, this function would be unimplementable:
    // there would be no stored `Abc` value for the returned reference to point at.
    fn as_abc(&self) -> &Abc {
        match self {
            Self::X(abc) | Self::Y(abc) => abc,
        }
    }
}
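To make the point above concrete, here is a small sketch that checks the `&Abc` returned by `as_abc` really points at bytes inside the enum value. The helper `payload_lives_inside` is a name I made up for illustration, and the printed sizes are whatever your compiler picks, since the layout is unspecified.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Abc { A, B, C }

#[allow(dead_code)]
enum AbcTimesTwo {
    X(Abc),
    Y(Abc),
}

impl AbcTimesTwo {
    fn as_abc(&self) -> &Abc {
        match self {
            Self::X(abc) | Self::Y(abc) => abc,
        }
    }
}

// Hypothetical helper: true if the borrowed payload lies within the enum's own bytes.
fn payload_lives_inside(value: &AbcTimesTwo) -> bool {
    let start = value as *const AbcTimesTwo as usize;
    let end = start + size_of::<AbcTimesTwo>();
    let payload = value.as_abc() as *const Abc as usize;
    start <= payload && payload + size_of::<Abc>() <= end
}

fn main() {
    let x = AbcTimesTwo::X(Abc::B);
    let y = AbcTimesTwo::Y(Abc::B);
    // Through `&Abc` the two payloads are indistinguishable ordinary `Abc` values,
    // so the X-versus-Y information has to be stored somewhere else.
    assert_eq!(x.as_abc(), y.as_abc());
    assert!(payload_lives_inside(&x) && payload_lives_inside(&y));
    println!("size_of::<Abc>() = {}", size_of::<Abc>());
    println!("size_of::<AbcTimesTwo>() = {}", size_of::<AbcTimesTwo>());
}
```

Since the payload byte has to hold a plain `Abc` in both variants, the enum needs at least one more byte than `Abc` to tell `X` from `Y`.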
Thank you
Another way to say this: bool and ABC each occupy a full byte (size 1, alignment 1), and a reference has to point at a whole byte. So they can't be packed into a single byte.
Hmm… but wouldn't it be technically possible for WholeEnum to be only 2 bytes in size? If ABC only has one variant, the length of WholeEnum is indeed only 2 bytes, but there should still be enough free niches (due to the bools) in WholeEnum for the other two variants in ABC, shouldn't there?
enum WholeEnum {
    A(ABC, bool),
    B,
    C(ABC, bool),
}

enum ABC {
    A,
    // B,
    // C,
}

fn main() {
    println!("{}", core::mem::size_of::<WholeEnum>()); // Output: 2
}
With only one variant in ABC, size_of::<ABC> is 0, so WholeEnum fits in 2 bytes (one for the discriminant, one for the bool).
Niches only help when there is no need to store an actual value. But both WholeEnum::A and WholeEnum::C contain ABC and bool values, so niches in those types don't help.
In this case, one byte is needed to store the ABC, and one byte is needed to store the bool, and these must all be represented in their usual form to be referenceable, &ABC and &bool, so there is no way to distinguish between WholeEnum::A and WholeEnum::C without adding a third byte as discriminant.
(If the types of the fields of WholeEnum::A and WholeEnum::C differed in some way that gave them non-overlapping representations — for example, if instead of a bool in one of the two, there was an enum Bool2 { False = 2, True = 3 } — then in principle it could be two bytes, but rustc currently doesn’t support that form of niche optimization.)
Well, I believe there actually is a way, but it would be hard for the compiler to discover:
Given that bool is represented by values 0 and 1, if we choose ABC to be represented by values 2, 3 and 4 (which we are free to do iiuc), we can represent WholeEnum as (u8, u8) like this:
WholeEnum::A as (ABC, bool)
WholeEnum::B as a constant (5, 0)
WholeEnum::C as (bool, ABC)
This way all three variants differ in the first byte: 2 to 4 for WholeEnum::A, 5 for WholeEnum::B, and 0 or 1 for WholeEnum::C.
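Here is a hand-written sketch of that encoding, just to check that it is unambiguous. It is not something rustc does. The functions `encode`, `decode`, `abc_to_byte`, `byte_to_abc` and `all_values` are names I made up, and the byte values 2, 3 and 4 for ABC are the assumption from the post above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ABC { A, B, C }

#[derive(Clone, Copy, Debug, PartialEq)]
enum WholeEnum {
    A(ABC, bool),
    B,
    C(ABC, bool),
}

// Assumed representation of ABC: 2, 3, 4, disjoint from bool's 0 and 1.
fn abc_to_byte(abc: ABC) -> u8 {
    match abc { ABC::A => 2, ABC::B => 3, ABC::C => 4 }
}

fn byte_to_abc(byte: u8) -> Option<ABC> {
    match byte { 2 => Some(ABC::A), 3 => Some(ABC::B), 4 => Some(ABC::C), _ => None }
}

fn encode(value: WholeEnum) -> [u8; 2] {
    match value {
        WholeEnum::A(abc, flag) => [abc_to_byte(abc), flag as u8], // (ABC, bool)
        WholeEnum::B => [5, 0],                                    // constant
        WholeEnum::C(abc, flag) => [flag as u8, abc_to_byte(abc)], // (bool, ABC)
    }
}

fn decode(bytes: [u8; 2]) -> Option<WholeEnum> {
    match bytes {
        // The first byte alone decides the variant.
        [first @ 2..=4, second @ 0..=1] => Some(WholeEnum::A(byte_to_abc(first)?, second == 1)),
        [5, 0] => Some(WholeEnum::B),
        [first @ 0..=1, second @ 2..=4] => Some(WholeEnum::C(byte_to_abc(second)?, first == 1)),
        _ => None,
    }
}

// Every possible value of WholeEnum.
fn all_values() -> Vec<WholeEnum> {
    let mut values = vec![WholeEnum::B];
    for abc in [ABC::A, ABC::B, ABC::C] {
        for flag in [false, true] {
            values.push(WholeEnum::A(abc, flag));
            values.push(WholeEnum::C(abc, flag));
        }
    }
    values
}

fn main() {
    for value in all_values() {
        let bytes = encode(value);
        assert_eq!(decode(bytes), Some(value));
        println!("{:?} -> {:?}", value, bytes);
    }
}
```

Every field is still stored as a whole byte in its usual form, so in a real layout like this `&ABC` and `&bool` would remain obtainable. The only unusual parts are the field order swapping between variants and the variant check becoming a range test.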
That's an interesting observation, but it would be a very hard optimization to implement and one that is very rarely useful.
A much better approach would be to implement “big references” that can point to individual bits… but in practice you wouldn't want to use them often (because of code bloat), so you can implement such tricks with some kind of ad-hoc machinery instead (used whenever WholeEnum needs to be packed or unpacked). It would only be useful if you have millions of these anyway… in such a case, ad-hoc machinery is justifiable.
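As a rough idea of what such ad-hoc machinery could look like, here is a sketch that packs a WholeEnum into a single byte by hand. The bit layout (bits 0-1 for the variant, bits 2-3 for ABC, bit 4 for the bool) and the names `pack` and `unpack` are my own assumptions, not anything the compiler or a library provides.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ABC { A, B, C }

#[derive(Clone, Copy, Debug, PartialEq)]
enum WholeEnum {
    A(ABC, bool),
    B,
    C(ABC, bool),
}

// Assumed bit layout: bits 0-1 = variant, bits 2-3 = ABC, bit 4 = bool.
fn pack(value: WholeEnum) -> u8 {
    let (variant, abc, flag) = match value {
        WholeEnum::A(abc, flag) => (0u8, abc as u8, flag as u8),
        WholeEnum::B => (1, 0, 0),
        WholeEnum::C(abc, flag) => (2, abc as u8, flag as u8),
    };
    variant | (abc << 2) | (flag << 4)
}

// Returns the value by copy; there is no byte a `&ABC` or `&bool` could point at.
fn unpack(byte: u8) -> Option<WholeEnum> {
    if byte >> 5 != 0 {
        return None; // the upper three bits are unused
    }
    let abc = match (byte >> 2) & 0b11 {
        0 => ABC::A,
        1 => ABC::B,
        2 => ABC::C,
        _ => return None,
    };
    let flag = (byte >> 4) & 1 == 1;
    match byte & 0b11 {
        0 => Some(WholeEnum::A(abc, flag)),
        1 if byte == 1 => Some(WholeEnum::B), // B carries no payload bits
        2 => Some(WholeEnum::C(abc, flag)),
        _ => None,
    }
}

fn main() {
    let value = WholeEnum::C(ABC::B, true);
    let byte = pack(value);
    println!("{:?} packs to {:#010b}", value, byte);
    assert_eq!(unpack(byte), Some(value));
}
```

The trade-off is visible in the signatures: `unpack` has to rebuild the value, so this only makes sense for storage, with the ordinary WholeEnum used whenever you need references into it.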
The biggest thing blocking this right now is that the compiler expects variant checks to be against a single stored value. So if let WholeEnum::A(_) = foo needs the stored byte in foo to have a single value (basically it compiles to if foo.tag == TagOfA), but with the encoding you described it'd need to check something like if foo.tag.wrapping_sub(2) <= 2 instead.
That's not impossible, of course, but it's something not supported by the current layout algorithm, and it's unclear how much of an impact the extra instructions would have vs the smaller representation. Someone would need to do the work to find out.
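For reference, the two kinds of check being compared look roughly like this when written by hand. `TAG_OF_A`, `is_a_tagged` and `is_a_range` are hypothetical names, and the tag value 0 is an assumption. What the compiler would actually emit, and how the two compare in cost, is exactly the thing that would need measuring.

```rust
// Assumed tag value for WholeEnum::A in an ordinary tagged layout.
const TAG_OF_A: u8 = 0;

// Today's style of check: compare the stored tag against one value.
fn is_a_tagged(tag: u8) -> bool {
    tag == TAG_OF_A
}

// Check needed by the proposed encoding: the first byte is in 2..=4.
// The wrapping subtraction turns the two-sided range test into one comparison.
fn is_a_range(first_byte: u8) -> bool {
    first_byte.wrapping_sub(2) <= 2
}

fn main() {
    for byte in 0u8..=5 {
        println!("first byte {}: is_a_range = {}", byte, is_a_range(byte));
    }
    println!("is_a_tagged(TAG_OF_A) = {}", is_a_tagged(TAG_OF_A));
}
```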