Packing enum in u8

What is the current state of the art for mimicking the following

pub enum Foo {
  A(u4),
  B(u4),
  C(u3),
  D(u2),
  E(u2),
  F(bool)
}

and wanting to pack all this into a single u8

Context: I'm storing voxel data and absolutely want each voxel to fit in a u8, since with u16 the world can only be half as big.

You could do it using the unstable rustc_layout_scalar_valid_range_start and _end attributes.

#[rustc_layout_scalar_valid_range_start(0)]
#[rustc_layout_scalar_valid_range_end(15)]
pub struct U4(u8);

Beyond that, the only way I know to do it is by hand.


I don't think it's possible if the values (and thus payload types) overlap. Let's say it was something like

Bits of Foo
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
\           /    \              /
 ` discrim.'      `varnt. data '

If I have a &Foo and I do

if let Foo::F(ref boolean) = *foo { ... }

Now I have a reference to a bool that's not 0 or 1, and that's UB. And the byte is behind a &, so it's UB to change it.[1]

("But the value of the discriminant can be 0 for bool" doesn't defeat this argument because it has to be non-0 for something.)


However, it could work if the values are non-overlapping, as then the value can just be returned / the value indicates which variant it is, like the null pointer optimization (NPO). But this means your two u4 would have to be distinct types, for example; if you wanted two bool-likes, at least one of them can't be a 0-or-1 bool, etc.

(And it's not guaranteed you'll get the desired optimization if you do this. I didn't attempt it.)


  1. Without making the entire thing a Cell or such and thus non-Sync. Edit: That's nonsense, you just can't do it; even with a Cell you'd need to leave the discrim. bits unmodified so other existing shared references didn't lose their minds.


I'd implement this by declaring a struct Foo(u8) then doing the bitpacking manually.

I'm sure there's a crate out there with a custom derive that'll achieve the same thing, but for something niche like this it's probably better to write the code yourself and have full control over how the voxels are laid out.
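A minimal sketch of what the hand-rolled version could look like. The name PackedFoo, the tag values, and the layout (high nibble = variant tag, low nibble = payload) are all assumptions made for illustration, not something from the original question:

```rust
/// One voxel value packed into a single byte.
/// Hypothetical layout: high nibble = variant tag, low nibble = payload.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedFoo(u8);

impl PackedFoo {
    // Tags for the six variants in the question (values chosen arbitrarily).
    pub const A: u8 = 0;
    pub const B: u8 = 1;
    pub const C: u8 = 2;
    pub const D: u8 = 3;
    pub const E: u8 = 4;
    pub const F: u8 = 5;

    /// Payload width in bits for each tag, or None for an unused tag.
    fn width(tag: u8) -> Option<u32> {
        match tag {
            Self::A | Self::B => Some(4),
            Self::C => Some(3),
            Self::D | Self::E => Some(2),
            Self::F => Some(1),
            _ => None,
        }
    }

    /// Build a value, rejecting unknown tags and payloads that are too wide.
    pub fn new(tag: u8, payload: u8) -> Option<Self> {
        let width = Self::width(tag)?;
        if payload < (1u8 << width) {
            Some(PackedFoo((tag << 4) | payload))
        } else {
            None
        }
    }

    /// Variant tag stored in the high nibble.
    pub fn tag(self) -> u8 {
        self.0 >> 4
    }

    /// Payload stored in the low nibble, returned as a copy.
    pub fn payload(self) -> u8 {
        self.0 & 0x0F
    }

    /// Raw byte, e.g. for writing into a voxel buffer.
    pub fn to_byte(self) -> u8 {
        self.0
    }
}
```

With this layout, PackedFoo::new(PackedFoo::C, 5) yields the byte 0x25, and an out-of-range payload such as PackedFoo::new(PackedFoo::D, 4) is rejected with None. Accessors return copies rather than references, which sidesteps the &bool problem entirely.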

The issue with #[rustc_layout_scalar_valid_range_start] and friends is that you'll be 1) relying on unstable rustc-internal attributes, and 2) relying on the compiler to generate exactly what you want. It's also probably not possible to do with a normal enum anyway, because you are always allowed to take a reference to a field. Then you run into weirdness like how &bool expects to be exactly &0b0000_0001 or &0b0000_0000, and that probably won't be possible if you've also got to include a tag in that byte.


I think I misunderstood your argument. At its core, is it this: x86_64 pointers only address bytes, and having the above "work" would require addressing individual bits within a byte?

You could use the unpacked Foo by value in your APIs, but then bit-pack it smaller in your mass storage.
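Roughly, that split could look like the sketch below. It assumes the payloads are widened to u8 in the unpacked enum (Rust has no u4/u3/u2), and the pack/unpack names and the offset table are hypothetical choices. The six variants need 16 + 16 + 8 + 4 + 4 + 2 = 50 distinct values, so each variant simply gets its own range of byte values:

```rust
/// Unpacked form for use in APIs. Payload limits are enforced in `pack`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Foo {
    A(u8), // 0..16
    B(u8), // 0..16
    C(u8), // 0..8
    D(u8), // 0..4
    E(u8), // 0..4
    F(bool),
}

impl Foo {
    /// Pack into one byte; returns None if a payload exceeds its range.
    pub fn pack(self) -> Option<u8> {
        // (first byte value of the variant, number of payload values, payload)
        let (base, count, v) = match self {
            Foo::A(v) => (0, 16, v),
            Foo::B(v) => (16, 16, v),
            Foo::C(v) => (32, 8, v),
            Foo::D(v) => (40, 4, v),
            Foo::E(v) => (44, 4, v),
            Foo::F(b) => (48, 2, b as u8),
        };
        (v < count).then(|| base + v)
    }

    /// Inverse of `pack`; bytes 50..=255 are unused and yield None.
    pub fn unpack(byte: u8) -> Option<Foo> {
        Some(match byte {
            0..=15 => Foo::A(byte),
            16..=31 => Foo::B(byte - 16),
            32..=39 => Foo::C(byte - 32),
            40..=43 => Foo::D(byte - 40),
            44..=47 => Foo::E(byte - 44),
            48..=49 => Foo::F(byte == 49),
            _ => return None,
        })
    }
}
```

The mass storage can then be a plain Vec<u8>, with pack called on write and unpack on read, while the rest of the code only ever sees Foo by value.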

My point is in order to pack everything into one byte, you can't actually store a discriminant value at all -- the variant needs to be implicit in the value of the byte. Because you can get a reference to the variant field data, and that must be valid for the field data type -- not contaminated by some discriminant-tracking bits. And yeah, you can't get references to something smaller than a byte in Rust.

Bitfield crates have similar issues (they can't return references[1]), and have to return masked copies or whatnot.
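To illustrate the "masked copies" point, here is a small hand-written sketch (the Nibbles type and its method names are made up for this example, not taken from any crate). The getters return a u8 by value because there is no way to hand out a &u8 that points at half a byte:

```rust
/// Two 4-bit fields sharing one byte (hypothetical example type).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Nibbles(u8);

impl Nibbles {
    /// Low field, returned as a masked copy.
    pub fn low(self) -> u8 {
        self.0 & 0x0F
    }

    /// High field, returned as a shifted copy.
    pub fn high(self) -> u8 {
        self.0 >> 4
    }

    /// Overwrite the low field, leaving the high field untouched.
    /// Bits of `v` above the low nibble are discarded.
    pub fn set_low(&mut self, v: u8) {
        self.0 = (self.0 & 0xF0) | (v & 0x0F);
    }

    /// Overwrite the high field, leaving the low field untouched.
    /// Bits of `v` above the low nibble are discarded.
    pub fn set_high(&mut self, v: u8) {
        self.0 = (self.0 & 0x0F) | ((v & 0x0F) << 4);
    }

    /// Raw byte as stored.
    pub fn to_byte(self) -> u8 {
        self.0
    }
}
```

Writes go through setters that take &mut self on the whole byte, so two fields can never be mutably borrowed at the same time.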


But yeah, your example is small enough to make it unambiguous in terms of how many distinct values you need, but probably doing it yourself is a better bet than trying to maneuver Rust doing it for you in an enum (and as mentioned if you need two bools or something you'd have to do more work yourself anyway, even if you lucked out and got the enum working how you like size-wise).


  1. without a ton of caveats anyway


And yeah, I don't think there's a good way (if a way at all) to get Rust to pack it for you.

Rust allows getting &muts to fields, which means they can't overlap, as writing to the field would overwrite the other fields. One day we'll hopefully have something like "move-only fields" so that pointers aren't allowed and the layout algorithms can be smarter, but that doesn't exist yet.

If you want bitpacking, use a crate -- maybe https://crates.io/crates/bitflags


This doesn't prohibit the optimization — otherwise Option<bool> couldn't be 1 byte (which it is). Note that if you have a &bool or &mut bool, then

  • You must have matched the enum to find the bool-holding variant, so the reference points to a valid bool.
  • If you have an &mut bool then you have exclusive access, so nobody else can write a non-bool while you're looking.
  • You can only write, to a &mut bool, a value that is a valid bool, so it's not possible to write a different variant unintentionally.

The problem with bitfield references is that they might conflict with other references to different parts of the same byte but there are no such references here. In principle, there is nothing that stops rustc from using discriminant-less representations of variants whose representations don't overlap — it just doesn't currently implement that, except in the case where at most one variant has a non-ZST field.
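A quick way to check these layout claims on your own toolchain is to print the sizes. This is only a sketch: the TwoBools enum and the report function are hypothetical names, and apart from the Option<NonZeroU8> case the sizes are what rustc does in practice rather than a documented guarantee:

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

/// Two variants that both carry a bool, as in the case discussed above.
pub enum TwoBools {
    A(bool),
    B(bool),
}

/// Collect the sizes of a few types relevant to the discussion.
pub fn report() -> Vec<(&'static str, usize)> {
    vec![
        ("bool", size_of::<bool>()),
        // The None case is stored in one of bool's unused bit patterns.
        ("Option<bool>", size_of::<Option<bool>>()),
        // Documented guarantee: same size as NonZeroU8.
        ("Option<NonZeroU8>", size_of::<Option<NonZeroU8>>()),
        // Two payload-carrying variants with overlapping values need a tag.
        ("TwoBools", size_of::<TwoBools>()),
    ]
}
```

Printing the result of report() shows whether the compiler found a discriminant-less layout for each type; the TwoBools row is the interesting one, since both variants use the same two byte values.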

Right, but that also means (as you mention indirectly) that no other variant can be a bool.

I'm almost always seeing people looking for enum Foo { A(bool), B(bool) }, not enum Bar { A(ZeroOrOne), B(TwoOrThree) }. Certainly if Rust had 0_u2 and 0_u4 (to pick types from the OP) they'd have the same representation byte as false.


If I may, I would suggest modular-bitfield as a crate. It's very nice. Apart from that I don't really get how that would be useful to OP.
