EitherBox<A,B>, anyone made one yet?

Anyone built something like this yet: ' an owned pointer to either A or B with the lower bit(s) being a discriminant'
I dont think it would work with match, you'd have to access it with lambdas,

either_box.match_with(|&a|{..what to do with A..}, |&b|{..what to do with B...})

initially just 2 options, but based on alignment you could have more (having a 'none' might be useful)

were there any plans toward making things like this work with match .. generalizing the discriminant mechanism (or maybe building a few more cases into the compiler, like the null-pointer one)

I suppose you could build this in 2 layers, e.g 'a bitfield discriminant' then plug in Box<A>,Box<B> as the two variants

why would you want this:- compared to Box<Either<A,B>> A,B could be different sizes , and it's still one allocation. compared to Either<Box<A>,Box<B>>, it's saving space on the discriminant (which might be padded out, appearing alongside a pointer)

How would EitherBox<EitherBox<...>> work ?

How would EitherBox<EitherBox<ā€¦>> work ?

consistently, I think: the outermost level says it's "a pointer with low bit discriminant", the first option for the thing being pointed at would also be 'a pointer with a low bit discriminant' in the EitherBox<EitherBox<>,...> case.. I suppose you could look into having more variants and common ones accessible through fewer 'hops' in memory, by that sort of structure .. EitherBox<A,EitherBox<B,C>> // if 'A" is expected to be much more common.. probably more interesting to look into generalizing the discriminant size based on expected alignment, e.g. 4byte aligned objects could get 2 bits of discriminant hence up to 4 options, etc..

Would need to have a good reason to mess with bit fields when a byte used by enum is usually insignificantly small.

when a byte used by enum is

but what about alignment

if you're on a 64bit machine, doesn't your byte plus 8byte pointer become 16 bytes. ( I guess if it's part of a structure rust might be able to pack other things in?) but I was just messing with tiled arrays for voxels.. the use of an enum plugged in e.g. Array3d<MyCompressedTileEnum> made it look very elegant but I'm not getting the same bit-packing options when I do this manually in C/C++ (or in unsafe code).

The ultimate would be is if the discriminant mechanism was opened up with some sort of overloading capability .. then we could have both the high level elegance and the low level optimisation .. but I know the capability is there already to match C/C++ with unsafe code, and there's other ways to build fairly useable abstractions (e.g. accessing with lambdas instead of match).

I wonder if a case could be made for building some more cases into the compiler (ie. we have the null-pointer opt inbuilt).. 'if each variant has enough spare bits, use them to pack the discriminant..'

1 Like

This was discussed somewhat in rust-lang/rfcs#1230.

Many of which were recently implemented too Refactor type memory layouts and ABIs, to be more general and easier to optimize. by eddyb Ā· Pull Request #45225 Ā· rust-lang/rust Ā· GitHub

More exotic packing would be cool although I donā€™t know if the (compiler) complexity is warranted. The space savings may also be negated by the underlying allocator using size classes.

I donā€™t know if the (compiler) complexity is warranted.

I would guess it would be quicker to get some more cases inbuilt rather than design and agree on a mechanism for users to generalise it.. although the latter would be amazing. I suppose you could argue with pointer-packing there's reason to question if the unpacking would be a performance hit (size/speed tradeoff) but I suspect that cost would be insignificant compared to the space savings (i.e cache sparing effect)

I dont think language complexity is a problem so long as the features are well designed/fit together etc. I think it's a fallacy that C++'s complexity is a problem - it's rather that it's features are kind of half broken so you have to stretch/combine them in awkward ways

1 Like

One issue is that you couldnā€™t just do this packing by default - thereā€™re going to be users that donā€™t want any perf implications because their working set already fits in the cache hierarchy (or for some other reason). So anything with tradeoffs involved is a bit tricky.

And as mentioned, the underlying allocator may still ā€œoverallocateā€, negating the savings potentially.

I agree but I was referring to compiler implementation complexity. Thereā€™s a budget on that given limited resources and itā€™d be good to have some idea of practical improvements this would create before someone does it. Otherwise, there are bigger fish to fry.

the underlying allocator may still ā€œoverallocateā€, negating the savings potentially.

well thats another issue that would need sorting if it happens: but as I understand Rusts' underlying allocator is actually quite good, it does follow the 'user-passes-size-to-free' idea which enables it to get rid of padding

thereā€™re going to be users that donā€™t want any perf implications because their working set already fits in the cache hierarchy

I suppose it could be a #[] opt-in if its deemed unexpected behaviour; maybe something like #[repr(packed)] (wasn't there an option to specify the discriminant type r.e. making layouts that you can pass to C, #[repr(u32)] guarantees it's a u32, etc.. I can't quite remember how it worked. Conversely , that option could over-ride packing if you wanted to guarantee it's padded out.. but I agree changing default behaviour is less ideal)

I agree but I was referring to compiler implementation complexity.

i wonder if there's a halfway house between 'general language feature' ('discrimnant overloading') and 'hard-coded compiler magic' .. e.g. would it be possible to make something that looks close to a compiler plugin (or just a sane layer in the compiler) that handles discriminant packing logic .. i.e. handling the existing 'null-pointer optimisation' in a generalised way (I should read the link given above.. I didn't really get the detail of it)

As of today, you can select a global allocator for the process - either the system one or jemalloc. In the future itā€™ll be possible to have multiple disparate allocators in the same process. So the notion of the underlying allocator isnā€™t quite valid.

The sized destruction isnā€™t really the issue I was referring to. Instead, itā€™s the slabs/bins/size classes that an allocator may assign to allocation requests (this is usually done to reduce fragmentation). For example, jemalloc has several size classes/bins: JEMALLOC
If you reduce the size of an object then it needs to drop it into a lower size class or else no savings are gained. Hereā€™s one personā€™s fight with that before: Optimizing Rc memory usage in Rust | by Robert Grosse | Medium

So the notion of the underlying allocator isnā€™t quite valid.

I should have really said allocator interface - it's the fact that rusts low level interface does have the size passing from the outset; so long as thats happening, i'm happy (reallocing is a different kettle of fish and in critical situations i think about avoiding it)

If you reduce the size of an object then it needs to drop it into a lower size class or else no savings are gained.

in the cases I"m envisaging, the sizes aren't changing: it's they'd just fill either (boxed) variant slot; the idea is to avoid the padding out that you do for a plain enum { Foo(..), Bar(..)} ( .. at the expense of a bit of pointer chasing , of course the correct tradeoff depends on the exact situation)