Pointer with niche for unaligned values?

I have a struct containing only a pointer to an allocated thing on the heap (think of it kind of like a Box with a fixed inner type), and that allocated thing has an alignment greater than 1. So I know that the pointer will never be null, nor will it ever have an odd address. I'd like to figure out how to express this in a way that the compiler is able to use as a niche for optimizing the layout of structs/enums.

I've found that NonNull<MyTy> enforces the pointer to not be null, such that Option<NonNull<MyTy>> also only needs 8 bytes (and same for things that wrap it, like Box<MyTy>). So right now I have this:

#[align(2)]
struct MyTy { .. }

#[repr(transparent)]
struct MyTyWrapper {
    my_ty_ptr: NonNull<MyTy>,
}

Result<i32, MyTyWrapper> should be able to fit in the same 8 byte space, by using the niche where the pointer is unaligned, but I can't figure out if there's a way to write this in code such that the compiler will actually use it (since nothing forces NonNull<MyTy> or *mut MyTy to actually be aligned, so those types don't have this niche). Is there a way to express this that I'm missing?

2 Likes

There is not yet any support for alignment niches. Here's some previous discussion: RFC: Alignment niches for references types. by moulins · Pull Request #3204 · rust-lang/rfcs · GitHub

In addition to that, fitting an i32 into the alignment niche space would need not just alignment-based niches a new kind of niche. Right now, niches are understood as a contiguous range of numeric values that are available for use to mean other enum variants; what you're asking for would need the compiler to be able to understand "you can put an arbitrary value in the high bytes as long as the lowest byte is odd". Right now, the compiler never packs an arbitrary value into niche space, but only the containing enum’s other variants' discriminants.

Is there any reason why we can't presently use that space to store an i32? Or is this just that it's a hard optimization to do and no one's made it happen yet?

As far as I know, yes, the only obstacle is that it’s a lot of work to implement.

I would say that it's “hard optimization to please all potential users”. Because usually these packing schemes quickly evolve into pretty elaborate dance of what you want to put there (int32, float32, well, maybe bool, and few more enums… or maybe better to support RGB with three 16bit components…).

Expressing all that zoo in unsafe code is relatively simple affair… developing some elaborated scheme that would be usable for all such potential users… it's hard, really hard: often such extra-tightly packaged schemes are used in tight inner loops (unless you have millions of these it's not worth the effort) and thus to achieve best performance very careful benchmark-driven design is needed… bit-based tetris is not something one may expect compiler to do automatically.

Compare to the existing Option-style optimization that's used in almost every program and in an identical way with bazillion different types: just pick one value that's not used – and use it for None.

Cases of this sort would not need any bit-packing at all. Suppose that our pointee’s alignment is 16 and we have several variants:

#[repr(align(16))]
struct MyTy { .. }

enum MyEnum {
    A(*mut MyTy),
    B(i32),
    C([u8; 7),
    // ... possibly more ...
}

Then, we can lay out MyEnum as follows (with X standing for the data):

   Field type    Numeric value
   ----------    ---------------------
A  *mut MyTy     0xXXXX_XXXX_XXXX_XXX0
B  i32           0xXXXX_XXXX_0000_0001
C  [u8; 7]       0xXXXX_XXXX_XXXX_XX02
   ...           ...

Room for 16 variants, the discriminant is obtained with value & 0xF, and all of the fields are stored (and referenceable) in normal fashion without any bit-packing or lowered alignment. The compiler doesn’t know how to do this right now, but there aren’t particular costs to it (that don’t also apply to current niches).

With the smaller alignment of 2, it's harder to write out a digit-wise example, but there is still room for 2 variants (such as Result) this way.

This would depend on what exactly you are trying to pack. Look on dynamic language engines… there are a lot of very clever schemes. Not only the infamous NaN-boxing (something that compiler would, probably, not be able to support), but many other schemes. It's not atypical to squezee a dozen of types in 32bit pointer (that only have three “free”) bits and more in 64bit pointer (with 8bytes alignment).

Sure, some support for some subset of all that zoo may be valuable… but the question of effort and effect is always big in that space. Typical desire to have more than one pointer (so there would be pointer to string with not extra overhead and pointer to some object with vtable) is something that's simply just impossible to create with enum API.

From what I know so far Rust provided some function to make sure one may write sound unsafe code to play bit-tetris (needed because provenance and bit tetris don't like each other all that much) and… that's it for now.

Unaligned in general is hard, because (as someone said) the compiler only supports ranges right now.

Probably the first thing that will happen is the niche where -SIZE..ALIGN are impossible, because that's already (part of) the validity invariant for references and it works as a range so the compiler can support it.

It's difficult, though, because of cycle problems. People are working on it, but doing it without making everything slower is hard.

The point in my post that you are replying to is precisely that cases like the one @JarredAllen was asking for — more generally, cases where the other variant's field is at least one byte smaller than the pointer-to-aligned value, and aligned less than the size of the pointer, do not ever require bit-packing. This is a well-defined subset of all possible layouts, and relevant to the design of Rust because Rust layout today always works on whole bytes and not bits.

I’m not saying that it’s necessarily worth adding support for those cases, but there is no need to insist that bit-packing must also be considered as more of the same. There is a well-defined boundary between these two kinds of layout optimization: don't touch individual bits. That’s all you need. Sure, it doesn't allow "two kinds of pointers", but nobody was asking for that in this conversation.

It's only relevant if we may find out enough places where people employ such structures.

Do we have any idea about how common are they? I know that it's a big of a chicken-and-egg problem because without support from compiler people wouldn't be using them often… but there are significant amount of code that plays these games (currently with unsafe). How often what people are doing in these cases is expressible in the scheme that you are proposing?

I suspect that justification if the form “here we have one potential user which may or may not still with us tomorrow” is a bit too weak to justify all the complexity that such feature would, by necessity, impose on the compiler. That's why I brought all these “short strings”, “interpreters”, “extra-speedy dynamic dispatch schemes” and the like: together they may justify the addition to the language… but how often they fit in the proposed scheme?

In my personal experience these schemes very quickly evolve into something where you want to have few “hot” pointer types encoded and/or many disparate “small” types (more than 8 is typical)… but perhaps my experience is atypical… have anyone done any study of what people actually use in the wild?