Pre-RFC: Zero-Sized References to Zero-Sized Types

Today I found that running

mem::size_of::<&mut ()>()

returns 8. I expected that much, but there are many cases where a size of 0 would be desirable. After a short but heated discussion on #rust, I came up with the following proposed change to Rust:

RFC: Zero-Sized References

References to any type in Rust are represented as pointers. Usually the pointer is smaller and faster to move around than the value it points to. However, for Zero-Sized Types that have only a single value (for example () ), moving the value around is a no-op and can be optimized away. Reading the value is also a no-op: since the type has only a single value, it carries no extra information. Currently, however, the compiler can't optimize the pointer out of data structures.
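To make the current behavior concrete, here is a minimal sketch that reports the sizes involved. The names `Guard` and `sizes` are made up for illustration; only `std::mem::size_of` comes from the standard library.

```rust
use std::mem::size_of;

/// Hypothetical struct that stores a mutable reference to a zero-sized value.
#[allow(dead_code)]
struct Guard<'a>(&'a mut ());

/// Returns (size of `()`, size of `&mut ()`, size of `Guard`, size of `usize`)
/// for the current target.
fn sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<()>(),
        size_of::<&mut ()>(),
        size_of::<Guard<'static>>(),
        size_of::<usize>(),
    )
}
```

On today's compiler the first number is 0, while the reference and the struct holding it are each one pointer wide (8 bytes on a 64-bit target). Under this proposal the last two would become 0 as well.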

Zero-Sized Types are useful for functions, lifetime guarantees and destructors, and references to them can be used to show these types "exist" (for references) or "you are the only one using it" (for mutable references). The actual value is meaningless and the representation should be optimized to be of size 0 as well. After all Rust tries to make code as fast and small as possible.

Issues

Are references == pointers?

The first issue I encountered was a conflict in the definition of "Reference". From what I understood, a reference is a way to access the original value without moving it around. It being a pointer was just an implementation detail.

However some people took the definition of "Reference" as a "Safe Pointer". Under their definition the fact the representation is a pointer is an inseparable part of it being a reference.

If the definition is the second case, then this whole RFC is irrelevant.

*mut and void*, and safe pointers

void*, a pointer to an unknown type, is represented in Rust as *mut c_void. However, there are cases where people use other representations for it, including *mut (). That by itself might still be fine, but some code also uses &mut () as a safe pointer (to get ownership and non-null guarantees).

For non-null pointers there exists core::nonzero::NonZero, however it is unstable and therefore unusable with stable Rust.

In addition, any optimization of &mut () might break code written in this way. Even if it was wrong to write it like this, Rust tries to avoid breaking existing code, and the optimization might be rejected for that reason.

Even if the optimization is accepted, converting a zero-sized reference into a pointer ( &mut () as *mut () ) needs a defined behavior: either emit a warning and return an arbitrary pointer (NULL?), or reject the conversion with an error.
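For reference, this is roughly what the conversion looks like today, plus one possible choice of "arbitrary pointer". The function names are invented for this sketch. Using `std::ptr::NonNull::dangling()` as the arbitrary address is only an assumption about what could be picked; it is not part of the proposal.

```rust
use std::ptr::NonNull;

/// Casts a `&mut ()` to a raw pointer, the way code that treats it as a
/// "safe pointer" does today.
fn unit_ref_to_ptr(unit: &mut ()) -> *mut () {
    unit as *mut ()
}

/// One candidate for an arbitrary address (an assumption, not a decision):
/// non-null and well aligned, but never meant to be dereferenced.
fn arbitrary_unit_ptr() -> *mut () {
    NonNull::<()>::dangling().as_ptr()
}
```

Today the first function always returns a non-null pointer, because references are never null. If the cast returned NULL after the optimization, code relying on that guarantee would change behavior, which is why a non-null arbitrary address may be the less surprising option.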

Inaccessible data

Sometimes the data exists at some address but is not accessible through normal dereference. For example &mut FILE might be used with FFI and contain valid data at some location.

The above example would better be represented as

struct FILE {file: *mut c_void}

But if a valid use of references to inaccessible data is found, an escape hatch from this optimization may be needed to hint that the value is not Zero-Sized, possibly

impl !Sized for FILE;

So what is mem::size_of::<&mut MyType>() if my type is struct MyType<'a, T> {inner: &'a T}? Are all types that have references unsized?

It should also be 0 if T is Zero-Sized and 8 (assuming 64 bit) if it's not. Since the compiler knows the size of whatever T you use it can also find out the size of the references later. If T is unsized we keep the pointer representation.

I'm not familiar with the inner workings of the compiler in this case, though: can the compiler trace types to figure out their size? I assume it can (for cases like a struct inside a struct), so it shouldn't be a problem to do the same with reference sizes.

Are you going to write an RFC?

Although I couldn't find any reason against this change that feels justified, after thinking for a while I couldn't find a use case for the references either. I decided not to bother with an RFC that doesn't have even a single justification (I was also very busy at that time).

If you have a specific example where you use these references, I'll be happy to push the RFC forward.

The standard library has to deal with them (e.g., in iterators). If we could just say "they're zero sized", we could stop debating what value they should have.
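As a small illustration of the iterator case, here is a sketch that iterates over a Vec of unit values. The helper name `unit_slice_refs` is made up. The sketch only checks what is guaranteed today (the number of items and that every reference is non-null) and makes no claim about which address each reference holds.

```rust
/// Builds `n` unit values, iterates over them by reference, and returns
/// (number of references yielded, whether all of them were non-null).
fn unit_slice_refs(n: usize) -> (usize, bool) {
    let units = vec![(); n];
    let count = units.iter().count();
    // Each item is a `&()`: it must still carry some non-null address,
    // even though there is nothing to read behind it.
    let all_non_null = units.iter().all(|r| !(r as *const ()).is_null());
    (count, all_non_null)
}
```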

That's actually a pretty good argument, I feel. :wink: The value is ultimately arbitrary, and the fact that they have to have a value is frequently annoying.

Also notice that the size of &mut T already depends on T -- namely, if T is unsized, this is a fat pointer.
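A quick sketch of that dependence, measured in pointer-sized words. The function name is made up for the example, and `Debug` is used only as a convenient trait for building a trait object.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Returns the size, in pointer-sized words, of a reference to a sized type,
/// to a slice, and to a trait object.
fn reference_sizes_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<&mut u8>() / word,        // thin pointer
        size_of::<&mut [u8]>() / word,      // pointer + length
        size_of::<&mut dyn Debug>() / word, // pointer + vtable
    )
}
```

This returns (1, 2, 2), so a reference whose size is 0 words for some T would be one more case of something the compiler already handles.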

One interesting question is what to do with the "coinductive" case:

struct Foo<'a> { field: &'a mut Foo<'a> }

If Foo is non-ZST, then &'a mut Foo<'a> is non-ZST and hence Foo is non-ZST.
If Foo is a ZST, then so is &'a mut Foo<'a> and hence so is Foo.
Lucky enough, there is no contradiction here, just an arbitrary choice.

I don't think that structs and enums are capable of reaching this case, because you can't create the very first instance of this struct since it requires a reference to an earlier version, just like

struct Foo (Foo);

can't be created. enum stops being ZST once you add more than a single case. union can reach the "coinductive" case though, if it ever is ZST in the first place (I don't know?).

I think it's possible to solve it with a similar algorithm to how we calculate the size in the first place - check the size of the dependencies, and if you ever reach yourself then assume not ZST because of the argument above.
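Here is a rough sketch of that idea on a toy model of types. Everything in it (`Ty`, `Structs`, `is_zst`) is invented for illustration and has nothing to do with how the compiler actually represents types; it only shows the "if you reach yourself, assume not ZST" rule.

```rust
use std::collections::{HashMap, HashSet};

/// Toy field types; not the compiler's real representation.
enum Ty {
    /// A primitive with a fixed size in bytes.
    Prim(usize),
    /// A reference to a named struct.
    Ref(&'static str),
    /// A named struct stored by value.
    Named(&'static str),
}

/// Maps a struct name to the types of its fields.
type Structs = HashMap<&'static str, Vec<Ty>>;

fn struct_is_zst(
    name: &'static str,
    defs: &Structs,
    visiting: &mut HashSet<&'static str>,
) -> bool {
    // We reached a struct that is already being examined: a cycle.
    // Take the conservative answer and treat it as not zero-sized.
    if !visiting.insert(name) {
        return false;
    }
    let result = defs[name].iter().all(|field| match field {
        Ty::Prim(size) => *size == 0,
        // Under the proposal, a reference is zero-sized exactly when
        // the type it points to is zero-sized.
        Ty::Ref(target) | Ty::Named(target) => struct_is_zst(*target, defs, visiting),
    });
    visiting.remove(name);
    result
}

/// Entry point: is the struct called `name` zero-sized under the proposal?
fn is_zst(name: &'static str, defs: &Structs) -> bool {
    struct_is_zst(name, defs, &mut HashSet::new())
}
```

With this rule a struct with no fields and a struct holding only a reference to it both come out as ZST, while a struct that holds a reference to itself comes out as non-ZST.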

It is now open. Enjoy!

But I gave you the code above to reach this case. :wink:
Mind you, you cannot instantiate the type in safe Rust, but the type does exist and you can ask for its size.
Also, it can be instantiated in unsafe code.
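For completeness, a minimal sketch of asking for the size without ever creating a value. The struct is the one from the earlier post; the helper name is made up.

```rust
use std::mem::size_of;

/// The self-referential struct from the discussion. No value of this type
/// is ever constructed here; only its layout is queried.
#[allow(dead_code)]
struct Foo<'a> {
    field: &'a mut Foo<'a>,
}

/// Reports whether `Foo` currently has the size of a single pointer.
fn foo_is_pointer_sized() -> bool {
    size_of::<Foo<'static>>() == size_of::<usize>()
}
```

On today's compiler this returns true, which matches the "non-ZST" choice.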

My point is that this case is so rare and hard to reach that it's safe to just assume non-ZST. That is faster and simpler to calculate, with (almost? or completely?) no performance difference.