Compiler Magic Types

blonk · December 1, 2022, 4:49pm

Just out of random curiosity: What types in Rust are "magic" (in the sense that re-implementation of the types (without special/out-of-the-ordinary compiler support) would be impossible)?

When I've browsed around in std I've seen that some types have special attributes, so I assume it's not necessarily that the type names themselves are made magic by the compiler, but rather that the types are tagged with some magical incantation attributes.

But regardless of how they are made special, is there a list of types that get special treatment from rustc?

(The origin of this question is that I stumbled on a compiler issue on github where someone suggested adding a special case to Box, and someone replied "Please no, Box has too many special cases already").

H2CO3 · December 1, 2022, 4:57pm

Box is one, but other obvious candidates are the built-in primitive types (integers, floats, characters, !/never, etc.).

Another interesting case is UnsafeCell which essentially tells the compiler that "things in this wrapper can be mutated through a shared reference, so don't optimize as if it were unique or read-only".
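As a rough illustration of that point, here is a minimal sketch of a single-threaded cell built on UnsafeCell. The names MyCell and bump_through_shared are made up for this example; std's own Cell is the real thing.

```rust
use std::cell::UnsafeCell;

/// Hypothetical single-threaded cell, similar in spirit to `std::cell::Cell`.
pub struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    pub fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    /// Reads a copy of the stored value through a shared reference.
    pub fn get(&self) -> T {
        // SAFETY: `MyCell` is `!Sync` (inherited from `UnsafeCell`) and never
        // hands out references to its interior, so nothing can alias this read.
        unsafe { *self.value.get() }
    }

    /// Overwrites the stored value through a shared reference.
    pub fn set(&self, new: T) {
        // SAFETY: same argument as in `get`.
        unsafe {
            *self.value.get() = new;
        }
    }
}

/// Mutates the cell even though only `&MyCell` is available.
pub fn bump_through_shared(cell: &MyCell<u32>) -> u32 {
    cell.set(cell.get() + 1);
    cell.get()
}
```

Without the UnsafeCell wrapper, writing through a pointer derived from &self would be undefined behavior, which is why this type cannot be rebuilt in user code.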

I'm not sure if there's an exhaustive list of all "magic" types, or if there is even consensus regarding what types do and do not count as "magic" (e.g., Rc can be used as a Self type, and it can perform unsized coercions, but it's nonetheless a library-defined type – now is it magic?). There is, however, a list of language items, which you should be able to find quickly via google.

afetisov · December 1, 2022, 5:18pm

The upper bound on the set of magic types is the set of lang items. These are types and functions which have special treatment in the compiler. For example, the variants of Option<T> are lang items. This is because they are used internally in the desugaring of for-loops into primitive loop blocks.
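To make the for-loop point concrete, here is a hand-written approximation of the desugaring. The function names are made up, and the compiler's actual expansion differs in details; this only shows where Some and None come in.

```rust
/// Sums a slice with an ordinary `for` loop.
pub fn sum_with_for(items: &[u32]) -> u32 {
    let mut total = 0;
    for x in items {
        total += *x;
    }
    total
}

/// Roughly what the loop above is lowered to: an iterator, a bare `loop`,
/// and a `match` on the `Option` returned by `next`.
pub fn sum_desugared(items: &[u32]) -> u32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(items);
    loop {
        match Iterator::next(&mut iter) {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}
```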

Now, not all lang items are really magic. The Option<T> above is a perfectly normal type, it's just that for-loops won't compile if the compiler doesn't know how to find it. To get a list of truly magic types, one should carefully go through all lang items and think whether they can be implemented in user code. I don't know of any readymade list, and I won't do it either. Besides the already mentioned primitives, UnsafeCell and Box, this list includes ManuallyDrop (normally you can't have drop fields in unions), MaybeUninit (because it suppresses the niches in inner types), Unpin and Pin (Pin<&mut T> for T: !Unpin are allowed to alias), most stuff in std::marker, and some others.

Note that this list changes with time. Sometimes a former lang item may become a normal language construct, sometimes a normal type is found to require special treatment (I believe MaybeUninit was just an ordinary union when it was first introduced). Some things, like auto traits (Send, Sync, Unpin), depend on unstable language features. If the auto trait feature is stabilized, they may become ordinary traits.

EDIT: Note that the stuff about Pin<&mut T> aliasing I mentioned above is an internal hack in the compiler, and not part of the current stable guarantees. You shouldn't rely on it in your own code.

scottmcm · December 1, 2022, 7:17pm

That's actually the normal behaviour of unions, not something special to MaybeUninit.

Based on the comment in the source code, it's just marked lang because it's used in the generator desugaring.

Aiden2207 · December 1, 2022, 8:14pm

ManuallyDrop isn't magic, and MaybeUninit isn't either. Adding a ZST as a union field automatically suppresses niches. Pin isn't magic either, and it definitely doesn't affect the aliasing rules. Most types in the standard library that seem magic can be entirely recreated on nightly without needing to become a lang item.

Ignoring usage by the compiler or in traits that are lang items, there are only three non-primitive types that can't be entirely recreated on nightly: Box^[1], PhantomData^[2], and UnsafeCell^[3].
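A small sketch of how two of these show up in everyday code, assuming nothing beyond stable std. The names unbox and Id are hypothetical; the PhantomData part only shows what user code relies on it for, not how the compiler treats it internally.

```rust
use std::marker::PhantomData;

/// Moving a value out through `*` works for `Box`.
pub fn unbox(boxed: Box<String>) -> String {
    // Moves the `String` out of the heap allocation and frees the box.
    // The same expression on an `Rc<String>` or on a user-defined `Deref`
    // type is rejected by the compiler.
    *boxed
}

/// Hypothetical typed ID. `T` appears in no real field, so the struct needs
/// `PhantomData<T>` to be allowed to declare the parameter at all.
pub struct Id<T> {
    raw: u32,
    _marker: PhantomData<T>,
}

impl<T> Id<T> {
    pub fn new(raw: u32) -> Self {
        Id { raw, _marker: PhantomData }
    }

    pub fn raw(&self) -> u32 {
        self.raw
    }
}
```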

There are also intrinsic functions^[4], which are special functions implemented in the compiler, although plenty can be emulated with just nightly, or sometimes even stable Rust. There are also a bunch of built-in macros; look for things tagged with the #[rustc_builtin_macro] attribute. Pointer/reference dereferencing is built in, as are integer operations, as casts, Box::new, and slice indexing.

Any trait in std::ops or std::cmp is a lang item, although most of those aren't special beyond their usage in the compiler for implementation of operators (a notable exception is the Drop trait). Most of what's in std::marker are also lang items^[5], and these are the ones that have real magic. That's where Copy and Sized are, among other traits.
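For the operator traits, the only "magic" is that the compiler maps the operator onto the trait method. A minimal sketch, with Meters and total as made-up names:

```rust
use std::ops::Add;

/// Hypothetical newtype used only to show operator lowering.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Meters(pub f64);

impl Add for Meters {
    type Output = Meters;

    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

/// Returns the same sum twice: once via the `+` operator and once via the
/// explicit trait call that the operator resolves to.
pub fn total(a: Meters, b: Meters) -> (Meters, Meters) {
    (a + b, Add::add(a, b))
}
```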

There are also some hidden lang items that perform important tasks but never see the light of day. For example, the Receiver trait is used to determine what types can be used as the self target in methods: any type which implements Deref<Target = Self> + Receiver can be. It's why methods like fn poll(self: Pin<&mut Self>) are permitted. You can even see it in action on nightly.
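On stable, the set of allowed receivers is fixed to a few std types. A short sketch of what that looks like from the user's side (Task and its methods are hypothetical names):

```rust
use std::pin::Pin;
use std::rc::Rc;

pub struct Task {
    pub id: u32,
}

impl Task {
    /// Consumes a boxed task.
    pub fn by_box(self: Box<Self>) -> u32 {
        self.id
    }

    /// Consumes one reference-counted handle to the task.
    pub fn by_rc(self: Rc<Self>) -> u32 {
        self.id
    }

    /// Takes a pinned mutable reference. `Task` is `Unpin`, so the field can
    /// be mutated through `DerefMut` without any unsafe code.
    pub fn bump(mut self: Pin<&mut Self>) -> u32 {
        self.id += 1;
        self.id
    }
}
```

Declaring a method with self: MyPointer<Self> for a user-defined pointer type is what needs the nightly feature mentioned above.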

Another fun trait is the Pointee trait. Not only is it automagically implemented by the compiler, but the associated functions are straight up undefined behavior, with a note by the implementor that effectively says we're std so this will actually work.

TL;DR: there are lots of lang items, and while there are a handful of magic ones that behave in weird ways, most aren't particularly interesting except for the compiler using them when desugaring language constructs, and a lot of the "magic" types are just implemented using generally available if wildly unstable nightly features.

[1] Allows moving struct when dereferencing
[2] Has an unbounded generic parameter
[3] Allows converting from &UnsafeCell<T> to &mut T soundly
[4] Fun fact: these intrinsics are the only place in all of Rust you can get monomorphization time errors.
[5] The one that isn't is Send, it's an ordinary auto trait. Sync needs to be a lang item, because types used in statics are required to be Sync.

chrefr · December 1, 2022, 8:28pm

ManuallyDrop is definitely magic (although it may not be), but MaybeUninit isn't.

Another group is the Range* types, as they have dedicated syntax: .. and ..=.

CAD97 · December 1, 2022, 9:00pm

Well,

afetisov · December 1, 2022, 9:08pm

I didn't see any associated functions or notes about UB. Broken links?

ManuallyDrop absolutely is magic. You can't prevent the dropping of fields otherwise. Even mem::forget is implemented in terms of ManuallyDrop. In the past, I think mem::forget was an intrinsic. The intrinsic still exists, although it's now used only in forget_unsized.
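A minimal sketch of the behaviour being described, using a drop counter to observe when the destructor runs. Noisy and observe_drops are made-up names for this example.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical type that counts how often its destructor runs.
pub struct Noisy<'a> {
    pub drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count after (1) a plain value leaves scope, (2) a
/// `ManuallyDrop`-wrapped value leaves scope, (3) an explicit
/// `ManuallyDrop::drop` call.
pub fn observe_drops() -> (u32, u32, u32) {
    let drops = Cell::new(0);

    {
        let _plain = Noisy { drops: &drops };
    }
    let after_plain = drops.get();

    {
        // The wrapper suppresses the destructor of its field.
        let _wrapped = ManuallyDrop::new(Noisy { drops: &drops });
    }
    let after_wrapped = drops.get();

    let mut wrapped = ManuallyDrop::new(Noisy { drops: &drops });
    // SAFETY: `wrapped` is not used again after its contents are dropped.
    unsafe { ManuallyDrop::drop(&mut wrapped) };
    let after_manual = drops.get();

    (after_plain, after_wrapped, after_manual)
}
```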

Strange, I thought ManuallyDrop was implemented as a magic union, but now I see it's a magic struct. Did something change or is my memory faulty?

I thought I saw some discussions about its special behaviour. Maybe something changed, maybe I'm misremembering.

Doesn't it? I actually don't know the current implementation, and it's not a user-facing guarantee either way, but there are a lot of proposals for special-casing Pin<&mut !Unpin> aliasing, and reasons why something like that is inevitable.

scottmcm · December 1, 2022, 9:18pm

unions suppress niches even when they have only a single field (as in my link above), but ManuallyDrop preserves niches, so I think once that was clarified it was changed to a struct to reduce special-casing.
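The niche difference can be observed from the outside with size_of. This is a sketch of what current rustc does, not a statement of layout guarantees; Single and option_sizes are hypothetical names.

```rust
use std::mem::{size_of, ManuallyDrop, MaybeUninit};

/// Hypothetical single-field union, to compare against `MaybeUninit`.
pub union Single {
    pub flag: bool,
}

/// Sizes of `Option<_>` around: plain `bool`, `ManuallyDrop<bool>`,
/// `MaybeUninit<bool>`, and the single-field union above.
pub fn option_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<Option<bool>>(),
        size_of::<Option<ManuallyDrop<bool>>>(),
        size_of::<Option<MaybeUninit<bool>>>(),
        size_of::<Option<Single>>(),
    )
}
```

If the niche of bool is preserved, Option can store None in one of the unused bit patterns and stays one byte; if it is suppressed, Option needs a separate tag and grows.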

afetisov · December 1, 2022, 9:33pm

Indeed, it became a struct and a lang item in 2018, in commit 591eeff2. Weird, why did I think it was a union? I didn't even program in Rust at that time, and I'm not in the habit of reading old sources.

Besides the layout, this change also means ManuallyDrop no longer has to be special-cased with respect to Drop fields in unions. So it's almost a normal type; it just suppresses the recursive Drop on its field.

CAD97 · December 1, 2022, 10:20pm

Because the way to actually use Pin<&mut _> requires going through Pin::get_unchecked_mut, it's not sufficient to special case Pin<&mut ?Unpin>^[1] aliasing without also special casing &mut ?Unpin aliasing.
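For readers who haven't written pinned code by hand, this is a sketch of the step being discussed: a method on a !Unpin type has to turn its Pin<&mut Self> into a plain &mut Self to touch fields. Counter, bump and bump_twice are hypothetical names; the std method is Pin::get_unchecked_mut.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical `!Unpin` type (made so by the `PhantomPinned` field).
pub struct Counter {
    hits: u32,
    _pin: PhantomPinned,
}

impl Counter {
    pub fn bump(self: Pin<&mut Self>) -> u32 {
        // SAFETY: the value is only mutated in place and never moved out of
        // the reference, so the pinning guarantee is upheld.
        let this: &mut Counter = unsafe { self.get_unchecked_mut() };
        this.hits += 1;
        this.hits
    }
}

/// Pins a counter on the heap and bumps it twice.
pub fn bump_twice() -> u32 {
    let mut pinned = Box::pin(Counter { hits: 0, _pin: PhantomPinned });
    pinned.as_mut().bump();
    pinned.as_mut().bump()
}
```

Inside bump there is an ordinary &mut Counter with no Pin wrapper around it, which is the situation the aliasing argument is about.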

There's a possibility that we'll eventually require the use of some UnsafeAliasCell instead of just containing something !Unpin, but some form of &mut ?Unpin will relax &mut's aliasing requirement the same way &?Freeze relaxes &'s immutability requirement.

In such a potential future it may not be Unpin directly but an implementation-detail unsafe autotrait like Freeze^[2] instead, but the aliasing magic being on the autotrait instead of Pin (making Pin only about value address/liveness stability^[3] and not aliasing) seems inherent to the design of Pin.

I've brought this up in development discussion channels before, and IIRC RalfJ (the de facto memory model understander) agrees with me that Pin::get_unchecked_mut makes limiting the aliasing weakening to Pin impractical. Minimally, it'd require doing the same "oops it's basically always UB" to Pin::get_unchecked_mut that we've done with mem::uninitialized as well as probably significant improvement to the ergonomics of working with raw pointers.

The final nail making limiting the magic to Pin impossible, though, is the new documentation of futures::poll_fn^[4] which explicitly allows you to pin things in the closure's captures, as the closure is FnMut and thus calling the closure necessarily goes through a &mut reference without a Pin wrapper. If only Pin suppressed LLVM-noalias, doing so would be UB in the way it very subtly is for an unconditionally Unpin PollFn (like std's was for a single release^[5]).

[1] Yes, ?Unpin (unknown Unpinness), not !Unpin (known not Unpin). &mut dyn Trait is not LLVM-noalias, because it could potentially be a reference to some !Unpin type. This is especially important for &mut dyn Future, since those are quite often !Unpin. In an alternative world, Unpin could have also been an opt-out bound on traits as well to avoid this potential pessimization. Given that dyn Trait could be zero bytes and method dispatch reasserts the concrete type's invariants, though, it's likely that this doesn't have a big impact on optimizations. The same goes for ?Freeze disabling LLVM-noalias (which is only about write aliasing, not read/read aliasing) on &.
[2] Because Unpin is safe to implement, this gives UnsafeAliasCell a subtle footgun if Unpin is the opt-in to UB-aliasing &mut, since you could still safely reborrow the cell with UB-aliasing &mut to a containing struct.
[3] Interesting side note: a pinned value address can be used for interesting unsafe tricks even without aliased writes, such as using the address to uniquely identify the value.
[4] It was in the discussion around letting PollFn inherit !Unpin from the closure captures where I brought up get_unchecked_mut, and IIRC, that was part of what convinced the relevant teams that going through the temporary &mut was valid and that the minor breaking change of removing the unconditional Unpin from PollFn was desirable.
[5] And Tokio's was for a significant period of time as well; the soundness issue was only discovered/understood shortly before std's version stabilized.

afetisov · December 1, 2022, 10:28pm

That doesn't really answer whether the current implementation suppresses noalias for either type, or if it's all still in the discussion phase.

FFS, it's in the documentation now? No one other than tokio's macros used that hack, and now you're encouraging people to do more of it! Just to let them get away with not fixing their macros!

Aiden2207 · December 1, 2022, 10:34pm

The Range* types have no special behavior beyond the fact the compiler knows how to desugar a..=b into RangeInclusive::new(a, b). A little magic, but every other aspect of them can be recreated outside of lang items.
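A quick sketch of that claim: the syntax produces ordinary std values, and the iteration behaviour can be rebuilt by hand. MyRange and ranges_match are made-up names.

```rust
use std::ops::{Range, RangeInclusive};

/// Checks that range syntax produces the same values as the plain
/// struct literal / constructor.
pub fn ranges_match() -> bool {
    let half_open = 1..4;
    let closed = 1..=4;
    half_open == Range { start: 1, end: 4 } && closed == RangeInclusive::new(1, 4)
}

/// Hypothetical user-defined stand-in for `Range<u32>`. Everything except
/// the `..` syntax itself can be written in ordinary code.
pub struct MyRange {
    pub start: u32,
    pub end: u32,
}

impl Iterator for MyRange {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.start < self.end {
            let current = self.start;
            self.start += 1;
            Some(current)
        } else {
            None
        }
    }
}
```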

That's really interesting, I did not know that was a thing. For those wondering how it works,

#[phantom]
struct Foo<T>;

is desugared roughly into this:

enum Foo<T>{
    __DeadVariant([T; 0], !),
    Foo,
}
pub use Foo::Foo;

Yep, accidentally pasted down the wrong link. This is the right one. Essentially, the ptr::metadata internals assume every pointer is laid out like this:

#[repr(C)]
struct Pointer<T>{
    ptr: *const (),
    data: <T as Pointee>::Metadata,
}

Which is an unsound assumption for anybody who doesn't make the compiler.
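ptr::metadata itself is unstable, but the "address plus metadata" shape can at least be observed on stable through pointer sizes. This is only a sketch of what current rustc produces; it says nothing about field order, and pointer_widths is a hypothetical helper.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Sizes of a thin reference, a slice reference, and a trait-object
/// reference, in that order.
pub fn pointer_widths() -> (usize, usize, usize) {
    (
        size_of::<&u8>(),        // address only, metadata is `()`
        size_of::<&[u8]>(),      // address plus element count
        size_of::<&dyn Debug>(), // address plus vtable pointer
    )
}
```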

Not just you, I originally thought it was an otherwise ordinary union with some nightly applied. Obviously, that's not the case. Also, you can soundly leak objects without running their destructors and without needing to resort to ManuallyDrop or mem::forget, it just takes a little more effort.

It's a #[repr(transparent)] union, and what exactly that means is still up in the air, but the compiler only knows about it for desugaring generators and/or futures.

There's nothing special about Pin, it's just a #[repr(transparent)] wrapper around a pointer. It looks like the soundness issue mentioned in that issue is currently mitigated by attaching special behavior to the Unpin trait, rather than attaching anything to Pin itself.

afetisov · December 1, 2022, 11:02pm

This is a weird example of std abusing unsoundness and getting away with it, when those are pretty bog standard private implementation guarantees. No more magic than casting a pointer to struct into a pointer to private field using the offset computable within its visibility region.

How would you do that without heap allocations?

CAD97 · December 1, 2022, 11:12pm

Yes; [link]

If the returned future is pinned, then the captured environment of the wrapped function is also pinned in-place, so as long as the closure does not move out of its captures it can soundly create pinned references to them.
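A hedged sketch of the pattern that documentation describes, using std::future::poll_fn and a hand-rolled no-op waker so it needs no executor. NoopWake and drive_wrapped are hypothetical names, and the safety comment states the assumptions this example depends on.

```rust
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Wraps a `!Unpin` async block in `poll_fn` and polls it once.
pub fn drive_wrapped() -> u32 {
    // Async blocks are `!Unpin`, so this cannot be polled via `Pin::new`.
    let mut inner = async { 41 + 1 };

    let wrapped = poll_fn(move |cx| {
        // SAFETY: `inner` is owned by this closure and never moved out of it.
        // The closure lives inside the `PollFn`, which is pinned on the heap
        // below before its first poll and never moved afterwards.
        let inner_pin = unsafe { Pin::new_unchecked(&mut inner) };
        inner_pin.poll(cx)
    });

    let mut wrapped = Box::pin(wrapped);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    match wrapped.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => unreachable!("the inner future has no await points"),
    }
}
```

Note that the closure is called through a plain &mut reference, not a pinned one, which is exactly why the aliasing relaxation cannot live on Pin alone.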

I don't need to rehash the discussion here as the important parts are all in the GitHub PR or linked from there (and you were even a vocal part of it).

The choice was between either leaving and documenting the giant footgun^[1] of pinning owned values being unsound, or making doing that valid and documenting it as such. The latter was eventually chosen because it is both a safer default^[2] and strictly more useful.

It is not encouraged and is still unsafe to do (thus implicitly recommending using a safe alternative instead); it's merely documented as a sound thing to do.

rustc currently emits LLVM-noalias for &T where T: Freeze and for &mut T where T: Unpin.

[1] I know you disagree with how relevant it is, and we absolutely don't need to reargue it. I'm just providing context for those who weren't part of that discussion. The most relevant counterargument is the two part observation that a) if tokio, arguably the team with both the best understanding of pinning and the most well-reviewed futures code made the mistake, others likely both did and will; and b) tokio being the only one to have publicly made the mistake is likely more due to them being the primary user of defining ad-hoc futures than the use case of ad-hoc 'static futures capturing and using !Unpin components (the safe alternative, which is Unpin anyway, requiring borrowing from the parent stack).
[2] The Rust project doesn't just care about ensuring safe code is sound, it also cares about ensuring that writing unsafe code is not more difficult to make sound than inherent to the domain. If std's poll_fn did not forward the pinning guarantee, the result wouldn't be pinning to closure stacks being unsound in general (closures do inherit Unpin), but instead the soundness of code using poll_fn depending on whether std's or Tokio's version of the function is in scope. This would've been a very unfortunate outcome, especially since std was very explicitly uplifting the API proven useful in Tokio.

afetisov · December 2, 2022, 12:51am

tokio is not the only runtime in existence, and not the only implementer of poll_fn. I found 16 independent(-ish) implementations in the up-to-date crates. Why does tokio get special treatment, just because it's popular?

Makes me so angry. Even more angry because my data showed that the decision wouldn't matter in the least in the real ecosystem, but damn. That decision process drives me nuts.

Whatever. Certainly not worth restarting it in this thread.

I think you meant T: !Unpin.

CAD97 · December 2, 2022, 1:03am

No, I did mean T: Unpin. Note the polarity: I'm listing when LLVM-noalias is included.

Aiden2207 · December 2, 2022, 1:08am

It's pretty simple, just transmute from whatever type that needs drop into a byte array of an appropriate size. Of course, doing this soundly and in a generic context is a bit of an adventure, but it's definitely doable.
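One possible reading of that suggestion for a concrete (non-generic) type, as a sketch only. It targets an array of MaybeUninit<u8> rather than plain u8, on the assumption that this is needed to stay sound when the source type has padding or pointers; Noisy and leak_via_transmute are made-up names.

```rust
use std::cell::Cell;
use std::mem::{size_of, transmute, MaybeUninit};
use std::rc::Rc;

/// Hypothetical type that counts how often its destructor runs.
pub struct Noisy {
    pub drops: Rc<Cell<u32>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count after leaking one value via `transmute`, and
/// again after dropping a second value normally.
pub fn leak_via_transmute() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));

    let leaked = Noisy { drops: Rc::clone(&drops) };
    // SAFETY: both types have the same size (checked by `transmute` at
    // compile time) and `MaybeUninit<u8>` accepts any byte. The byte array
    // has no destructor, so `Noisy::drop` never runs for this value.
    let _bytes: [MaybeUninit<u8>; size_of::<Noisy>()] = unsafe { transmute(leaked) };
    let after_transmute = drops.get();

    drop(Noisy { drops: Rc::clone(&drops) });
    (after_transmute, drops.get())
}
```

Doing the same for an arbitrary T is harder on stable, since size_of::<T>() cannot be used as an array length in a generic function.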

system · March 2, 2023, 1:08am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
