Is a reference to an impossible value considered UB?

I was looking through some code and stumbled on this:

/// Hint to the optimizer that any code path which calls this function is
/// statically unreachable and can be removed.
///
/// Equivalent to `std::hint::unreachable_unchecked` but works in older versions of Rust.
#[inline]
pub unsafe fn unreachable() -> ! {
    enum Void {}
    let x: &Void = mem::transmute(1usize);
    match *x {}
}

I always thought that merely creating a reference (as opposed to a raw pointer) to an impossible value, such as this enum Void, is UB in itself. And how is that supposed to work anyway?

Yes, that is UB if the transmute executes. The code relies on the optimizer knowing that it is UB, so the optimizer can assume this code is never reached and optimize accordingly. The disadvantage compared to unreachable_unchecked is that it depends on the optimizer working this out on its own. std is more tightly coupled with the compiler, so it likely uses a more reliable way of passing this information to the optimizer, and that mechanism can change as needed with each version.
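For comparison, an empty match on an uninhabited type is fine in safe code when the value genuinely cannot exist. The sketch below uses std::convert::Infallible; the helper name unwrap_infallible is made up for illustration and does not come from the code under discussion.

```rust
use std::convert::Infallible;

/// Hypothetical helper: extracts the value from a Result whose error
/// type has no values at all.
fn unwrap_infallible<T>(result: Result<T, Infallible>) -> T {
    match result {
        Ok(value) => value,
        // `Infallible` has no variants, so an empty match covers every
        // case. The compiler treats this arm as diverging, and no
        // unsafe code is needed because the value can never exist.
        Err(never) => match never {},
    }
}
```

The unsafe version in the question uses the same empty match, but it has to forge a reference to get something to match on, and that forgery is where the UB comes from.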

(One question is why it uses 1 instead of 0. Both are UB, but 0 involves additional UB because it creates a null reference. Was 1 found empirically to result in better optimization?)

Actually, this is not my code, so I can't say what the author was counting on. I assume the idea is to trigger as little UB as possible: with 1 there is one instance of UB (the reference itself), but with 0 the null reference would be a second. It still sounds silly.
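To make the 1-versus-0 distinction concrete, here is a small safe sketch that checks the two properties a reference's address must have (non-null and aligned) without ever creating a reference. The function name reference_checks is hypothetical, and the sketch assumes align_of::<Void>() is 1, which is what I would expect for an empty enum.

```rust
use std::mem::align_of;

enum Void {}

/// Hypothetical helper: returns (is_non_null, is_aligned) for an address
/// that someone might try to turn into a `&Void`.
/// Only raw pointers and integers are used, so no UB is involved.
fn reference_checks(addr: usize) -> (bool, bool) {
    let ptr = addr as *const Void;
    // Assumption: `Void` has alignment 1, so every address is aligned.
    (!ptr.is_null(), addr % align_of::<Void>() == 0)
}
```

Under that assumption, address 1 passes both checks and address 0 fails the non-null check, so a &Void at address 1 is left with only the uninhabited-pointee problem.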

In fact, unreachable_unchecked directly calls the intrinsic. I can't check it right now, but it seems that either the Rust front end always optimizes it away, or it is lowered to the corresponding LLVM intrinsic.
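For reference, a sketch of a sound use of std::hint::unreachable_unchecked, where the arm really cannot be taken. The function name parity is made up for illustration.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical example: classifies a number as even or odd.
fn parity(n: u32) -> &'static str {
    match n % 2 {
        0 => "even",
        1 => "odd",
        // SAFETY: `n % 2` on a `u32` can only be 0 or 1, so this arm is
        // never executed. Reaching it would be undefined behavior.
        _ => unsafe { unreachable_unchecked() },
    }
}
```

The hint is only sound because the invariant holds for every input; if the arm could ever be reached, the whole function would have UB.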

This simple snippet:

fn main() {
    match Some(1) {
        Some(x) => println!("{}", x),
        None => unsafe { std::hint::unreachable_unchecked() }
    }
}

Debug build:

$ rustc main.rs --emit=llvm-ir

Relevant LLVM IR:

  switch i64 %5, label %bb2 [
    i64 0, label %bb1
    i64 1, label %bb3
  ]

bb1:                                              ; preds = %start
; call core::hint::unreachable_unchecked
  call void @_ZN4core4hint21unreachable_unchecked17h54c9e342b94a1675E()
  unreachable

bb2:                                              ; preds = %start
  unreachable

; core::hint::unreachable_unchecked
; Function Attrs: inlinehint noreturn nonlazybind uwtable
define internal void @_ZN4core4hint21unreachable_unchecked17h54c9e342b94a1675E() unnamed_addr #2 {
start:
  unreachable
}

It uses the LLVM unreachable instruction.

In release build:

$ rustc main.rs --emit=llvm-ir -C opt-level=2

The unreachable branch is already optimized away:


; main::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN4main4main17h6b07ac6c6f06cdd6E() unnamed_addr #0 {
start:
  %_12 = alloca [1 x { i8*, i8* }], align 8
  %_5 = alloca %"core::fmt::Arguments", align 8
  %x = alloca i32, align 4
  %0 = bitcast i32* %x to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
  store i32 1, i32* %x, align 4
  %1 = bitcast %"core::fmt::Arguments"* %_5 to i8*
  call void @llvm.lifetime.start.p0i8(i64 48, i8* nonnull %1)
  %2 = bitcast [1 x { i8*, i8* }]* %_12 to i8*
  call void @llvm.lifetime.start.p0i8(i64 16, i8* nonnull %2)
  %3 = bitcast [1 x { i8*, i8* }]* %_12 to i32**
  store i32* %x, i32** %3, align 8
  %4 = getelementptr inbounds [1 x { i8*, i8* }], [1 x { i8*, i8* }]* %_12, i64 0, i64 0, i32 1
  store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17hd5184f5375b0c2fdE" to i8*), i8** %4, align 8
  %5 = bitcast %"core::fmt::Arguments"* %_5 to [0 x { [0 x i8]*, i64 }]**
  store [0 x { [0 x i8]*, i64 }]* bitcast (<{ i8*, [8 x i8], i8*, [8 x i8] }>* @2 to [0 x { [0 x i8]*, i64 }]*), [0 x { [0 x i8]*, i64 }]** %5, align 8, !alias.scope !6, !noalias !9
  %6 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_5, i64 0, i32 1, i32 1
  store i64 2, i64* %6, align 8, !alias.scope !6, !noalias !9
  %7 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_5, i64 0, i32 3, i32 0
  store i64* null, i64** %7, align 8, !alias.scope !6, !noalias !9
  %8 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_5, i64 0, i32 5, i32 0
  %9 = bitcast [0 x { i8*, i8* }]** %8 to [1 x { i8*, i8* }]**
  store [1 x { i8*, i8* }]* %_12, [1 x { i8*, i8* }]** %9, align 8, !alias.scope !6, !noalias !9
  %10 = getelementptr inbounds %"core::fmt::Arguments", %"core::fmt::Arguments"* %_5, i64 0, i32 5, i32 1
  store i64 1, i64* %10, align 8, !alias.scope !6, !noalias !9
; call std::io::stdio::_print
  call void @_ZN3std2io5stdio6_print17h717dfda30ab823acE(%"core::fmt::Arguments"* noalias nocapture nonnull dereferenceable(48) %_5)
  call void @llvm.lifetime.end.p0i8(i64 48, i8* nonnull %1)
  call void @llvm.lifetime.end.p0i8(i64 16, i8* nonnull %2)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)
  ret void
}

The following code:

unsafe fn unreachable() -> ! {
    enum Void {}
    let x: &Void = ::core::mem::transmute(1usize);
    match *x {}
}

unsafe fn unreachable_null() -> ! {
    enum Void {}
    let x: &Void = ::core::mem::transmute(0_usize);
    match *x {}
}

unsafe fn unreachable_ptr() -> ! {
    enum Void {}
    let x: *const Void = 1usize as _;
    match *x {}
}

unsafe fn unreachable_forged() -> ! {
    #[derive(Clone, Copy)]
    enum Void {}
    union MaybeUninit {
        uninit: (),
        void: Void,
    }
    match (MaybeUninit { uninit: () }.void) {}
}

currently yields in release mode:

unreachable:
	ud2
.set unreachable_null, unreachable
.set unreachable_ptr, unreachable
.set unreachable_forged, unreachable

and in debug mode:

unreachable_null:
	pushq	%rax
	movq	$0, (%rsp)
	ud2

unreachable:
	pushq	%rax
	movq	$1, (%rsp)
	ud2

unreachable_ptr:
	pushq	%rax
	movq	$1, (%rsp)
	ud2

unreachable_forged:
	subq	$16, %rsp
	ud2

Of all the options above, I prefer unreachable_forged, since it only triggers one form of UB (so it should be less "unstable" than the others). The one I dislike most is unreachable_null, since it triggers three forms of UB (null reference, reference to an uninhabited type, instance of an uninhabited type).
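One plausible reading of the debug-mode output is that the reference and raw-pointer variants have a pointer-sized value to store (the movq of 0 or 1), while the union itself holds no data. The sketch below only inspects sizes and never reads the void field, so it is safe. The names Forged and layout_sizes are hypothetical.

```rust
use std::mem::size_of;

#[derive(Clone, Copy)]
enum Void {}

// Same shape as the union in `unreachable_forged`, renamed to avoid
// clashing with std's `MaybeUninit`.
#[allow(dead_code)]
union Forged {
    uninit: (),
    void: Void,
}

/// Hypothetical helper: sizes of `&Void`, `*const Void`, and the union.
fn layout_sizes() -> (usize, usize, usize) {
    (
        size_of::<&Void>(),
        size_of::<*const Void>(),
        size_of::<Forged>(),
    )
}
```

On the toolchains I am aware of, the first two values are pointer-sized and the union is zero-sized.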
