Unsafe unions: bit pattern validity?

I came across the following code and was wondering if it was really sound (obviously this breaks tokio's type encapsulation, but this is for a debugging/tracepoint library, that part is OK here: USDT tracepoints only take integer types, that is part of the kernel ABI for this feature.):

    #[inline]
    fn id_to_u64(id: tokio::task::Id) -> u64 {
        unsafe {
            // SAFETY: Based on training and experience, I know that a
            // `tokio::task::Id` is represented as a single `NonZeroU64`.
            union TrustMeOnThis {
                id: tokio::task::Id,
                int: NonZeroU64,
            }
            const {
                assert!(size_of::<tokio::task::Id>() == size_of::<NonZeroU64>());
                assert!(align_of::<tokio::task::Id>() == align_of::<NonZeroU64>());
            };
            TrustMeOnThis { id }.int.get()
        }
    }

While the statement is currently true, what happens if Id changes to a u64 instead of a NonZeroU64? That would mean you could create a NonZeroU64 with an invalid bit pattern.

However, I think the code would be sound if it was changed to:

    #[inline]
    fn id_to_u64(id: tokio::task::Id) -> u64 {
        unsafe {
            // SAFETY: Based on training and experience, I know that a
            // `tokio::task::Id` is represented as a single `NonZeroU64`.
            union TrustMeOnThis {
                id: tokio::task::Id,
                int: u64,
            }
            const {
                assert!(size_of::<tokio::task::Id>() == size_of::<u64>());
                assert!(align_of::<tokio::task::Id>() == align_of::<u64>());
            };
            TrustMeOnThis { id }.int
        }
    }

My reasoning is that it should be perfectly fine since all bit patterns of a u64 is valid, so regardless of what the Id is, it should be sound.

It's extremely unlikely, but there's always the possibility that tokio::task::Id becomes some kind of struct with padding bytes which happens to have the same size and alignment as a u64. In that case, this would be unsound b/c the padding might be uninitialized which isn't valid to be read as a u64.

If you used and returned MaybeUninit<u64> instead, then this code would be sound but you'd just be kicking the potential UB downstream.

1 Like

That is a fair point I did not consider. Is there any way to detect that with a compile time check?

The other option would be LLVM's freeze, which isn't exposed in Rust. But it would be possible to use inline assembly to freeze the value (sucks for platform compatibility though, having to write several different implementations).

I know that any inline assembly block needs to be motivated with a corresponding Rust abstract machine (AM) operation. However, as this is a debugging tool what I actually want is read whatever the value in memory is on the concrete machine (CM).

A bit offtopic, but I wonder why this isn't just a std::mem::transmute

2 Likes

That is a good point that I was actually asking myself a few minutes ago too (I didn't write this code). Wouldn't solve potential padding issue though.

Not that I know of, unless tokio implements some of the bytemuck traits for this (which I doubt).

Even ignoring the type encapsulation, you need #[repr(C)] on the union to ensure that the fields have the same address.

1 Like

I looked up documentation on unions and it appears that using union in this fashion (without wrapping Id in ManuallyDrop) will fail if tokio::task::Id becomes too complex. (Not that it can happen without tokio removing Copy implementation on Id first and probably bumping major version in process.)

transmute would not have a problem if Id was something like struct Id(Box<()>); (on a system with 64-bit pointers), union will. Of course you can just write where tokio::task::Id: Copy in function signature instead which is bigger guarantee and use transmute which will also save multiple lines:

  1. No #[repr(C)].
  2. No union definition itself.
  3. Checking size is not needed: transmute fails to compile if sizes do not match.
  4. Checking alignment is not needed: result of trasmute will be properly aligned.
1 Like

You can use custom Hasher like this since tokio::task::Id implements Hash. Theoretically it does not guarantee you any specific order of bytes fed to hasher, that hashing Id will actually call write_u64, or even that hashing any Id will not just always hash zero, but there will be no padding issue and implementation of Hash is unlikely to ever actually be anything that crazy. And no unsafe needed!

5 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.