Unsafe review: secretmangle::MangledBoxArbitrary

The crate part I'm presenting is secretmangle/src/arbitrary.rs at master · ProgramCrafter/secretmangle · GitHub.

/// Utility for masking a structure in program's heap with a random key,
/// supporting an arbitrary content type.
///
/// This version is written using assembly (not even common Unsafe Rust).
/// If your data is [`bytemuck::NoUninit`] (that is, Copy and has no padding), you can
/// also use [`crate::MangledBox`].
///
/// It is recommended to use [`std::clone::CloneToUninit`] to initialize
/// the contents of the box rather than constructing it on stack, since the
/// latter option might leave some trace of value being masked.
pub struct MangledBoxArbitrary<T> {
    /// Heap allocation with bytes mangled by XORing with `key`.
    data: Box<MaybeUninit<T>>,

    /// T-sized buffer containing a cryptographically secure random key.
    key: MaybeUninit<T>,
}

impl<T> MangledBoxArbitrary<T> {
    // ...
    
    /// Unmangles the contents and invokes the provided closure on it.
    /// Whether the closure panics or returns normally, the contents
    /// are remangled.
    pub fn with_unmangled<F, R>(&mut self, f: F) -> R
    where
        F: FnOnce(NonNull<T>) -> R,
    {
        let data_ptr = Box::as_mut_ptr(&mut self.data).cast::<u8>();
        let key_ptr = self.key.as_ptr().cast::<u8>();
        let data_nn: NonNull<u8> = NonNull::new(data_ptr).unwrap();

        // # Safety
        // 1. Both pointers point to some `MaybeUninit<T>`, so aligned
        // 2. [`data_ptr`] and [`key_ptr`] point to an allocation of at least
        //    `size_of::<T>()` bytes because they are obtained from references
        //    to `MaybeUninit<T>`.
        // 3. [`data_ptr`] points to heap allocation and [`key_ptr`] to
        //    stack, therefore they do not overlap.
        unsafe {
            xor_chunks::<T>(data_ptr, key_ptr);
        }
        
        // ... another safety comment here ...
        let _guard = RemangleGuard::<T> {
            data: data_ptr,
            key: key_ptr,
            token: PhantomData,
        };

        f(data_nn.cast())
    }
}

xor_chunks::<T> is then calling xor_intrinsic::xor_chunks_intrinsic_baseline::<T>(data, key) which is written in assembly code and mangles the T-sized buffer behind first pointer using the key behind second one.

(To be continued)

T is explicitly permitted to have padding uninit bits. The execution never cares (to be precise, never branches nor holds them in a Rust AM-visible way) about them.

The key is stored in a buffer which is guaranteed to have same size, alignment, other layout properties (e.g. which bytes are padding) as the data, and it just turns out that the ideal type for doing so is MaybeUninit<T>!

I do not remember if MaybeUninit<T> guarantees that bytes marked as T's padding (or their initialization status) are preserved on moves, but in this crate it does not matter at all. Once the data is unmangled, the caller-provided closure will read instance of T, and padding doesn't influence it by definition.

(To be continued, providing an equivalent model in safe Rust, which shall shed light on whether my application of inline asm is sound.)

This model has the same shape as above, while being in perfectly safe Rust. Like in the real code, two calls to xor_chunks(data, key) taken together are no-op.

// Instantiating for, say, T = u64.

static SHELF: Mutex<HashMap<*mut u64, MaybeUninit<u64>>>
    = Mutex::new(HashMap::new());

fn xor_chunks(data_ref: &mut MaybeUninit<u64>, _key: &MaybeUninit<u64>) {
    let guard = SHELF.lock().unwrap();
    let sub_ref = guard
        .entry(data_ref.as_mut_ptr())
        .or_insert_with(|| MaybeUninit::zeroed());
    mem::swap(sub_ref, data_ref);
    atomic::fence(atomic::Ordering::SeqCst);
}

On this basis, I'd argue that the actual XORing is sound: after odd number of xor_chunks no code is accessing contents of the MaybeUninit-s, while after even number of them all and any correspondence of Rust AM to the hardware representation is resumed.

I'd also argue that pointer provenance is preserved too; from perspective of any high level, two invocations of inline assembly could very well be no-op, while on the low level all bits are preserved intact.

That concludes my speech. So, is that plausible?
Uh, and if there are bugs in secretmangle, I would like to hear about them!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.