Drop as soon as possible

Context

I am the author of sosecrets-rs and right now, I am writing RTSecret<T, MEC: typenum::Unsigned>, which represents a secret that would return an Err or panics only if the value of its internal counter (a counter that measures the number of time a secret is exposed, I name it as ec) exceeds the runtime unsigned integer value represented by the type parameter MEC.

Here is the current definition of RTSecret<T, MEC: typenum::Unsigned>:

pub struct RTSecret<
    #[cfg(feature = "zeroize")] T: Zeroize,
    #[cfg(not(feature = "zeroize"))] T,
    MEC: ChooseMinimallyRepresentableUInt + Unsigned,
>(T, Cell<<MEC as ChooseMinimallyRepresentableUInt>::Output>);

<MEC as ChooseMinimallyRepresentableUInt>::Output basically resolves to either u8, u16, u32 or u64 depending on what the user of this library chooses to be MEC, e.g. if he chooses MEC to be typenum::U69, then <MEC as ChooseMinimallyRepresentableUInt>::Output will resolve to u8, since '69' is minimally representable by u8.

Right now, there is a try_expose_secret which takes in a closure, it returns a Result<ReturnType, ExposeSecretError<MEC>> like so:

#[inline(always)]
    fn try_expose_secret<ReturnType, ClosureType>(
        &self,
        scope: ClosureType,
    ) -> Result<ReturnType, error::ExposeSecretError<MEC>>
    where
        MEC: IsGreater<Unchecked, Output = True>,
        for<'brand> ClosureType: FnOnce(RTExposedSecret<'brand, &'brand T>) -> ReturnType,
    {
        let ec_mut = self.1.get();
        let mec = MEC::cast_unsigned_to_self_type::<MEC>(__private::SealedToken {});
        if ec_mut >= mec {
            // NOTE HERE, I HAVE A PROBLEM HERE ###################
            return Err(error::ExposeSecretError::ExposeMoreThanMaximallyAllow(
                error::ExposeMoreThanMaximallyAllowError { mec, ec: ec_mut },
            ));
        };
        self.1.set(ec_mut + MEC::ONE);
        Ok(scope(RTExposedSecret(&self.0, PhantomData)))
    }

Note the comment '// NOTE HERE, I HAVE A PROBLEM HERE ###################' within the code snippet.

Problem

Suppose a secret has been maximally exposed but is not dropped immediately because it is still not out of scope, and if the program is a long-running process, that would mean that the secret will be held in memory for a long time. Hence, I am thinking of wrapping T with ManuallyDrop<Cell<T>> so that I can 1. get a &mut T from Cell<T> and 2. drop T only when the exposure count is greater than the maximal exposure count.

However, there is one important problem/blocker.

 if ec_mut >= mec {
        // NOTE HERE, I HAVE A PROBLEM HERE ###################
        return Err(error::ExposeSecretError::ExposeMoreThanMaximallyAllow(
            error::ExposeMoreThanMaximallyAllowError { mec, ec: ec_mut },
        ));
    };

Suppose a user of my library, continues to call try_expose_secret despite getting an Err for the first call, I would be double-freeing if I keep dropping T through ManuallyDrop<Cell<T>>.

Maybe one workaround will be to add one to the exposure count after it exceeds the maximal exposure count, but what if the caller sets the maximal exposure count to u*::MAX, then there will be overflow, and that must not happen - because in release mode, it will render the Secret type completely useless as the count is reset to zero.

Of course, I can completely ignore this 'problem' and let Secret get dropped only when its scope ends, however long that may take.

I may be missing something, but why can't you just store the secret as Option<T>, and set it to None when you think it must be dropped?

This may be a silly question from one not familiar with the pitfalls of writing security critical code but:

Let's assume the secret has been maximally exposed and is dropped immediately? So far so good, the secret is no longer available to the Rust program that is juggling such secrets. But who is to say it does not still exist, as a pattern or bits somewhere in memory, freed by the memory allocator, given back to the operating system, who knows what happens to it?

Or do you intend to scribble over the bits that hold that secret before giving them back to God knows who?

Good day to you, thank you for your response.

I did not want to use Option for two reasons:

  1. I cannot correlate ec, the runtime counter, to the existence of Some(T), i.e. there is no way for me to guarantee that if ec, the counter, is larger than the MEC, then statically, the .0 field will definitely be None. There is always a runtime associated destructuring involved.

  2. Might complicate the API I guess, since I always have to return Result<ReturnType, ExposeSecretError> by mapping from an Option<T> using .ok_or.

What's your take on this?

Good day to you, thank you for your response.

  1. I plan to add support for .mlock by adding dependencies to some of the Rust crates with such APIs. So that the bytes belonging to the secret can be prevented from being swap to disk.

  2. I have an optional dependency on zeroize crate which promises to zero out the bytes when T is dropped. I don't doubt them, so I guess, the user is free to choose if they are merely okay with returning the address space to the OS and live with it, or they require something more stringent then they can pass T: zeroize::Zeroize to Secret<T, MEC>.

Please let me know of your thoughts.

I'm not sure what's exactly your issue. Yes, you'd have to do some runtime checks, so what? You'd have to do that anyway, and your solution using ManuallyDrop also requires some kind of runtime checks to guard against double-free. The point of using Option<T> is that your checks will be very simple and error-proof. Even if you mess up your logic, you won't violate memory safety, since you can't unwrap a None value.

Option<T> is your internal implementation detail. You don't have to expose it in any way in the API if you don't want to.

Drop on its own doesn't provide any security guarantees. In particular, the data still stays in memory, and can be accessed by malicious code. One needs to explicitly zeroize the underlying buffer before freeing it, and one needs to do it in a special secure way, which resists compiler optimizations and any other possible low-level misbehaviour. In some cases, e.g. if the data is allocated on the heap of a compacting garbage collector, there is no way at all to securely erase the data, since the GC can copy it all over the place. On native heap, data doesn't just move around on its own, so clearing it is more feasible. The lower levels of the system can still screw you, though. The compiler could prefetch the data into some slot on the stack, and you will be unable to access it. Data can still stay in processor registers, nothing you can do about it (but it's also usually a non-issue). The OS could implement copy-on-write memory semantics, making it literally impossible to wipe memory. Same copy-on-write issue can be caused by hardware. RAM typically doesn't have that kind of behaviour, but flash memory, including SSD, often does.

Overall, clearing secrets is always a best-effort practice, not a hard guarantee.

2 Likes

Indeed. Sometimes I miss the good old days when compilers deterministically did what you told them, no sneaky code rearrangement/removal. Operating systems deterministically did what you told them, no CoWs etc. Processors deterministically did what you told them, no speculative execution etc.

Looks like if we want secure systems we have to rebuild everything from the silicon up.

Hi @afetisov

Thank you for your prompt response. I think in this regard, perhaps I should just leave my APIs as they are, i.e. without using Option<T> and allow RTSecret<T, MEC> to be dropped 'naturally' at the end of the scope, however long that might be from its last use where the counter was still lesser than the integer representation of the type parameter MEC.

Since dropping 'earlier' doesn't do much to enhance RTSecret<T, MEC>'s security beyond what zeroize::Zeroize trait does.

Let me know of your thoughts.