How to write a timing-attack-proof comparison function (`Ord::cmp`, lexicographic) for byte arrays?

Consider the following code (adapted from the secstr crate):

pub unsafe fn timing_attack_proof_cmp(us: *const u8, us_len: usize, them: *const u8, them_len: usize) -> Ordering {
    let mut result = Ordering::Equal;

    for i in 0..(us_len.min(them_len)) {
        let us_val = std::ptr::read_volatile(us.add(i));
        let them_val = std::ptr::read_volatile(them.add(i));

        if result == Ordering::Equal {
            result = us_val.cmp(&them_val);
        } else {
            // Once this line is hit, the compiler may figure out that it is allowed to break the loop.
            // How do I tell it not to?
        }
    }

    if result == Ordering::Equal {
        us_len.cmp(&them_len)
    } else {
        result
    }
}

I am specifically concerned with the compiler realising that it can abort the loop early, once it found a differing byte.

I assume that the volatile reads help nothing with keeping the loop running, but are only to avoid caching effects?

Is there a way to tell the Rust compiler to not abort the loop once result changes its value from Ordering::Equal to something else?

I'm afraid Rust has no mechanism against timing attacks. The problem isn't just that it may break out of the loop. Each loop iteration may take time that depends on the data.

The only way seems to be write the sensitive parts in assembly, or at least inspect the generated assembly code. Even assembly instructions may sometimes take different time depending on the data, so this is very platform specific.

4 Likes

If you want to ensure the loop always iterates through the entire array then how about something like:

pub fn timing_attack_proof_equ(us: &[u8], them: &[u8]) -> bool {
    if us.len() != them.len() {
        return false;
    }

    let mut result: u8 = 0;

    for i in 0..us.len() {
        result |= us[i] ^ them[i];
    }

    result == 0
}

That gets rid of the if and any problems balancing the timing in each branch.

I'm not sure if that helps with hiding the timing though. As noted above the compiler is free to do all kind of weird things and God knows what our processors do under the hood with all their caches, pipelines, parallel dispatch, branch predictors and so on.

I'd want to:
a) Be sure I'm using a processor that even has predictable timing for anything.
b) Check the generated assembly code for any odd behaviour.
c) Have tests in place that verifies the timing is invariant over a pile of random arrays that are and are not equal.

By which time one may as well be writing the thing in assembler anyway.

Ah, looking at the inspiration for the OP secstr/lib.rs at trunk - secstr - Codeberg.org I see it uses the same XOR trick. Along with a bunch of checks that are probably sensible.

BTW this code only checks if the two slices have even number of different elements.

assert_eq!(timing_attack_proof_equ(&[1, 2], &[3, 4]), true);

Damn! Quite so. I had a go at fixing it above.

Thanks for your ideas everyone! I am myself not affiliated with the secstr crate, so I do not know what the rationale behind calling their comparison function "timing attack proof" is (they actually call it constant time, but I assume this means the same).

But blindly following their example, I now came up with the following. The idea is to iterate over the arrays in reverse and update the result on the way. Since the result may be updated by the element at index zero no matter what happens with the other elements, the loop cannot be aborted early.

pub unsafe fn timing_attack_proof_cmp(us: *const u8, us_len: usize, them: *const u8, them_len: usize) -> Ordering {
    let mut result = Ordering::Equal;

    for i in (0..(us_len.min(them_len))).rev() {
        let us_val = std::ptr::read_volatile(us.add(i));
        let them_val = std::ptr::read_volatile(them.add(i));

        let new_result = us_val.cmp(&them_val);
        if new_result != Ordering::Equal {
            result = new_result;
        }
    }

    if result == Ordering::Equal {
        us_len.cmp(&them_len)
    } else {
        result
    }
}

Of course, the compiler could rewrite the whole thing back to what I had in the OP, but I'd just assume that reversing loops is not usually done by an optimising compiler.

It's not clear to me why one expect the loop to not terminate early when going backwards.

It's also not clear why you would do all that complication when you have a simple example already.

What is the end goal in rewriting what there is in secstr?

You are probably talking about comparing for equality? The code I posted here compares lexicographically, like the cmp method on e.g. [u8].

There's a crate for this, btw. It uses XOR and black_box:

2 Likes

Another idea may be to use volatile reads which are not allowed to be optimized away in theory. I don't have proof that this works; you'd have to check the generated assembly.

That seems weird given that the docs for black_box explicitly say "it must not be relied upon to control critical program behavior. This immediately precludes any direct use of this function for cryptographic or security purposes." (emphasis as in the original).

It seems the constant_time_eq crate uses inline assembly for the more popular CPU architectures and only uses black_box as a fallback for the others.

1 Like