Consider the following code (adapted from the secstr crate):
pub unsafe fn timing_attack_proof_cmp(us: *const u8, us_len: usize, them: *const u8, them_len: usize) -> Ordering {
let mut result = Ordering::Equal;
for i in 0..(us_len.min(them_len)) {
let us_val = std::ptr::read_volatile(us.add(i));
let them_val = std::ptr::read_volatile(them.add(i));
if result == Ordering::Equal {
result = us_val.cmp(&them_val);
} else {
// Once this line is hit, the compiler may figure out that it is allowed to break the loop.
// How do I tell it not to?
}
}
if result == Ordering::Equal {
us_len.cmp(&them_len)
} else {
result
}
}
I am specifically concerned with the compiler realising that it can abort the loop early, once it found a differing byte.
I assume that the volatile reads help nothing with keeping the loop running, but are only to avoid caching effects?
Is there a way to tell the Rust compiler to not abort the loop once result changes its value from Ordering::Equal to something else?
I'm afraid Rust has no mechanism against timing attacks. The problem isn't just that it may break out of the loop. Each loop iteration may take time that depends on the data.
The only way seems to be write the sensitive parts in assembly, or at least inspect the generated assembly code. Even assembly instructions may sometimes take different time depending on the data, so this is very platform specific.
If you want to ensure the loop always iterates through the entire array then how about something like:
pub fn timing_attack_proof_equ(us: &[u8], them: &[u8]) -> bool {
if us.len() != them.len() {
return false;
}
let mut result: u8 = 0;
for i in 0..us.len() {
result |= us[i] ^ them[i];
}
result == 0
}
That gets rid of the if and any problems balancing the timing in each branch.
I'm not sure if that helps with hiding the timing though. As noted above the compiler is free to do all kind of weird things and God knows what our processors do under the hood with all their caches, pipelines, parallel dispatch, branch predictors and so on.
I'd want to:
a) Be sure I'm using a processor that even has predictable timing for anything.
b) Check the generated assembly code for any odd behaviour.
c) Have tests in place that verifies the timing is invariant over a pile of random arrays that are and are not equal.
By which time one may as well be writing the thing in assembler anyway.
Thanks for your ideas everyone! I am myself not affiliated with the secstr crate, so I do not know what the rationale behind calling their comparison function "timing attack proof" is (they actually call it constant time, but I assume this means the same).
But blindly following their example, I now came up with the following. The idea is to iterate over the arrays in reverse and update the result on the way. Since the result may be updated by the element at index zero no matter what happens with the other elements, the loop cannot be aborted early.
pub unsafe fn timing_attack_proof_cmp(us: *const u8, us_len: usize, them: *const u8, them_len: usize) -> Ordering {
let mut result = Ordering::Equal;
for i in (0..(us_len.min(them_len))).rev() {
let us_val = std::ptr::read_volatile(us.add(i));
let them_val = std::ptr::read_volatile(them.add(i));
let new_result = us_val.cmp(&them_val);
if new_result != Ordering::Equal {
result = new_result;
}
}
if result == Ordering::Equal {
us_len.cmp(&them_len)
} else {
result
}
}
Of course, the compiler could rewrite the whole thing back to what I had in the OP, but I'd just assume that reversing loops is not usually done by an optimising compiler.
Another idea may be to use volatile reads which are not allowed to be optimized away in theory. I don't have proof that this works; you'd have to check the generated assembly.
That seems weird given that the docs for black_box explicitly say "it must not be relied upon to control critical program behavior. This immediately precludes any direct use of this function for cryptographic or security purposes." (emphasis as in the original).
It seems the constant_time_eq crate uses inline assembly for the more popular CPU architectures and only uses black_box as a fallback for the others.