I'm lost. The C code example of ChaCha20 on Wikipedia (Salsa20 - Wikipedia) has arrays of sixteen uint32_t as input and output. What do I do with those test vectors that have 256-bit keys (only 8 u32s) and a 64-bit nonce (only 2 u32s)? And where does that "expand 32-byte k" go? That is 16 bytes, or 4 u32s.
That Wikipedia C code does indeed produce an array of zeros for an all-zero input array.
The hind thing converted that C code to Rust with a couple of mistakes. It did not care about overflow on addition, and somehow it forgot about the i += 2 step in the for loop and did too many rounds. With that fixed I get the same results as the Wikipedia C version:
macro_rules! ROTL {
    ($a:expr, $b:expr) => {
        (($a) << ($b)) | (($a) >> (32 - ($b)))
    };
}
macro_rules! QR {
    ($a:expr, $b:expr, $c:expr, $d:expr) => {
        $a = $a.wrapping_add($b);
        $d ^= $a;
        $d = ROTL!($d, 16);
        $c = $c.wrapping_add($d);
        $b ^= $c;
        $b = ROTL!($b, 12);
        $a = $a.wrapping_add($b);
        $d ^= $a;
        $d = ROTL!($d, 8);
        $c = $c.wrapping_add($d);
        $b ^= $c;
        $b = ROTL!($b, 7);
    };
}
const ROUNDS: usize = 20;
#[inline(never)]
pub fn chacha_block(output: &mut [u32], input: &[u32; 16]) {
    let mut x = *input; // Copy the input array to x
    // 10 loops × 2 rounds/loop = 20 rounds
    for _ in 0..ROUNDS / 2 {
        // Odd round
        QR!(x[0], x[4], x[8], x[12]); // column 0
        QR!(x[1], x[5], x[9], x[13]); // column 1
        QR!(x[2], x[6], x[10], x[14]); // column 2
        QR!(x[3], x[7], x[11], x[15]); // column 3
        // Even round
        QR!(x[0], x[5], x[10], x[15]); // diagonal 1 (main diagonal)
        QR!(x[1], x[6], x[11], x[12]); // diagonal 2
        QR!(x[2], x[7], x[8], x[13]); // diagonal 3
        QR!(x[3], x[4], x[9], x[14]); // diagonal 4
    }
    // Feed-forward: add the original input words back in (wrapping on overflow)
    for i in 0..16 {
        output[i] = x[i].wrapping_add(input[i]);
    }
}
pub fn test() {
    let input = [0u32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1];
    let mut output = [0u32; 16];
    chacha_block(&mut output[0..16], &input);
    println!("Result: {:x?}", output);
}
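On the question of where the key, nonce and "expand 32-byte k" go: as far as I understand the original ChaCha layout, the sixteen input words are the 4 constant words, then the 8 key words, then a 64-bit block counter (2 words), then the 64-bit nonce (2 words), everything read as little-endian u32. That also explains the all-zero result: without the constant, an all-zero key, counter and nonce give an all-zero state. Below is a rough, self-contained sketch of that, not a vetted implementation. The names fill_le_words, initial_state, block and keystream_block are my own, and the block function is rewritten with rotate_left instead of the macros. (The RFC 8439 variant instead uses a 32-bit counter in word 12 and a 96-bit nonce in words 13 to 15.)

```rust
/// Read `src` as little-endian u32 words into `dst` (hypothetical helper).
fn fill_le_words(dst: &mut [u32], src: &[u8]) {
    for (w, chunk) in dst.iter_mut().zip(src.chunks_exact(4)) {
        *w = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
}

/// Build the 16-word input state, assuming the original ChaCha layout:
/// words 0..4 constant, 4..12 key, 12..14 block counter, 14..16 nonce.
pub fn initial_state(key: &[u8; 32], counter: u64, nonce: &[u8; 8]) -> [u32; 16] {
    let mut s = [0u32; 16];
    fill_le_words(&mut s[0..4], b"expand 32-byte k");
    fill_le_words(&mut s[4..12], key);
    s[12] = counter as u32; // low half of the 64-bit counter
    s[13] = (counter >> 32) as u32; // high half
    fill_le_words(&mut s[14..16], nonce);
    s
}

fn quarter_round(x: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(16);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(12);
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(8);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(7);
}

/// 20 rounds (10 column/diagonal double rounds) plus the final feed-forward add.
pub fn block(input: &[u32; 16]) -> [u32; 16] {
    let mut x = *input;
    for _ in 0..10 {
        // Column round
        quarter_round(&mut x, 0, 4, 8, 12);
        quarter_round(&mut x, 1, 5, 9, 13);
        quarter_round(&mut x, 2, 6, 10, 14);
        quarter_round(&mut x, 3, 7, 11, 15);
        // Diagonal round
        quarter_round(&mut x, 0, 5, 10, 15);
        quarter_round(&mut x, 1, 6, 11, 12);
        quarter_round(&mut x, 2, 7, 8, 13);
        quarter_round(&mut x, 3, 4, 9, 14);
    }
    for (w, i) in x.iter_mut().zip(input.iter()) {
        *w = w.wrapping_add(*i);
    }
    x
}

/// One 64-byte keystream block: output words serialized as little-endian bytes.
pub fn keystream_block(key: &[u8; 32], counter: u64, nonce: &[u8; 8]) -> [u8; 64] {
    let words = block(&initial_state(key, counter, nonce));
    let mut out = [0u8; 64];
    for (chunk, w) in out.chunks_exact_mut(4).zip(words.iter()) {
        chunk.copy_from_slice(&w.to_le_bytes());
    }
    out
}
```

Calling keystream_block(&[0u8; 32], 0, &[0u8; 8]) should then be directly comparable with the all-zero key and nonce test vector; if I have the layout right, that keystream starts with the bytes 76 b8 e0 ad.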