Hi,
I'm somewhat new to Rust and have a question about timing and performance vs C. I'm trying to benchmark some very simple code that does a lookup in an array and then applies the bitwise AND operator. Code is below. I've re-implemented the same benchmark function in C, and I seem to get about a factor of 4 difference in performance (elapsed time using a large iteration count such as 1E9).
I have also tried the comparison using just the array lookup, without the bitwise AND (see the commented-out line), which results in much more similar performance (although C is still slightly faster). This seems to indicate that the bitwise AND is performing significantly differently in Rust than in C. I'm scratching my head about this, as it's surprising that the Rust and C versions wouldn't compile to nearly identical code. However, I certainly could be missing something. Any insights would be appreciated.
Details:
- OS: Linux
- Rust 1.43.1 (running with --release)
- gcc 10.1.0 (compiled with -O2)
Context: As a way to learn Rust, I wrote a chess engine following a guide for C. Overall, the program is around 30% slower than the C version, and I'm now going back through and trying to find parts that can be optimized. Based on profiling, a significant portion of the run time is spent in the "evaluation" function, which calculates a score for a given board position (during search, this function is called many times). A big part of the evaluation function involves using bitmasks to check for things like isolated pawns. So the example code below is representative of some of what actually goes on in the chess engine, and I suspect that any Rust-vs-C performance differences in this simple example are also manifesting in the performance of the overall chess engine.
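To give a flavour of what the evaluation does, the isolated-pawn test looks roughly like the sketch below. This is illustrative only; the names and the way the mask is built here are made up, not the engine's actual code (the real engine uses precomputed masks).

// Illustrative sketch only -- names and mask construction are made up,
// not the engine's actual code.
const FILE_A: u64 = 0x0101_0101_0101_0101; // one bit per rank on the a-file

// Bitmask of the two files adjacent to the file of square `sq` (0..63).
fn neighbor_files(sq: usize) -> u64 {
    let file = sq % 8;
    let mut mask = 0u64;
    if file > 0 {
        mask |= FILE_A << (file - 1);
    }
    if file < 7 {
        mask |= FILE_A << (file + 1);
    }
    mask
}

// A pawn on `sq` is isolated if no friendly pawn occupies an adjacent file.
fn is_isolated(friendly_pawns: u64, sq: usize) -> bool {
    (friendly_pawns & neighbor_files(sq)) == 0
}

So the hot path is essentially: index into a u64 array, AND against a bitboard, compare to zero, which is what the benchmark below tries to isolate.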
Thanks,
John
use std::time::Instant;

pub fn benchmark_bitmask(iterations: u64) {
    let zero: [u64; 64] = [0; 64];
    let val: u64 = 123456;
    let mut y = 0;
    let start_time = Instant::now();
    for _i in 0..iterations {
        if (zero[(y % 64) as usize] & val) == 0 {
        // if zero[(y % 64) as usize] == 0 {
            y += 1;
        }
    }
    let duration = start_time.elapsed();
    println!("y: {}", y);
    println!(
        "benchmark: {} iterations in {}ms ({:.7}us per iter)",
        iterations,
        duration.as_millis(),
        (duration.as_secs_f32() / (iterations as f32)) * 1e6
    );
}
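For completeness, I call the Rust version from a trivial main along these lines (the exact harness isn't important, just the ~1E9 iteration count mentioned above):

fn main() {
    // ~1E9 iterations, as used for the elapsed-time comparison above
    benchmark_bitmask(1_000_000_000);
}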
And the C version:
#include <stdio.h>
#include <sys/time.h>

typedef unsigned long long U64;

void benchmark_bitmask(U64 iterations) {
    U64 zero[64];
    for (int i = 0; i < 64; i++)
        zero[i] = 0;
    U64 val = 123456;

    struct timeval tval_before, tval_after, tval_result;
    gettimeofday(&tval_before, NULL);

    long y = 0;
    for (U64 i = 0; i < iterations; i++) {
        if ((zero[y % 64] & val) == 0) {
        /* if (zero[y % 64] == 0) { */
            y++;
        }
    }

    gettimeofday(&tval_after, NULL);
    timersub(&tval_after, &tval_before, &tval_result);

    printf("y: %ld\n", y);
    double elapsed = tval_result.tv_sec + tval_result.tv_usec / 1e6;
    printf("benchmark: %llu iterations in %lf s (%lf us per iter)\n",
           iterations, elapsed, elapsed / ((double) iterations) * 1e6);
}