I am developing a performance-sensitive data service using Rust. The goal is to perform multiple operations on a list of 200,000 elements within a few seconds, with a total computation volume reaching tens of billions of operations.
The process involves multiple linear steps, leaving little room for multithreading. As a result, I have conducted performance tests on almost every code block.
Currently, I need to make a change from the first case to the second case:
// Case 1
for element in data {
    let result = sub_task(element);
}
// Case 2
for element in data {
    let (result, state) = sub_task(element);
}
Instead of just returning a bool, sub_task now needs to return an additional state that is passed to the next step.
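To make the shape of the change concrete, here is a minimal sketch of Case 2. The `State` type, the body of `sub_task`, and the `next_step` and `run` functions are hypothetical placeholders chosen for illustration; the real types and logic are not shown in this post.

```rust
// Hypothetical placeholder for the extra state; the real type is not shown in this post.
#[derive(Debug, Clone, Copy, PartialEq)]
struct State(u8);

// Returns the original bool plus an extra state for the next step.
fn sub_task(element: u8) -> (bool, State) {
    (element > 10, State(element))
}

// Hypothetical next step that consumes the state produced by `sub_task`.
fn next_step(state: State) -> u32 {
    u32::from(state.0) * 2
}

// Runs both steps over the data and returns (number of true results, sum of next-step outputs).
fn run(data: &[u8]) -> (usize, u32) {
    let mut passed = 0usize;
    let mut total = 0u32;
    for &element in data {
        let (result, state) = sub_task(element);
        if result {
            passed += 1;
        }
        total += next_step(state);
    }
    (passed, total)
}

fn main() {
    let data = [5u8, 11, 200];
    let (passed, total) = run(&data);
    println!("passed: {passed}, total: {total}");
}
```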
Due to performance concerns, I benchmarked the cost of returning a tuple and found that it roughly doubles the run time in Rust (about 244 ms to about 531 ms for 100 million calls). In a similar test in C++, the slowdown is much smaller (261 ms to 313 ms).
Rust Code:
use std::hint::black_box;
use std::time::Instant;
use rand::{Rng, thread_rng};
// Define a simple structure
#[derive(Debug, Copy, Clone)]
struct RandomStruct {
    flag: bool,
    value: u8,
}
// First function: returns a u8 value
fn return_u8(rng: &mut impl Rng) -> u8 {
    let test: u8 = rng.gen();
    black_box(test);
    black_box(test > 10);
    test
}
// Second function: returns a tuple (bool, u8)
fn return_tuple(rng: &mut impl Rng) -> (bool, u8) {
    let test: u8 = rng.gen();
    black_box(test);
    (test > 10, test)
}
// Third function: returns a RandomStruct
fn return_struct(rng: &mut impl Rng) -> RandomStruct {
    let test: u8 = rng.gen();
    black_box(test);
    RandomStruct {
        flag: test > 10,
        value: test,
    }
}
// Benchmark the performance of random number generation
fn benchmark_rng(iterations: usize, rng: &mut impl Rng) {
    let start = Instant::now();
    for _ in 0..iterations {
        let value = black_box(rng.gen::<u8>());
        black_box(value); // Prevent compiler optimizations
    }
    let duration = start.elapsed();
    println!(
        "Random number generation: {:?} (for {} iterations)",
        duration, iterations
    );
}
fn main() {
    let iterations = 100_000_000; // Test 100 million iterations
    let mut rng = thread_rng(); // Initialize a global random number generator once
    // Benchmark: random number generation
    benchmark_rng(iterations, &mut rng);
    // Test: returning a u8 value
    let start = Instant::now();
    for _ in 0..iterations {
        let value = black_box(return_u8(&mut rng));
        black_box(value); // Prevent compiler optimizations
    }
    let duration_u8 = start.elapsed();
    // Test: returning a tuple (bool, u8)
    let start = Instant::now();
    for _ in 0..iterations {
        let value = black_box(return_tuple(&mut rng));
        black_box(value); // Prevent compiler optimizations
    }
    let duration_tuple = start.elapsed();
    // Test: returning a RandomStruct
    let start = Instant::now();
    for _ in 0..iterations {
        let value = black_box(return_struct(&mut rng));
        black_box(value); // Prevent compiler optimizations
    }
    let duration_struct = start.elapsed();
    // Print results
    println!(
        "return_u8: {:?}, return_tuple: {:?}, return_struct: {:?}",
        duration_u8, duration_tuple, duration_struct
    );
}
C++ Code:
#include <iostream>
#include <random>
#include <tuple>
#include <chrono>
#include <cstdint>
// Define a structure
struct RandomStruct {
    bool flag;
    uint8_t value;
};
// Random number generator
std::mt19937 rng(std::random_device{}()); // Mersenne Twister engine
std::uniform_int_distribution<uint8_t> dist(0, 255); // Generate uint8_t random numbers
// First function: returns uint8_t
uint8_t return_u8() {
    uint8_t test = dist(rng);
    volatile auto value = test; // Prevent optimization
    volatile auto flag = value > 10;
    return value;
}
// Second function: returns (bool, uint8_t)
std::tuple<bool, uint8_t> return_tuple() {
    uint8_t test = dist(rng);
    volatile auto value = test; // Prevent optimization
    return {value > 10, value};
}
// Third function: returns RandomStruct
RandomStruct return_struct() {
    uint8_t test = dist(rng);
    volatile auto value = test; // Prevent optimization
    return {value > 10, value};
}
// Benchmark random number generation
void test_rng_generation(size_t iterations) {
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < iterations; ++i) {
        volatile auto value = dist(rng); // Prevent optimization
    }
    auto duration = std::chrono::high_resolution_clock::now() - start;
    std::cout << "Random number generation (100M): "
              << std::chrono::duration_cast<std::chrono::milliseconds>(duration).count()
              << "ms" << std::endl;
}
int main() {
    const size_t iterations = 100000000; // Test 100 million iterations
    // Benchmark: random number generation
    test_rng_generation(iterations);
    // Test: return uint8_t
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < iterations; ++i) {
        volatile auto value = return_u8(); // Prevent optimization
    }
    auto duration_u8 = std::chrono::high_resolution_clock::now() - start;
    // Test: return (bool, uint8_t)
    start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < iterations; ++i) {
        volatile auto value = return_tuple(); // Prevent optimization
    }
    auto duration_tuple = std::chrono::high_resolution_clock::now() - start;
    // Test: return RandomStruct
    start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < iterations; ++i) {
        volatile auto value = return_struct(); // Prevent optimization
    }
    auto duration_struct = std::chrono::high_resolution_clock::now() - start;
    // Print results
    std::cout << "Results: "
              << "return_u8: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(duration_u8).count()
              << "ms, return_tuple: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(duration_tuple).count()
              << "ms, return_struct: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(duration_struct).count()
              << "ms" << std::endl;
    return 0;
}
Rust command:
RUSTFLAGS="-C opt-level=3 -C target-cpu=native" cargo run --release
Rust result:
Random number generation: 230.609456ms (for 100000000 iterations)
return_u8: 243.934695ms, return_tuple: 531.157848ms, return_struct: 524.711489ms
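The tuple and struct timings are close to each other, which would be consistent with the two return types having the same in-memory size. A quick way to check this is sketched below. Note that Rust does not guarantee the layout of tuples or of structs without an explicit `repr`, so the printed sizes are whatever the compiler in use chooses.

```rust
use std::mem::size_of;

// Same definition as in the benchmark above.
#[allow(dead_code)]
#[derive(Debug, Copy, Clone)]
struct RandomStruct {
    flag: bool,
    value: u8,
}

fn main() {
    // Print the size in bytes of each return type used in the benchmark.
    println!("u8: {} byte(s)", size_of::<u8>());
    println!("(bool, u8): {} byte(s)", size_of::<(bool, u8)>());
    println!("RandomStruct: {} byte(s)", size_of::<RandomStruct>());
}
```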
C++ command:
g++ test.cpp -o test -std=c++20 -O3
./test
C++ result:
Random number generation (100M): 226ms
Results: return_u8: 261ms, return_tuple: 313ms, return_struct: 336ms
I am not proficient in either Rust or C++. I compared it with C++ just to verify whether the performance impact of tuples is a universal issue across languages. The C++ code was generated with the help of AI.
I would like to ask for help: besides this case, I use tuples extensively elsewhere in my code. Is there a way to optimize this?
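A variation that may be worth measuring before drawing conclusions about tuples themselves is sketched below. The benchmark above passes the whole tuple to `black_box`, whereas the `u8` case passes a single byte. The sketch destructures the tuple and passes each field to `black_box` separately, on the assumption (not verified here) that part of the gap comes from how the two-field value is handled by `black_box` rather than from the return itself. `XorShift32` and `run_destructured` are hypothetical names, and the generator is a dependency-free stand-in for `thread_rng`, so absolute timings are not comparable with the results above.

```rust
use std::hint::black_box;
use std::time::Instant;

// Stand-in generator (xorshift32) so the sketch needs no crates; it is not the `rand` generator used above.
struct XorShift32(u32);

impl XorShift32 {
    fn next_u8(&mut self) -> u8 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        (x >> 24) as u8
    }
}

// Same shape as `return_tuple` above, but using the stand-in generator.
fn return_tuple(rng: &mut XorShift32) -> (bool, u8) {
    let test = rng.next_u8();
    (test > 10, test)
}

// Destructures the tuple and passes each field through `black_box` on its own.
// Returns how many flags were true so the loop has an observable result.
fn run_destructured(iterations: usize, rng: &mut XorShift32) -> usize {
    let mut hits = 0usize;
    for _ in 0..iterations {
        let (flag, value) = return_tuple(rng);
        black_box(value);
        if black_box(flag) {
            hits += 1;
        }
    }
    hits
}

fn main() {
    let iterations = 100_000_000; // Same iteration count as the benchmark above
    let mut rng = XorShift32(0x1234_5678); // Any non-zero seed works for xorshift
    let start = Instant::now();
    let hits = run_destructured(iterations, &mut rng);
    println!("destructured tuple: {:?} ({} flags set)", start.elapsed(), hits);
}
```

If the destructured loop runs close to the `u8` case, the measured overhead would point at the benchmark harness rather than at returning tuples in general.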