Hey everyone ^^,
I'm just playing with Rust pointers and simply trying to catch some runtime cost of the borrow checker!
I scripted these few lines of code on the Rust Playground... and got random results:
sometimes one way wins, sometimes the other. Since I'm not competent enough to check the disassembly, can I rely on Rust's zero-cost abstractions, or is my method simply not contrasted enough, with the OS/machine introducing noise into the results? Since it's a fun topic I hope to get a quick answer ^^ Thanks anyway.
use std::cell::RefCell;
#[allow(unused_imports)]
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};
use std::time::Instant;

const BENCH_ITER_COUNT: u64 = 100001;

fn main() {
    // Performance benchmark test:
    // loop & results memory config...
    let mut result_data: Vec<String> = Vec::new();
    let mut raw_val: Vec<u128> = Vec::new();
    let mut refcell_val: Vec<u128> = Vec::new();
    let mut count = 0u64;
    // benchmark loop...
    while count < BENCH_ITER_COUNT {
        // raw way...
        unsafe {
            let mut m3 = SomeMemory::new();
            let checka = Instant::now();
            //(*(&mut m3 as *mut SomeMemory)).mem2.store(count, Relaxed);
            (*(&mut m3 as *mut SomeMemory)).mem4 = count;
            let ref value = *(&m3 as *const SomeMemory);
            let timea = checka.elapsed().as_nanos();
            result_data.push(format!(
                "raw -> val:{:?} memory mutation perf:{}(ns)",
                value, timea
            ));
            raw_val.push(timea);
        }
        // RefCell way...
        let m4 = RefCell::new(SomeMemory::new());
        let checkb = Instant::now();
        let ref m5 = m4;
        //m5.borrow_mut().mem2.store(count, Relaxed);
        m5.borrow_mut().mem4 = count;
        let timeb = checkb.elapsed().as_nanos();
        result_data.push(format!(
            "RefCell -> val:{:?} memory mutation perf:{}(ns)",
            m5.borrow(),
            timeb
        ));
        refcell_val.push(timeb);
        count += 1;
    }
    let raw_min = raw_val.iter().min().unwrap();
    let refcell_min = refcell_val.iter().min().unwrap();
    // result display.
    println!(
        "\nOS exception:\nraw way: {}ns (at best)\nrefcell way: {}ns (at best)\n",
        raw_min, refcell_min
    );
    if raw_val.len() == refcell_val.len() {
        // compute the average.
        let raw_average = raw_val.iter().sum::<u128>() / (raw_val.len() as u128);
        let refcell_average = refcell_val.iter().sum::<u128>() / (refcell_val.len() as u128);
        // result display.
        println!(
            "Average performance counter:\nfor raw: {}ns\nfor RefCell: {}ns",
            raw_average, refcell_average
        );
    }
}

#[allow(dead_code)]
#[derive(Debug)]
struct SomeMemory {
    mem1: AtomicU64,
    mem2: AtomicU64,
    mem3: AtomicU64,
    mem4: u64,
}

impl SomeMemory {
    fn new() -> Self {
        Self {
            mem1: AtomicU64::new(0u64),
            mem2: AtomicU64::new(0u64),
            mem3: AtomicU64::new(0u64),
            mem4: 0,
        }
    }
}
Micro-benchmarking is super hard. For anything under hundreds of milliseconds, it's critically important to use something like Criterion.rs to make it more likely that the result is meaningful.
RefCell is not intended to be a zero-cost abstraction: it defers borrow checking to run time, where you have to pay for it with extra computational work. Which is one reason you should avoid it and use normal references or Cell whenever reasonably possible.
But what I see is that RefCell is not so expensive at run time compared to a raw pointer (sometimes it's even better on average... see for yourself, you just have to increase the const value)... on a large set of mutable memory, which is the point of my issue... is this an LLVM trick or something?
Of course, zero-cost abstraction doesn't fit there; compiler optimisation would be the right pick for explaining why, on a large set of mutable values, I don't really see any huge differences (?...). RefCell is often faster than the raw pointer itself, which is unexpected... but that is certainly not the right way to contrast the borrow checker's runtime cost... I'm going to look at micro-benchmarks for that and forget average values...
I wouldn't be sure that's what you're actually measuring. The measurement windows in your OP are so small (3-4 asm instructions) that any difference is probably dwarfed by variations in calling Instant::now() and elapsed().
And in a larger context, safe Rust such as RefCell has more optimization potential, due to the LLVM noalias attribute, than unsafe Rust, as benchmarks of significant size might demonstrate.
Final diagnostic:
With cargo asm and the --rust flag I'm able to inspect the quality of the generated instructions,
and RefCell seems to be surprisingly efficient... in my view, the runtime tradeoff is worth the benefit
when you don't have control over the context of your function call...
(with raw pointer: 1 instruction)
(*(&mut m3 as *mut SomeMemory)).mem4 = count;
str x27, [sp, #216]
The borrow checker runs purely at compile time. Lifetime information is not passed down to compiler backends like LLVM. It's guaranteed that the lifetime system never affects the produced machine code; it can only reject code that fails the check.
What you're testing here is the runtime cost of RefCell, which is a highly efficient single-threaded lock, but not the borrow checker itself.
There are so many ways you can fool yourself when measuring.
You're using Instant to measure something that is a few instructions long. That's like using a tape measure to measure a few atoms: you're only measuring the manufacturing tolerance of the instrument itself, not the thing you're actually trying to measure. Try measuring no work at all (Instant::now().elapsed()) and you'll see that you get numbers around one microsecond. To accurately measure something very small, you need to repeat it a lot of times within a single measurement. You can, for example, make a loop that runs a million times between starting and stopping the timer.
Rust is a very high-level language. Instructions are not laid out the same way your source code is structured. What looks like a very concrete assignment in the source code will not actually be compiled directly as an assignment; it just tells the compiler "hold my assignment", and the compiler can then insert an assignment wherever needed, or nowhere at all if it's redundant. Your assignments can happen before the timer starts, after the timer stops, or not at all if the compiler can prove that the memory already holds the value it needs. There can also be other irrelevant instructions placed in the section that is measured by the timer.
Operations on RefCell are inlined and can be optimized as a zero-cost abstraction, but this entirely depends on the context. In one context the lock may be optimized away, and in another one it will not, so you have to look at the actual context where you consider using a RefCell.
Here you can see that two instructions irrelevant to what you're trying to measure have been inserted in the section where the timer is running.
Here you can see that the assignment that locks the RefCell has been optimized away, but the instruction that checks that the cell is unlocked and the instruction that unlocks it have not.
(For info, u128 is what comes natively out of a std::time::Instant measurement.) That may not be the right pick for purely measuring the runtime... but the main idea was to compute the average runtime cost over millions of iterations.
Looking straight at the assembly code, we can now have a clear idea of what's going on under the hood and see that LLVM is still the final master in Rust over how code gets optimised,
and the most efficient calliper was in fact the asm instruction count, not an Instant timing evaluation.
By the way, how can we directly write asm instructions from Rust code?
(Exceeding the Rust compiler's policy is always such a thing... I know the machine often wins, but Rust can only be competitive with C++ with this kind of verbose fine tuning, and that's why I enjoy Rust code so much.)
Have a nice day/night all, and thanks for all your comments, that was instructive.
(Although it has been proposed that Rust shall do some optimizations before handing over to LLVM. We'll see in a few years.)
Look at the story behind Matt Godbolt's Compiler Explorer. The point was to show that you don't need a lower-level language than C++: you can write your C++ code and see that the assembly code that comes out of it is just as good as what you would write in a lower-level language. The same thing applies to Rust. The typical way to optimize a program is not to write assembly, but to look in Compiler Explorer at what assembly code comes out of the compiler, and then modify your C++ or Rust source code until the assembly looks good enough. You can find his talks on YouTube.