Hi, I'm working on a trading application in Rust and, while building some components, I've hit a few points that I don't understand despite a few days of trying. I'm not (yet, I hope) a confirmed Rustacean, so sorry if these questions seem trivial.
The first component I have is a buffer based on a statically sized array, using a const generic.
Given the size required (1,000,000 items), I need to allocate it on the heap, so here is the code of this buffer:
use std::cell::Cell;
use std::fmt::Debug;

use crate::interfaces::StandardBuffer;

pub struct SimpleRingBuffer<T, const S: usize>
where
    T: Sized + Clone + Default + Debug,
{
    buffer: Box<[Option<T>; S]>,
    index: Cell<u64>,
}

impl<T, const S: usize> SimpleRingBuffer<T, S>
where
    T: Sized + Clone + Default + Debug,
{
    // Build the array on the heap: a Vec of length S is boxed as a slice,
    // then the fat pointer is reinterpreted as a pointer to [Option<T>; S].
    // Safety: the slice has exactly S elements by construction.
    fn vec_to_boxed_array() -> Box<[Option<T>; S]> {
        let boxed_slice = vec![Some(T::default()); S].into_boxed_slice();
        let ptr = Box::into_raw(boxed_slice) as *mut [Option<T>; S];
        unsafe { Box::from_raw(ptr) }
    }
}
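As an aside, the same conversion can be written without unsafe: the standard library provides a fallible conversion from a boxed slice to a boxed array that checks the length. A minimal standalone sketch (my rewrite, not the code I benchmarked):

```rust
// Safe alternative to the raw-pointer cast: std's TryFrom<Box<[T]>> for
// Box<[T; N]> verifies the length instead of assuming it.
// (TryInto is in the Rust 2021 prelude; older editions need
// `use std::convert::TryInto;`.)
fn vec_to_boxed_array<T: Clone + Default, const S: usize>() -> Box<[Option<T>; S]> {
    let boxed: Box<[Option<T>]> = vec![Some(T::default()); S].into_boxed_slice();
    match boxed.try_into() {
        Ok(arr) => arr,
        // The vec was built with exactly S elements, so this cannot happen.
        Err(_) => unreachable!("length is S by construction"),
    }
}

fn main() {
    let arr: Box<[Option<u32>; 8]> = vec_to_boxed_array::<u32, 8>();
    assert_eq!(arr.len(), 8);
    assert_eq!(arr[0], Some(0));
}
```

Both versions allocate the array directly on the heap without building it on the stack first; the safe one just moves the length check into std.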
impl<T, const S: usize> StandardBuffer<u64, T, S> for SimpleRingBuffer<T, S>
where
    T: Sized + Clone + Default + Debug,
{
    fn new() -> impl StandardBuffer<u64, T, S> {
        SimpleRingBuffer {
            index: Cell::new(0),
            buffer: Self::vec_to_boxed_array(),
        }
    }

    fn insert_at(&mut self, elt: Option<T>, idx: u64) {
        // wrap at S; the original `% (S - 1)` would never use the last slot
        self.buffer[idx as usize % S] = elt;
    }

    fn get(&self, idx: u64) -> Option<T> {
        self.buffer[idx as usize % S].clone()
    }

    fn get_mut(&mut self, idx: u64) -> Option<&mut Option<T>> {
        Some(&mut self.buffer[idx as usize % S])
    }

    fn delete(&mut self, idx: u64) {
        self.buffer[idx as usize % S] = None;
    }

    fn display_buffer(&self) {
        for (i, elt) in self.buffer.iter().enumerate() {
            println!("The value of the element at index {} is {:?}", i, elt);
        }
    }

    fn get_next_free_index(&self) -> u64 {
        // returns the previous value, then advances the counter
        self.index.replace(self.index.get() + 1)
    }
}
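To be explicit about what get_next_free_index yields: Cell::replace returns the *previous* value, so the method behaves like a post-increment, handing out 0, 1, 2, … A tiny standalone check (my illustration, same pattern as the method above):

```rust
use std::cell::Cell;

fn main() {
    let index = Cell::new(0u64);
    // Cell::replace stores the new value and returns the old one,
    // so each call yields the current counter and advances it by one.
    let a = index.replace(index.get() + 1);
    let b = index.replace(index.get() + 1);
    assert_eq!((a, b), (0, 1));
    assert_eq!(index.get(), 2);
}
```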
So, I'm going to run some micro-benchmarks on it.
Environment: Intel i7-7700HQ, 2.8 GHz, 4 physical cores, 8 logical processors, 8 GB memory, Windows 10
Cargo.toml:
[dependencies]
nohash = "0.2.0"
crossbeam-channel = "0.5.13"
mimalloc = "0.1"

[profile.release]
codegen-units = 1
lto = "fat"
panic = "abort"
opt-level = "s"
First bench: bench1
use std::time::Instant;

fn test_buffer_array_int() {
    println!("The test of ring buffer: insert u64");
    let mut hm = SimpleRingBuffer::<u64, 1_000_000>::new();
    let base_index = OrderIdProvider::new().get_range_id();
    for _ in 0..10 {
        let t0 = Instant::now();
        for i in 0..1_000_000 {
            let id = OrderIdProvider::get_full_order_id(base_index, hm.get_next_free_index());
            hm.insert_at(Some(i), id);
        }
        let elapsed = t0.elapsed().as_secs_f64();
        println!("insert 1000 000 u64 in the buffer.Time elapsed: {:.3}", elapsed);
    }
}
Note that the static method get_full_order_id takes a totally negligible amount of time.
Here is the code:
use rand::Rng;
use rand::rngs::ThreadRng;

static MASK_L: u64 = 0b0000000000000000000000000000000011111111111111111111111111111111;
static MASK_H: u64 = 0b1111111111111111111111111111111100000000000000000000000000000000;

pub struct OrderIdProvider {
    rand_gen: ThreadRng,
}

impl OrderIdProvider {
    pub fn get_base_id(full_id: u64) -> u64 {
        (full_id & MASK_H) >> 32
    }

    pub fn get_order_index(full_id: u64) -> u64 {
        full_id & MASK_L
    }

    pub fn get_full_order_id(id_base: u64, order_index: u64) -> u64 {
        let id_shift: u64 = id_base << 32;
        id_shift + order_index
    }

    pub fn get_range_id(&mut self) -> u64 {
        self.rand_gen.gen_range(10u64..=(u64::MAX) >> 32)
    }

    pub fn new() -> OrderIdProvider {
        OrderIdProvider {
            rand_gen: rand::thread_rng(),
        }
    }
}
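As a quick sanity check of the 32/32-bit packing scheme, here is a standalone round trip (my illustration; it inlines simplified copies of the methods above, and note it assumes order_index stays below 2^32, otherwise it would bleed into the base id):

```rust
// Round-trip check of the id layout: high 32 bits = base id,
// low 32 bits = order index.
const MASK_L: u64 = 0x0000_0000_FFFF_FFFF;

fn get_full_order_id(id_base: u64, order_index: u64) -> u64 {
    (id_base << 32) + order_index
}

fn get_base_id(full_id: u64) -> u64 {
    full_id >> 32
}

fn get_order_index(full_id: u64) -> u64 {
    full_id & MASK_L
}

fn main() {
    let full = get_full_order_id(301_404_768, 123_456);
    assert_eq!(get_base_id(full), 301_404_768);
    assert_eq!(get_order_index(full), 123_456);
}
```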
results bench1:
The test of ring buffer: insert u64
insert 1000 000 u64 in the buffer.Time elapsed: 0.004
insert 1000 000 u64 in the buffer.Time elapsed: 0.003
insert 1000 000 u64 in the buffer.Time elapsed: 0.003
insert 1000 000 u64 in the buffer.Time elapsed: 0.002
insert 1000 000 u64 in the buffer.Time elapsed: 0.002
insert 1000 000 u64 in the buffer.Time elapsed: 0.002
insert 1000 000 u64 in the buffer.Time elapsed: 0.005
insert 1000 000 u64 in the buffer.Time elapsed: 0.004
insert 1000 000 u64 in the buffer.Time elapsed: 0.003
insert 1000 000 u64 in the buffer.Time elapsed: 0.003
So, as expected, it is really fast.
Second test, using a simple struct: bench2
fn test_perf_buffer_insert_point() {
    let mut buffer = SimpleRingBuffer::<Point, 1_000_000>::new();
    // run insertion
    println!("The test of ring buffer with struct like point: insertion");
    let base_index = OrderIdProvider::new().get_range_id();
    println!("base index {}:", base_index);
    for _ in 0..10 {
        let t0 = Instant::now();
        for k in 0..1_000_000 {
            let id = OrderIdProvider::get_full_order_id(base_index, buffer.get_next_free_index());
            buffer.insert_at(Some(Point { x: 45, y: 78 + k, id }), id);
        }
        let elapsed = t0.elapsed().as_secs_f64();
        println!("Time elapsed: {:.3} sec", elapsed);
    }
}
results bench2:
The test of ring buffer with struct like point: insertion
base index 301404768:
Time elapsed: 0.006 sec
Time elapsed: 0.005 sec
Time elapsed: 0.005 sec
Time elapsed: 0.005 sec
Time elapsed: 0.005 sec
Time elapsed: 0.005 sec
Time elapsed: 0.006 sec
Time elapsed: 0.005 sec
Time elapsed: 0.005 sec
Time elapsed: 0.005 sec
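(For reference, I didn't paste the Point struct; a hypothetical definition inferred from the fields used in the bench, x, y, id, and the 16-byte size I mention below, would be:)

```rust
use std::mem::size_of;

// Hypothetical Point: two i32 coordinates plus a u64 id gives
// 4 + 4 + 8 = 16 bytes with no padding in this field order.
#[derive(Clone, Default, Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
    pub id: u64,
}

fn main() {
    assert_eq!(size_of::<Point>(), 16);
}
```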
Well, up to now the results are consistent: we would expect that moving the struct (size: 16 bytes) from the stack to the heap (correct me if I'm wrong, but as far as I know, structs are value types first allocated on the stack) takes a bit more time than moving a u64.
So the time (as CPU execution) seems to behave approximately like O(ax), where x is the size of the element and a is some constant.
NB: I've tried creating the Point instances on the heap using Box, but, and I don't really understand why, the performance is significantly worse.
So now I'm trying with a more complex struct.
Here is the struct:
pub struct Order<'a, T = OrderType>
where
    T: Debug + Default + Copy + PartialEq,
{
    pub order_data: Option<OrderData<'a>>,
    pub order_variant: Option<T>,
    pub order_status: OrderStatus,
    pub order_id: u64,
    pub basket_id: Option<u32>,
}
The size can vary because OrderData contains several string-slice fields, but in this run I use default values.
Third test, using Order: bench3
fn test_perf_buffer_insert_order() {
    let mut buffer = SimpleRingBuffer::<Order<_>, 1_000_000>::new();
    // run insertion
    println!("The test of ring buffer with struct like order simple: insertion");
    let base_index = OrderIdProvider::new().get_range_id();
    println!("base index {}:", base_index);
    for _ in 0..10 {
        let t0 = Instant::now();
        for k in 0..1_000_000 {
            let id = OrderIdProvider::get_full_order_id(base_index, buffer.get_next_free_index());
            let order = Order::new_on_draft(
                OrderType::Limit(LimitOrderData {
                    price: 45.2,
                    quantity: 458 + k,
                    tif: TimeInForce::Day,
                    side: OrderSide::Buy,
                }),
                &OrderData::default(),
            );
            buffer.insert_at(Some(order), id);
        }
        let elapsed = t0.elapsed().as_secs_f64();
        println!("Time elapsed: {:.3} sec", elapsed);
    }
}
results of bench3:
The test of ring buffer with struct like order simple: insertion
base index 1838332593:
Time elapsed: 0.040 sec
Time elapsed: 0.043 sec
Time elapsed: 0.032 sec
Time elapsed: 0.034 sec
Time elapsed: 0.034 sec
Time elapsed: 0.033 sec
Time elapsed: 0.034 sec
Time elapsed: 0.033 sec
Time elapsed: 0.032 sec
Time elapsed: 0.034 sec
OK... At this point I'm not comfortable with that: why is it so expensive to move a struct into the corresponding heap cell of the array? I would think it should be something like memcpy(dst, src, sizeof(x)). Is there some reason that explains the cost? Is there a way to improve it?
Creating the object on the stack takes almost nothing...
Hmm, but now (sorry for the verbose code above) I've just experimented with the following:
fn test_perf_buffer_insert_order() {
    let mut buffer = SimpleRingBuffer::<Order<_>, 1_000_000>::new();
    // run insertion
    println!("The test of ring buffer with struct like order simple: insertion");
    let base_index = OrderIdProvider::new().get_range_id();
    println!("base index {}:", base_index);
    for _ in 0..10 {
        let t0 = Instant::now();
        for k in 0..1_000_000 {
            // let id = OrderIdProvider::get_full_order_id(base_index, buffer.get_next_free_index());
            buffer.insert_at(
                Some(Order::new_on_draft(
                    OrderType::Limit(LimitOrderData { price: 45.2, quantity: 458 + k, tif: TimeInForce::Day, side: OrderSide::Buy }),
                    &OrderData::default(),
                )),
                4578125899,
            );
            buffer.insert_at(
                Some(Order::new_on_draft(
                    OrderType::Limit(LimitOrderData { price: 45.2, quantity: 458 + k, tif: TimeInForce::Day, side: OrderSide::Buy }),
                    &OrderData::default(),
                )),
                12305,
            );
            buffer.insert_at(
                Some(Order::new_on_draft(
                    OrderType::Limit(LimitOrderData { price: 45.2, quantity: 458 + k, tif: TimeInForce::Day, side: OrderSide::Buy }),
                    &OrderData::default(),
                )),
                100045789555,
            );
            buffer.insert_at(
                Some(Order::new_on_draft(
                    OrderType::Limit(LimitOrderData { price: 45.2, quantity: 458 + k, tif: TimeInForce::Day, side: OrderSide::Buy }),
                    &OrderData::default(),
                )),
                7840012543,
            );
        }
        let elapsed = t0.elapsed().as_secs_f64();
        println!("Time elapsed: {:.3} sec", elapsed);
    }
}
And the results here:
The test of ring buffer with struct like order simple: insertion
base index 4013720174:
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
Time elapsed: 0.003 sec
So now it is really fast, despite the fact that we are moving the same structs to the heap; why doesn't this take time?
Moreover, why does using a variable index for insertion take ~10x longer than using a constant index? (We can replace id with the loop variable k; the results are the same.)
I believe access to array elements uses pointer arithmetic, so it is just constant time: address[0] + offset, isn't it?
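One explanation I'd like to rule out (my guess, not something I've confirmed): with constant indices the optimizer can prove the target slot and collapse the repeated stores, so the loop may no longer measure one write per element. The usual guard is std::hint::black_box (stable since Rust 1.66); a minimal sketch of how I'd apply it:

```rust
use std::hint::black_box;
use std::time::Instant;

// black_box hides the index and the value from the optimizer, so it cannot
// prove the index constant or drop all but the last store.
fn fill(buf: &mut [u64]) {
    for i in 0..buf.len() as u64 {
        let idx = black_box(i as usize) % buf.len();
        buf[idx] = black_box(i);
    }
}

fn main() {
    let mut buf = vec![0u64; 1_000_000].into_boxed_slice();
    let t0 = Instant::now();
    fill(&mut buf);
    println!("elapsed: {:.3} sec", t0.elapsed().as_secs_f64());
    // Consume the result so the whole loop cannot be eliminated.
    assert_eq!(black_box(&buf)[42], 42);
}
```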
Does anyone have ideas about this?
Thanks for your time and help!