Problem with criterion benchmarks

Fi3 · March 31, 2020, 4:44pm

Hi all,

I'm pretty new to Rust and my experience with benchmarking is zero.

I'm working on a library that lets the user define assets (USD, Bitcoin, etc.) and do basic math operations between assets of the same type in a safe fashion. The library is inspired by this article.

I wanted to use the fixed crate, but I need to express the fractional bit count via a const (it is more user friendly), and fixed does not support that yet, so I implemented a dummy fixed-point logic internally.

I was ready to release a first experimental 0.0.1 version when I added some benchmarks that completely demoralized me. I was expecting adding Assets to have performance pretty close to checked_add.

What I get is:
Checked add i64 -> ~1.3000 ns
Add [u8; 8] using the function below (the same one used in the library) -> ~2.5000 ns, and that is OK
Add Fixed<[u8; 8]> (it is just a wrapper) -> ~18 ns

The function used to add the arrays is:

pub fn add_checked_buf<'a>(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z: i64;
    unsafe {
        let le: [u8; 8] = std::mem::transmute_copy(&le);
        z = <i64>::from_le_bytes(le);
    }
    let y: i64;
    unsafe {
        let ri: [u8; 8] = std::mem::transmute_copy(&ri);
        y = <i64>::from_le_bytes(ri);
    }
    let sum = z.checked_add(y)?.to_le_bytes();
    let sum_: [u8; 8];
    unsafe { sum_ = std::mem::transmute_copy(&sum) }
    Some(sum_)
}

Here are the benchmarks
Here is the function that adds the arrays
Here is where the very slow Fixed is defined
Fixed implements HasFixedOps, which calls the above function.

I'm stuck here since, as I said, my experience with benchmarking is zero. Can anyone help me figure out what the next step could be to improve the performance of adding Fixed values?

The library needs to be compiled with a rustc version >= 2020-03-23

Ty

OptimisticPeach · March 31, 2020, 4:52pm

I'd just like to note that the following function does exactly what your add_checked_buf does without all the unnecessary unsafe:

pub fn add_checked_buf(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z: i64 = <i64>::from_le_bytes(le);
    let y: i64 = <i64>::from_le_bytes(ri);
    let sum = z.checked_add(y)?.to_le_bytes();
    Some(sum)
}

Playground.
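As a quick sanity check of the safe version above, here is a minimal sketch that exercises it on an ordinary sum and on an overflowing one. The helper `add_as_i64` and the test values are illustrative assumptions, not part of the library.

```rust
/// Safe version from the post above: decode, checked-add, re-encode.
pub fn add_checked_buf(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z = i64::from_le_bytes(le);
    let y = i64::from_le_bytes(ri);
    Some(z.checked_add(y)?.to_le_bytes())
}

/// Hypothetical convenience wrapper: encodes two i64 values as
/// little-endian bytes, adds them through the buffer function,
/// and decodes the result again.
pub fn add_as_i64(a: i64, b: i64) -> Option<i64> {
    add_checked_buf(a.to_le_bytes(), b.to_le_bytes()).map(i64::from_le_bytes)
}
```

With these definitions, `add_as_i64(40, 2)` gives `Some(42)` and `add_as_i64(i64::MAX, 1)` gives `None`, which matches what `checked_add` does on plain integers.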

Fi3 · March 31, 2020, 4:57pm

Ty, I tried something similar to your implementation, but sadly I cannot use it because the array length is not fixed: merx/numeric_methods.rs at master · Fi3/merx · GitHub

OptimisticPeach · March 31, 2020, 5:00pm

Yeah, okay, even then, in the context of your macro, you can reduce it to the following:

#[inline]
pub fn $fn_name<const LEN: usize>(le: [u8; LEN], ri: [u8; LEN]) -> Option<[u8; LEN]> {
    let z = <$int_type>::from_le_bytes(le);
    let y = <$int_type>::from_le_bytes(ri);
    let sum = z.checked_add(y)?.to_le_bytes();
    Some(sum)
}

(And no matter what, you don't need the extra lifetime parameter)

Fi3 · March 31, 2020, 5:04pm

The compiler still complains:

   = note: expected array `[u8; 16]`
              found array `[u8; _]`

I tried hard not to use unsafe, but it is the only implementation that I can get to work.

OptimisticPeach · March 31, 2020, 5:11pm

Ah, wait, I see that now, but now I must ask why $fn_name is generic over LEN, if its internal implementation requires you to use a buffer of a certain size? After all, a u64 will always require a [u8; 8] since it is the nature of a u64 that it is 8 bytes wide.

If you're doing match statements like so: merx/fixed.rs at master · Fi3/merx · GitHub you should probably be implementing the trait multiple times. You can do this: Playground.

Also, please note that const generics aren't done and stable yet, so your mileage may vary.
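The "implement the trait multiple times" suggestion could look roughly like the sketch below. The trait name `CheckedAddBuf` and the macro `impl_checked_add_buf` are hypothetical stand-ins, not merx's actual API; the point is only that each impl targets a concrete array width, so `from_le_bytes` type-checks without `unsafe` and without a const generic parameter.

```rust
/// Hypothetical stand-in for the library's trait; the name and method
/// are illustrative assumptions.
pub trait CheckedAddBuf: Sized {
    fn add_checked_buf(self, other: Self) -> Option<Self>;
}

// One impl per concrete width: inside each expansion the array length
// is a known constant that matches the integer type's byte size.
macro_rules! impl_checked_add_buf {
    ($len:expr, $int_type:ty) => {
        impl CheckedAddBuf for [u8; $len] {
            #[inline]
            fn add_checked_buf(self, other: Self) -> Option<Self> {
                let z = <$int_type>::from_le_bytes(self);
                let y = <$int_type>::from_le_bytes(other);
                Some(z.checked_add(y)?.to_le_bytes())
            }
        }
    };
}

impl_checked_add_buf!(4, i32);
impl_checked_add_buf!(8, i64);
impl_checked_add_buf!(16, i128);
```

Whether this fits the rest of the crate depends on how the wrapper types pick their buffer length, so treat it as a starting point rather than a drop-in replacement.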

Fi3 · March 31, 2020, 5:20pm

$fn_name is generic over LEN because it should work for i32, i64, and i128. Regarding merx/fixed.rs at master · Fi3/merx · GitHub, you are right: if I implement it multiple times, for 4, 8, and 16, maybe I can get rid of the unsafe in $fn_name and not be generic over LEN. I'll try that now.

Fi3 · March 31, 2020, 5:49pm

Mmm, I don't think that can work. I need HasFixedOps to be generic over LEN because it is a supertrait of HasBuf, which is generic over LEN. And I need HasBuf to be generic over LEN because the crate exports the macro new_asset, which creates a wrapper around an array of LEN 4, 8, or 16 and chooses the LEN at compile time based on the value passed by the user: merx/asset.rs at master · Fi3/merx · GitHub. So I think being generic over LEN is the only way to go. But I'm not very confident in that, because I'm new to Rust.

Regarding the fact that const generics are not stable: I'm aware of that. Your question made me realize that I didn't give you enough context about the project and what I'm looking for.

Some more context about this project and what I'm looking for:
I'm writing this library mainly to gain confidence with Rust (so I welcome any advice, including advice not related to the performance issue). I would also like to create something useful, and the library is completely useless if the performance is this bad (I could just use something with infinite precision). That is why my main concern is the benchmark, but I really welcome any kind of advice.

Fi3 · April 1, 2020, 2:27pm

OK, if I compile criterion with the real_blackbox feature flag, I get numbers that make much more sense.

Without the flag I had:

Checked add i64 -> ~1.3000 ns
Add [u8; 8] using add_checked_buf -> ~2.5000 ns
Add Fixed<[u8; 8]> -> ~18 ns

Now I have:
Checked add i64 -> ~1.3000 ns
Add [u8; 8] using add_checked_buf -> ~1.4000 ns
Add Fixed<[u8; 8]> -> ~4.1 ns

This makes much more sense to me, because the assembly generated by add_checked_buf is very similar to the one generated for a checked add of two ints: Compiler Explorer
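For context, a benchmark harness needs some barrier that stops the optimizer from folding away the code under test, and at nanosecond scale the cost of that barrier can show up in the numbers. The sketch below illustrates the idea with only the standard library, using `std::hint::black_box` (available on recent stable Rust, not on the 2020 toolchain used here). It is not criterion's implementation, and `time_add` is a hypothetical helper.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Safe variant of the function under test.
pub fn add_checked_buf(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z = i64::from_le_bytes(le);
    let y = i64::from_le_bytes(ri);
    Some(z.checked_add(y)?.to_le_bytes())
}

/// Hypothetical helper: times `iters` calls of `add_checked_buf`.
pub fn time_add(iters: u32) -> Duration {
    let a = 1_000i64.to_le_bytes();
    let b = 2_000i64.to_le_bytes();
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the inputs discourages constant folding;
        // black_box on the result discourages removing the call.
        black_box(add_checked_buf(black_box(a), black_box(b)));
    }
    start.elapsed()
}
```

Dividing the returned `Duration` by `iters` gives a rough per-call figure, but a loop like this lacks the warm-up and statistics that criterion provides, so it is only useful as a cross-check.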

I'm still surprised by the roughly 3x time needed to add two Fixed values (~4.1 ns versus ~1.4 ns), because Fixed is just a wrapper around a [u8; 8] that calls add_checked_buf.

github.com

Fi3/merx/blob/master/src/fixed.rs#L234

int_try_from!(I32_LEN, i32);
int_try_from!(I64_LEN, i64);

macro_rules! int_from {
    ($len:ident, $ty_:ty) => {
        impl From<$ty_> for ByteArray<$len> {
            fn from(value: $ty_) -> ByteArray<$len> {
                let inner = <[u8; $len]>::try_from(&value.to_le_bytes()[0..$len]).unwrap();
                ByteArray(inner)
            }
        }
    };
}
int_from!(I32_LEN, i32);
int_from!(I64_LEN, i64);
int_from!(I128_LEN, i128);

macro_rules! int_into {
    ($len:ident, $ty_:ty) => {
        impl Into<$ty_> for ByteArray<$len> {

OptimisticPeach · April 1, 2020, 3:22pm

I'm not completely sure as to why it's that much slower.

Perhaps the compiler is having a hard time optimizing?

Fi3 · April 3, 2020, 3:14pm

Here it is:

github.com

Fi3/merx/blob/c4d2c4da77b0a74b26f20ce7255b4e8585a655ab/benches/benchmark.rs#L16

#![allow(incomplete_features)]
#![allow(unused_imports)]
#![allow(dead_code)]

#[macro_use]
extern crate merx;
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use merx::{Asset, Debt, Credit, asset::CheckedOps};

get_traits!();
new_asset!(bench_asset_64, 10, 214748364700);
type BenchAsset64 = Asset<bench_asset_64::Value>;
type BenchAsset64Inner = bench_asset_64::Fixed_;

fn add_checked64(a: i64, b: i64) -> Option<i64> {
    a.checked_add(b)
}

pub fn add_checked_buf<'a>(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z: i64;
    unsafe {

It was just a careless mistake: 214748364700 * 10^10 is bigger than i64::max_value(), so I was comparing the performance of adding i64 values with that of adding i128 values.
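The arithmetic behind this can be checked directly. The sketch below assumes, as the post describes, that the maximum value is multiplied by 10 raised to the number of decimal digits; the function names `scaled_i64` and `scaled_i128` are illustrative, not taken from the library.

```rust
/// Scales `value` by 10^`decimals` in i64; returns None on overflow.
pub fn scaled_i64(value: i64, decimals: u32) -> Option<i64> {
    10i64.checked_pow(decimals)?.checked_mul(value)
}

/// Same computation in i128, which has room for much larger products.
pub fn scaled_i128(value: i128, decimals: u32) -> Option<i128> {
    10i128.checked_pow(decimals)?.checked_mul(value)
}
```

Here `scaled_i64(214_748_364_700, 10)` returns `None`, since the product is about 2.1 * 10^21 while i64 tops out near 9.2 * 10^18, whereas `scaled_i128(214_748_364_700, 10)` returns `Some(2_147_483_647_000_000_000_000)`. A value of that size needs a 16-byte buffer, which is consistent with the benchmark having measured i128 additions.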

system · July 2, 2020, 3:14pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
