Problem with criterion benchmarks

Hi all,

I'm pretty new to rust and my experience with benchmarking is 0.

I'm working at a library, that let the user define assets, like usd, bitcoin ecc ecc and do basic math operation between assets of the same type in a safe fashion, the library is inspired by this article.

I wanted to use the fixed crate but I need to express the fractional bit count via a const (is more user friendly), and fixed do not support that yet, so I implemented a dummy fixed logic internally.

I was ready to release a first experimental 0.0.1 version, when I added some benchmark that completely demoralized me. I was expecting adding Assets to have performance pretty close to checked_add.

What I get is:
Checked add i64 -> ~1.3000 ns
Add [u8; 8] using the below function (the same used in the library) -> ~2.5000 ns and that is ok
Add Fixed<[u8; 8]> (is just a wrapper) -> ~18 ns :astonished:

The function used to add the arrays is the below:

pub fn add_checked_buf<'a>(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z: i64;
    unsafe {
        let le: [u8; 8] = std::mem::transmute_copy(&le);
        z = <i64>::from_le_bytes(le);
    }
    let y: i64;
    unsafe {
        let ri: [u8; 8] = std::mem::transmute_copy(&ri);
        y = <i64>::from_le_bytes(ri);
    }
    let sum = z.checked_add(y)?.to_le_bytes();
    let sum_: [u8; 8];
    unsafe { sum_ = std::mem::transmute_copy(&sum) }
    Some(sum_)
}

Here the benchmarks
Here the function that add the arrays
Here where the very slow Fixed is defined
Fixed implement HasFixedOps that call the above function.

Here I'm stuck as I said before my experience with benchmarking is zero. Anyone can help me figuring out what could be the next step in order to increase the performances of adding Fixed?

The library need to be compiled with a version of rustc >= 2020-03-23

Ty :slight_smile:

I'd just like to note that the following function does exactly what your add_checked_buf does without all the unnecessary unsafe:

pub fn add_checked_buf(le: [u8; 8], ri: [u8; 8]) -> Option<[u8; 8]> {
    let z: i64 = <i64>::from_le_bytes(le);
    let y: i64 = <i64>::from_le_bytes(ri);
    let sum = z.checked_add(y)?.to_le_bytes();
    Some(sum)
}

Playground.

Ty, I tried something similiar to your implementation but sadly I can not use it because the array length is not fixed: merx/numeric_methods.rs at master · Fi3/merx · GitHub

Yeah, okay, even then, in the context of your macro, you can reduce it to the following:

#[inline]
pub fn $fn_name<const LEN: usize>(le: [u8; LEN], ri: [u8; LEN]) -> Option<[u8; LEN]> {
    let z = <$int_type>::from_le_bytes(le);
    let y = <$int_type>::from_le_bytes(ri);
    let sum = z.checked_add(y)?.to_le_bytes();
    Some(sum)
}

(And no matter what, you don't need the extra lifetime parameter)

The compiler still complain :slight_smile:

   = note: expected array `[u8; 16]`                                
              found array `[u8; _]

I tried hard to not use unsafe but is the only impl that I can get to work :frowning:

Ah, wait, I see that now, but now I must ask why $fn_name is generic over LEN, if its internal implementation requires you to use a buffer of a certain size? After all, a u64 will always require a [u8; 8] since it is the nature of a u64 that it is 8 bytes wide.

If you're doing match statements like so: merx/fixed.rs at master · Fi3/merx · GitHub you should probably be implementing the trait multiple times. You can do this: Playground.

Also, please note that const generics aren't done and stable yet, so your mileage may vary.

$fn_name is generic over LEN because it should work for i32 i64 i128. Regard merx/fixed.rs at master · Fi3/merx · GitHub you are right maybe if I implement it multiple times for 4 8 16 maybe I can get rid of the unsafe in the $fn_name and not be generic over LEN now I try.

Mmm I don't think that it can work because I need HasFixedOps to be generic over LEN because is a superset of HasBuf that is generic over LEN and I need HasBuf to be generic over LEN because the crate export the macro new_asset that create a wrapper around an array of LEN 4 8 or 16 and dynamically but at compile time chose the LEN based on the value passed by the user merx/asset.rs at master · Fi3/merx · GitHub. For that I think that be generic over LEN is the only way to go. But actually I'm not very confindent in that because I'm new to rust.

Regard the fact that const generics are not stable I'm aware of that, your question let me realize that I didn't give you enough context about the project and what I'm looking for.

Some more context about this project and what I'm looking for:
I'm doing this library mainly to get confidence with rust (so I welcome any advise also not related to the performance issue). Btw I would like also to create something that can be useful and actually the library is completely useless if the performance are so bad (I just could use something with infinite precision), is for that that my main concern is the benchmark but I really welcome any kind of advises about.

Ok, if I compile criterion with the real_blackbox flag I get numbers that make much more sense.

Without the flag I had:

Checked add i64 -> ~1.3000 ns
Add [u8; 8] using add_checked_buf -> ~2.5000 ns
Add Fixed<[u8; 8]> -> ~18 ns

Now I have:
Checked add i64 -> ~1.3000 ns
Add [u8; 8] using add_checked_buf -> ~1.4000 ns
Add Fixed<[u8; 8]> -> ~4.1 ns

This make much more sense to me because the assembly generated by add_checked_buf is very similar at the one of checking add two int: Compiler Explorer

I'm still surprised by the 4x time needed in order add 2 Fixed. Because Fixed is just a wrapper around an [u8;8] that call add_checked_buf

I'm not completely sure as to why it's that much slower.

Perhaps the compiler is having a hard time optimizing?

Here it is:

Just a distraction error 214748364700 * 10^10 is bigger than i64::max_value() I was comapring the performance of adding i64 with the ones of adding i128 values :woman_facepalming:

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.