Double peak in criterion result testing BTreeMap

This is a result from criterion benchmark of the standard library BTreeMap.

Sget/Std/10

It isn’t important, but I am curious to know what might lead to this unusual distribution with two distinct peaks. It doesn’t occur for values of n other than 10. Have you seen similar double-peak results when using criterion?

The benchmark code ( I am comparing my “pstd” implementation against “std” ):

fn bench_sget(c: &mut Criterion) {
    //use rand::Rng;
    //let mut rng = rand::thread_rng();
    let mut group = c.benchmark_group("Sget");
    for n in [10, 20, 50, 100, 200, 500, 1000].iter() {
        let n = *n;
        let mut s = Vec::new();
        for i in 0..n {
            s.push(i.to_string());
        }
        group.bench_function(BenchmarkId::new("Exp", n), |b| {
            let mut map = pstd::collections::BTreeMap::new();
            for i in 0..n {
                map.insert(i.to_string(), i.to_string());
            }
            b.iter(|| {
                for i in 0..n {
                    /* let ri = rng.gen::<usize>() % n; */
                    assert!(map.get(&s[i]).unwrap() == &s[i]);
                }
            })
        });
        group.bench_function(BenchmarkId::new("Std", n), |b| {
            let mut map = std::collections::BTreeMap::new();
            for i in 0..n {
                map.insert(i.to_string(), i.to_string());
            }
            b.iter(|| {
                for i in 0..n {
                    /* let ri = rng.gen::<usize>() % n; */
                    assert!(map.get(&s[i]).unwrap() == &s[i]);
                }
            })
        });
    }
    group.finish();
}

This is the result for Sget/Exp/10 (about twice as fast as std, with no strange double peak ):

2 Likes

I cannot answer the question, but allow me the tongue-in-cheek quote of

4 Likes

I keep thinking of this:

The definition of insanity is doing the same thing over and over expecting a different outcome.”

If that is true, why does the time vary?! Why is the outcome different?

Maybe it is an artefact of the time measurement or something. I suppose that is the first question, is this “real”? Does it really take either about 300 ns or 400ns to look up 10 strings in a BTreeMap, depending on …. something. Does the processor sometimes “stall” for 100ns? Apparently not in general, because the effect is specific to this test. [What I am thinking about]

1 Like

Can you show output with harness = false. (without gnuplot)

What do I need to do? I currently have this in my toml file:

[[bench]]
name = "crit_bench"
harness = false

I don't know why too.
Sget/Std/10 time: [165.79 ns 166.02 ns 166.29 ns]
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
/// it should one peaks too,shouldn't it ?

In my case,the result only display excution time.
I have a question,what is density ?

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use std::time::Duration;

fn bench_sget(c: &mut Criterion) {
    //use rand::Rng;
    //let mut rng = rand::thread_rng();
    let mut group = c.benchmark_group("Sget");
    group.measurement_time
    for n in [10, 20, 50, 100, 200, 500, 1000].iter() {
        let n = *n;
        let mut s = Vec::new();
        for i in 0..n {
            s.push(i.to_string());
        }
        group.bench_function(BenchmarkId::new("Std", n), |b| {
            let mut map = std::collections::BTreeMap::new();
            for i in 0..n {
                map.insert(i.to_string(), i.to_string());
            }
            b.iter(|| {
                for i in 0..n {
                    /* let ri = rng.gen::<usize>() % n; */
                    assert!(map.get(&s[i]).unwrap() == &s[i]);
                }
            })
        });
    }
    group.finish();
}
criterion_group!(benches, bench_sget);
criterion_main!(benches);

I think if you look in the target/criterion/report folder you may find an html report with the results.


yeah, it is one peak too

Ah, ok, thanks. So that is a normal result. So it seems it is specific to my hardware maybe.

I wonder if periodically the processor has some kind of stall, gets stuck, goes slow, or whatever, and has to take a break to recover. I wouldn’t expect that, but that seems to be what criterion is saying.

Something like this maybe?

“Yes, Intel CPUs can and do stall their execution pipelines. A stall occurs when the pipeline stops fetching or executing instructions due to hazards like data dependencies, cache misses, or branch mispredictions. These stalls introduce "bubbles" (idle cycles) to maintain correct operation when data is unavailable or when previous instructions haven't finished.”

But with something this simple, would it really stall for so long? Could it get in an unstable state maybe where it keeps stalling. I am making wild guesses here, probably completely wrong!

Well I think maybe it could be some kind of “turbo” mode, where the processor sprints, but then gets hot and bothered and has to slow down and recover. That’s my best idea so far.

1 Like

just for curious. can you show a result of Sget/Exp/1000

Sget/Exp/1000

Sget/Std/1000

1 Like

your get is faster std !!! wow

Oh yes, most functions are faster than std, sometimes 4 times faster, for example clone. Insert is nearly 3 times faster.

The thing that slower is iterating over small maps, there is a compiler fix outstanding that may help to some extent with that.