Benchmark result for "Local" allocation

After all the effort of getting “Local” allocation to work, I wanted to know whether it was actually faster, so I ran a simple benchmark using divan:

#[divan::bench(args = [1, 5, 10, 100, 1000, 10000])]
fn allocate_stdbox(bencher: divan::Bencher, size: usize) {
    // Note: `size` only feeds the throughput counter; every run performs 200 allocations.
    bencher.counter(size).bench_local(|| {
        let mut v = Vec::new();
        for _i in 0..200 {
            v.push(Box::new(99));
        }
    })
}

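#[divan::bench(args = [1, 5, 10, 100, 1000, 10000])]
fn allocate_lbox(bencher: divan::Bencher, size: usize) {
    use rustdb::alloc::{lbox, lvec, Local};

    // Switch the thread-local allocator to bump allocation before measuring.
    Local::enable_bump();
    // Note: `size` only feeds the throughput counter; every run performs 200 allocations.
    bencher.counter(size).bench_local(|| {
        let mut v = lvec();
        for _i in 0..200 {
            v.push(lbox(99));
        }
    })
}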

fn main() {
    // Run registered benchmarks.
    divan::main();
}

Results:

Timer precision: 50 ns
example             fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ allocate_lbox                  │               │               │               │         │
│  ├─ 1             3.089 µs      │ 9.034 µs      │ 3.119 µs      │ 3.31 µs       │ 100     │ 100
│  │                323.7 Kitem/s │ 110.6 Kitem/s │ 320.5 Kitem/s │ 302 Kitem/s   │         │
│  ├─ 5             3.086 µs      │ 6.233 µs      │ 3.396 µs      │ 3.383 µs      │ 100     │ 200
│  │                1.62 Mitem/s  │ 802.1 Kitem/s │ 1.471 Mitem/s │ 1.477 Mitem/s │         │
│  ├─ 10            3.31 µs       │ 9.734 µs      │ 3.426 µs      │ 3.533 µs      │ 100     │ 200
│  │                3.02 Mitem/s  │ 1.027 Mitem/s │ 2.918 Mitem/s │ 2.829 Mitem/s │         │
│  ├─ 100           3.388 µs      │ 7.259 µs      │ 3.509 µs      │ 3.573 µs      │ 100     │ 200
│  │                29.5 Mitem/s  │ 13.77 Mitem/s │ 28.49 Mitem/s │ 27.98 Mitem/s │         │
│  ├─ 1000          3.38 µs       │ 9.669 µs      │ 3.512 µs      │ 3.576 µs      │ 100     │ 200
│  │                295.7 Mitem/s │ 103.4 Mitem/s │ 284.6 Mitem/s │ 279.6 Mitem/s │         │
│  ╰─ 10000         3.393 µs      │ 5.566 µs      │ 3.628 µs      │ 3.608 µs      │ 100     │ 200
│                   2.946 Gitem/s │ 1.796 Gitem/s │ 2.756 Gitem/s │ 2.771 Gitem/s │         │
╰─ allocate_stdbox                │               │               │               │         │
   ├─ 1             3.961 µs      │ 32.57 µs      │ 3.984 µs      │ 5.503 µs      │ 100     │ 100
   │                252.4 Kitem/s │ 30.7 Kitem/s  │ 250.9 Kitem/s │ 181.7 Kitem/s │         │
   ├─ 5             4.088 µs      │ 11.04 µs      │ 4.301 µs      │ 5.936 µs      │ 100     │ 200
   │                1.223 Mitem/s │ 452.4 Kitem/s │ 1.162 Mitem/s │ 842.3 Kitem/s │         │
   ├─ 10            4.092 µs      │ 15.92 µs      │ 7.909 µs      │ 6.443 µs      │ 100     │ 100
   │                2.443 Mitem/s │ 628.1 Kitem/s │ 1.264 Mitem/s │ 1.551 Mitem/s │         │
   ├─ 100           4.216 µs      │ 23.12 µs      │ 7.323 µs      │ 6.46 µs       │ 100     │ 100
   │                23.71 Mitem/s │ 4.323 Mitem/s │ 13.65 Mitem/s │ 15.47 Mitem/s │         │
   ├─ 1000          4.214 µs      │ 9.987 µs      │ 7.262 µs      │ 6.241 µs      │ 100     │ 100
   │                237.2 Mitem/s │ 100.1 Mitem/s │ 137.6 Mitem/s │ 160.2 Mitem/s │         │
   ╰─ 10000         4.212 µs      │ 10.73 µs      │ 8.05 µs       │ 6.431 µs      │ 100     │ 100
                    2.374 Gitem/s │ 931.5 Mitem/s │ 1.242 Gitem/s │ 1.554 Gitem/s │         │

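So on this allocation-intensive test, “Local” allocation was roughly 1.7 to 1.8 times as fast by mean time (about 3.5 µs versus 6.2 µs for the 200 allocations in each run), although the fastest samples are only about 1.25 times apart. I have to say I am quite dubious whether this is really worthwhile, but it has still been an interesting exercise.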
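The speed-up can be checked against the mean column of the results table. The sketch below only redoes that arithmetic. The constant and function names are my own for illustration and are not part of rustdb or divan, and the numbers are copied by hand from the table above.

```rust
/// Mean times in microseconds, copied from the results table above,
/// in the order of the `size` arguments: 1, 5, 10, 100, 1000, 10000.
const LBOX_MEAN_US: [f64; 6] = [3.31, 3.383, 3.533, 3.573, 3.576, 3.608];
const STDBOX_MEAN_US: [f64; 6] = [5.503, 5.936, 6.443, 6.46, 6.241, 6.431];

/// Ratio of std `Box` time to `lbox` time for each row.
/// A value above 1.0 means `lbox` was faster on that row.
fn speedups(std_us: &[f64], local_us: &[f64]) -> Vec<f64> {
    std_us.iter().zip(local_us).map(|(s, l)| s / l).collect()
}

fn main() {
    let ratios = speedups(&STDBOX_MEAN_US, &LBOX_MEAN_US);
    for (i, r) in ratios.iter().enumerate() {
        println!("row {i}: {r:.2}x");
    }
}
```

Since the loop always makes 200 allocations, the rows are effectively six repeats of the same measurement, so the spread between them gives a rough idea of the noise.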
