Aaaaand results. I've got to the end of the poor-man's approach and have pushed computation into the nodes (called `Inner` in my branch). Compression is also a touch more aggressive than in quantiles/master, which nets a win as well. The updated branch is at bc906d9f8581c74fae55481c4e6b8f90d1bcadeb.
```
Benchmarking bench_insert_65535
> Warming up for 3.0000 s
> Collecting 100 samples in estimated 461.24 s
test ckms::u16::bench_insert_65535 has been running for over 60 seconds
> Found 4 outliers among 99 measurements (4.04%)
> 1 (1.01%) low mild
> 3 (3.03%) high mild
> Performing linear regression
> slope [91.072 ms 91.250 ms]
> R^2 0.9972804 0.9972740
> Estimating the statistics of the sample
> mean [91.094 ms 91.341 ms]
> median [91.054 ms 91.295 ms]
> MAD [380.69 us 593.37 us]
> SD [508.63 us 747.06 us]
bench_insert_65535: Comparing with previous sample
> Performing a two-sample t-test
> H0: Both samples have the same mean
> p = 0
> Strong evidence to reject the null hypothesis
> Estimating relative change of statistics
> mean [-82.042% -81.499%]
> median [-81.492% -81.412%]
> mean has improved by 81.72%
> median has improved by 81.45%
```
!!!
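For anyone wondering what "compression" refers to in the first paragraph, here's a minimal sketch of a textbook CKMS summary. To keep it short it assumes a *uniform* error invariant f = 2εn rather than a biased one. It is not the code in the branch above and not the quantiles crate's API: `Ckms`, `Entry`, `insert`, `compress` and `query` are hypothetical names for illustration only. Each retained tuple holds a value `v`, a rank gap `g` and a rank uncertainty `delta`. `compress` folds a tuple into its right-hand neighbour whenever `g_i + g_{i+1} + delta_{i+1}` still fits under f.

```rust
// Minimal sketch of a textbook CKMS summary using a *uniform* error
// invariant f(n) = 2 * eps * n. Hypothetical names throughout: this is
// not the quantiles crate's API nor the Inner-node code in the branch.

#[derive(Debug, Clone, Copy)]
struct Entry {
    v: u16,     // a retained sample value
    g: u64,     // rank gap to the previous retained value
    delta: u64, // uncertainty in this value's rank
}

struct Ckms {
    eps: f64,
    n: u64,
    samples: Vec<Entry>, // kept sorted by `v`
}

impl Ckms {
    fn new(eps: f64) -> Self {
        Ckms { eps, n: 0, samples: Vec::new() }
    }

    // Uniform invariant: the permitted rank error does not depend on rank.
    fn invariant(&self) -> f64 {
        2.0 * self.eps * self.n as f64
    }

    fn insert(&mut self, v: u16) {
        // Index of the first retained value strictly greater than `v`.
        let idx = self.samples.partition_point(|e| e.v <= v);
        // A new minimum or maximum has an exactly known rank.
        let delta = if idx == 0 || idx == self.samples.len() {
            0
        } else {
            (self.invariant().floor() as u64).saturating_sub(1)
        };
        self.samples.insert(idx, Entry { v, g: 1, delta });
        self.n += 1;
        // Compress every 1/(2*eps) insertions.
        let period = (1.0 / (2.0 * self.eps)).floor().max(1.0) as u64;
        if self.n % period == 0 {
            self.compress();
        }
    }

    // Walk right to left, folding entry i into entry i+1 while the merged
    // tuple still satisfies the invariant. Entry 0 (the minimum) and the
    // last entry (the maximum) are never removed.
    fn compress(&mut self) {
        let f = self.invariant();
        let mut i = self.samples.len().saturating_sub(2);
        while i >= 1 {
            let (cur, next) = (self.samples[i], self.samples[i + 1]);
            if (cur.g + next.g + next.delta) as f64 <= f {
                self.samples[i + 1].g += cur.g;
                self.samples.remove(i);
            }
            i -= 1;
        }
    }

    // Return a value whose rank is within roughly eps * n of ceil(phi * n).
    fn query(&self, phi: f64) -> Option<u16> {
        let target = (phi * self.n as f64).ceil();
        let bound = target + self.invariant() / 2.0;
        let mut r = 0u64; // running lower bound on rank
        for w in self.samples.windows(2) {
            r += w[0].g;
            if (r + w[1].g + w[1].delta) as f64 > bound {
                return Some(w[0].v);
            }
        }
        self.samples.last().map(|e| e.v)
    }
}
```

In this sketch, a looser invariant lets `compress` merge more tuples. Fewer tuples are kept, and each `Vec::insert` shifts fewer elements.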