I'm benchmarking a concurrent data structure where individual operations are in the 100-200ns range. I need to measure latency distributions (p99, p99.9) rather than just throughput. I'm familiar with divan and criterion, but from what I understand after going through their docs and source code, they don't provide an API for measuring tail latency; they focus on aggregate throughput/mean.
Any crate recommendations or repos with a good tail-latency benchmarking methodology I can use as a reference?
Is something like HdrHistogram insufficient? It's what I've used in the past, albeit for network calls. HdrHistogram is the usual algorithm for this task (or one of the usual ones).
From the HdrHistogram README: HDR Histogram is designed for recording histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2014) Intel CPUs. The HDR Histogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.
You might be able to use something a bit cheaper if you're willing to hard-code more properties of your specific situation. The simplest version of that is just tracking a running mean and variance (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm) for ~7 logarithmic buckets: if you're expecting hundreds of nanoseconds, you don't actually need much resolution in the multiple-milliseconds range, and realistically you're never tracking any numbers below 1 ns at all.
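A minimal sketch of that idea, assuming power-of-ten buckets from 1 ns to 10 ms (the `Welford`/`LogBuckets` names and the 7-bucket layout are my own choices; `merge` is the combine step from the linked parallel algorithm, for folding per-thread accumulators together):

```rust
/// Running mean/variance for one bucket (Welford's online algorithm).
#[derive(Clone, Copy, Default)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64,
}

impl Welford {
    fn record(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Combine two accumulators (the parallel-algorithm merge step).
    fn merge(&mut self, other: &Welford) {
        if other.count == 0 {
            return;
        }
        let total = self.count + other.count;
        let delta = other.mean - self.mean;
        self.mean += delta * other.count as f64 / total as f64;
        self.m2 += other.m2
            + delta * delta * (self.count as f64 * other.count as f64) / total as f64;
        self.count = total;
    }

    fn variance(&self) -> f64 {
        if self.count > 1 { self.m2 / (self.count - 1) as f64 } else { 0.0 }
    }
}

/// 7 logarithmic buckets: 1-10ns, 10-100ns, ..., 1-10ms.
struct LogBuckets {
    buckets: [Welford; 7],
}

impl LogBuckets {
    fn new() -> Self {
        Self { buckets: [Welford::default(); 7] }
    }

    fn record(&mut self, nanos: u64) {
        // Bucket index = floor(log10(nanos)), clamped to the last bucket.
        let idx = (nanos.max(1) as f64).log10().floor() as usize;
        self.buckets[idx.min(6)].record(nanos as f64);
    }
}

fn main() {
    let mut b = LogBuckets::new();
    for nanos in [120u64, 150, 180, 2_500, 3_000] {
        b.record(nanos);
    }
    for (i, w) in b.buckets.iter().enumerate() {
        if w.count > 0 {
            println!(
                "bucket 10^{i}..10^{} ns: n={} mean={:.1} var={:.1}",
                i + 1, w.count, w.mean, w.variance()
            );
        }
    }
}
```

The whole structure is a handful of fixed-size arrays, so recording is branch-light and allocation-free; the trade-off versus HdrHistogram is that you only get per-decade summary statistics, not arbitrary quantiles.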
Thanks, I was already working on a crate for this; I was mostly wondering what people generally use. I knew about hdrhistogram and have used it as the backend for large sample sets (>100k), but using it directly for concurrent data structures was troublesome, especially if you're interested in statistically significant results. It's not insufficient if you know what you're doing, but I was looking for something simpler.
If anyone is interested in a simplified framework focused on latency analysis, one that takes care of the boilerplate required to produce statistically meaningful results, please check out Simplified tail latency benchmarking.
Sorry for the shameless promotion :'D It's a pretty niche metric, but the crate should be useful for those who work on latency-sensitive code.