**Summary**: I need to maintain a large number of items in a sorted structure that supports iteration, insertion at the i-th position, and lookup. I'm presently using a Vec, but the performance is abysmal. I could really use recommendations on alternative structures or fancy tricks.

Hidy folks,

I have an implementation of "Effective Computation of Biased Quantiles Over Data Streams" in my quantiles library that works pretty well if the number of samples you have is low-ish, say a handful of thousands. The implementation is here. CKMS is a data structure that allows you to answer approximate quantile queries on a stream of data in bounded space. The underlying algorithm has this neat trick where the samples stored in the structure are compressed in such a way that any quantile queries post-compression have a guaranteed error bound on them. You save space and get answers with a guaranteed accuracy. Tidy.

Now, the problem I've got is that I'm finding a need for CKMS structures with *significantly* more samples, and the insert performance of my implementation is just killing me. I've been fiddling with the implementation, have hit a brick wall, and could use some advice on where to go next. I've put together an in-progress branch that has criterion benchmarks and some fiddling micro-optimizations. You can find that branch on Github. I run the benchmarks like `cargo bench -- --test --nocapture`.

The basic problem I have with my implementation is this: quantiles/CKMS stores its samples, sorted, in a Vec. The algorithm needs access to the `i-1`, `i`, and `i+1` elements in the underlying store – which is why I use a Vec – and must be able to perform sorted iteration over the elements.
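To make that access pattern concrete, here's a minimal sketch – my own illustration, not the library's code – of a Vec-of-entries store (field names match the snippets below) and the free neighbour access that contiguous storage buys:

```rust
// Simplified stand-in for the library's sample entry; `v` is the value,
// `g` and `delta` are the rank-bookkeeping fields discussed later.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    v: f64,
    g: u32,
    delta: u32,
}

// The algorithm needs the i-1, i and i+1 neighbours of any interior
// index; a Vec gives us all three with plain indexing.
fn neighbours(samples: &[Entry], i: usize) -> (f64, f64, f64) {
    (samples[i - 1].v, samples[i].v, samples[i + 1].v)
}

fn main() {
    let samples: Vec<Entry> = (1..=3)
        .map(|i| Entry { v: i as f64, g: 1, delta: 0 })
        .collect();
    let (prev, cur, next) = neighbours(&samples, 1);
    println!("{} {} {}", prev, cur, next);
}
```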

Here, let's have a look at insertion. The whole function starts here, but the important part is this excerpt – the tail of the `let (idx, delta) = if … else { … }` binding that feeds the `insert` call:

```
} else {
    // inserting somewhere in the middle
    let mut idx = 0;
    let mut r = 0;
    for smpl in self.samples.iter() {
        match smpl.v.partial_cmp(&v).unwrap() {
            cmp::Ordering::Less => {
                idx += 1;
                r += smpl.g
            }
            _ => break,
        }
    }
    // To insert in the middle we have to know two things:
    //
    //  * the correct point of insertion, 'idx'
    //  * the sum of all Entry.g prior to 'idx', called 'r'
    //
    // We could avoid the 'idx' computation if we could get at 'r'
    // some other way.
    (idx, invariant(r as f64, self.error) - 1)
};
self.samples.insert(
    idx,
    Entry {
        v: v,
        g: 1,
        delta: delta,
    },
);
```

In the paper, `v` is the value being inserted, `g` is the difference in rank between samples `i` and `i-1`, and `delta` is the difference between the lowest and greatest possible ranks of `i`. Just, you know, incidentally. The important part here is that to insert we have to do a linear trawl through the Vec to compute `r` so that we can compute `delta` for the newly inserted item, as the comment points out, and then when we insert, all the existing items past the insertion point get scooted down one in the Vec.

Compression – defined here – is a much smaller problem than insertion.

```
fn compress(&mut self) {
    if self.samples.len() < 3 {
        return;
    }

    let mut s_mx = self.samples.len() - 1;
    let mut idx = 0;
    let mut r: u32 = 1;

    while idx < s_mx {
        let cur_g = self.samples[idx].g;
        let nxt_v = self.samples[idx + 1].v;
        let nxt_g = self.samples[idx + 1].g;
        let nxt_delta = self.samples[idx + 1].delta;

        if cur_g + nxt_g + nxt_delta <= invariant(r as f64, self.error) {
            self.samples[idx].v = nxt_v;
            self.samples[idx].g += nxt_g;
            self.samples[idx].delta = nxt_delta;
            self.samples.remove(idx + 1);
            s_mx -= 1;
        } else {
            idx += 1;
        }
        r += 1;
    }
}
```

We perform a pairwise step through the samples Vec, check against the invariant property the paper defines and, when possible, compress two samples into one – which turns out to mean: modify one in place and remove the other from the Vec, incurring more scooting.
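One way to sidestep the repeated scooting is to compress in a single in-place pass with a write cursor, then `truncate` once at the end, so no call to `Vec::remove` (itself O(n)) ever happens. This is a sketch of the idea on a simplified `(value, g)` pair type, with a caller-supplied `can_merge` closure standing in for the paper's invariant check – my own illustration, not the library's code:

```rust
// Sketch: merge adjacent entries in one pass, writing survivors through
// a `write` cursor instead of calling Vec::remove on every merge.
// `can_merge(cur_g, nxt_g)` stands in for the invariant test.
fn compress_in_place<F>(samples: &mut Vec<(f64, u32)>, can_merge: F)
where
    F: Fn(u32, u32) -> bool,
{
    if samples.len() < 3 {
        return;
    }
    let mut write = 0; // index of the last surviving entry
    for read in 1..samples.len() {
        let (nxt_v, nxt_g) = samples[read];
        let cur_g = samples[write].1;
        if can_merge(cur_g, nxt_g) {
            // Fold the next entry into the current survivor in place.
            samples[write] = (nxt_v, cur_g + nxt_g);
        } else {
            write += 1;
            samples[write] = (nxt_v, nxt_g);
        }
    }
    samples.truncate(write + 1);
}

fn main() {
    let mut s = vec![(1.0, 1), (2.0, 1), (3.0, 1), (9.0, 5)];
    // Toy rule: merge whenever the combined g stays at or below 3.
    compress_in_place(&mut s, |a, b| a + b <= 3);
    println!("{:?}", s); // [(3.0, 3), (9.0, 5)]
}
```

The sketch simplifies the real routine – it drops the `r`-indexed invariant and lets the final element participate in merges – but the write-cursor shape carries over directly, turning a worst-case O(n²) of shifting into one O(n) pass.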

My challenge is this: I can't use BinaryHeap or LinkedList. To the first: I'm maintaining a sorted store, right, so BinaryHeap immediately comes to mind. The tricky bit is that on insertion I have to be able to query the structure for all `Entry` with `v` less than the to-be-inserted `v` so that `r` can be computed. Also, BinaryHeap does not allow sorted iteration – I think – without conversion into a Vec first, which is trouble for compressing and querying the structure. The paper's authors discuss storing their samples in a linked list, but Rust's stdlib LinkedList does not allow insertion at arbitrary points.

I *have* been looking at BinaryHeap's implementation – specifically its Hole technique – with some interest but I don't know if I'd be barking up the wrong tree there or not. If anyone knows of any better structure than Vec that I could use *or* any wacky unsafe tricks I could make use of I'd greatly appreciate it.

Boy, if you read this all the way to the end, thank you very much.