Which trait bounds should go on a B+ tree?


#1

This is an interesting problem that has to do with ownership, memory safety, and performance.
I discussed it with a few folks over IRC, but would like to hear what a broader audience thinks.

Let’s call this data structure a BPlusTreeMap<K, V>, which is going to be similar to BTreeMap<K, V>.
Obviously, keys must be ordered, so there’s our first trait bound: K: Ord.

I want to support just two operations:

  1. fn insert(&mut self, key: K, value: V)
  2. fn search(&self, key: &K) -> Option<&V>

Now, in a B+ tree leaf nodes contain elements (K, V) and internal nodes contain copies K of keys, whereas a B tree has just one (K, V) for each inserted element (maybe in a leaf node, maybe in an internal node).
Ok, so a B+ tree must be able to copy keys for it’s internal use, therefore we have another bound: K: Clone.

Suppose we want to create a BPlusTreeMap<String, i32>. There is going to be a ton of string cloning, which will have a horrible effect on performance. However, we don’t really need clones of keys. Having unsafe shallow copies of keys in internal nodes would be just fine. So in this example, we could just memcpy those string keys in order to have copies in internal nodes. Then we just have to be careful not to call destructors on those copies. This eliminates the K: Clone bound.

Ok, this is a good performance improvement. But now we have a memory safety problem!
Suppose that K is a value which contains an ArcCell<Option<Box<i32>>> and on every comparison does something with it, perhaps drops the Box or allocates a new one. If we are sometimes operating on the real K and sometimes on a shallow copy of it, then they will eventually go out of sync and we might have double frees! This is bad.

So how do we fix this? That’s the hard part where I need help :slight_smile:

We figured out a possible solution over IRC. It seems that what we really want is to forbid keys which are cell-like. In other words, forbid keys which have interior mutability. We need a bound K: NotCell, where NotCell is defined like this:

#![feature(optin_builtin_traits)]

trait NotCell {}

impl NotCell for .. {}
impl<T> !NotCell for std::cell::UnsafeCell<T> {}

I’m very happy with this solution, but would need some reassurance that this is indeed correct.

What do you think? Do you have any better ideas?


#2

NotCell won’t work as is, because you can have interior mutability via unsafety. So, the trait should be unsafe and !implemented for raw pointers. (EDIT: Hm, actually it would work because unsafety will boil down to UnsafeCell :slight_smile: )

With this modification, NotCell will be surprisingly similar to std::Sync! But, Sync won’t suffice either: atomic integers are Sync and contain interior mutability, so they should be forbidden as well.

And even if we forbid everything we need, I don’t think it’d be a right solution: we don’t want to constrain the type, we want to constrain PartialEq::cmp method implementation. I don’t know how to do that.

Is there perhaps some way of completely dodging the problem? Like using array of keys and indices? Or storing each key exactly once at the topmost node, and storing empty holes below?


#3

And another idea: what about guarding against this edge case at runtime? memcpy the key before the comparison and assert! that it does not changes.


#4

Constraining PartialEq::cmp is not enough. The user could just as well do map.search(key).mutate_internally() to screw things up. :slight_smile:

Good idea, something like that might work without sacrificing performance too much.


#5

To simplify a bit: https://is.gd/6fj76q

Ref<T> is a reference similar to &T, but it cheats: instead of having a pointer to T, it copies the target T and pretends as if it had a pointer! That means it offers fast access without indirection and saves potential cache misses. Nifty, eh? :slight_smile:

Of course, the API is safe only when bounded by T: NotCell.