Demonstration Code: B+ Tree

As an exercise, I decided to write a B+ Tree without using any bespoke unsafe code¹. It turned out to be a larger project than I expected— So far, it only supports insertion and iterating over the entire contents of the tree.

I'm thinking about turning this into a blog post to demonstrate using indexing in place of references for complex data structures. With this goal in mind, are there any code patterns in here that might cause problems for intermediate-level Rust users? (General feedback is also appreciated)

I've pasted the type definitions inline here; see the playground for full implementation details.

#[derive(Copy,Clone,Debug)]
struct BPlusInfoNode {
    free: Option<NonZeroUsize>,
    root: NonZeroUsize
}

#[derive(Copy, Clone, Debug)]
struct BPlusFreeNode {
    next: Option<NonZeroUsize>
}

#[derive(Debug)]
struct BPlusInnerNode<K,const B:usize> {
    splits: ArrayVec<K,B>,
    pre: ArrayVec<NonZeroUsize, B>,
    post: NonZeroUsize,
}

#[derive(Debug)]
struct BPlusLeafNode<K,V,const B:usize> {
    keys: ArrayVec<K, B>,
    vals: ArrayVec<V, B>,
    prev: Option<NonZeroUsize>,
    next: Option<NonZeroUsize>
}

#[derive(Debug)]
#[repr(align(4096))]
enum BPlusNode<K,V,const BI:usize, const BL:usize> {
    Info ( BPlusInfoNode ),
    Free ( BPlusFreeNode ),
    Inner ( BPlusInnerNode<K,BI> ),
    Leaf ( BPlusLeafNode<K,V,BL> )
}

#[derive(Debug)]
pub struct BPlusTreeImpl<K,V,const BI:usize,const BL:usize>(Vec<BPlusNode<K,V,BI,BL>>);

/// Attempt to calculate `BI` and `BL` such that each node fits within `PAGE` bytes
pub type BPlusTree<K,V,const PAGE:usize> = BPlusTreeImpl<
    K,
    V,
    {(PAGE-4*size_of::<usize>()-align_of::<usize>()) / (size_of::<K>() + size_of::<usize>())},
    {(PAGE-5*size_of::<usize>()-align_of::<usize>()-align_of::<V>()) / size_of::<(K,V)>()}
>;

impl<K:Clone+Ord+Debug,V:Debug,const BI:usize,const BL:usize> BPlusTreeImpl<K,V,BI,BL> {
  // ...
}

enum InsertResult<K,V> {
    Inserted,
    Duplicate(K,V),
    Split(K, [NonZeroUsize;2])
}

pub struct TreeIter<'tree, K, V, const BI: usize, const BL: usize> {
    tree: &'tree BPlusTreeImpl<K,V,BI,BL>,
    leaf: Option<NonZeroUsize>,
    idx: usize,
}

¹ It's not entirely unsafe-free because it uses Vec and ArrayVec, both of which use unsafe internally.

3 Likes

Any chance you could throw in disk serialization as well? I have been looking for a pedagogical minimal "on disk b-tree". :slight_smile:

I was looking for this a while back as well, except to encrypt and write to another part of memory, not disk. So it'd be nice to have the serialization part abstract / modular enough to accommodate for use cases like that!

I had actually tried modifying Rust std's BTreeMap implementation to do this, but it was too tied to being held persistently in-memory, at least at the time.

I feel like the naming of everything with BPlus* prefixes is unnecessary. The whole module is about implementing B+ trees, so it seems unnecessary to keep re-iterating the fact that all those internal structs are about BPlus-Something.

If you make a blog post, you might want to consider using standard code formatting (i.e. apply rustfmt).

AFAIK it’s typical to do things like use std::{any, mem}; in order to be able to write mem::drop(…) or any::type_name::<…>() instead of having to type the full std::module::path everywhere. At least for std::mem::drop that’s not even necessary as the function is in the prelude; just write drop(…).

2 Likes

I actually prefer BPlusNode.

Choice 1: BPlusNode, GfxNode, FsNode, LogNode
Choice 2: bplus::Node, gfx::Node, fs::Node, log::Node

Although I think (2) is more standard, I prefer (1) becaue:

  • auto completion on goto class in IntelliJ works better
  • when seeing a Node in a file, I don't have to hunt for use statements to figure out what Node it is

We’re talking about private types in a module that only implements B+ trees. There’s never any bplus::Node being written because Node is private. And inside of the bplus module it would just be written Node; which auto-completes just fine (in case you really need auto-completion for that). Since it’s private, there’s never any use statements involving it either.

2 Likes

I am referring to Search Everywhere | IntelliJ IDEA Documentation

This is an IntelliJ feature where you can have a workspace with multiple crates in the workspace and then jump to any struct/enum by hitting goto-class and auto completing.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.