Hi everyone. I've posted here a few times before and I am very thankful for all of the helpful and prompt responses I have received.
That being said, I am trying (for the 5th time) to get myself to understand Rust. I really appreciate the idea of compile time safety checks and it seems like everyone in my field (kernel, filesystems, and database programming) believes that Rust will be the language of the future. However, every time I try to get into it I can't help but feeling like Sideshow Bob.
I am trying to implement a simple B+ Tree that stores Box<u64>
values indexed by u64
keys. Below, I have pasted what I have so far (it is really ugly and it currently only supports find()
and an insert()
method that doesn't compile and is missing some logic). I have left inline comments with questions I have. I would appreciate it if anyone could answer any of the questions I have here. I'm sorry that I don't have a single specific question to ask, but I really feel like I'm just banging my head against the wall at every turn. If it makes more sense for me to split this up into separate topics I can do that too (I'm not sure which is easiest).
Advice on different approaches to learning Rust is welcome as well. I have tried following along with a few guides including Rust through too many linked lists and the official handbook, but I always feel like I've come away more confused after reading them.
Thanks for any help.
struct InteriorNode {
keys: Vec<u64>,
children: Vec<Box<Node>>,
}
struct KeyValuePair {
key: u64,
value: Box<u64>,
}
struct LeafNode {
kvs: Vec<KeyValuePair>,
}
enum Node {
Interior(InteriorNode),
Leaf(LeafNode),
}
struct BPTree {
n: usize,
/*
* The type here has some crazy levels of nesting. Assuming I have a
* single key/value in the tree and I want to get a reference to the value,
* I need to unwrap the option, dereference the box, match the Leaf type,
* grab a reference to the Leafnode::kvs vector, get first value from the
* Vec, and finally dereference the box. None of this process even dealt
* with iterating down the actual structure of the BPTree. I understand
* the value of each of these layers, but this in turn makes all of my
* code extremely nested and hard to follow. Is there a way I can do this
* without all of the nesting? Can I write helper functions to hide some
* of this? If I can write helper functions do I need to write all the
* boilerplate myself every time I have a complex structure like this?
*/
root: Option<Box<Node>>,
}
impl BPTree {
fn new_leaf(&self) -> Box<Node> {
Box::new(Node::Leaf(LeafNode {kvs: Vec::new()}))
}
fn new_interior(&self) -> Box<Node> {
/* This seems really verbose. Is there a shorthand way to do this? */
Box::new(Node::Interior(InteriorNode {children: Vec::new(), keys: Vec::new()}))
}
fn new(n: usize) -> BPTree {
BPTree { n: n, root: None }
}
fn split_leaf(&mut self, leaf: &mut LeafNode) {
/*
* I got stuck in a few ways here. Basically, this function is supposed to split
* a leaf node that is overfull into 2 half-full leaves and then add both to the
* parent layer. If this insertion results in an interior node being overfull, that
* node is also split and this process continues up the entire tree until either
* an underfull node is found or we hit the root node. If we hit the root, we split
* it and create a new root level of the BPTree. My issues are as follows:
*
* 1) As I stated below in insert(), I cannot call this function at its current place
* in the code due to an error about borrowing self twice.
* 2) I'm not sure how to do parent pointers idiomatically. Should I create raw
* pointers and drop into unsafe blocks? Should I use Rc::weak pointers? Can I use
* these without using a regular Rc pointer?
* 3) Ideally, in c I would create a function to add a node as a child of another to
* ensure that every time I add a child to a node the parent pointer also gets
* updated. What does the ownership model look like here? Would the parent and
* child both have mutable references to each other? Ins't that not allowed?
* 4) Due to the other issues, I have not really gotten close to implementing this
* method and I have no idea what other issues I might hit.
*/
unimplemented!();
}
fn insert(&mut self, key: u64, val: u64) {
match self.root {
Some(ref mut node_box) => {
let mut node = node_box.as_mut();
'search: loop {
match node {
/*
* Is the only way to get to the value contained within an enum
* to use a match statement?
*/
Node::Interior(ref mut int) => {
for (i, search_key) in int.keys.iter().enumerate() {
/*
* I don't understand why sometimes dereferencing happens
* automatically and sometimes i need to do it manually.
* Can anyone explain this to me?
*/
if key < *search_key {
node = int.children.get_mut(i).unwrap();
/*
* Why did I need to add a label to my loop? I was not
* able to figure out what 'continue' means in the context
* of a 'match' statement (at least with a quick google).
* Does it mean the equivalent of a "fall-thru" switch case?
*/
continue 'search;
}
}
node = int.children.last_mut().unwrap();
},
Node::Leaf(ref mut leaf) => {
for (i, search_kv) in leaf.kvs.iter_mut().enumerate() {
if key < search_kv.key {
/*
* I can't create a method to make a new KeyValuePair from a
* key and value. I would like to be able to do this so that I have
* a concise way of creating KeyValuePairs. I wanted it to be a
* method so that I can eventually make BPTree generic, allowing the
* method to automatically create the right generic KeyValuePair.
*/
leaf.kvs.insert(i, KeyValuePair { key: key, value: Box::new(val) });
break;
}
}
/*
* I am missing a case here. If the key to be inserted is higher than any
* existing keys nothing will get inserted. In c, I would create the variable
* 'i' before the loop and check if i > kvs.len() after the loop. In python
* I would use a for...else statement. I'm not sure what the idiomatic thing
* to do is here.
*/
if leaf.kvs.len() > self.n {
/*
* Compilation error: can't borrow self twice. I don't understand why
* this counts as borrowing twice. From my perspective i'm trying to
* pass my mutable, borrowed reference into the function.
*/
self.split_leaf(leaf);
}
}
}
}
},
None => {
/*
* In any other language I would structure this differently. I would
* have an 'if' statement at the top of my code to guarantee that
* root is allocated. Then I would run the normal insertion code that
* is currently in the 'Some' case. Instead it seems I need to duplicate
* the insertion code or add unnecessary 'None' checks.
*/
let mut root = self.new_leaf();
/*
* Why don't I need 'ref mut' on the leaf variable?
*/
if let Node::Leaf(leaf) = root.as_mut() {
leaf.kvs.push(KeyValuePair { key: key, value: Box::new(val)});
} else {
/*
* I just made sure 'root' was a leaf, but I have no way to get a
* ref to the inner struct without a match or an 'if let' statement.
* I find myself writing a lot of 'else panic' statements like this
* one because I want to make sure all of the cases of the 'if'
* are handled.
*/
panic!("Root node is not a leaf");
}
self.root = Some(root);
}
}
}
/*
* Do I need to write another version of this function for fetching
* a mutable reference? Wouldn't that function be almost entirely
* copy-pasted? That seems like a lot of code duplication / boilerplate.
*/
fn find(&self, key: u64) -> Option<&Box<u64>> {
match self.root {
Some(ref node_box) => {
/*
* This kind of hurts my brain. So this is a MUTABLE variable
* named 'node' that refers to an immutable reference to a
* value inside a box? Do I have that right? If I take away the
* 'mut' keyword I get compilation errors when setting 'node'
* later in the code.
*/
let mut node = node_box.as_ref();
'search: loop {
match node {
Node::Interior(int) => {
for (i, search_key) in int.keys.iter().enumerate() {
if key < *search_key {
node = int.children.get(i).unwrap().as_ref();
continue 'search;
}
}
node = int.children.last().unwrap().as_ref();
},
Node::Leaf(leaf) => {
for search_kv in leaf.kvs.iter() {
if key == search_kv.key {
return Some(&search_kv.value);
}
}
return None;
}
}
}
},
None => {
return None;
}
}
}
}
fn main() {
let mut bt = BPTree::new(4);
bt.insert(1, 1);
let val = bt.find(1);
println!("Hello, world!: {}", *val.unwrap());
}