Help creating a self-referential struct

Hi everyone. I'm working on a library that deals with the following struct (and its BTreeMap equivalent):

pub struct BiHashMap<L, R> {
    left2right: HashMap<Rc<L>, Rc<R>>,
    right2left: HashMap<Rc<R>, Rc<L>>,
}

It's a two-directional map data structure that that creates a bijection between values of type L and values of type R. The internal Rc's are never exposed by the public API and the reference count of each Rc is essentially always 2 (for each L or R value there's one Rc in each HashMap). For instance, a simplified insert method looks like:

fn insert_unchecked(&mut self, left: L, right: R) {
    let left_rc = Rc::new(left);
    let right_rc = Rc::new(right);
    self.left2right.insert(left_rc.clone(), right_rc.clone());
    self.right2left.insert(right_rc, left_rc);
}

Since the Rc's are completely internal to the struct, I was thinking that there should be a way to get rid of them and reduce the overhead of using Rc's. However, I don't see any way to do this other than by creating a self-referential struct (using Pin?) and/or using unsafe code, both of which I'm not completely comfortable doing.

I'm imagining something like

pub struct BiHashMap<L, R> {
    left2right: HashMap<NonNull<L>, NonNull<R>>,
    right2left: HashMap<NonNull<R>, NonNull<L>>,
   // something to store the actual L and R values here
}

but I'm really not sure about the details. Any help or suggestions would be greatly appreciated!

One classic way to have a cyclic graph of objects without unsafe code, self-referential structs or some form of garbage collection (including Rc + Weak) is to have all nodes be owned by a Vec and use indices in that Vec as pointers.

That's easy to do as long as you're only inserting nodes and deleting everything at once. Fine-grained deletion is much harder (besides the fact that you cannot really delete something in the vec without invalidating every subsequent index and must generally use some sort of Option-based tombstoning as a result, how do you even know when it's okay to delete a node in a general graph without some form of garbage collection?), but at least you can detect incorrect deletion using a more advanced trick called generational indices.


You may also be interested in Learning Rust With Entirely Too Many Linked Lists if you've not read that already, as it deals with a more restricted kind of cyclic graph (doubly linked list) that is closer to the one you're trying to create here.

3 Likes

Thanks for the reply. I considered doing something like that where a Vec owns all the nodes and using indices as pointers, but like you said removing entries becomes much more complicated that way. Since this is intended to be a general-purpose data structure, it should support efficient deletion just like a regular HashMap.

Also, Learning Rust With Entirely Too Many Linked Lists was a great read!

1 Like

I believe you could just use Box::leak(Box::new(val)) to create pointers to values when inserting them, and when deleting them, you can use Box::from_raw to create the box. As long as you're careful to remove both halves, you should be fine.

This would not require adding a third field to store the values, as they're implicitly owned through the pointers. If you want to avoid allocating a box for every value, you'll have to store them together in some larger storage, and in that case you might as well use indices, as pointers are unlikely to help with the fact that things move around when you move them around.

Don't do this. It is incredibly easy to make a mistake this way, and that can be disastrous around unsafe code.

Instead of using a HashMap you can use a generational arena, this allows you to detect old keys and is indexing almost as fast as indexing a Vec<T>. generational-arena is a good implementation of this. If you don't care about removing things, then you can use typed-arena

FWIW when I had a similar challenge, I ended up using the "index pattern" with a HashMap<usize, T> in place of Vec<T>. This allows deleting elements at the cost of extra memory, storage complexity, and sacrificing some cache locality. The advantage is it's hard to screw it up, 100% safe Rust, and it doesn't fight the borrow checker.

2 Likes

Ideally this is the method I would go for (or something similar), but I agree with KrishnaSannasi that it's very easy to mismanage the memory and shoot yourself in the foot. I think it could be made safe(r) using Pin and NonNull somehow but I'm still fiddling with the details.

I hadn't thought about using another map for the value storage, that's not a bad idea.

1 Like

Alternatively you could use a slab.

1 Like

Yep, generational arena is basically a Slab that doesn't reuse keys, but still does reuse the backing array. There is the extra cost of storing a "generation" which is just an integer. But this shouldn't be noticeable.

Ok, I think I have a safe implementation that doesn't use any extra fields to store the values. If someone could look over it and make sure that everything seems sound I'd greatly appreciate it. I've added comments where necessary to help explain things.

Link to playground

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.