MagicRc, MagicWeak, MagicHashSet

RcAddr<K> is like a Rc<K>, except Eq/Hash is defined based on address of K rather than value of K.
WeakAddr<K> is like Weak<K> except Eq/Hash is defined based on address  of Krather than value of K.
MagicRc<K> is sorta like a RcAddr<k>
MagicWeak<K> is sorta like a WeakAddr<k>
MagicHashSet<K, V> is sorta like a Rc<RefCell<HashSet<MagicWeak<K>, V>>>

Now, here is the magical part. When the last MagicRc<K> is dropped for a given K (i.e. the strong ref count goes to 0, and the drop on K itself is about to be called), I want a drop handler to fire off and rmove the element from all MagicHashSet<K, V> that has a WeakAddr<K> to the same object.

Put another way: although MagicHashSet<K, V> is sorta like a Rc<RefCell<HashSet<Magic<K>, V>>> , except when drop handlers are executing, all of the keys should be convertable to a MagicRc<K>. This is because when a MagicRc<K> hits ref count 0, in it's drop handler, it calls remove on itself from all Rc<RefCell<HashSet<MagicWeak<K>, _>>> for the relevant MagicWeak<K>

Sorry for the confusing terminology above. The logic of 'Magic' here is as follows:

RAII/drop has this amazing property where when an object goes out of scope, we can use it to free resources.

I want a "magical" Rc/Weak with the property tht when the RefCount of the MagicRc goes to 0, it looks around at all hash maps using a MagicWeak as key, and removes the (k,v) pair if the MagicWeak is pointing to object of strong ref count 0.

Now, having it search through all memory for this is unreasoanble, so I'd like this class MagicHashMap , which someone does some type of tracking along with MagicRc to achieve this.

I'm currently trying to hack something together. I think it can be done, but it is very messy so far. I am wondering if this problem already has a solution (or a name).

===

EDIT: (In response to @2e71828 's excellent question): "Lazy removal" (i.e. next time HashMap is accessed) does not work for this application. The reason is that I am keeping "snapshots" of fairly complicated data structures in memory. As soon as I no longer need a "snapshot", I want everything realted to that snapshot to be freed / dropped.

You can probably make something more easily that’s almost as useful by making the removals lazy: instead of proactively removing the weak keys, make a HashMap that silently does the removal whenever one appears in the course of its normal operations.

Otherwise, you’ll have some complicated work to avoid contentions when a strong count goes to 0 while the map is locked by an exclusive reference.

2 Likes

Can you elaborate on the behavior you're looking for? Would it be acceptable to remove the rc=0 keys the next time the HashMap is used mutably (e.g. via entry/get_mut)? Is it ok to scan the keys at that time or would you want to maintain some kind of list of known rc=0 keys that still need to be removed from some map?

1 Like

@2e71828 : Thank you for the excellent question. I am keeping "snapshots" of complicated structures in memory. "lazy" free does notwork. The moment there are no longer any Rc's to a snapshot, I want all resources related to that snapshot to drop / free.

@jethrogb : I'm not sure I fully understand what you are proposing. It sounds like @2e71828 's idea of lazy free/drop. This unfortunately does not work, as I am keeping "snapshots" of complicated data structures, and the moment there are no longer any Rc's to a given snapshot, I want all data related to that snapshot to drop/free.

@2e71828 , @jethrogb : There are really helpful clarification questions. At the very least, it is clear that

SnapShotRc, SnapShotWeak, SnapShotHashMap would be better names than MagicRc, MagicWeak, MagicHashMap

If you store those as Rc<Box<Snapshot>>, then the memory used by the contents of the box will be freed when the strong refcount goes to zero, and the only reources that the Weaks will keep alive are the reference counts and the memory for the Box’s pointer.

If you need proactive removal of even those resources, I’d probably have a registry of Weak<dyn Fn(MyWeak<Snapshot>)> callbacks somewhere to do the removals— either inside the Rc of each item or a global one depending on what performance vs. memory tradeoff you need. That lets the data structures that hold these decide locally how to handle a drop notification, whether that’s removing it immediately or queueing it to happen after the current operation is finished.


You might also be interested in the im crate, which uses reference counting internally to share state between clones. Snapshotting then becomes a simple clone() of the map, and future updates of the original use a copy-on-write scheme to keep the snapshot unchanged.

3 Likes

Can you clarify if this assumes that we, at compile time, can write a pub struct SnapShotState that captures the state of a snapshot?

What I had in mind was that there would be a SnapShotHandle object, and then, at run time, objects can dynamically add themselves to particular SnapShots. This model makes it a bit more difficult to, at compile time, define a pub struct SnapShotState that fully captures all values.

In particular, if we have crate-snapshot and crate-later-work. I would like crate-later-work to be able to use snapshot functionality, even though crate-snapshot is a dependency of crate-later-work.

I am using the im crate, but even in this case, I still want snapshot to eagerly free. For example, if we delete a bunch of items, no longer refer to the old snapshot, then I want all the values to actually be dropped (they might be holding on to GPU memory / external resources.)

Rc/Arc does drop() its contents as soon as the strong refcount reaches zero. The only resource that Weak keeps alive is the memory allocation that formerly contained the contents, but never any external resources.

As long as there are no cycles, this will recursively drop any network of Rcs as soon as the last inbound strong reference is dropped. This should hold for all of im’s internal reference counts as well: any subtree that isn’t needed by a live reference will be immediately dropped, and its resources freed (less some cpu memory if there are stull Weaks around).

2 Likes

It does, but that can be transitively through other containers like Rc, Box,Vec, and HashMap.

Just write a wrapper around a HashMap<Key, Weak<T>> that exposes the same API as regular HashMap, except that .get() et al return None if the Weak for a given key fails to upgrade to Rc.

@2e71828 , @H2CO3 :

Please correct me if I am wrong. It seems there are two approaches:
(1) snapshots store values, objects store weak refs to location inside snapshot
(2) objects store values, snapshots store ref to objects

It seems you are pushing for (1), whereas I am trying to solve via (2).

No, if the snapshot itself is the thing that should be dropped when the last Rc is dropped, then the snapshot is the thing that should be in an Rc. That is what an Rc does: it drops the contents when the strong count reaches 0.

If you're concerned about the allocated memory and only the allocated memory dedicated to the snapshot when you put it in an Rc (which is not freed until the weak count reaches 0), you can minimize that by putting it in a Box first, but I'm not sure that's the objection you're raising.

What exactly is the objection you're raising? What "resources" are you worried about that will not be cleaned up if you use Weak with lazy removal?

2 Likes

Consider a 2-d table. In the top row, we have objects. In the left column, we have snapshot ids.

Some cells are filled (not all snapshots involve all objects).

When a snapshot-id is dropped, everytying in it's row should be dropped.
When an object-id is dropped, everything in it's column should be dropped.

Who retains strong Rc's to the cell's ? Not both rows & columns; one is going to be strong ref; one is going to be weak ref; so in the weak ref case, we need some way to still clear the corresponding row/column.

EDIT:

Thanks to everyone for insightful clarification questions. I have restructured the question as 2d table dropping of objects

1 Like

"Pushing" is quite a stretch while you are the one asking for advice.


That aside, it's unclear whether "the" hash map you are trying to make is the main storage or the snapshot. But regardless, if you store your data in Rcs, it will be correctly freed upon reaching a strong refcount of 0, as others have already pointed it out, and you can detect this fact by checking whether or not the corresponding Weaks fail to upgrade(). It doesn't have anything to do with how exactly your code is organized, this is how Rc and Weak work in any case.

I apologize if that came off as unnecessairly argumentative. What I wanted to express is suggesting / advocating.

With regards to the second point, I think part of the confusion is that in my initial question, I asked for help with a particular implementation rather than the actual problem. I think 2d table dropping of objects restates it better.

When you talk about snapshots, I think of things like filesystem backups. I want the backup to hold its own copy of a file because I want to be able to restore it to the state it had when I backed it up. Similarly, deleting a file shouldn’t automatically delete all of its backups— that would defeat their purpose.

Using this mental model, I assumed that you’d want a structure kind of like this:

struct Snapshot {
    objects: HashMap<ObjectId, Rc<ObjectData>>
}

struct SnapshotCtrl {
    shapshots: Vec<Rc<Snapshot>>,
    index: HashMap<ObjectId, Vec<Weak<Snapshot>>>
}

But it sounds like you want deleting an object to also delete all of its backups automatically, without affecting other objects that might’ve been stored in the same snapshot. That means your snapshots are no longer immutable, which is an unusual choice and brings lots of complications.

Maybe a good way to move forward would be for you to catalog the various insert, update, and delete operations that the snapshot system as a whole needs to support so that everyone understands the requirements.

1 Like

Valid point. My choice of terminology is definitely confusing here. I see you replied to 2d table dropping of objects so I'll be replying there.