Does ref counting have terrible memory access?

Context: playing with an persistent b-tree impl

Question: does ref counting just have absolutely horrible memory access patterns ? When creating a new node, for every children we need to read the ref count, increment it, write it back out. This is basically totally random memory accesses.

Are there clever ways around this or is this just a fundamental limitation of ref counting ?

Yes, reference counting has this sort of bad memory access pattern (and that's one of the reasons why it's less popular in language designs than more complex garbage collection algorithms).

However, note that you can use Rust's core features to avoid updating reference counts in some common cases:

  • If you move an Rc<T> from one place to another, no refcount updates happen.
  • If you borrow an Rc β€” that is, pass around a &Rc<T> β€” then again, no updates happen, unless some function decides it needs to keep the reference and clone()s it.
  • (And, of course, you can borrow the contents, &T, if cloning isn't expected either.)

Thus, it's easy to write code with a minimal amount of reference count updates which would require lots of care without Rust's moves and borrows.

That said, this doesn't help you with your stated problem β€” if you create a new parent for some nodes in a persistent tree, then yes, you have to increment the counts of every child. Note that this isn't especially worse than e.g. iterating over the tree β€” that involves β€œrandom” accesses to many nodes too.

7 Likes

At this point I find myself wondering: why not just use an arena-based tree e.g. (shameless plug) gctree?

2 Likes

I am playing with my own implementation of 'persistent vector backed by btrees'. It is already arena based for allocations.

I am curious why you picked gc over a custom ref-count (in this case, a Vec). I am agnostic on what the right solution is, and am interested in hearing the thought process that led you to pick gc.

The GC in this case refers to the tree nodes, not general garbage collection.

When you delete a subtree, the nodes in that subtree are not dropped, but retained within the tree. In essence it's a performance optimization. When a new node is created, if there is a garbage node then that is used. Only if there is no garbage node available is a new node allocated by the allocator used by the program.

The reason I included it is because since the Node type is generic, when instantiated to a type it could be arbitrarily large (and thus potentially expensive to drop). In addition, my own use cases for the crate favor faster runtime speeds over smaller memory consumption.

2 Likes

I am rusty on rust, but isn't it the case you can have a "persistent" (shared) tree where only the tree as a whole has a ref count? Maybe you mean the "internals" of the tree are shared, but that would seem an unusual use case to me, regarding a btree as a kind of container.

[ Big disclaimer: I may be talking nonsense, as I said I am VERY rusty on rust ]

Internals

Internals are shared. See: Persistent data structure - Wikipedia

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.