How does Rust compiler deal with circular references?

Sometimes I've asked in regards to the Gc crate, as I was wanting to develop a programming language on Rust. I had gave up before because cloning Gc references was sucking and someone mentioned the Dynclone crate, which didn't work with Gc.

I need to use Gc only for symbols. It is possible for two symbols to reference each other, basically because I've a name-to-symbol mapping, though there are possibly other reference loops as well.

So, today I'm trying again to clone a Gc:

impl Symbol {
    fn f(self: &Gc<Symbol>) {
        let s = self.clone();
        // do something with s
    }
}

Sadly I get:

error[E0658]: `&Gc<(dyn Symbol + 'static)>` cannot be used as the type of `self` without the `arbitrary_self_types` feature
 --> src\shockbasic-sdm\src\symbols\mod.rs:8:16
  |
8 |     fn f(self: &Gc<Symbol>)

This seems to be available with nightly builds only, but I can't use nightly because the Rust Language Server for Visual Studio Code will not work right now with nightly. I don't want to stop using this IDE.

I'm not sure whether I've tried on nightly before, but I'm pretty sure I tried hard on this cloning situation.

Since Rust compiler also does have a semantic data model, how does it handle these circular references or does it lack?

Iā€™m not deeply familiar with rustc, but AFAIK it uses arenas for lots of things which would allow creating circular structures.

More information e.g. here: Memory Management in Rustc - Guide to Rustc Development

1 Like

I've written compilers in Rust before, and I usually used the following pattern for implementing a symbol table or a type registry:

  • Types and symbols are created using factory functions.
  • They are interned and cached, so that every distinct type has a single instance computed on-demand.
  • The cache is the single source of truth. It holds types by strong reference, behind an Rc.
  • Types only refer to each other using Weak pointers. It is always safe to upgrade and unwrap those, since the cache always holds a strong reference to them.
  • Types are in addition behind a RefCell so that the circularity can actually be achieved by mutation after the fact.
  • The last two constraints are usually encapsulated in a pair of container types, which I call RcCell and WeakCell. These abstract away the annoying bookkeeping, and they do one other thing: they define Eq and Hash in terms of pointer identity, for fast interning.
5 Likes

That idea of building a cache makes sense! Thanks!

When using RcCell and WeakCell, do you mean Rc<RefCell<T>> and Weak<RefCell<T>> (from crate rccell) or RefCell<Rc<T>> and RefCell<Weak<T>>? Also, isn't it possible to have interior mutation by using Cell anyway?

I mean Rc<RefCell<T>>; that is the idiomatic way, because doing it the other way around isn't useful (RefCell<Rc<T>> doesn't allow mutation of the pointed object).

Cell only works with Copy types.

If I used RefCell<T> my Symbol trait would not be object safe, for example:

trait Symbol {
    fn callee(self: RcCell<Self>) -> Option<RcCell<dyn Symbol>> {
        None
    }
}

Due to the self parameter, it'd not be object safe, while simply self: Rc<Self> would leave the surrounding trait as object safe.

Why don't you simply accept &self in the trait? You can then impl Symbol for RcCell<Concrete>.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.