How does Rust compiler deal with circular references?

Sometimes I've asked in regards to the Gc crate, as I was wanting to develop a programming language on Rust. I had gave up before because cloning Gc references was sucking and someone mentioned the Dynclone crate, which didn't work with Gc.

I need to use Gc only for symbols. It is possible for two symbols to reference each other, basically because I've a name-to-symbol mapping, though there are possibly other reference loops as well.

So, today I'm trying again to clone a Gc:

impl Symbol {
    fn f(self: &Gc<Symbol>) {
        let s = self.clone();
        // do something with s
    }
}

Sadly I get:

error[E0658]: `&Gc<(dyn Symbol + 'static)>` cannot be used as the type of `self` without the `arbitrary_self_types` feature
 --> src\shockbasic-sdm\src\symbols\mod.rs:8:16
  |
8 |     fn f(self: &Gc<Symbol>)

This seems to be available with nightly builds only, but I can't use nightly because the Rust Language Server for Visual Studio Code will not work right now with nightly. I don't want to stop using this IDE.

I'm not sure whether I've tried on nightly before, but I'm pretty sure I tried hard on this cloning situation.

Since Rust compiler also does have a semantic data model, how does it handle these circular references or does it lack?

I’m not deeply familiar with rustc, but AFAIK it uses arenas for lots of things which would allow creating circular structures.

More information e.g. here: Memory Management in Rustc - Guide to Rustc Development

1 Like

I've written compilers in Rust before, and I usually used the following pattern for implementing a symbol table or a type registry:

  • Types and symbols are created using factory functions.
  • They are interned and cached, so that every distinct type has a single instance computed on-demand.
  • The cache is the single source of truth. It holds types by strong reference, behind an Rc.
  • Types only refer to each other using Weak pointers. It is always safe to upgrade and unwrap those, since the cache always holds a strong reference to them.
  • Types are in addition behind a RefCell so that the circularity can actually be achieved by mutation after the fact.
  • The last two constraints are usually encapsulated in a pair of container types, which I call RcCell and WeakCell. These abstract away the annoying bookkeeping, and they do one other thing: they define Eq and Hash in terms of pointer identity, for fast interning.
5 Likes

That idea of building a cache makes sense! Thanks!

When using RcCell and WeakCell, do you mean Rc<RefCell<T>> and Weak<RefCell<T>> (from crate rccell) or RefCell<Rc<T>> and RefCell<Weak<T>>? Also, isn't it possible to have interior mutation by using Cell anyway?

I mean Rc<RefCell<T>>; that is the idiomatic way, because doing it the other way around isn't useful (RefCell<Rc<T>> doesn't allow mutation of the pointed object).

Cell only works with Copy types.

If I used RefCell<T> my Symbol trait would not be object safe, for example:

trait Symbol {
    fn callee(self: RcCell<Self>) -> Option<RcCell<dyn Symbol>> {
        None
    }
}

Due to the self parameter, it'd not be object safe, while simply self: Rc<Self> would leave the surrounding trait as object safe.

Why don't you simply accept &self in the trait? You can then impl Symbol for RcCell<Concrete>.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.