Is it possible to, in Rust, implement an interpreter for a language that does not follow Rust's reference semantics?

Hello! I'm new to Rust. Usually when I'm new to a language I implement a super-simple toy interpreter as I've found that lets me delve into loads of different topics to get to know the language. So, I've been doing this with Rust, and it's been fine so far, until I tried to implement objects, which… has been rather difficult.

The main issue I've been having is that this language is your usual dynamic language that lets you mutate anything at any time with the mark-sweep GC cleaning up behind you etc so, this in my view basically violently backstabs the Rust borrow checker and I'm not sure how it can ever work without causing UB, especially with regards to allowing multiple mutable references to the same object, like yes sure I could slap a RefCell over the object but that just goes against the language's semantics because I do want to allow mutable aliasing!

I just don't see how this could work without well, spreading *mut everywhere and just doing that, I guess. I feel like there’s a super obvious way to do this, I mean, surely people have done interpreters in Rust before, do they just pay the price of unergonomic pointers and {Cell, RefCell, UnsafeCells}s everywhere? Did I just walk into the worst part of Rust without even knowing? I am sorry if for rambling but I have been fighting the compiler for like a week now and it's been honestly driving me crazy.

So yeah. I guess I want some pointers (ha) with regards to all of this. Thank you

3 Likes

A pretty common way to design around this (without sticking everything into its own Rc<RefCell<T>>) is to do two things:

  1. Have a unified value type:
    enum Value {
        Int(isize),
        Float(f64),
        String(String),
        Complex(HashMap<String, usize>),
        // etc...
        Reference(usize),
    }
    
  2. Have your stack be one big vector of values, and keep some kind of map from variable names to indices:
    let mut stack = Vec::<Value>::new();
    let mut names = HashMap::<String, usize>::new();
    
    Then passing stack and names by value, and doing some smart way of dealing with scoping would let you mutate values even though you have "references" to them. The references here are just indices into the stack array.

This idea of using indices into an array rather than references is the classical way of dealing with self-referential structures, like the one it sounds you're constructing here.

4 Likes

Also, welcome to Rust!

Hope you have a great time!

2 Likes

RustPython, a Python Interpreter written in Rust, basically wraps every python objects with Rc<RefCell<T>>. Actual code is a bit more complex to support multithreading only if certain compile flag is enabled.

3 Likes

It allows dynamically checked mutable aliasing. It doesn't go against the "language's semantics". Rust doesn't care whether you use Rc and RefCell. You may very well need to use these VERY useful types, when reference validity cannot be statically checked. It's a matter of picking the right tool for the job. You do need to consider reference cycles and avoid memory leaks. In my experience this isn't usually a big problem.

3 Likes

Hi, thank you for replying. Yes! I did this before after searching about this (specifically using the slab crate), but I found that I would need to pass the stack to basically every function that needed to actually get the object behind an index reference and that was really unergonomic and annoying… so instead of that I tried to use pointers.

But I'm scared that's not really going to work going forward because if I use Deref/DerefMut or something to turn the pointers into references I'm gonna have to deal with borrowing and potential UB.

To exemplify further my main fear is some program doing a.some_method(a) (in the language, not Rust), which would basically be "borrow a mutably, call some_method passing a pointer to a, some_method inspects a by borrowing it mutably—oh dear” which causes UB because you'd have two mutable references active at once. I'm not sure if I'm just thinking the aliasing rules are more strict than they really are or not.

I made a playground demonstrating this: Rust Playground. I know the example is a bit contrived but that's what I've been having trouble with. This program should panic once you run it.

That would work but if I didn't have cycles, and I'm implementing a tracing GC, and not reference counting.

If you look at my example above, something a bit cyclic would cause a panic with RefCell. I do want to allow dumb cyclic things though.

Panics with RefCell can be avoided when you keep the borrow_muts really short. No overlapping with other borrow_muts of what could be the same RefCell. No chain of borrow_muts “into” a struct that could be cyclic. Then you should be safe. If you need to mutate two things like in that playground, only borrow_mut one at a time while you actually mutate it. If you need to go deeper into something that could be a cyclic struct, only .borrow() or .borrow_mut() yourself one level deep, then .clone() the Rc<RefCell<_>> inside; drop the Guard you had from the outer borrow/borrow_mut, and only then .borrow_mut() yourself your way into the inner Rc<RefCell<_>> you obtained. This way there’s no panics even with cyclic structs.

Really, with this kind of approach, Rc<RefCell<…>> can do almost everything that your usual garbage collected languages offer, except from cleaning up cyclic things properly or supporting multi-threading.

2 Likes

Anyways, since you want to implement a yourself GC after all, probably there’s no need for Rc in the first place. If you use a Vec for your stack (as suggested above) and perhaps some form of arena (e.g. this crate) for a heap, you can mutate the values by indexing into those data structures directly, so there’s probably no need at all for RefCell/Cell/etc.

Edit: Ah, I hadn’t read the whole discussion above, you mentioned slab, which is basically the same idea as generational_arena, etc, so I’m not really providing new input here :sweat_smile:

Right, of course, but in a previous post I said that's pretty annoying, because I need to pass the “object slab” everywhere in order to exchange an index for an object. I could make a global variable but that would be even worse I think lol, not to mention that I would need to make it threadsafe if I make a global variable, even though I don't care about threading; I looked into thread-local storage but it seems like a bit of a faff in Rust, due to the with thing.

So I think I'm a bit stuck between a rock and a hard place, right? I either weave the “object slab” into many functions making it ugly and unergonomic, or I use pointers and deal with unsafety and potential UB.

Well, whilst writing this I just realised that even with the slab I don't think it would work. Like the previous example, I would have say slab[index].some_method(slab, other_index); but slab[index] returns &mut T and some_method takes in a &mut Slab<T>… that's the E0499 compiler error, “A variable was borrowed as mutable more than once”?

You need to keep both borrows and borrow_muts "short". If you have a "long" borrow which calls something that then does a borrow_mut on the same RefCell, that's a panic.

Cycles don't causes panics. If you keep the borrows short (which is easy), you won't have any panics. At some point you need to break the cycles though, or you will leak memory.

I can't control the cycles, since this is an interpreter for a language that doesn't follow Rust's semantics. As for borrows I could do what @steffahn said to “keep borrows short” but I definitely wouldn't call that easy, it seems rather error-prone! :frowning:

It is possible to get it wrong. For example in the code interpreter I am making I have this code:

  /// Load the named routine.
  pub(crate) fn load_routine( self: &DB, name: &ObjRef ) -> Option< RoutinePtr >
  {
    if let Some(r) = self.routines.borrow().get( name )
    {
      return Some( r.clone() );
    }
    meta::get_routine( self, name )
  }

If I wrote it instead like this:

  /// Load the named routine.
  pub(crate) fn load_routine( self: &DB, name: &ObjRef ) -> Option< RoutinePtr >
  {
    if let Some(r) = self.routines.borrow().get( name )
    {
      return Some( r.clone() );
    }
   else
   {
      meta::get_routine( self, name )
   }
  }

it no longer works because the borrow() last too long ( meta::get_routine updates routines ).
So you have to be careful, yes. But nevertheless, it's essentially a "trivial" problem.

1 Like

You could put it into a thread_local in a RefCell. If you don't have multiple threads, this way (with a thread local) you don't need to give it thread-safety.

Yes, but:

The with method yields a reference to the contained value which cannot be sent across threads or escape the given closure.

So I can't really do much with it. Like a function to get a reference to the object at index i is kinda impossible.

I know you don't like the idea, but I think you might find it less burdensome than you imagine. Typically, I'd make my functions methods of the "all my global data are here" type. They may not look like what you're used to, but the flow will be straightforward, and it's clear which functions potentially mutate anything in the world, and which ones don't, and on top of that the compiler will force you to make the code explicit in terms of ordering the mutations of your state, which is a good thing. e.g. I wouldn't want to assume that the order of calls to evaluate arguments in your you language is that same order that rust uses.

I'll also point out that if you're using this to learn to write rust code, you may as well try to write decent rust code. You may not like it (yet) but storing indices into an array is a common trust idiom, used for all sorts of graph-like structures, which is exactly what you've got in your program.

6 Likes

After thinking about it for a while, yeah, you're right. I'm not sure still how to deal with this, that I posted earlier,

but it's probably through minimising borrows somehow I guess.

And yes this is very much my intention :slight_smile: It's kinda funny that I grasped lifetimes, borrows etc. rather easily but this specific thing is just killing me. This is more philosophical (and unrelated I guess) but Rust kinda feels more like a functional language than an imperative one. It's just a whole different point of view.

Cheers!

1 Like

I got a playground link, which seems to have been edited or something. I think it needed to be more like this: Rust Playground

You put the RefCells INSIDE the struct definitions.

You can always side-step the problem and drop to a lower layer. For example, you don't have to represent objects in your interpreter with rust objects. You can always just have a big Vec<AtomicU8> as "memory", with usizes for pointers, and read/write that.

That's, of course, not particularly nice. But it's possible.

4 Likes

oh god I hate that I kinda want to do this :rofl:

1 Like