Global references to unique values

I'm writing an interpreter for the toy Monkey programming language, following Writing an Interpreter in Go.

In the section of the book that I'm on, we are building an evaluator for boolean expressions. The author is defining global references to some unique, constant values like TRUE, FALSE, and NULL, to avoid re-creating them each time one needs to be used.

For example, here is some Go code from the book:

(In object.go)

const ( // [...]
       BOOLEAN_OBJ = "BOOLEAN"
       NULL_OBJ  = "NULL"
   )

type ObjectType string

type Object interface { 
    Type() ObjectType 
}

type Boolean struct { 
    Value bool
}

func (b *Boolean) Type() ObjectType { return BOOLEAN_OBJ }

type Null struct{}
func (n *Null) Type() ObjectType { return NULL_OBJ }

(In evaluator.go)

var (
    NULL = &object.Null{}
    TRUE = &object.Boolean{Value: true} 
    FALSE = &object.Boolean{Value: false}
)

And then they can be used like so:

func Eval(node ast.Node) object.Object { // [...]
     case *ast.Boolean:
         return nativeBoolToBooleanObject(node.Value)
        // [...]
}

func nativeBoolToBooleanObject(input bool) *object.Boolean { 
    if input {
        return TRUE 
    }
    return FALSE 
}

The author says "Now there are only two instances of object.Boolean in our package: TRUE and FALSE and we reference them instead of allocating new object.Booleans," and the same goes for NULL.

Now, here is my attempt to translate this into Rust.

(in object.rs)

#[derive(Debug, Clone, Default)]
pub enum Object {
    Integer(i64),  // Adding this just to indicate we have many types of objects.
    Boolean(bool),
    #[default]
    Null,
}

(in evaluator.rs)

use std::sync::Arc;

use lazy_static::lazy_static;

use crate::ast::{Expression, Program, Statement};
use crate::object::Object;

pub type BoxedObject = Arc<Object>;

lazy_static! {
    static ref NULL: BoxedObject = Arc::new(Object::Null);
    static ref TRUE: BoxedObject = Arc::new(Object::Boolean(true));
    static ref FALSE: BoxedObject = Arc::new(Object::Boolean(false));
}

fn eval_expression(expression: &Expression) -> BoxedObject {
    match expression {
        Expression::IntegerLiteral { value, .. } => Arc::new(Object::Integer(*value)),
        Expression::Boolean { value, .. } => {
            if *value {
                Arc::clone(&TRUE)
            } else {
                Arc::clone(&FALSE)
            }
        _ => todo!(),
    }
}

As you can see, I'm using an enum to hold all the different object types, which seems more idiomatic than using, say, dyn Trait. But other than that, I think my code hews closely to the original Go code.

My question is: Is using Arc<Object> like this for unique objects holding TRUE, FALSE, and NULL the most optimal and idiomatic approach for imitating the global references we see in the Go code above? Is there a better way? This interpreter isn't going to be the most performant thing in the world, but still, I don't want to use Arc if it's way overkill and there are better solutions, and I'm not sure if it's the best use of static variables, either. At the same time, I don't know of another way to reuse the same global TRUE, FALSE, and NULL values, and the Internet hasn't been helpful in giving solutions. (ChatGPT suggested Cow, but I'm not sure it really understood what this code is trying to do.)

1 Like

why use references? use values instead. you can just define them as constants.

4 Likes

I agree that it's unclear why you would need these to be static, and even if you did, why they couldn't be Object typed.[1]

...but to be perfectly honest I'd probably use a constructor or From<bool> implementation or such over consts anyway.


  1. The only thing I thought of was pointer equality shenanigans which don't seem worth much. ↩︎

1 Like

The docs say "constants are inlined wherever they’re used, making using them identical to simply replacing the name of the const with its value. Static variables, on the other hand, point to a single location in memory, which all accesses share. This means that, unlike with constants, they can’t have destructors, and act as a single value across the entire codebase." In my case, wouldn't I want to have a single value across the entire codebase? With a const, the values would be allocated every time.

why you would need these to be static , and even if you did, why they couldn't be Object typed.

Having them be Object without wrapping them in an Arc would run afoul of the borrow checker, since Object is not Copy (I could make it so right now, but may not be able to depending on additional variant values I may have to add later).

If they're read-only, it doesn't matter if they're the same value or inlined everywhere. You'd only be able to tell the difference if you checked pointer equality. You should use constants unless your value can't be const constructed, which seems to be a possibility in the future, since you don't want to require Copy.

Const items can't heap allocate, since that would make them impossible to be const constructed. They might take up more space on the stack than a static, but using a reference to a const should be pretty similar to a static (probably only one instance per literal use at the most?).

Arc is no more copyable/clonable than Object. The only thing that gives you is being able to clone them somewhat cheaply and shared ownership. But for statics, you can already get as many & references as you want essentially for free at any time, so there's no need for Arc.

2 Likes

There's no allocation happening for the Boolean and Null variants of Object. (And Rust doesn't have autoboxing or the like.)

Incidentally, I did think of a potential use for consts (or static)s over just constructors: ease of comparison, should you need to compare to the boolean case a lot.

I'm not following your reasoning here. Maybe I should be more explicit about what I meant...

fn eval_expression(expression: &Expression) -> BoxedObject {
    match *expression {
        Expression::IntegerLiteral { value, .. } => Arc::new(value.into()),
        Expression::Boolean { value, .. } => Arc::new(value.into()),
        Expression::Empty => Arc::new(Object::NULL),
        // A `const`                  ^^^^^^^^^^^^
        _ => todo!(),
    }
}

You can still have const non-allocating versions of Object which you can construct on the spot (and then put them in an allocating Arc). There's no borrow or move problems because we're constructing the Object on the spot.

If we make this change though:

    pub static NULL: Object = Object::Null;

We can't do

        Expression::Empty => Arc::new(object::NULL),

Because we're no longer constructing the Object on the spot, we're trying to move a static.

For these trivial variants, the constructing-on-the-spot won't cost any more than allocating a stack slot the size of your enum and writing a handful of bytes (the discriminant and the value, if any). There's a good chance some of that will be optimized away even.

For the static approach you could just add a .clone(). And for these trivial cases, the clone is probably going to be as cheap as construction. But if you're only going to have these globals for the trivial cases anyway, why not just construct them on the spot?

(If you're going to be building complicated globals which allocate and the not, that could be a different story.)

2 Likes

But for statics, you can already get as many & references as you want essentially for free at any time, so there's no need for Arc.

I am struggling to wrap my head around how I can use references, since that would make me have to attempt to return a temporary value in eval_expression, when I am returning Object::Integer.

Sorry, I missed that the first time.

The suggestion of Cow is almost right, but you need to make your own since Cow unifies T and &T, not Arc<T> and &T.

pub enum ArcCow<'a, T> {
    Owned(Arc<T>),
    Borrowed(&'a T),
}

static NULL: Object = Object::Null;
static TRUE: Object = Object::Boolean(true);
static FALSE: Object = Object::Boolean(false);

pub fn eval_expression(expression: &Expression) -> ArcCow<'static, Object> {
    match expression {
        Expression::IntegerLiteral { value, .. } => {
            ArcCow::Owned(Arc::new(Object::Integer(*value)))
        }
        Expression::Boolean { value, .. } => {
            if *value {
                ArcCow::Borrowed(&TRUE)
            } else {
                ArcCow::Borrowed(&FALSE)
            }
        }
        _ => todo!(),
    }
}

The benefit of Cow is that you can use it just like it's a reference, so you can add some implementations to make that possible.

// Can't be derived because it doesn't require `T: Clone`
impl<'a, T> Clone for ArcCow<'a, T> {
    fn clone(&self) -> Self {
        match self {
            Self::Owned(arc) => Self::Owned(Arc::clone(arc)),
            Self::Borrowed(bor) => Self::Borrowed(bor),
        }
    }
}

use std::ops::Deref;
impl<'a, T> Deref for ArcCow<'a, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        match self {
            Self::Owned(arc) => &**arc,
            Self::Borrowed(bor) => bor,
        }
    }
}

Cow has a bunch more impls which should all be possible, but these are the most important ones. Also, you might want to pick a different name since it no longer provides clone-on-write.

If you want it to be more abstract, you can pre-build the statics.

pub type BoxedObject = ArcCow<'static, Object>;

static NULL: BoxedObject = ArcCow::Borrowed(&Object::Null);
static TRUE: BoxedObject = ArcCow::Borrowed(&Object::Boolean(true));
static FALSE: BoxedObject = ArcCow::Borrowed(&Object::Boolean(false));

pub fn eval_expression(expression: &Expression) -> BoxedObject {
    match expression {
        Expression::IntegerLiteral { value, .. } => {
            ArcCow::Owned(Arc::new(Object::Integer(*value)))
        }
        Expression::Boolean { value, .. } => {
            if *value {
                TRUE.clone()
            } else {
                FALSE.clone()
            }
        }
        _ => todo!(),
    }
}

If you have non-const-constructible statics then you'll need to make an unused static to hold the thing you're referencing.

If you change these from statics to consts, then you can omit the clone.

if *value {
    TRUE
} else {
    FALSE
}

With all that said, I don't think there's any value to ensuring there's only one TRUE in the process. I think the Go version is just hindered by the difficulty of keeping things on the stack. In Rust, this is the default, and you can get references to stack values easily. So you should ditch Arc and just do

const TRUE: Object = Object::Boolean(true);

If you have values that are worth keeping unique, like a long string, then you can add that into Object.

enum Object {
    UniqueString(Arc<str>)
}

Oh yeah if you don't care about Arc for the integers, you can just use a regular Cow. I just realized this is probably what you actually wanted:

pub type BoxedObject = std::borrow::Cow<'static, Object>;

const NULL: Object = Object::Null;
const TRUE: Object = Object::Boolean(true);
const FALSE: Object = Object::Boolean(false);

pub fn eval_expression(expression: &Expression) -> BoxedObject {
    match expression {
        Expression::IntegerLiteral { value, .. } => {
            BoxedObject::Owned(Object::Integer(*value))
        }
        Expression::Boolean { value, .. } => {
            if *value {
                BoxedObject::Borrowed(&TRUE)
            } else {
                BoxedObject::Borrowed(&FALSE)
            }
        }
        _ => todo!(),
    }
}

the real problem is you are trying to mimic what people do in Go language. the code snippet of OP boxed every thing, including simple types with value semantics like Boolean and Null. I guess part of the reason is the Go version uses the runtime type information of the native language to implement the dynamic types of the interpreter. but in Rust, we don't have built in "runtime type information" support in the language itself, the programmer instead often encodes the dynamic types with a sum type, a.k.a. enum. we can use trait object and dynamic dispatch, but that's often less performance due to the extra allocations for simple values (and non-idiomatic).

one other problem of reference semantic in rust is, even if you direct translate the Go code into rust using trait objects, it is still not enough because you can't compare trait objects for equality by default, you have to implement Eq on the trait object type too, which is not trivial for a new learner. for Box (and Arc), we have ptr_eq, but it doesn't behave correctly, for Integer objects, for instance.

so in summary, reference semantics for dynamic types is possible, but it's easier and more idiomatic to use value semantic. ditch the type BoxedObject = Arc<Object>, just use Object. or rather, I would suggest call it Value instead of Object. instead of trying to create "unique" OBJECTS of FALSE, TRUE, or NULL, just use Object::Boolean(true) as constant VALUE. you can even be fancier by implement PartialEq<bool> so you can compare directly to primitive bool:

impl PartialEq<bool> for Object {
	fn eq(&self, other: &bool) -> bool {
		match self {
			Object::Boolean(ref b) => *b == *other,
			_ => false,
		}
	}
}
fn test() {
	let b = Object::Integer(42);
	if b == true {
		//...
	}
}
7 Likes

In Go with interface pointer types allocation is mandatory. The author seems to replace N allocations with single global allocation which makes sense. In Rust enum doesn't allocate on its own. And you're trying to replace zero allocation with single global allocation here.

8 Likes

That's what I'm trying to avoid :slight_smile:

the Go version uses the runtime type information of the native language to implement the dynamic types of the interpreter

The Go version actually defines a Type method on the Object interface, which I excluded here since I didn't think it was relevant to my implementation.

we can use trait object and dynamic dispatch, but that's often less performance due to the extra allocations for simple values

you can't compare trait objects for equality by default, you have to implement Eq on the trait object type too, which is not trivial for a new learner.

I agree, I've had to implement Eq for dynamic trait objects before, and that, among other things, is why I'm avoiding trait objects in this implementation, and encoding the Monkey objects as a sum type instead.

In Go with interface pointer types allocation is mandatory. The author seems to replace N allocations with single global allocation which makes sense. In Rust enum doesn't allocate on its own. And you're trying to replace zero allocation with single global allocation here.

That's a good insight about Go, thanks!

With all that said, I don't think there's any value to ensuring there's only one TRUE in the process.

Makes sense!

For what it's worth, the book's author does use the unique nature of TRUE and FALSE to make a minor optimization for boolean comparisons, but I don't know if it's worth using Arc and then doing ptr_eq to replicate that.

In all, it seems like static and Arc are not worth the trouble here, at least at the stage I'm on.

That optimization is built-in to Arc.

3 Likes

I made a few programming language interpreters in the past, and they all had something like an Object data type that you have. (Example)
I found that there is a representation that is usually quite efficient for these kind of data types.

Let's assume that an object is either a boolean or a list of objects.
Then we could do it the way that you originally did:

enum Obj1 {
    Bool(bool),
    List(Vec<Arc<Obj1>>),
}

I realised that it is in most cases more desirable to do it as follows:

enum Obj2 {
    Bool(bool),
    List(Arc<Vec<Obj2>>),
}

(Notice that Arc and Vec are swapped between the two versions.)

Why is that better? Because whenever you have a list of n elements, Obj1 uses n Arc instances, whereas Obj2 uses a single Arc instance.
(There are use cases where Obj1 may actually be better, but in all my scenarios so far, Obj2 was preferable.)

Suppose, for example, that you have a list of three booleans: [true, false, true].
Representing this in Obj1 takes at least three Arcs, because each boolean is wrapped in one.
Representing this in Obj2 takes only one Arc, because the list is wrapped in one.