How to store and get a closure from a HashMap in an idiomatic way?


#1

Hi,

I still try to convert the the-super-tiny-compiler from JS to Rust as a learning exercise.

In this compiler you have a “visitor object” which looks like this (in JS):

{
  NumberLiteral: {
    enter(node, parent) {
      // do stuff...
    },
    exit(node, parent) {
      // do stuff...
    }
  },
  StringLiteral: {
    enter(node, parent) {
      // do stuff...
    }
  },
  // maybe more visitors for other node types
}

NumberLiteral and StringLiteral are different types of nodes. The parent param is optional. The enter and exit methods are optional, too.

This object is used like this:

// `node.type` could be `'NumberLiteral'`
let methods = visitor[node.type];

if (methods && methods.enter) {
  methods.enter(node, parent);
}

Here is what I think I need to do to achieve the same in Rust:

I need a HashMap which uses a NodeType enum as key and as values a Visitor struct with enter and exit fields of a type like Option<Box<Fn(Node, Option<Node>)>> (e.g. enter and exit are “optional” - they can be a function which accept a node and optionally a parent node). The Box is there because Fn will be a closure (similar to this example I guess).

However I struggle with implementing this. I already have something like NodeType called Node which is an enum which has associated data. I’d like to reuse this, but I don’t know how to use a certain type of Node as a key in the HashMap without worrying about the associated data. I guess it is not possible, but I’m not sure what is this best way to solve this.

Here is my example. This is the line which currently stops the compiler, because I don’t write { /* fields */ } for Node::Programm (which I can’t know exactly).

If it would compile, than $ cargo test --test test -- --nocapture should print test works!.

I think I have to either not use Node directly, but introduce a NodeType without associated which is also used by Node or don’t use HashMap at all, but a completely different solution?


#2

I think your NodeType is the right idea, and you can have a simple mapping function like

impl<'a> From<&'a Node> for NodeType {
    fn from(node: &Node) -> NodeType {
        match node {
            &Node::Program { .. } => NodeType::Program,
            /* etc. */
        }
    }
}

Or a simpler Node::type(&self) -> NodeType if you don’t want to use From.

Since there are limited types, it might be nicer to use a Vec instead of a HashMap.

Taking a step way back, why not just have a trait Visitor and implement this for Node? If you need to pass these around in a generic way, you just use trait objects, &Visitor or Box<Visitor>.


#3

Thank you so far. I think I’ll try something like Node::type(&self) -> NodeType for now.

So Node is a struct here, right? And type a method returning the associated NodeType? Every Node can have different associated data depending on the NodeType. I think I have trouble expressing this.

In TypeScript I would use a union type which could look like this:

interface Programm {
  body: Array<Node>
}

type Node =
     { type: 'Programm', data: Programm }
   | { type: 'StringLiteral', data: string }
// | other nodes...

I’d write it that way in Rust:


pub enum NodeType {
    Programm,
    CallExpression,
    StringLiteral,
    NumberLiteral,
}

pub struct Programm {
    body: Vec<Node>,
}

pub struct CallExpression {
    name: String,
    params: Vec<Node>,
}

#[derive(Debug,PartialEq,Eq,Hash)]
pub enum Node {
    Programm {
        nodeType: NodeType::Programm,
        data: Programm,
    },
    CallExpression {
        nodeType: NodeType::CallExpression,
        data: CallExpression,
    },
    StringLiteral {
        nodeType: NodeType::StringLiteral,
        data: String,
    },
    NumberLiteral {
        nodeType: NodeType::NumberLiteral,
        data: String,
    },
}

But this doesn’t work, because found value NodeType::Programm used as a type for nodeType: NodeType::Programm.

Since there are limited types, it might be nicer to use a Vec instead of a HashMap.

Do you mean saving all visitors in a Vec and filter for the visitor I need?

Taking a step way back, why not just have a trait Visitor and implement this for Node?

I’m not sure if I need more than one visitor for a given node type in the future. I think the idea behind the given compiler is to run the traverse and transform step with different visitors on the “same” ast, so you could write plugins and custom transform steps.

When I have a trait Visitor and implement this for Node I can only have one visitor for a node type, right?


#4

It doesn’t have to be a struct. You can implement a type method on the enum you had originally.

Right, each variant is not a type in itself. But a Node::Programm is always a NodeType::Programm, so you don’t need a field for this anyway. That’s what I was trying to show in the match node {...} above, that you can just match each Node variant to convert to the corresponding NodeType.

Not necessarily filter - you can set up the node type to represent the integral Vec index, just the same as it was the index of the HashMap.

Hmm, right, it needs to be structured for the kind of visitor. There’s actually a demo visitor pattern here:


#5

Nice. Haven’t thought of that.

I figured out how to implement that. Thank you a lot for your help.

Now I have two other problems (one related to this issue, the other one more general):

1. no field body on type Node (see here) → I can now get the NodeType of a Node with node.get_type(), but how can I do it the other way around? :smiley: I know node is a Node::Programm here. How can I force Rust into knowing that? I can’t do something like (node as Node::Programm).body, because Node::Programm is a value and not a type.

I can only think of rewriting this:

NodeType::Programm => traverse_nodes(node.body, &Some(node)),

Into this:

            NodeType::Programm => {
                match node {
                    Node::Programm { body } => traverse_nodes(body, &Some(node)),
                    _ => (),
                }
            }

This works, but it is very verbose. Can I simplify that, because I know the value of Node and maybe don’t need the match?

2. traverse_node unresolved name (see here) → It looks like a closure can call another closure only if it was declared before in Rust. What is the recommended pattern in Rust to solve this? Both closures need to call each other, so I can’t just switch the order.

Thank you so far!


#6

I think in this case, the outer match on NodeType is unnecessary, not really telling you anything you couldn’t get from Node directly. Just use the inner match on the node itself.

That’s mutual recursion, and recursive closures in Rust are kind of hard.

Plain old functions might be ok for you, passing captured locals as parameters instead. You can still nest the functions in the same place you currently have the closures for scoping their visibility.

Or similarly, you can “desugar” the closures yourself. Create a local struct type with whatever captured data you need like the map, implement traverse_node and traverse_nodes as methods on that struct, and recurse to your heart’s content. :slight_smile:


#7

You’re right good catch. I thought in a too complicated way. Thank you for your review and your time.

I try that now… have to fight a little bit with moved values now :frowning:

That is an interesting solution. Has this any downsides? Maybe that would be even more readable. I’ll try that, too.