How to convert unstructured tree into structured / recursive data?

This is a pretty general question, but hopefully someone here can have some guidance. For context, the problem I'm trying to solve is building an AST from pulldown-cmark. I will present the problem in a simplified manner.

I have a tree of Tags. A Tag is similar to the following enum:

enum Tag {
    Paragraph,
    List,
    Strikethrough
}

I also have a Tree that is built from these tags.

For reference, let's say the Tree looks like this:

struct TreeNode {
    children: Vec<TreeNode>
    data: Tag
}

This is the actual tree. The actual tree has a different format with a single child field, and each node has a next field to go to a sibling.

The issue is, the tree does not express child / parent constraints well. For example, a node with a data field set to Paragraph could have a child node with the data field set to List even though that wouldn't make sense.

I want to turn this tree into a strongly typed recursive data structure that has these constraints built in.

For example:

enum Block {
    List(Vec<Block>),
    Blockquote(Vec<Block>),
    Paragraph(Vec<Inline>),
}

enum Inline {
    Strikethrough(Vec<Inline>),
     Italic(Vec<Inline>),
     Link((String, String)),
    Text,
}

However, I'm not sure how to do this. Every idea I have seems like it involves a ton of boilerplate b/c I need to do something of the form "If the current node is X, then the only acceptable child nodes are Y & Z" and this seems like a ton of manual labor.

Does anyone have thoughts?

1 Like

This is roughly what I do in grafiea:

fn walk_block(node: &TreeNode, block: &mut Vec<Block>) {
   match node.tag {
     // add to block directly
     // or recurse into walk_block
     Tag::List => {
       let inner_block = Vec::new();
       for n in node.children {
         walk_block(n, &mut inner_block);
       }
       block.push(Block::List(inner_block));
     }
     // or call walk_inline
   }
}
fn walk_inline(node: &TreeNode, block: &mut Vec<Inline>) {
...
}

I prefer to pass a &mut Vec to the function instead of the function returning a Vec because it adds the flexibility of pushing any number of items to the list.

You could generate the conversion TreeNode -> AST with macros: here is a rough example: Rust Playground
There is the cost of writing the macro, but after that the code is pretty clean:

enum Tag {
    Block,
    List,
    Blockquote,
    Paragraph,
    Inline,
}

struct TreeNode {
    children: Vec<TreeNode>,
    data: Tag,
}

enum ConversionError {
    NoEnoughChildren,
    IncorrectTag(Tag),
}

#[convert_ast]
enum Block {
    List(Vec<Block>),
    Blockquote(Vec<Block>),
    Paragraph(Vec<Inline>),
}

#[convert_ast]
enum Inline {}

// etc...

Of course maybe generating the conversion is not so simple...