"Nested" lifetimes in structs? Is this possible to express?

I'm trying to build a parser for a language where files can include other files. I won't go into the details of the language, because the problem I'm describing is pretty generic. Specifically, instead of copying bytes out of the file/input it is parsing, my parser keeps references to byte slices, so we end up with a (tremendously simplified) struct like:

```rust
struct ASTNode<'src> {
    // Actually a tree of nodes, but pretend we have only one thing right now.
    attr: &'src [u8],
}

// Used like:
let contents: Vec<u8> = std::fs::read(path)?; // read from a file; `path` is a placeholder
let ast = Parser::parse(&contents); // `ast` now refers to bytes owned by `contents`
```
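For reference, a minimal runnable version of that single-file case. The parse function here is a hypothetical stand-in for Parser::parse that just takes the first line as attr; the point is only that the returned node borrows from contents and cannot outlive it.

```rust
/// Simplified node that borrows its data from the source buffer (no copying).
struct ASTNode<'src> {
    attr: &'src [u8],
}

/// Hypothetical stand-in for Parser::parse: uses the first line as `attr`.
fn parse(src: &[u8]) -> ASTNode<'_> {
    let attr = src.split(|&b| b == b'\n').next().unwrap_or(&[]);
    ASTNode { attr }
}

fn main() {
    // Stands in for std::fs::read(...).
    let contents: Vec<u8> = b"hello\nworld".to_vec();
    let ast = parse(&contents);
    assert_eq!(ast.attr, &b"hello"[..]);
    // drop(contents); // would not compile here: `ast` still borrows `contents`
    println!("{}", String::from_utf8_lossy(ast.attr));
}
```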

Now, I'm confused about how to support these include statements. Every time the parser encounters one, it creates a sub-parser and gets back an AST that refers to bytes owned by a vector created inside parse(). Setting aside how to make that vector live long enough (which is more of an API design problem than a Rust question), I'm very confused about how to express that nesting in the AST. The straightforward but invalid way would be:

```rust
struct ASTNode<'src> {
    attr: &'src [u8],
    // Does not compile: 'other_src is undeclared, and it would really have to be a
    // different lifetime for each entry, since each include borrows from its own byte vector.
    includes: Vec<ASTNode<'other_src>>,
}
```

Is there a good way out of this?
One option I can think of is an arena with a fixed lifetime: every file read is allocated in that arena (perhaps just a giant Vec that I keep appending each file's bytes to), and all ASTNodes are bound by the arena's lifetime. Is this the only reasonable approach?
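A minimal std-only sketch of the arena idea, assuming a toy include syntax. SourceArena, Link, parse, and the files map (a stand-in for reading from disk) are all hypothetical. A single growing Vec<u8> is awkward here, because appending needs &mut access (and may reallocate) while the AST still borrows earlier bytes. Instead, each file gets its own allocation, and OnceCell lets the arena append through a shared reference, so every node can use the arena's one lifetime. Crates such as typed-arena or bumpalo do this more efficiently.

```rust
use std::cell::OnceCell;
use std::collections::HashMap;

// Hypothetical append-only arena: one linked-list link per loaded file.
// OnceCell lets `alloc` append through `&self`, so every slice it returns
// borrows from the arena for the arena's whole lifetime, without `unsafe`.
#[derive(Default)]
struct SourceArena {
    head: OnceCell<Box<Link>>,
}

struct Link {
    bytes: Vec<u8>,
    next: OnceCell<Box<Link>>,
}

impl SourceArena {
    /// Moves `bytes` into the arena and returns a slice that lives as long as the arena.
    fn alloc(&self, bytes: Vec<u8>) -> &[u8] {
        // Walk to the first empty slot: O(files loaded), fine for a handful of includes.
        let mut slot = &self.head;
        while let Some(link) = slot.get() {
            slot = &link.next;
        }
        let filled = slot.set(Box::new(Link { bytes, next: OnceCell::new() }));
        assert!(filled.is_ok(), "slot was checked to be empty");
        &slot.get().unwrap().bytes
    }
}

// Every node, including nodes from included files, shares the arena's lifetime.
struct ASTNode<'src> {
    attr: &'src [u8],
    includes: Vec<ASTNode<'src>>,
}

// Toy format (hypothetical): `include NAME` lines pull in another file and the
// first other line becomes `attr`. `files` stands in for the file system.
fn parse<'src>(arena: &'src SourceArena, files: &HashMap<&str, &str>, name: &str) -> ASTNode<'src> {
    let src = arena.alloc(files[name].as_bytes().to_vec());
    let mut attr: &'src [u8] = &[];
    let mut includes = Vec::new();
    for line in src.split(|&b| b == b'\n') {
        if let Some(target) = line.strip_prefix(b"include ") {
            let target = std::str::from_utf8(target).expect("include name is UTF-8");
            includes.push(parse(arena, files, target));
        } else if attr.is_empty() {
            attr = line;
        }
    }
    ASTNode { attr, includes }
}

fn main() {
    let files = HashMap::from([
        ("main.src", "root\ninclude a.src\ninclude b.src"),
        ("a.src", "alpha"),
        ("b.src", "beta\ninclude a.src"),
    ]);
    let arena = SourceArena::default();
    let ast = parse(&arena, &files, "main.src");
    assert_eq!(ast.attr, &b"root"[..]);
    assert_eq!(ast.includes[1].includes[0].attr, &b"alpha"[..]);
}
```

Because every file's bytes stay alive until the arena is dropped, the whole tree can use a single 'src, and the per-include lifetime problem goes away.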

Thanks!

The arena approach is a reasonable one; another is to store a Rc<Vec<u8>> of the source file in each AST node. That will keep the entire source file in memory as long as an AST node needs to reference it.
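A sketch of that alternative, using Rc<[u8]> (the variant discussed further down) rather than Rc<Vec<u8>>; either would work. Each node holds a shared handle to its file and stores spans as byte ranges, so ASTNode needs no lifetime parameter at all. The toy include syntax, parse, and the files map (a stand-in for the file system) are hypothetical.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::rc::Rc;

// Hypothetical node with no lifetime parameter: it shares ownership of its
// file's bytes and stores spans as byte ranges into them.
struct ASTNode {
    src: Rc<[u8]>,
    attr: Range<usize>,
    includes: Vec<ASTNode>,
}

impl ASTNode {
    fn attr(&self) -> &[u8] {
        &self.src[self.attr.clone()]
    }
}

// Toy format (hypothetical): `include NAME` lines pull in another file and the
// first other line becomes `attr`.
fn parse(files: &HashMap<&str, &str>, name: &str) -> ASTNode {
    let src: Rc<[u8]> = Rc::from(files[name].as_bytes());
    let mut attr = 0..0;
    let mut includes = Vec::new();
    let mut start = 0;
    for line in src.split(|&b| b == b'\n') {
        let range = start..start + line.len();
        start = range.end + 1; // skip the '\n'
        if let Some(target) = line.strip_prefix(b"include ") {
            let target = std::str::from_utf8(target).expect("include name is UTF-8");
            includes.push(parse(files, target));
        } else if attr.is_empty() {
            attr = range;
        }
    }
    ASTNode { src, attr, includes }
}

fn main() {
    let files = HashMap::from([("main.src", "root\ninclude a.src"), ("a.src", "alpha")]);
    let ast = parse(&files, "main.src");
    assert_eq!(ast.attr(), &b"root"[..]);
    assert_eq!(ast.includes[0].attr(), &b"alpha"[..]);
}
```

In a real tree, every node from the same file would hold its own Rc::clone of the source, which only bumps a reference count.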

Is there any advantage of Rc<Vec<u8>> over Rc<[u8]>?

Rc::make_mut() can be used to implement simple persistent data structures efficiently, but I don't think that applies to OP's use case.
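A small illustration of the copy-on-write behaviour that makes Rc::make_mut useful for persistent structures: if other strong handles exist, it clones the inner Vec before handing out &mut, otherwise it mutates in place.

```rust
use std::rc::Rc;

fn main() {
    let original = Rc::new(vec![1u8, 2, 3]);
    let mut edited = Rc::clone(&original);

    // `edited` is shared, so make_mut clones the Vec before handing out `&mut`.
    Rc::make_mut(&mut edited).push(4);
    assert_eq!(*original, vec![1, 2, 3]);
    assert_eq!(*edited, vec![1, 2, 3, 4]);
    assert!(!Rc::ptr_eq(&original, &edited));

    // `edited` is now the only handle, so make_mut mutates in place without cloning.
    let before = Rc::as_ptr(&edited);
    Rc::make_mut(&mut edited).push(5);
    assert_eq!(Rc::as_ptr(&edited), before);
}
```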

Using Rc<Vec<u8>> introduces two levels of indirection: you dereference the Rc to reach the Vec, then dereference the Vec's pointer field to reach the bytes. An Rc<[u8]> stores the bytes inline, in the same heap allocation as the reference counts, so the compiler can compute a pointer to the bytes directly from the Rc's value on the stack.
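A quick way to see the layout difference: Rc<[u8]> is a fat pointer (address plus length) whose target holds the bytes directly, while Rc<Vec<u8>> is a thin pointer to a block holding the counts and a Vec header. Note that converting a Vec<u8> into an Rc<[u8]> copies the bytes into a fresh allocation that also holds the counts.

```rust
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    // Fat pointer: address + length. The bytes follow the counts in one allocation.
    assert_eq!(size_of::<Rc<[u8]>>(), 2 * size_of::<usize>());
    // Thin pointer to (counts, Vec header); the bytes are one more hop away.
    assert_eq!(size_of::<Rc<Vec<u8>>>(), size_of::<usize>());

    let bytes: Vec<u8> = b"source text".to_vec();
    // Copies the bytes into a new allocation that also stores the reference counts.
    let shared: Rc<[u8]> = Rc::from(bytes);
    let node_ref = Rc::clone(&shared); // cheap: only bumps the strong count
    assert_eq!(&node_ref[..6], b"source");
    assert_eq!(Rc::strong_count(&shared), 2);
}
```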


Probably not. My brain just doesn't like putting unsized types anywhere but a Box; the compiler is fine with it.

The only two things a Rc<Vec<u8>> allows you to do that a Rc<[u8]> doesn't are:

  1. Produce a &Vec<u8> to the data.
  2. If there is only one handle to the vector, then you can change its length with Rc::get_mut.

Note that the first is only useful if you are dealing with bad libraries that take an &Vec<u8> instead of an &[u8], which never makes sense to do.
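A minimal illustration of the second point: Rc::get_mut only succeeds while there is a single handle, and only the Vec version can change its length; a unique Rc<[u8]> can be edited in place but not resized.

```rust
use std::rc::Rc;

fn main() {
    let mut buf = Rc::new(vec![1u8, 2, 3]);

    // Only one handle exists, so get_mut returns `&mut Vec<u8>` and the length can change.
    Rc::get_mut(&mut buf).expect("unique handle").push(4);
    assert_eq!(buf.len(), 4);

    // With a second handle alive, get_mut returns None.
    let other = Rc::clone(&buf);
    assert!(Rc::get_mut(&mut buf).is_none());
    drop(other);

    // A unique Rc<[u8]> can also be mutated, but only element-wise:
    // a `&mut [u8]` cannot grow or shrink.
    let mut slice: Rc<[u8]> = Rc::from(vec![1u8, 2, 3]);
    Rc::get_mut(&mut slice).expect("unique handle")[0] = 9;
    assert_eq!(&slice[..], &[9u8, 2, 3][..]);
}
```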

