How to parse AS3 include directive

I have written most of a parser for the ActionScript 3 language (for Adobe AIR). However, handling the include directive requires me to instantiate a sub-parser, and that leads to a borrow that does not live long enough.

fn parse_include_directive(&mut self, context: DirectiveContext, start: Location) -> Result<(Rc<ast::Directive>, bool), ParserFailure> {
    // ...

    let mut replaced_by_source: Option<Rc<Source>> = None;

    // ...

    // If source was not resolved successfully, use a placeholder
    if replaced_by_source.is_none() {
        replaced_by_source = Some(Source::new(None, "".into(), &self.tokenizer.source.compiler_options));
    }

    // ...

    let replaced_by_source = replaced_by_source.unwrap();

    // ...

    // Parse directives from replacement source
    let replaced_by = Self::parse_include_directive_source(&replaced_by_source, context);

    // ...
}

fn parse_include_directive_source<'a: 'input>(replaced_by_source: &'a Rc<Source>, context: DirectiveContext) -> Vec<Rc<ast::Directive>> {
    let mut parser = Self::new(&replaced_by_source);
    if parser.next().is_ok() {
        parser.parse_directives(context).unwrap_or(vec![])
    } else {
        vec![]
    }
}

I'm getting a compilation error on the line let replaced_by = Self::parse_include_directive_source(&replaced_by_source, context); shown above:

error[E0597]: `replaced_by_source` does not live long enough
    --> crates\as3_parser\src\parser.rs:3321:64
     |
15   | impl<'input> Parser<'input> {
     |      ------ lifetime `'input` defined here
...
3315 |         let replaced_by_source = replaced_by_source.unwrap();
     |             ------------------ binding `replaced_by_source` declared here
...
3321 |         let replaced_by = Self::parse_include_directive_source(&replaced_by_source, context);
     |                           -------------------------------------^^^^^^^^^^^^^^^^^^^----------
     |                           |                                    |
     |                           |                                    borrowed value does not live long enough
     |                           argument requires that `replaced_by_source` is borrowed for `'input`
...
3338 |     }
     |     - `replaced_by_source` dropped here while still borrowed

I have simplified the question to this code only:

    fn parse_include_directive_source(replaced_by_source: Rc<Source>, context: DirectiveContext) -> Vec<Rc<ast::Directive>> {
        let mut parser = Self::new(&replaced_by_source);
        if parser.next().is_ok() {
            parser.parse_directives(context).unwrap_or(vec![])
        } else {
            vec![]
        }
    }

Here are the Parser structure and its new function:

pub struct Parser<'input> {
    tokenizer: Tokenizer<'input>,
    previous_token: (Token, Location),
    token: (Token, Location),
    locations: Vec<Location>,
    activations: Vec<Activation>,
}

impl<'input> Parser<'input> {
    /// Constructs a parser.
    pub fn new(source: &'input Rc<Source>) -> Self {
        Self {
            tokenizer: Tokenizer::new(source),
            previous_token: (Token::Eof, Location::with_line_and_offset(&source, 1, 0)),
            token: (Token::Eof, Location::with_line_and_offset(&source, 1, 0)),
            locations: vec![],
            activations: vec![],
        }
    }
}

I moved it to a free function outside the impl, and it worked.

Glad to see you got it resolved! The issue here looks like it was due to your <'a: 'input> bound: this means that 'a had to live at least as long as 'input, which of course it wouldn't as a local variable. I think it should have worked without the bound, and even without an explicit lifetime at all.
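One caveat, as far as I can tell: inside impl<'input> Parser<'input>, Self means Parser<'input>, so Self::new asks for a borrow that lasts for 'input even without the bound. That would also explain why moving the function out of the impl helped. Here is a minimal sketch of the difference, using hypothetical stand-in types (Source and Parser here are simplified, and count_words is just a placeholder for real parsing):

```rust
use std::rc::Rc;

// Hypothetical stand-ins for the thread's Source and Parser types.
pub struct Source {
    pub text: String,
}

pub struct Parser<'input> {
    text: &'input str,
}

impl<'input> Parser<'input> {
    pub fn new(source: &'input Rc<Source>) -> Self {
        Self { text: source.text.as_str() }
    }

    // Placeholder for real parsing: counts whitespace-separated words.
    pub fn count_words(&mut self) -> usize {
        self.text.split_whitespace().count()
    }

    // `Self::new(&included)` would not compile here: `Self` is
    // `Parser<'input>`, so the borrow of the local would have to last
    // for 'input. Writing `Parser::new` leaves the lifetime to inference,
    // so the sub-parser can borrow the local for a shorter time.
    pub fn parse_included(&mut self, included: Rc<Source>) -> usize {
        let mut sub_parser = Parser::new(&included);
        sub_parser.count_words()
    }
}
```

The sub-parser only has to return data that does not borrow from the included source (a count here, a Vec<Rc<ast::Directive>> in your case), so the short borrow is enough.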

Normally you use an Rc precisely so that you don't need to deal with lifetime issues like this, at a small performance cost (just an integer increment!). Is there any particular reason you don't have the Tokenizer/Parser own the Rc via clone()?
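A rough sketch of what I mean, assuming a simplified Source with just a text field. The Tokenizer here owns its Rc and keeps a byte offset rather than a borrowed iterator, so it needs no lifetime parameter; next_char and tokenize_temporary are hypothetical names for illustration only:

```rust
use std::rc::Rc;

// Hypothetical, simplified Source.
pub struct Source {
    pub text: String,
}

// Owns the Rc and tracks a byte offset into the text.
pub struct Tokenizer {
    source: Rc<Source>,
    offset: usize,
}

impl Tokenizer {
    pub fn new(source: &Rc<Source>) -> Self {
        Self { source: Rc::clone(source), offset: 0 }
    }

    // Returns the next character together with its byte offset.
    pub fn next_char(&mut self) -> Option<(usize, char)> {
        let ch = self.source.text[self.offset..].chars().next()?;
        let at = self.offset;
        self.offset += ch.len_utf8();
        Some((at, ch))
    }
}

// The tokenizer can be built from a local Rc without any borrow issue.
pub fn tokenize_temporary(text: &str) -> Vec<(usize, char)> {
    let source = Rc::new(Source { text: text.to_string() });
    let mut tokenizer = Tokenizer::new(&source);
    let mut out = Vec::new();
    while let Some(pair) = tokenizer.next_char() {
        out.push(pair);
    }
    out
}
```

The trade-off is that each step re-slices the string instead of advancing a stored iterator. That is cheap, but it is a different design from keeping char_indices() around.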

Sorry, I didn't show the tokenizer ^_^'. I used source.char_indices() and wanted to store it in the Tokenizer:

pub struct Tokenizer<'input> {
    pub source: Rc<Source>,
    current_line_number: usize,
    code_points: CodePointsReader<'input>,
}

impl<'input> Tokenizer<'input> {
    /// Constructs a tokenizer.
    pub fn new(source: &'input Rc<Source>) -> Self {
        let text: &'input str = source.text.as_ref();
        let source = Rc::clone(source);
        assert!(!source.already_tokenized.get(), "A Source must only be tokenized once.");
        source.already_tokenized.set(true);
        Self {
            source,
            current_line_number: 1,
            code_points: CodePointsReader::from(text),
        }
    }
}

Instead of creating a sub-parser, you might also want to consider leaving the include directive in your AST and doing a later step which converts an AST/CST into some high-level intermediate representation with include directives resolved.

That should make things a lot simpler because it means the parser doesn't need to go out and read other files.
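As a rough sketch of that later step, assuming a toy Directive enum and a pre-populated map from path to already-parsed directives (both hypothetical, not your actual AST), resolution could look like this:

```rust
use std::collections::HashMap;

// Hypothetical toy AST: the parser leaves Include nodes in place.
#[derive(Debug, Clone, PartialEq)]
pub enum Directive {
    Include(String),
    Statement(String),
}

// Second pass: splices in the directives of each included file.
// `files` maps a path to the directives parsed from that file, and
// `depth` bounds the recursion so cyclic includes terminate.
pub fn resolve_includes(
    directives: &[Directive],
    files: &HashMap<String, Vec<Directive>>,
    depth: usize,
) -> Vec<Directive> {
    let mut out = Vec::new();
    for directive in directives {
        match directive {
            Directive::Include(path) if depth > 0 => match files.get(path) {
                Some(included) => out.extend(resolve_includes(included, files, depth - 1)),
                // Unresolved include: keep the node so a later stage can report it.
                None => out.push(directive.clone()),
            },
            other => out.push(other.clone()),
        }
    }
    out
}
```

Each file is parsed by its own ordinary Parser, so no parser has to create another one while it is still borrowing its input.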

Actually, that raises a good point I missed: include might strictly have to operate at the Tokenizer level if it's a pure textual include, e.g. files containing only half of a class being stitched together.

How it works is pretty ambiguous in the AS3 spec:

An IncludeDirective may be used where ever a Directive may be used.

include “reusable.as”

Semantics

An IncludeDirective results at compile-time in the replacement of the text of the IncludeDirective
with the content of the stream specified by String.

And that's it. So that certainly sounds like it should be able to include partial syntax, but then there's no reason for it to be restricted to the top level of a file (roughly what "directive" seems to mean).

ActionScript 3 allows directives to be mixed in between statements (they behave like statements). I am not implementing a compliant IncludeDirective because I don't think developers used that directive much. In the ActionScript Compiler 2 (ASC2) it did indeed concatenate text, which lost the original source lines and columns.

Well I don't think either the standard or the compiler was written by people with much experience with such things, so there's plenty of slop already for what's "correct".

I distinctly remember figuring out that array[index++] += x; incremented index twice, for example.

As such... well: better you than me I suppose 😅
