Lifetime Issues with Allocator and Parser Struct Design

I am very new to Rust. After watching a few courses, I tried to write some of the code I needed, but it seems I've run into a problem with lifetimes, and I can't solve it.

I am using the OXC library to generate output from a JS file.
This is the code that ChatGPT provided me. However, it clearly includes some dynamic parts, and I feel the key part of it is incorrect, even though it works.

use oxc::allocator::Allocator;
use oxc::ast::ast::Program;
use oxc::parser::Parser;
use oxc::span::SourceType;
use std::fs;

pub fn source_to_ast(file_path: &str) -> Result<Program<'_>, String> {
    if !std::path::Path::new(file_path).exists() {
        return Err(format!("File does not exist: {}", file_path));
    }

    let js_code = Box::leak(
        fs::read_to_string(file_path)
            .map_err(|e| format!("Failed to read the JavaScript file: {}", e))?
            .into_boxed_str(),
    );

    let allocator = Box::leak(Box::new(Allocator::default()));
    let source_type = SourceType::from_path(file_path)
        .map_err(|e| format!("Failed to determine source type: {:?}", e))?;

    let parser = Parser::new(allocator, js_code, source_type);
    let parse_result = parser.parse();

    Ok(parse_result.program)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::path::Path;

    #[test]
    fn test_parse_and_display_ast() {
        let test_file_path = "test_assets/test.js";

        assert!(
            Path::new(test_file_path).exists(),
            "Test file does not exist"
        );

        match source_to_ast(test_file_path) {
            Ok(ast) => println!("{:#?}", ast),
            Err(e) => eprintln!("Error: {}", e),
        }
    }
}

As you can see, it has used Box::leak multiple times, which I think is because I didn't provide a struct to store the data for persistence. It seems this was done to ensure the data lives long enough.

Assuming I use let allocator = Allocator::default();, unfortunately, I get the following error:

rust-analyzer: expected &Allocator, found Allocator
And when I use & , for example like this:

let allocator = Allocator::default();
let source_type = SourceType::from_path(file_path)
    .map_err(|e| format!("Failed to determine source type: {:?}", e))?;

let parser = Parser::new(&allocator, js_code, source_type);
let parse_result = parser.parse();

Ok(parse_result.program)

I get the following error on the last line:

rustc: cannot return value referencing local variable 'allocator' returns a value referencing data owned by the current function

I tried creating a struct to store the output, like the example below, but I still encountered lifetime-related errors and the same issues mentioned earlier:

pub struct ParseResult<'a> {
    pub program: Program<'a>,
    pub allocator: &'a Allocator,
}

I think I haven’t fully understood the core concept of this part. I hope I’ve managed to convey the steps I’ve taken so far.

Thank you in advance for your help! If you have any suggestions for the correct way to write this code, I’d really appreciate it.

Here’s the dependency I installed:

oxc = { version = "*", features = ["full"] }

In a sense, yes. It was done because the function signature you’ve asked for is otherwise impossible:

pub fn source_to_ast(file_path: &str) -> Result<Program<'_>, String> {

The Program type wants to borrow data from the source code in addition to everything else involved. Therefore, the source code you read from disk has to be around to be borrowed from as long as you use the Program.

You have to structure your program differently; you cannot package up “read file and return Program” into one function. You can write a function that does something with the Program but doesn’t return it, or you can write a function that takes &'a str for the source code (and &'a Allocator too) and parses the code (but that function would be largely equivalent to Parser::parse() anyway).
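The first option (a function that does something with the Program but doesn't return it) could be sketched roughly like this. This is only an illustration using the standard library: `Words`, `parse_words`, and `with_parsed` are hypothetical stand-ins, with `Words<'a>` playing the role of oxc's `Program<'a>` by borrowing from the source text.

```rust
// Hypothetical stand-in for a parse result that borrows from the source,
// the way oxc's `Program<'a>` borrows from the source text and allocator.
struct Words<'a> {
    items: Vec<&'a str>,
}

// Hypothetical stand-in for the parser: every item points into `source`.
fn parse_words(source: &str) -> Words<'_> {
    Words {
        items: source.split_whitespace().collect(),
    }
}

// Owns the source for the duration of the call and lends the parsed
// value to the closure. Only the closure's owned result `R` escapes,
// so nothing that borrows from `source` outlives this function.
fn with_parsed<R>(source: String, f: impl FnOnce(&Words<'_>) -> R) -> R {
    let words = parse_words(&source);
    f(&words)
}
```

A caller would write something like `with_parsed(text, |w| w.items.len())`: the borrowed value is usable inside the closure, and whatever the closure returns has to be owned data.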

If you really, really have to, you can make a struct like your struct ParseResult by using ouroboros to define a “self-referential struct” that includes the source code and not just the allocator, but you should consider this a last resort because libraries like ouroboros are doing something the language doesn't intend to support at all, so while they try to provide a safe interface to self-referentiality, they often fail to do it exactly right.[1]


  1. The risk of this is largely that they might let you write an unsound (segfaulting, etc.) program by accident. So it's not too bad when used carefully.


Your struct would need to actually own the allocator, but then it would be a self referential struct, which are best to avoid.

If you just need the return value to live for the duration of the caller, you could try taking a mutable reference to a "state struct", and putting the allocator in there:

struct State {
    allocator: Option<Allocator>,
    js_code: Option<String>,
}

pub fn source_to_ast<'a>(file_path: &str, state: &'a mut State) -> Result<Program<'a>, String> {
    if !std::path::Path::new(file_path).exists() {
        return Err(format!("File does not exist: {}", file_path));
    }

    let js_code = fs::read_to_string(file_path)
        .map_err(|e| format!("Failed to read the JavaScript file: {}", e))?;
    let allocator = Allocator::default();

    // vvvv
    state.allocator = Some(allocator);
    state.js_code = Some(js_code);
    let allocator = state.allocator.as_ref().unwrap();
    let js_code = state.js_code.as_ref().unwrap();
    // ^^^^

    let source_type = SourceType::from_path(file_path)
        .map_err(|e| format!("Failed to determine source type: {:?}", e))?;

    let parser = Parser::new(allocator, js_code, source_type);
    let parse_result = parser.parse();

    Ok(parse_result.program)
}

But if you try to keep passing around the program, you'll run into the same issues. In that case, the easiest thing you can do is just leave the code with those leaks.


Thank you both, @Kyllingene and @kpreid.
I am curious to know: is using let allocator = Box::leak(Box::new(Allocator::default())); safe and efficient?

My final goal is to use the https://docs.rs/crate/rustler/latest lib to send the AST to Elixir, and to send the changed AST from Elixir back to Rust.

I do not know what I should do. For example, should I serialize parse_result.program to JSON, or do you have another suggestion?

It is safe and efficient, but leaks memory. It is fine if you only do it once but if you do it repeatedly over the life of your program, your program’s memory usage will grow, which is usually undesirable.
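To make the "leaks memory" part concrete, here is a minimal sketch using only the standard library. `leak_source` is a hypothetical helper, not something from oxc.

```rust
// Each call leaks one heap allocation. The returned reference is
// `'static` precisely because nothing will ever free the string.
fn leak_source(source: String) -> &'static str {
    Box::leak(source.into_boxed_str())
}
```

Calling this once at startup costs one allocation that is never reclaimed. Calling it for every file or request adds another unreclaimed allocation each time.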

You should organize your code in a way that the Program can borrow the source code.

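fn do_the_thing(file_path: &str) -> Result<(), String> {
    // All three of these live until the end of this function.
    let allocator = Allocator::default();
    let source_code: String = fs::read_to_string(file_path)
        .map_err(|e| format!("Failed to read the JavaScript file: {}", e))?;
    let source_type = SourceType::from_path(file_path)
        .map_err(|e| format!("Failed to determine source type: {:?}", e))?;

    // `program` borrows from `allocator` and `source_code`.
    let program: Program<'_> = Parser::new(&allocator, &source_code, source_type).parse().program;

    // Now do whatever it is you want to do with the `Program`,
    // instead of returning it.
    Ok(())
}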

Or, use a different parsing library that does not insist on this borrowing.


Thank you so much! Some of the concepts are clearer to me now regarding how they work.

Sorry to bother you again, but I have a few more questions. You mentioned that if I organize the program and a user calls it once, there won’t be any issues.

Currently, I only have a single lib.rs file where I have included the following:

pub mod ast;

This is the same code snippet I shared in my first post.

In Rust, doesn’t a function’s lifetime end after its execution?

pub fn source_to_ast(file_path: &str) -> Result<(), String> {
    ...
}

What exactly do you mean by "repetition"? For instance, if a user clicks a button on a website, I call this function each time. My background is in dynamic languages, where the language itself often handles such cases, so this has been a bit confusing for me.

Here’s my next question:

pub fn source_to_ast(file_path: &str) -> Result<(), String> {
    ...
    // Convert to JSON
}

I convert Program into JSON and return it as the last line of the function. Will this still cause high memory consumption?

Finally, how can I verify if I’m doing something wrong? For example, how can I check whether memory usage is increasing abnormally?

Thanks a lot!

I said that you can use Box::leak() in a situation where the code that uses it is called only once in the life of the program. When I talked about reorganizing the code, I meant in a way that allowed you to avoid using Box::leak().

Yes, that. A web server is an example of a program where you should definitely not use Box::leak() inside of code that is run for each request. It’s okay (but not ideal) to use Box::leak() in code that runs only once when the server starts up. In your case, you must not use Box::leak(), since you are trying to do things per-request which should be cleaned up after the request has been processed.

In “dynamic languages”, things generally act sort of like they are wrapped in Arc. The problems you are having are because oxc insists on using & borrows internally, which do not act like Arc and cannot be used in the same patterns.
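As a rough illustration of the Arc-like style (this is not how oxc works), here is a standard-library sketch in which the result shares ownership of the source instead of borrowing it. `SharedDoc` and `load_doc` are hypothetical names.

```rust
use std::sync::Arc;

// Hypothetical result type that co-owns the source through reference
// counting, so it carries no lifetime parameter and can be returned freely.
struct SharedDoc {
    source: Arc<str>,
    word_count: usize,
}

fn load_doc(text: &str) -> SharedDoc {
    // Copies the text into a reference-counted allocation.
    let source: Arc<str> = Arc::from(text);
    let word_count = source.split_whitespace().count();
    SharedDoc { source, word_count }
}
```

The source stays alive as long as any `Arc` pointing to it exists and is freed when the last one is dropped, which is close to what a garbage-collected language does for you.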

“High” memory consumption is not the problem here. Continuously increasing memory consumption — a memory leak — is the problem, because it means your server will eventually run out of memory and crash. To avoid this problem, refrain from using Box::leak().

Use any ordinary process monitoring (aka “task manager”) software to watch the memory usage of your server, while you send it lots and lots of requests (using an automated load-test tool, not manual; clicking the button manually would take far too long to get good results). The memory usage will grow after startup but should eventually reach some approximately constant level. If it never levels off after thousands of requests, you have a memory leak.

It is possible to use an instrumented memory allocator, like allocation-counter, to write automated tests that your code does not leak. However, accidental memory leaks are not too common, so it probably isn’t necessary to go to this length. Simply do not use Box::leak(), which says right in the name that it leaks, and things will be fine 99% of the time.


The lifetime annotations point to where the data has been stored.

If you have <'a> on a struct, it means that this struct does not store any data marked by 'a, and is completely forbidden from keeping it. Rust references are not for storing data by reference, they are temporary views into data that must have already been stored somewhere else.

fn function(file_path: &str) -> Result<Program<'_>, String>

uses default lifetime rules, which are:

fn function<'data_from_the_argument>(
   file_path: &'data_from_the_argument str) 
   -> Result<Program<'data_from_the_argument>, String>

which means that all of the references that exist in Program can only use data contained in the &str, and nothing else. The function says that this is the only data source the returned type is allowed to reference (literally just the letters of the file path, not even file's content!).
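The same elision rule can be seen in a small standard-library example. `first_word` and `first_word_explicit` are hypothetical functions, and their two signatures mean exactly the same thing to the compiler.

```rust
// Elided form: with a single reference argument, the compiler ties the
// lifetime of the returned reference to that argument.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

// The same signature with the lifetime written out explicitly.
fn first_word_explicit<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}
```

Here the rule gives the right answer, because the result really is a slice of the argument. In source_to_ast the only reference argument is the file path, so the elided lifetime ends up describing the wrong data.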

This obviously isn't what you want. My tip for learning Rust is: never ever put any references inside structs. References are great as function arguments and some code inside functions, but in structs they don't do what most people assume, and will cause you endless struggle with Rust.


oxc was carefully written to be a much higher-performance library than what was already available, so it's trading off ease of use here, though borrowing from your parse input is a pretty good idea generally.

You could pull some small tricks to convince Rust that you're definitely keeping the parsing context around, such as keeping a shared dashmap (or similar) at a higher level and moving your source into it before each parse (a concurrent collection lets you add items while it has existing borrows). But that's probably not worth it compared to just finishing all your processing before everything gets dropped together at the end of the same function, as mentioned earlier.

That processing might be simply building an owned version of whatever information you care about, though if it's the entire source tree you're probably in for a bad time and should consider another parser.
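A minimal sketch of "building an owned version", again with standard-library stand-ins rather than oxc itself: `Ast`, `parse`, `Summary`, and `summarize` are all hypothetical, and the "parser" just picks out identifier-like tokens.

```rust
// Hypothetical borrowed parse result: every entry points into the source.
struct Ast<'a> {
    identifiers: Vec<&'a str>,
}

// Hypothetical stand-in parser: keeps tokens that start with a letter or `_`.
fn parse(source: &str) -> Ast<'_> {
    let identifiers = source
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| t.chars().next().map_or(false, |c| c.is_alphabetic() || c == '_'))
        .collect();
    Ast { identifiers }
}

// Owned result: holds `String`s, so it has no lifetime parameter.
#[derive(Debug, PartialEq)]
struct Summary {
    identifiers: Vec<String>,
}

// The source and the borrowed `Ast` are both dropped when this function
// returns; only the copied, owned data leaves it.
fn summarize(source: String) -> Summary {
    let ast = parse(&source);
    Summary {
        identifiers: ast.identifiers.iter().map(|s| s.to_string()).collect(),
    }
}
```

The copy is where the cost comes in: this is cheap for a handful of facts, but copying an entire tree this way means writing and maintaining an owned mirror of every node type.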


I'm so glad I asked this question here. There's a wealth of discussion and learning in this post. Thank you all!

The reason I want to return the data at the end is because I'm using the library rustler, so I can access it in Elixir.

For example:

fn source_to_parser<'a>(
    path: &str,
    source_text: &'a str,
    allocator: &'a Allocator,
) -> oxc::parser::ParserReturn<'a> {
    let path = Path::new(path);
    let source_type = SourceType::from_path(path).unwrap();

    let ret = Parser::new(allocator, source_text, source_type)
        .with_options(ParseOptions {
            parse_regular_expression: true,
            ..ParseOptions::default()
        })
        .parse();

    ret
}

I don't know how this should look when it's converted into a #[rustler::nif]. Specifically, I wonder how Elixir would handle creating something like an Allocator. If this were all in Rust, it would be as simple as doing the following in my program:

fn main() -> Result<(), String> {
    let allocator = Allocator::default();
    let source_text = fs::read_to_string("./test.js").unwrap();
    let parser_return = source_to_parser("./test.js", &source_text, &allocator);
    println!("{:?}", parser_return.program);

    Ok(())
}

You could use something like

static CONTEXTS: LazyLock<DashMap<Id, Context>> = ...

To get a static that you can reborrow from each call back into Rust, with some id you put in your nif object, but you should be careful about removing from the map or it's just a more complicated leak.
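A simplified sketch of that registry idea, using only the standard library: it swaps DashMap for a `Mutex<HashMap>` and uses a plain `u64` as the id. `Context`, `create_context`, `with_context`, and `remove_context` are hypothetical names, and the context here holds only the source text rather than an allocator.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{LazyLock, Mutex};

// Hypothetical per-handle state: here just the owned source text.
struct Context {
    source: String,
}

// Global registry of contexts, keyed by an id handed out to the caller.
static CONTEXTS: LazyLock<Mutex<HashMap<u64, Context>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

// Stores the source and returns the id the other side should keep.
fn create_context(source: String) -> u64 {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    CONTEXTS.lock().unwrap().insert(id, Context { source });
    id
}

// Borrows the stored context only while the closure runs, then releases
// the lock. Returns `None` if the id is unknown or already removed.
fn with_context<R>(id: u64, f: impl FnOnce(&Context) -> R) -> Option<R> {
    let map = CONTEXTS.lock().unwrap();
    let result = map.get(&id).map(f);
    result
}

// Must be called when the caller is done, otherwise the entry stays
// in the map forever, which is just a more complicated leak.
fn remove_context(id: u64) -> bool {
    CONTEXTS.lock().unwrap().remove(&id).is_some()
}
```

In this version each call re-borrows the stored source for the duration of the closure, so any parsing has to happen inside `with_context` and return owned data.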


Thank you. I was not able to implement this in my code; I ran into a lot of type and borrowing errors. But I think that if I could get it working, it would be a good solution, especially since I could delete the entry by id once my work is done.