Mutable references with lifetimes

Hey everyone.

I have a question about how I can mitigate the use of mutable references when used together with lifetimes in my specific case.

I know that this is a question that is asked again and again, and I looked at numerous posts about it and I think I understand the underlying issue. I just can't wrap my head around a mitigation, that works for my use case.

I don't have a super minimal example, as the tips I found in other posts were always to generic or not applicable to my issue, so I hope it is fine that I'll post a specific example.

First some information on what I want to accomplish, and then code:

I am trying to build a tool that takes in a markdown document, parses out some metadata contained in codeblocks, removes those codeblocks, so they dont appear in the final document, apply further processing like templating, and then renders that to HTML.
It should also keep track of the findings, as well as all headings, in order to render a table of contents.

This should be provided as a library, as it will be driven from two different user facing pieces, a CLI and another graphical tool.

The graphical tool should also render a preview. So I want the core library to be as "modular" as possible, so that I can skip expensive steps when generating the preview.

To parse the markdown I use comrak.

I want to store the AST, as the references to the findings and headings point to nodes allocated within it. Further I do not want to reparse the entire document every time I want to work with the markdown in another method. Also I want to transform the AST in multiple steps in different methods.

First the initial code:

use std::fmt::Display;

#[allow(dead_code)]
#[derive(Debug)]
pub struct TomlError {
    pub inner: toml::de::Error,
    pub position: (usize, usize),
}

impl Display for TomlError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_fmt(format_args!(
            "Encountered a toml error on line {}:{}, {}",
            self.position.0,
            self.position.1,
            self.inner.message()
        ))
    }
}

#[allow(dead_code)]
#[derive(Debug)]
pub(crate) struct Finding<'a> {
    pub(crate) frontmatter: toml::Table,
    pub(crate) position: comrak::nodes::Sourcepos,
    pub(crate) node: comrak::Node<'a>,
}

#[allow(dead_code)]
pub struct PentopsCore<'m> {
    markdown_source: Option<String>,
    markdown_ast: Option<comrak::Node<'m>>,
    comrak_arena: comrak::Arena<'m>,
    comrak_options: comrak::Options<'m>,
    comrak_anchorizer: comrak::Anchorizer,
    headings: Vec<comrak::nodes::Node<'m>>,
    findings: Vec<Finding<'m>>,
}

impl<'m> Default for PentopsCore<'m> {
    fn default() -> Self {
        Self {
            markdown_source: None,
            markdown_ast: None,
            comrak_arena: comrak::Arena::new(),
            comrak_options: comrak::Options {
                extension: comrak::options::Extension {
                    math_dollars: true,
                    table: true,
                    front_matter_delimiter: Some("---".to_string()),
                    footnotes: true,
                    inline_footnotes: true,
                    alerts: true,
                    header_ids: Some(String::new()),
                    ..Default::default()
                },
                render: comrak::options::Render {
                    r#unsafe: true,
                    ..Default::default()
                },
                ..Default::default()
            },
            comrak_anchorizer: comrak::Anchorizer::new(),
            findings: vec![],
            headings: vec![],
        }
    }
}

impl<'t, 'm> PentopsCore<'m> {
    /// Creates a new PentopsCore, using the given template strings.
    /// Those can later be updated.
    pub fn new() -> Self {
        Self {
            ..Default::default()
        }
    }

    /// Updates Pentops with a new markdown by replacing the previous version.
    /// Also reparses the markdown, collecting findings and headings.
    pub fn update_markdown(
        &mut self,
        markdown_source: impl Into<&'m str>,
    ) -> Result<(), TomlError>
    {
        self.markdown_source = Some(markdown_source.into().to_string());
        self.markdown_ast = Some(comrak::parse_document(
            &self.comrak_arena,
            &self.markdown_source.clone().unwrap(),
            &self.comrak_options,
        ));
        self.comrak_anchorizer = comrak::Anchorizer::new();

        let root = self
            .markdown_ast
            .expect("ERROR: self.markdown_ast can not be None here");
        for node in root.descendants() {
            if let comrak::nodes::NodeValue::Heading(_) = node.data().value {
                self.headings.push(node);

                if let Some(sibling) = node.next_sibling()
                    && let comrak::nodes::NodeValue::CodeBlock(ref code) = sibling.data().value
                {
                    if code.info != "toml" {
                        continue;
                    }

                    self.findings.push(Finding {
                        frontmatter: toml::from_str(&code.literal).map_err(|e| {
                            let pos_in_report = {
                                let mut line = sibling.data().sourcepos.start.line;
                                let mut column = sibling.data().sourcepos.start.column;

                                for char in code.literal.chars() {
                                    match char {
                                        '\n' => {
                                            line += 1;
                                            column = 0;
                                        }
                                        _ => {
                                            column += 1;
                                        }
                                    }
                                }

                                (line, column)
                            };
                            TomlError {
                                inner: e.clone(),
                                position: pos_in_report,
                            }
                        })?,
                        position: node.data().sourcepos,
                        node,
                    });
                }
            }
        }

        Ok(())
    }

    /// Preprocesses the markdown input, i.e.
    ///     - remove frontmatter
    pub fn preprocess(&mut self) {
        let root = if let Some(r) = self.markdown_ast {
            r
        } else {
            return;
        };

        let mut nodes_to_remove: Vec<comrak::Node> = vec![];

        for node in root.descendants() {
            match node.data().value {
                comrak::nodes::NodeValue::FrontMatter(_) => {
                    nodes_to_remove.push(node);
                }
                _ => {}
            }
        }

        for finding in &self.findings {
            if let comrak::nodes::NodeValue::Heading(_) = finding.node.data().value {
                if let Some(sibling) = finding.node.next_sibling()
                    && let comrak::nodes::NodeValue::CodeBlock(ref code) = sibling.data().value
                {
                    if code.info != "toml" {
                        panic!(
                            "ERROR: all findings should immediatly be followed by a toml codeblock, otherwise they are not a finding"
                        )
                    }
                    nodes_to_remove.push(sibling);
                }
            }
        }

        for node in nodes_to_remove {
            node.detach();
        }

        println!("ast: {:#?}", self.markdown_ast);
    }
}

fn main() {
    let mut pentops = PentopsCore::new();
    pentops
        .update_markdown(
            r#"
# Hello World

## This is a finding
```toml
foo = "bar"
```
        "#,
        )
        .unwrap()
}

This produces this error:

error: lifetime may not live long enough
  --> src/main.rs:87:34
   |
70 |   impl<'t, 'm> PentopsCore<'m> {
   |            -- lifetime `'m` defined here
...
82 |           &mut self,
   |           - let's call the lifetime of this reference `'1`
...
87 |           self.markdown_ast = Some(comrak::parse_document(
   |  __________________________________^
88 | |             &self.comrak_arena,
89 | |             &self.markdown_source.clone().unwrap(),
90 | |             &self.comrak_options,
91 | |         ));
   | |_________^ argument requires that `'1` must outlive `'m`
   |
   = note: requirement occurs because of the type `typed_arena::Arena<Node<'_, RefCell<Ast>>>`, which makes the generic argument `Node<'_, RefCell<Ast>>` invariant
   = note: the struct `typed_arena::Arena<T>` is invariant over the parameter `T`
   = help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance

error: could not compile `help` (bin "help") due to 1 previous error

I understand this happens, because comrak::parse_document(...) takes in an arena, as a reference. The arena is used to create the markdown AST nodes, so the node lifetimes are bound the the arena, here 'm.
I create the arena when PentopsCore is created.
The comrak::parse_document(...) creates a Node<'m>. The reference created by &self.comrak_arena only lives for the scope of the function, but needs to live at least as long as 'm.

When I update the method signature to in order to reflect that, that specific passage doesn't have a problem anymore.

    pub fn update_markdown(
        self: &'m mut PentopsCore<'m>,
        markdown_source: impl Into<&'m str>,
    ) -> Result<(), TomlError> {

But that leads to the following problem:

error[E0597]: `pentops` does not live long enough
   --> src/main.rs:187:5
    |
186 |     let mut pentops = PentopsCore::new();
    |         ----------- binding `pentops` declared here
187 |     pentops
    |     ^^^^^^^ borrowed value does not live long enough
...
199 | }
    | -
    | |
    | `pentops` dropped here while still borrowed
    | borrow might be used here, when `pentops` is dropped and runs the destructor for type `PentopsCore<'_>`

For more information about this error, try `rustc --explain E0597`.
error: could not compile `help` (bin "help") due to 1 previous error

This happens, because of the "Borrowed forever" problem discussed here.
This means self: &'m mut PentopsCore<'m>, borrow the reference for as long as the struct lives, meaning it is borrowed forever and a is massive anti-pattern.

I think I get it. The problem is, I don't get it enough to fix it myself.

I seemingly found a fix, or at least I though, by introducing a second lifetime parameter for PentopsCore

#[allow(dead_code)]
pub struct PentopsCore<'m, 'c> {
    markdown_source: Option<String>,
    markdown_ast: Option<comrak::Node<'m>>,
    comrak_arena: comrak::Arena<'m>,
    comrak_options: comrak::Options<'c>,
    comrak_anchorizer: comrak::Anchorizer,
    headings: Vec<comrak::nodes::Node<'m>>,
    findings: Vec<Finding<'m>>,
}

With this I could update the method signature to this:

    pub fn update_markdown(
        self: &'t mut PentopsCore<'m, 'c>,
        markdown_source: impl Into<&'m str>,
    ) -> Result<(), TomlError>
    where
        't: 'm,
    {

Doing this, it managed to compile and run. Here I am not 100% sure why that works. I'd be very happy about an explanation for this too.

But as you might guess, this leads to the following problem, as soon as I call pentops.preprocess(), or do anything else with pentops in main.

error[E0499]: cannot borrow `pentops` as mutable more than once at a time
   --> src/main.rs:215:5
    |
202 |     pentops
    |     ------- first mutable borrow occurs here
...
215 |     pentops.preprocess();
    |     ^^^^^^^
    |     |
    |     second mutable borrow occurs here
    |     first borrow later used here

For more information about this error, try `rustc --explain E0499`.
error: could not compile `help` (bin "help") due to 1 previous error

And here I am hard stuck. I tried Boxing the arena, I tried Boxing the parse_document() call. I tried lazy_static! for the arena (didn't work because something wasn't Sync). I can't think of a way to refactor this, where I can still process the AST in different methods.

I know that this is a really long and specific post, but I really tried to figure this out for myself, but after a whole day, I didn't get any closer.

I would really appreciate if somebody could enlighten me.

&'t mut PentopsCore<'m, 'c> where 't : 'm is terrible. it's a trick, it's actually just &'m mut PentopsCore<'m, 'c> in disguise, which you know is bad.
indeed :
wherenever you have &'a Foo<'b>, there is an implicit 'b : 'a. and if you add an explicit 'a : 'b (which you did), then you get 'a == 'b.

you should name each lifetimeyou use in your struct with a clear and informative name, and only make them identical if you know they are the same.

you should definitely store a &'arena comrak::Arena<'arena>, instead of an comrak::Arena<'arena>.
the function signature from comrak is clear : the only useful way to use an Arena<'a> is to have a &'a Arena<'a>, but if you store an Arena<'a>, then the only way to have a &'a Arena<'a> is to borrow yourself forever, therefore you should never store a Arena<'a> in a struct unless you expect to borrow that struct forever

and yest that measn you need to forget about using Default, the code i ended up with is as follows :

use std::fmt::Display;

#[allow(dead_code)]
#[derive(Debug)]
pub struct TomlError {
    pub inner: toml::de::Error,
    pub position: (usize, usize),
}

impl Display for TomlError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_fmt(format_args!(
            "Encountered a toml error on line {}:{}, {}",
            self.position.0,
            self.position.1,
            self.inner.message()
        ))
    }
}

#[allow(dead_code)]
#[derive(Debug)]
pub(crate) struct Finding<'a> {
    pub(crate) frontmatter: toml::Table,
    pub(crate) position: comrak::nodes::Sourcepos,
    pub(crate) node: comrak::Node<'a>,
}

#[allow(dead_code)]
pub struct PentopsCore<'arena, 'options> {
    markdown_source: Option<String>,
    markdown_ast: Option<comrak::Node<'arena>>,
    comrak_arena: &'arena comrak::Arena<'arena>,
    comrak_options: comrak::Options<'options>,
    comrak_anchorizer: comrak::Anchorizer,
    headings: Vec<comrak::nodes::Node<'arena>>,
    findings: Vec<Finding<'arena>>,
}


impl<'arena, 'options> PentopsCore<'arena, 'options> {
    /// Creates a new PentopsCore, using the given template strings.
    /// Those can later be updated.
    pub fn new(arena : &'arena comrak::Arena<'arena>) -> Self {
        Self {
            markdown_source: None,
            markdown_ast: None,
            comrak_arena: arena,
            comrak_options: comrak::Options {
                extension: comrak::options::Extension {
                    math_dollars: true,
                    table: true,
                    front_matter_delimiter: Some("---".to_string()),
                    footnotes: true,
                    inline_footnotes: true,
                    alerts: true,
                    header_ids: Some(String::new()),
                    ..Default::default()
                },
                render: comrak::options::Render {
                    r#unsafe: true,
                    ..Default::default()
                },
                ..Default::default()
            },
            comrak_anchorizer: comrak::Anchorizer::new(),
            findings: vec![],
            headings: vec![],
        }
    }

    /// Updates Pentops with a new markdown by replacing the previous version.
    /// Also reparses the markdown, collecting findings and headings.
    pub fn update_markdown(
        &mut self,
        markdown_source: impl Into<&'arena str>,
    ) -> Result<(), TomlError>
    {
        self.markdown_source = Some(markdown_source.into().to_string());
        self.markdown_ast = Some(comrak::parse_document(
            &self.comrak_arena,
            &self.markdown_source.clone().unwrap(),
            &self.comrak_options,
        ));
        self.comrak_anchorizer = comrak::Anchorizer::new();

        let root = self
            .markdown_ast
            .expect("ERROR: self.markdown_ast can not be None here");
        for node in root.descendants() {
            if let comrak::nodes::NodeValue::Heading(_) = node.data().value {
                self.headings.push(node);

                if let Some(sibling) = node.next_sibling()
                    && let comrak::nodes::NodeValue::CodeBlock(ref code) = sibling.data().value
                {
                    if code.info != "toml" {
                        continue;
                    }

                    self.findings.push(Finding {
                        frontmatter: toml::from_str(&code.literal).map_err(|e| {
                            let pos_in_report = {
                                let mut line = sibling.data().sourcepos.start.line;
                                let mut column = sibling.data().sourcepos.start.column;

                                for char in code.literal.chars() {
                                    match char {
                                        '\n' => {
                                            line += 1;
                                            column = 0;
                                        }
                                        _ => {
                                            column += 1;
                                        }
                                    }
                                }

                                (line, column)
                            };
                            TomlError {
                                inner: e.clone(),
                                position: pos_in_report,
                            }
                        })?,
                        position: node.data().sourcepos,
                        node,
                    });
                }
            }
        }

        Ok(())
    }

    /// Preprocesses the markdown input, i.e.
    ///     - remove frontmatter
    pub fn preprocess(&mut self) {
        let root = if let Some(r) = self.markdown_ast {
            r
        } else {
            return;
        };

        let mut nodes_to_remove: Vec<comrak::Node> = vec![];

        for node in root.descendants() {
            match node.data().value {
                comrak::nodes::NodeValue::FrontMatter(_) => {
                    nodes_to_remove.push(node);
                }
                _ => {}
            }
        }

        for finding in &self.findings {
            if let comrak::nodes::NodeValue::Heading(_) = finding.node.data().value {
                if let Some(sibling) = finding.node.next_sibling()
                    && let comrak::nodes::NodeValue::CodeBlock(ref code) = sibling.data().value
                {
                    if code.info != "toml" {
                        panic!(
                            "ERROR: all findings should immediatly be followed by a toml codeblock, otherwise they are not a finding"
                        )
                    }
                    nodes_to_remove.push(sibling);
                }
            }
        }

        for node in nodes_to_remove {
            node.detach();
        }

        println!("ast: {:#?}", self.markdown_ast);
    }
}

fn main() {
    let arena = comrak::Arena::new();
    let mut pentops = PentopsCore::new(&arena);
    pentops
        .update_markdown(
            r#"
# Hello World

## This is a finding
```toml
foo = "bar"
```
        "#,
        )
        .unwrap()
}

now if you really really do want to store the arena in your struct, and be Default, and have new with no argument, then there prob is a way but i would stongly recommend against it. it's self-referential crates, like ouroboros. because ultimately this is what you are trying to do, be self-referential, and this is dangerous stuff that only works with specialized tools like what ouroboros provides.

note of smth i found while improving the API a bit :

    pub fn update_markdown(
        &mut self,
        markdown_source: impl Into<String>,
    ) -> Result<(), TomlError>
    {
        let markdown_source = markdown_source.into();
        self.markdown_ast = Some(comrak::parse_document(
            &self.comrak_arena,
            &markdown_source,
            &self.comrak_options,
        ));
        self.markdown_source = Some(markdown_source);

markdown_source should really be used like this. especially the clone in the original was completely uncalled for.

the way i went around to make it work with ouroboros is to make a thin wrapper :

#[ouroboros::self_referencing]
struct OuroboCore<'options> {
    unused : (),
    #[borrows(unused)]
    #[not_covariant]
    a : comrak::Arena<'this>,
    #[borrows(a)]
    #[not_covariant]
    pentops : PentopsCore<'this, 'options>
}


impl Default for OuroboCore<'_> {
    fn default() -> Self {
        Self::new((), |_| comrak::Arena::new(), |r| PentopsCore::new(r))
    }
}

impl OuroboCore<'_> {
    pub fn update_markdown(
        &mut self,
        markdown_source: impl Into<String>,
    ) -> Result<(), TomlError>
    {
        self.with_pentops_mut(|p| p.update_markdown(markdown_source))
    }

    pub fn preprocess(&mut self) {
        self.with_pentops_mut(|p| p.preprocess())
    }
}

this only works if you fix update_markdown as i showed

4 Likes

Hey,

Sorry for the late reply and a big thanks for your detailed answer and the different solutions.
Good to know that it is just not possible with "standard" rust.

Thanks for providing the ouroboros solution. I think it is a lot more friction than it is worth, and the user of the library can just provide the arena. I come from a mostly zig background, so providing an allocator to a function is fine for me.

I tried to use descriptive names for lifetimes, but thought I got a syntax error. Maybe I just interpreted that wrong. I'll definitely do that in the future.

Just so I understand correctly: The self-referential nature here comes from the fact, that the lifetime of a variable inside the struct is bound to a lifetime of another variable in the struct, right?

And when I pass a string, is that not just a clone in disguise, as it has to copy the string in order to pass ownership?

not exactly but kinda ?
self-referential means a field in the struct "borrows from" to another field in the struct.
closer to your words, "a lifetime of a field inside the struct is derived from a reference to another field inside the struct."

so like for example, if you have

    markdown_ast: Option<comrak::Node<'arena>>,
    comrak_arena: &'arena comrak::Arena<'arena>,

this is fine because you just use a copy (or a short time borrow) of comrak_arena to populate markdown_ast.

    self_ref: &'a u32,
    referred : u32

this is obviously the simplest example of not fine (it isn't even constructible).
a test would be, if you have

    fielda: A<'a>,
    fieldb: B

if the construction for fielda is for<'short> FnOne(&'short B) -> A<'a>, then its fine, because you know A can't store a refence to B inside of itself, so its not self-referential, but if you have
FnOne(&'a B) -> A<'a>, then here, because 'a is part of A, then it can store the reference inside itself, so it is self-referential.

no. you need a String because you have markdown_source: Option<String>. so you should take ask for what you need.
parse_document does not need a String, so you should not make one. for it.
let us look at the original :

        markdown_source: impl Into<&'arena str>,

this Into is useless, because there are no useful types aside from &strs that implement impl Into<&'arena str>. also it is very restrictive, because if the caller has a String it has to borrow it for a long time instead of just giving it.

self.markdown_source = Some(markdown_source.into().to_string());

this is the expensive creation of a String

&self.markdown_source.clone().unwrap(),

this is the second expensive creation of a String, but then it gets immediately destroy after. so it is just wasteful.
you could do

self.markdown_source.as_deref().unwrap(),

but the unwrap is still very unnecessary, you know it is Some.
so just put the String in the Option after instead, so you don't need the unwrap.
so if we remove all the unnecessary song and dance we get :

pub fn update_markdown<'a>(
        &mut self,
        markdown_source: impl Into<&'a str>,
    ) -> Result<(), TomlError>
    {
        let markdown_source = markdown_source.into().to_string();
        self.markdown_ast = Some(comrak::parse_document(
            &self.comrak_arena,
            &markdown_source ,
            &self.comrak_options,
        ));
        self.markdown_source = Some(markdown_source); 

the exact same thing as yours, but better.
but then, you have to ask yourself, what is the user gonna possibly give to me ?
they are basically forced to give you a &str, where you always make a new copy.
but what if they have a String that they want to give to you ? then making a copy is very wasteful.
if you ask for impl Into<String> instead, then if they have a &str, it will do the same thing as before, make copy, but if they give you a String then it al becomes completely free, which is better

1 Like