First Rust project, a parser: facing the interior-mutability problem

Hello,
I am new to Rust and I am trying to write a pull-parser as a first project. I have defined a struct like this:

struct AbnfScanner<'a> {
    source: &'a str,
    iter: std::str::CharIndices<'a>,
    last: Option<char>,
    position: usize,
    last_position: usize,
    line: usize,
    column: usize,
    token: Token<'a>,
}

It has a get_next function that performs the next parsing step and stores the current token. It looks like this:

impl<'a> AbnfScanner<'a> {
    // ...new() and other functions not listed...

    fn get_next(&mut self) -> &'a Token
    {
        // Modifications to self / AbnfScanner and self.token happen here...
    }
}

This causes problems because I need a mutable reference to call get_next, which means even my simple unit tests do not compile:

        let token = abnf_scanner.get_next();
        let expected = Token{ class: TokenClass::TokenId(AbnfTokenId::Name as u32), content: &"rule", line: 1, column: 1 };
        if *token != expected {
            return Err(format!("Unexpected result: {:?} \n!=\n {:?}", token, expected));
        }

        let token = abnf_scanner.get_next();
        let expected = Token{ class: TokenClass::TokenId(AbnfTokenId::Whitespace as u32), content: &" ", line: 1, column: 5 };
        if *token != expected {
            return Err(format!("Unexpected result: {:?} \n!=\n {:?}", token, expected));
        }

I get multiple errors like this:

error[E0499]: cannot borrow `*abnf_scanner` as mutable more than once at a time

I read the chapter "RefCell<T> and the Interior Mutability Pattern" in The Rust Programming Language, and it looks like it would solve my problem, but I wonder whether it is the best way to go or whether there are other solutions. Here are some thoughts and questions:

  • if I use RefCell what is it doing, more precisely what runtime costs will that add?
  • I read that RefCell breaks the borrow checking. But that would not affect the code outside of my struct, right?
  • could I solve the issue by passing the struct from the outside? (meaning not using self / global functions which are not an implementation for the struct)

Sorry for these simple questions. I have read a lot about Rust but still feel uncertain about what the right way to go is.

Thanks for your help/thoughts in advance.

It seems like you should just remove the token field and return Token<'a> by value.

fn get_next(&mut self) -> Token<'a> {
    // Advance the scanner here, then build and return the new token by value.
    todo!()
}
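
A minimal sketch of what returning by value could look like. The Scanner type, the two-variant TokenClass, and the "run of whitespace or non-whitespace" tokenizing rule are all simplifications assumed for illustration (line and column tracking are left out); they are not the original AbnfScanner.

```rust
// Hypothetical, reduced token types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenClass {
    Name,
    Whitespace,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token<'a> {
    pub class: TokenClass,
    pub content: &'a str,
}

pub struct Scanner<'a> {
    source: &'a str,
    position: usize,
}

impl<'a> Scanner<'a> {
    pub fn new(source: &'a str) -> Self {
        Scanner { source, position: 0 }
    }

    /// Returns the next token by value, or `None` at end of input.
    /// The token borrows from the source text (lifetime 'a), not from `self`.
    pub fn get_next(&mut self) -> Option<Token<'a>> {
        // Copy the `&'a str` out of `self` so slices of it keep lifetime 'a.
        let source: &'a str = self.source;
        let rest = &source[self.position..];
        let first = rest.chars().next()?;
        let is_space = first.is_whitespace();
        // Byte length of the run of characters in the same class as `first`.
        let len = rest
            .find(|c: char| c.is_whitespace() != is_space)
            .unwrap_or(rest.len());
        self.position += len;
        Some(Token {
            class: if is_space { TokenClass::Whitespace } else { TokenClass::Name },
            content: &rest[..len],
        })
    }
}
```

Because the returned Token<'a> does not borrow the scanner, a caller can hold the result of one get_next call while making the next call, which is the situation that triggered E0499 above.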

But wouldn't that copy all of the content of Token? I would like to delay copying to the caller/user of the parser so that the caller is free to decide if values need to be copied or not.

For completeness, here is the definition of Token:

#[derive(Debug, PartialEq, Eq)]
pub struct Token<'a> {
    pub class: TokenClass,
    pub content: &'a str,
    pub line: usize,
    pub column: usize,
}

Unless TokenClass contains large values, it should be very cheap to copy. After all, an &str is just a shared reference, and cloning it does not involve cloning the string, but instead gives a new reference to the same string.
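
To make that concrete, here is a small sketch. It assumes a TokenClass with a single u32-carrying variant and adds Clone and Copy to the derives, neither of which is shown in the code posted above; the helper functions are hypothetical names used only for this illustration.

```rust
use std::mem::size_of;

// Assumed shape of TokenClass: one variant carrying a u32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenClass {
    TokenId(u32),
}

// Same fields as the posted Token, with Clone and Copy added as an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token<'a> {
    pub class: TokenClass,
    pub content: &'a str,
    pub line: usize,
    pub column: usize,
}

/// Copies the token and reports whether the copy's `content` points at the
/// very same bytes as the original, i.e. no string data was duplicated.
pub fn copy_shares_text(token: &Token<'_>) -> bool {
    let copy = *token; // bitwise copy of a few machine words
    copy.content.as_ptr() == token.content.as_ptr()
        && copy.content.len() == token.content.len()
}

/// Size in bytes of one token value, independent of the text it refers to.
pub fn token_size() -> usize {
    size_of::<Token<'static>>()
}
```

Copying such a token moves a pointer, a length, and a few integers; the text itself stays where it is in the source string.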


I think what happens here is that the compiler infers the following signature:

fn get_next<'b>(&'b mut self) -> &'a Token<'b>

Instead what you want is just:

fn get_next(&mut self) -> &'a Token<'a>

However, I think this will probably result in other errors saying that you can't give out a &'a Token<'a>, because the borrow of &mut self doesn't necessarily outlive 'a. Cloning the Token<'a> (or even copying it, if TokenClass is or can be Copy) is probably better.
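
For comparison, here is a sketch of one more signature that was not proposed above: keep the stored token and return &Token<'a>, so that the outer reference borrows from self while the text inside still lives for 'a. The Scanner type, its fields, and the whitespace-splitting rule are assumptions made for this illustration, not the original scanner.

```rust
// Hypothetical, reduced token type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token<'a> {
    pub content: &'a str,
}

pub struct Scanner<'a> {
    rest: &'a str,
    token: Token<'a>,
}

impl<'a> Scanner<'a> {
    pub fn new(source: &'a str) -> Self {
        Scanner { rest: source, token: Token { content: "" } }
    }

    /// Stores the next whitespace-separated word in `self.token` and returns
    /// a reference to it. The reference is tied to this borrow of `self`
    /// (elided lifetime); only the text inside the token lives for 'a.
    pub fn get_next(&mut self) -> &Token<'a> {
        // Copy the `&'a str` out of `self` so the slices keep lifetime 'a.
        let source: &'a str = self.rest;
        let rest = source.trim_start();
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        self.token = Token { content: &rest[..len] };
        self.rest = &rest[len..];
        &self.token
    }
}

/// Calls `get_next` twice. Each returned reference is used only to copy out
/// `content` before the next call, so the two mutable borrows never overlap.
pub fn first_two(source: &str) -> (&str, &str) {
    let mut scanner = Scanner::new(source);
    let first = scanner.get_next().content;
    let second = scanner.get_next().content;
    (first, second)
}
```

The trade-off is that a caller cannot keep the returned reference across the next get_next call; anything that must outlive the call has to be copied out first, which is why returning the token by value is usually simpler.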

Now, to answer your original questions:

if I use RefCell what is it doing, more precisely what runtime costs will that add?

Every time you call borrow/borrow_mut on a RefCell it needs to check if it is allowed to give out such reference based on what other references to its inner value are alive (the rules are the same as the borrowing rules of the compiler). This means an additional branch (if you do nothing wrong this will be predicted almost always, meaning it's pretty much free) and potentially some missed optimization due to increased complexity derived from that branch. Depending on what you're doing all of this should be relatively cheap.
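
The check can be observed directly with the non-panicking try_borrow_mut method from std::cell::RefCell. This is only a small sketch; the function name borrow_checks is made up for the example.

```rust
use std::cell::RefCell;

/// Returns two flags: whether a mutable borrow succeeded while a shared
/// borrow was still alive, and whether it succeeded after that shared
/// borrow had been dropped.
pub fn borrow_checks() -> (bool, bool) {
    let cell = RefCell::new(0u32);

    // A live shared borrow is recorded in the cell's internal borrow flag.
    let shared = cell.borrow();
    // The runtime check refuses a mutable borrow while `shared` exists.
    let while_shared = cell.try_borrow_mut().is_ok();

    // Dropping the guard updates the flag again.
    drop(shared);
    let after_release = cell.try_borrow_mut().is_ok();

    (while_shared, after_release)
}
```

Using borrow_mut instead of try_borrow_mut in the first case would panic at runtime rather than return an error, which is the cost of moving the check out of the compiler.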

I read that RefCell breaks the borrow checking. But that would not affect the code outside of my struct, right?

It doesn't "break" the borrow checking. It just moves it from compile time to runtime. It affects only code that directly touches the RefCell.

could I solve the issue by passing the struct from the outside? (meaning not using self / global functions which are not an implementation for the struct)

You mean passing the Token<'a> from outside? You could, but it will probably result in much uglier code and probably no benefit over just returning it.

You mean passing the Token<'a> from outside? You could, but it will probably result in much uglier code and probably no benefit over just returning it.

I just noticed that I would lose control over the data so I won't follow this idea.

Unless TokenClass contains large values, it should be very cheap to copy. After all, an &str is just a shared reference, and cloning it does not involve cloning the string, but instead gives a new reference to the same string.

Every time you call borrow / borrow_mut on a RefCell it needs to check if it is allowed to give out such reference based on what other references to its inner value are alive (the rules are the same as the borrowing rules of the compiler). This means an additional branch (if you do nothing wrong this will be predicted almost always, meaning it's pretty much free) and potentially some missed optimization due to increased complexity derived from that branch. Depending on what you're doing all of this should be relatively cheap.

As TokenClass is just an enum (one variant with a u32 field but no large data), I decided to follow the advice from @alice and copy the Token.

@SkiFire13 @alice Thanks for your help.