Borrow checker issues when using enums and structs

I am trying to build a C compiler in Rust for a university course.
It's my first time using Rust and I am just a beginner. I know that building a compiler is a difficult task, but hopefully it won't be that hard :)

The parser struct has a token field that holds the token currently being read.
As it reads tokens, the parser should build the AST of the program.

The problem I have is this error:

cannot move out of `self.token` as enum variant `Identifier` which is behind a shared reference
  --> src/main.rs:12:15
   |
12 |         match self.token {
   |               ^^^^^^^^^^
13 |             SimpleToken::Identifier(s) => {
   |                                     -
   |                                     |
   |                                     data moved here
   |                                     move occurs because `s` has type `String`, which does not implement the `Copy` trait
   |
help: consider borrowing here
   |
12 |         match &self.token {

The full code:

struct SimpleParser {
    token: SimpleToken,
}

enum Expression {
    VarDefinition(String, u32)
}

impl SimpleParser {
    fn match_token(&self) -> Expression {
        // let token = SimpleToken::Identifier("hey".to_string());
        // problem is here.
        match self.token {
            SimpleToken::Identifier(s) => {
                Expression::VarDefinition(s, 2)
            }
            _ => panic!("Bad")
        }
    }
}

enum SimpleToken {
    Identifier(String)
}

fn main() {
}

What I want

I do not want to borrow the value of the token, I want to simply take ownership and move the String from the Identifier enum to the VarDefinition enum.

Interestingly enough, if I have a local variable and I match on it then the code compiles.

fn match_token(&self) -> Expression {
    let token = SimpleToken::Identifier("ana".to_string());
    match token {
        SimpleToken::Identifier(s) => {
            Expression::VarDefinition(s, 2)
        }
        _ => panic!("Bad")
    }
}

I don't really understand how the borrow checker works and what the issue is here.
Why does it work with a local variable but not with a field of the struct?

I guess the reason could be that after I move the String from the Identifier to VarDefinition it is possible to access self.token and read moved data.

That means you need to either

  1. leave another String in Identifier, like an empty string (probably not what you want)
    or
  2. declare that the SimpleParser instance is unusable by taking ownership in match_token: fn match_token(self) (instead of &self).

The easiest would be to clone the String.
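For reference, a minimal sketch of the clone approach, assuming the same types as in the question. The `Debug` and `PartialEq` derives are my additions so the result can be compared, and the `_` arm is dropped because this enum has only one variant.

```rust
// Sketch only: same shape as the code in the question.
struct SimpleParser {
    token: SimpleToken,
}

enum SimpleToken {
    Identifier(String),
}

// Derives added for illustration; they are not in the original code.
#[derive(Debug, PartialEq)]
enum Expression {
    VarDefinition(String, u32),
}

impl SimpleParser {
    fn match_token(&self) -> Expression {
        // Matching on a reference binds `s` as `&String`, so nothing is moved.
        match &self.token {
            // `clone` allocates a second String; the token keeps its own copy.
            SimpleToken::Identifier(s) => Expression::VarDefinition(s.clone(), 2),
        }
    }
}
```

The cost is one extra allocation per identifier, and the parser stays usable afterwards.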

The parser function will be recursive in the real implementation so I am not sure if that works with match_token(self).

Cloning would be the easiest, that is true, but it introduces extra memory allocations.

After reading the identifier I will not use it again, I don't really care what happens to it afterwards because I will get the AST from the parser and not worry about tokens.

Another option that I am currently thinking of is borrowing the String:

enum Expression<'a> {
    VarDefinition(&'a String, u32)
}

But this will pollute the whole code base with a lot of 'a s :))
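A rough sketch of what the borrowing variant could look like end to end, assuming the same parser and token types as above. I use `&'a str` rather than `&'a String` here, which is my substitution, and the derives are only there so the result can be compared.

```rust
struct SimpleParser {
    token: SimpleToken,
}

enum SimpleToken {
    Identifier(String),
}

// `&'a str` instead of `&'a String` is an assumption made for this sketch.
#[derive(Debug, PartialEq)]
enum Expression<'a> {
    VarDefinition(&'a str, u32),
}

impl SimpleParser {
    // The elided lifetime `'_` ties the returned Expression to `&self`,
    // so the parser cannot be mutated while the Expression is alive.
    fn match_token(&self) -> Expression<'_> {
        match &self.token {
            SimpleToken::Identifier(s) => Expression::VarDefinition(s.as_str(), 2),
        }
    }
}
```

No allocation happens, but every type that contains an `Expression` then needs the lifetime parameter too.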

Maybe you can have Identifier hold an Option. Use &mut self and put None in Identifier after taking s. I don't know how it would affect performance, since I am also new to coding, but it probably won't matter.

Like this

struct SimpleParser {
    token: SimpleToken,
}

enum Expression {
    VarDefinition(Option<String>, u32)
}

impl SimpleParser {
    fn match_token(&mut self) -> Expression {
        // let token = SimpleToken::Identifier("hey".to_string());
        // problem is here.
        match &mut self.token {
            SimpleToken::Identifier(s) => {
                Expression::VarDefinition(s.take(), 2)
            }
            _ => panic!("Bad")
        }
    }
}

enum SimpleToken {
    Identifier(Option<String>)
}

fn main() {
}

Because you...

error[E0507]: cannot move out of [...] a shared reference

You don't have a variable of the struct type. You have a variable of a shared reference of the struct type. Without interior mutability (aka shared mutability), you cannot mutate data behind a shared reference.[1] If you change the signature to match_token(self) -> Expression, it works like the local variable does (because now self is the struct type).
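A small sketch of the by-value receiver, assuming the types from the question (the derives are added only so the result can be compared). Because `self` is now an owned `SimpleParser` with no `Drop` impl, moving the `String` out of `self.token` is allowed.

```rust
struct SimpleParser {
    token: SimpleToken,
}

enum SimpleToken {
    Identifier(String),
}

// Derives added for illustration only.
#[derive(Debug, PartialEq)]
enum Expression {
    VarDefinition(String, u32),
}

impl SimpleParser {
    // `self` is taken by value, so the parser is consumed by this call.
    fn match_token(self) -> Expression {
        match self.token {
            // `s` is moved out of the token; no clone, no allocation.
            SimpleToken::Identifier(s) => Expression::VarDefinition(s, 2),
        }
    }
}
```

After the call the parser value is gone, so the caller cannot use it again.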

Other options include

  • Just clone the String, as others mentioned
  • Have a &mut self receiver and std::mem::take the String
    • Or Option<String>, as others mentioned
  • Use Arc<str> or Rc<str> for cheaper cloning without new allocation
  • Use some sort of more-proper string interning
  • Leak the source code and use &'static str
  • Run the bulk of the program further down the call stack from where the source code is stored and use &'src str, as you mentioned
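As one illustration of the list above, here is a sketch of the `Rc<str>` option, assuming both the token and the expression are changed to hold `Rc<str>`. That change of field types is an assumption on my part, not something from the original code.

```rust
use std::rc::Rc;

struct SimpleParser {
    token: SimpleToken,
}

// Field types changed to Rc<str> for this sketch.
enum SimpleToken {
    Identifier(Rc<str>),
}

#[derive(Debug, PartialEq)]
enum Expression {
    VarDefinition(Rc<str>, u32),
}

impl SimpleParser {
    fn match_token(&self) -> Expression {
        match &self.token {
            // Rc::clone only bumps a reference count; the string data is shared.
            SimpleToken::Identifier(s) => Expression::VarDefinition(Rc::clone(s), 2),
        }
    }
}
```

The one allocation happens when the token is created, for example with `Rc::from("name")`. Use `Arc<str>` instead if the AST has to cross threads.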

  1. Even if it's not shared, you can't leave a hole in the struct that might be observed. ↩︎

Just throwing it out there: if the thought is to call match_token on a nested Expression contained inside the parent expression, then the move of self can still be a valid use case.

I would like to use std::mem::take but I am not sure how to do it. I have been trying multiple options but I can't satisfy the borrow checker :)

I feel like the other options mentioned are a bit awkward: I either have to use other libraries, use 'src everywhere in the code or use another wrapper like Option<String> or Arc<str>.

I wish this task would be simpler to accomplish. I just want to move a field of an enum to another enum.
Unfortunately, rust is afraid that I will access the moved object again but I would like to promise rust that I won't do that.

I tried this, for instance, but I get error[E0596]: cannot borrow `a` as mutable, as it is not declared as mutable

fn match_token(&mut self) -> Expression {
    match &mut self.token {
        SimpleToken::Identifier(ref mut a) => {
            return Expression::VarDefinition(mem::take(&mut a), 2)
        }
        _ => panic!("Bad")
    }
}

I wish that I could understand how the ownership of match works.
Can someone explain to me in simple terms what happens when I do match self.token vs match &self.token?

This works

use core::mem;
struct SimpleParser {
    token: SimpleToken,
}

enum Expression {
    VarDefinition(String, u32)
}

impl SimpleParser {
    fn match_token(&mut self) -> Expression {
        match &mut self.token {
            SimpleToken::Identifier(ref mut a) => {
                return Expression::VarDefinition(mem::take(a), 2)
            }
            _ => panic!("Bad")
        }
    }
}

enum SimpleToken {
    Identifier(String)
}

fn main() {
}

SimpleToken::Identifier(ref mut a) means that a is bound to a mutable reference to the value inside Identifier. Why don't we write
SimpleToken::Identifier(&mut a)
instead? That is not the proper syntax here.

a is already bound as a mutable reference, so you want mem::take(a). With mem::take(&mut a) you're trying to swap the value of the variable a (which is a reference) instead of the value inside the enum, and even if that succeeded, it would have no effect because you don't use a again.
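To make the effect of mem::take concrete, here is a tiny sketch. The helper name `take_name` is made up for the example.

```rust
use std::mem;

// Hypothetical helper: `slot` plays the role of the `a` binding above.
fn take_name(slot: &mut String) -> String {
    // `slot` is already a `&mut String`, so it is passed as-is.
    // mem::take moves the String out and leaves `String::default()`
    // (an empty String, which does not allocate) in its place.
    mem::take(slot)
}
```

After `take_name(&mut name)`, the caller's `name` is an empty String and the returned value owns the original contents.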

When you write match self.token the pattern you write is paired up exactly with the type of the scrutinee self.token. When you write match &mut self.token, the pattern you've written is an enum variant, but the scrutinee is a reference, which is a type mismatch. At this point, match ergonomics kicks in and says "we're going to pretend there was an &mut in the pattern", and also changes the “default binding mode” for all variables inside to say that they should be bound as mutable references to the parts of the scrutinee, instead of moves/copies.

So, the code you've written is actually redundant. Here is the “desugared” strictly matching version:

match self.token {
    SimpleToken::Identifier(ref mut a) => {

That matches the field directly, and specifies that a mutable reference should be taken and bound to a.

Here is the “using match ergonomics” version, which is generally the current default Rust style (though some disagree that this was a good idea):

match &mut self.token {
    SimpleToken::Identifier(a) => {

Now, a is implicitly ref mut because the pattern skipped past an &mut to get to it.

Here's how match ergonomics translates the pattern for you:

match &mut self.token {
    &mut SimpleToken::Identifier(ref mut a) => {

It inserts an &mut in the pattern to make the types line up, and inserts ref mut on the variable bound by the pattern. (If it didn't do that second step, you'd get a "cannot move out of &mut self.token" error.) You can delete both &muts from this (they cancel each other out) and get the first version I showed.
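Here is a sketch that puts the three spellings side by side as methods, assuming the parser and token types from the question. The method names are made up for the example; all three bind `a` as `&mut String` and behave the same.

```rust
use std::mem;

struct SimpleParser {
    token: SimpleToken,
}

enum SimpleToken {
    Identifier(String),
}

impl SimpleParser {
    // 1. Strictly matching: the scrutinee is the place itself,
    //    and the binding is explicitly `ref mut`.
    fn take_explicit(&mut self) -> String {
        match self.token {
            SimpleToken::Identifier(ref mut a) => mem::take(a),
        }
    }

    // 2. Match ergonomics: the scrutinee is `&mut`, the pattern is not,
    //    so `a` is implicitly bound by mutable reference.
    fn take_ergonomic(&mut self) -> String {
        match &mut self.token {
            SimpleToken::Identifier(a) => mem::take(a),
        }
    }

    // 3. What version 2 is translated to: both the `&mut` in the pattern
    //    and the `ref mut` on the binding written out.
    fn take_spelled_out(&mut self) -> String {
        match &mut self.token {
            &mut SimpleToken::Identifier(ref mut a) => mem::take(a),
        }
    }
}
```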


The previous post addresses the specific code problems; this post is mostly about match patterns / binding modes more generally.


The details of match binding modes (originally called "match ergonomics") are not simple and even have some long-standing bugs. I personally find it easier to start from a pre-binding-modes understanding of pattern matching. This is a good article on the topic of patterns, written before binding modes existed. Before binding modes, you had to explicitly "peel back" references like &mut, just like you have to explicitly match on things like tuple variants.

The main thing the article doesn't cover are reference binding modifiers:

let ref a = a;     // like `let a = &a`
let ref mut a = a; // like `let a = &mut a`

Which are required because you're often matching some place nested within a larger data structure, where you can't just take a reference on the right-hand side.
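A small sketch of that nested case. The `Pair` type and `lengthen` function are made up for the example. The pattern borrows two fields of the place `*pair` in different ways, which a single `&` or `&mut` on the right-hand side could not express.

```rust
// Hypothetical type for illustration.
struct Pair {
    left: String,
    right: String,
}

fn lengthen(pair: &mut Pair) -> usize {
    // `*pair` is a place, not a value being moved: `ref` borrows `left`
    // immutably and `ref mut` borrows `right` mutably, at the same time.
    let Pair { ref left, ref mut right } = *pair;
    right.push_str(left);
    right.len()
}
```

With binding modes you could write `let Pair { left, right } = pair;` instead, but then both bindings would be `&mut String`.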

The main benefit of pre-binding-mode patterns is their (relative) simplicity -- the dual nature of the pattern and the expression being matched. Without binding modes, the pattern and expression must be "in sync". If you want to get your own matches back to a pre-binding-mode state, for example to aid understanding what is going on with the types involved, you can use clippy and deny the pattern_type_mismatch lint. However, it won't necessarily help if there are other errors that mask the clippy lint, without taking it step by step anyway.


Starting from an understanding of pre-binding-mode patterns, the main idea of binding modes is to enable rewriting (pre-binding-mode) matches like this:

let some_borrow = &mut something;
// ...far away...
match some_borrow {
//  vvvv Have to "peel back" the `&mut`
    &mut SimpleToken::Identifier(ref mut a) => {
// Have to not move the `String` ^^^^^^^

To something like this:

match some_borrow {
    // No "peeling back" required
    SimpleToken::Identifier(a) => {
    // No `ref mut` required

Where by not explicitly "peeling back" the &mut,[1] you enter a "binding mode" where new bindings like a are implicitly ref mut.

It's pretty straight-forward for simple cases like this once you get used to it, but the rules for when the binding modes change are complicated, the documentation incomplete, and the implementation buggy. So any attempt to explain it tends to either also be incomplete or to spiral in complexity.

Things that impact the binding mode or are otherwise taken into consideration include matching references against non-references, matching references against references, and binding modifiers like ref, ref mut, and mut. The rules are there as an attempt to "simply" "always do what you want", but as is typical with such attempts, the result is something with a lot of actual complexity.

Personally I find it convenient enough that I do make use of binding modes to a point, but try to avoid them and be explicit once my patterns get much more complicated than the simple example above.


Back to your code. This is mostly redundant with @kpreid's answer, but I want to point out that[2] you usually can find out what the compiler did by doing something like this:

        match &mut self.token {
            SimpleToken::Identifier(ref mut a) => {
                let a: () = a;
                return Expression::VarDefinition(mem::take(&mut a), 2)
            }

17 |                 let a: () = a;
   |                        --   ^ expected `()`, found `&mut String`
   |                        |
   |                        expected due to this

It lets you know that the a binding is a &mut String. And that's the same if you had done either of

        // Pre-binding-mode version
        match self.token {
            SimpleToken::Identifier(ref mut a) => {
        // Binding mode version
        match &mut self.token {
            SimpleToken::Identifier(a) => {

Which reflects some of the special rules mentioned before -- the ref mut qualifier while in ref mut binding mode is redundant.


  1. i.e. matching a non-reference pattern against a &mut value ↩︎

  2. even when it's a pain to get back to a non-binding mode state with clippy ↩︎
