Can't figure out how to avoid multiple mutable borrows in a parser

djk121 · September 13, 2018, 12:46am

I'm writing a parser for a format that is a sequence of records. Records can be either data or definition records; the definition records contain instructions for how to parse subsequent data records. There are a handful of other ways in which the results of parsing one record should cause all subsequent records to be parsed differently.

I've tried to structure my code with a parsing state struct that looks like this:

pub struct FitParsingState<'a> {
    map: HashMap<u16, Rc<FitDefinitionMessage>>,
    last_timestamp: Option<FitFieldDateTime>,
    timezone_offset_secs: Option<i32>,
    developer_data_definitions: HashMap<u8, FitDeveloperDataDefinition<'a>>,
}

I then pass a mut reference to this state object down into the code which parses the next record:

let (fit_message, out) = match header {
        FitRecordHeader::Normal(normal_header) => {
            match normal_header.message_type {
                FitNormalRecordHeaderMessageType::Data => {
                    let (data_message, o) = FitDataMessage::parse(o, FitRecordHeader::Normal(normal_header), parsing_state, None)?;
                    (FitMessage::Data(data_message), o)
                },
                FitNormalRecordHeaderMessageType::Definition => {
                    let (definition_message, o) = FitDefinitionMessage::parse(o, normal_header)?;
                    parsing_state.add(definition_message.header.local_mesg_num, definition_message.clone());
                    (FitMessage::Definition(definition_message.clone()), o)
                }
            }
        },
        _ => return Err(Error::unknown_error())
    };

I then want to do some post-processing on whatever record comes back, as some messages should cause things in the parsing state to change. This is where I run into problems:

match &fit_message {
        FitMessage::Data(m) => {
            match m {
                FitDataMessage::DeviceSettings(ds) => {
                    match *ds.deref() {
                        FitMessageDeviceSettings{time_zone_offset, ..} => {
                            // device_settings.time_zone_offset is 'in quarter hour increments',
                            // so a value of +15 = (15 increments * 15 minutes * 60 seconds) =
                            // = +13500 seconds
                            if let Some(tzo) = time_zone_offset {
                                parsing_state.set_timezone_offset((tzo * 15 * 60).into());
                            }
                        }
                    }
                },
                FitDataMessage::FieldDescription(fd) => {
                    if let Some(ddi) = fd.developer_data_index {
                        parsing_state.set_developer_data_definition(ddi, FitDataMessage::FieldDescription(fd.clone()));
                    }
                }
                _ => {},
            }
        },
        _ => {},
    }

The borrow checker complains that I'm trying to borrow parsing_state mutably more than once:

error[E0499]: cannot borrow `*parsing_state` as mutable more than once at a time
   --> src/lib.rs:246:25
    |
167 |                     let (data_message, o) = FitDataMessage::parse(o, FitRecordHeader::Normal(normal_header), parsing_state, None)?;
    |                                                                                                              ------------- first mutable borrow occurs here
...
246 |                         parsing_state.set_developer_data_definition(ddi, FitDataMessage::FieldDescription(fd.clone()));
    |                         ^^^^^^^^^^^^^ second mutable borrow occurs here
...
256 | }
    | - first borrow ends here

What I don't understand about this is why parsing_state is still borrowed after the first block above. I've tried nesting that block further into a separate block, but that doesn't help. I assume that what's happening is that down in FitDataMessage::parse I end up storing a reference to an Rc object that I retrieve from parsing_state into the message I'm parsing, but I'm guessing at that.

I'm new to rust, and this is my first serious dance with the borrow checker, so I'll try asking the general form of my question: if I'm writing a parser like this with some state that is going to change from record to record during the parse, what's the right rust-y way to keep track of that state? It's unavoidable that I'm going to end up with messageN needing to have a reference to messageM given the spec, so I suspect that I just haven't found the right magic combination of Rc/RefCell/Box/whatever to convince the borrow checker that I'm not a bad person. Help appreciated!

TomP · September 13, 2018, 1:15am

It's not that you have "to convince the borrow checker that [you're] not a bad person", it's that you have to avoid giving the compiler's backend unrestricted authorization to optimize your code whatever way it likes because your program has (and thus authorizes) undefined behavior.

vitalyd · September 13, 2018, 1:35am

It’s hard to say with just the snippet you posted but if I’m imagining right what you’re saying here, then the borrow of parsing_state is propagated into fit_message and since that object is still live at the point you try to borrow `parsing_state‘ again, the compiler rejects it.

Put another way, a function like:

fn foo<'a>(x: &'a mut Something) -> SomethingElse<'a> { ... }

propagates (ie retains) the borrow of Something, via the returned SomethingElse, because the lifetimes are linked up from input to output.

Instead of returning a reference to the Rc, can you clone the Rc and use the clone instead? That’ll sever the borrow.

Also, if you can put your code on the playground, a minimal example, it’ll be easier for everyone to help you.

djk121 · September 13, 2018, 11:45am

I'll try to figure out how to reduce to a minimal example, but there's rather a lot of code between the snippet upthread and here, and I suspect the problem is here:

impl<'a> FitMessageAccelerometerData<'a> {
    pub fn parse(input: &'a [u8], header: FitRecordHeader, parsing_state: &'a mut FitParsingState, offset_secs: Option<u8>) -> Result<(Rc<FitMessageAccelerometerData<'a>>, &'a [u8])> {
        let definition_message = parsing_state.get(header.local_mesg_num())?;
        let mut message = FitMessageAccelerometerData {
            header: header,
            definition_message: Rc::clone(&definition_message),
            developer_fields: vec![],
            timestamp: None,
            timestamp_ms: None,
            sample_time_offset: None,
            accel_x: None,
            accel_y: None,
            accel_z: None,
            calibrated_accel_x: None,
            calibrated_accel_y: None,
            calibrated_accel_z: None,
        };

        let inp = &input[..(message.definition_message.message_size)];
        let tz_offset = parsing_state.get_timezone_offset();
        let o = match FitMessageAccelerometerData::parse_internal(&mut message, input, tz_offset) {
            Ok(o) => o,
            Err(e) => {
                let mut err_string = String::from("Error parsing FitMessageAccelerometerData:");
                err_string.push_str(&format!("  parsing these bytes: '{:x?}'", inp));
                err_string.push_str(&format!("  specific error: {:?}", e));
                return Err(Error::message_parse_failed(err_string))
            }
        };

        match offset_secs {
            Some(os) => {
                message.timestamp = Some(parsing_state.get_last_timestamp()?.new_from_offset(os));
            },
            None => {
                match message.timestamp {
                    Some(ts) => {
                        parsing_state.set_last_timestamp(ts);
                    },
                    None => return Err(Error::missing_timestamp_field())
                }
            }
        }
        let mut inp2 = o;
        for dev_field in &message.definition_message.developer_field_definitions {
            let dev_data_definition = parsing_state.get_developer_data_definition(dev_field.developer_data_index)?;
            let field_description = dev_data_definition.get_field_description(dev_field.definition_number)?;
            let (dd, outp) = FitFieldDeveloperData::parse(inp, field_description.clone(), message.definition_message.endianness, dev_field.field_size)?;
            message.developer_fields.push(dd);
            inp2 = outp;
        }

        Ok((Rc::new(message), inp2))
    }

Specifically, this block:

     for dev_field in &message.definition_message.developer_field_definitions {
            let dev_data_definition = parsing_state.get_developer_data_definition(dev_field.developer_data_index)?;
            let field_description = dev_data_definition.get_field_description(dev_field.definition_number)?;
            let (dd, outp) = FitFieldDeveloperData::parse(inp, field_description.clone(), message.definition_message.endianness, dev_field.field_size)?;
            message.developer_fields.push(dd);
            inp2 = outp;
        }

I think where my mental model is broken is in how the HashMaps in parsing_state behave WRT borrows. I was thinking of them as holders that released a borrow at the end of this block, but that's clearly not correct. Maybe the issue is that the FitDeveloperDataDefinition object that winds up in dev_data_definition above is a reference (not in an Rc), then when I get field_description out of that, even though it's a cloned Rc, there's somehow a reference to dev_data_definition that winds up in the FitFieldDeveloperData struct?

This is confusing to me, as I would assume that because dev_data_definition goes out of scope each time through the loop, and because field_definition is a clone of an Rc, that there wouldn't be any references borrowed by the time that the parse function returns.

vitalyd · September 13, 2018, 12:06pm

If you don’t retain anything (ie no references to it) from parsing_state in that method then you should change the signature to:

pub fn parse(input: &'a [u8], header: FitRecordHeader, parsing_state: &mut FitParsingState, offset_secs: Option<u8>) -> Result<(Rc<FitMessageAccelerometerData<'a>>, &'a [u8])>

Note the parsing_state is not tied to the 'a lifetime anymore.

As mentioned, it’s hard to say whether that works with your code (I’m also on mobile so code "navigation" is suboptimal ) without seeing more of it, but if parsing state can have a different borrow lifetime, which doesn’t escape into the return value, then it may work.

trentj · September 13, 2018, 1:12pm

Looks like something messed up your code formatting @vitalyd -- the second half of the line is HTML-escaped.

vitalyd · September 13, 2018, 1:17pm

Argh, yeah - thanks @trentj; I've fixed it up.

djk121 · September 13, 2018, 2:30pm

@vitalyd: Thanks, I knew what you meant. And thank you for the quick responses and attempts to help.

Removing the lifetime parameter causes the compiler to complain about it being missing from that parsing_state object:

error[E0621]: explicit lifetime required in the type of `parsing_state`
     --> src/fittypes.rs:16906:9
      |
16868 |     pub fn parse(input: &'a [u8], header: FitRecordHeader, parsing_state: &FitParsingState, offset_secs: Option<u8>) -> Result<(Rc<FitMessageWorkoutStep<'a>>, &'a [u8])> {
      |                                                                           ---------------- help: add explicit lifetime `'a` to the type of `parsing_state`: `&fitparsingstate::FitParsingState<'a>`
...
16906 |         Ok((Rc::new(message), inp2))
      |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ lifetime `'a` required

I can't figure out a neat way to get a repro into the playground, so I just put the code up here:

https://github.com/djk121/fitparse/blob/master/src/lib.rs#L159

Thanks again for the help!

vitalyd · September 13, 2018, 5:01pm

Yikes, that's a lot of code .

Ok, I'm going to use FitDataMessage::parse() here, but will pretend you have just a single concrete FitFieldMesgNum variant that you're matching. In other words, let's pretend that impl block + method is:

impl<'a> FitDataMessage<'a> {
    pub fn parse(input: &'a [u8], header: FitRecordHeader, parsing_state: &mut FitParsingState<'a>, offset_secs: Option<u8>) -> Result<(FitDataMessage<'a>, &'a [u8])> {
        let definition_message = parsing_state.get(header.local_mesg_num())?;
        match definition_message.global_mesg_num {
            FitFieldMesgNum::FileId => {
                let (val, o) = FitMessageFileId::parse(input, header, parsing_state, offset_secs)?;
                Ok((FitDataMessage::FileId(val), o))
            },
            _ => Err(Error::unknown_error())
        }
    }
}

You have lots more variants there, but I believe they'll all end up needing the same tweak.

First thing to note is the change to the parsing_state parameter: it's using a different (elided) lifetime parameter for the borrow, but 'a for FitParsingState<'a>. Let's jump to FitMessageFileId::parse that's called there; it's quite lengthy, so I'll point just a few bits and pieces.

First, you'll want to make the same signature change to the parsing_state parameter: parsing_state: &mut FitParsingState<'a>.

Next, there are a couple of (easy to make) mistakes in code called from this loop.

FitParsingState::get_developer_data_definition() needs to disassociate the lifetime of the returned FitDeveloperDataDefinition from the borrow of self. The signature needs to be changed to this:

pub fn get_developer_data_definition(&self, developer_data_index: u8) -> Result<&FitDeveloperDataDefinition<'a>>

Note the added 'a to the FitDeveloperDataDefinition in the return type.

Similarly, FitDeveloperDataDefinition::get_field_description() needs to do the same for its return type:

fn get_field_description(&self, field_number: u8) -> Result<Rc<FitMessageFieldDescription<'a>>>

And I think that's it. You'll need to make similar changes for all your other parse fns used for the other variants, and perhaps similar lifetime parameter changes that get_developer_data_definition needed for other structs. But if you do that, it should compile.

My recommendation is comment out all the variants you have in the main parse, and add them back one-by-one (like I did with FileId only) and make sure you get it to compile.

djk121 · September 14, 2018, 1:05am

Yes, that worked. Thank you!

IIUC, the mistake I was making was applying the lifetime to the reference I was borrowing, not the members of the struct. The difference between &'a mut Foo and & mut Foo<'a>, right? If I follow, the latter means "the reference must match lifetime 'a", and the latter means "the lifetime of the reference is inferred, but members of the struct which have a lifetime parameter must live at least as long as 'a". Have I got that right?

If I do, I'm a bit confused by the difference - how could these two things be meaningfully different?

vitalyd · September 14, 2018, 6:10pm

Sort of, yeah. The former is really the issue here because 'a happens to be the lifetime parameter the types themselves are generic over, and is also the lifetime you "pass around" via the various return types. If you associate a mutable borrow of Foo with that same lifetime, you'll end up telling the compiler to keep it borrowed for much longer than you really intended. There's a secondary issue with having &'a mut Foo<'a> type of borrows due to Foo itself having a lifetime parameter; a mutable borrow makes Foo<'a> invariant over the 'a lifetime. I won't get into this here since I don't think this case even got that far, but it's another thing to keep in mind (the nomicon has more information on variance in Rust).

So the more important bit for your case is you want the mutable borrow of ParsingState to be as short as possible, and not at all related to 'a itself. This is where &mut ParsingState<'a> comes into play - the compiler will use some (elided here) lifetime for the mutable borrow, but it'll be as short as possible, typically just the method.

As for the issues with get_developer_data_defintion, it's similar. If you have a method signature like:

fn get_developer_data_defintion(&self) -> &FitDeveloperDataDefinition

Then if we were to write it out without lifetime elision, it'd be:

fn get_developer_data_defintion<'self>(&'self self) -> &'self FitDeveloperDataDefinition<'self>

So not only is the returned borrow associated with the borrow of self, but so is the lifetime inside FitDeveloperDataDefinition - we no longer carried over the fact that we, in fact, store FitDeveloperDataDefinition<'a>. This is a legal signature because we're essentially substituting a longer lived value (FitDeveloperDataDefinition<'a>) for a shorter one (FitDeveloperDataDefinition<'self>). The problem is the callers expect to have a FitDeveloperDataDefinition<'a>. So we just fix up the signature to be more precise.

Happy to expand more if you have other/further questions.

Topic		Replies	Views
Problems with a recursive parser help	16	1957	January 12, 2023
Understanding "`var` dropped here while still borrowed" help	5	3508	January 12, 2023
Cannot borrow error leaving me confused help	4	651	January 12, 2023
Approach to modify a struct using helpers within the struct? help	21	7112	January 12, 2023
Theory vs. practice lifetime and borrow help	3	1801	January 12, 2023

Can't figure out how to avoid multiple mutable borrows in a parser

Related topics