Writing a binary parser with Nom

I'm trying to write a parser for a binary format using Nom.
I have limited experience in Rust: I've written some server-side code with Axum, etc., but this is the first time I couldn't avoid running into the borrow checker (eh..).

Here is the code I have:

use nom::number::complete::{le_u16, le_u32};
use nom::{combinator::map, sequence::tuple, IResult};

#[derive(Clone, Debug)]
pub struct BinaryFormat<'bfmt> {
    binary_data: &'bfmt [u8],
}

impl<'bfmt> BinaryFormat<'bfmt> {
    pub fn new(binary_data: &'bfmt [u8]) -> Self {
        Self { binary_data }
    }

    pub fn parse(&self) -> IResult<&[u8], ()> {
        let (_, header) = parse_chunk_header(self.binary_data)?;
        println!("header: {:?}", header);

        // TODO: continue parsing 

        Ok((self.binary_data, ()))
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ChunkHeader {
    pub typ: u16,
    pub header_size: u16,
    pub chunk_size: u32,
}

pub fn parse_chunk_header(input: &[u8]) -> IResult<&[u8], ChunkHeader> {
    map(
        tuple((le_u16, le_u16, le_u32)),
        |(typ, header_size, chunk_size)| ChunkHeader {
            typ,
            header_size,
            chunk_size,
        },
    )(input)
}

#[cfg(test)]
mod tests {
    use super::*;
    use anyhow::Result;

    #[test]
    fn test_parser() -> Result<()> {
        let dir_path = std::path::Path::new("data/binaries");
        let files = dir_path.read_dir()?;
        for file in files {
            let file = file?.path().canonicalize()?;
            println!("testing {:?}", file);
            let file_bytes: Vec<u8> = std::fs::read(&file)?;
            let parser = BinaryFormat::new(file_bytes.as_slice());
            parser.parse()?;
        }

        Ok(())
    }
}

When I run the test, I get this error:

error[E0597]: `file_bytes` does not live long enough
  --> src/binary_parser.rs:53:44
   |
52 |             let file_bytes: Vec<u8> = std::fs::read(&file)?;
   |                 ---------- binding `file_bytes` declared here
53 |             let parser = BinaryFormat::new(file_bytes.as_slice());
   |                          ------------------^^^^^^^^^^^^^^^^^^^^^-
   |                          |                 |
   |                          |                 borrowed value does not live long enough
   |                          argument requires that `file_bytes` is borrowed for `'static`
54 |             parser.parse()?;
55 |         }
   |         - `file_bytes` dropped here while still borrowed

error[E0597]: `parser` does not live long enough
  --> src/binary_parser.rs:54:13
   |
53 |             let parser = BinaryFormat::new(file_bytes.as_slice());
   |                 ------ binding `parser` declared here
54 |             parser.parse()?;
   |             ^^^^^^^^^^^^^^
   |             |
   |             borrowed value does not live long enough
   |             argument requires that `parser` is borrowed for `'static`
55 |         }
   |         - `parser` dropped here while still borrowed

I had a borrow checker problem in the code itself and managed to fix it, but now I get the same kind of error in the test.

How can I better structure the code for my case?
Any guidance would be greatly appreciated!

A quick squiz at the nom docs suggests that the problem here is that errors coming out of the parser contain a borrow of the input. Those errors are being converted into anyhow::Error, which requires the errors to be 'static. This leads the compiler to deduce that all the data you're borrowing in order to parse it must be 'static, which it can't be (since it's owned by a local variable that is dropped at the end of each loop iteration).

Edit: actually, anyhow::Error itself isn't the problem. The compiler wouldn't let you propagate the errors out of the scope in which the input data is borrowed anyway.

It looks like you want to call e.to_owned() on the errors in the IResult before propagating them.

One problem is that your signature of parse ties the returned lifetime to the lifetime of self (that's how elision rules are defined). You want to return &'bfmt [u8] instead.

That only gets rid of one of the errors, though; I'm still trying to figure out why the 'static bound exists.
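The elision point can be shown with nothing but the standard library. This is only a minimal sketch: `Holder`, `tied_to_self`, `tied_to_data` and `via_temporary` are names made up for the illustration, standing in for `BinaryFormat` and `parse`.

```rust
// Hypothetical stand-in for `BinaryFormat`: it only borrows the bytes.
struct Holder<'a> {
    data: &'a [u8],
}

impl<'a> Holder<'a> {
    // Elided form: equivalent to `fn tied_to_self<'s>(&'s self) -> &'s [u8]`,
    // so the result cannot outlive the borrow of `self`.
    fn tied_to_self(&self) -> &[u8] {
        self.data
    }

    // Explicit form: the result borrows from the underlying data only,
    // so it stays valid after the `Holder` itself is gone.
    fn tied_to_data(&self) -> &'a [u8] {
        self.data
    }
}

fn via_temporary(bytes: &[u8]) -> &[u8] {
    // The temporary `Holder` does not survive this function, but the
    // returned slice borrows from `bytes`, so this compiles.
    // Swapping in `tied_to_self()` is rejected, because the result
    // would then borrow from the temporary `Holder`.
    Holder { data: bytes }.tied_to_data()
}
```

Both methods return the same slice; the only difference is which borrow the compiler attaches to it.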

While your observation is correct, to_owned() won't fix the error because it isn't transitive (it doesn't convert to a type recursively containing owned types only).

After applying my other suggestion to make the signature less restrictive, the error can finally be fixed by Display-formatting the error instead: Playground.
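For readers without the playground link, the idea of formatting the error can be sketched without nom at all. `BorrowedError`, `read_le_u16` and `parse_owned_buffer` below are hypothetical names; `BorrowedError` is only assumed to resemble a parser error that borrows the input, not nom's actual type.

```rust
use std::fmt;

// Hypothetical stand-in for a parser error that borrows the input it
// failed on, similar in spirit to what a parser over `&[u8]` returns.
#[derive(Debug)]
struct BorrowedError<'a> {
    remaining: &'a [u8],
    needed: usize,
}

impl fmt::Display for BorrowedError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "needed {} bytes, found {}", self.needed, self.remaining.len())
    }
}

// Reads a little-endian u16 from the front of `input`.
fn read_le_u16(input: &[u8]) -> Result<(&[u8], u16), BorrowedError<'_>> {
    match input {
        [a, b, rest @ ..] => Ok((rest, u16::from_le_bytes([*a, *b]))),
        _ => Err(BorrowedError { remaining: input, needed: 2 }),
    }
}

// The buffer is a local, so a `BorrowedError` pointing into it cannot
// leave this function. Formatting it yields a `String` that owns its text.
fn parse_owned_buffer(bytes: Vec<u8>) -> Result<u16, String> {
    let (_rest, value) = read_le_u16(&bytes).map_err(|e| e.to_string())?;
    Ok(value)
}
```

In a test that returns anyhow::Result, the same step would presumably be a map_err that wraps the formatted message, for example with the anyhow! macro, before applying `?`.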

I changed the return type of the parser to the parsed object (no buffer there), and handled the IResult error within the parser method via match. Now it works. Thanks!

First, resolve the actual result type involved:

IResult<&[u8], (), _>
IResult<&[u8], (), Error<&[u8]>>
Result<(&[u8], ()), Err<Error<&[u8]>>>

There is a specific impl for Err<Error<&[u8]>> that provides a to_owned method, converting Err<Error<&[u8]>> to Err<Error<Vec<u8>>>, which is owned and thus won't induce the 'static borrow requirement.

You'd be correct if I'd said to call to_owned on the IResult, but I said to call to_owned on the errors in the IResult; that is, (...).map_err(|e| e.to_owned()).
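The shape of that conversion can be sketched with the standard library alone. `InputError`, `read_byte` and `parse_local` are hypothetical names; `InputError<I>` is only assumed to mirror an error type generic over its input, and `Box<dyn Error>` plays the role of anyhow::Error here because it imposes a comparable 'static bound.

```rust
use std::error::Error as StdError;
use std::fmt;

// Hypothetical stand-in for an error that is generic over the input type.
#[derive(Debug)]
struct InputError<I> {
    input: I,
}

impl<I: fmt::Debug> fmt::Display for InputError<I> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse failed at {:?}", self.input)
    }
}

impl<I: fmt::Debug> StdError for InputError<I> {}

impl InputError<&[u8]> {
    // Copies the borrowed bytes so the error no longer refers to the buffer.
    fn to_owned(self) -> InputError<Vec<u8>> {
        InputError { input: self.input.to_vec() }
    }
}

// Takes one byte off the front of `input`, or fails with a borrowed error.
fn read_byte(input: &[u8]) -> Result<(&[u8], u8), InputError<&[u8]>> {
    match input {
        [first, rest @ ..] => Ok((rest, *first)),
        [] => Err(InputError { input }),
    }
}

// `Box<dyn Error>` means `Box<dyn Error + 'static>`, so `?` only accepts
// errors that hold no borrows of the local buffer.
fn parse_local(bytes: Vec<u8>) -> Result<u8, Box<dyn StdError>> {
    let (_rest, byte) = read_byte(&bytes).map_err(|e| e.to_owned())?;
    Ok(byte)
}
```

Dropping the map_err call in `parse_local` should bring back a "does not live long enough" error for `bytes`, which is the same symptom as in the original test.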
