Dynamic parser tree lifetime issue

Hello Rustlers :wave:,

I am currently building a dynamically modifiable parser tree based on a memory arena. The parser nodes live in a vector (so they all share the same lifetime) and are mostly boxed trait objects. This makes it possible to build up the tree using indices, which reduces the complexity. The parsing itself is done via the nom framework.

The input of the main library method is an &[u8], and the output is a vector of parsed "results".
My main problem is: if these results refer to contents of the input, it only works when the input outlives the parser tree.

This means this works:

#[test]
fn ethernet_success() {
    // Create some input
    let input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 8, 0];

    // Create the parser tree
    let mut peel = Peel::new();
    peel.new_parser(EthernetParser);

    // Parse the result
    let result = peel.traverse(&input, vec![]).unwrap();
    assert_eq!(result.len(), 1);
}

But this won't:

#[test]
fn ethernet_failure() {
    // Create the parser tree
    let mut peel = Peel::new();
    peel.new_parser(EthernetParser);

    // Create some input
    let input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 8, 0];

    // Parse the result
    let result = peel.traverse(&input, vec![]).unwrap();
    assert_eq!(result.len(), 1);
}
error: `input` does not live long enough
  --> tests/lib.rs:31:1
   |
29 |     let result = peel.traverse(&input, vec![]).unwrap();
   |                                 ----- borrow occurs here
30 |     assert_eq!(result.len(), 1);
31 | }
   | ^ `input` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

error: aborting due to previous error

error: Could not compile `peel`.

To learn more, run the command again with --verbose.

Currently (on my master branch) all parsers copy the parsed data, which works in all cases. To make references possible I added a lifetime to the Parser trait, which now looks like this:

//! General parser descriptions and traits
use nom::IResult;
use arenatree::{Node, Arena};

/// The type which will be stored within the tree structure
pub type ParserBox<'a, R, V> = Box<Parser<'a, Result = R, Variant = V> + Send + Sync>;

/// Arena tree for parsers
pub type ParserArena<'a, R, V> = Arena<ParserBox<'a, R, V>>;

/// A node within a `ParserArena`
pub type ParserNode<'a, R, V> = Node<ParserBox<'a, R, V>>;

/// The parsing trait
pub trait Parser<'a> {
    /// The type for result reporting, usually an enum
    type Result;

    /// The type of the parser itself, usually an enum
    type Variant;

    /// Parse using nom and return the result
    fn parse(&self,
             input: &'a [u8],
             node: Option<&ParserNode<Self::Result, Self::Variant>>,
             arena: Option<&ParserArena<Self::Result, Self::Variant>>,
             result: Option<&Vec<Self::Result>>)
             -> IResult<&'a [u8], Self::Result>;

    /// Return the actual enum variant of the parser
    fn variant(&self) -> Self::Variant;
}

The Result should refer to the input, for example with a simple EthernetParser:

impl<'a> Parser<'a> for EthernetParser {
    type Result = Layer<'a>;
    type Variant = ParserVariant;

    fn parse(&self,
             input: &'a [u8],
             _: Option<&ParserNode<Layer, ParserVariant>>,
             _: Option<&ParserArena<Layer, ParserVariant>>,
             _: Option<&Vec<Layer>>)
             -> IResult<&'a [u8], Layer<'a>> {
        do_parse!(input,
            d: take!(6) >>
            s: take!(6) >>
            e: map_opt!(be_u16, EtherType::from_u16) >>

            (Layer::Ethernet(EthernetPacket {
                dst: MacAddress(&d[0], &d[1], &d[2], &d[3], &d[4], &d[5]),
                src: MacAddress(&s[0], &s[1], &s[2], &s[3], &s[4], &s[5]),
                ethertype: e,
            }))
        )
    }

    fn variant(&self) -> ParserVariant {
        ParserVariant::Ethernet(self.clone())
    }
}

I will stop adding more code here, since I think it is really hard to understand without the full context. The full source is available in a separate branch where I removed irrelevant code. The tests/lib.rs shows my current problem.

Do you have any idea how to solve this? Thank you very much for your help! Best greetings from Germany!

The issue is that the Peel needs to live for as long as its input, because of that 'a parameter it needs to thread around. That means you have to borrow the input for the lifetime of the Peel, so the Peel needs to be dropped before the input it borrows.

So far as I can tell, the lifetime is leaking all over the shop because you need it on the associated type for Parser implementations like EthernetParser. To get around this, there's a proposed new feature called associated type constructors, which would let you use an undeclared lifetime for the type Result = Layer<'a> associated type, and then bind that lifetime in fn parse<'a> instead of on the trait Parser declaration.

To get around it right now, you could try to get that lifetime parameter off the associated type. From what I can see, what you end up borrowing is just a few bytes, so is it a big cost to copy them?


Thank you very much for your reply. Dropping Peel before the input is not an option, because I want to process multiple input slices without initializing the whole tree every time. Since the input can be around 1500 bytes (like a big network packet), copying truly costs performance. An example would be a u8 string, which needs to be converted to UTF-8 and then owned as a String.

The version without the lifetime is in the current master branch of the repository, and it works just fine. It could also be possible to get rid of the trait objects, but that would make it less flexible for other parsers to plug in.

Ah, that makes sense :thumbsup: Sorry for not being more insightful; I haven't had much of a chance to play with your particular example. Sometimes you just have to fiddle with lifetimes to get the exact semantics right.

If you can get that lifetime onto the parse function instead of the trait, then the lifetime of the parsers won't be tied to the lifetime of the input. With associated type constructors this should be possible, but unfortunately they aren't a thing yet. You could also do this with raw pointers, but that means losing the benefits lifetimes give you in the first place.

Where is that Result associated type actually used? If it's just there to keep parsers monomorphic over the return type, could you turn it into a generic parameter instead, and use a function wrapper to bind it to a particular type with a lifetime when calling parse?

Yes, the Result could be made into a generic, but then I could no longer put the trait into a trait object, which means I can't use it within my tree structure. :no_mouth:

That's unfortunate :frowning:

You might have to resort to using raw pointers in that case, and bind those back to borrowed references tied to the lifetime of the input before handing them back to the caller. That way you've got nice lifetimes at the edges of your API, but internally, to save your sanity, the result associated types use *const T into that original input.

You should be able to use higher-ranked trait bounds:

type ParserBox<R, V> = Box<for<'a> Parser<'a, Result = R, Variant = V>>;

I'm still not sure how that would get around the EthernetParser implementation requiring a bound lifetime, though:

impl<'a> Parser<'a> for EthernetParser {
    type Result = Layer<'a>;
}

Yeah, and I guess this is the main problem here. :hushed: I will try out the raw pointer solution in the next few days. This would just mean that the parser returns a raw pointer, which will be cast back to the actual Layer before being pushed into the result vector, right?


I'd be really interested to see whether that HRTB example could be made to work for Peel. You'd probably end up having to use closures as the parsers, because there isn't anything else that can inhabit that higher-ranked type for<'a> Parser<'a>. It's abstraction at the wrong point. But once we can work with type constructors for lifetimes in more places, this will be the way to go.

Maybe @comex has some better insight here. I've had pretty limited experience with HRTB because I haven't been able to use them where I'd expect.

Good luck!

I just found out that using raw pointers is also not really handy. First, I have to do something like Box::into_raw(Box::new(layer)), which could then be cast to a *const u8. After that, the main traversal function would have to find out which type it is in order to cast it back to the actual type. This is nearly impossible, since the Peel structure uses the generic R and not a *const u8. Peel does not (and should not) know anything about the type R, so casting back into the object is not possible. Furthermore, Box::into_raw will leak memory, since the relation to the result is lost after traversal.

So using raw pointers is not an option, I guess. :dizzy_face: