Searching for a "decorating" parser combinator crate

I want to parse text into a struct with a lot of optional fields. To make things easier, I would like to use an existing parser combinator crate.

Problem:
All crates I looked at, including nom, combine, glue, and pom, only offer "output-only" parsers, i.e. each parser is a function Fn*(Input) -> Result<(Input, Output), Error>. With this design, I first need to parse the fields into an intermediate type before I can build the final struct from them. For example:

// The struct we want to parse into.
#[derive(Default)]
struct Struct {
    field1: Option<String>,
    field2: Option<String>,
    // ...
}

// This enum is just boilerplate.
enum StructField {
    Field1(String),
    Field2(String),
    // ...
}

// And this implementation of From is unnecessary as well.
impl From<Vec<StructField>> for Struct {
    fn from(struct_fields: Vec<StructField>) -> Self {
        let mut result = Self::default();
        for struct_field in struct_fields {
            match struct_field {
                StructField::Field1(value) => result.field1 = Some(value),
                StructField::Field2(value) => result.field2 = Some(value),
                // ...
            }
        }
        result
    }
}

// Here the actually meaningful code starts.
// For simplicity I ignore all error handling in this example.
fn parse_struct(input: Input) -> (Input, Struct) {
    let (input, fields) = nom::multi::many0(parse_field)(input);
    (input, fields.into())
}

fn parse_field(input: Input) -> (Input, StructField) {
    alt((parse_field1, parse_field2 /*, ...*/))(input)
}

fn parse_field1(input: Input) -> (Input, StructField) {
    let value = /* parse field1 ... */;
    (input, StructField::Field1(value))
}

fn parse_field2(input: Input) -> (Input, StructField) {
    let value = /* parse field2 ... */;
    (input, StructField::Field2(value))
}

Things that don't work:
I tried to circumvent this by using (RealInput, &mut Output) as Input type in nom, but many combinators (rightfully) require Input: Clone, thus making this approach impossible. I expect that all other listed parser combinator crates have the same limitation, as they all offer methods similar to many0. Such methods allow an internal parser to fail, and thus need to back up (clone) the input before passing it to the fallible parser.
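
To make the Clone requirement concrete, here is a minimal std-only sketch of a many0-style combinator. It is not nom's implementation: the parser shape (Option instead of Result) and the helper letter_a are assumptions made for illustration. The point is that the input has to be copied before each attempt, and a tuple containing a &mut reference cannot be copied, because &mut T does not implement Clone.

```rust
// Hypothetical mini-combinator written against std only (not nom's code).
// A parser takes the input and returns the remaining input plus an output,
// or None on failure.
fn many0<I: Clone, O>(
    mut parser: impl FnMut(I) -> Option<(I, O)>,
) -> impl FnMut(I) -> (I, Vec<O>) {
    move |mut input: I| {
        let mut outputs = Vec::new();
        loop {
            // Keep a copy: if the inner parser fails, parsing resumes from here.
            let backup = input.clone();
            match parser(input) {
                Some((rest, out)) => {
                    outputs.push(out);
                    input = rest;
                }
                // The failed attempt consumed `input`, so hand back the copy.
                None => return (backup, outputs),
            }
        }
    }
}

// Example inner parser (an assumption for this sketch): consumes one leading 'a'.
// Note: this toy many0 has no guard against parsers that succeed without
// consuming any input.
fn letter_a(input: &str) -> Option<(&str, char)> {
    input.strip_prefix('a').map(|rest| (rest, 'a'))
}
```

With I = &str this works because shared references are Clone; with I = (&str, &mut Struct) the bound I: Clone cannot be satisfied.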

Ideal solution:
If a crate additionally offered parsers structured as Fn*(Input, &mut Output) -> Result<Input, Error> or Fn*(Input, Output) -> Result<(Input, Output), Error>, it would be much easier to parse a struct with optional fields: I could first create the struct with all fields set to None, and then populate the fields as they appear in the input. I would not need the enum boilerplate, and the code would be much shorter. I could also handle errors (illegally formatted struct field values) directly where they appear, as opposed to handling them only when combining the fields into a struct.

Example of the ideal solution:

#[derive(Default)]
struct Struct {
    field1: Option<String>,
    field2: Option<String>,
    // ...
}

// For simplicity I ignore all error handling in this example.
// This method has the structure of a "classic" parser.
fn parse_struct(input: Input) -> (Input, Struct) {
    let mut result = Struct::default();
    // `ref_mut_many0` is the hypothetical combinator I am looking for; nom has no such function.
    let input = nom::ref_mut_many0(parse_fields)(input, &mut result);
    (input, result)
}

// This and the following methods have the structure of the proposed parsers.
fn parse_fields(input: Input, output: &mut Struct) -> Input {
    alt((parse_field1, parse_field2 /*, ...*/))(input, output)
}

fn parse_field1(input: Input, output: &mut Struct) -> Input {
    output.field1 = /* parse field1 ... */;
    input
}

fn parse_field2(input: Input, output: &mut Struct) -> Input {
    output.field2 = /* parse field2 ... */;
    input
}
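
For comparison, the proposed parser shape can be written by hand without any crate. The following is only a sketch under stated assumptions: the input syntax `key=value;`, the helper key_value, and the use of Option instead of Result are all invented for illustration, and the hand-written parse_fields and parse_struct stand in for the alt and many0 combinators that a crate would have to provide.

```rust
#[derive(Debug, Default, PartialEq)]
struct Struct {
    field1: Option<String>,
    field2: Option<String>,
}

// Assumed toy syntax for this sketch: `key=value;`.
// Returns (remaining input, value) when the input starts with the given key.
fn key_value<'a>(input: &'a str, key: &str) -> Option<(&'a str, &'a str)> {
    let rest = input.strip_prefix(key)?.strip_prefix('=')?;
    let (value, rest) = rest.split_once(';')?;
    Some((rest, value))
}

// "Decorating" parsers: the remaining input is returned on success,
// and the parsed value is written straight into the output struct.
fn parse_field1<'a>(input: &'a str, output: &mut Struct) -> Option<&'a str> {
    let (rest, value) = key_value(input, "field1")?;
    // Assign only after the parse succeeded, so a failure leaves `output` untouched.
    output.field1 = Some(value.to_string());
    Some(rest)
}

fn parse_field2<'a>(input: &'a str, output: &mut Struct) -> Option<&'a str> {
    let (rest, value) = key_value(input, "field2")?;
    output.field2 = Some(value.to_string());
    Some(rest)
}

// Hand-written stand-in for `alt`: try each field parser in turn.
fn parse_fields<'a>(input: &'a str, output: &mut Struct) -> Option<&'a str> {
    parse_field1(input, output).or_else(|| parse_field2(input, output))
}

// Hand-written stand-in for `many0`: repeat until no field parser matches.
fn parse_struct(mut input: &str) -> (&str, Struct) {
    let mut result = Struct::default();
    while let Some(rest) = parse_fields(input, &mut result) {
        input = rest;
    }
    (input, result)
}
```

Under these assumptions, parse_struct("field2=b;field1=a;tail") fills both fields regardless of their order and returns "tail" as the remaining input. One design point to keep in mind with this shape: a parser that writes to the output and then fails would leave a partial modification behind, which is why the sketch assigns only after the value has been parsed completely.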

Does anyone know a parser combinator crate that offers an interface like this? If not, is there a reason why not, e.g. is my described "ideal solution" flawed in some way I do not see?

I am sorry. I am missing something very basic.

If you can write a parser for "String -> T", why can't you slightly modify that parser to do "String -> Option<T>"?

At that point, you should be able to output directly to the "struct w/ optional fields" without creating an "intermediate struct".

I think that nom::fold_many0 can get close to what you want (untested):

#[derive(Default)]
struct Struct { /* ... */ }

impl Struct {
    fn set_field(&mut self, field: StructField) { /* ... */ }
}

// For simplicity I ignore all error handling in this example.
fn parse_struct(input: Input) -> (Input, Struct) {
    nom::multi::fold_many0(
        parse_field,
        // Empty struct. (nom 7 and later expect a closure here, e.g. `Struct::default`.)
        Default::default(),
        |mut acc: Struct, field| { acc.set_field(field); acc },
    )(input)
}

Thank you for your answers. The core of the problem is: "how to know which field gets which value?"

@zeroexcuses I can certainly make the parser output Option<String>, but then I lose the information about which field it belongs to. Note that the fields can appear in any order.

@2e71828 You still use the enum StructField as boilerplate, as well as the set_field method, which would be equivalent to my implementation of From.

I'm not really understanding what you want, then, as I don't see how your "ideal" solution addresses this problem. Eventually, you'll need to write some code that tells the compiler how to map text strings to storage operations. It doesn't know how the language you're parsing represents field1, field2, etc.


Also, note that my prior solution doesn't necessarily rely on an enum. You could also make a type alias like type StructField = (String, String);. You still have to tell the compiler what each text string means in terms of Struct at some point, though.
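
A std-only sketch of the tuple variant, under stated assumptions: the field names "field1" and "field2" are assumed to appear literally in the input, the bool return value of set_field is added here for illustration, and build is a hypothetical helper that uses Iterator::fold in place of nom's fold_many0.

```rust
#[derive(Debug, Default, PartialEq)]
struct Struct {
    field1: Option<String>,
    field2: Option<String>,
}

// (field name, field value) instead of an enum.
type StructField = (String, String);

impl Struct {
    // The one place that maps names found in the input to fields of the struct.
    // Returns false for names this sketch does not know about.
    fn set_field(&mut self, (name, value): StructField) -> bool {
        match name.as_str() {
            "field1" => self.field1 = Some(value),
            "field2" => self.field2 = Some(value),
            _ => return false,
        }
        true
    }
}

// Hypothetical helper: folds already-parsed fields into the struct,
// the same accumulation that fold_many0 would perform while parsing.
fn build(fields: Vec<StructField>) -> Struct {
    fields.into_iter().fold(Struct::default(), |mut acc, field| {
        acc.set_field(field);
        acc
    })
}
```

The match in set_field is exactly the mapping from text to storage mentioned above; the tuple only moves it from an enum definition into string comparisons.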


@isibboi: I ran into the same confusion as @2e71828 and basically gave up on trying to help. I get the impression that (1) there are multiple issues you are trying to address, (2) in your mind they are a single issue, and (3) this is causing confusion.

The ideal solution has a separate method for each field, that parses the value for each field however it needs to be parsed, and then assigns the field directly.

The non-ideal solution has again a separate method for each field that parses it however it needs to be parsed, but then does not assign the field directly. It instead stores which field should be assigned to through the enum variant, only to later unpack that variant and assign the value to the right field.

Difference: The ideal solution can assign directly to the fields, the non-ideal solution needs another type (the enum) as a proxy between parsing the field value and assigning to the correct field.


Even if I used a tuple, that would still be unnecessary boilerplate. The mapping between field and tuple value would then depend implicitly on order, which is arguably less readable than the explicit enum.

It's a bit messy, but I think you should be able to write something like this:

type Setter = Box<dyn FnOnce(&mut Struct)>;

fn parse_struct<I>(input: I) -> IResult<I, Struct> {
    let mut out = Struct::default();
    let mut iter = nom::combinator::iterator(input, alt((parse_field1, parse_field2)));
    // Each successful parse yields a setter, which is applied immediately.
    for setter in &mut iter {
        setter(&mut out);
    }
    // finish() yields the remaining input together with a unit value.
    iter.finish().map(|(i, ())| (i, out))
}

fn parse_field1<I>(input: I) -> IResult<I, Setter> {
    let f1 = /* parse field1 ... */;
    Ok((input, Box::new(move |s: &mut Struct| s.field1 = f1) as Setter))
}

fn parse_field2<I>(input: I) -> IResult<I, Setter> {
    let f2 = /* parse field2 ... */;
    Ok((input, Box::new(move |s: &mut Struct| s.field2 = f2) as Setter))
}

NB: I haven't used nom myself and am working solely from the documentation; there are likely significant errors here.
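
The same idea can be checked without nom. The sketch below uses only std and makes several assumptions that are not part of the code above: the input syntax `key=value;`, the helper key_value, and Option in place of IResult. It keeps the core technique: every field parser is an ordinary "output-only" parser whose output is a boxed closure that knows which field to assign.

```rust
#[derive(Debug, Default, PartialEq)]
struct Struct {
    field1: Option<String>,
    field2: Option<String>,
}

type Setter = Box<dyn FnOnce(&mut Struct)>;

// Assumed toy syntax for this sketch: `key=value;`.
// Returns (remaining input, value) when the input starts with the given key.
fn key_value<'a>(input: &'a str, key: &str) -> Option<(&'a str, &'a str)> {
    let rest = input.strip_prefix(key)?.strip_prefix('=')?;
    let (value, rest) = rest.split_once(';')?;
    Some((rest, value))
}

// Ordinary "output-only" parsers: the output is a closure that stores the value.
fn parse_field1(input: &str) -> Option<(&str, Setter)> {
    let (rest, value) = key_value(input, "field1")?;
    let value = value.to_string();
    let setter: Setter = Box::new(move |s: &mut Struct| s.field1 = Some(value));
    Some((rest, setter))
}

fn parse_field2(input: &str) -> Option<(&str, Setter)> {
    let (rest, value) = key_value(input, "field2")?;
    let value = value.to_string();
    let setter: Setter = Box::new(move |s: &mut Struct| s.field2 = Some(value));
    Some((rest, setter))
}

// Hand-written stand-in for `alt`.
fn parse_field(input: &str) -> Option<(&str, Setter)> {
    parse_field1(input).or_else(|| parse_field2(input))
}

// Hand-written stand-in for iterating a parser and applying each setter.
fn parse_struct(mut input: &str) -> (&str, Struct) {
    let mut out = Struct::default();
    while let Some((rest, setter)) = parse_field(input) {
        setter(&mut out);
        input = rest;
    }
    (input, out)
}
```

Because the setter is only produced when a field parser succeeds, the input never has to carry a mutable reference, so the Clone requirement on the input is not a problem. The cost is one heap allocation per parsed field for the boxed closure.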


Thanks, this is pretty cool!

I think it is clean enough, and much less work than integrating my "ideal solution" into an existing parser combinator crate.
