Nom: parsing multi-line blocks separated by empty lines?

Hi folks!
I am trying to parse the following (simplified) input using Nom:

  let input = "
    foo
    bar

    other

    baz
    biz
    ";

Essentially, these are multi-line blocks separated by empty lines (or ended by EOF), which I want to parse into a nested Vec, with one inner Vec per block.
The blocks can vary in size (i.e. a different number of items, where each item is a line containing an identifier).
I got things somewhat working if I use tuple and hard-code the number of items, combined with many0.

For two-item blocks, for example:

tuple((
  spacey(identifier),
  alt((tag("\n"), eof)),
  spacey(identifier),
  alt((tag("\n"), eof)),
  alt((empty_line, eof)),
))(input)

But I am looking for a proper way to match until an empty line, and I have tried a ton of other variations (many_till, take_until, etc.) without success.

Would really appreciate some help on this :)

Thanks in advance!

I've found that for me it always helps to write the input grammar in EBNF if possible, then translate to nom-speak. Your input is essentially described by:

input = *block
block = 1*nonempty-line *empty-line
nonempty-line = 1*non-nl NEWLINE
empty-line = NEWLINE

I haven't defined non-nl and NEWLINE, but the definitions should be obvious: non-nl is any character other than a newline, and NEWLINE is "\n". Then * translates to many0, 1* to many1, single characters to tag, and so on. Note that this grammar requires that there are no empty lines before the first block, and that an input consisting of a single empty line is invalid. Your input may or may not conform to this.

I created a playground with the full parser. It works with complete inputs, and would have to be written differently if you need streaming.
