Grammar rules in pest (a PEG parser generator)

Can someone shed some light on the grammar rules of pest (a PEG parser generator), or on what I'm doing wrong in this simple example? I'm trying to translate some bison/flex rules. One of them is:

NUMBER [-+]?([0-9]+|(([0-9]+\.[0-9]*)|(\.[0-9]+)))([eE][-+]?[0-9]+)?

Maybe I'm too tired to see my mistake, but I translated that to:

impl_rdp! {
    grammar! {
        // IDENT [a-zA-Z_][a-zA-Z_0-9]*
        ident =  { ['a'..'z'] | ['A'..'Z'] | ["_"] ~
                     (['a'..'z'] | ['A'..'Z'] | ["_"] | ['0'..'9'])* }
        // NUMBER [-+]?
        //        ([0-9]+|
        //         (
        //          ([0-9]+\.[0-9]*)|
        //          (\.[0-9]+)
        //         )
        //        )
        //        ([eE][-+]?[0-9]+)?
        number = {
            (["-"] | ["+"])? ~
                (['0'..'9']+ |
                 (
                     (['0'..'9']+ ~ ["."] ~ ['0'..'9']*) |
                     (["."] ~ ['0'..'9']+)
                 )
                ) ~
                (["e"] | ["E"] ~ (["-"] | ["+"])? ~ ['0'..'9']+)?
        }
    }
}

The full example code is here:

https://github.com/wahn/rs_pbrt/blob/master/examples/pest_test.rs

In the example I read the string to parse from a file. The rule matches -.00123456789 but not, e.g., -0.0123456789:

./target/release/examples/pest_test -i assets/scenes/pest_test.pbrt
FILE = assets/scenes/pest_test.pbrt
[Token { rule: number, start: 0, end: 13 }]

vs.

./target/release/examples/pest_test -i assets/scenes/pest_test.pbrt
FILE = assets/scenes/pest_test.pbrt
thread 'main' panicked at 'assertion failed: parser.end()', examples/pest_test.rs:89
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Are the brackets a problem? The examples I found were pretty simple and worked. Do I have to split into several rules?

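I think the problem is in the alternation inside number: for the input -0.01... the first branch, ['0'..'9']+, is chosen, and that branch does not include a decimal point. As in most regex implementations, | is not required to try all branches and find the longest match; it takes the first branch that succeeds. Reordering the branches so that the alternatives containing a decimal point come first should work.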
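To illustrate the ordered choice, here is a minimal hand-written sketch in plain Rust. It does not use pest's API at all; the function names (digits1, dot_digits, mantissa_original, and so on) are made up for this example, and it only models the mantissa part of the rule, assuming that the whole input has to be consumed, as the parser.end() assertion requires.

```rust
/// Consume one or more ASCII digits starting at `pos`; return the new position.
fn digits1(s: &[u8], pos: usize) -> Option<usize> {
    let end = pos + s[pos..].iter().take_while(|b| b.is_ascii_digit()).count();
    if end > pos { Some(end) } else { None }
}

/// Consume zero or more ASCII digits starting at `pos`; this never fails.
fn digits0(s: &[u8], pos: usize) -> usize {
    pos + s[pos..].iter().take_while(|b| b.is_ascii_digit()).count()
}

/// Consume the single byte `c` at `pos`.
fn lit(s: &[u8], pos: usize, c: u8) -> Option<usize> {
    if s.get(pos) == Some(&c) { Some(pos + 1) } else { None }
}

/// digits+ ~ "." ~ digits*
fn digits_dot(s: &[u8], pos: usize) -> Option<usize> {
    let p = digits1(s, pos)?;
    let p = lit(s, p, b'.')?;
    Some(digits0(s, p))
}

/// "." ~ digits+
fn dot_digits(s: &[u8], pos: usize) -> Option<usize> {
    let p = lit(s, pos, b'.')?;
    digits1(s, p)
}

/// Branch order of the question: plain digits are tried first.
/// Ordered choice commits to the first branch that succeeds.
fn mantissa_original(s: &[u8], pos: usize) -> Option<usize> {
    digits1(s, pos)
        .or_else(|| digits_dot(s, pos))
        .or_else(|| dot_digits(s, pos))
}

/// Reordered: branches containing a decimal point are tried first.
fn mantissa_reordered(s: &[u8], pos: usize) -> Option<usize> {
    dot_digits(s, pos)
        .or_else(|| digits_dot(s, pos))
        .or_else(|| digits1(s, pos))
}

/// Skip an optional sign, apply `rule`, and require that all input is consumed.
fn matches_fully(rule: fn(&[u8], usize) -> Option<usize>, input: &str) -> bool {
    let s = input.as_bytes();
    let start = match s.first() {
        Some(&b'-') | Some(&b'+') => 1,
        _ => 0,
    };
    rule(s, start) == Some(s.len())
}
```

With these definitions, matches_fully(mantissa_original, "-0.0123456789") is false, because the first branch stops after the leading 0 and the rest of the input is left over, while matches_fully(mantissa_reordered, "-0.0123456789") is true. Both orders accept -.00123456789.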


Hi @birkenfeld, thank you for your answer. Reordering works:

        // NUMBER [-+]?([0-9]+|(([0-9]+\.[0-9]*)|(\.[0-9]+)))([eE][-+]?[0-9]+)?
        number = {
            (["-"] | ["+"])? ~ // optional sign, followed by
            (
                (
                    (["."] ~ ['0'..'9']+) // dot and digits
                        | // or
                    (['0'..'9']+ ~ ["."] ~ ['0'..'9']*) // digits, dot, and (optional digits)
                )
                    | // or
                ['0'..'9']+ // just digits
            ) ~ ( // followed by (optional)
                (["e"] | ["E"]) ~ // 'e' or 'E', followed by
                (["-"] | ["+"])? ~ // optional sign, followed by
                ['0'..'9']+ // digits
            )?
        }

For the exponent's "e" or "E" I had to add parentheses as well; otherwise one of the two worked, but not the other.
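That behaviour is consistent with ~ binding more tightly than |, as in standard PEG notation, so that the unparenthesized exponent groups as ["e"] | (["E"] ~ sign? ~ digits+). I have not checked pest's precedence rules, so take that as an assumption. A small hand-written sketch of the two groupings in plain Rust, again without pest and with made-up function names:

```rust
/// Consume one or more ASCII digits starting at `pos`; return the new position.
fn digits1(s: &[u8], pos: usize) -> Option<usize> {
    let end = pos + s[pos..].iter().take_while(|b| b.is_ascii_digit()).count();
    if end > pos { Some(end) } else { None }
}

/// Consume the single byte `c` at `pos`.
fn lit(s: &[u8], pos: usize, c: u8) -> Option<usize> {
    if s.get(pos) == Some(&c) { Some(pos + 1) } else { None }
}

/// Consume an optional '-' or '+'; this never fails.
fn opt_sign(s: &[u8], pos: usize) -> usize {
    lit(s, pos, b'-').or_else(|| lit(s, pos, b'+')).unwrap_or(pos)
}

/// Assumed grouping without parentheses: "e" | ("E" ~ sign? ~ digits+).
/// A lowercase 'e' succeeds on its own and the digits after it stay unconsumed.
fn exponent_ungrouped(s: &[u8], pos: usize) -> Option<usize> {
    lit(s, pos, b'e').or_else(|| {
        let p = lit(s, pos, b'E')?;
        let p = opt_sign(s, p);
        digits1(s, p)
    })
}

/// Grouping with parentheses: ("e" | "E") ~ sign? ~ digits+.
fn exponent_grouped(s: &[u8], pos: usize) -> Option<usize> {
    let p = lit(s, pos, b'e').or_else(|| lit(s, pos, b'E'))?;
    let p = opt_sign(s, p);
    digits1(s, p)
}
```

Under that assumption, exponent_ungrouped consumes all of E-3 but only the first byte of e-3, which would leave input behind and trip the parser.end() assertion, while exponent_grouped consumes both completely.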