The right way to parse non-self-described text files?

Where would you look for such?

I don't know. I'm sure Wikipedia lists references to such articles at the bottom of pages related to the topic.

Thanks for the information :slight_smile:

You can use Google Scholar to search for scholarly articles and then see where they were published.

The main scholarly society for Computer Science research is the Association for Computing Machinery; several of their journals might be interested in an article about a new parallel parsing algorithm:

You can also look at IEEE journals and the commercial academic publishers (Elsevier, Springer, etc.).

Oh. Amazing! Thank you :purple_heart:

I have been working on a tokenizer for a while now as I learn Rust. It is incomplete, and in particular it lacks documentation, which I haven't written yet.

However, in its current state it is fully functional, and I have nearly finished writing the parser scripting language too!

If you are interested in such things (and it seems like you are), maybe give it a review, or even help me finish it!

Unfortunately, the best documentation available at the moment is the unit tests. If you have any questions, please just ask. I hope to complete this soon, with documentation, and publish it, but right now my time is too constrained.

To start with, the best examples of how to create token matchers and use them to turn a stream into tokens are the built-in scripting language parser matchers: rust-adextopa-core/src/script/v1/matchers at main · th317erd/rust-adextopa-core · GitHub

Thanks for all the recommendations.

My solution is:


  peg::parser!
  {
    /// Grammar to parse polygons.
    grammar parser() for [ u8 ]
    {

      rule nl() -> ()
        = ("\r")? "\n"

      rule float() -> f32
        = text:$( "-"? [ b'0' ..= b'9' ]+ ( "." [ b'0' ..= b'9' ]* )? ) {? wtools::string::number::parse::< f32, _ >( text ).or( Err( "float" ) ) }

      rule pair() -> [ i32 ; 2 ]
        = "(" e0:float() " "* "," " "* e1:float() ")" { [ e0 as i32, e1 as i32 ] }

      rule polygon() -> Polygon
        = val:float() nl() cells:( pair() ** "," ) { Polygon { cells : positions_flatten( cells ), val : val as i32 } }

      /// Parse polygons.
      pub rule polygons() -> Polygons
        = nl()* polygons:( polygon() ** nl() ) nl()* { Polygons( polygons ) }

    }
  }
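
To make clearer which input the grammar above accepts, here is a rough std-only sketch that mirrors its rules (`nl`, `float`, `pair`, `polygon`, `polygons`) as a hand-written recursive-descent parser. It is not the `peg` solution itself. The `Polygon` struct, the `Cursor` type and `parse_polygons` are hypothetical names invented for illustration, and the cells are kept as pairs because `positions_flatten` and `wtools` are not shown in the post.

```rust
/// Hypothetical stand-in for the `Polygon` type used by the grammar (layout is assumed).
#[derive(Debug, PartialEq)]
struct Polygon {
    val: i32,
    cells: Vec<[i32; 2]>,
}

/// Byte cursor over the input; a failed rule leaves `pos` where it started.
struct Cursor<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// Consume `lit` if the remaining input starts with it.
    fn eat(&mut self, lit: &[u8]) -> bool {
        let hit = self.src[self.pos..].starts_with(lit);
        if hit {
            self.pos += lit.len();
        }
        hit
    }

    /// Run `f`; rewind to the starting position if it fails (PEG-style backtracking).
    fn attempt<T>(&mut self, f: impl FnOnce(&mut Self) -> Option<T>) -> Option<T> {
        let start = self.pos;
        let out = f(self);
        if out.is_none() {
            self.pos = start;
        }
        out
    }

    /// nl = "\r"? "\n"
    fn nl(&mut self) -> bool {
        self.eat(b"\r\n") || self.eat(b"\n")
    }

    fn digits(&mut self) -> usize {
        let start = self.pos;
        while self.pos < self.src.len() && self.src[self.pos].is_ascii_digit() {
            self.pos += 1;
        }
        self.pos - start
    }

    /// float = "-"? digit+ ( "." digit* )?
    fn float(&mut self) -> Option<f32> {
        self.attempt(|c| {
            let start = c.pos;
            c.eat(b"-");
            if c.digits() == 0 {
                return None;
            }
            if c.eat(b".") {
                c.digits();
            }
            std::str::from_utf8(&c.src[start..c.pos]).ok()?.parse().ok()
        })
    }

    /// pair = "(" float " "* "," " "* float ")"
    fn pair(&mut self) -> Option<[i32; 2]> {
        self.attempt(|c| {
            if !c.eat(b"(") {
                return None;
            }
            let x = c.float()?;
            while c.eat(b" ") {}
            if !c.eat(b",") {
                return None;
            }
            while c.eat(b" ") {}
            let y = c.float()?;
            if !c.eat(b")") {
                return None;
            }
            Some([x as i32, y as i32]) // truncating casts, as in the grammar
        })
    }

    /// polygon = float nl ( pair ** "," )
    fn polygon(&mut self) -> Option<Polygon> {
        self.attempt(|c| {
            let val = c.float()? as i32;
            if !c.nl() {
                return None;
            }
            let mut cells = Vec::new();
            if let Some(first) = c.pair() {
                cells.push(first);
                while let Some(p) = c.attempt(|c| if c.eat(b",") { c.pair() } else { None }) {
                    cells.push(p);
                }
            }
            Some(Polygon { val, cells })
        })
    }
}

/// polygons = nl* ( polygon ** nl ) nl*, and the whole input must be consumed.
fn parse_polygons(src: &[u8]) -> Option<Vec<Polygon>> {
    let mut c = Cursor { src, pos: 0 };
    while c.nl() {}
    let mut out = Vec::new();
    if let Some(first) = c.polygon() {
        out.push(first);
        while let Some(p) = c.attempt(|c| if c.nl() { c.polygon() } else { None }) {
            out.push(p);
        }
    }
    while c.nl() {}
    if c.pos == src.len() { Some(out) } else { None }
}
```

Under these assumptions, an input consists of a value line followed by a line of comma-separated `(x, y)` pairs, with polygons separated by a single newline. Spaces are allowed only around the comma inside a pair, not between pairs, which matches the `pair() ** ","` rule in the grammar.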