Fastest way to format a file

tldr: I want to format files written in a structured text format (LaTeX). What is the best way to do it?

The "what" of the story

On my way to rustifying my own kingdom, I am trying to build a program that formats LaTeX files. Not one that parses them, mind you (but maybe; more on that later); I am aware of some "facts". To avoid reinventing the wheel, I searched for existing projects that took this path and found a few; the most notable ones, in my opinion, use regular expressions to do the job.
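For context, here is a rough sketch of the line-by-line style such tools tend to follow. It is an illustration under assumptions, not a port of any particular project: it uses plain `str` methods instead of the `regex` crate so it stays dependency-free, and the function name `indent_by_lines` is made up. A line starting with `\begin{` opens an indentation level and a line starting with `\end{` closes one; no tree is ever built.

```rust
/// Re-indents LaTeX line by line, without building any tree: a line that
/// starts with `\begin{` opens a level, a line that starts with `\end{`
/// closes one. Every other line is just trimmed and re-indented.
fn indent_by_lines(src: &str, indent: &str) -> String {
    let mut depth: usize = 0;
    let mut out = String::new();
    for raw in src.lines() {
        let line = raw.trim();
        if line.starts_with("\\end{") {
            // Dedent before printing the closing line.
            depth = depth.saturating_sub(1);
        }
        if !line.is_empty() {
            out.push_str(&indent.repeat(depth));
            out.push_str(line);
        }
        out.push('\n');
        if line.starts_with("\\begin{") {
            // Indent everything after the opening line.
            depth += 1;
        }
    }
    out
}

fn main() {
    let src = "\\begin{document}\n\\begin{itemize}\n\\item one\n\\end{itemize}\n\\end{document}\n";
    print!("{}", indent_by_lines(src, "  "));
}
```

It works on well-behaved input, but a one-line environment such as `\begin{center}Hi\end{center}` opens a level that is never closed, so every following line ends up indented one step too far. Edge cases like that are exactly where line-based matching starts to strain.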

The proposed solution(s)

To be a little fancier, a little voice whispered in my ear that I should build an AST of the files and apply transformations to it (as Rustfmt does?). Is that a good idea? Should I just stick to regular expressions? And if both are possible, which one would be the fastest?
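A minimal sketch of the AST route, assuming a tiny subset of LaTeX: only `\begin{name}`/`\end{name}` pairs become tree nodes, while text, commands, groups and `%` comments are kept as opaque inline text. All names (`Node`, `parse_nodes`, `pretty`, `format_latex`) are hypothetical, and it does not try to mirror rustfmt's internals; it only shows the parse-then-print shape of the idea.

```rust
/// Tiny LaTeX AST: only environments are structured, everything else is inline text.
#[derive(Debug, PartialEq)]
enum Node {
    Inline(String),
    Env { name: String, body: Vec<Node> },
}

/// Reads nodes (advancing `input`) until an unconsumed `\end{...}` or end of input.
fn parse_nodes(input: &mut &str) -> Result<Vec<Node>, String> {
    let (mut nodes, mut text) = (Vec::new(), String::new());
    while let Some(c) = input.chars().next() {
        let rest: &str = *input;
        if rest.starts_with("\\end{") {
            break;
        }
        if rest.starts_with("\\begin{") {
            flush(&mut text, &mut nodes);
            nodes.push(parse_env(input)?);
            continue;
        }
        // `%` comments run to the end of the line; a backslash is copied with
        // the next character, so `\%` is never taken for a comment.
        let len = match c {
            '%' => rest.find('\n').unwrap_or(rest.len()),
            '\\' => 1 + rest[1..].chars().next().map_or(0, char::len_utf8),
            _ => c.len_utf8(),
        };
        text.push_str(&rest[..len]);
        *input = &rest[len..];
    }
    flush(&mut text, &mut nodes);
    Ok(nodes)
}

fn parse_env(input: &mut &str) -> Result<Node, String> {
    let name = read_tag(input, "\\begin{")?;
    let body = parse_nodes(input)?;
    let found = read_tag(input, "\\end{")?;
    if found != name {
        return Err(format!("\\begin{{{}}} closed by \\end{{{}}}", name, found));
    }
    Ok(Node::Env { name, body })
}

/// Consumes `prefix`, a name and the closing `}`; returns the name.
fn read_tag(input: &mut &str, prefix: &str) -> Result<String, String> {
    let rest: &str = *input;
    let after = rest.strip_prefix(prefix).ok_or(format!("expected `{}`", prefix))?;
    let close = after.find('}').ok_or("unterminated environment name")?;
    *input = &after[close + 1..];
    Ok(after[..close].to_string())
}

fn flush(text: &mut String, nodes: &mut Vec<Node>) {
    if !text.is_empty() {
        nodes.push(Node::Inline(std::mem::take(text)));
    }
}

/// Environments get their own lines and an indented body; blank lines are kept.
fn pretty(nodes: &[Node], depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    for node in nodes {
        match node {
            Node::Inline(text) => {
                for line in text.trim().lines().map(str::trim) {
                    if !line.is_empty() {
                        out.push_str(&pad);
                        out.push_str(line);
                    }
                    out.push('\n');
                }
            }
            Node::Env { name, body } => {
                out.push_str(&format!("{}\\begin{{{}}}\n", pad, name));
                pretty(body, depth + 1, out);
                out.push_str(&format!("{}\\end{{{}}}\n", pad, name));
            }
        }
    }
}

fn format_latex(src: &str) -> Result<String, String> {
    let mut input = src;
    let tree = parse_nodes(&mut input)?;
    if !input.is_empty() {
        return Err("`\\end` without a matching `\\begin`".to_string());
    }
    let mut out = String::new();
    pretty(&tree, 0, &mut out);
    Ok(out)
}

fn main() {
    let src = "\\begin{document}\nHello % greeting\n\\begin{center}Hi\\end{center}\n\\end{document}\n";
    print!("{}", format_latex(src).unwrap_or_else(|e| format!("error: {}\n", e)));
}
```

Because the tree knows where each environment really ends, the one-line `\begin{center}Hi\end{center}` case is handled, and mismatched `\begin`/`\end` pairs become errors instead of silently skewing the indentation. Verbatim-like environments and `\verb` would still need special treatment in a real tool.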

I'm not an expert at all, but I would say an alternative to regex would be the Pest or Nom crates. I couldn't say anything about how their performance compares, though. I prefer them to regex for parsing structured text.
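To give an idea of the parser-combinator style Nom is built around, here is a dependency-free imitation: each parser is a function from the remaining input to an optional `(rest, value)` pair, the same order as the `(remaining, output)` tuple in Nom's `IResult`, and bigger parsers are assembled from smaller ones. The names `tag`, `take_until` and `delimited` are borrowed from Nom for illustration only; these are simplified stand-ins with different signatures, not Nom's actual API.

```rust
/// Simplified result shape: on success, the leftover input, then the value.
type PResult<'a, T> = Option<(&'a str, T)>;

/// Matches a fixed string (simplified stand-in for Nom's `tag`).
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| {
        if input.starts_with(expected) {
            let (matched, rest) = input.split_at(expected.len());
            Some((rest, matched))
        } else {
            None
        }
    }
}

/// Takes everything before `stop`; fails if `stop` never appears.
fn take_until<'a>(stop: char) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| {
        let i = input.find(stop)?;
        let (taken, rest) = input.split_at(i);
        Some((rest, taken))
    }
}

/// Runs three parsers in sequence and keeps only the middle value.
fn delimited<'a, A, B, C>(
    open: impl Fn(&'a str) -> PResult<'a, A>,
    inner: impl Fn(&'a str) -> PResult<'a, B>,
    close: impl Fn(&'a str) -> PResult<'a, C>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (input, _) = open(input)?;
        let (input, value) = inner(input)?;
        let (input, _) = close(input)?;
        Some((input, value))
    }
}

/// `\begin{name}` -> `name`, built purely by composing the pieces above.
fn begin_env<'a>(input: &'a str) -> PResult<'a, &'a str> {
    delimited(tag("\\begin{"), take_until('}'), tag("}"))(input)
}

fn main() {
    match begin_env("\\begin{itemize}\n\\item x") {
        Some((rest, name)) => println!("name = {:?}, rest = {:?}", name, rest),
        None => println!("no match"),
    }
}
```

The real crates add error reporting, streaming input and many more combinators, but the composition idea is the same.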

Do not parse structured text with regex! Never, ever!


But sometimes they are a useful tool as a component of a parser.


Yes, indeed, this post is so well known that I think it has become a meme. Regular expressions were the last resort on my list, as a quick-and-dirty option.

Thank you for your answer. I did consider pest or nom. I even tried a prototype in pest by porting a Scala codebase, but the fact that the parser did not output Rust structs felt... unidiomatic.
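For what it's worth, a common way around that with a generic parser is a small conversion pass from its untyped tree into your own enums. pest, for example, hands back `Pair`s that expose a rule (`as_rule`), the matched text (`as_str`) and their children (`into_inner`). The sketch below fakes that shape with a hypothetical `RawNode` so it compiles without dependencies, then converts it with `TryFrom`; the `Rule` variants and the "first child is the environment name" convention are assumptions of this example, not anything pest prescribes.

```rust
use std::convert::TryFrom;

/// Grammar rules, in the spirit of the `Rule` enum pest derives from a
/// grammar file. These variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Text,
    Command,
    Environment,
    EnvName,
}

/// Untyped parse tree: a rule, the matched text and the children.
/// Hypothetical stand-in for a generic parser's output (not pest's `Pair`).
struct RawNode {
    rule: Rule,
    text: String,
    children: Vec<RawNode>,
}

/// The typed AST the rest of the formatter works with.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Command(String),
    Env { name: String, body: Vec<Node> },
}

impl TryFrom<RawNode> for Node {
    type Error = String;

    fn try_from(raw: RawNode) -> Result<Self, Self::Error> {
        match raw.rule {
            Rule::Text => Ok(Node::Text(raw.text)),
            Rule::Command => Ok(Node::Command(raw.text)),
            Rule::Environment => {
                let mut children = raw.children.into_iter();
                // Assumed convention: the first child holds the environment name.
                let name = match children.next() {
                    Some(RawNode { rule: Rule::EnvName, text, .. }) => text,
                    _ => return Err("environment without a name".to_string()),
                };
                // Convert the remaining children recursively, stopping at the first error.
                let body = children
                    .map(Node::try_from)
                    .collect::<Result<Vec<_>, _>>()?;
                Ok(Node::Env { name, body })
            }
            Rule::EnvName => Err(format!("stray environment name `{}`", raw.text)),
        }
    }
}

/// Builds a childless node (test/demo helper).
fn leaf(rule: Rule, text: &str) -> RawNode {
    RawNode { rule, text: text.to_string(), children: Vec::new() }
}

fn main() {
    let raw = RawNode {
        rule: Rule::Environment,
        text: "\\begin{itemize}\\item x\\end{itemize}".to_string(),
        children: vec![leaf(Rule::EnvName, "itemize"), leaf(Rule::Command, "item"), leaf(Rule::Text, " x")],
    };
    println!("{:?}", Node::try_from(raw));
}
```

Once that conversion exists, the rest of the formatter only ever sees `Node`, so the generic tree stays confined to a single function.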

I don't know if it is ironic or really frightening.
