Hi all! I've been using nom a lot recently for parsing "programming-language-esque" things, and I think I've finally come up with a principled way of handling whitespace. Note: this will only really work if you're parsing "whitespace-insensitive" languages (though these languages still require whitespace in some places, like around keywords).
(See the attached gist for the full definition and some examples.)
The main idea is to get yourself asking questions like:
Is whitespace allowed to appear after/before/around this thing?
Is whitespace required to appear after/before/around this thing?
Is whitespace required, or merely allowed, to appear exactly here?
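To make the allowed/required distinction concrete, here is a minimal sketch in plain Rust. It deliberately uses only the standard library rather than nom's API, and it is not the code from the gist. The function names (`ws_allowed`, `ws_required`, `keyword`, `symbol`) are made up for illustration.

```rust
/// "Whitespace is allowed here": skips zero or more whitespace characters.
/// Always succeeds and returns the remaining input.
fn ws_allowed(input: &str) -> &str {
    input.trim_start()
}

/// "Whitespace is required here": needs at least one whitespace character.
/// Returns the remaining input, or None if no whitespace was found.
fn ws_required(input: &str) -> Option<&str> {
    let rest = input.trim_start();
    if rest.len() < input.len() {
        Some(rest)
    } else {
        None
    }
}

/// A keyword must be followed by whitespace, so `let x` matches but `letx` does not.
fn keyword<'a>(kw: &str, input: &'a str) -> Option<&'a str> {
    let rest = input.strip_prefix(kw)?;
    ws_required(rest)
}

/// A punctuation token may have whitespace on either side, but does not need it.
fn symbol<'a>(sym: &str, input: &'a str) -> Option<&'a str> {
    let rest = ws_allowed(input).strip_prefix(sym)?;
    Some(ws_allowed(rest))
}
```

With these, `keyword("let", "let x")` returns `Some("x")` while `keyword("let", "letx")` returns `None`, and `symbol("=", "=1")` and `symbol("=", " = 1")` both return `Some("1")`. The point is that each token parser states its own whitespace policy, instead of whitespace skipping being scattered through the grammar.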
I have yet to do much testing with this, but I have been able to refactor the parser for my pet project and it at least looks a lot nicer. I'd appreciate any feedback y'all have on this, and if people like this, maybe we could make it into a crate! Thanks!
To make it general purpose, do you think it'd be worth separating out the logic for comments, or otherwise allowing configuration of what constitutes a comment? Right now the code looks limited to languages which use // comments. To be honest, though, I'm not sure what the best way to make that configurable would be...
I feel like this doesn't read very smoothly though. Plus, you'd have to rely on the user defining a struct with a name that reads well.
Option 2: Statically Configurable
Maybe you could just pass in the end-of-line comment signifier string as a config option in Cargo.toml somehow. Like you could specify "this language I'm parsing uses // to begin end-of-line comments".
Conclusion
Neither of these options allows the user to specify multi-line comments, so they are in no way a complete solution.
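For what it's worth, here is a rough sketch of what a runtime-configured version covering both kinds of comment might look like. This is only an assumption about one possible design, written in plain std Rust rather than nom. The `CommentStyle` struct, its fields, and `skip_trivia` are hypothetical names, not anything from the gist.

```rust
/// Hypothetical runtime configuration describing a language's comment syntax.
struct CommentStyle<'m> {
    /// Marker that starts an end-of-line comment, e.g. "//" or "#".
    line: Option<&'m str>,
    /// Opening and closing delimiters of a block comment, e.g. ("/*", "*/").
    block: Option<(&'m str, &'m str)>,
}

impl<'m> CommentStyle<'m> {
    /// Skips whitespace and comments, returning the remaining input.
    /// An unterminated block comment consumes the rest of the input.
    fn skip_trivia<'a>(&self, mut input: &'a str) -> &'a str {
        loop {
            input = input.trim_start();
            if let Some(rest) = self.line.and_then(|m| strip_marker(input, m)) {
                // Drop everything up to (but not including) the newline.
                input = match rest.find('\n') {
                    Some(i) => &rest[i..],
                    None => "",
                };
                continue;
            }
            if let Some((open, close)) = self.block {
                if let Some(rest) = strip_marker(input, open) {
                    // Drop everything through the closing delimiter.
                    input = match rest.find(close) {
                        Some(i) => &rest[i + close.len()..],
                        None => "",
                    };
                    continue;
                }
            }
            return input;
        }
    }
}

/// Strips a marker from the front of the input; an empty marker never matches.
fn strip_marker<'a>(input: &'a str, marker: &str) -> Option<&'a str> {
    if marker.is_empty() {
        None
    } else {
        input.strip_prefix(marker)
    }
}
```

A C-like language would use `CommentStyle { line: Some("//"), block: Some(("/*", "*/")) }`, and a Python-like one `CommentStyle { line: Some("#"), block: None }`. This sketch doesn't handle nested block comments or comment markers inside string literals, so it's a starting point at best.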