Parse Rust source code and track line numbers

Hi all,

I'm trying to parse some Rust source code and found the syn crate for this. This works great and the quick compile time is nice!

However, I can't seem to find the offset or line number for the things I parse. Am I missing something, or has this functionality been left out of syn? I believe syn is primarily meant for procedural macros, where line numbers might not be needed.

If someone has a suggestion for a different parser crate, then I would be very happy to hear about it! :slight_smile:

Well, there will be fall (matklad/fall on GitHub) once it is ready, but at the moment it is slow, unfinished, does not have a proper API yet, does not support all Rust syntax, and is not published to crates.io :slight_smile:

But the syntax tree it produces already preserves 100% of information about offsets, whitespace and all that other "irrelevant" information :slight_smile:

Here are some tests that give a feel for the current state of the Rust parser: https://github.com/matklad/fall/blob/master/lang/rust/tests/inline.txt.
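To make the "lossless" idea concrete, here is a minimal std-only sketch (not fall's actual API; `Kind`, `Token` and `tokenize` are hypothetical names) of a tokenizer that keeps whitespace and comments as tokens with byte offsets, so that concatenating the token texts reproduces the input exactly:

```rust
/// Token kinds. Whitespace and comments are kept as "trivia" tokens
/// instead of being thrown away.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Whitespace,
    LineComment,
    Ident, // identifiers, keywords and number-like words
    Punct, // any other single character
}

/// A token is just a kind plus a byte range into the original source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: Kind,
    start: usize, // byte offset of the first byte
    end: usize,   // byte offset one past the last byte
}

fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, first)) = chars.peek() {
        let kind = if first.is_whitespace() {
            Kind::Whitespace
        } else if src[start..].starts_with("//") {
            Kind::LineComment
        } else if first.is_alphanumeric() || first == '_' {
            Kind::Ident
        } else {
            Kind::Punct
        };
        chars.next(); // consume the first character of the token
        // Keep consuming while the next character belongs to the same token.
        while let Some(&(_, c)) = chars.peek() {
            let continues = match kind {
                Kind::Whitespace => c.is_whitespace(),
                Kind::LineComment => c != '\n',
                Kind::Ident => c.is_alphanumeric() || c == '_',
                Kind::Punct => false,
            };
            if !continues {
                break;
            }
            chars.next();
        }
        // The token ends where the next one starts (or at end of input).
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        tokens.push(Token { kind, start, end });
    }
    tokens
}
```

A real lossless tree like fall's attaches trivia to nodes rather than keeping a flat list; this only shows why keeping byte offsets for every token is enough to recover both the exact text and every position.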


My Fuzzy Pickles crate is a parser that parses all of the Rust code I've thrown at it* (some 30K files from crates dating back to Rust 1.0), returning an AST of extents — the byte offsets of the beginning and end of each element.

You can then transform those offsets into lines/columns (the error implementation does this). It currently uses impl Trait, so it requires nightly, but theoretically it should be able to work with boxed trait objects instead.


* It does have one known issue - nested block comments aren't supported. I'm also pretty sure that the expression precedence is wrong, but that's "just" tweaking some numbers in a function.
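On the impl Trait point: here is a rough sketch (hypothetical combinators, not peresil's or Fuzzy Pickles' actual API) of the same tiny parser written both ways. Each parser returns how many bytes it consumed. The boxed version needs no `impl Trait` (return-position `impl Trait` has since been stabilized, in Rust 1.26), but pays for a heap allocation and dynamic dispatch per combinator:

```rust
// Each parser looks at the input and, on success, returns how many
// bytes it consumed.

// With `impl Trait`: concrete closure types, no allocation.
fn literal<'a>(expected: &'a str) -> impl Fn(&str) -> Option<usize> + 'a {
    move |input: &str| -> Option<usize> {
        if input.starts_with(expected) {
            Some(expected.len())
        } else {
            None
        }
    }
}

fn sequence<A, B>(first: A, second: B) -> impl Fn(&str) -> Option<usize>
where
    A: Fn(&str) -> Option<usize>,
    B: Fn(&str) -> Option<usize>,
{
    move |input: &str| -> Option<usize> {
        let n = first(input)?;
        let m = second(&input[n..])?;
        Some(n + m)
    }
}

// Without `impl Trait`: every parser is a boxed trait object.
type BoxedParser<'a> = Box<dyn Fn(&str) -> Option<usize> + 'a>;

fn literal_boxed<'a>(expected: &'a str) -> BoxedParser<'a> {
    Box::new(move |input: &str| -> Option<usize> {
        if input.starts_with(expected) {
            Some(expected.len())
        } else {
            None
        }
    })
}

fn sequence_boxed<'a>(first: BoxedParser<'a>, second: BoxedParser<'a>) -> BoxedParser<'a> {
    Box::new(move |input: &str| -> Option<usize> {
        let n = first(input)?;
        let m = second(&input[n..])?;
        Some(n + m)
    })
}
```

The `impl Trait` version lets the compiler see (and inline) the concrete closure types; the boxed version trades that for one uniform type that can be stored in vectors or struct fields.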


BTW, I've seen fuzzy-pickles, and as far as I understand, fall would be a perfect fit for it, once (and if) it is ready :slight_smile:


Do you mean that fall would be a replacement for peresil?

I've got a number of crates related to my overall project (Strata Rust)... and it looks like I forgot to even change the README for Fuzzy Pickles when I extracted it from Strata Rust... :sweat_smile:


Okay, I definitely got confused: I've seen Strata Rust (and that's what I was thinking of), but I hadn't actually seen fuzzy-pickles in its extracted form, only as an "untitled rust parser" :slight_smile: So yes, fall could be a replacement for peresil, and for manually written AST/visitors. And lang_rust, which is implemented with fall and lives in the fall repository, could become a replacement for fuzzy-pickles.

A short elevator pitch for fall is that it is ("will be" would be more accurate, though) a parser/AST generator which is lossless (it keeps comments and whitespace and smartly attaches them to the proper nodes), generates a convenient-to-use AST, and aggressively recovers from errors.

So, you write BNF-like rules like this: https://github.com/matklad/fall/blob/5f505905d2d24a48de29fce75031b9fbb8654e49/lang/fall/src/fall.fall#L88-L112, then describe the AST structure for important nodes like this: https://github.com/matklad/fall/blob/5f505905d2d24a48de29fce75031b9fbb8654e49/lang/fall/src/fall.fall#L311-L317, and then do whatever you want with the AST by visiting nodes using either the untyped representation with (span, u32_node_type_id) or a typed representation based on AST node types, like this: https://github.com/matklad/fall/blob/5f505905d2d24a48de29fce75031b9fbb8654e49/lang/fall/src/editor_api/mod.rs#L30-L82.
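A rough sketch of those two representations, assuming a made-up node layout (`Node`, `FnDef`, `spans_of_kind`, `FN_DEF` and `IDENT` are hypothetical, not fall's actual types): the untyped tree is just kind ids plus byte spans, and a typed node is a thin wrapper that is only constructed after checking the kind.

```rust
use std::ops::Range;

// Hypothetical node-kind ids; a generator would emit these from the grammar.
const FN_DEF: u32 = 1;
const IDENT: u32 = 2;

/// Untyped node: a kind id, the byte span it covers, and its children.
#[derive(Debug)]
struct Node {
    kind: u32,
    span: Range<usize>,
    children: Vec<Node>,
}

/// Untyped visit: collect the spans of all nodes of the given kind.
fn spans_of_kind(node: &Node, kind: u32, out: &mut Vec<Range<usize>>) {
    if node.kind == kind {
        out.push(node.span.clone());
    }
    for child in &node.children {
        spans_of_kind(child, kind, out);
    }
}

/// Typed view over an untyped node; only built for FN_DEF nodes.
struct FnDef<'n> {
    node: &'n Node,
}

impl<'n> FnDef<'n> {
    fn cast(node: &'n Node) -> Option<FnDef<'n>> {
        if node.kind == FN_DEF {
            Some(FnDef { node })
        } else {
            None
        }
    }

    /// The function's name: the source text of its first IDENT child.
    fn name<'s>(&self, src: &'s str) -> Option<&'s str> {
        self.node
            .children
            .iter()
            .find(|c| c.kind == IDENT)
            .map(|c| &src[c.span.clone()])
    }
}
```

The typed layer adds no storage of its own; it only gives names to "find the child of kind X" queries, while offsets stay available through the untyped spans.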

Not that I'd advise using it for anything, but if you're writing your own parser anyway...

The old syntex_syntax can do it. Parsed nodes have Span, which can be translated to line/column.
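Translating a byte offset into a line/column pair is also easy to do by hand, and it applies equally to Fuzzy Pickles' extents. A minimal std-only sketch (`LineIndex` is a hypothetical name, not syntex's API) records where each line starts once, then binary-searches that table for each offset:

```rust
/// Maps byte offsets (such as the start/end of a span or extent)
/// to 1-based line and column numbers.
struct LineIndex {
    /// Byte offset at which each line starts; line_starts[0] == 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Returns (line, column), both 1-based; the column counts bytes.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Index of the last line start that is <= offset.
        let line = match self.line_starts.binary_search(&offset) {
            Ok(exact) => exact,
            Err(insert_at) => insert_at - 1,
        };
        (line + 1, offset - self.line_starts[line] + 1)
    }
}
```

Columns here count bytes; to count characters instead, take `&src[line_start..offset]` and use `.chars().count()`.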


Thanks @kornel! That definitely looks useful. I might end up using that crate then, even though the syntex repository (serde-deprecated/syntex) has just one commit(?!) and now simply says:

Syntex is no longer maintained.

Hehe, you're really selling it well! :smiley:

The goal of preserving all information is great -- I think there needs to be such a parser somewhere in the ecosystem for tools like linters and formatters, as you say.

So to sum up, am I right that the landscape of crates that parse Rust source code looks something like this:

Works on stable Rust:

  • syntex_syntax: full-featured, works on stable, no longer maintained.
  • syn: no line number information, fast compile times, works on Rust 1.15, maintained.
  • fall: preserves information about offsets, but is slow, unfinished, and unreleased (@matklad's own words :slight_smile: )

Requires nightly Rust:

I'm not fully sure how I should feel about being left out of the list... is fuzzy pickles disqualified for some reason or the other?


No, not at all... I'm terribly sorry that I forgot to include it.

Am I right in thinking that your crate targets nightly Rust?


The crate works with stable, but for the parser itself, I would love to be able to parse nightly Rust syntax as well.

Awesome, I've just updated my little "survey" above :slight_smile: I'll be happy to give it a try for my little project -- I switched to syntex_syntax now and that compiles rather slowly (which is why I tried the newer syn crate in the first place).

The project I've been working on is version-sync -- a small crate which lets you add an integration test to check that the

#![doc(html_root_url = "https://docs.rs/foo/1.2.3")]

line is kept up to date with the crate version number.
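For illustration only, a naive std-only version of such a check might look like this (`html_root_url_matches` is a hypothetical helper, not version-sync's actual code). In an integration test the expected version would typically come from `env!("CARGO_PKG_VERSION")`. Being line-based, it is easily fooled by block comments or attributes split across lines, which is where a real parser helps:

```rust
/// Hypothetical helper: finds the `#![doc(html_root_url = "...")]` line in
/// `src` and checks that the URL ends with `/{version}`.
fn html_root_url_matches(src: &str, version: &str) -> Result<(), String> {
    for line in src.lines() {
        let line = line.trim();
        // Skip everything that isn't a crate-level doc attribute with the key.
        if !line.starts_with("#![doc(") || !line.contains("html_root_url") {
            continue;
        }
        // Take the text between the first pair of double quotes.
        let url = line
            .split('"')
            .nth(1)
            .ok_or_else(|| format!("no quoted URL in {:?}", line))?;
        let expected_suffix = format!("/{}", version);
        return if url.ends_with(&expected_suffix) {
            Ok(())
        } else {
            Err(format!("expected URL ending in {:?}, found {:?}", expected_suffix, url))
        };
    }
    Err("no html_root_url attribute found".to_string())
}
```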
