Parse Rust source code and track line numbers


#1

Hi all,

I’m trying to parse some Rust source code and found the syn crate for this. This works great and the quick compile time is nice!

However, I don’t seem to be able to find the offset or line number for the things I parse. Am I missing something, or has this functionality been left out of syn? I believe syn is primarily meant for procedural macros, where line numbers might not be needed.

If someone has a suggestion for a different parser crate, then I would be very happy to hear about it! :slight_smile:


#2

Well, there will be https://github.com/matklad/fall, once it is ready, but at the moment it is slow, unfinished, does not have a proper API yet, does not support all Rust syntax and is not published to crates.io :slight_smile:

But the syntax tree it produces already preserves 100% of information about offsets, whitespace and all that other “irrelevant” information :slight_smile:
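To illustrate what "preserving 100% of the information" means in practice, here is a toy lossless lexer. It is a minimal sketch under stated assumptions, not fall's code or API: the `Kind`, `Token` and `lex` names are hypothetical, and only whitespace, `//` comments, identifiers and single-character punctuation are recognised. The key property is that tokens store byte ranges and nothing is discarded, so the ranges tile the whole input and the original text can be rebuilt exactly.

```rust
/// Token kinds for a toy lossless lexer (hypothetical; not fall's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Whitespace,
    Comment,
    Ident,
    Punct,
}

/// A token stores only its kind and byte range; the text stays in the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: Kind,
    pub start: usize,
    pub end: usize,
}

/// Splits `src` into tokens without dropping whitespace or comments,
/// so the byte ranges of the returned tokens cover the whole input.
pub fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < src.len() {
        let start = i;
        let c = src[i..].chars().next().unwrap();
        let kind = if c.is_whitespace() {
            // Consume a run of whitespace, newlines included.
            while let Some(c) = src[i..].chars().next() {
                if !c.is_whitespace() {
                    break;
                }
                i += c.len_utf8();
            }
            Kind::Whitespace
        } else if src[i..].starts_with("//") {
            // A line comment runs up to, but not including, the newline.
            i += src[i..].find('\n').unwrap_or(src.len() - i);
            Kind::Comment
        } else if c.is_alphanumeric() || c == '_' {
            // Consume an identifier-like run of characters.
            while let Some(c) = src[i..].chars().next() {
                if !(c.is_alphanumeric() || c == '_') {
                    break;
                }
                i += c.len_utf8();
            }
            Kind::Ident
        } else {
            // Anything else is a single punctuation character.
            i += c.len_utf8();
            Kind::Punct
        };
        tokens.push(Token { kind, start, end: i });
    }
    tokens
}
```

Concatenating `&src[t.start..t.end]` for every token reproduces the input byte for byte, which is the round-trip guarantee that formatters and refactoring tools rely on.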

Here are some tests that give a feel for the current state of the Rust parser: https://github.com/matklad/fall/blob/master/lang/rust/tests/inline.txt.


#3

My Fuzzy Pickles crate is a parser that parses all of the Rust code I’ve thrown at it* (some 30K files from crates dating back to Rust 1.0), returning an AST of extents — the byte offsets of the beginning and end of each element.

You can then transform that to lines / columns (the error implementation does this). It currently uses impl trait, so it requires nightly, but theoretically it should be able to work with boxed trait objects.


* It does have one known issue - nested block comments aren’t supported. I’m also pretty sure that the expression precedence is wrong, but that’s “just” tweaking some numbers in a function.
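Turning byte-offset extents into lines and columns is a small, self-contained step. The following is a minimal sketch, assuming offsets are byte positions into the original source string; `LineIndex` is a hypothetical name, not part of Fuzzy Pickles. It records where each line starts once, then answers each lookup with a binary search.

```rust
/// Maps byte offsets (such as parser extents) to 1-based line/column pairs.
pub struct LineIndex {
    /// Byte offset at which each line starts; always begins with 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// Scans `src` once and records the start offset of every line.
    pub fn new(src: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Returns (line, column), both 1-based. The column counts bytes,
    /// not characters, so non-ASCII text shifts it accordingly.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        // Find the last line start that is <= offset. Because line_starts[0]
        // is 0, a miss always has an insertion point of at least 1.
        let line = match self.line_starts.binary_search(&offset) {
            Ok(exact) => exact,
            Err(insert_at) => insert_at - 1,
        };
        (line + 1, offset - self.line_starts[line] + 1)
    }
}
```

Building the index costs one pass over the file, and each lookup is then logarithmic in the number of lines, which matters when converting the extents of every node in a large AST.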


#4

BTW I’ve seen fuzzy-pickles, and from my understanding fall would be a perfect fit for it, once (and if) it is ready :slight_smile:


#5

Do you mean that fall would be a replacement for peresil?

I’ve got a number of crates related to my overall project (Strata Rust)… and it looks like I forgot to even change the README for Fuzzy Pickles when I extracted it from Strata Rust… :sweat_smile:


#6

Okay, I’ve definitely gotten confused: I’ve seen Strata Rust (and that’s what I was thinking about), but I had not actually seen fuzzy-pickles in its extracted form, only as an “untitled rust parser” :slight_smile: So yes, fall could be a replacement for peresil, and for manually written AST/visitors. And lang_rust, which is implemented with fall and lives in the fall repository, could become a replacement for fuzzy-pickles.

A short elevator pitch for fall is that it is (or, more accurately, “will be”) a parser/AST generator which is lossless (it keeps comments and whitespace and smartly attaches them to the proper nodes), generates a convenient-to-use AST, and aggressively recovers from errors.

So, you write bnf-like rules like this: https://github.com/matklad/fall/blob/5f505905d2d24a48de29fce75031b9fbb8654e49/lang/fall/src/fall.fall#L88-L112, then describe AST structure for important nodes like this: https://github.com/matklad/fall/blob/5f505905d2d24a48de29fce75031b9fbb8654e49/lang/fall/src/fall.fall#L311-L317 and then do whatever you want with the AST by visiting nodes using either untyped representation with (span, u32_node_type_id) or a typed representation based on AST node types, like this: https://github.com/matklad/fall/blob/5f505905d2d24a48de29fce75031b9fbb8654e49/lang/fall/src/editor_api/mod.rs#L30-L82.
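The two ways of visiting nodes can be sketched roughly as follows. This is a minimal illustration under assumptions, not fall's actual API: `Node`, `FnDef`, `FN_DEF` and `IDENT` are hypothetical names. The untyped layer is just a numeric kind plus a byte span and children; a typed AST node is a thin wrapper that can only be constructed when the kind matches, and it exposes accessors that read text back out of the source through spans.

```rust
use std::ops::Range;

// Hypothetical numeric node-type ids; a generator would emit these from the grammar.
pub const FN_DEF: u32 = 1;
pub const IDENT: u32 = 2;

/// Untyped node: a kind id, a byte span into the source, and children.
#[derive(Debug)]
pub struct Node {
    pub kind: u32,
    pub span: Range<usize>,
    pub children: Vec<Node>,
}

/// Typed view over a `Node` whose kind is FN_DEF.
pub struct FnDef<'a>(&'a Node);

impl<'a> FnDef<'a> {
    /// Succeeds only if the node really is a function definition.
    pub fn cast(node: &'a Node) -> Option<Self> {
        if node.kind == FN_DEF {
            Some(FnDef(node))
        } else {
            None
        }
    }

    /// The function name: the source text of the first IDENT child, if any.
    pub fn name<'s>(&self, src: &'s str) -> Option<&'s str> {
        self.0
            .children
            .iter()
            .find(|c| c.kind == IDENT)
            .map(|c| &src[c.span.clone()])
    }
}
```

Keeping the typed layer as a zero-cost view over the untyped tree means generic tools can walk every node uniformly, while code that cares about specific constructs gets named accessors instead of raw kind ids.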

Not that I’d advise using it for anything, but if you are writing your own parser anyway…


#7

The old syntex_syntax can do it. Parsed nodes have Span, which can be translated to line/column.


#8

Thanks @kornel! That definitely looks useful. I might end up using that crate then, even though https://github.com/serde-rs/syntex just has one commit(?!) and now simply says:

Syntex is no longer maintained.


#9

Hehe, you’re really selling it well! :smiley:

The goal of preserving all information is great – I think there needs to be such a parser somewhere in the ecosystem for tools like linters and formatters, as you say.


#10

So to sum up, am I right that the landscape of crates that parse Rust source code looks something like this:

Works on stable Rust:

  • syntex_syntax: full-featured, works on stable, no longer maintained.
  • syn: no line number information, fast compile times, works on Rust 1.15, maintained.
  • fall: preserves information about offsets, but is slow, unfinished, and unreleased (@matklad’s own words :slight_smile: )

Requires nightly Rust:


#11

I’m not fully sure how I should feel about being left out of the list… is fuzzy pickles disqualified for some reason or the other?


#12

No, not at all… I’m terribly sorry that I forgot to include it.

Am I right in thinking that your crate targets nightly Rust?


#13

The crate works with stable, but for the parser itself, I would love to be able to parse nightly Rust syntax as well.


#14

Awesome, I’ve just updated my little “survey” above :slight_smile: I’ll be happy to give it a try for my little project – I switched to syntex_syntax now and that compiles rather slowly (which is why I tried the newer syn crate in the first place).


#15

The project I’ve been working on is version-sync – a small crate which lets you add an integration test to check that the

#![doc(html_root_url = "https://docs.rs/foo/1.2.3")]

line is kept up to date with the crate version number.
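The core of such a check can be done with plain string searching. The following is a minimal sketch, not how version-sync is actually implemented: `check_html_root_url` is a hypothetical helper, and it assumes the attribute is written exactly as `html_root_url = "..."` with single spaces around `=`.

```rust
/// Hypothetical helper (not version-sync's API): checks that the
/// `html_root_url` in `src` points at docs.rs for `name` at `version`.
pub fn check_html_root_url(src: &str, name: &str, version: &str) -> Result<(), String> {
    let key = "html_root_url = \"";
    // Locate the start of the quoted URL.
    let start = src
        .find(key)
        .ok_or_else(|| "no html_root_url attribute found".to_string())?
        + key.len();
    // The URL ends at the next double quote.
    let len = src[start..]
        .find('"')
        .ok_or_else(|| "unterminated html_root_url string".to_string())?;
    let url = &src[start..start + len];
    let expected = format!("https://docs.rs/{}/{}", name, version);
    if url == expected {
        Ok(())
    } else {
        Err(format!("expected {}, found {}", expected, url))
    }
}
```

In an integration test one could pass `include_str!("../src/lib.rs")`, `env!("CARGO_PKG_NAME")` and `env!("CARGO_PKG_VERSION")`, so the test fails as soon as the version in `Cargo.toml` is bumped without updating the attribute. A real implementation would want to parse the attribute properly rather than rely on exact spacing, which is where a Rust parser comes in.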