Any interest in a pretty-printing crate for Syn?

I'm thinking about pulling the rustc libsyntax AST pretty-printer out and converting it to use the Syn AST. Would anyone else be interested in using such a crate? Any gotchas I might run into?

The background on this is that we use the libsyntax pretty-printer fairly heavily in c2rust and recently ran into some breaking changes that are making us rethink that dependency. Several interfaces we used to use are now crate-private, which makes sense. Rather than try to work around or change that, I think it makes more sense to build our own pretty-printer, starting from the existing code.

I've investigated using Quote+Syn, but unfortunately to_tokens() by itself is insufficient for our use case: we need to inject comments into the output code. The compiler pretty-printer has limited support for comment handling, and we can extend that support if we break it out into a crate and use Syn instead of libsyntax. I couldn't find any other pretty-printers that handle comments. Does anyone know of one?


I'm interested.

An AST pretty-printer would be very useful for the macro development process.


Can’t you lean on rustfmt? Dump tokens to text somehow, and let rustfmt clean them up.

Long term, I feel like we should remove pretty printing code from rustc and just re-cast rustfmt as a library for formatting lossless syntax trees (which we don’t have yet, that’s a large yak to shave).


I looked into this, but it seems we can't just use rustfmt, unfortunately. We are less concerned with the "pretty" part of pretty printing AST, and more concerned with the ability to build the AST from scratch while injecting comments into the resulting code. Tokens don't include comments (other than docstring attributes, but they are handled specially), which means we would have to somehow artificially rewrite the resulting string to insert comments. Rustfmt sort of does this to preserve comments, by re-inserting them from the input spans while printing, but I didn't see any sort of good way to inject new comments into rustfmt.

Thinking a bit more here, another option would be to attach comments to spans, as the current rustc pretty-printer does, but use Syn to convert to tokens, interpolate the comments between the tokens, and pass the full string through rustfmt. I'll look into this more...
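To make the interpolation idea concrete, here is a minimal std-only sketch. It assumes tokens and comments each carry a position (standing in for a span offset) and are already sorted by it; the `Token`, `Comment`, and `interleave` names are hypothetical and not part of Syn or rustc. Comments are emitted as block comments so that a single-line dump stays valid before rustfmt sees it, and the comment text is assumed not to contain `*/`.

```rust
/// A token with the offset at which it starts (a stand-in for a real span).
struct Token<'a> {
    pos: usize,
    text: &'a str,
}

/// A comment attached to an offset in the same coordinate space.
struct Comment<'a> {
    pos: usize,
    text: &'a str,
}

/// Merge two position-sorted streams into one string. A comment is emitted
/// just before the first token whose position is at or after the comment's.
fn interleave(tokens: &[Token<'_>], comments: &[Comment<'_>]) -> String {
    let mut out = String::new();
    let mut pending = comments.iter().peekable();
    for tok in tokens {
        // Flush every comment that belongs at or before this token.
        while let Some(c) = pending.next_if(|c| c.pos <= tok.pos) {
            out.push_str("/* ");
            out.push_str(c.text);
            out.push_str(" */ ");
        }
        out.push_str(tok.text);
        out.push(' ');
    }
    // Comments positioned after the last token go at the end.
    for c in pending {
        out.push_str("/* ");
        out.push_str(c.text);
        out.push_str(" */ ");
    }
    out.trim_end().to_string()
}
```

The resulting string would then be handed to rustfmt for layout; this sketch only covers the merge step.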

As @matklad pointed out, the underlying need is for lossless syntax trees, rather than rustc's current lossy AST. That's something he's been working on, in the form of red/green trees, but by his own account it's currently a long way from completion. Perhaps those who need this could help advance his project.

Looks like you have two problems:

  1. generate code with comments
  2. print it nicely

I think if you solve 1, you get 2 mostly for free from rustfmt. Solving 1 would be tough. It would be super sweet to just use syn and quote, but by design they don’t support comments (and quote actually can’t support comments at all with the usual syntax). Forking syn to add comments seems like a ton of work. I’d do something utterly horrible, and

  1. generate comments as ‘comment_xxx’ identifiers in syn
  2. dump token tree to string
  3. s/comment_xxx/comment-text
  4. run rustfmt over this.
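Step 3 of this hack can be sketched in plain std Rust. This is only an illustration under assumptions: the placeholder scheme `__comment_N` (with `N` indexing into a comment table) and the function name `splice_comments` are hypothetical, and steps 1, 2, and 4 (building the tokens with syn, dumping them, running rustfmt) are left out.

```rust
/// Replace placeholder identifiers of the form `__comment_N` in a dumped
/// token string with block comments taken from `comments[N]`.
/// Placeholders with no matching entry are left untouched.
fn splice_comments(tokens: &str, comments: &[&str]) -> String {
    const PREFIX: &str = "__comment_";
    let mut out = String::with_capacity(tokens.len());
    let mut rest = tokens;
    while let Some(pos) = rest.find(PREFIX) {
        // Copy everything before the placeholder verbatim.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + PREFIX.len()..];
        // The index is the run of ASCII digits following the prefix.
        let digits_len = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        let index = after[..digits_len].parse::<usize>().ok();
        match index.and_then(|i| comments.get(i)) {
            Some(text) => {
                // Block comments survive a single-line dump; break up any
                // `*/` in the text so it cannot close the comment early.
                out.push_str("/* ");
                out.push_str(&text.replace("*/", "* /"));
                out.push_str(" */");
            }
            None => {
                out.push_str(PREFIX);
                out.push_str(&after[..digits_len]);
            }
        }
        rest = &after[digits_len..];
    }
    out.push_str(rest);
    out
}
```

Scanning in a single pass, rather than calling `str::replace` once per comment, avoids `__comment_1` matching the front of `__comment_10` and avoids re-scanning text that was just inserted.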

I’d do something utterly horrible

Yeah, that's pretty much my plan for the sourcegen crate:

To add to the rustfmt point, it actually works pretty nicely, but you need to do some extra work for doc comments, and generating empty lines for extra spacing is problematic, too.

I did have to write my own printing for proc_macro2::TokenStream so I can replace doc comments with /// (nightly rustfmt can do that for you, but I needed to support stable rustfmt):

Yeah, as horrible as treating comments as idents is, it might work (partially). However, that severely limits the places that comments can be located, and I'm not sure that's going to work for us.

I've been digging into your rust-analyzer and rowan code more today, and I'm wondering how viable it would be to generate (A/C)ST syntax nodes in a programmatic AST builder, dump all of that to text, then run that through rustfmt. The rust-analyzer CST seems to support comments as first class citizens, so it might actually be perfect for this, assuming it's not too crazy to port our current rustc libsyntax AST builder to build rust-analyzer AST.
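As a rough picture of what "comments as first-class citizens" means in a lossless tree, here is a toy std-only sketch. It is not the rowan or rust-analyzer API; the `Kind`, `Element`, and `tok` names are made up for illustration. The point is that whitespace and comments are ordinary tokens in the tree, so concatenating the leaves reproduces the exact text, comments included.

```rust
/// Coarse token categories for this toy tree (hypothetical, not rowan's kinds).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Code,
    Whitespace,
    Comment,
}

/// A lossless tree: leaves hold the exact source text, interior nodes hold children.
#[derive(Debug)]
enum Element {
    Token(Kind, String),
    Node(Vec<Element>),
}

impl Element {
    /// Append the text of every leaf, in order, to `out`.
    fn write_text(&self, out: &mut String) {
        match self {
            Element::Token(_, text) => out.push_str(text),
            Element::Node(children) => {
                for child in children {
                    child.write_text(out);
                }
            }
        }
    }

    /// Reconstruct the full source text of this subtree.
    fn text(&self) -> String {
        let mut s = String::new();
        self.write_text(&mut s);
        s
    }

    /// Collect every comment token in this subtree, in source order.
    fn comments(&self) -> Vec<&str> {
        match self {
            Element::Token(Kind::Comment, text) => vec![text.as_str()],
            Element::Token(..) => Vec::new(),
            Element::Node(children) => {
                children.iter().flat_map(|c| c.comments()).collect()
            }
        }
    }
}

/// Convenience constructor for a leaf token.
fn tok(kind: Kind, text: &str) -> Element {
    Element::Token(kind, text.to_string())
}
```

A builder that produces such a tree can place a comment anywhere a token can go, which is exactly the flexibility the identifier trick lacks.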

I'm wondering how viable it would be to generate (A/C)ST syntax nodes in a programmatic AST builder, dump all of that to text, then run that through rustfmt.

That should work. Basically, we should just replicate the API from Swift's libsyntax: https://github.com/apple/swift/tree/master/lib/Syntax. However, rust-analyzer is an experimental project, and no backwards-compatibility guarantees are provided right now. So, from a practical point of view, I would advise against basing production work on top of rust-analyzer libraries, unless you are confident that you can support it yourself or migrate to another library.

What do you want to do with the generated Rust code? Do you want to just dump it into a file, or do you want to manipulate it in a structured way? That is, do you need a syntax data structure, or would just dumping bytes to an io::Write work for you?

Long-term, I think building on rust-analyzer will be where we end up for a CST with comments. I'll be looking into this and trying to keep up with the project. For now, I don't think we can even consider switching over until there are ongoing crates.io releases for ra_syntax.

Half of c2rust is a straightforward C -> unsafe Rust transpiler that just dumps valid Rust code into files. For this component we can be pretty flexible in what AST we use, as we don't need to do much other than dump it out after generation. The transpiler is pretty heavily tree-based, so I'd prefer to stick with some sort of AST, naturally.
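For the "generate a tree, then dump it" half, a minimal sketch of the shape could look like the following. The `Item` enum and `emit` function are hypothetical and far simpler than the real c2rust builder; the sketch assumes single-line comment text and statements that are already valid Rust, and it writes to any `std::io::Write` sink.

```rust
use std::io::{self, Write};

/// A tiny tree of generated code in which comments are ordinary nodes.
enum Item {
    Comment(String),
    Stmt(String),
    Fn { name: String, body: Vec<Item> },
}

/// Write `item` to `out`, indenting nested items by four spaces per level.
fn emit<W: Write>(out: &mut W, item: &Item, depth: usize) -> io::Result<()> {
    let indent = "    ".repeat(depth);
    match item {
        Item::Comment(text) => writeln!(out, "{}// {}", indent, text),
        Item::Stmt(code) => writeln!(out, "{}{};", indent, code),
        Item::Fn { name, body } => {
            writeln!(out, "{}unsafe fn {}() {{", indent, name)?;
            for child in body {
                emit(out, child, depth + 1)?;
            }
            writeln!(out, "{}}}", indent)
        }
    }
}
```

Because comments live in the tree, no span bookkeeping is needed for pure generation; the output can go to a `File`, a `Vec<u8>`, or a pipe into rustfmt.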

Manipulation and refactoring is the other half, and for that we will continue using the rustc APIs and libsyntax for the foreseeable future. Eventually I'd love to rewrite it to work with the rust-analyzer project, as we have a similar use-case to an IDE frontend, but that will probably have to wait a good long while.