Looking for a project? How about a TOML parser that writes nice files?


#1

Every now and then, people post here looking for suggestions for a project to learn Rust with. I have one: a crate that can read TOML files, edit them, and save them without losing comments and whitespace.

In killercup/cargo-edit#15, we saw that the toml crate by @alexcrichton can read TOML just fine, but when editing and saving a file, it loses information that is important to humans (e.g. the order of tables, comments, and whitespace). @flying_sheep suggested that writing a new crate that supports this might be a nice project for a Rust beginner (though not necessarily a person who is also learning to program).
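To make the goal concrete, here is a minimal sketch of a format-preserving edit. It is not how the toml crate or any existing crate works: `set_value` is a hypothetical helper that only understands single-line `key = value` entries and assumes `#` never appears inside a value. It rewrites one value and copies every other byte through unchanged, so comments, blank lines, and table order survive.

```rust
/// Hypothetical helper (not from any existing crate): replace the value of
/// `key` inside `[table]` and leave the rest of the text untouched.
/// Assumptions: entries are single-line `key = value`, and `#` never occurs
/// inside a value. Returns `None` if the key was not found in that table.
fn set_value(src: &str, table: &str, key: &str, new_value: &str) -> Option<String> {
    let header = format!("[{}]", table);
    let mut in_table = false;
    let mut found = false;
    let mut out = String::with_capacity(src.len() + new_value.len());

    for line in src.split_inclusive('\n') {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // A table header: remember whether it is the table we want.
            in_table = trimmed == header;
        } else if in_table && !found {
            if let Some((lhs, rhs)) = line.split_once('=') {
                if lhs.trim() == key {
                    // Split the right-hand side into value and trailing comment.
                    let value_end = rhs.find('#').unwrap_or(rhs.len());
                    let (value_part, tail) = rhs.split_at(value_end);
                    // Keep the whitespace around the old value exactly as it was.
                    let lead_len = value_part.len() - value_part.trim_start().len();
                    let trail_start = value_part.trim_end().len().max(lead_len);
                    out.push_str(lhs);
                    out.push('=');
                    out.push_str(&value_part[..lead_len]);
                    out.push_str(new_value);
                    out.push_str(&value_part[trail_start..]);
                    out.push_str(tail);
                    found = true;
                    continue;
                }
            }
        }
        // Everything else (comments, blank lines, other entries) is copied verbatim.
        out.push_str(line);
    }

    if found { Some(out) } else { None }
}
```

Calling `set_value(src, "package", "version", "\"0.2.0\"")` on a Cargo.toml-like string changes only the version string. A real crate would need a proper parser for multi-line values, quoted keys, and inline tables.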

If you want to write such a crate, here are some useful links:

  • This PR on the original toml crate has a “very rough implementation of round-tripping TOML parser”
  • You might want to look at parser combinators like nom or a parser generator like lalrpop
  • A bunch of tips for setting up a Rust library project

The use case I have for this is editing Cargo.toml files in cargo-edit, but I’m sure such a crate will be nice to have in other projects as well.

Please comment here or on GitHub if you want to do this, and/or if you need any help :slight_smile:


#2

A jade parser could be another nice starting project.


#3

This’d be awesome to have! I’ve wanted to write commands like cargo config which can read/write configuration files, but this problem has stopped me in the past.

I’ve been meaning for a while to rewrite toml-rs's parser (perhaps with nom or lalrpop, I’m sure they’ll be faster!), and at the same time I was hoping to restructure the representation to preserve whitespace/comments. I haven’t had much experience with this before, though, so I wasn’t quite sure how to make an ergonomic interface! If you need any help poking around toml-rs just let me know, and I’d also be fine merging this into the same crate.


#4

@flying_sheep suggested that […]

hey that’s me!

people not redditing should also check out the complementary reddit thread, even though it contains completely off-topic criticism of discourse. but i still love you, reddit :heart:

i also stumbled upon peresil as a text parsing library, but i don’t know its inertia and stability.


#5

if someone attempts a toml parser in nom, I’d be happy to help :smile:
This is the kind of use case that will prompt new interesting features.


#6

A small note: while the PR for toml-rs round-tripping might look slightly dead, it’s not. I’m pretty much finished with changes to the parser and the internal representation. I’m currently meandering on how to expose an ergonomic public interface on another branch (“internals are made mostly of Rc<RefCell<>>, should the public interface expose RefCells? or wrap Ref/RefMut? or just go with unsafe casts?”, “you are going to have slightly different results if you query a TOML document as a key-value structure and if you flatly iterate over its elements. does it make sense? should structs be shared?”, etc). I couldn’t find an example of a TOML library in another language that handles whitespace/comments/ordering (though to be fair I didn’t look too hard), so I’m moving a bit slowly with this.
That being said, I concur that giving the parsing side of toml-rs a bit of a refactoring with nom or lalrpop is a good idea.
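For readers wondering what the design question looks like in code, here is a rough sketch, assuming a flat document of `key = value` pairs. It is not the code from the PR. `Document`, `Item`, and `Entry` are hypothetical names, and values are kept as raw strings. Each entry lives in an `Rc<RefCell<..>>` that is reachable both from an ordered list (for printing) and from a keyed index (for lookups), so an edit made through one view is visible through the other.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// One `key = value` entry, shared between the ordered and the keyed view.
struct Entry {
    key: String,
    value: String,
}

/// A piece of the document in source order (hypothetical type).
enum Item {
    /// Comments and blank lines, kept verbatim.
    Trivia(String),
    /// A key-value pair, shared with the index.
    Pair(Rc<RefCell<Entry>>),
}

struct Document {
    items: Vec<Item>,                           // flat, ordered view
    index: HashMap<String, Rc<RefCell<Entry>>>, // key-value view
}

impl Document {
    fn new() -> Self {
        Document { items: Vec::new(), index: HashMap::new() }
    }

    fn push_trivia(&mut self, text: &str) {
        self.items.push(Item::Trivia(text.to_string()));
    }

    fn push_pair(&mut self, key: &str, value: &str) {
        let entry = Rc::new(RefCell::new(Entry {
            key: key.to_string(),
            value: value.to_string(),
        }));
        // Both views point at the same allocation.
        self.index.insert(key.to_string(), Rc::clone(&entry));
        self.items.push(Item::Pair(entry));
    }

    /// Edit through the keyed view. Takes `&self` because the RefCell
    /// provides the mutability; returns false if the key is unknown.
    fn set(&self, key: &str, value: &str) -> bool {
        match self.index.get(key) {
            Some(entry) => {
                entry.borrow_mut().value = value.to_string();
                true
            }
            None => false,
        }
    }

    /// Print through the ordered view, so trivia and order are preserved.
    fn render(&self) -> String {
        let mut out = String::new();
        for item in &self.items {
            match item {
                Item::Trivia(text) => out.push_str(text),
                Item::Pair(entry) => {
                    let entry = entry.borrow();
                    out.push_str(&format!("{} = {}\n", entry.key, entry.value));
                }
            }
        }
        out
    }
}
```

In this sketch the `RefCell` stays hidden behind `set` and `render`, which is one possible answer to the "should the public interface expose RefCells?" question. The cost is that borrow errors would surface at run time if the API ever handed out long-lived references.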


#7

FWIW I’ve started to write a parser for TOML using nom as a My First Rust Project. The grammar specification is a little weird (at least I’ve never seen one like it), but I think I get what it means. Other than parsing a TOML file (preserving everything) and spitting it back out (which alone would be useless), what kind of features are you looking for?


#8

Cool! The basic features that cargo-edit uses are querying and writing fields. The current TOML crate basically exposes the interface of a BTreeMap for this.
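As an illustration of that kind of interface, here is a small sketch built only on `std::collections::BTreeMap`. The `Value` enum and the `lookup`/`insert` helpers are hypothetical and much simpler than the real toml crate. Paths are plain dotted strings, so quoted keys are not handled.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified value type; the real crate has more variants.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Integer(i64),
    Table(Table),
}

type Table = BTreeMap<String, Value>;

/// Query a field by dotted path, e.g. "package.name".
fn lookup<'a>(root: &'a Table, path: &str) -> Option<&'a Value> {
    let mut parts = path.split('.');
    let mut current = root.get(parts.next()?)?;
    for part in parts {
        match current {
            Value::Table(table) => current = table.get(part)?,
            _ => return None, // tried to descend into a non-table
        }
    }
    Some(current)
}

/// Write a field by dotted path, creating intermediate tables as needed.
/// Returns false if an intermediate key exists but is not a table.
fn insert(table: &mut Table, path: &str, value: Value) -> bool {
    match path.split_once('.') {
        None => {
            table.insert(path.to_string(), value);
            true
        }
        Some((head, rest)) => {
            let child = table
                .entry(head.to_string())
                .or_insert_with(|| Value::Table(Table::new()));
            match child {
                Value::Table(inner) => insert(inner, rest, value),
                _ => false,
            }
        }
    }
}
```

A `BTreeMap` iterates its keys in sorted order, so inserting `package.version` before `package.name` still yields `name` first when the table is written back out. That is one reason a map-only representation cannot preserve the order in which a human wrote the file.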


#9

Just noting here that the official TOML grammar is given as an ABNF grammar. It’s not on the master branch, so it’s kind of hidden.


#10

The parsing part is getting close to done. I only need to enable nested arrays (I have arrays working, I just haven’t enabled vals to be arrays yet), inline tables, expressions, and the top-level TOML production. I only started writing tests about halfway through, so I’ll need to go back and write tests for all my earlier functions. I’ll also need documentation everywhere; I neglected to do that. When parsing is finished I’ll probably write a visitor to print out a parsed doc and then have a few tests read in real TOML files, write them back out, and compare the results with the originals. Then I’ll do some clean-up, move code around, and make it both a binary and a library. And then it’ll be ready for its first release.
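The read, write back, compare idea can be sketched without a full parser. The following is only an illustration and not code from the repo. `Token`, `tokenize`, and `render` are hypothetical names, and the tokenizer assumes `#` never appears inside a string. The invariant worth testing is that concatenating the tokens reproduces the input byte for byte.

```rust
/// Hypothetical lossless token: every input byte belongs to exactly one token.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Whitespace(&'a str),
    Text(&'a str), // anything that is not trivia, left unparsed in this sketch
    Comment(&'a str),
    Newline(&'a str),
}

impl<'a> Token<'a> {
    fn as_str(&self) -> &'a str {
        match *self {
            Token::Whitespace(s) | Token::Text(s) | Token::Comment(s) | Token::Newline(s) => s,
        }
    }
}

/// Split the source into tokens without dropping anything.
/// Sketch-level assumption: '#' never occurs inside a string value.
fn tokenize(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    for line in src.split_inclusive('\n') {
        // Separate the line ending ("\n" or "\r\n") from the rest.
        let body = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        let newline = &line[body.len()..];
        // Separate a trailing comment from the code part.
        let (code, comment) = match body.find('#') {
            Some(i) => body.split_at(i),
            None => (body, ""),
        };
        // Separate leading and trailing whitespace from the text.
        let lead = &code[..code.len() - code.trim_start().len()];
        let text = code.trim();
        let trail = &code[lead.len() + text.len()..];

        let parts = vec![
            Token::Whitespace(lead),
            Token::Text(text),
            Token::Whitespace(trail),
            Token::Comment(comment),
            Token::Newline(newline),
        ];
        tokens.extend(parts.into_iter().filter(|t| !t.as_str().is_empty()));
    }
    tokens
}

/// Write the tokens back out; for a lossless tokenizer this equals the input.
fn render(tokens: &[Token<'_>]) -> String {
    tokens.iter().map(|t| t.as_str()).collect()
}
```

A round-trip test is then just `assert_eq!(render(&tokenize(src)), src)` run over a directory of real TOML files. The same check carries over unchanged once `Text` is replaced by properly parsed keys and values.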

If anyone is interested the repo is here.


#11

tomllib (I renamed toml_parser) is out now at version 0.1.1. It is currently limited to parsing documents, getting values, setting/changing values and getting subkeys of a key. This should be enough to get cargo-bump going. For the next set of features I’m targeting cargo-edit.