Tom: yet another format-preserving TOML parser

Hello! I am pleased to announce tom v0.0.1, a new experimental TOML parser, which preserves all whitespace.

Note the 0.0.1 version: it means that the code is very much experimental/unimplemented!(), and that you shouldn't use this crate for any serious project yet :slight_smile: However, I'd like to publish it nevertheless to get feedback, hopefully in the form of PRs and issues :slight_smile:

There are no API docs yet, but this API walkthrough should hopefully give you a general idea of what is available.

Some interesting features:

  • Lossless parsing by construction: the library does not use the traditional "nested enums" approach to the AST. Documents are represented as a concrete syntax tree, and all comments and whitespace are explicit nodes in the tree.

  • Powerful error recovery: the parse function does not return a Result. Any UTF-8 string is interpreted as a TOML document (which might be just a pile of syntax errors, of course).

  • Lossless editing: it is possible to create documents with arbitrary whitespace and comments. The API gives full control over the placement of elements: you can insert a new key-value pair at a particular position, with specific whitespace around it. It is even possible to create invalid TOML documents. You can totally win your local "ugly TOML" contest with this library :slight_smile: It is possible to build a less powerful API that guarantees the validity of documents on top of the raw API provided by the library.

  • A nifty AST representation makes it possible to write a generic type-safe/type-directed visitor in less than 50 lines of code. Using such a visitor is also more convenient than the traditional approach of implementing a particular visitor interface.
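
To make the "lossless by construction" and "no Result" points concrete, here is a minimal sketch. It is not tom's actual API or data structure: `Kind`, `Token`, `tokenize` and `to_text` are hypothetical names, and a flat token list stands in for the real syntax tree. The idea it illustrates is that whitespace, comments and even unrecognized input all become explicit elements, so the original text can always be reproduced and parsing never has to fail.

```rust
/// Token kinds for a deliberately tiny TOML-like subset (hypothetical, not tom's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Whitespace,
    Newline,
    Comment,
    Bare,  // bare keys and bare values such as `foo` or `1`
    Punct, // one of `=`, `.`, `[`, `]`
    Error, // anything this sketch does not understand
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: Kind,
    pub text: String,
}

/// Never fails: every character of the input ends up in exactly one token.
pub fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = match first {
            ' ' | '\t' => (Kind::Whitespace, run_len(rest, |c| c == ' ' || c == '\t')),
            '\n' => (Kind::Newline, 1),
            '#' => (Kind::Comment, run_len(rest, |c| c != '\n')),
            '=' | '.' | '[' | ']' => (Kind::Punct, 1),
            c if is_bare(c) => (Kind::Bare, run_len(rest, is_bare)),
            // Unknown input is kept as an explicit error token, not rejected.
            c => (Kind::Error, c.len_utf8()),
        };
        let (text, tail) = rest.split_at(len);
        tokens.push(Token { kind, text: text.to_string() });
        rest = tail;
    }
    tokens
}

fn is_bare(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '-'
}

/// Byte length of the longest prefix whose chars all satisfy `pred`.
fn run_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Concatenating the token texts gives back the input, byte for byte.
pub fn to_text(tokens: &[Token]) -> String {
    tokens.iter().map(|t| t.text.as_str()).collect()
}
```

With this shape, `to_text(&tokenize(src)) == src` holds for any `src`, including documents that are nothing but syntax errors. An editing API can then work by inserting or removing tokens (or tree nodes, in a real implementation) at chosen positions.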

Some notable missing pieces:

  • No high-level API. You'll need to figure out for yourself that the following are equivalent (yep, TOML now has dotted keys):

    foo.bar.baz = 1
    
    [foo]
    bar.baz = 1
    
    [foo.bar]
    baz = 1
    
  • No correctness guarantees: some valid TOML documents might produce syntax errors, and some invalid TOML documents might be parsed as valid.

  • While I am pretty sure about the underlying data-structures and representation, the surface API could use some design work.
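
As a rough illustration of what a high-level layer would have to do for the dotted-keys example above, the sketch below resolves each `key = value` line to a fully qualified key path. It is not part of tom: `full_paths` and `split_key` are hypothetical helpers, and the sketch assumes bare keys only (no quoted keys, inline tables, arrays of tables or multi-line values).

```rust
/// Resolve every `key = value` line to its fully qualified key path.
/// Hypothetical helper: handles bare keys and plain `[table]` headers only.
pub fn full_paths(doc: &str) -> Vec<(Vec<String>, String)> {
    let mut table: Vec<String> = Vec::new(); // path of the current [table] header
    let mut entries = Vec::new();
    for line in doc.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(header) = line.strip_prefix('[').and_then(|l| l.strip_suffix(']')) {
            // A header such as `[foo.bar]` replaces the current table path.
            table = split_key(header);
        } else if let Some((key, value)) = line.split_once('=') {
            // A dotted key extends the current table path.
            let mut path = table.clone();
            path.extend(split_key(key));
            entries.push((path, value.trim().to_string()));
        }
    }
    entries
}

/// Split a dotted key such as `foo . bar` into its trimmed segments.
fn split_key(key: &str) -> Vec<String> {
    key.split('.').map(|part| part.trim().to_string()).collect()
}
```

Under these assumptions, all three spellings from the list resolve to the path `foo`, `bar`, `baz` with the value `1`, which is the sense in which they are equivalent.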

Two updates:

  1. A wasm demo of the parser.

  2. I've pushed some docs explaining the inner workings of the library and created a bunch of E-mentor issues. If you want to learn more about parsing and corner-cases of TOML, this might be a good opportunity. If you already know everything about TOML, take a look at the needs-design issues: these are the things I don't know how to do :slight_smile:

Whitespace is one important aspect.
How does this crate deal with entry orderings?
Specifically, if I were to read in a Cargo.toml file, manipulate it and then write it back, would e.g. the dependency list remain in order? I've never really understood why ordering is not a part of the official TOML format.

It preserves ordering :slight_smile: Moreover, when you edit the document, you have full control over ordering. Here's the test that inserts a key-value in various positions: https://github.com/matklad/tom/blob/5a1b98e0b0b13529a917a9c2212f6813a9eb1948/tests/suite/edit.rs#L37-L108.

More generally, the underlying data structure is lossless by construction, so tom does not simply preserve X, Y and Z, but rather does not lose anything in the first place.
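
To show why ordering comes for free with this kind of representation, here is a minimal sketch. It is not tom's API: `Entry`, `Table`, `insert_at`, `keys` and `render` are hypothetical names. Entries live in a plain ordered sequence instead of a hash map, so reading and writing keeps their order, and an edit picks the exact position for a new entry.

```rust
/// One `key = value` entry (hypothetical type, not tom's).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub key: String,
    pub value: String,
}

/// A table that stores its entries in document order.
#[derive(Debug, Default)]
pub struct Table {
    entries: Vec<Entry>,
}

impl Table {
    pub fn new() -> Self {
        Table::default()
    }

    /// Insert an entry at `index`; indices past the end append.
    pub fn insert_at(&mut self, index: usize, key: &str, value: &str) {
        let index = index.min(self.entries.len());
        let entry = Entry { key: key.to_string(), value: value.to_string() };
        self.entries.insert(index, entry);
    }

    /// Keys in document order.
    pub fn keys(&self) -> Vec<&str> {
        self.entries.iter().map(|e| e.key.as_str()).collect()
    }

    /// Write the entries back out in the order they are stored.
    pub fn render(&self) -> String {
        self.entries
            .iter()
            .map(|e| format!("{} = {}\n", e.key, e.value))
            .collect()
    }
}
```

For example, inserting `log` at index 1 between existing `serde` and `regex` entries leaves the other two where they were. A real lossless tree would also carry the whitespace and comments around each entry, which this sketch leaves out.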
