Why does Cargo use TOML?

/@_@\

I could've sworn it was the other way around. I had even looked it up in the past. Hooray for memory!

Bright side: look at all the extra markup language insights in this thread now :smile:

1 Like

I always wondered why Cargo uses TOML instead of YAML, which is somewhat similar, more popular (and more widely adopted), and has a settled spec, whereas the TOML readme starts with a disclaimer about the changing spec.

This, plus after what @BurntSushi said, it makes more sense now. :grinning:

I think YAML is a terrible format, with a horrendous spec, and lots of weirdness.

TOML works quite well, actually, and it's super easy to implement.

1 Like

The YAML spec makes XML look like JSON[1].

Plus, the spec doesn't address the security issues of loading binary parts into memory via YAML.

[1] Let's put that feeling in numbers. Not counting special validation rules, the YAML grammar has approximately 211 rules, XML has 81, and JSON has about 15.

2 Likes

The TOML readme has a small section comparing itself to JSON, YAML and INI:

In some ways TOML is very similar to JSON: simple, well-specified, and maps easily to ubiquitous data types. JSON is great for serializing data that will mostly be read and written by computer programs. Where TOML differs from JSON is its emphasis on being easy for humans to read and write. Comments are a good example: they serve no purpose when data is being sent from one program to another, but are very helpful in a configuration file that may be edited by hand.

The YAML format is oriented towards configuration files just like TOML. For many purposes, however, YAML is an overly complex solution. TOML aims for simplicity, a goal which is not apparent in the YAML specification: http://www.yaml.org/spec/1.2/spec.html

The INI format is also frequently used for configuration files. The format is not standardized, however, and usually does not handle more than one or two levels of nesting.

From: GitHub - toml-lang/toml: Tom's Obvious, Minimal Language

While I cannot speak for the Rust authors, having used the different formats in my own projects (as well as JSON variants such as HCL), I believe they made the right choice.

1 Like

While not criticising the choice of TOML, I think the issues with YAML are exaggerated. Only a very small and obvious subset of YAML is needed for use cases like Cargo's. I think YAML would have worked out equally conveniently, but this is bikeshedding.

1 Like

I don't think they are. Especially security issues aren't emphasized enough. The biggest issue is that YAML attempts to be many things to many people. It's JSON, it's a serialization format, it's human readable, etc. I've been reading the mailing list for YAML for quite some time now.

To me YAML has become the F-35 of the ML formats. That is to say, it's overly complex, does many things, and isn't really good at any particular thing. I mean, can a YAML parser beat a JSON parser at parsing JSON? (I don't think it is possible, simply because YAML has more states.)

Does anyone really use a YAML parser to parse JSON in the wild? I mean sure - it's technically possible. It's also possible to sear meat in a toaster, but you generally avoid it.

It's really a shame, since I think the basic YAML idea is really good. Using .travis.yml is pretty neat. The indented syntax is great.

But using YAML for Cargo would have been horrible, especially if the parser ever supported converting YAML into native data. I don't think there is any parser that supports YAML 1.2. Hell, I'm not sure there is even a fully compliant YAML parser for Rust.

In this regard, TOML is perfect. It's simple, it seems to be mostly text, and it doesn't attempt to be everything to all people. It's also dead easy to implement.

1 Like

I meant the issues with YAML for this use case are exaggerated. Surely, Cargo's subset of YAML would have been simple and safe.

But if you're using a subset of YAML, you're not using YAML. You're using a custom, specific format, that's poorly specified.

At that point, you cause all of the problems that you see with, say, INI-style formats. Lots of stuff has config files in some variant of the INI format, but everyone has slightly different formats and semantics. Some accept spaces in section names, some treat section names separated by . hierarchically, some allow spaces in keys without quoting, some don't, some allow keys without values, others don't, etc. So you can't just take an off-the-shelf library and parse an INI file; all of the INI parsers have to have various knobs that can be turned and overridden to parse or write out INI files that comply with everyone's different interpretation.
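As a rough illustration of those "knobs", here is a sketch with Python's standard `configparser`. The section and key names are invented; the point is only that the same file is rejected or accepted depending on how the parser is configured.

```python
import configparser

# Hypothetical INI file: a dotted section name, a key with no value,
# and a key containing a space.
INI_TEXT = """
[server.http]
skip-checks
listen address = 127.0.0.1
"""

def parse(text, **knobs):
    """Parse INI text with the given ConfigParser options (the knobs)."""
    parser = configparser.ConfigParser(**knobs)
    parser.read_string(text)
    return parser

# Default settings: a key without a value is a parse error.
try:
    parse(INI_TEXT)
    default_accepts = True
except configparser.ParsingError:
    default_accepts = False

# Same text, one knob turned: now the file is accepted.
lenient = parse(INI_TEXT, allow_no_value=True)
section = lenient["server.http"]  # the dot is just part of a flat section name
```

Another tool reading the same file might treat `server.http` as a nested section or refuse the space in `listen address`, and nothing in the file itself says which reading is right.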

As soon as you start subsetting YAML, you run into the same problem. If you use a tool that assumes one YAML feature is present when writing, but your subset doesn't accept it, then you can't use that tool to produce YAML that Cargo can read.

TOML covers many of the same use cases, but is much simpler. JSON is also simpler, but not a good format for hand-written config files. TOML seems like a decent happy medium: it is well specified, not too complex, reasonably familiar to those who are familiar with INI files, has a data model that maps well to the kinds of things people generally want to express and parse, and doesn't have arbitrary restrictions on nesting depth like INI files have.

6 Likes

I don't think that there are cases where a YAML configuration file caused compatibility or security issues. But I've never investigated this ... can you point me to an example?

YAML Sucks. Gems Sucks. Syck Sucks. | Blag (I remember this one from a few years ago; this is seen in practice)

JSON::XS - JSON serialising/deserialising, done correctly and fast - metacpan.org (another old good one)

GitHub - cblp/yaml-sucks: YAML sucks. (found this just now while trying to re-find the first one, looks good though)

2 Likes

These links contain general rants about YAML. But I was asking for a case where a YAML configuration file actually caused trouble.

Granted, I know that YAML is convoluted when you look at its full spec, and I realize that it is not simple to write a parser with 100% support. I see your point. But I think this is theoretical when it comes to Cargo's use case. I've used YAML configuration files extensively, and it really is simple and safe. And easy to document precisely. And people know it, and it has a plugin for your $EDITOR. And most importantly -- you don't need yet another markup language (pun intended).

One of the worst days in my open source maintainer life: YAML f7u12 | Tenderlove Making

This link says that YAML should not be used for foreign data, which does not apply to configuration files. (As a slightly off-topic note, I think the concerns raised in this article do not apply to a non-dynamic language like Rust anyway.)

a) Remote code instantiation is not a language property, but a system property. I'm pretty sure someone will come up with a system that does dynamic instantiation through labels, which can lead to such problems. This is also very much a library problem. (If YAML libraries didn't do that lookup, everyone would be safe.)
b) Cargo.toml files in Rust are foreign files; they are just loaded through Cargo, which only makes things messier.

@bronger: The problem is that YAML attempts to be too clever and allows instantiating native data structures from characters. To me that's a bigger problem than converting strings into objects on load. Sure, a statically typed language will make it harder for an attacker. It won't make it impossible.

TOML makes it impossible, barring some idiocy [1], to have such cases, simply because it doesn't attempt to be clever. It's just a configuration format.

Anyway, I don't think this talk is going anywhere. The answer for why TOML was chosen was pretty much a given:

  • TOML was implemented first
  • TOML fits all current use cases
  • TOML seems to fit all future use cases

[1] Assuming that the library consumer doesn't decide to, for example, instantiate native data structures from TOML strings based on arcane rules.

1 Like

I don't have anything to contribute to this discussion, but I've been using a lot of TOML lately and it's my favorite format for human-readable data storage and transfer ever. I love how simple it is. It lacks the warts of both YAML and JSON. It's very tasteful.

4 Likes

This isn't the case. YAML has a concept of tags on nodes, which can be used for many things. Some languages have facilities for interpreting those as class names. Some made the bad choice of doing that by default in their libraries.

TOML makes that very much possible. You just put the class name somewhere else.

I'm not arguing against TOML here, but YAML gets too much flak for things it doesn't do.

2 Likes