Human-readable format for test data

What serialization format would you use for a set of test data files? That is, there is a long list of test cases in files that humans need to edit.

I'll probably use TOML. But everybody is saying TOML is a configuration format rather than a data format. Is there an alternative for data?

I find that distinction superficial and frankly quite useless.

If TOML works for you, feel free to use it for test data. After all, it's the configuration for the tests, isn't it?

TOML isn't great for data structures nested more than a little; it's at its best when it's storing INI-like structure where each heading is meaningful on its own. You can write more nested data in TOML somewhat elegantly, but once you're putting a multiline table inside an array, TOML isn't particularly obvious anymore, and you'd probably be better off with a format that makes nested structure easier to work with.

(The "next" release of the TOML spec is currently slated to allow inline braced tables to contain line breaks, which makes weird nesting more straightforward to express, although it remains contentious. If that lands and gets supported, I won't really see any common reason to avoid TOML anymore.)

If you want a common standard format to use and TOML nesting causes problems, just use JSON. Or even use YAML; "typed YAML" with serde defuses the biggest YAML footguns (e.g. `no` being parsed as a bool when you expect a string). If you don't mind something a bit more ad hoc, RON is a fine choice.

YAML seems pretty nice for this, except that the serde-yaml crate is deprecated.

But my use case works fine with TOML, so I'm planning to just go with that.

What I want to know is: Since the world has been full of markup/configuration/human readable data formats since forever, with XML and JSON being famous examples, why did anyone have the urge to create TOML and why did Rust have the urge to use this obscure thing? Why did we need YAFML?

This footgun has been disarmed since the first non-draft version of the YAML spec, 1.2, released in 2009 (!).

YAML now defines a "tag schema" system and three predefined schemas: failsafe (which is essentially a no-op, all scalars are strings), JSON (accepts exactly JSON values, e.g. true, 1.4, as their values), and core, which adds some other number formats like hex, and nan/inf values. You can, of course, define whatever schema you like.

This seems to persist mostly because PyYAML (among others) still, over a decade later, only supports 1.1, and PyPI's search is so terrible that you can't find any better libraries. (yaml.org suggests ruamel.yaml.)
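As a rough illustration of the difference, the sketch below models how a plain (unquoted) scalar is resolved to a bool under the YAML 1.1 bool type, the 1.2 core schema, and the failsafe schema. It is a hand-written model, not a YAML parser and not any library's API; the function name `resolve_bool` and the schema labels are made up for the example, and number resolution is left out.

```python
from typing import Union

# Spellings the YAML 1.1 bool type accepts for plain scalars.
YAML_1_1_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML_1_1_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

# Spellings the YAML 1.2 core schema accepts.
CORE_TRUE = {"true", "True", "TRUE"}
CORE_FALSE = {"false", "False", "FALSE"}


def resolve_bool(scalar: str, schema: str) -> Union[bool, str]:
    """Return a bool if `schema` treats the plain scalar as one, else the string."""
    if schema == "failsafe":
        return scalar  # failsafe: every scalar stays a string
    if schema == "yaml-1.1":
        true_set, false_set = YAML_1_1_TRUE, YAML_1_1_FALSE
    elif schema == "core-1.2":
        true_set, false_set = CORE_TRUE, CORE_FALSE
    else:
        raise ValueError(f"unknown schema: {schema}")
    if scalar in true_set:
        return True
    if scalar in false_set:
        return False
    return scalar


# Only the 1.1 rules turn the unquoted scalar `no` into a bool.
for schema in ("yaml-1.1", "core-1.2", "failsafe"):
    print(schema, repr(resolve_bool("no", schema)))
```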

JSON is a terrible format for anything humans are going to edit and review. XML is terrible at everything. INI is great for simple flat information, but it has no specification, and it keeps getting extended way past what it supports until it's actually just a custom format. YAML is the least terrible popular option, but it's overly complex for both users and implementers (even when used properly, see my above comment!).

Tom therefore said, "hold my beer" and invented a format that combines all the worst features of the above!

The closest to a nice format I've seen is KDL, which I'd really like for config if it didn't require quotes for values, among a few other minor nits.

Because it's much nicer for human-edited config than the alternatives.

Nobody wants to read, much less write, XML by hand. JSON is OK, but for simple things you pay a price in verbosity in exchange for the uniform syntax (e.g., every string is quoted, every map must have curly braces, etc.).

I've used RON for this exact use case and it was pretty good.
The good thing is that it closely matches the Rust type system, so it's pretty straightforward to use.

Can we have less language-bashing, please?

TOML is (basically) "INI files", which have been used since before XML and JSON were a thing. For all their faults, INI files are pretty obvious. I think TOML is a reasonable neutral/middle-ground format to use as a de facto standard within the ecosystem.

Shout-out to the figment crate for making it easy to support TOML, YAML, JSON, and environment variables all at the same time with very little fuss.

I use TOML for data that is intended to be human-writable, and RON for data that is intended to be only machine-writable.

There's a big fat caveat on what it means to be intended for human-writability. But informally, it means that the expectation is that a human commonly wants to micro-manage the data. RON is more appealing for cases where you just want to set it and forget it. [1]


  1. RON supports niceties for humans, like comments and a pretty-printer. It's still primarily a text-based serialization format!

That is quite a strong opinion.

I, too, have a strong opinion: JSON is the best format for human-machine interaction in most cases I have found.

I have used a lot of CSV, XML, INI, and JSON. JSON is best unless the data has constant, short records, in which case I would choose CSV.
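For comparison, here is a sketch of the same three made-up test cases written as CSV and as JSON, loaded with Python's standard `csv` and `json` modules. The column names (`name`, `input`, `expected`) and the helper `load_csv` are hypothetical, chosen only for the example.

```python
import csv
import io
import json

# Constant, short records: one test case per line.
CSV_CASES = """\
name,input,expected
empty,,0
one word,hello,1
two words,hello world,2
"""

# The same cases as JSON: typed values, but every key is repeated.
JSON_CASES = """\
[
  {"name": "empty", "input": "", "expected": 0},
  {"name": "one word", "input": "hello", "expected": 1},
  {"name": "two words", "input": "hello world", "expected": 2}
]
"""


def load_csv(text: str) -> list:
    """Parse the CSV cases; every CSV field is a string, so convert explicitly."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"name": row["name"], "input": row["input"], "expected": int(row["expected"])}
        for row in rows
    ]


csv_cases = load_csv(CSV_CASES)
json_cases = json.loads(JSON_CASES)
print(csv_cases == json_cases)
```

CSV stays compact for flat records but carries no types, so the loader has to convert `expected` itself; JSON keeps the types at the cost of repeating the keys on every record.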

I assume you are referring to:

Respect that people have differences of opinion and that every design or implementation choice carries a trade-off and numerous costs. There is seldom a right answer.

Sure, but I'm also allowed to have a negative opinion of a language so long as I'm not being a dick about it, like insulting developers for choosing it, right? Having opinions about languages is a god-given right that all developers must have, surely.

It was hopefully pretty obviously in jest anyway, especially given it's a reply to someone asking why it was created.

(In actuality, my opinion is that TOML is fine, especially if you have decent editor support, but surely there are better options if you're making a new thing)

I, for one, find it unwelcoming when people use hyperbolic, dismissive language without caveat, like "combines all the worst features of…". Yes, that could be a technical claim, but to me it sounds like “ah, if I invent something to unusual tastes and share it here, it may be not just critiqued but outright insulted”. And if it's joking, it's the “can't you take a joke?” kind of joke.

But I wouldn't have said anything if it weren't also for the remark about XML — I meant to address the trend, not just your message as sufficient by itself.

Plenty of people have, in fact, chosen to write XML (myself included), and it has useful features that no other well-known, widely-implemented data format does (which I am not going to mention in detail, because that's not the point). The Rust community should be welcoming to everyone, and that includes people who have reason to write XML, or write programs that process XML.

Are we allowed to make a distinction between people and ideas?
It's one thing to welcome a person, quite another to welcome a particular idea they have.

Sorry, no. What I wrote is basically objectively true.

Also, this was literally the question.

As I see it, questions can be answered with politely delivered facts and personal opinions, instead of hyperbole. (But I will not comment any further on this topic, to not derail it any more than I already have.)
