Choosing config file format

I have a project in which the CLI has grown to contain too many options. I want to replace most of these with a config file.

Many of these options specify physical quantities with dimensions (using uom). uom contains parsers for quantities with units, so specifying an option whose unit is Length on the command line might look like:

--size '200 mm'

In 3D (option type (Length, Length, Length)) it gets a bit annoying with the commas tightly packed between quotes:

--box-dimensions '23 mm','2.5 cm','1 m'

I was thinking of using TOML for the config file format, but these unitful quantities would be quite a pain to represent.

Can you suggest some other serde-supported human-readable format that would be more appropriate for writing config files which contain physical quantities with units?

The major human-readable ones are toml, json and yaml. Choosing something more niche is a trade-off, because users are likely to be unfamiliar with it.

I have to ask though, is there anything much wrong with, e.g.:

size = "200 mm"
box-dimensions = ["3 mm", "2.5 cm", "1 m"]
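With the quantities stored as plain TOML strings, all the program needs is a string-to-quantity conversion, and uom already provides one. Purely to illustrate what that conversion involves, here is a std-only sketch; the `Length` newtype (storing meters), the `parse_length` function and the small unit table are hypothetical stand-ins, not uom's API.

```rust
/// Hypothetical stand-in for uom's `Length`: the value is stored in meters.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Length(pub f64);

/// Parse strings such as "200 mm" or "2.5 cm" into a `Length`.
/// Only a handful of units are recognised in this sketch.
pub fn parse_length(s: &str) -> Result<Length, String> {
    let mut parts = s.split_whitespace();
    // Expect exactly two whitespace-separated parts: number and unit.
    let (num, unit) = match (parts.next(), parts.next(), parts.next()) {
        (Some(n), Some(u), None) => (n, u),
        _ => return Err(format!("expected '<number> <unit>', got {:?}", s)),
    };
    let value: f64 = num
        .parse()
        .map_err(|e| format!("bad number {:?}: {}", num, e))?;
    // Conversion factor from the given unit to meters.
    let factor = match unit {
        "mm" => 1e-3,
        "cm" => 1e-2,
        "m" => 1.0,
        "km" => 1e3,
        _ => return Err(format!("unknown length unit {:?}", unit)),
    };
    Ok(Length(value * factor))
}

fn main() {
    // The values from the TOML snippet above.
    let size = parse_length("200 mm").unwrap();
    let dims: Vec<Length> = ["3 mm", "2.5 cm", "1 m"]
        .iter()
        .map(|s| parse_length(s))
        .collect::<Result<_, _>>()
        .unwrap();
    println!("{:?} {:?}", size, dims);
}
```

With real uom types the parsing step collapses to `s.parse::<Length>()`, so the string representation costs very little on the Rust side.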

No, there's nothing much wrong with that. It seems that, in the course of exploring alternative formats and thinking in terms of their supported types and how they naturally map to Rust via Serde, I ended up not seeing the wood for the trees in TOML.


While in general I agree with using common formats, I will make the case for Ron: it's succinct, relatively simple (compared to toml and yaml), and maps very directly to Rust.

In this specific case, you would use whatever your representation in Rust is, for example (Mm(200), Yard(3.5)) or (Length(200, Mm), ...).
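To make "whatever your representation in Rust is" concrete, here is a sketch of types that a RON value like `(Length(200.0, Mm), Length(3.5, Yard))` could map onto. The `LengthUnit` enum, the `Length` tuple struct and the `meters` method are hypothetical; the serde derives and the ron crate are left out so the example compiles with std alone.

```rust
/// Hypothetical unit enum; with serde + ron, a value of this type
/// would be written in the config file simply as `Mm`, `Yard`, etc.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LengthUnit {
    Mm,
    Cm,
    M,
    Yard,
}

/// Hypothetical tuple struct mirroring RON text such as `Length(200.0, Mm)`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Length(pub f64, pub LengthUnit);

impl Length {
    /// Convert the stored value to meters.
    pub fn meters(self) -> f64 {
        let factor = match self.1 {
            LengthUnit::Mm => 1e-3,
            LengthUnit::Cm => 1e-2,
            LengthUnit::M => 1.0,
            LengthUnit::Yard => 0.9144, // exact by definition
        };
        self.0 * factor
    }
}

fn main() {
    // Roughly what `(Length(200.0, Mm), Length(3.5, Yard))` would deserialize to.
    let pair = (Length(200.0, LengthUnit::Mm), Length(3.5, LengthUnit::Yard));
    println!("{} m, {} m", pair.0.meters(), pair.1.meters());
}
```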


Hmm. Interesting.

Unfortunately specifying 10 mm in uom (without writing helper functions) looks like this: Length::new::<millimeter>(10.0). There's no mm type; the type is Length, which can be constructed from a whole gamut of different length units. I have a module with a bunch of conveniently-named functions like this:

pub fn mm(x: f32) -> Length { Length::new::<millimeter>(x) }
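If the list of such helpers gets long, a small `macro_rules!` macro can stamp them out. This is a std-only sketch: `Length` here is a hypothetical newtype storing meters and `length_helpers!` is a made-up macro name; with uom, the generated body would call `Length::new::<unit>(x)` instead of multiplying by a factor.

```rust
/// Hypothetical stand-in for uom's `Length`, storing meters.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Length(pub f32);

/// Generate one constructor function per unit name.
/// With uom, the body would instead be `Length::new::<$unit>(x)`.
macro_rules! length_helpers {
    ($($name:ident => $factor:expr),*) => {
        $(
            pub fn $name(x: f32) -> Length {
                Length(x * $factor)
            }
        )*
    };
}

// Expands to `pub fn mm(x: f32) -> Length { ... }` and so on.
length_helpers!(mm => 1e-3, cm => 1e-2, m => 1.0, km => 1e3);

fn main() {
    println!("{:?} {:?} {:?}", mm(200.0), cm(2.5), km(1.0));
}
```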

Would Ron allow the use of these functions (or some other convenience mechanism) in the config file itself?

uom has Serde support. But TOML has a very restricted set of types. Is there a way of having Serde automatically deserialize the TOML string "200 mm" into a field of type uom::Length?

Using deserialize_with

#[derive(Deserialize, Debug)]
pub struct Config {
    #[serde(deserialize_with = "bbb")]
    pub aaa: Option<Time>,
}

and this function


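use serde::{de, Deserialize, Deserializer};

fn bbb<'d, D>(deserializer: D) -> Result<Option<Time>, D::Error>
where
    D: Deserializer<'d>
{
    // Borrow the config value as a string, e.g. "200 mm".
    let s: &str = Deserialize::deserialize(deserializer)?;
    // Parse it with uom's string parser, converting the error into a serde error.
    let r: Result<Time, _> = s.parse::<Time>().map_err(de::Error::custom);
    Some(r).transpose()
}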

it sort of works, but

  • Before, if aaa was missing from the config file, Serde gave it a None value. Now it gives a "missing field 'aaa'" error.
  • Is there a more direct, less boilerplaty way of getting Serde to use what is already there?

By the way — assuming this is being passed in a Unix shell, that text is completely equivalent to

--box-dimensions '23 mm,2.5 cm,1 m'

The Unix command line model is “list of strings”, and quotes are a shell feature that allows distinguishing whitespace inside an argument-string from whitespace that separates different arguments. Shells let you use multiple quotes inside a single argument but it is equivalent (other than for shell variable interpolation) to using just one set of quotes around the whole thing.

This doesn't answer your question but I thought I'd mention it since you said “gets a bit annoying with the commas tightly packed between quotes” but you don't have to use those quotes.
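Since the program receives a single string either way, the comma handling happens entirely on the Rust side. A minimal std-only sketch; `split_components` is a hypothetical helper name, and each component would still need to be parsed into a quantity afterwards.

```rust
/// Split one command-line argument such as "23 mm,2.5 cm,1 m" into its three
/// comma-separated components. A POSIX shell has already removed the quotes,
/// so `'23 mm','2.5 cm','1 m'` and `'23 mm,2.5 cm,1 m'` arrive as the same string.
pub fn split_components(arg: &str) -> Result<[&str; 3], String> {
    let parts: Vec<&str> = arg.split(',').map(str::trim).collect();
    match parts.as_slice() {
        [x, y, z] => Ok([*x, *y, *z]),
        _ => Err(format!("expected 3 comma-separated values, got {}", parts.len())),
    }
}

fn main() {
    // Treat every argument after the program name as a box-dimension list.
    for arg in std::env::args().skip(1) {
        match split_components(&arg) {
            Ok([x, y, z]) => println!("x = {}, y = {}, z = {}", x, y, z),
            Err(e) => eprintln!("{}", e),
        }
    }
}
```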


In this case, you can deserialize to Option<&str> and then .map().transpose() it into the Option<Time>.
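The `map`/`transpose` step itself can be shown without serde. A std-only sketch, where `parse_optional` is a hypothetical generic helper and `f64` stands in for `Time`:

```rust
use std::str::FromStr;

/// Parse an optional string into an optional value, propagating parse errors:
/// `None` stays `Ok(None)`; `Some(s)` becomes `Ok(Some(v))` or `Err(e)`.
fn parse_optional<T: FromStr>(s: Option<&str>) -> Result<Option<T>, T::Err> {
    // Option<Result<T, E>> -> Result<Option<T>, E>
    s.map(str::parse::<T>).transpose()
}

fn main() {
    let present: Result<Option<f64>, _> = parse_optional(Some("2.5"));
    let absent: Result<Option<f64>, _> = parse_optional(None);
    let bad: Result<Option<f64>, _> = parse_optional(Some("abc"));
    println!("{:?} {:?} {:?}", present, absent, bad);
}
```

Note that this only changes what happens when the key is present. When the key is missing entirely, serde's derive falls back to `None` only for `Option` fields it deserializes itself; once `deserialize_with` is used, the field generally also needs `#[serde(default)]`, e.g. `#[serde(default, deserialize_with = "bbb")]`.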


For CLI utilities, consider using command-line arguments as the config file format. That is, literally put

--size '200 mm'

into the config file.

You can take a look at how bat does that.

λ bat ~/.config/bat/config 
--plain
--theme GitHub

This approach has the major benefit that there's only a single syntax the user needs to know.
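The bat-style approach can be sketched roughly as: read the file, split each line into arguments, and put them in front of the real ones. This is not bat's actual implementation; `config_to_args` is a hypothetical name, and treating `#` lines as comments is an assumption of the sketch. The hand-rolled splitter only understands plain single and double quotes (no escapes, no variables), which is much simpler than any real shell.

```rust
/// Turn the contents of an argument-style config file into a list of
/// arguments. Blank lines and lines starting with `#` are skipped.
pub fn config_to_args(contents: &str) -> Result<Vec<String>, String> {
    let mut args = Vec::new();
    for (lineno, line) in contents.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut current = String::new();
        let mut in_word = false;
        let mut quote: Option<char> = None;
        for c in line.chars() {
            match (quote, c) {
                (Some(q), c) if c == q => quote = None, // closing quote
                (Some(_), c) => current.push(c),       // anything inside quotes
                (None, '\'') | (None, '"') => {
                    quote = Some(c); // opening quote starts (or continues) a word
                    in_word = true;
                }
                (None, c) if c.is_whitespace() => {
                    // Unquoted whitespace ends the current word, if any.
                    if in_word {
                        args.push(std::mem::take(&mut current));
                        in_word = false;
                    }
                }
                (None, c) => {
                    current.push(c);
                    in_word = true;
                }
            }
        }
        if quote.is_some() {
            return Err(format!("line {}: unterminated quote", lineno + 1));
        }
        if in_word {
            args.push(current);
        }
    }
    Ok(args)
}

fn main() {
    let file = "--plain\n--theme GitHub\n--size '200 mm'\n";
    // Config-file arguments go first; with a parser where later occurrences
    // of an option win, real command-line arguments appended after them override them.
    let mut args = config_to_args(file).unwrap();
    args.extend(std::env::args().skip(1));
    println!("{:?}", args);
}
```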


Does this not run into the edge cases involved with shell parsing? I mean, shells can have slightly (or sometimes very) different ways of handling arguments so it's not quite a single syntax.

I think this should be the high-order bit in your decision process. And I'd go further and say "just stick with json or yaml". Sure, toml seems to be catching on a bit, but the other two are everywhere. Figure out how to coax serde into properly deserializing your data (writing code if necessary). Data can last a long time, longer than code.

+1 for Ron format. I've used it twice now because I had data structures that were not representable in Toml. Namely, toml doesn't play well with algebraic data types (enums with values).
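To illustrate what "enums with values" means here, a hypothetical config type whose variants carry data. In RON such a value is written with the variant name directly, as shown in the doc comment. The `Sample` type and its `volume` method are made up for illustration, and serde/ron are omitted so the sketch compiles with std alone.

```rust
/// Hypothetical config type: an enum whose variants carry data.
/// In a RON config file a field of this type could read, for example:
///     sample: Cuboid(x: 0.023, y: 0.025, z: 1.0),
/// or
///     sample: Sphere(radius: 0.01),
#[derive(Debug, Clone, PartialEq)]
pub enum Sample {
    Sphere { radius: f64 },
    Cuboid { x: f64, y: f64, z: f64 },
}

impl Sample {
    /// Volume in cubic meters (all dimensions are plain meters in this sketch).
    pub fn volume(&self) -> f64 {
        match self {
            Sample::Sphere { radius } => 4.0 / 3.0 * std::f64::consts::PI * radius.powi(3),
            Sample::Cuboid { x, y, z } => x * y * z,
        }
    }
}

fn main() {
    let samples = [
        Sample::Sphere { radius: 0.01 },
        Sample::Cuboid { x: 0.023, y: 0.025, z: 1.0 },
    ];
    for s in &samples {
        println!("{:?}: {} m^3", s, s.volume());
    }
}
```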

Hmm, it seems to be working now, but I'd swear that it didn't work a few months back. I have switched to a different zsh configuration in the last few days, so maybe that's it ... or maybe I was simply mistaken earlier, though that would be surprising, as I found the extra quotes irksome and recall checking and double-checking that I couldn't get rid of them.

This was my original idea a while back, but the configuration has since become so complex that I think it would be simpler to use a syntax that is more naturally suited to nested structures.

Indeed, this has given me pause, too.

Agreed, but in this case I think I want to put all the configuration in the config file, and completely remove the possibility to set any of it on the CLI, in order to have a single source of truth that is easily tracked.

You mean like this:

fn bbb<'d, D>(deserializer: D) -> Result<Option<Time>, D::Error>
where
    D: Deserializer<'d>
{
    let s: Option<&str> = Deserialize::deserialize(deserializer)?;
    s.map(str::parse::<Time>)
     .transpose()
     .map_err(de::Error::custom)
}

?

It still gives me the missing field error.

I've described a solution to this particular problem here.