Parse toml array of tables while preserving order

Hello,

I would like to parse the following toml while preserving the order.

[[custom]]
message = "Test"
command = "Echo test"

[banner]

[[custom]]
message = "Test 2"
command = "Echo test 2"

By preserving the order, I mean I want the output to be:

  • custom
  • banner
  • custom

I want to parse this into Vec<Box<dyn Component>>, where custom and banner can both be parsed into types implementing the trait Component.

I tried this, but map.next_entry::<CustomElement>() gives me the entire array, not just the next element.

impl<'de> Deserialize<'de> for Config {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        struct ConfigVisitor;

        impl<'de> Visitor<'de> for ConfigVisitor {
            type Value = Config;

            fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
                formatter.write_str("struct Config")
            }

            fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
            where
                A: serde::de::MapAccess<'de>,
            {
                let mut result = Config {
                    components: vec![],
                    global: GlobalConfig::default(),
                };

                while let Some(key) = map.next_key()? {
                    match key {
                        Fields::Global => {
                            result.global = map.next_value()?;
                        }
                        Fields::Banner => {
                            result
                                .components
                                .push(Box::new(map.next_value::<Banner>()?));
                        }
                        Fields::CustomElement => {
                            println!("{:?}", map.next_entry::<CustomElement>()?);
                        }
                    }
                }
                Ok(result)
            }
        }

        deserializer.deserialize_map(ConfigVisitor)
    }
}

After a lot of searching and trying different crates, I was not able to find any way to do this.

The reason I want to preserve the order is that the order of the elements in the configuration controls the order in which the elements are printed.

The project and issue for context: Add the ability to write custom messages or commands. · Issue #37 · rust-motd/rust-motd · GitHub

What you want is not compatible with serde's data model, so anything involving serde is not going to work. It would probably be easier to change the format than to make a parser that can work with this very weird requirement.


Hi @jplatte Thanks for your answer. Serde is not a requirement. What data format do you suggest? Is this requirement so weird?

Is this requirement so weird?

In the TOML data model, your document is a table with two keys, banner and custom. You're asking for parts of those two key-value pairs to be interleaved with each other, but in a semantically significant way.

I can’t cite a particular statement in the TOML specification but I’m pretty sure you’re trying to attach meaning to a capability that is intended to exist for the author's convenience, not be part of the semantics of the document.
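To make the data-model point concrete, here is a rough sketch (std only, no TOML crate) of what any conforming parser conceptually hands you for the document in the question. The `Value` enum and `parsed_document` function are hypothetical stand-ins I made up for illustration, not any library's API.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-in for a parsed TOML value.
#[derive(Debug)]
enum Value {
    Str(String),
    Table(BTreeMap<String, Value>),
    Array(Vec<Value>),
}

// Builds by hand the structure that the document in the question describes:
// one `custom` key holding an array of two tables, and one `banner` key.
fn parsed_document() -> BTreeMap<String, Value> {
    let custom = |message: &str, command: &str| {
        let mut table = BTreeMap::new();
        table.insert("message".to_string(), Value::Str(message.to_string()));
        table.insert("command".to_string(), Value::Str(command.to_string()));
        Value::Table(table)
    };

    let mut root = BTreeMap::new();
    root.insert(
        "custom".to_string(),
        Value::Array(vec![
            custom("Test", "Echo test"),
            custom("Test 2", "Echo test 2"),
        ]),
    );
    root.insert("banner".to_string(), Value::Table(BTreeMap::new()));
    root
}

fn main() {
    // Only two top-level entries exist; where `[banner]` sat between the
    // two `[[custom]]` headers in the text is not recorded anywhere.
    for (key, value) in &parsed_document() {
        match value {
            Value::Array(items) => println!("{key}: array of {} tables", items.len()),
            Value::Table(_) => println!("{key}: table"),
            Value::Str(text) => println!("{key}: {text}"),
        }
    }
}
```

The order within the `custom` array is preserved, but its position relative to `banner` is simply not part of the value.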


Maybe you could use toml_edit? But I agree, this is only useful for human reading purposes and shouldn't be used to change what a program does.

Are there any proposals for an elegant way to control the order of the elements from the configuration file? I'm currently adding the custom element, which can appear multiple times, but the order of the current ~20 elements is controlled in the same way. Changing the configuration format would be a breaking change, which I want to avoid.

You cannot practically do this without any breaking change or a custom parser.
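For a sense of what the "custom parser" route might involve, here is a minimal sketch that only recovers the order of table headers from the raw text. The function name `header_order` is made up for this example, and the scanner is deliberately naive: it assumes headers sit alone on their own line and it does not handle multi-line strings, multi-line arrays, trailing comments, or quoted keys.

```rust
/// Returns the names of `[name]` and `[[name]]` headers in document order.
/// Naive sketch: a line counts as a header only if, after trimming, it
/// starts and ends with the matching brackets.
fn header_order(toml_text: &str) -> Vec<String> {
    toml_text
        .lines()
        .map(str::trim)
        .filter_map(|line| {
            // Try the array-of-tables form first, then the plain table form.
            let inner = line
                .strip_prefix("[[")
                .and_then(|rest| rest.strip_suffix("]]"))
                .or_else(|| {
                    line.strip_prefix('[')
                        .and_then(|rest| rest.strip_suffix(']'))
                })?;
            Some(inner.trim().to_string())
        })
        .collect()
}

fn main() {
    let config = r#"
[[custom]]
message = "Test"
command = "Echo test"

[banner]

[[custom]]
message = "Test 2"
command = "Echo test 2"
"#;
    // Prints custom, banner, custom (one per line).
    for name in header_order(config) {
        println!("{name}");
    }
}
```

You would still parse the values with a real TOML parser and then pair the nth `custom` header with the nth element of the `custom` array, which is exactly the kind of fragile glue that makes this approach unattractive.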

Personally, I think that key-value-focused TOML is the wrong file format entirely for this job. A better one might be XML. XML may have a bad reputation today because of how often it was used, poorly, for everything during its period of popularity, but the point of XML (and its predecessors) is to be a markup language: an ordered sequence of text with annotations on portions of that text. A minimal XML document for your situation could be something like:

<?xml version="1.1"?>
<components>
    <custom command="Echo test">Test</custom>
    <banner/>
    <custom command="Echo test 2">Test 2</custom>
</components>

or depending on what your components actually mean (Does every command necessarily have one associated message? Which one is more important?), perhaps this would be more fitting:

<?xml version="1.1"?>
<components>
    <message>Test</message>
    <command>Echo test</command>
    <banner/>
    <message>Test 2</message>
    <command>Echo test 2</command>
</components>

In any case, the elements in an XML document are explicitly intended to be ordered, and each have a name that means what kind of thing they are, not the key in a key-value structure, so it would be a much better fit for the semantics you want.


Personally I like KDL

custom "Test" {
  command "Echo Test"
}

banner

...

That is one way you could map your current model. There are serde-based parsers, but the format is unusual enough that it's nicer to use a dedicated library like Knuffel.

There's also YAML and boring old JSON, plenty of options.

You could still use TOML if you're OK with it being a bit uglier. I think this works?

[[items]]
[items.custom]
message = "Test"
...
[[items]]
[items.banner]

[[items]]
[items.custom]
...

There are other ways, e.g. a type property for each item, but it's fundamentally pretty ugly because of how TOML works with its flat top-level structure.
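As a rough illustration of the "type property" variant, assume each `[[items]]` entry carries a `type` key next to its own fields. Once the array is parsed (in order), building the ordered component list is a simple dispatch. Everything here is hypothetical: the `Component` trait with a `render` method, the `Banner` and `Custom` structs, and the `RawItem` alias standing in for one already-parsed entry.

```rust
use std::collections::HashMap;

// Hypothetical trait; the real project's `Component` may look different.
trait Component {
    fn render(&self) -> String;
}

struct Banner;

struct Custom {
    message: String,
    command: String,
}

impl Component for Banner {
    fn render(&self) -> String {
        "banner".to_string()
    }
}

impl Component for Custom {
    fn render(&self) -> String {
        format!("{}: `{}`", self.message, self.command)
    }
}

// Stand-in for one parsed `[[items]]` entry (string fields only).
type RawItem = HashMap<&'static str, &'static str>;

// Walks the array in order and dispatches on the `type` key.
fn build(items: &[RawItem]) -> Result<Vec<Box<dyn Component>>, String> {
    let mut components: Vec<Box<dyn Component>> = Vec::new();
    for item in items {
        match item.get("type").copied() {
            Some("banner") => components.push(Box::new(Banner)),
            Some("custom") => {
                let command = item
                    .get("command")
                    .copied()
                    .ok_or("custom item is missing `command`")?;
                // Fall back to the command text when no message is given.
                let message = item.get("message").copied().unwrap_or(command);
                components.push(Box::new(Custom {
                    message: message.to_string(),
                    command: command.to_string(),
                }));
            }
            other => return Err(format!("unknown item type: {other:?}")),
        }
    }
    Ok(components)
}

fn main() -> Result<(), String> {
    let items = vec![
        HashMap::from([("type", "custom"), ("message", "Test"), ("command", "Echo test")]),
        HashMap::from([("type", "banner")]),
        HashMap::from([("type", "custom"), ("message", "Test 2"), ("command", "Echo test 2")]),
    ];
    for component in build(&items)? {
        println!("{}", component.render());
    }
    Ok(())
}
```

Because the order lives in an array rather than in the position of table headers, this stays within what the TOML data model actually guarantees.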


I don't want to make a breaking change to the configuration file.

Wonderful example of an XY problem yet again.

If you simply need to extend your base .toml file with a few "custom messages or commands", you're much better off "reserving" the top-level keys for your base configuration (as defined here?) while treating every other [[table]] you encounter as another possible command.

See the playground here.

use std::error::Error;
use toml::{self, Table, Value};

fn main() -> Result<(), Box<dyn Error>> {
    // `EXAMPLE` (the TOML text) and `RESERVED_KEYS` (the built-in section
    // names) are not shown here; they are assumed to be defined elsewhere
    // no `serde` - just the plain `toml` crate
    let mut config: Table = EXAMPLE.parse()?;
    // look up + remove immediately
    for reserved in RESERVED_KEYS {
        let key = reserved.to_string();
        let Some(table) = config.remove(&key) else { continue };
        // match on [global] or [weather] or [docker] or [etc]
        match key.as_str() {
            "global" => println!("global: {table:#?}"),
            "weather" => println!("is not too bad"),
            "docker" => println!("doesn't ship"),
            _ => {}
        }
    }
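    // look for commands in what's left
    // (note: `Table` iterates in sorted key order unless the `toml` crate's
    // `preserve_order` feature is enabled)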
    for (name, value) in config {
        let Some(Value::String(cmd)) = value
            .get("command") else { continue };
        let message = value.get("message")
            .and_then(Value::as_str)
            .unwrap_or(&name);
        let color = value.get("color")
            .and_then(Value::as_str)
            .unwrap_or("system");
        println!("command: `{cmd}`; \
            message: `{message}`; \
            color: {color}")
    }
    Ok(())
}

Though for the sake of your own future sanity, with regard to any possible extensions to your initial "standard", it does make sense to reserve a [custom] key specifically for this.

Usage-wise:

[global]
left = "left"
right = "right"

[custom.command_1]
command = "echo hi"
message = "just saying hello"

[custom.command_2]
command = "echo bye"

Thank you very much for your answer. It looks like that could work well. I'll give it a shot.

I don't see how it's an XY problem. The requirement was always to parse a new [custom] element without introducing a breaking change to the configuration. There are many responses suggesting different configuration languages (responses answering how to do Z).


That's not what the full context of the original issue was about, though:

If the goal is to add custom messages or commands, there are many ways to go about it without trying to parse the .toml in a way its data model / specification was never designed for.

To your credit, it doesn't look like there's been much of an attempt to understand the X (#37) you were actually trying to solve for with Y (parsing several [custom] tags inside the .toml as if they were separate entries) before telling you all the reasons why it can't be done or recommending an alternative Z (XML/KDL/YAML/JSON). Guess that does make it a full-fledged XYZ case.

Fair enough, especially looking at only the title of the post, but I included plenty of context including a link to the project.