What's an idiomatic way to check a file’s format upon startup?

I want to load my project's configuration from a file, with the option to use either JSON or YAML. I would like feedback and suggestions on getting the extension of the path and using match to determine the file format and initialize the program accordingly.

A Startup::from_file method works like this, loading the contents of a JSON file into a struct:

impl Startup {
    pub fn from_file(path: impl AsRef<Path>) -> Result<Self, StartupError> {
        Self::from_file_setup(serde_json::from_str::<StartupInputs>(
            &std::fs::read_to_string(path.as_ref())?
        )?)
    }
}

You can replace serde_json with serde_yaml, but I can't see a smooth way to get the extension, check that it's lowercase (if that's important?), and then match on it.
The code below is what I have so far, which compiles:

impl Startup {
    pub fn from_file(path: impl AsRef<Path>) -> Result<Self, StartupError> {
        match path
            .as_ref()
            .extension()
            .ok_or(StartupError::ExtensionNotReadable)?
            .to_str()
        {
            Some("yaml") => Self::from_file_model(serde_yaml::from_str::<StartupInputs>(
                &std::fs::read_to_string(path.as_ref())?,
            )?),
            Some("json") => Self::from_file_model(serde_json::from_str::<StartupInputs>(
                &std::fs::read_to_string(path.as_ref())?,
            )?),
            Some(_) => Err(StartupError::ExtensionInvalid),
            None => Err(StartupError::ExtensionNotReadable),
        }
    }
}

Edit:
If the path is “something.json” and I try to parse a json file through serde_yaml::read_to_string that doesn’t compile and vice versa. Assuming sometimes I’m going to get yaml files and sometimes json files, I want either to work upon startup.


Edit 2:
Thanks to everyone who responded. This is what I settled on:

impl Startup {
    fn from_json(file: &str) -> Result<Self, StartupError> {
        match serde_json::from_str::<StartupInputs>(file) {
            Err(source) => Err(StartupError::InvalidJson { source }),
            Ok(model) => Self::from_model(model),
        }
    }

    fn from_yaml(file: &str) -> Result<Self, StartupError> {
        match serde_yaml::from_str::<StartupInputs>(file) {
            Err(source) => Err(StartupError::InvalidYaml { source }),
            Ok(model) => Self::from_model(model),
        }
    }

    pub fn from_file(path: impl AsRef<Path>) -> Result<Self, StartupError> {
        let path = path.as_ref();
        let file: String = std::fs::read_to_string(path)?;
        match path.extension().and_then(|s| s.to_str()) {
            Some("json") => Self::from_json(&file),
            _ => Self::from_yaml(&file),
        }
    }
}

Not an immediate answer to your problem, but YAML is a superset of JSON, so you can just always parse the config as YAML and it will work identically.

Is serde_yaml compatible with YAML 1.2 or 1.1?

Checking the docs, it says 1.2.

This was my immediate thought as well. Parse as YAML and there will never be any ambiguity.

> YAML is a superset of JSON

Today I learned...

Almost, but not quite. See "JSON is not a YAML subset".

Actually, it is a superset of JSON. According to YAML 1.2 specification:

A version 1.2 YAML processor must accept documents with an explicit “%YAML 1.2” directive, as well as documents lacking a “YAML” directive. Such documents are assumed to conform to the 1.2 version specification.

Note that I'm aware that some implementations (like the one provided by Ruby) do not comply with the specification here. That said, this is not a problem when using serde_yaml.

Also, I'm not sure what "The Norway Problem" has to do with anything. JSON doesn't allow unquoted strings so this should never occur for valid JSON documents.

So what exactly is your question? Your second code snippet is more or less the way I would write it myself, with a few minor differences. I would bind let path = path.as_ref(); to avoid redundant conversions, and I would use from_reader instead of from_str with read_to_string, to avoid redundant allocations and extra indirection.
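
To make the from_reader suggestion concrete without pulling in the serde crates, here is a minimal std-only sketch of the shape I mean. The names load_with and parse are hypothetical; parse is a stand-in for a reader-based deserializer such as serde_json::from_reader or serde_yaml::from_reader. Wrapping the file in a BufReader is my own assumption, since unbuffered reads tend to be slow.

```rust
use std::fs::File;
use std::io::{self, BufReader};
use std::path::Path;

/// Opens `path` and hands a buffered reader to `parse`.
///
/// `parse` is a hypothetical stand-in for a reader-based deserializer,
/// e.g. `serde_json::from_reader` or `serde_yaml::from_reader`.
fn load_with<T, E>(
    path: impl AsRef<Path>,
    parse: impl FnOnce(BufReader<File>) -> Result<T, E>,
) -> Result<T, E>
where
    E: From<io::Error>,
{
    // Convert once; an I/O failure is turned into `E` by the `?` operator.
    let file = File::open(path.as_ref())?;

    // The parser streams from the file, so no intermediate String is built here.
    parse(BufReader::new(file))
}
```

With the real crates, the call site would presumably look something like load_with(path, |r| serde_json::from_reader::<_, StartupInputs>(r).map_err(StartupError::from)), assuming StartupError implements From for both the I/O error and the parser error.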

As well as the other notes, I'd use more variables in general, merge the handling of StartupError::ExtensionNotReadable, and return or ? out of the error cases in the match so that the parse result can be the value of the match expression. I'd also move the common Self construction to the end, so something like:

impl Startup {
    pub fn from_file(path: impl AsRef<Path>) -> Result<Self, StartupError> {
        let path = path.as_ref();
        let extension = path
            .extension()
            .and_then(|s| s.to_str())
            .ok_or(StartupError::ExtensionNotReadable)?;

        let model = match extension {
            "yaml" => serde_yaml::from_reader::<StartupInputs>(
                std::fs::File::open(path)?,
            )?,
            "json" => serde_json::from_reader::<StartupInputs>(
                std::fs::File::open(path)?,
            )?,
            _ => return Err(StartupError::ExtensionInvalid),
        };

        Self::from_file_model(model)
    }
}
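
On the lowercase question from the original post: matching on "yaml" and "json" directly is case-sensitive, so a file named config.JSON would be rejected. Below is a minimal std-only sketch of one way to normalize the extension first. ConfigFormat, ExtensionError, and detect_format are hypothetical names of mine (the error variants loosely mirror StartupError::ExtensionNotReadable and StartupError::ExtensionInvalid), and accepting "yml" as well is an assumption, not something the original code does.

```rust
use std::path::Path;

/// Hypothetical enum naming the two supported config formats.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConfigFormat {
    Json,
    Yaml,
}

/// Hypothetical error type, loosely mirroring the extension-related
/// variants of `StartupError` in the thread.
#[derive(Debug, PartialEq, Eq)]
enum ExtensionError {
    NotReadable,
    Invalid,
}

/// Picks a format from the file extension, ignoring ASCII case.
fn detect_format(path: &Path) -> Result<ConfigFormat, ExtensionError> {
    let ext = path
        .extension()
        // `to_str` returns None when the extension is not valid UTF-8.
        .and_then(|s| s.to_str())
        .ok_or(ExtensionError::NotReadable)?
        // Normalize so "JSON", "Json" and "json" are treated alike.
        .to_ascii_lowercase();

    match ext.as_str() {
        "json" => Ok(ConfigFormat::Json),
        // Assumption: ".yml" is accepted alongside ".yaml".
        "yaml" | "yml" => Ok(ConfigFormat::Yaml),
        _ => Err(ExtensionError::Invalid),
    }
}
```

The parsing step can then match on the returned ConfigFormat instead of on raw strings, which keeps the extension rules in one place.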