Configuration file vs library API

Abstract

I am creating a Rust application with a large number of input parameters.
In the Rust world, should I prefer a configuration file (with a well-chosen format) or a library API (easy with the Rust ecosystem)?
Can you provide examples of well-established Rust applications that use one approach or the other?

Intro

I am having a very hard time choosing between a configuration file and a library API.
I tried many search engine queries but didn’t find a post/question focused on my problem.
Here is a full overview of my journey with this problem.

Example

I am working on an application which needs a definition of materials (chemical elements), geometry (surfaces/volumes) and general parameters (number of threads, solver selection, ...). In short, it is an application which needs a large number of input parameters, not all of which are defined at this stage of development, so the input must be easy to extend.

Passing all these arguments on the command line is not practical given the number of parameters needed.

In fact, I initially did some tests with clap, but as the number of input parameters and their possible combinations grew, my command line became a full dissertation ...

Two solutions (that I have found so far) are available:

  • Configuration file
  • Library API

Configuration file

What I define as a configuration file is a simple file (human readable and human writable) containing all the parameters needed by my application. The format can be one of all commonly used formats:

  • JSON
  • TOML
  • YAML
  • XML
  • « key = value » format
  • ASCII (with custom syntax and custom parser)

Library API

My application can also provide a full API with all the structures/methods/functions the user needs to initialize the application parameters and launch the calculation/simulation/computational step.

Problem

At first sight, a configuration file seems to be the simplest and most convenient solution, but my research on these formats led to the following point:

As soon as my parameters/problems do not fit the selected format (key-value does not allow a hierarchical structure, JSON does not allow non-string map keys, ...), defining all the parameters becomes a real challenge.

That is where the idea of defining a custom (ASCII) format comes in. It requires a full definition of the grammar/syntax and a parser (warning: maintainability). Adding parameters and new features can lead to a significant amount of changes (syntax + parser).

Introducing a new grammar and syntax led me to use a programming language directly: the Rust language, via a library API. This solution requires creating a library alongside my binary (relatively easy with cargo) and also documenting all the structures/enums/functions associated with this API. The main difficulty with this API is to make public only the minimal set of methods needed; anything I expose puts me at risk of a breaking change with each update.

Journey

This is what I have tried so far and the difficulties I faced:

Simple key value format

I started this journey with a simple key value configuration file:

// snip
name = "example"
threads = 4
// snip

Pros:

  • really simple to parse
  • extensibility

Cons:

  • does not support hierarchical structures (the reason I moved on to the next step)
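
For reference, a format like this can be parsed with the standard library alone. The sketch below is only an illustration under stated assumptions: the helper name `parse_key_values`, the rule that lines starting with `//` are skipped, and the stripping of surrounding quotes are all choices made for this example, not part of the application described here.

```rust
use std::collections::HashMap;

/// Parses `key = value` lines into a map (hypothetical helper).
/// Blank lines and lines starting with `//` are skipped.
/// On a malformed line, the error is the 1-based line number.
fn parse_key_values(input: &str) -> Result<HashMap<String, String>, usize> {
    let mut map = HashMap::new();
    for (index, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with("//") {
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        let (key, value) = line.split_once('=').ok_or(index + 1)?;
        // Surrounding double quotes are stripped for convenience.
        map.insert(
            key.trim().to_string(),
            value.trim().trim_matches('"').to_string(),
        );
    }
    Ok(map)
}

fn main() {
    let input = "// snip\nname = \"example\"\nthreads = 4\n";
    let config = parse_key_values(input).expect("valid input");
    // Every value is a string; typing happens on the Rust side.
    let threads: usize = config["threads"].parse().expect("threads must be an integer");
    println!("name = {}, threads = {}", config["name"], threads);
}
```

The flat `HashMap<String, String>` also makes the limitation visible: there is no natural place to put nested data.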

JSON TOML YAML XML ...

I tried some common formats for my parameters. They all support hierarchy and are human readable/writable (XML hit the limit for me on the human-writable aspect).
They are widely supported, and using them was pretty easy with all the libraries available in Rust.
At the beginning, I used them by writing my input file by hand. However, I later realized I needed some recursion/references for some parameters. At that point, I decided to build my structures in Rust code and then serialize/deserialize them with serde.

At first this solution was fine, but I hit the limits of these formats:

  • JSON: a non-string key for maps is not valid JSON
  • XML: readability
  • TOML: hierarchy with tables introduced many sections, and readability diminishes as the number of parameters grows
  • YAML: indentation, and the syntax for non-string map keys

The final limit for all these formats was the fact that I actually needed some advanced syntax not available in any of them.

« I start from scratch » syndrome

I always go through this phase in my development process. The actual thought is: « I didn’t find what I needed, let’s try it by myself ». I started to write my custom ASCII format and the parser for the syntax I had just introduced. At first I was proud of my work, but when I tried it, my format felt ... incoherent and error prone.

When the input is malformed, controlling the flow of the parser and providing well-formatted, informative error messages is a tedious task. Eventually I felt overwhelmed by this solution (and I spent much more time on it than I would like to admit).

Library API

Eventually, I turned my internal code structures/enums/functions into a library (easy with cargo).

The main idea is for the user to download my library, add a dependency and create their own main function.

Download my library into "lib"

$ git clone ...

Create a binary application

$ cargo new app
$ cd app

Append a dependency on lib

# Cargo.toml

[dependencies]
lib = { path = "../lib" }

Create the main script

// main.rs

use lib::*;

fn main() {
   let config = lib::Config::new(...);
   let solver = lib::Solver::Variant;
   lib::run(solver, config)
}

All of these steps are easy in Rust and much harder in other languages (C, C++). Adding a dependency, compiling and linking can be a hard task in C/C++.

This approach comes with the following problems:

  • My internal structures were private, and making them public exposes me to breaking changes in my application
  • Full documentation is needed
  • The user must know the Rust ecosystem
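
One common way to reduce the first problem is to keep fields private, expose a builder, and mark enums that may grow as `#[non_exhaustive]`. The sketch below is a minimal illustration, not the actual API of the library discussed here: `Config`, `ConfigBuilder`, the `Solver` variants and the default values are all hypothetical.

```rust
/// Solver selection. `#[non_exhaustive]` lets the library add variants later
/// without breaking downstream `match` expressions (they need a `_` arm).
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Solver {
    Direct,
    Iterative,
}

/// Fields stay private, so the internal layout can change freely.
#[derive(Debug)]
pub struct Config {
    threads: usize,
    solver: Solver,
}

impl Config {
    /// Starts from defaults; a new parameter only needs a new default here.
    pub fn builder() -> ConfigBuilder {
        ConfigBuilder { threads: 1, solver: Solver::Direct }
    }
    pub fn threads(&self) -> usize {
        self.threads
    }
    pub fn solver(&self) -> Solver {
        self.solver
    }
}

pub struct ConfigBuilder {
    threads: usize,
    solver: Solver,
}

impl ConfigBuilder {
    pub fn threads(mut self, threads: usize) -> Self {
        self.threads = threads;
        self
    }
    pub fn solver(mut self, solver: Solver) -> Self {
        self.solver = solver;
        self
    }
    /// Validation lives in one place.
    pub fn build(self) -> Result<Config, String> {
        if self.threads == 0 {
            return Err("threads must be at least 1".to_string());
        }
        Ok(Config { threads: self.threads, solver: self.solver })
    }
}

fn main() {
    let config = Config::builder()
        .threads(4)
        .solver(Solver::Iterative)
        .build()
        .expect("valid configuration");
    println!("{:?}", config);
}
```

With this shape, adding a parameter means adding a builder method and a default, which existing user code does not notice. The documentation burden remains.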

Question

In the Rust world, should I prefer a configuration file (with a well-chosen format) or a library API (easy with the Rust ecosystem)?

It would be really nice if you could provide some examples of well-established Rust applications.

Notes

  • I may have missed another solution; do not hesitate to point me to approaches I didn't explore (configuration file and library API may be only a limited subset of the available solutions).

Thanks

Using serde should work just fine. A JSON file doesn’t need to know that a key is an integer and not a string. The deserializer will know that by looking at how you’ve written the configuration API in Rust. In YAML everything is a string but it works because the programs know the type, even if the configuration format can only represent strings, pairs of strings and grouped pairs of strings. I would choose one to target and it will probably work with all of the languages you listed.


Though I haven’t used it, this feels like the sort of problem serde is designed to solve. If you define a serializer / deserializer for your API objects, you get both a plain config file format for most users and a full extension ability for the power users.


I agree with you that serde was a good way to go ... initially.

I had to define the following struct with a map:

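// rust code
use std::collections::HashMap;

use serde::{Deserialize, Serialize};

// A HashMap key must also implement Eq and Hash.
#[derive(Serialize, Deserialize, PartialEq, Eq, Hash)]
struct Identifier {
    major: u32,
    minor: u32,
}

#[derive(Serialize, Deserialize)]
struct Struct {
    field1: i32,
    field2: String,
    map: HashMap<Identifier, f64>,
}

fn main() {
    let variable = Struct { ... };
    // to_string returns a Result; with a struct as map key it is an Err at run time.
    let serial = serde_json::to_string(&variable);
    println!("{:?}", serial);
}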

This just fails because a non-string key is not valid JSON:

{
  "field1": 123,
  "field2": "string",
  <?????>: { ...
  }
}

YAML introduces a complex syntax (the ? indicator) specific to non-string keys (example from its specification):

? - Detroit Tigers
  - Chicago cubs
:
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]

TOML forces me to reorder the fields of my struct (the struct can't be public if TOML requires a specific field order after every change).

XML readability/writability starts to decrease ...

However, all these problems can be solved (serde provides serialize_with/deserialize_with), but is it worth the amount of code needed to make it work?
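
To give an idea of the code involved, one common workaround is to give the key a string form, so that `Identifier { major: 1, minor: 2 }` appears as "1.2" in the file. The sketch below shows only that conversion, using the standard library. The `major.minor` syntax and the helper names `to_string_keys` / `from_string_keys` are assumptions made for illustration, and wiring the conversion into serde (for example through serialize_with/deserialize_with) is deliberately left out.

```rust
use std::collections::HashMap;
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Identifier {
    major: u32,
    minor: u32,
}

// Identifier -> "major.minor"
impl fmt::Display for Identifier {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}", self.major, self.minor)
    }
}

// "major.minor" -> Identifier
impl FromStr for Identifier {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (major, minor) = s
            .split_once('.')
            .ok_or_else(|| format!("expected `major.minor`, got `{}`", s))?;
        Ok(Identifier {
            major: major
                .parse::<u32>()
                .map_err(|e| format!("bad major in `{}`: {}", s, e))?,
            minor: minor
                .parse::<u32>()
                .map_err(|e| format!("bad minor in `{}`: {}", s, e))?,
        })
    }
}

/// Converts the typed map into one that only uses string keys.
fn to_string_keys(map: &HashMap<Identifier, f64>) -> HashMap<String, f64> {
    map.iter().map(|(id, value)| (id.to_string(), *value)).collect()
}

/// Converts string keys back, failing on the first malformed key.
fn from_string_keys(map: &HashMap<String, f64>) -> Result<HashMap<Identifier, f64>, String> {
    let mut typed = HashMap::new();
    for (key, value) in map {
        typed.insert(key.parse::<Identifier>()?, *value);
    }
    Ok(typed)
}

fn main() {
    let mut map = HashMap::new();
    map.insert(Identifier { major: 1, minor: 2 }, 0.5);

    let as_strings = to_string_keys(&map);
    let back = from_string_keys(&as_strings).expect("keys are well formed");
    println!("{:?} -> {:?}", as_strings, back);
}
```

A map with string keys like this is accepted by JSON, TOML and YAML alike, at the price of one more small syntax to document.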

This is the most powerful aspect of this solution.
Cargo is a good example of this with Cargo.toml.

However, creating my own syntax/format with serde (I spent a good amount of time on the serde guide) is tedious and incurs a maintainability cost. Is it worth it?

Only you can answer that question, and possibly only in retrospect. It requires things only you can know:

  • Who are your users and what are their skill levels?
  • How might the configuration schema change over time?
  • How much does the difficulty of configuration actually matter to your users?
  • Which solution will enable faster development / less maintenance for you, given your skillset?

Thanks, I will take your remarks into account.

This is what scares me the most: what happens if my initial choice turns out to be the wrong one?

Can you give me your opinion on the configuration file vs library API question?

Is it more convenient for the user to write a small script (library API) or a file in a specific configuration format?

Hi,
why are you so scared about “failing”? From my experience I’d encourage you to fail early and often. Only this will give you the best chance to learn as much as possible. Trying to achieve the best solution with the very first shot and only theoretical experience is much more likely to fail in the long run.

My guiding principle in software engineering is: “Never fear to refactor!”


These are orthogonal solutions and you should absolutely do both:

  • Write a library that exposes a nice API for the core functionality
  • Write a utility that parses a configuration file and uses it to drive the library as a backend

If you write the first thing and it turns out your users are happy with that, maybe you don't even need to write the second thing. But even if you were sure you'd need both, you'd still be well advised to implement the core functionality as a library.
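
A rough sketch of that split is shown below, in a single file for brevity. Everything in it is hypothetical: the `engine` module stands in for the core library, `config_from_str` for the configuration front end, and the `key = value` format with `#` comments is just an assumption for the example.

```rust
/// Core "library": knows nothing about files or formats.
mod engine {
    #[derive(Debug, PartialEq)]
    pub struct Config {
        pub name: String,
        pub threads: usize,
    }

    /// Stand-in for the real computation.
    pub fn run(config: &Config) -> String {
        format!("running `{}` on {} thread(s)", config.name, config.threads)
    }
}

/// Front end: turns `key = value` text into an `engine::Config`.
fn config_from_str(input: &str) -> Result<engine::Config, String> {
    let mut name = None;
    let mut threads = 1; // default when the key is absent
    for (index, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", index + 1))?;
        let value = value.trim().trim_matches('"');
        match key.trim() {
            "name" => name = Some(value.to_string()),
            "threads" => {
                threads = value
                    .parse::<usize>()
                    .map_err(|e| format!("line {}: {}", index + 1, e))?
            }
            other => return Err(format!("line {}: unknown key `{}`", index + 1, other)),
        }
    }
    Ok(engine::Config {
        name: name.ok_or("missing key `name`")?,
        threads,
    })
}

fn main() {
    // API users build the struct directly...
    let from_api = engine::Config { name: "example".to_string(), threads: 4 };
    // ...while file users go through the front end; both reach the same `run`.
    let from_file = config_from_str("name = \"example\"\nthreads = 4\n").expect("valid config");
    assert_eq!(from_api, from_file);
    println!("{}", engine::run(&from_file));
}
```

Because the front end only produces an `engine::Config`, the file format can be replaced or versioned without touching the core.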


Refactoring my code does not scare me, but introducing breaking changes afterwards does.
If my users rely on one way to communicate with my code, a breaking change can simply destroy compatibility with previous computations/simulations.

After writing this post and reading your answers, I agree that these are "orthogonal solutions".

I have a core library and I will make it public (pub, documentation and examples).

For configuration files, the Rust community tends to use JSON (serde_json) and TOML (cargo). If these formats do not fit my needs, is it fine to introduce a custom format?

Personally I would try to avoid custom formats as much as possible. The Rust community does seem to lean toward TOML for configuration files and JSON for data interchange between applications. But that's by no means a hard and fast rule.

If these aren't enough then I think you should ask yourself if your configuration file is trying to do too much heavy lifting. I'm not going to lie, your mention of "advanced syntax" in configuration files scares me. But if you really do need something more powerful then I think it's much better to get something off the shelf than to roll your own.

I'm not personally a fan of YAML or XML but they are both incredibly extensible. I'd prefer something like that over having to learn yet another configuration format (again, speaking personally).

Or perhaps what you're really after is a scripting language? Again, I'd highly recommend getting something off the shelf rather than designing and maintaining your own.


This is the kind of answer I expected.
To be honest I think introducing a custom format will just add boilerplate code.

By "advanced syntax" (maybe incorrect term), I mean syntax like this:

# I define some entities with an identifier
entity {
 id = A
 ...
}

entity {
 id = B
 ...
}

# snip

# Now I define a grid (can be triangular, rectangular, hexagonal, ...)
grid = {
          A
         B A
        B B A
       A B B A
        A B A
         A A
          B
}

This kind of syntax adds a visual aspect to the problem and is well suited to user input, but none of the common formats can handle it.
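
One possible compromise, offered only as a sketch, is to keep an off-the-shelf format for everything else and store the grid as a multi-line string value, then parse just that block by hand. The helper `parse_grid` below and its rules (identifiers are whitespace-separated tokens, indentation carries no meaning for the parser) are assumptions made for illustration.

```rust
/// Parses a visual grid into rows of entity identifiers (hypothetical helper).
/// Tokens are separated by whitespace; indentation is only for the reader.
fn parse_grid(text: &str, known_ids: &[&str]) -> Result<Vec<Vec<String>>, String> {
    let mut rows = Vec::new();
    for (index, line) in text.lines().enumerate() {
        let row: Vec<String> = line.split_whitespace().map(str::to_string).collect();
        if row.is_empty() {
            continue; // skip blank lines
        }
        // Reject identifiers that no `entity` block declared.
        if let Some(unknown) = row.iter().find(|id| !known_ids.contains(&id.as_str())) {
            return Err(format!("line {}: unknown entity `{}`", index + 1, unknown));
        }
        rows.push(row);
    }
    Ok(rows)
}

fn main() {
    let text = "
          A
         B A
        B B A
       A B B A
        A B A
         A A
          B
    ";
    let rows = parse_grid(text, &["A", "B"]).expect("valid grid");
    // The number of cells per row describes the shape of the grid.
    let widths: Vec<usize> = rows.iter().map(Vec::len).collect();
    println!("row widths: {:?}", widths);
}
```

Deciding whether a given sequence of row widths forms a valid triangular, rectangular or hexagonal grid would still be up to the application.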

I thought about it too, especially Python/Lua.
The main difficulty I have with this approach is that it just introduces another layer and I fall into the same problem: should I create a Python API to launch my code directly (via some bindings), or should I use it to generate a config file and launch my application afterwards?

I think there's little to choose between configuration files and an API if your users never make mistakes. The problem is that there's no debugger for configuration files; there is for code implementing your API. Ask anyone who has tried to track down an error in a 500 line YAML file about the experience.

I believe you should only use a configuration file if your users are not programmers. If they can write programs but not in Rust, you can give them an API in their language of choice and have the program they write create the configuration file.
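
As a rough illustration of "the program they write creates the configuration file", here is a sketch in Rust; the same idea applies in any language. The `Params` type, the `render_config` function and the `entity.<id> = <value>` line format are all hypothetical.

```rust
use std::fmt::Write;

/// Parameters as the user's program sees them (hypothetical type).
struct Params {
    name: String,
    threads: usize,
    /// One entry per entity: (identifier, value).
    entities: Vec<(String, f64)>,
}

/// Renders the parameters as `key = value` text for the application to read.
fn render_config(params: &Params) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so the results are unwrapped.
    writeln!(out, "name = \"{}\"", params.name).unwrap();
    writeln!(out, "threads = {}", params.threads).unwrap();
    for (id, value) in &params.entities {
        writeln!(out, "entity.{} = {}", id, value).unwrap();
    }
    out
}

fn main() {
    // Loops and variables replace copy-paste in a hand-written file.
    let entities = (1..=3)
        .map(|i| (format!("E{}", i), 0.5 * i as f64))
        .collect();
    let params = Params { name: "example".to_string(), threads: 4, entities };
    print!("{}", render_config(&params));
}
```

Mistakes in the generating program can be tracked down with an ordinary debugger, and the generated file remains a plain record of what was actually run.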
