Should I use a custom DSL or just stick with YAML?

I'm working on a data processing project where we need to specify a series of pre- and post-processing stages so that data can be fed to a machine learning model, and we're wondering what the best way to represent this pipeline is.

This pipeline declaration (a "Runefile") is used to generate Rust code, which is then compiled to WebAssembly (a "Rune") and executed elsewhere.

The main concepts are:

  • Image - specifies the host functions available to the Rune
  • Processing Block - a Rust crate containing an object that transforms some input P into some output Q (e.g. Tensor<f32> -> String)
  • Model - a TensorFlow model
  • Output - something which consumes data
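
To make the "Processing Block" idea concrete, here is a minimal sketch of what such a transform could look like. The `Transform` trait and the `Label` block are hypothetical names invented for illustration, not the project's actual API, and `Vec<f32>` stands in for `Tensor<f32>`.

```rust
/// Hypothetical trait for a processing block: turns an input `P` into some
/// output. The name and signature are assumptions made for illustration.
trait Transform<P> {
    type Output;

    fn transform(&mut self, input: P) -> Self::Output;
}

/// Illustrative block that maps a vector of scores to the label with the
/// highest score. `Vec<f32>` is a stand-in for `Tensor<f32>`.
struct Label {
    labels: Vec<String>,
}

impl Transform<Vec<f32>> for Label {
    type Output = String;

    fn transform(&mut self, input: Vec<f32>) -> String {
        // Nothing to label if there are no scores.
        if input.is_empty() {
            return String::new();
        }

        // Find the index of the largest score (the first one wins on ties).
        let mut best = 0;
        for (i, score) in input.iter().enumerate() {
            if *score > input[best] {
                best = i;
            }
        }

        // Fall back to an empty string if there is no label for that index.
        self.labels.get(best).cloned().unwrap_or_default()
    }
}
```

With labels `silence, unknown, up`, passing the scores `[0.1, 0.7, 0.2]` would select `unknown`.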

At the moment the Runefile is written in a custom domain-specific language:

FROM runicos/base

CAPABILITY<I16[16000]> audio SOUND --hz 16000 --sample-duration-ms 1000
PROC_BLOCK<I16[16000],I8[1960]> fft hotg-ai/rune#proc_blocks/fft

MODEL<I8[1960],I8[6]> model ./model_6dim.tflite

PROC_BLOCK<I8[6], UTF8> label hotg-ai/rune#proc_blocks/ohv_label --labels=silence,unknown,up,down,left,right

OUT serial

RUN audio fft model label serial

But we've been throwing around the idea of switching to YAML or letting the user compose a pipeline by executing some Lua script.

YAML Version
image: "runicos/base"

pipeline:
  audio:
    capability: SOUND
    outputs:
    - type: i16
      dimensions: [16000]
    args:
      hz: 16000

  fft:
    proc-block: "hotg-ai/rune#proc_blocks/fft"
    inputs:
    - audio
    outputs:
    - type: i8
      dimensions: [1960]

  model:
    model: "./model_6dim.tflite"
    inputs:
    - fft
    outputs:
    - type: i8
      dimensions: [6]

  label:
    proc-block: "hotg-ai/rune#proc_blocks/ohv_label"
    inputs:
    - model
    outputs:
    - type: utf8
    args:
      labels: ["silence", "unknown", "up", "down", "left", "right"]

  output:
    out: SERIAL
    inputs:
    - label
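
Whichever syntax is used, both would presumably be lowered to the same in-memory representation before code generation. The sketch below assumes a hypothetical `Stage` type (not taken from the real code base) and shows one check the YAML form would need, because stages refer to their inputs by name: every input must name a declared stage.

```rust
/// Hypothetical stage kinds, mirroring the concepts in this post.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Capability(String),
    ProcBlock(String),
    Model(String),
    Out(String),
}

/// Hypothetical in-memory form of one pipeline stage.
#[derive(Debug, Clone, PartialEq)]
struct Stage {
    name: String,
    kind: Kind,
    inputs: Vec<String>,
}

/// Small constructor to keep the examples short.
fn stage(name: &str, kind: Kind, inputs: &[&str]) -> Stage {
    Stage {
        name: name.to_string(),
        kind,
        inputs: inputs.iter().map(|s| s.to_string()).collect(),
    }
}

/// Returns every (stage, input) pair where the input does not name a
/// declared stage, e.g. because of a typo.
fn dangling_inputs(stages: &[Stage]) -> Vec<(String, String)> {
    let mut missing = Vec::new();
    for stage in stages {
        for input in &stage.inputs {
            if !stages.iter().any(|s| &s.name == input) {
                missing.push((stage.name.clone(), input.clone()));
            }
        }
    }
    missing
}
```

For example, a `model` stage whose input is misspelled as `ftt` would be reported as the pair `("model", "ftt")`, while the pipeline shown above would produce an empty list.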

If you've ever needed to provide users with some sort of programmability, how have you chosen what syntax to use?

A custom DSL can be a lot more concise than YAML (12 lines vs 40 for the example above), and it lets you encode things that aren't normally representable in YAML without some creative coding (e.g. interpolated expressions or conditionals). On the other hand, it introduces a learning curve and suffers from the not-invented-here problem.
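
Part of the cost of a custom DSL is that every piece of syntax needs hand-written parsing, whereas YAML comes with ready-made parsers. As a rough illustration, here is a minimal sketch of parsing just the type annotations from the Runefile (e.g. `I16[16000]` or `UTF8`). The function name `parse_type` and the exact grammar it accepts are assumptions, not the project's real parser.

```rust
/// Parses a type annotation such as `I16[16000]` or `UTF8` into an element
/// type name and a list of dimensions. Returns `None` on malformed input.
/// The accepted grammar is an assumption made for this sketch.
fn parse_type(text: &str) -> Option<(String, Vec<usize>)> {
    let text = text.trim();

    let (name, dims) = match text.find('[') {
        Some(open) => {
            // Everything between '[' and the trailing ']' is a
            // comma-separated list of dimensions.
            let inner = text[open + 1..].strip_suffix(']')?;
            let mut dims = Vec::new();
            for part in inner.split(',') {
                dims.push(part.trim().parse::<usize>().ok()?);
            }
            (&text[..open], dims)
        }
        // No brackets: a type without dimensions, such as `UTF8`.
        None => (text, Vec::new()),
    };

    // The element type must be a non-empty alphanumeric identifier.
    let valid = !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric());
    if valid {
        Some((name.to_string(), dims))
    } else {
        None
    }
}
```

Even this small piece has to decide how to treat whitespace, empty brackets, and trailing characters, which is the kind of detail an off-the-shelf format handles for you.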

If you are interested, the corresponding issue contains some other concepts we were throwing around: