Should I use a custom DSL or just stick with YAML?

Michael-F-Bryan · May 2, 2021, 4:17pm

I'm working on a data processing project where users need to specify a series of pre- and post-processing stages so data can be sent to a machine learning model, and we're wondering what the best way to represent this pipeline is.

This pipeline declaration (the "Runefile") is used to generate Rust code, which is compiled to WebAssembly (a "Rune") and executed elsewhere.

The main concepts are:

Image - specifies the host functions available to the Rune
Processing Block - a Rust crate containing an object that transforms some input P into some output Q (e.g. Tensor<f32> -> String)
Model - a TensorFlow model
Output - something which consumes data
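To make the "Processing Block" concept concrete, here is a minimal sketch of an object that transforms an input P into an output Q. The `ProcBlock` trait and `OhvLabel` type are hypothetical (the real hotg-ai/rune interface isn't shown here), and the assumption that the `ohv_label` block returns the label at the highest-scoring index is mine.

```rust
/// Hypothetical proc-block interface: transforms an `Input` into `Self::Output`.
/// Takes `&mut self` so stateful blocks (e.g. an FFT with buffers) also fit.
pub trait ProcBlock<Input> {
    type Output;
    fn transform(&mut self, input: Input) -> Self::Output;
}

/// Hypothetical stand-in for `ohv_label`: maps an I8[6] score vector to a
/// UTF8 label, assuming the label is chosen by the highest score.
pub struct OhvLabel {
    labels: Vec<String>,
}

impl OhvLabel {
    pub fn new(labels: &[&str]) -> Self {
        OhvLabel {
            labels: labels.iter().map(|s| s.to_string()).collect(),
        }
    }
}

impl ProcBlock<Vec<i8>> for OhvLabel {
    type Output = Option<String>;

    fn transform(&mut self, input: Vec<i8>) -> Option<String> {
        if input.is_empty() {
            return None;
        }
        // Index of the largest score; the first one wins on ties.
        let mut best = 0;
        for (i, &score) in input.iter().enumerate() {
            if score > input[best] {
                best = i;
            }
        }
        // None if the model produced more scores than there are labels.
        self.labels.get(best).cloned()
    }
}

fn main() {
    let mut label = OhvLabel::new(&["silence", "unknown", "up", "down", "left", "right"]);
    // Scores as a model with an I8[6] output might produce them.
    let scores: Vec<i8> = vec![-5, 3, 100, 7, -128, 0];
    println!("{:?}", label.transform(scores)); // prints Some("up")
}
```

Returning `Option` rather than a default label keeps the "no valid answer" case explicit for whatever Output consumes it.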

At the moment the Runefile is written in a custom domain-specific language:

FROM runicos/base

CAPABILITY<I16[16000]> audio SOUND --hz 16000 --sample-duration-ms 1000
PROC_BLOCK<I16[16000],I8[1960]> fft hotg-ai/rune#proc_blocks/fft

MODEL<I8[1960],I8[6]> model ./model_6dim.tflite

PROC_BLOCK<I8[6], UTF8> label hotg-ai/rune#proc_blocks/ohv_label --labels=silence,unknown,up,down,left,right

OUT serial

RUN audio fft model label serial
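One practical cost of a custom DSL is that you write and maintain its parser. A minimal, std-only sketch of parsing the type annotations in the Runefile above (`I16[16000]`, `<I8[6], UTF8>`) might look like this. The function names are hypothetical, and `F32` is an assumed element type that does not appear in the example. The signature is located by its angle brackets rather than by splitting on whitespace, because `<I8[6], UTF8>` contains a space.

```rust
/// Element types seen in the example Runefile (F32 is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum ElementType {
    I8,
    I16,
    F32,
    Utf8,
}

/// A type annotation such as `I16[16000]`; `dims` is empty for `UTF8`.
#[derive(Debug, Clone, PartialEq)]
pub struct TensorType {
    pub element: ElementType,
    pub dims: Vec<usize>,
}

/// Parses a single type such as `I16[16000]` or `UTF8`.
pub fn parse_type(s: &str) -> Result<TensorType, String> {
    let s = s.trim();
    let (name, dims) = match s.find('[') {
        Some(open) => {
            let body = s[open + 1..]
                .strip_suffix(']')
                .ok_or_else(|| format!("missing closing ']' in {s:?}"))?;
            // Comma-separated dimensions, e.g. `[16000]` or `[1, 1960]`.
            let dims = body
                .split(',')
                .map(|d| d.trim().parse::<usize>().map_err(|e| format!("bad dimension {d:?}: {e}")))
                .collect::<Result<Vec<_>, _>>()?;
            (&s[..open], dims)
        }
        None => (s, Vec::new()),
    };
    let element = match name {
        "I8" => ElementType::I8,
        "I16" => ElementType::I16,
        "F32" => ElementType::F32,
        "UTF8" => ElementType::Utf8,
        other => return Err(format!("unknown element type {other:?}")),
    };
    Ok(TensorType { element, dims })
}

/// Parses `<I16[16000]>` or `<I8[6], UTF8>`, splitting only on commas
/// that are outside square brackets.
pub fn parse_signature(sig: &str) -> Result<Vec<TensorType>, String> {
    let inner = sig
        .trim()
        .strip_prefix('<')
        .and_then(|r| r.strip_suffix('>'))
        .ok_or_else(|| format!("expected <...>, got {sig:?}"))?;
    let mut types = Vec::new();
    let mut depth = 0i32;
    let mut start = 0;
    for (i, c) in inner.char_indices() {
        match c {
            '[' => depth += 1,
            ']' => depth -= 1,
            ',' if depth == 0 => {
                types.push(parse_type(&inner[start..i])?);
                start = i + 1;
            }
            _ => {}
        }
    }
    types.push(parse_type(&inner[start..])?);
    Ok(types)
}

/// Splits a line into its keyword, optional `<...>` signature and remaining words.
pub fn split_line(line: &str) -> (&str, Option<&str>, Vec<&str>) {
    let line = line.trim();
    let keyword_end = line
        .find(|c: char| c == '<' || c.is_whitespace())
        .unwrap_or(line.len());
    let (keyword, rest) = line.split_at(keyword_end);
    if rest.starts_with('<') {
        if let Some(close) = rest.find('>') {
            return (keyword, Some(&rest[..=close]), rest[close + 1..].split_whitespace().collect());
        }
    }
    (keyword, None, rest.split_whitespace().collect())
}

fn main() {
    let runefile = "PROC_BLOCK<I16[16000],I8[1960]> fft hotg-ai/rune#proc_blocks/fft\n\
                    MODEL<I8[1960],I8[6]> model ./model_6dim.tflite\n\
                    PROC_BLOCK<I8[6], UTF8> label hotg-ai/rune#proc_blocks/ohv_label";
    for line in runefile.lines() {
        let (keyword, signature, rest) = split_line(line);
        let types = signature.map(parse_signature).transpose();
        println!("{keyword} {rest:?} {types:?}");
    }
}
```

With YAML, this layer comes for free from an existing parser, although the `type`/`dimensions` fields would still need validating.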

But we've been throwing around the idea of switching to YAML, or letting users compose a pipeline by executing a Lua script.

YAML Version

image: "runicos/base"

pipeline:
  audio:
    capability: SOUND
    outputs:
    - type: i16
      dimensions: [16000]
    args:
      hz: 16000

  fft:
    proc-block: "hotg-ai/rune#proc_blocks/fft"
    inputs:
    - audio
    outputs:
    - type: i8
      dimensions: [1960]

  model:
    model: "./model.tflite"
    inputs:
    - fft
    outputs:
    - type: i8
      dimensions: [6]

  label:
    proc-block: "hotg-ai/rune#proc_blocks/ohv_label"
    inputs:
    - model
    outputs:
    - type: utf8
    args:
      labels: ["silence", "unknown", "up", "down", "left", "right"]

  output:
    out: SERIAL
    inputs:
    - label
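One structural difference between the two formats is that the Runefile states the execution order explicitly in its `RUN` line, while the YAML version gives each stage an `inputs` list, so the tool has to derive the order itself (which also leaves room for stages with several inputs). A minimal std-only sketch using Kahn's topological sort, assuming the YAML has already been deserialized into a map from stage name to its `inputs` (a missing `inputs` key, as on `audio`, treated as empty). `execution_order` is a hypothetical helper:

```rust
use std::collections::BTreeMap;

/// Derives an execution order from each stage's `inputs` (Kahn's algorithm).
/// Errors on a reference to an unknown stage or on a cycle.
pub fn execution_order(stages: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, String> {
    // How many inputs each stage is still waiting on.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: which stages consume each stage's output.
    let mut consumers: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&name, inputs) in stages {
        pending.insert(name, inputs.len());
        for &input in inputs {
            if !stages.contains_key(input) {
                return Err(format!("stage {name:?} reads from unknown stage {input:?}"));
            }
            consumers.entry(input).or_default().push(name);
        }
    }
    // Stages with no inputs (e.g. capabilities) can run first.
    let mut ready: Vec<&str> = pending
        .iter()
        .filter(|entry| *entry.1 == 0)
        .map(|entry| *entry.0)
        .collect();
    let mut order = Vec::new();
    while let Some(stage) = ready.pop() {
        order.push(stage.to_string());
        for &consumer in consumers.get(stage).map(|v| v.as_slice()).unwrap_or(&[]) {
            let n = pending.get_mut(consumer).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push(consumer);
            }
        }
    }
    // Any stage never reached is part of (or downstream of) a cycle.
    if order.len() == stages.len() {
        Ok(order)
    } else {
        Err("pipeline contains a cycle".to_string())
    }
}

fn main() {
    // Each stage's `inputs`, mirroring the YAML example above.
    let mut stages: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    stages.insert("audio", vec![]);
    stages.insert("fft", vec!["audio"]);
    stages.insert("model", vec!["fft"]);
    stages.insert("label", vec!["model"]);
    stages.insert("output", vec!["label"]);
    println!("{:?}", execution_order(&stages)); // Ok(["audio", "fft", "model", "label", "output"])
}
```

The trade-off is that typos in stage names and accidental cycles become errors the tool must report, rather than mistakes a linear `RUN` line cannot express.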

If you've ever needed to provide users with some sort of programmability, how have you chosen what syntax to use?

Some pros of a custom DSL are that it can be a lot more concise than YAML (12 lines vs 40) and it lets you encode things which aren't normally representable in YAML without some creative coding (e.g. interpolated expressions or conditionals). On the other hand, it introduces a learning curve and suffers from the not-invented-here problem.

If you are interested, the corresponding issue contains some other concepts we were throwing around:

https://github.com/hotg-ai/rune/issues/135

system · July 31, 2021, 4:18pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
