I'm working on a data processing project where users need to specify a bunch of pre- and post-processing stages so data can be sent to a Machine Learning model. We're wondering what the best way to represent this pipeline is.
This pipeline declaration ("Runefile") is then used to generate Rust code which is compiled to WebAssembly to be executed elsewhere (a "Rune").
The main concepts are:
- Image - specifies the host functions available to the Rune
- Processing Block - a Rust crate which contains an object which transforms some input into some output (e.g. Tensor<f32> -> String)
- Model - a TensorFlow model
- Output - something which consumes data
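
To make the concepts concrete, here is a rough sketch of how the stages could be modelled as Rust types. Everything in it (`ElementType`, `Stage`, `first_type_mismatch`) is a hypothetical name made up for illustration and not Rune's actual API. It also includes a capability stage, which isn't in the list above but shows up in the Runefile as `CAPABILITY`.

```rust
/// Element types that appear in the examples (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementType {
    I8,
    I16,
    F32,
    Utf8,
}

/// One stage of a pipeline (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Stage {
    /// A source of data, e.g. `SOUND`.
    Capability { kind: String, output: ElementType },
    /// A Rust crate which transforms some input into some output.
    ProcBlock { path: String, input: ElementType, output: ElementType },
    /// A TensorFlow model loaded from a file.
    Model { file: String, input: ElementType, output: ElementType },
    /// Something which consumes data, e.g. `serial`.
    Output { kind: String },
}

impl Stage {
    /// The element type this stage expects, if it constrains its input.
    pub fn input_type(&self) -> Option<ElementType> {
        match self {
            Stage::ProcBlock { input, .. } | Stage::Model { input, .. } => Some(*input),
            Stage::Capability { .. } | Stage::Output { .. } => None,
        }
    }

    /// The element type this stage produces, if it produces anything.
    pub fn output_type(&self) -> Option<ElementType> {
        match self {
            Stage::Capability { output, .. }
            | Stage::ProcBlock { output, .. }
            | Stage::Model { output, .. } => Some(*output),
            Stage::Output { .. } => None,
        }
    }
}

/// Returns the index of the first stage whose output type does not match
/// the input type of the stage that follows it, or `None` if all match.
pub fn first_type_mismatch(stages: &[Stage]) -> Option<usize> {
    stages.windows(2).position(|pair| {
        match (pair[0].output_type(), pair[1].input_type()) {
            (Some(produced), Some(expected)) => produced != expected,
            // Sources have no input and outputs accept anything.
            _ => false,
        }
    })
}
```

Whatever syntax we end up with, it has to carry enough information to build something like this, so a check such as `first_type_mismatch` can reject a pipeline before any code is generated.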
At the moment we use a domain-specific language called a "Runefile".

    FROM runicos/base

    CAPABILITY<I16> audio SOUND --hz 16000 --sample-duration-ms 1000

    PROC_BLOCK<I16,I8> fft hotg-ai/rune#proc_blocks/fft

    MODEL<I8,I8> model ./model_6dim.tflite

    PROC_BLOCK<I8, UTF8> label hotg-ai/rune#proc_blocks/ohv_label --labels=silence,unknown,up,down,left,right

    OUT serial

    RUN audio fft model label serial

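
For a sense of what maintaining a custom syntax involves, here is a minimal sketch of splitting one Runefile line into a keyword, optional type parameters, and arguments. It is not our real parser. `Instruction` and `parse_line` are hypothetical names, and it assumes every instruction fits on one line with whitespace-separated arguments.

```rust
/// One parsed Runefile line (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Instruction {
    /// The leading keyword, e.g. `PROC_BLOCK`.
    pub keyword: String,
    /// The names between `<` and `>`, if any, e.g. `["I16", "I8"]`.
    pub type_params: Vec<String>,
    /// Everything after the keyword, split on whitespace.
    pub args: Vec<String>,
}

/// Parses a single line. Returns `None` for a blank line or an unclosed `<`.
pub fn parse_line(line: &str) -> Option<Instruction> {
    let line = line.trim();
    if line.is_empty() {
        return None;
    }

    // The keyword runs up to the first '<' or whitespace character.
    let head_end = line
        .find(|c: char| c == '<' || c.is_whitespace())
        .unwrap_or(line.len());
    let keyword = line[..head_end].to_string();
    let mut rest = &line[head_end..];

    // Optional type parameters directly after the keyword.
    let mut type_params = Vec::new();
    if let Some(after_open) = rest.strip_prefix('<') {
        let close = after_open.find('>')?;
        type_params = after_open[..close]
            .split(',')
            .map(|name| name.trim().to_string())
            .collect();
        rest = &after_open[close + 1..];
    }

    let args = rest.split_whitespace().map(str::to_string).collect();
    Some(Instruction { keyword, type_params, args })
}
```

Even this toy version has to decide how to handle whitespace inside `<I8, UTF8>` and what to do with malformed input. With YAML, an existing library would make those decisions for us.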
But we've been throwing around the idea of switching to YAML or letting the user compose a pipeline by executing some Lua script.

    image: "runicos/base"

    pipeline:
      audio:
        capability: SOUND
        outputs:
        - type: i16
          dimensions:
        args:
          hz: 16000
      fft:
        proc-block: "hotg-ai/rune#proc_blocks/fft"
        inputs:
        - audio
        outputs:
        - type: i8
          dimensions:
      model:
        model: "./model.tflite"
        inputs:
        - fft
        outputs:
        - type: i8
          dimensions:
      label:
        proc-block: "hotg-ai/rune#proc_blocks/ohv_label"
        inputs:
        - model
        outputs:
        - type: utf8
        args:
          labels: ["silence", "unknown", "up", "down", "left", "right"]
      output:
        out: SERIAL
        inputs:
        - label

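
One thing the YAML version changes is that each stage names its `inputs`, so the order spelled out by `RUN` in the Runefile could be derived rather than written by hand. A sketch of that derivation using Kahn's topological sort is below. `execution_order` is a hypothetical helper, and the map of stage name to inputs is assumed to have already been read out of the YAML by some parser.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Derives an execution order from each stage's declared inputs.
/// Returns `None` if a stage names an unknown input or there is a cycle.
/// (Hypothetical helper, for illustration only.)
pub fn execution_order(inputs: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Number of inputs each stage is still waiting on.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    for (stage, deps) in inputs {
        if deps.iter().any(|dep| !inputs.contains_key(dep)) {
            return None; // refers to a stage that was never declared
        }
        pending.insert(*stage, deps.len());
    }

    // Stages with no inputs (e.g. capabilities) can run straight away.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(stage, _)| *stage)
        .collect();

    let mut order = Vec::new();
    while let Some(stage) = ready.pop_front() {
        order.push(stage.to_string());
        // Release every stage that was waiting on the one just scheduled.
        for (other, deps) in inputs {
            let uses = deps.iter().filter(|dep| **dep == stage).count();
            if uses > 0 {
                let count = pending.get_mut(other)?;
                *count -= uses;
                if *count == 0 {
                    ready.push_back(*other);
                }
            }
        }
    }

    // Any stage left unscheduled is part of a cycle.
    if order.len() == inputs.len() {
        Some(order)
    } else {
        None
    }
}
```

Fed the stage names and inputs from the YAML above, this should yield `audio`, `fft`, `model`, `label`, `output`, which lines up with the `RUN` line in the Runefile.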
If you've ever needed to provide users with some sort of programmability, how have you chosen what syntax to use?
Some pros of having a custom DSL are that it can be a lot more concise than YAML (12 lines vs 40) and that it lets you encode things which aren't normally representable in YAML without some creative coding (e.g. interpolated expressions or conditionals). On the other hand, it introduces a learning curve and suffers from the not-invented-here problem.
If you are interested, the corresponding issue contains some other concepts we were throwing around: