I'm working on a data processing project where you need to specify a bunch of pre- and post-processing stages so data can be sent to a Machine Learning model. We're wondering what the best way to represent this pipeline is.
This pipeline declaration (a "Runefile") is used to generate Rust code, which is then compiled to WebAssembly (a "Rune") and executed elsewhere.
The main concepts are:
- Image - specifies the host functions available to the Rune
- Processing Block - a Rust crate which contains an object which transforms some input P into some output Q (e.g. Tensor<f32> -> String)
- Model - a TensorFlow model
- Output - something which consumes data
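To make the "Processing Block" idea concrete, here is a minimal sketch of what such an object could look like in Rust. The `Transform` trait, the `Label` struct, and the use of `Vec<f32>` as a stand-in for `Tensor<f32>` are all assumptions made for illustration, not the project's actual API.

```rust
/// Hypothetical trait: a processing block turns an input `P` into some output.
pub trait Transform<P> {
    type Output;

    fn transform(&mut self, input: P) -> Self::Output;
}

/// Illustrative block that maps a vector of scores to the label with the
/// highest score. `Vec<f32>` stands in for a real tensor type here.
pub struct Label {
    pub labels: Vec<String>,
}

impl Transform<Vec<f32>> for Label {
    // `None` is returned when the input is empty or no label exists
    // for the winning index.
    type Output = Option<String>;

    fn transform(&mut self, input: Vec<f32>) -> Self::Output {
        input
            .iter()
            .enumerate()
            // `total_cmp` gives a total order over floats, so NaN cannot panic.
            .max_by(|a, b| a.1.total_cmp(b.1))
            .and_then(|(index, _)| self.labels.get(index).cloned())
    }
}
```

Keeping the input as a trait parameter and the output as an associated type means one block can accept several input types while each input maps to exactly one output type. That is only one possible design.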
At the moment we use a domain-specific language called a "Runefile".
FROM runicos/base
CAPABILITY<I16[16000]> audio SOUND --hz 16000 --sample-duration-ms 1000
PROC_BLOCK<I16[16000],I8[1960]> fft hotg-ai/rune#proc_blocks/fft
MODEL<I8[1960],I8[6]> model ./model_6dim.tflite
PROC_BLOCK<I8[6], UTF8> label hotg-ai/rune#proc_blocks/ohv_label --labels=silence,unknown,up,down,left,right
OUT serial
RUN audio fft model label serial
But we've been throwing around the idea of switching to YAML or letting the user compose a pipeline by executing some Lua script.
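Whatever the surface syntax ends up being, the type annotations in the Runefile above (e.g. `PROC_BLOCK<I16[16000],I8[1960]>`) allow a simple check that each stage on the `RUN` line produces what the next one expects. The sketch below shows one way that check could work. `Stage`, `runefile_stages`, and `check_chain` are hypothetical names, types are compared as plain strings, and stages without a declared type (like `OUT serial`) are assumed to accept anything.

```rust
/// Hypothetical description of one stage on the `RUN` line.
#[derive(Debug, Clone, PartialEq)]
pub struct Stage {
    pub name: &'static str,
    /// Declared input type, or `None` when nothing is declared (e.g. a source).
    pub input: Option<&'static str>,
    /// Declared output type, or `None` when nothing is declared (e.g. a sink).
    pub output: Option<&'static str>,
}

/// The stages from the example Runefile, in `RUN` order.
pub fn runefile_stages() -> Vec<Stage> {
    vec![
        Stage { name: "audio", input: None, output: Some("I16[16000]") },
        Stage { name: "fft", input: Some("I16[16000]"), output: Some("I8[1960]") },
        Stage { name: "model", input: Some("I8[1960]"), output: Some("I8[6]") },
        Stage { name: "label", input: Some("I8[6]"), output: Some("UTF8") },
        Stage { name: "serial", input: None, output: None },
    ]
}

/// Checks that every adjacent pair of stages agrees on the type passed
/// between them. Pairs where either side has no declared type are skipped.
pub fn check_chain(stages: &[Stage]) -> Result<(), String> {
    for pair in stages.windows(2) {
        let (prev, next) = (&pair[0], &pair[1]);
        if let (Some(out), Some(inp)) = (prev.output, next.input) {
            if out != inp {
                return Err(format!(
                    "`{}` outputs {} but `{}` expects {}",
                    prev.name, out, next.name, inp
                ));
            }
        }
    }
    Ok(())
}
```

Calling `check_chain(&runefile_stages())` succeeds for the pipeline above, while changing any one annotation so that neighbours disagree yields an error naming both stages.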
YAML Version
image: "runicos/base"
pipeline:
  audio:
    capability: SOUND
    outputs:
      - type: i16
        dimensions: [16000]
    args:
      hz: 16000
  fft:
    proc-block: "hotg-ai/rune#proc_blocks/fft"
    inputs:
      - audio
    outputs:
      - type: i8
        dimensions: [1960]
  model:
    model: "./model.tflite"
    inputs:
      - fft
    outputs:
      - type: i8
        dimensions: [6]
  label:
    proc-block: "hotg-ai/rune#proc_blocks/ohv_label"
    inputs:
      - model
    outputs:
      - type: utf8
    args:
      labels: ["silence", "unknown", "up", "down", "left", "right"]
  output:
    out: SERIAL
    inputs:
      - label
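One difference from the Runefile is that the YAML version has no `RUN` line: each stage names its `inputs`, so the execution order has to be derived. Below is a rough sketch of how that could be done with a simple round-based topological sort. The `execution_order` function and the map-of-names representation are assumptions for illustration, and parsing the YAML itself is left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given a map from stage name to the names of its inputs, returns the stage
/// names in an order where every stage comes after all of its inputs.
/// Fails if a stage refers to an unknown input or if the inputs form a cycle.
pub fn execution_order<'a>(
    inputs: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<String>, String> {
    // Reject references to stages that were never declared.
    for (name, deps) in inputs {
        for dep in deps {
            if !inputs.contains_key(dep) {
                return Err(format!("stage `{name}` reads from unknown stage `{dep}`"));
            }
        }
    }

    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut order = Vec::new();

    while order.len() < inputs.len() {
        // A stage is ready once all of its inputs have been scheduled.
        let ready: Vec<&str> = inputs
            .iter()
            .filter(|(name, deps)| {
                !done.contains(*name) && deps.iter().all(|dep| done.contains(dep))
            })
            .map(|(name, _)| *name)
            .collect();

        // No progress while stages remain means the inputs form a cycle.
        if ready.is_empty() {
            return Err("the pipeline contains a cycle".to_string());
        }

        for name in ready {
            done.insert(name);
            order.push(name.to_string());
        }
    }

    Ok(order)
}
```

For the YAML above, the map would contain `audio` with no inputs, `fft` reading from `audio`, and so on. This also shows that the `inputs` form can describe branching pipelines, which a single linear `RUN` line cannot.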
If you've ever needed to provide users with some sort of programmability, how have you chosen what syntax to use?
Some pros of having a custom DSL are that it can be a lot more concise than YAML (12 lines vs 40) and that it lets you encode things which aren't normally representable in YAML without some creative coding (e.g. interpolated expressions or conditionals). On the other hand, it introduces a learning curve and suffers from the not-invented-here problem.
If you are interested, the corresponding issue contains some other concepts we were throwing around: