We are currently working on a new product at our company that does real-time analysis of IoT sensor streams, based on rulesets and some other data structures.
<random rust praise>
As a systems programming guy who has worked with C/C++ so far, I'm totally happy with my decision to switch to Rust. Sure, sometimes I fight the borrow checker hard, but in months of development I have never faced a single crash, which is absolutely fascinating, especially since the functionality is pretty big by now. </random rust praise>
Anyway, to my question:
For that project, we want to give our customers a way to supply standard ML models (regression models, random forests, XGBoost trees), which then take part in the real-time analysis.
I'm more of a systems programming guy, so I don't have a good overview of what's going on in the ML world. I know all of our customers want to use Python in some way (scikit-learn, XGBoost).
So my main task is to somehow integrate ML code (let's just guess Python) with Rust. That's why I'm asking for some general guidance here.
I currently see several possible ways:
- make use of PyO3 to embed the Python interpreter in Rust and call a script which does the prediction. Reading this it seems to me that PyPy is not usable as an interpreter called from Rust, right?
- make use of RustPython to precompile the prediction code and run it from Rust directly. The only thing I'm not sure about is whether it works with libraries such as XGBoost, which are written in C++ and only integrated into Python via FFI, or whether it only works with pure Python libs.
- run an external service with PyPy and provide an HTTP/gRPC interface. The Rust service calls that interface, which in turn calls the prediction function of the Python ML models.
Again, all I'm looking for here is general architectural guidance on how to integrate ML prediction functions (not the training etc.) with Rust at very low latency. Maybe some of you already have experience here and can recommend one of those ways, or do the exact opposite and say "PLS DON'T DO X".
Yes, that was actually bad wording. As far as I understand it, RustPython turns the code into some kind of abstract internal tree and then runs it; it does not compile it into native Rust code. I was referring to their basic example, where they call vm.compile(), which most likely prepares the code for interpretation rather than compiling it the way rustc/clang/gcc do.
In this case, I don't see what RustPython would bring to the table over using PyO3. PyO3's documentation seems to be more complete and up-to-date (the example you cited uses the rustpython_vm crate, of which all versions have apparently been yanked), so it will likely be easier to get started with.
What works for you will totally depend on the "let's just guess python" bit.
Sometimes you are given a raw model (e.g. foo.tflite) and can load it with the Rust bindings to that framework (e.g. the tflite or tensorflow crates). That's the easiest case because it's just a library, and since Rust libraries tend to be fairly well designed and tested on multiple platforms, most of the time it Just Works.
If you are given a Python script which loads the model and does pre/post processing, then things get a bit more difficult. Now you need to worry about installing the script's dependencies (not trivial for Python, especially if you are running multiple scripts on the same machine, due to version differences) and about "somehow" communicating between your Rust process and the script.
Using something like PyO3 to run the script in the CPython interpreter is a pretty good solution because you avoid the latency and serialization costs of inter-process communication (e.g. HTTP), and calling into Python is synchronous (which simplifies error handling and execution logic). You also have access to CPython's internals, so you can do things like read a property and convert/view it as a Rust type, or pass a Rust object to the script to inject extra functionality.
Regardless of which concrete technology you use (gRPC, PyO3, smoke signals, etc.), it'll be important to come up with a protocol that is general and ergonomic enough to support most customer use cases, while also having decent performance... You don't want to be copying massive tensors every time something moves from Python to Rust, so maybe use numpy arrays as your lingua franca and the numpy crate to let your Rust code manipulate the Python object itself instead of copying it into your own Rust-specific tensor type.
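To make the protocol idea a bit more concrete, here is a minimal sketch of a transport-agnostic request/response handler on the Python side. Everything in it is an assumption for illustration: the JSON layout, the handle_request function and the ThresholdModel stand-in are hypothetical names, not part of PyO3, gRPC or any customer script. JSON is used only for readability; for large tensors a binary buffer such as a numpy array would fit the advice above better.

```python
import json


class ThresholdModel:
    """Hypothetical stand-in for a customer model exposing .predict()."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # Classify each row as 1 if the sum of its features exceeds the threshold.
        return [1 if sum(row) > self.threshold else 0 for row in rows]


def handle_request(models, payload):
    """Decode one request, run the named model, encode one response.

    `models` maps a model name to an already loaded object with .predict().
    `payload` is the raw request body (bytes or str), however it was transported.
    """
    try:
        request = json.loads(payload)
        model = models[request["model"]]
        predictions = model.predict(request["features"])
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, unknown model name, or badly shaped features.
        return json.dumps({"ok": False, "error": repr(exc)}).encode()
    return json.dumps({"ok": True, "predictions": predictions}).encode()


if __name__ == "__main__":
    registry = {"demo": ThresholdModel(1.0)}
    body = b'{"model": "demo", "features": [[0.2, 0.3], [1.5, 0.1]]}'
    print(handle_request(registry, body))
```

Because the handler only deals with bytes in and bytes out, the same function could sit behind an HTTP endpoint, a gRPC method, or a direct call from the embedding Rust process, which keeps the benchmark between those options fair.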
And this is indeed the case most of the time. Data cleaning and feature extraction are probably the most important steps in most ML use cases. Accordingly, there is usually nontrivial logic that transforms input from the external world into features understood by the model.
Thanks for your reply!
As sometimes happens in industry, priorities have shifted 180 degrees over the last few days, so I can postpone this topic a little. Nevertheless, I'm currently planning to implement different approaches for different models. If there is a crate or an easily integrable C(++) library, as for XGBoost trees, we would use a native implementation.
If it's only available in Python, like scikit-learn, I would probably do both:
- try an external service, since we already have external services with gRPC and HTTP/2 implemented
- try to make PyO3 work, since the scope would be pretty clear
And then run some benchmarks on both.
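For the benchmark step, a small harness along these lines might be enough to compare the two paths on the Python side. It is only a sketch: measure_latency and its parameters are hypothetical names, and the callable passed in stands for whatever wraps a single .predict() call (in-process or via the service client).

```python
import statistics
import time


def measure_latency(call, warmup=10, runs=200):
    """Time `call()` repeatedly and summarize per-call latency in milliseconds."""
    # Warm-up calls are not measured; they let caches and JITs (e.g. PyPy) settle.
    for _ in range(warmup):
        call()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Index of the 99th percentile sample, clamped to the last element.
    p99_index = min(len(samples_ms) - 1, int(0.99 * len(samples_ms)))
    return {
        "runs": runs,
        "median_ms": statistics.median(samples_ms),
        "p99_ms": samples_ms[p99_index],
        "max_ms": samples_ms[-1],
    }


if __name__ == "__main__":
    # Placeholder workload; replace with a real model.predict(...) call.
    print(measure_latency(lambda: sum(range(10_000))))
```

Since the goal is very low latency, looking at the p99 and max values rather than only the median seems worthwhile, because occasional slow calls matter in a real-time pipeline.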
As mentioned, the PyO3 scope is actually pretty controllable, since we only let the customer give us a trained model, such as a scikit-learn random forest exported to a file with joblib or pickle. We would then only load that file and call model.predict(<somevalues>). The only important thing is not to load the model on every predict call, but to load it once and then reuse the Python object for each .predict() call. But that's trivial.
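The "load once, predict many times" part could look roughly like this on the Python side. This is a sketch under assumptions: LinearModel is a hypothetical stand-in for a real scikit-learn estimator so that the example runs with the standard library only, and load_model, predict and write_demo_model are made-up helper names. With a real model the file would hold the pickled estimator itself.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path


class LinearModel:
    """Hypothetical stand-in for a trained estimator exposing .predict()."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        # One prediction per input row: dot(weights, row) + bias.
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias
                for row in rows]


@lru_cache(maxsize=None)
def load_model(path):
    """Deserialize the model file once; later calls reuse the cached object."""
    with open(path, "rb") as f:
        params = pickle.load(f)
    return LinearModel(params["weights"], params["bias"])


def predict(path, rows):
    """Entry point the host would call for every incoming sample."""
    return load_model(path).predict(rows)


def write_demo_model(directory):
    """Write a tiny parameter file so the sketch runs without scikit-learn."""
    path = str(Path(directory) / "model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"weights": [0.5, 2.0], "bias": 1.0}, f)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_path = write_demo_model(tmp)
        print(predict(model_path, [[2.0, 1.0]]))
        print(predict(model_path, [[0.0, 0.0]]))
        print(load_model.cache_info())  # one miss (the load), one hit (the reuse)
```

One thing to keep in mind with customer-supplied files: unpickling can execute arbitrary code, so pickle and joblib files should only be loaded from sources you trust.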
Yup, compared to running Python in an external service with CPython, I'm pretty sure that PyO3 would be much faster. An external service running PyPy with gRPC/HTTP/2 over persistent connections might still be faster, even if networking adds latency on top: in our tests a .predict() call on a typical model takes 20-22ms in CPython and is "10x faster" in PyPy (all according to my colleague, not tested myself). So 4ms + networking could be faster than PyO3.
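The trade-off can be written down as a tiny calculation. This sketch assumes that a .predict() call through PyO3 costs about the same as in plain CPython (20ms, the lower figure quoted above) and takes the 4ms estimate for PyPy at face value; both function names are hypothetical.

```python
def network_budget_ms(in_process_ms, remote_predict_ms):
    """Round-trip time the remote path can spend before it stops being faster."""
    return in_process_ms - remote_predict_ms


def remote_is_faster(in_process_ms, remote_predict_ms, round_trip_ms):
    """True if predict time plus network round trip beats the in-process call."""
    return remote_predict_ms + round_trip_ms < in_process_ms


if __name__ == "__main__":
    # Figures from the thread: 20 ms in CPython (assumed equal under PyO3), 4 ms in PyPy.
    print(network_budget_ms(20.0, 4.0))      # 16.0
    print(remote_is_faster(20.0, 4.0, 1.0))  # True
```

With those numbers the external service stays ahead as long as serialization plus the network round trip takes less than 16ms, which is why measuring both variants on the real models is the only way to settle it.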