ML dataset training acceleration: tips and tricks

Is anyone using pure Rust in their ML/AI dataset and model training?

“One example of using Rust to speed up model training is demonstrated by CrowdStrike. They replaced the Python feature extraction code with Rust, resulting in a decrease in total training time from 227 hours to 162 hours. This approach leverages the Rust compiled library to accelerate feature extraction operations, while still allowing for easy integration into existing Python code.”
CrowdStrike Blog
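The blog doesn't say which binding mechanism CrowdStrike used (PyO3 is a common choice for Rust/Python integration). As a dependency-free sketch of the general idea, a Rust function can be exported with a C ABI from a library built as a `cdylib`, and Python can then load it with `ctypes`. The feature itself (`byte_histogram`, a normalized 256-bin byte histogram) and the exported name `extract_byte_histogram` are hypothetical stand-ins, not CrowdStrike's actual code.

```rust
/// Hypothetical feature: normalized 256-bin byte histogram of one sample.
/// Stands in for whatever "feature extraction" the real pipeline performs.
pub fn byte_histogram(data: &[u8]) -> [f32; 256] {
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let mut features = [0.0f32; 256];
    if !data.is_empty() {
        let n = data.len() as f32;
        for (f, &c) in features.iter_mut().zip(counts.iter()) {
            *f = c as f32 / n;
        }
    }
    features
}

/// C-ABI entry point so the library, built as a `cdylib`, can be called from
/// Python (e.g. via `ctypes`). Writes 256 floats to `out`.
/// Returns 0 on success and -1 if a required pointer is null.
///
/// # Safety
/// `data` must point to `len` readable bytes (it may be null only when
/// `len == 0`) and `out` must point to 256 writable `f32` values.
#[unsafe(no_mangle)] // spelled `#[no_mangle]` on toolchains older than Rust 1.82
pub unsafe extern "C" fn extract_byte_histogram(data: *const u8, len: usize, out: *mut f32) -> i32 {
    if out.is_null() || (data.is_null() && len != 0) {
        return -1;
    }
    // Build safe slices over the caller's buffers, then reuse the safe function.
    let input: &[u8] = if len == 0 {
        &[]
    } else {
        unsafe { std::slice::from_raw_parts(data, len) }
    };
    let output = unsafe { std::slice::from_raw_parts_mut(out, 256) };
    output.copy_from_slice(&byte_histogram(input));
    0
}
```

On the Python side you would load the built library with `ctypes.CDLL`, declare `argtypes`/`restype` for `extract_byte_histogram`, and pass a byte buffer plus a 256-element `c_float` array. Keeping the real work in a safe Rust function and making the exported wrapper a thin pointer-checking shim keeps the `unsafe` surface small.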

Hmm... In my admittedly limited experience of migrating calculation-heavy code from Python to a compiled language, I would have expected a far greater speed-up. Is there still a lot of Python involved in the training process?

I'm just speculating here (the blog post doesn't give enough information to do anything else), but I think the part of their pipeline they reimplemented in Rust simply isn't a large share of the overall wall-clock time. That's not to say that shaving ~65 hours off by optimizing your preprocessing isn't impressive; it absolutely is. I don't think the rest of their pipeline is slow because it is written in Python (it looks to me like most of it runs C++, given they use TensorFlow); it is just computationally more expensive than what they rewrote in Rust. Without knowing more about the model size, how they create their text embeddings during preprocessing (all we know is that they perform some sort of "transformations"), and how long their preprocessing actually takes, it's hard to deduce anything more precise. They only break the total wall-clock time down into the average time per training epoch, which is essentially meaningless for analysing the speed-up of their preprocessing, since preprocessing happens before any training epoch even begins[1].
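The argument above is essentially Amdahl's law: if a stage taking a fraction `p` of the runtime is made `s` times faster and everything else is unchanged, the overall speed-up is `1 / ((1 - p) + p / s)`. The sketch below assumes, beyond what the blog states, that all of the saving (227 h to 162 h, i.e. 65 h) came from the rewritten feature-extraction stage and nothing else changed; the function names are illustrative.

```rust
/// Amdahl's law: overall speed-up when a fraction `p` of the runtime is
/// accelerated by a factor `s` and the remaining `1 - p` is unchanged.
pub fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Share of the original runtime the rewritten stage must have occupied to
/// save `saved` out of `total` hours, given a local speed-up `s`.
/// Saved time = total * p * (1 - 1/s), solved for p.
/// Pass `f64::INFINITY` for `s` to get the lower bound (an "instant" rewrite).
pub fn min_fraction(total: f64, saved: f64, s: f64) -> f64 {
    saved / (total * (1.0 - 1.0 / s))
}
```

With these numbers, `min_fraction(227.0, 65.0, f64::INFINITY)` is about 0.286 and `min_fraction(227.0, 65.0, 10.0)` is about 0.318. So under this assumption the rewritten stage took roughly 29 to 32% of the original runtime, and the other ~70% was untouched by the rewrite, which caps the end-to-end gain at about 1.4x no matter how fast the Rust code is.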

  1. I presume. If not (e.g. because their whole training set does not fit into memory), the easiest optimization would be to split the pipeline into two parts: (I) preprocessing, saving the embeddings to disk, and (II) the actual training, reading the embeddings back from disk. That way preprocessing runs only once, not once per training epoch.
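The two-stage split from the footnote can be sketched as follows. It assumes fixed-width embeddings of `dim` values stored as raw little-endian `f32` rows; `preprocess` is a hypothetical placeholder for the unspecified "transformations", and `cache_embeddings`/`for_each_embedding` are illustrative names. Stage II streams rows from disk, so the dataset never has to fit in memory.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical stand-in for the expensive preprocessing step.
fn preprocess(sample: &[f32]) -> Vec<f32> {
    sample.iter().map(|x| x * 0.5).collect()
}

/// Stage I: run preprocessing once and write each embedding as `dim`
/// little-endian f32 values to `path`.
pub fn cache_embeddings(samples: &[Vec<f32>], dim: usize, path: &Path) -> io::Result<()> {
    let mut w = BufWriter::new(File::create(path)?);
    for s in samples {
        let emb = preprocess(s);
        assert_eq!(emb.len(), dim, "every embedding must have `dim` values");
        for v in emb {
            w.write_all(&v.to_le_bytes())?;
        }
    }
    w.flush()
}

/// Stage II: stream cached embeddings row by row; call this once per epoch
/// instead of re-running preprocessing. A trailing partial row is ignored.
pub fn for_each_embedding<F: FnMut(&[f32])>(path: &Path, dim: usize, mut f: F) -> io::Result<()> {
    if dim == 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "dim must be > 0"));
    }
    let mut r = BufReader::new(File::open(path)?);
    let mut raw = vec![0u8; dim * 4];
    let mut row = vec![0f32; dim];
    loop {
        match r.read_exact(&mut raw) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(()),
            Err(e) => return Err(e),
        }
        for (dst, b) in row.iter_mut().zip(raw.chunks_exact(4)) {
            *dst = f32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        }
        f(&row);
    }
}
```

A raw fixed-width format keeps the sketch dependency-free; a real pipeline would more likely use whatever on-disk format its training framework reads natively.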

