I need help deciding whether to share my "School Grade" Machine Learning Library at my organization's Ideathon

Hey beautiful Rustaceans, thanks for all the hard work you put in to keep this community growing.

I started learning Rust seriously 3-4 months ago for my personal projects. In parallel, I was learning about Machine Learning, Deep Learning, and Time Series Analysis for work.

So the idea struck me to write a pure Rust Machine Learning Library. I started by implementing 2D matrix operations: addition, subtraction, Hadamard product, and matrix multiplication. Everything uses "naive" algorithms; no fancy algorithms are implemented in the library yet.
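For readers following along, here is a minimal sketch of what such naive operations might look like. The `Matrix` type, its row-major layout, and the method names are assumptions made for illustration; they are not taken from the library discussed in this thread.

```rust
/// A dense, row-major 2D matrix of f64 values.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    pub fn new(rows: usize, cols: usize, data: Vec<f64>) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        Matrix { rows, cols, data }
    }

    /// Applies `f` element by element to two matrices of the same shape.
    fn zip_with(&self, other: &Matrix, f: impl Fn(f64, f64) -> f64) -> Matrix {
        assert_eq!((self.rows, self.cols), (other.rows, other.cols), "shape mismatch");
        let data: Vec<f64> = self
            .data
            .iter()
            .zip(&other.data)
            .map(|(&a, &b)| f(a, b))
            .collect();
        Matrix::new(self.rows, self.cols, data)
    }

    pub fn add(&self, other: &Matrix) -> Matrix {
        self.zip_with(other, |a, b| a + b)
    }

    pub fn sub(&self, other: &Matrix) -> Matrix {
        self.zip_with(other, |a, b| a - b)
    }

    /// Element-wise (Hadamard) product.
    pub fn hadamard(&self, other: &Matrix) -> Matrix {
        self.zip_with(other, |a, b| a * b)
    }

    /// Naive triple-loop matrix multiplication: O(rows * cols * other.cols).
    pub fn matmul(&self, other: &Matrix) -> Matrix {
        assert_eq!(self.cols, other.rows, "inner dimensions must match");
        let mut out = vec![0.0; self.rows * other.cols];
        for i in 0..self.rows {
            for j in 0..other.cols {
                let mut sum = 0.0;
                for k in 0..self.cols {
                    sum += self.data[i * self.cols + k] * other.data[k * other.cols + j];
                }
                out[i * other.cols + j] = sum;
            }
        }
        Matrix::new(self.rows, other.cols, out)
    }
}
```

With this sketch, multiplying a 100 * 4 matrix by a 4 * 1 matrix is just `a.matmul(&b)`. Shape mismatches panic here for brevity; a real library would more likely return a `Result`.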

Those worked, so I then wrote a simple gradient descent algorithm. That also worked. I made up a random data set and ran my program on it. That gave me a very close approximation of my linear factors.
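As a rough illustration of that experiment, the sketch below fits linear weights with batch gradient descent on mean squared error. Everything in it is an assumption for illustration: the function names `fit_linear` and `make_data`, the learning rate, and the small LCG used to make up data without an external crate. It is not the code from the library itself.

```rust
/// Fits weights `w` so that each row of `x` dotted with `w` approximates `y`,
/// using batch gradient descent on mean squared error.
/// `x` is row-major with `n_features` values per row. Hypothetical helper.
pub fn fit_linear(x: &[f64], y: &[f64], n_features: usize, lr: f64, iters: usize) -> Vec<f64> {
    assert!(n_features > 0, "need at least one feature");
    let n = y.len();
    assert_eq!(x.len(), n * n_features, "x must hold n * n_features values");
    let mut w = vec![0.0; n_features];
    for _ in 0..iters {
        // Gradient of (1/n) * sum((x_i . w - y_i)^2) with respect to w.
        let mut grad = vec![0.0; n_features];
        for (row, &target) in x.chunks_exact(n_features).zip(y) {
            let pred: f64 = row.iter().zip(&w).map(|(a, b)| a * b).sum();
            let err = pred - target;
            for (g, &xi) in grad.iter_mut().zip(row) {
                *g += 2.0 * err * xi / n as f64;
            }
        }
        // Step against the gradient.
        for (wi, g) in w.iter_mut().zip(&grad) {
            *wi -= lr * g;
        }
    }
    w
}

/// Makes up `n` noise-free samples with features in [-1, 1) from known weights.
/// Uses a small 64-bit LCG so the sketch needs no external crate.
pub fn make_data(n: usize, true_w: &[f64], seed: u64) -> (Vec<f64>, Vec<f64>) {
    let mut state = seed;
    let mut x = Vec::with_capacity(n * true_w.len());
    let mut y = Vec::with_capacity(n);
    for _ in 0..n {
        let mut target = 0.0;
        for &w in true_w {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Top 53 bits mapped to [0, 1), then shifted to [-1, 1).
            let v = ((state >> 11) as f64 / (1u64 << 53) as f64) * 2.0 - 1.0;
            x.push(v);
            target += v * w;
        }
        y.push(target);
    }
    (x, y)
}
```

A call such as `fit_linear(&x, &y, 4, 0.1, 1000)` on data from `make_data(100, &[2.0, -1.0, 0.5, 3.0], 42)` should land close to the weights used to generate the data, since the data has no noise and no bias term.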

Then I showed it to my team. The team is asking me to put this in the Innovative Ideathon showcase.

It feels like a joke to me but they are serious about it. What should I do?


We're in the middle of an "AI" hype bubble, so everything vaguely related may ride that wave.

If your library doesn't have any novel solutions or outstanding performance, and you don't want to give it a unique selling point or don't see a path to one, then it's probably a bad idea to try to commercialize it. Such basic algorithms, although very useful and potentially an element of valuable products, are themselves a commodity.

If they are serious about it, ask them to write a problem statement for the ideathon. You might see how quickly the enthusiasm fades. Or, assuming they actually start thinking about a problem statement, they will probably either come to the same conclusion you have already reached (i.e., it's not worth it) or find something interesting where your skills can be put to good use (best case).

Thanks for this suggestion. However, at their insistence, I ran my program on a multiplication of 100 * 4 and 4 * 1 matrices for 1000 iterations.

To my surprise, my Release Optimized rust program took ~22 ms while numpy took ~46 ms. I could not believe my eyes, so I am now trying to find where numpy surpasses my program in terms of performance. If it turns out that my simple program beats numpy in every case up to 1000 * 1000 matrices, I will definitely submit the idea.

To my surprise, my Release Optimized rust program took ~22 ms while numpy took ~46 ms.

Your example matrices are very small, so if the loops in your numpy version are written in Python, all the time in the numpy code is spent on overhead (i.e., "everything other than the actual math"). Here the relevant definition of "small" is "does it fit in the L1 cache? L2 cache? L3 cache?". As you increase the size of the inputs so that they no longer fit entirely in the cache, the benefits of numpy's more sophisticated algorithms should show. Start with "square" inputs (M=N=K) to see the effect clearly.
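One way to run that experiment on the Rust side is sketched below. It times a naive triple-loop multiply on square inputs and reports throughput, so a drop should become visible once the matrices outgrow a cache level. The function names, the fill values, and the choice of GFLOP/s as the metric are assumptions for illustration; a dedicated benchmarking harness would give more reliable numbers than this simple timer.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Naive triple-loop multiply of two n x n row-major matrices.
pub fn naive_matmul(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut out = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            let mut sum = 0.0;
            for k in 0..n {
                sum += a[i * n + k] * b[k * n + j];
            }
            out[i * n + j] = sum;
        }
    }
    out
}

/// Times `reps` multiplications at size n (M = N = K = n) and returns GFLOP/s,
/// counting one multiply and one add per inner-loop step.
pub fn gflops_for_size(n: usize, reps: u32) -> f64 {
    let a = vec![1.0_f64; n * n];
    let b = vec![0.5_f64; n * n];
    let start = Instant::now();
    for _ in 0..reps {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(naive_matmul(black_box(&a), black_box(&b), n));
    }
    let secs = start.elapsed().as_secs_f64();
    let flops = 2.0 * (n as f64).powi(3) * reps as f64;
    flops / secs / 1e9
}

/// Runs the measurement for each size and returns (n, GFLOP/s) pairs.
pub fn sweep(sizes: &[usize], reps: u32) -> Vec<(usize, f64)> {
    sizes.iter().map(|&n| (n, gflops_for_size(n, reps))).collect()
}
```

Calling `sweep(&[16, 32, 64, 128, 256, 512, 1024], 3)` from a release build and printing the pairs gives a curve to compare against the same sizes in numpy. The 8 bytes per `f64` times 3 matrices of n * n elements tells you roughly which cache level each size fits in.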

Implementing algorithms like this is a great way to learn about computer architecture and see the effects of, e.g., CPU caches for yourself. I suggest that should be the main goal.


Yes, that is exactly what I am doing. I am planning to run a bunch of matrix multiplications of different shapes and sizes to understand where my code breaks down and where I can improve it.