Rust in data science issue

TomMonkeyMan · July 19, 2023, 6:07am

Hi friends, I'm now working in both software development and data engineering. Recently, I've been contemplating the use of Rust for data science tasks, similar to what we typically do with Python pandas/numpy.

I've come across several tutorials and blog posts that discuss various Rust crates that could be helpful. However, I still have some concerns about whether using Rust for data science is considered good practice, and whether there is a growing trend towards doing data work in Rust.

Could you please share your thoughts on these concerns?

H2CO3 · July 19, 2023, 6:31am

Don't worry about that. Does it affect your results whether there's a "growing trend", after all?

If Rust fits your use case, then use it. For instance, if you want high-performance processing of well-formed data, or if you want to deploy a machine learning model in production without dependencies, Rust might be a good fit.
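To make "high-performance processing of well-formed data" a little more concrete, here is a minimal, std-only sketch (no crates) that averages one column of comma-separated text. The function name `column_mean` and the simple no-quoting CSV format are assumptions for illustration; a real project would more likely use a DataFrame crate such as polars.

```rust
/// Parses one numeric column out of comma-separated text and returns its mean.
/// Assumes well-formed input (no quoting, no embedded commas); returns an error
/// instead of panicking if a row is too short or a field is not a number.
fn column_mean(input: &str, col: usize) -> Result<f64, String> {
    let mut sum = 0.0;
    let mut count = 0usize;
    for (line_no, line) in input.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        let field = line
            .split(',')
            .nth(col)
            .ok_or_else(|| format!("line {}: missing column {}", line_no + 1, col))?;
        let value: f64 = field
            .trim()
            .parse()
            .map_err(|e| format!("line {}: {}", line_no + 1, e))?;
        sum += value;
        count += 1;
    }
    if count == 0 {
        return Err("no data rows".to_string());
    }
    Ok(sum / count as f64)
}

fn main() {
    let data = "a,1.5\nb,2.5\nc,5.0\n";
    match column_mean(data, 1) {
        Ok(mean) => println!("mean = {mean}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Returning `Result` rather than panicking is the "reliability" side of the trade-off: malformed rows surface as values the caller must handle.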

If, on the other hand, you think Rust won't be good for achieving your goal (e.g., you value convenience over reliability), then don't use it. Simple as that.

vague · July 19, 2023, 6:35am

I can't speak for others, but if your task is data mining, visualization, analysis, etc., you're really choosing an ecosystem and its tooling.

For now, Rust is immature for data science.
Python or R is more appropriate, but it all depends on your use cases.

TomMonkeyMan · July 19, 2023, 9:01am

Thanks! Makes sense to me.

TomMonkeyMan · July 19, 2023, 9:02am

Yup. I think the only reason I'd choose Rust is its performance...

kevinmcfarlane · July 19, 2023, 3:52pm

As a Rust newbie, I would imagine that Rust is somewhat lacking at this point, especially in its library ecosystem.

If performance is your main motivation for looking beyond Python (at least until Mojo is released), then I would suggest taking a look at the underrated F# for Data Science. As F# has been around longer, I imagine it is more mature for this, although I've never used it myself for such purposes.

F# also overlaps with Rust to an extent: immutability by default, pattern matching, implicit returns, no nulls, and some similarity in syntax.
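For readers who haven't seen Rust yet, the shared features listed above look roughly like this on the Rust side. This is a small illustrative sketch; the `Reading` type and the function names are hypothetical.

```rust
/// Hypothetical example type: a measurement that may or may not have a value.
#[derive(Debug)]
enum Reading {
    Value(f64),
    Missing,
}

/// Classifies a reading using pattern matching; the `match` is the final
/// expression, so it is the return value (implicit return, no `return`).
fn describe(r: &Reading) -> &'static str {
    match r {
        Reading::Value(v) if *v < 0.0 => "negative",
        Reading::Value(_) => "non-negative",
        Reading::Missing => "missing",
    }
}

/// There is no null in safe Rust: "maybe no value" is spelled Option<T>.
fn first_value(readings: &[Reading]) -> Option<f64> {
    readings.iter().find_map(|r| match r {
        Reading::Value(v) => Some(*v),
        Reading::Missing => None,
    })
}

fn main() {
    // Bindings are immutable by default; `let mut` would be needed to reassign.
    let readings = vec![Reading::Missing, Reading::Value(-2.0), Reading::Value(4.0)];
    for r in &readings {
        println!("{:?} is {}", r, describe(r));
    }
    println!("first value: {:?}", first_value(&readings));
}
```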

humphreylee · July 20, 2023, 1:12am

I'm in almost the same shoes as you: I've been trying to use Rust for data massaging and visualisation for quite a while. Before this, I mostly used Julia and R for data manipulation, curve-fitting optimisation and visualisation (plotting). Nothing on machine learning yet. Here's what I have encountered so far.

The Rust language itself takes quite an effort to learn.
In terms of ecosystem, the common crates I have used are ndarray or nalgebra, polars, argmin{} and plotters. These are not mature yet (still below version 1.0), but they are usable for simple and even most cases. However, what I find most difficult is finding working examples. So I have to read the documentation again and again, try the provided code examples, and test my code with many println!() calls.
There are many crates that provide more or less similar functionality, e.g., ndarray, nalgebra, polars. They are different, yet similar. None is dominant at the moment; some are supported by certain crates and others by other crates, which leads to fragmentation. It can be quite a challenging choice to make.

I think I will keep pushing myself into Rust. Along this hard path, I have re-learned some of the difficult algebra and calculus, e.g., the Jacobian and the Hessian.
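For anyone else re-learning this material: the Jacobian of f: R^n → R^m collects the partial derivatives J[i][j] = ∂f_i/∂x_j. The std-only sketch below approximates it with central differences, (f_i(x + h·e_j) - f_i(x - h·e_j)) / (2h), where e_j is the j-th unit vector. The function name `jacobian` and the step size are assumptions; optimisation crates such as argmin have their own conventions for supplying derivatives, so check their documentation rather than treating this as their API.

```rust
/// Approximates the Jacobian J[i][j] = ∂f_i/∂x_j of f: R^n -> R^m at `x`
/// using central differences with step `h`. A generic numerical technique,
/// not the API of any particular crate.
fn jacobian<F>(f: F, x: &[f64], h: f64) -> Vec<Vec<f64>>
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let m = f(x).len();
    let n = x.len();
    let mut jac = vec![vec![0.0; n]; m];
    let mut xp = x.to_vec();
    let mut xm = x.to_vec();
    for j in 0..n {
        // Perturb only coordinate j, once in each direction.
        xp[j] = x[j] + h;
        xm[j] = x[j] - h;
        let fp = f(&xp);
        let fm = f(&xm);
        for i in 0..m {
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * h);
        }
        // Restore coordinate j before moving on to the next one.
        xp[j] = x[j];
        xm[j] = x[j];
    }
    jac
}

fn main() {
    // Example f(x, y) = (x^2 * y, x + 3y); the exact Jacobian is
    // [[2xy, x^2], [1, 3]], i.e. [[4, 1], [1, 3]] at (1, 2).
    let f = |v: &[f64]| vec![v[0] * v[0] * v[1], v[0] + 3.0 * v[1]];
    let jac = jacobian(f, &[1.0, 2.0], 1e-5);
    for row in &jac {
        println!("{:?}", row);
    }
}
```

Applying the same idea to a gradient instead of a vector-valued function gives a finite-difference approximation of the Hessian, though errors compound, so analytic derivatives are preferable when available.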

MintX · July 20, 2023, 2:26am

I don't think Rust is going to fully take off as the data language, since using graphs in Rust is deliberately designed to be a miserable experience. But working with arrays is fine for the most part.

quinedot · July 20, 2023, 3:59am

No, the teams didn't sit down and brainstorm ways to make using graphs or other pointer-heavy data structures miserable. It's fallout from the approach taken to get synchronization-free mutation with memory safety, without data races, and without garbage collection or some other runtime (i.e., removing aliasing for non-shared mutable types like &mut).
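In practice, the usual way to live with this restriction is to avoid nodes holding references to each other (which would need aliasing mutable borrows, or Rc/RefCell) and instead store nodes in a Vec and express edges as plain indices. Below is a minimal std-only sketch of that index-based pattern; the `Graph` type and its methods are hypothetical and not petgraph's API, although petgraph's own `Graph` is also built around integer node indices.

```rust
use std::collections::VecDeque;

/// A minimal adjacency-list graph that stores nodes in a Vec and refers to
/// them by index instead of by reference. Edges are plain `usize` values,
/// so there are no aliasing `&mut` borrows and no Rc/RefCell or unsafe code.
struct Graph {
    adjacency: Vec<Vec<usize>>,
}

impl Graph {
    fn new() -> Self {
        Graph { adjacency: Vec::new() }
    }

    /// Adds a node and returns its index.
    fn add_node(&mut self) -> usize {
        self.adjacency.push(Vec::new());
        self.adjacency.len() - 1
    }

    /// Adds a directed edge from `a` to `b`.
    fn add_edge(&mut self, a: usize, b: usize) {
        self.adjacency[a].push(b);
    }

    /// Breadth-first search; returns node indices in visiting order.
    fn bfs(&self, start: usize) -> Vec<usize> {
        let mut visited = vec![false; self.adjacency.len()];
        let mut order = Vec::new();
        let mut queue = VecDeque::new();
        visited[start] = true;
        queue.push_back(start);
        while let Some(node) = queue.pop_front() {
            order.push(node);
            for &next in &self.adjacency[node] {
                if !visited[next] {
                    visited[next] = true;
                    queue.push_back(next);
                }
            }
        }
        order
    }
}

fn main() {
    let mut g = Graph::new();
    let a = g.add_node();
    let b = g.add_node();
    let c = g.add_node();
    let d = g.add_node();
    g.add_edge(a, b);
    g.add_edge(a, c);
    g.add_edge(b, d);
    g.add_edge(c, d);
    g.add_edge(d, a); // a cycle is fine: it is just numbers, not an ownership cycle
    println!("BFS from a: {:?}", g.bfs(a));
}
```

The trade-off is that the compiler no longer checks that an index refers to a live node; removing nodes needs extra care (e.g., tombstones or generation counters).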

H2CO3 · July 20, 2023, 6:09am

That's a weird take. "Mature" is not a synonym for "1.0". As of today, ndarray has 7 million downloads and 88 released versions, and it has been around for 7 years. It is used by 707 crates, of which the top 22 have more than 100,000 downloads. If something was first released 7 years ago, has been under continuous development since, and the better part of the ecosystem depends on it, then what's not mature about it?

They are totally different. ndarray is for representing arrays/tensors generically; it doesn't provide linear algebra routines. On the other hand, nalgebra is for doing linear algebra; its Matrix type exists only to have something to work on, and it is far less advanced in terms of array manipulation (e.g., it's strictly 1- or 2-dimensional and only supports column-major order). Polars is a DataFrame implementation designed to work with potentially non-numeric and/or heterogeneous data (unlike the other two).
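To make the column-major point concrete: row-major and column-major layouts both store a 2-D matrix in one flat buffer and differ only in which index varies fastest. The std-only sketch below computes the flat offset under each convention; the function names are hypothetical. For context, ndarray defaults to row-major ("C") order but can also use column-major, while nalgebra always stores column-major.

```rust
/// Offset of element (row, col) in a matrix with `cols` columns stored in
/// row-major (C) order: each row is contiguous in memory.
fn row_major_index(row: usize, col: usize, cols: usize) -> usize {
    row * cols + col
}

/// Offset of element (row, col) in a matrix with `rows` rows stored in
/// column-major (Fortran) order: each column is contiguous in memory.
fn col_major_index(row: usize, col: usize, rows: usize) -> usize {
    col * rows + row
}

fn main() {
    // The 2 x 3 matrix
    //   [1 2 3]
    //   [4 5 6]
    // laid out in memory both ways.
    let (rows, cols) = (2, 3);
    let row_major = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let col_major = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    for r in 0..rows {
        for c in 0..cols {
            let a = row_major[row_major_index(r, c, cols)];
            let b = col_major[col_major_index(r, c, rows)];
            assert_eq!(a, b); // same logical element, different offsets
            print!("{a} ");
        }
        println!();
    }
}
```

The layout matters for performance: iterating along the contiguous dimension is cache-friendly, so loops over nalgebra matrices generally favour walking down columns.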

This doesn't really cause any more fragmentation than exists in the languages people traditionally associate with data analytics. I didn't see you complain about Pandas' DataFrame and NumPy's array causing fragmentation, for example. Isn't that a plain double standard?

NanoDijkstra · July 21, 2023, 3:57pm

What's miserable about using petgraph? It seems fine.
