Which is easier to learn, Rust or Python?

Hello, @everyone!

I want to be a data scientist. Please help me decide which programming language I should learn, Rust or Python, and which of the two is easier to learn.

Thanks in advance!

If you don't know any programming yet, definitely pick up Python. It is way easier, and the ecosystem and tutorials for data science and learning Python are vast. Also, from personal experience, I can say it is a fun first language to learn.

Thank you @SebastianJL. Please let me know where I should learn Python. How much time will it take to learn?

You have two questions there:

  1. Is it easier to learn Rust or Python?

I'm pretty sure that if you have never programmed before, Python would be easier to get started with, especially since there are so many beginner tutorials and such out there.

If you have programming experience and don't know Python, it will take little time to get into, but Rust is far more interesting.

  2. What language to learn to get into Data Science.

In the data science world they use languages like Python, R, Julia, Scala and SQL. I suspect there are some pioneers trying Rust in that world, but as you are a beginner, that is likely not for you. And again, pretty much all tutorials and courses will be using Python or whatever and not Rust.

My understanding is that data scientists tend to have advanced degrees in statistics, data science, computer science or mathematics, often master's degrees. I suspect that once you have those qualifications under your belt you will not need to ask what language to use.

Meanwhile a google search for "beginners python tutorial" will turn up thousands of interesting starter materials. Apparently a beginner can pick up Python in 1 hour:

That's not long so you have time to look into Rust as well afterwards:
https://doc.rust-lang.org/book/

:+1:
Regardless of the job, if possible, I do coding interviews in Python. I too really like Python.

This was the last site I used. I don't know how appropriate it is for a Data Scientist or for someone new to Python but the exercises are entertaining...

https://www.codingame.com/start/

I learned in a university course. But I had a book called something like "Python for kids and other beginners", written by a dad and his pre-teen son. It was great because it didn't assume any computer knowledge besides starting an application or downloading and installing a program. I liked that it took time to explain things that other tutorials viewed as given, and if something was already clear to me I just skipped that part.

PS I tried to search the book online but I couldn't find it. If you want I might be able to find a link when I'm back home from holidays.

Sure, share the link with me. Thanks @SebastianJL

"python book dad son" google search yields link for me straight away :slight_smile: https://www.amazon.com/Hello-World-Computer-Programming-Kids/dp/161729702X

Yeah that's it. Thank you very much!

Thanks @RustyJoeM

You definitely need to learn Python, as it is the lingua franca of data science so to speak. Then you may or may not learn Rust.

But anyway, from the other answers, it looks like you're going for Python. :slightly_smiling_face:

In regard to data science, what is your existing background? Do you know a bit already, or would you be starting from scratch? If the latter, you will need at least some basic high school math to start with.

Hi @digitechroshni,

From experience, I can say with 100% confidence that Python is much easier to learn. And in my understanding, Python is a language of choice for data work too. From my reading, it has some very mature and stable third-party libraries for data work.

I did have a look at some charting libraries. I just can't believe how easy it is to generate pie charts, bar charts etc. using those libraries.

Good luck and have fun.

Best regards,

...behai.

While learning Python is never a mistake, especially if you're doing data science, there are a few footnotes that might be handy to keep in mind as you're getting deeper in.

As you're doing data science, you'll probably be using something called "Jupyter notebook", which is actually pretty cool but for some reason doesn't really exist outside of the Python ecosystem. Enjoy it while you can, I guess :person_shrugging:

Python is really slow (I'm simplifying things here, imagine about twenty asterisks), so when you're dealing with large amounts of data you pretty much have to use packages (code other people have published to be used by anyone, also known as libraries) like numpy or pandas. That means you essentially need to relearn how to do basic stuff like adding numbers before you can get to the fancy stuff they're really great at. Not a huge deal, but...

Making that worse, the way Python deals with packages has a lot of weird historical baggage, and it's not great even when it all works out well. This is, unfortunately, pretty common among language ecosystems, but Rust is one of the newer language ecosystems pushing this forwards, so hopefully this does get better eventually.

As you get into larger and more complex programs, moving into the 10,000 lines of code range in my experience, having a simple, easy-to-use language stops being as useful, and having better support and guard-rails like types, testing, editor support etc. starts being much more useful, as the whole program stops fitting in your head and you start spending a lot more time trying to find where bugs are coming from.

Mostly the message here is that some bits will be really confusing and suck, but don't worry, that's normal.

I would say if you don't know how to program, don't worry about a programming language to start. The best first thing to do is language-agnostic: learn the concepts of programming! And I honestly recommend a simple environment, like DrRacket or something, where there is no syntax to think of or installation pain or editor setup and you can just play around, have fun, and program without constraints. And, most importantly, learn to think, to express your thoughts in code.

Once you learn how to think programmatically, then it makes sense to ask what tool to use to express your thoughts. And then, for data science, Python is probably best.

Hmm... I have never used Python enough to get really familiar with it. So I find myself wasting three working days trying to get PyTorch to even be usable on the Nvidia Jetson AGX Orin box that landed on my desk on Monday. It seems impossible to find a combination of this, that and the other that works together well enough to do anything, starting from the correct version of Python to use. So just now Python is way down on my list of easy-to-learn languages.

I know, I have a unique and odd situation perhaps. I'm sorry, I'm grumpy about it.

The Case for Data Science in Rust

The question of which language is easier to learn has seen active discussion. As a mature and established flagship, Python offers battle-tested and feature-rich data science libraries with diverse and detailed training resources. Keep in mind that easier in this context regards getting started. Trivial operations in Rust can be surprisingly verbose. Strings, for instance, turn out to be quite particular, when trusty old quote marks seem to do fine for Python. As for easier to master, not so fast! I might argue that the more complicated the project, the better off I am starting it in Rust.
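As a small illustration of the string point, here is a minimal sketch using only the standard library. The `greet` function is a hypothetical helper invented for this example. It shows the split between a borrowed `&str` and an owned `String`, a distinction Python never asks you to make.

```rust
// Hypothetical helper: borrows a string slice and returns a newly allocated String.
fn greet(name: &str) -> String {
    // `format!` builds an owned String from borrowed pieces.
    format!("Hello, {}!", name)
}

fn main() {
    let literal: &str = "Rust"; // borrowed string slice baked into the binary
    let owned: String = literal.to_string(); // owned, growable, heap-allocated

    // An owned String has to be borrowed (`&owned`) where a &str is expected.
    println!("{}", greet(literal));
    println!("{}", greet(&owned));

    // `+` consumes the left-hand String and borrows the right-hand &str.
    let combined = owned + " and " + "Python";
    println!("{}", combined);
}
```

After the last concatenation `owned` has been moved and can no longer be used, which is the kind of detail that feels particular at first.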

The question of which language you should learn is more complex. Asking this kind of question on the online user forum does not solicit a knee-jerk "use Rust, of course!"; in the community such comments almost seem like a faux pas. Instead, you are apt to read: What do your colleagues use? What is your use case? Hard to say for sure, isn't it? Instead, I will offer some insight into why I use Rust in my day job, and how I manage to be productive and useful to my employers as a practicing data scientist.

Improved Sanitization

Importing data into a project environment where you can inspect and analyze the contents is simpler and easier in Python, but the data does not come into the project in a sanitary condition. Assumptions that I want to make about years being integers and values not being missing may be violated, and thus sanitization is often a step that must precede analysis. If I forget to check for a certain condition during this step, I will likely only find out later when the program starts spitting out nonsense for results, if I ever notice at all.

Strong typing in Rust forces me to deal with sanitization at the outer boundary of the program, when I initially import the data with serde. Values that violate the strict type assigned to the field throw an error, allowing me to identify and inspect suspect values immediately and set those observations aside if needed. I run into a lot fewer surprises during the analysis stage, and can move forward with more confidence.
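The idea of rejecting bad values at the boundary can be sketched without any crates. The example below is a hand-rolled, standard-library-only stand-in, not serde's API. The `Observation` and `RowError` types and the `parse_row` function are hypothetical names chosen for illustration, and the input rows are made up.

```rust
use std::num::{ParseFloatError, ParseIntError};

// Hypothetical record: the year must be an integer and the value must be present.
#[derive(Debug, PartialEq)]
struct Observation {
    year: u16,
    value: f64,
}

// Everything that can go wrong with one row.
#[derive(Debug, PartialEq)]
enum RowError {
    MissingField(&'static str),
    BadYear(ParseIntError),
    BadValue(ParseFloatError),
}

// Parse one "year,value" line; any violation of the field types becomes an error.
fn parse_row(line: &str) -> Result<Observation, RowError> {
    let mut fields = line.split(',').map(str::trim);
    let year = fields
        .next()
        .filter(|s| !s.is_empty())
        .ok_or(RowError::MissingField("year"))?
        .parse::<u16>()
        .map_err(RowError::BadYear)?;
    let value = fields
        .next()
        .filter(|s| !s.is_empty())
        .ok_or(RowError::MissingField("value"))?
        .parse::<f64>()
        .map_err(RowError::BadValue)?;
    Ok(Observation { year, value })
}

fn main() {
    // Made-up input: one clean row, one typo in the year, one missing value.
    let raw = ["2019, 4.2", "20l9, 3.9", "2021,"];
    // Split rows into clean observations and rows set aside for inspection.
    let (clean, suspect): (Vec<_>, Vec<_>) = raw
        .iter()
        .map(|line| (*line, parse_row(line)))
        .partition(|(_, parsed)| parsed.is_ok());
    println!("clean: {}, suspect: {}", clean.len(), suspect.len());
    for (line, parsed) in suspect {
        println!("set aside {:?}: {:?}", line, parsed.unwrap_err());
    }
}
```

With serde the struct definition would carry most of this work through derive macros, but the outcome is the same in spirit: a row either becomes a well-typed value or an error you can inspect right away.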

First-Class Analysis Tools

Number crunching is the bread and butter of data science, and Rust offers first-class tooling for this purpose. The vast majority of analytic tasks can be easily tackled with the methods and types in the standard library. Vectors, HashMaps, and the iterator methods available for these types, like map and fold, will suffice for most numeric data. For more complex datasets, the ndarray crate organizes data into n-dimensional arrays and the nalgebra crate provides linear algebra methods. The uom crate is a must-have for unit conversion, especially here in the States where our legal descriptions still list lengths in chains and rods, and the journey to embrace metric has miles to go. The rayon crate makes parallel computing easy, and I reach for it often to leverage the extra cores on my machine.

While Rust is more complex to learn than Python, large and complex projects can be easier to implement in Rust. A good example is transport models for gravels in rivers and streams. These models are a toxic stew of hydrologic physical models, hypothetical field models and approximations of convenience, where pressures are divided by length and scaled by grain size into abstract ratios, and they are surprisingly easy to get wrong. The strictness of the compiler, excellent linter support, as well as testing and benchmarking integration (with criterion), provide a superior developer environment for tackling flow equations that traditionally have warranted a whole team from the Army Corps of Engineers.
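For a sense of what the standard library alone covers, here is a minimal sketch of summary statistics built from a slice, a HashMap, and the iterator methods. The `mean` and `mean_by_site` functions and the site readings are hypothetical, made up for this example.

```rust
use std::collections::HashMap;

// Mean of a slice; returns None for empty input instead of dividing by zero.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

// Group (site, reading) pairs by site, then reduce each group to its mean.
fn mean_by_site(readings: &[(&str, f64)]) -> HashMap<String, f64> {
    let mut groups: HashMap<String, Vec<f64>> = HashMap::new();
    for (site, value) in readings {
        groups.entry(site.to_string()).or_default().push(*value);
    }
    groups
        .into_iter()
        .filter_map(|(site, values)| mean(&values).map(|m| (site, m)))
        .collect()
}

fn main() {
    // Made-up readings, purely for illustration.
    let readings = [("upper", 2.0), ("upper", 4.0), ("lower", 10.0)];
    // `map` extracts the values and `fold` reduces them to the maximum.
    let max = readings
        .iter()
        .map(|(_, v)| *v)
        .fold(f64::NEG_INFINITY, f64::max);
    println!("max reading: {}", max);
    println!("means by site: {:?}", mean_by_site(&readings));
}
```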

For categorical data, pattern matching on enums is idiomatic and ergonomic. Creating a newtype around a String is usually sufficient to leverage the type-checking abilities of the compiler to help reinforce invariants in your data model. Enums and match are available in Python, but I found that the idioms of Rust did not translate easily into their Python counterparts, and this may be a case of trying to force a round peg into a square hole. For dates and times specifically, the type safety of Rust can make importing this kind of data a real headache, but the confidence and ease working with these types once they are deserialized is a strong motivator.
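A minimal sketch of both idioms follows. The `Cover` enum, the `PlotId` newtype, and the shade factors are hypothetical and made up for illustration; they do not come from any real dataset.

```rust
// Hypothetical categorical variable: the land-cover class of a survey plot.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cover {
    Forest,
    Wetland,
    Pasture,
}

// Newtype around String, so a plot ID cannot be mixed up with other text fields.
#[derive(Debug, Clone, PartialEq)]
struct PlotId(String);

// `match` must cover every variant; adding a new class later is a compile
// error here until the new case is handled. The numbers are invented.
fn shade_factor(cover: Cover) -> f64 {
    match cover {
        Cover::Forest => 0.9,
        Cover::Wetland => 0.4,
        Cover::Pasture => 0.1,
    }
}

// Accepts only a PlotId, not an arbitrary &str, as the identifier.
fn describe(id: &PlotId, cover: Cover) -> String {
    format!("{}: {:?} (shade {})", id.0, cover, shade_factor(cover))
}

fn main() {
    let id = PlotId("A-1".to_string());
    println!("{}", describe(&id, Cover::Wetland));
    println!("{}", describe(&id, Cover::Forest));
}
```

The newtype costs nothing at runtime, yet passing a street name where a plot ID is expected stops compiling.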

Generalizations About Linear Modeling

It took me a few tries to settle on the linregress crate for linear modeling, after experimenting with linfa and even rusty-machine. The linregress crate was easy for me to figure out, and I only use it for one thing: to plot OLS fit lines on graphs. In school they put a big emphasis on generalized linear modeling, mixed models, Bayesian analysis, and Python certainly offers some excellent packages for these types of analysis. I do not know the analogues in Rust because I have no need: 95% of the real-world problems I encounter call for summary statistics. A basic regression line suffices in the remaining 5%. Reaching for a fancy model can be a tactical error for the fledgling data scientist. A triple-spin kick may look impressive in a Hollywood film, but a straight punch is much more useful in a real fight.
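For readers who want to see what an OLS fit line involves, here is a minimal sketch of the textbook closed-form solution for a single predictor, written against the standard library only. It is not the linregress API; the `ols_fit` function is a hypothetical name for this example. The slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and the intercept is mean y minus slope times mean x.

```rust
// Ordinary least squares with one predictor. Returns (intercept, slope), or
// None when the inputs differ in length, have fewer than two points, or x
// has no variation (the slope is undefined in that case).
fn ols_fit(xs: &[f64], ys: &[f64]) -> Option<(f64, f64)> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let x_mean = xs.iter().sum::<f64>() / n;
    let y_mean = ys.iter().sum::<f64>() / n;
    // Sum of cross-deviations and sum of squared deviations in x.
    let sxy: f64 = xs
        .iter()
        .zip(ys)
        .map(|(x, y)| (x - x_mean) * (y - y_mean))
        .sum();
    let sxx: f64 = xs.iter().map(|x| (x - x_mean).powi(2)).sum();
    if sxx == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    Some((y_mean - slope * x_mean, slope))
}

fn main() {
    // Made-up points lying exactly on y = 1 + 2x.
    let xs = [1.0, 2.0, 3.0];
    let ys = [3.0, 5.0, 7.0];
    match ols_fit(&xs, &ys) {
        Some((intercept, slope)) => println!("y = {} + {} x", intercept, slope),
        None => println!("no fit possible"),
    }
}
```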

In some industries, like the social sciences, generalized linear modeling is still all the rage. Increasingly machine learning is overtaking the market in predictive modeling, and again I cannot speak to the Rust ecosystem here. The areas where I would like to apply machine learning (like using drones to assess the extent of riparian canopy cover, or debris-flow risk on hillslopes based on observations of landslide scars) are not well-funded in my locale. I spend a lot more time wrestling with data quality issues for the limited range of metrics that we currently collect than I do analyzing and visualizing this data, and this is often the limiting factor in the potential complexity of analysis that I can bring to bear on a problem.

Reconciling different physical address databases between government agencies is an example of the type of data that calls for simple analysis, but ends up being a complex task because of data quality issues. People write "Lane" instead of "Way" or "St" instead of "Rd", omit the "SW", misspell the road name, omit the unit identifier, enter the wrong zip code, etc... What is essentially as simple as a test for equality becomes an epic journey to map heterogeneous descriptions to specific enum variants for the particular address components in question.
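The mapping step can be sketched as follows. This is a deliberately tiny, hypothetical example covering only the street-type component; the `StreetType` enum, the accepted abbreviations, and the function names are assumptions made for illustration, not the tool described in this post.

```rust
// Hypothetical set of street types; a real address model would have many more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StreetType {
    Street,
    Road,
    Lane,
    Way,
    Avenue,
}

// Map the many spellings people type onto one variant; None means "unrecognized".
fn parse_street_type(raw: &str) -> Option<StreetType> {
    // Normalize whitespace, a trailing period, and letter case before matching.
    let cleaned = raw.trim().trim_end_matches('.').to_ascii_uppercase();
    match cleaned.as_str() {
        "ST" | "STREET" => Some(StreetType::Street),
        "RD" | "ROAD" => Some(StreetType::Road),
        "LN" | "LANE" => Some(StreetType::Lane),
        "WY" | "WAY" => Some(StreetType::Way),
        "AV" | "AVE" | "AVENUE" => Some(StreetType::Avenue),
        _ => None,
    }
}

// Once both sides are enum variants, comparison is a plain equality test.
fn same_street_type(a: &str, b: &str) -> bool {
    match (parse_street_type(a), parse_street_type(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

fn main() {
    println!("{:?}", parse_street_type(" st. "));
    println!("Rd vs ROAD: {}", same_street_type("Rd", "ROAD"));
    println!("Lane vs Way: {}", same_street_type("Lane", "Way"));
}
```

Unrecognized input comes back as `None` rather than silently matching something, so the odd cases surface for review instead of hiding in the results.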

In Rust, I am able to compare roughly 180 million addresses per second, which is sufficient to compare all the addresses in the City against all the addresses in the County in about seven seconds. Run as an iterative process, this tool enables us to reconcile differences in addresses across agencies, preventing confusion regarding their validity, and helping to avoid potential delays in emergency response times resulting from bad address information.

Visualization

The plotters crate has served me admirably for plotting and visualization, although I note that the poloto crate has recently overtaken it in popularity. With every plotting library I have used, including those from "easy" languages like Python and R, and regardless of syntax, I want to fiddle with the appearance, from the background color to the tick marks. Invariably the calling code for the graphing function looks like the dinner ticket for a guest with multiple allergies, with a bunch of substitutions and special instructions. Achieving a basic level of proficiency is more difficult in Rust, but producing high-quality graphics in any language requires a comparable level of skill, in my experience.

Collaboration

In my current field, yearly reporting requirements often mean that I end up running the same analysis periodically on a fresh set of data. Automating our analysis and reporting obligations is a significant part of my job. Being the wizard in the tower who can cast these incantations is a nice gig, but it is far better to share this magic with staff and allow them to produce reports by themselves. The clap crate makes it easy to create a command line interface, and theoretically this can enable staff to use your program, but if your staff are anything like mine, the command line is a scary place where danger lurks at every press of the return key. If you want staff to enthusiastically embrace your tools, consider adding a rudimentary user interface using egui, or one of the competing GUI frameworks, such as iced or dioxus. The GUI ecosystem in Rust is admittedly immature, but is more than robust enough to produce the simple interfaces necessary to make reports from CSV files. A few fields for specifying input and output file paths, along with a file picker and a submit button, are usually sufficient.

When I compare this to our collaboration process in Python, we are restricted to running scripts in a virtual environment, using .env files to set our personal profiles for the script. I do not know the equivalent tools in Python for producing a CLI or GUI, nor am I arguing that Python is somehow inadequate to the task of data science. Rust being my language of choice, and already having some degree of comfort with it, I have found it eminently suited to the data science needs in my day job, and relatively easy to extend and share with the non-technical colleagues in my department.

Maintainability

Survive long enough in this industry and you will open mystery code that is months or years old, written in a cryptic dialect obfuscated by layers of ill-advised abstraction, wondering what possessed you to write such unhelpful doc comments. Picking up old Rust projects is relatively painless, because the compiler is there to reinforce the integrity of the code base when adding new functionality. Although I apply the same intelligence and algorithmic reasoning to my Python projects, albeit with less language experience, reopening old Python projects has been anxiety-inducing on multiple occasions. The introduction of minor new features has caused regressions up and down the stack and unexpected knock-on effects. Although a Rust project may be more effort to stand up, I would much rather maintain a complex Rust project than an equivalent project in Python. Some of this anxiety I expect to dissipate as I work longer with Python, but the reasons I gravitated toward Rust in the first place, a strongly typed language backed by a borrow checker, also inform how I prefer to conduct science.

Conclusion

Having already taken the trouble to learn Rust for research, I find that with Rust I am more confident and productive than when I am using Python. Keep in mind that my preference for using Rust, for the standard elevator pitch reasons, colors this experience, as well as my comparative unfamiliarity with Python. After several happy experiences using the language, is it a subjective or objective decision to reach for it as my tool of choice?

You can use Jupyter Notebooks with F#, which you can interpret as a strongly typed performant Python. But, of course, it has a learning curve too, though not as steep as Rust's.

However, F#, being older, is more mature for data science tasks than Rust.

Are you going to be a member of some institution / college / school that will be training / teaching / certifying you as a data scientist? Your instructor will probably require you to use specific tools in your beginning instruction. In the long run you will probably need to learn how to use more than one tool.
So learn both until your professors tell you what you need for their courses.

Evcxr is a Rust kernel for Jupyter Notebook. It is going to be slower than the Python kernel for code snippets that execute quickly, though, because compiling and linking take some time in Rust.

Damnit you're just reminding me I need to get that cranelift pr landed