[Modular: Community Spotlight: Outperforming Rust ⚙️ DNA sequence parsing benchmarks by 50% with Mojo 🔥](https://Mojo benchmark versus Rust, 50% faster)
Please don't shoot the messenger!
I received this via my Mojo mailing list today, passing it on for information, comment, tuning opportunities, etc.
PS not sure how to categorize this post, hence putting it in the Uncategorized section.
Admins: feel free to move this post to another forum section (Announcements? Community?).
Please let me know if I should avoid posting these kinds of articles in future.
I thought it may be of interest to Rust fans, such as myself.
Is this referring to these results?
The Rust and Mojo times look about the same to me.
Yes, and the result page I posted was a comparison to NeedleTail, I got the link from the article you posted. Do the numbers look like Rust is slower to you? Maybe I'm misreading something.
I wish I knew. I have no way to tell, since he does not post the complete Rust code. I have no idea where the 50% Mojo gain over Rust comes from. The link you posted looks broadly the same between Mojo and Rust, and even there I don't know how well his Rust solution is coded either.
There is no reason to avoid it. But whenever posting a claim that X is faster than Y I suggest at least trying to find the benchmark results that show the difference. There is nothing to discuss if it is just a claim with no data to back it up.
Agreed! The blog author claims 50% but provides no easily accessible information to verify the claim.
Yeah, I was surprised also. Performance comparisons are really difficult to do in a fair way with data to support it. It takes a ton of time, and even then there are usually mistakes. I can't second guess what the author was thinking, but I can sympathize with how difficult it is. Maybe they'll post here with more info.
It looks like they might've forgot to run in release mode, judging by the numbers. modified: README.md · MoSafi2/MojoFastTrim@530bffa · GitHub
It also doesn't really seem like needletail is written for maximum performance.
I see, looks like you're right.
You may want to put a comment on GitHub asking whether all the Rust benchmark runs were done in release mode.
It is a typical Rust rookie error.
I have to admit I forgot to do it myself a few times a couple of years ago, when my friend and I benchmarked C#/.NET against Rust. Embarrassing! LOL
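For anyone wanting to rule this mistake out in their own benchmark binaries, here is a minimal Rust sketch of a guard. It relies on `cfg!(debug_assertions)`, which is on by default in Cargo's dev profile and off in the release profile. The function names are hypothetical, and the check is only a heuristic: a profile can override `debug-assertions` independently of the optimization level.

```rust
/// Returns true when the binary was compiled without debug assertions,
/// which is the default for `cargo build --release`.
/// Assumption: the Cargo profile has not overridden `debug-assertions`,
/// so this is a heuristic for "optimized build", not a guarantee.
fn looks_like_release_build() -> bool {
    !cfg!(debug_assertions)
}

/// Hypothetical guard to call at the top of a benchmark binary.
/// Returns a warning message for debug builds and `None` otherwise.
fn warn_if_debug_build() -> Option<&'static str> {
    if looks_like_release_build() {
        None
    } else {
        Some("warning: debug build detected; rerun with `cargo run --release`")
    }
}
```

A benchmark could print the message from `warn_if_debug_build()` (or refuse to run) before timing anything.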
[GitHub - MoSafi2/MojoFastTrim](https://Setup Benchmark)
FWIW, according to the link above, the author says the Rust build and run were done in release mode, but that is only what the README text says.
They added that when they reduced the numbers, so I'm guessing that's when they realized. Would be nice if the article was fixed though.
Can someone enlighten me on what "DNA parsing" is? Is this just measuring dynamic programming performance? I.e. how well the compiler compiles programs that can be expressed as
- nested for loops
- that march along some input arrays,
- while writing entries to a matrix / tensor
It would also be good to know how much of a typical workload is constituted by "DNA parsing".
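To make the loop shape in that question concrete, here is a minimal Rust sketch of a classic dynamic-programming kernel over two byte sequences: Levenshtein edit distance. It is only an illustration of "nested loops marching along input arrays while filling a matrix". Nothing in this thread establishes that the benchmark in the blog post measures a kernel like this, and the function name is hypothetical.

```rust
/// Levenshtein edit distance between two byte sequences, computed with
/// the classic dynamic-programming table.
fn edit_distance(a: &[u8], b: &[u8]) -> usize {
    let rows = a.len() + 1;
    let cols = b.len() + 1;
    // table[i * cols + j] holds the distance between a[..i] and b[..j].
    let mut table = vec![0usize; rows * cols];

    // Base cases: turning a prefix into the empty sequence (or back)
    // costs one edit per element.
    for i in 0..rows {
        table[i * cols] = i;
    }
    for j in 0..cols {
        table[j] = j;
    }

    // Nested loops march along both inputs and fill the table.
    for i in 1..rows {
        for j in 1..cols {
            let substitution_cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let delete = table[(i - 1) * cols + j] + 1;
            let insert = table[i * cols + (j - 1)] + 1;
            let substitute = table[(i - 1) * cols + (j - 1)] + substitution_cost;
            table[i * cols + j] = delete.min(insert).min(substitute);
        }
    }

    // The bottom-right cell is the distance between the full inputs.
    table[rows * cols - 1]
}
```

For example, `edit_distance(b"ACGT", b"AGT")` is 1, since deleting the `C` is enough.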
I saw that post - they get a little carried away with their benchmarks - it's going to bite them.
In the blog post he mentioned SIMD - that might be another difference, if mojo used SIMD and the rust tool didn't.
A big thank you to those Rust folks who replied on Github (links are elsewhere above for those interested in details!!). There is more there, worth a read, IMHO.
I am not familiar with Mojo, but looking at its code, it does not look any easier to read or understand than an idiomatic and performant Rust solution.
I certainly would like to see more GPU support added in Rust to compete in AI/ML area, something Mojo targets.
It looks like they might've forgot to run in release mode
The blog post used `--release` for its Rust numbers. The confusion comes from the 50% performance win being specific to running on an M2 Mac. On an x86_64 Linux machine, the results are more or less equivalent.
My assumption is that the Rust compiler isn't producing SIMD instructions. I remember reading that by default the Rust compiler will not produce SIMD for ARM targets. You have to enable them by either enabling the correct CPU feature or targeting the native CPU. Maybe it should be smarter by enabling features based on the OS as well. All macOS ARM CPUs would have SIMD support. But maybe it already does this and it is something else.
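One way to check that assumption rather than guess is to ask the compiler which SIMD target features it considered enabled for the build. The sketch below uses `cfg!(target_feature = "...")`, which is evaluated at compile time. The function name is hypothetical and the three features listed are just a sample, not an exhaustive list.

```rust
/// Returns the names of a few SIMD-related target features that were
/// enabled at compile time for this build.
/// Assumption: only "neon", "sse2" and "avx2" are probed here; a real
/// check would list whichever features matter for the code in question.
fn enabled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    features
}
```

Printing the result once for a plain `--release` build and once with `RUSTFLAGS="-C target-cpu=native"` shows whether targeting the native CPU changes what the compiler is allowed to emit on a given machine. It does not show whether the optimizer actually vectorized a particular loop; that needs a look at the generated assembly.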
From my experience, all benchmarks done to compare two different programming languages are useless.
Speed depends very much on many different things.
- How well the code was written in language A and language B. If the code in language A was written better than the code in B, A will be faster, but that has nothing to do with the language, only with the programmer's code.
- Has language A been optimized for the particular use case by its authors? If the language authors optimized and added code to do some of the work (like DNA sequence parsing), so that you only need minimal code to run it, and you then try the same code in language B, naturally language B will be slower, because there is a lot of code behind language A doing most of the work. If you wrote the same code in language B, or used a library that did it for language B, then language B might be faster. In the end, the computer runs not the language you used but the machine code it was compiled to.
In the end it does not matter which language is faster. The important point is how well you as a programmer understand the language and whether you can write fast code using it.
If you try to write code in a language you don't understand, using an algorithm you don't understand (let's say in C), the end result may be slower than code written in Python.
That statement gives me a headache. I know what you mean: at the bottom of the pile we have only the machine and the operations it supports.
But in some way, for a language that is compiled to bytecode or otherwise interpreted, the machine is not running instructions that implement our program; it's running instructions to interpret the source of our program, or to emulate whatever architecture the bytecode describes.
Interpreters aside, if a language assumes the operation of a garbage collector running in the background (as it were) then it is necessarily having to run a lot of code (the garbage collector) that does not appear in my program.
All that aside, a language may mandate that accessing arrays out of bounds is not allowed, thus introducing bounds checks that do not appear in my algorithm. Other languages may not have such restrictions.
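As a small Rust illustration of that point (a sketch with hypothetical function names, not a benchmark): indexing a slice with `values[i]` carries a language-mandated bounds check, an iterator expresses the same sum without any index to check, and `get` makes the check explicit in the source.

```rust
/// Sums a slice using explicit indexing. Each `values[i]` is bounds
/// checked by the language, although the optimizer can often prove the
/// index is in range and remove the check.
fn sum_indexed(values: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// Computes the same sum with an iterator, so there is no index and
/// therefore no per-element bounds check to perform.
fn sum_iter(values: &[u64]) -> u64 {
    values.iter().sum()
}

/// Makes the bounds check visible: an out-of-range index yields `None`
/// instead of a panic.
fn checked_get(values: &[u64], index: usize) -> Option<u64> {
    values.get(index).copied()
}
```

Both sum functions return the same result. Whether the indexed version is measurably slower depends on what the optimizer manages to prove for the loop in question.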
Putting all this together, it seems to me that the syntax and semantics of a language do influence the performance that can be achieved, all things being equal, like having programmers who understand what they are doing and implementing the same algorithm.
I get the feeling that Mojo has gone out of its way to make parallel execution over arrays (SIMD) much easier, supporting it in the language itself (rather than in library functions that ultimately depend on assembler or suchlike). I don't know.