Is it time to do another push on the Language Shootout?


#1

Now that 1.27 is live, with stable SIMD support, I was wondering if it was time to do another push on the language shootout game for Rust.

Looking at some of the places where Rust is slower than C, there are a few, like n-body, where it looks like the C code is doing SIMD directly but the Rust isn’t.

Has anyone tried to push these further recently?


#2

To save others time, I did try pushing on regex a few weeks ago. Enabling SIMD helps a little, but it isn’t enough. From what my memory tells me, the issue is that PCRE has lower match overhead, and this particular benchmark contains a lot of matches, so the overhead adds up. Fixing this probably requires something clever in the regex engine, but I don’t know what it is yet. (The core issue is that when the DFA finds a match, it has to rescan it backwards to find the beginning of the match. If I hack at the code and artificially remove that second scan—costing correctness—then the benchmarks even out.)

So… I’d say don’t waste time on the regex benchmark unless you want to really dig in.


#3

I’m afraid I still don’t quite grasp why this reverse scan is required. Is it just finding the starting index of the match? If so, is that somehow faster or more correct than just updating an index when the DFA is in a “possible start state” position? Or is that not even possible?


#4

Interesting to see that Fortran is able to run even faster on n-body.

The C code only uses SSE instructions (which can be inferred from the command line args), so I assume that ifort (Intel’s Fortran compiler) is using wider vectors somewhere.


#5

It may also help that Fortran was designed with auto-vectorization in mind. For example function array inputs are assumed not to alias by default, and the order of floating-point operations of equivalent priority is undefined.


#6

Yes. The point of the backwards scan is to find the start of a match. And no, you can’t just “update an index” in the general case because you don’t know when to do it. What I meant in my initial comment above about “something clever” is to figure out whether it’s possible to do something like that in a restricted set of cases, and if so, characterize said regexes and do the optimization. This is basically how all optimizations work in the regex crate: analyze and identify a special case (false positives aren’t allowed, while the false negative rate determines the effectiveness of the optimization), and then implement an optimization based on that analysis. Even the DFA itself fits into this framework.

RE2 works the same way. Intel’s Hyperscan also has special “start of match” logic. GNU grep’s DFA can’t find the starting position either, and instead defers to gnulib’s re_search routine (how that works, I don’t know).

There is some research on tagged DFAs that might be interesting.


#7

Okay, that makes sense. I had misremembered a previous conversation with you about that part of the implementation, then; I had thought you’d said that you didn’t know of any other production Regex engine with the backwards-search logic.