Yes. The point of the backwards scan is to find the start of a match. And no, you can’t just “update an index” in the general case because you don’t know when to do it. What I meant in my initial comment above about “something clever” is to figure out whether it’s possible to do something like that in a restricted set of cases, and if so, characterize said regexes and do the optimization. This is basically how all optimizations work in the regex crate: analyze and identify a special case (false positives aren’t allowed, while the false negative rate determines the effectiveness of the optimization), and then implement an optimization based on that analysis. Even the DFA itself fits into this framework.
RE2 works the same way. Intel’s Hyperscan also has special “start of match” logic. GNU grep’s DFA can’t find the starting position either, and instead defers to gnulib’s
re_search routine (how that works, I don’t know).
There is some research on tagged DFAs that might be interesting.