Blog: Counting Newlines Really Fast


#1

Wherein we get nerd-sniped and end up counting a lot of newline characters in very few instructions.


#2

I’m so jealous of people who understand what happens when you shift and multiply vectors of 1s and 0s enough to produce efficient algorithms with them :smiley:


#3

PS.: Note that this operation is likely not responsible for much of the runtime of either xi or ripgrep, but it was fun to pull off anyway.

At least for ripgrep, you’d be quite surprised. If you ask for line numbers on a large file, you can wind up more than doubling the time it takes to search.

So, really, I can’t wait to dig in and merge your PR. It will totally have an affect on some of the subtitle benchmarks, for example.

(This blog post was really helpful, because I totally wasn’t following your algorithm development otherwise. Thanks!)


#4

Can any of this work speed up std lines? (i.e. https://doc.rust-lang.org/std/primitive.str.html#method.lines or https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines)


#5

No, or, well, it’s already using memchr.

This specific routine is only for counting lines. It is in fact based on @bluss’s memchr, and exploits the fact that it only needs to count lines, it doesn’t actually need to find lines. (Which is required for iterating over lines.)

It’s kind of a niche use case, but when it comes up, it’s helpful to be fast.