Aho-corasick 0.7.0 released (rewrite, simpler, more stuff)

#1

aho-corasick is one of my older crates, and it was long overdue for some attention. I ended up rewriting it from scratch, using a lot of lessons I’ve learned since I first wrote it, in particular from working on regex-automata.

Docs have been improved with more examples: https://docs.rs/aho-corasick

The new features are:

  • Support for leftmost-first and leftmost-longest matches.
  • Better stream search support, including the ability to do a search & replace on a stream with ~constant heap space.
  • Case-insensitive matching (ASCII only for now).
  • A lot more knobs for controlling the size of the Aho-Corasick automaton.
  • Some workload-specific performance improvements, mostly in terms of reducing space usage (e.g., by adding a byte-class map to reduce alphabet size). This can turn into a decent search time performance benefit if it means better use of your CPU’s cache.
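
To illustrate what the two leftmost match kinds mean, here is a naive sketch that uses only the standard library. It is not how the crate implements them, and the names `Kind` and `find_leftmost` are made up for this example; it only shows which match gets reported when several patterns match at the same leftmost position.

```rust
/// How to choose among patterns that match at the same leftmost position.
/// (Hypothetical type for illustration, not the crate's API.)
#[derive(Clone, Copy)]
enum Kind {
    /// Prefer the pattern that appears earliest in the pattern list.
    LeftmostFirst,
    /// Prefer the longest pattern; ties go to the earlier pattern.
    LeftmostLongest,
}

/// Naive quadratic search. Returns (pattern index, start, end) for the
/// leftmost match, or None if no pattern occurs in the haystack.
fn find_leftmost(
    patterns: &[&str],
    haystack: &str,
    kind: Kind,
) -> Option<(usize, usize, usize)> {
    let hay = haystack.as_bytes();
    for start in 0..hay.len() {
        // Best candidate at this start position: (pattern index, end).
        let mut best: Option<(usize, usize)> = None;
        for (i, pat) in patterns.iter().enumerate() {
            let p = pat.as_bytes();
            if p.is_empty() || !hay[start..].starts_with(p) {
                continue;
            }
            let end = start + p.len();
            match kind {
                Kind::LeftmostFirst => {
                    // The first pattern in list order wins immediately.
                    best = Some((i, end));
                    break;
                }
                Kind::LeftmostLongest => {
                    // Keep the candidate only if it is strictly longer.
                    if best.map_or(true, |(_, e)| end > e) {
                        best = Some((i, end));
                    }
                }
            }
        }
        if let Some((i, end)) = best {
            return Some((i, start, end));
        }
    }
    None
}
```

With the patterns `["Sam", "Samwise"]` and the haystack `"Samwise"`, this sketch reports `Sam` under leftmost-first (it comes first in the list) and `Samwise` under leftmost-longest.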

#2

Awesome work, as always! The new leftmost-first and leftmost-longest support is really cool. The explanation of the new match semantics in the docs is really clear.


#3

Great update!
