Aho-corasick 1.0 release is coming soon, review is welcome!

Feedback is welcome on the issue tracker: RFC: review API before 1.0 release · Issue #108 · BurntSushi/aho-corasick · GitHub

Even if you don't know what the heck "Aho-Corasick" means, I would still love to hear any feedback you might have.

2 Likes

I wrote this brief changelog summary in the issue thread, but I'll cross-post it here for visibility.


  • A new crate feature, std, has been added and is enabled by default. This makes the standard library optional and permits using aho-corasick in core/alloc-only contexts. But you do need alloc.
  • The top-level AhoCorasick will mostly behave the same as before. AhoCorasick::new is now fallible. The type is no longer generic. (In 0.7, you can choose the size of the state ID to use, but I did away with this and now just use u32 everywhere.)
  • APIs like AhoCorasick::find now accept an impl Into<Input>, where an Input provides a way to configure a search. There is an impl From<H: AsRef<[u8]>> for Input, so you can still call find with types like &str and &[u8]. But Input lets you set the bounds of the search, run an anchored search and so on. In 0.7, for example, in order to run an anchored search you had to configure it as a build-time parameter of the AhoCorasick automaton.
  • An AhoCorasick type can now be configured to support unanchored and anchored searches simultaneously using StartKind. It defaults to just unanchored searches.
  • The supports_overlapping and supports_stream routines were removed in favor of try_* search APIs. I could add the predicates back if needed, but they were principally useful to be able to guarantee that panics didn't happen when you called a search API with an unsupported configuration. But now there are fallible APIs for that.
  • There are three new sub-modules, automaton, dfa and nfa. These provide low level access to different types of Aho-Corasick automata, and are not expected to be used for most use cases. The key thing they unlock, by implementing the automaton::Automaton trait, is the ability to walk the automaton one state at a time. For example, you can now write your own search routines. This might unlock more interesting streaming use cases.
  • The packed submodule remains largely unchanged.

In terms of implementation, things largely remain the same with one very (IMO) awesome addition: there is a new type of Aho-Corasick automaton called a "contiguous NFA." The old NFA has been rebraned as a non-contiguous NFA. (And the old DFA is still called a DFA.) The contiguous NFA uses a lot less memory and is typically quite a bit faster than the non-contiguous NFA. It represents a very nice middle point between the slower old NFA and the piggish DFA. Indeed, most AhoCorasick configurations in the 1.0 release will use this new contiguous NFA.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.