How should I make regex compile only once?


#1

minimal (mostly) example: https://play.rust-lang.org/?gist=d5a5d6328923271b1ac4a324d6249813&version=stable&mode=debug&edition=2015

I have a regex that I need to use in a loop (three lists of 200+ entries each), and that I may want to use in other functions, all collected in a single mod.rs file. According to the regex crate's documentation, it’s bad to recompile a regex on every iteration of a loop. How should I resolve this? Should I create it in a parent function and pass it as a parameter? Is there some equivalent of const for these sorts of “static” types? Should I retool my code to do the entire loop within a single function?

It is an anti-pattern to compile the same regular expression in a loop since compilation is typically expensive. (It takes anywhere from a few microseconds to a few milliseconds depending on the size of the regex.) Not only is compilation itself expensive, but this also prevents optimizations that reuse allocations internally to the matching engines.


#2

Put the regex into a lazy_static, so that it is compiled once on first use and reused afterwards.


#3

The lazy_static crate is the usual answer.


#4

I’m likely to use that regex in other places too. Do you know if a lazy_static can also be shared between functions?


#5

A lazy_static is only shared among uses of the same definition. To access it from multiple places, you’d have to define it once in a namespace they can all reach, for example at module level rather than inside one function.


#6

Would the top of main.rs work as a “shared namespace” for modules that have been broken out into separate xyz.rs files?

And thanks for all the help. I’m still learning how all these pieces fit together.


#7

Yes, your submodules can access items from the top of main.rs, which would be the crate root. You can either import it with use MY_REGEX; (or I think use crate::MY_REGEX; in the upcoming 2018 edition), or refer to it with a full path like ::MY_REGEX (which would be crate::MY_REGEX in the 2018 edition).


#8

Awesome, thanks for the help!