Help me reduce overhead of regex matching

So I think we have these options:

  1. Keep the API the same as today with minor breaking changes and an unavoidable performance hit.
  2. Duplicate the API as it is today so that there is a Sync type and a non-Sync type, with minor breaking changes. (e.g., @pixel's idea or something isomorphic)
  3. Drop Sync on Regex and force callers to clone Regex (cheaply) for each thread that uses it, with somewhat major breaking changes. (Putting a Regex in lazy_static! no longer works, since a static must be Sync.)
  4. Expose a Cache trait and make Regex polymorphic over it with a default type parameter, with minor breaking changes.
  5. Revamp the API in terms of a single trait with multiple implementations, which will break all uses of regex today, but breakages should be trivial to fix.
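
To make options 3 and 4 a bit more concrete, here is a rough sketch using only std. Everything in it is hypothetical: `Scratch`, `Cache`, `SyncCache`, `LocalCache` and `count_matches_in_threads` are names made up for illustration, and the "regex" is just a literal substring search standing in for a real compiled program. It is not how the crate is implemented, only one way the shape of the API could look.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical mutable scratch space used during a search.
#[derive(Default)]
struct Scratch {
    searches: usize,
}

/// Hypothetical trait (option 4): abstracts how scratch space is reached.
trait Cache: Default {
    fn with<R>(&self, f: impl FnOnce(&mut Scratch) -> R) -> R;
}

/// Lock-protected cache: keeps `Regex` Sync, pays for the lock on each search.
#[derive(Default)]
struct SyncCache(Mutex<Scratch>);

impl Cache for SyncCache {
    fn with<R>(&self, f: impl FnOnce(&mut Scratch) -> R) -> R {
        let mut guard = self.0.lock().unwrap();
        f(&mut *guard)
    }
}

/// Unsynchronized cache: no lock, but `RefCell` makes `Regex` non-Sync.
#[derive(Default)]
struct LocalCache(RefCell<Scratch>);

impl Cache for LocalCache {
    fn with<R>(&self, f: impl FnOnce(&mut Scratch) -> R) -> R {
        let mut guard = self.0.borrow_mut();
        f(&mut *guard)
    }
}

/// Stand-in for a compiled regex; a literal substring replaces the real program.
/// The default type parameter keeps plain `Regex` meaning the Sync flavor.
struct Regex<C: Cache = SyncCache> {
    program: Arc<str>,
    cache: C,
}

impl<C: Cache> Regex<C> {
    fn new(literal: &str) -> Self {
        Regex { program: Arc::from(literal), cache: C::default() }
    }

    fn is_match(&self, haystack: &str) -> bool {
        self.cache.with(|scratch| {
            scratch.searches += 1;
            haystack.contains(&*self.program)
        })
    }

    fn searches(&self) -> usize {
        self.cache.with(|scratch| scratch.searches)
    }
}

/// Cheap clone (option 3): share the immutable program, start a fresh cache.
impl<C: Cache> Clone for Regex<C> {
    fn clone(&self) -> Self {
        Regex { program: Arc::clone(&self.program), cache: C::default() }
    }
}

/// One clone per thread, so no search ever contends on a lock.
fn count_matches_in_threads(re: &Regex<LocalCache>, inputs: Vec<String>) -> usize {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let local = re.clone();
            thread::spawn(move || local.is_match(&input) as usize)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

With this shape, `let re: Regex = Regex::new("ab");` keeps today's Sync behavior, while `let re: Regex<LocalCache> = Regex::new("ab");` opts out of the lock. Note that default type parameters are not used for inference in expression position, so callers of the non-default flavor need a type annotation somewhere.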

"minor breaking changes" means small things that I want to fix regardless of what we do here. Only niche uses of regex should be impacted (e.g., folks who implement the Replacer trait explicitly).

"major breaking changes" means many or all uses of regex break.

I feel like there is no clearly correct choice here.