I want a regular expression API that is as compliant as possible with JavaScript's RegExp and that implements std::str::pattern::Pattern, with a few allowances:
\u{...} without needing to specify u or v (unicodeSets) flags
regex currently lacks some features, which could be added, but I'm not sure regex is implemented in a compliant way. I've used regex a little, including for a replace with a callback receiving captures.
I got interested in regress, but I discovered it was the author's first "Rust" project and I also saw it doesn't implement Pattern. With Pattern, it could be used with str's native methods...
AFAIK regex uses different syntax for backreferences too. I just want to expose a regular expression API in a framework that is the same as JS (compatible, I mean).
Err, I guess I'll really end up using regex and wrapping it in my own type. I think having JS compatibility in my framework doesn't matter after all.
So first of all, regress is certainly not ridiculousfish's first project. I have no idea where you got that from. ridiculousfish is a credible and trustworthy open source contributor.
Secondly, I'm the author of regex. I understand it isn't compatible with ECMAScript. What I'm asking you is why you need compatibility with ECMAScript regexes.
Thirdly, the Pattern trait is nightly only. Implementing it doesn't give you any new expressive power. It's at best a mild convenience. Not using a regex engine and implementing your own just because of the Pattern trait doesn't strike me as the wisest of decisions.
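To illustrate the "mild convenience" point: a wrapper type can offer its own find and split operations on stable Rust without implementing Pattern. The following is only a minimal sketch. The names `Finder`, `Literal`, and `split_with` are hypothetical, and a literal-substring matcher stands in for a real regex engine.

```rust
/// Hypothetical trait: anything that can locate its next match in a haystack.
/// A wrapper around a real regex engine could implement this on stable Rust.
trait Finder {
    /// Returns the byte range (start, end) of the first match at or after `from`.
    fn find_at(&self, haystack: &str, from: usize) -> Option<(usize, usize)>;
}

/// Stand-in matcher for illustration only: matches a literal substring.
struct Literal<'a>(&'a str);

impl Finder for Literal<'_> {
    fn find_at(&self, haystack: &str, from: usize) -> Option<(usize, usize)> {
        haystack[from..]
            .find(self.0)
            .map(|i| (from + i, from + i + self.0.len()))
    }
}

/// Splits `haystack` on every match, much like `str::split` does for a `Pattern`.
fn split_with<'h, F: Finder>(finder: &F, haystack: &'h str) -> Vec<&'h str> {
    let mut parts = Vec::new();
    let mut last = 0;
    while let Some((start, end)) = finder.find_at(haystack, last) {
        if end == start {
            break; // an empty match would never advance; this sketch just stops
        }
        parts.push(&haystack[last..start]);
        last = end;
    }
    parts.push(&haystack[last..]);
    parts
}
```

Calling `split_with(&Literal(", "), "a, b, c")` yields the three pieces `a`, `b`, and `c`. The regex crate already exposes its own `find`, `split`, and `replace_all` methods in this style, so nothing is lost by not going through str's methods.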
Fourthly, the regex crate supports both (?<name>re) and (?P<name>re) syntaxes.
Right. It probably doesn't. It usually doesn't. Not always. If you don't have a specific need to be ECMAScript compliant, then just use the regex crate.
I guess it's their first Rust code, but definitely not their first project. It's a high quality crate with a good implementation.
Pattern is still unstable and AFAIK has no path to stability. Again, I'd recommend staying away from it. It will just make code harder to migrate to stable Rust if and when you do that in the future. You really should only be using unstable Rust features if you have an extremely compelling reason to do so. Using the Pattern trait is not a compelling reason.
Many developers work with multiple languages, e.g. a Rust backend with HTML/JS user interface. Having distinct syntaxes in each language can get pretty confusing.
PCRE is another widely used standard. Perhaps not officially standardized anywhere, but it is built into PHP, Perl, Apache, Nginx, R, sed and grep, and a Python library exists, so it is certainly a de facto standard.
It's not a standard or specified anywhere. And PCRE and Perl have a large swath of differences, despite the fact that PCRE is called "Perl compatible." Moreover, PCRE, POSIX and ECMAScript all fundamentally share the most important downside: their worst search times are exponential because none of them implement regular languages. They all require some feature (like back-references or look-around) that nobody knows how to implement efficiently. (EREs in POSIX don't require back-references, but I believe most implementations provide them. Which further shows that specifications are not enough to give you a guarantee of uniformity.)
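To make the exponential worst case concrete, here is a small sketch of a naive backtracking matcher. It is not how PCRE or any particular engine is implemented (real engines add many optimizations). It hard-codes the illustrative pattern `(a|aa)*b`, and the function names are made up for this example. Engines that support features like back-references are typically built on backtracking of this general shape.

```rust
/// Naive backtracking matcher for the fixed pattern `(a|aa)*b`, anchored at
/// byte offset `i`. `steps` counts how many times the matcher is entered.
fn backtrack(s: &[u8], i: usize, steps: &mut u64) -> bool {
    *steps += 1;
    let is_a = |j: usize| s.get(j) == Some(&b'a');
    // Alternative 1: consume `a`, then go around the loop again.
    if is_a(i) && backtrack(s, i + 1, steps) {
        return true;
    }
    // Alternative 2: consume `aa`, then go around the loop again.
    if is_a(i) && is_a(i + 1) && backtrack(s, i + 2, steps) {
        return true;
    }
    // Leave the loop: the next byte must be `b`.
    s.get(i) == Some(&b'b')
}

/// Runs the matcher on `n` copies of `a` (which can never match) and reports
/// the result together with the number of steps taken.
fn run_on_as(n: usize) -> (bool, u64) {
    let input = vec![b'a'; n];
    let mut steps = 0;
    let matched = backtrack(&input, 0, &mut steps);
    (matched, steps)
}
```

On an input of only `a`s the match must fail, but the matcher first tries every way of carving the input into `a` and `aa` pieces. The step count follows a Fibonacci-style recurrence, so each additional input byte multiplies the work by roughly 1.6.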
And grep does not have PCRE. It has POSIX regexes. Some implementations of grep have PCRE that you can opt into, such as GNU grep. But GNU grep has three distinct flavors of regexes built into it. (BREs, EREs and PCRE.)
The only reason you see PCRE as a "de facto" standard is because its implementation is re-used in several places. It's not because people have re-implemented it in various places.
None of PCRE, POSIX or ECMAScript provide the requirements necessary to implement regex engines that aren't susceptible to ReDoS. You could implement EREs strictly, but oops, that rules out environments that require UTF-16 such as Java and C# because none of POSIX can support UTF-16.
There is only one reasonable solution at this point: don't assume that all regex flavors are the same. Thankfully, most popular regex flavors share a lot more similarities than differences.
You seem to also be missing my most important point: nobody here has suggested adding a new regex flavor. The only suggestions have been for re-using existing flavors.
$ man grep | grep -A2 -- '-P,'
-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs). [...]
Obviously, this qualifies as "built into grep".
In the open source world, an implementation is even better than a committee-standard. Source code means there is no need to re-implement anything. Just link and enjoy proven code with guaranteed compatibility. Behavior is clearly defined, it's well documented, widely adopted, developers know it, developers use it, great!
Regarding the overall post: that's exactly what I feared. First you say you don't want to invent yet another flavor, then you find lots of excuses why every existing flavor would be flawed. Only possible outcome is, well, another flavor.
You quoted me out of context. I also said, "Some implementations of grep have PCRE that you can opt into, such as GNU grep."
ReDoS isn't an "excuse." And you're wildly misreading what I said. I didn't say "every existing flavor was flawed." I said that PCRE and the only two popular specifications for regex (POSIX and ECMAScript) had fundamental flaws.
The RE2 flavor was released in 2010. The regex crate and the standard library regexp package in Go both closely adhere to the syntax of RE2. But just like Perl and PCRE aren't the same, RE2 and the regex crate aren't the same either. The RE2 flavor does not require features that nobody knows how to implement efficiently, and thus, it is a good mitigation against ReDoS.
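As a rough illustration of why a flavor without those features can be matched in bounded time, the sketch below tracks every NFA state the pattern could be in and advances them all in lockstep, one input byte at a time. This is not the actual implementation of RE2 or the regex crate, only the general idea. The pattern `(a|aa)*b` is hard-coded and the name `nfa_match` is made up for this example.

```rust
/// Lockstep NFA simulation for the fixed pattern `(a|aa)*b`, anchored at the
/// start of `input`. Returns whether a match was found and how many input
/// bytes were examined.
fn nfa_match(input: &[u8]) -> (bool, u64) {
    const LOOP_HEAD: u8 = 0b01; // at the start of another `(a|aa)` iteration
    const HALF_AA: u8 = 0b10; // consumed the first `a` of the `aa` alternative
    let mut states = LOOP_HEAD;
    let mut steps = 0;
    for &byte in input {
        steps += 1;
        let mut next = 0u8;
        if states & LOOP_HEAD != 0 {
            match byte {
                // `a` either completes alternative `a` or starts `aa`.
                b'a' => next |= LOOP_HEAD | HALF_AA,
                // `b` right after the loop completes the whole pattern.
                b'b' => return (true, steps),
                _ => {}
            }
        }
        if states & HALF_AA != 0 && byte == b'a' {
            next |= LOOP_HEAD; // second `a` completes alternative `aa`
        }
        if next == 0 {
            return (false, steps); // no live states left: no match possible
        }
        states = next;
    }
    (false, steps)
}
```

Each input byte is examined exactly once and the set of live states never exceeds the (fixed) number of NFA states, so the work is proportional to the input length no matter how ambiguous the pattern is. A thousand `a`s with no `b` fail after a thousand steps.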
I personally feel like this conversation with you has no back-and-forth at all, so I'm going to end it here and mute this thread.
Once again, nobody here has suggested adding a new flavor.
I maintain the pcre2 crate wrapper for use inside of ripgrep.
I just wrote an RE parser in Rust, though it was an exercise and has never been used by anyone but me. It works, and the frontend/backend design allows adding different pattern engines (it currently has one that is pretty standard emacs regexp and one that is a new design that seems easier to me). It's at GitHub - russellyoung/regexp-rust: Second Rust project: regular expression searcher. The source code is all there, though as a first (well, second) Rust project it could probably be improved.