Any regex builder crates with friendly APIs?

Are there any regex crates with an API similar to 0351-regex-builder in Swift:

import RegexBuilder

let emailPattern = Regex {
  let word = OneOrMore(.word)
  Capture {
    ZeroOrMore {
      word
      "."
    }
    word
  }
  "@"
  Capture {
    word
    OneOrMore {
      "."
      word
    }
  }
} // => Regex<(Substring, Substring, Substring)>

let email = "My email is my.name@mail.swift.org."
if let match = email.firstMatch(of: emailPattern) {
  let (wholeMatch, name, domain) = match.output
  // wholeMatch: "my.name@mail.swift.org"
  //       name: "my.name"
  //     domain: "mail.swift.org"
}

So we don't have to write raw regex strings.


It seems like you are actually looking for a PEG-like parser with a proper grammar to describe the structure. Try out Pest, for example.


I would probably use a parser combinator library like nom for this.

To literally answer the question, you can use the regex-syntax crate underlying the regex crate; it has an AST.


Cool! Is there an example showing how to use the AST?

As in, how to then compile and run it? I haven't tried, but from the docs it looks like your best option is to call to_string() on the Ast and then pass the result to Regex::new. That's annoying, given that regex internally has to parse the string back into that same Ast. Skipping the re-parse probably wouldn't save much anyway, given all the other processing that regex does.
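To make the "build a tree, print it, hand the string to the regex engine" idea concrete, here is a minimal std-only sketch. The `Pat` enum and `email_pattern` function are hypothetical names invented for illustration; this is not the regex-syntax `Ast`, just a tiny stand-in that renders to concrete syntax via `Display`. The last step (passing the string to `Regex::new` from the regex crate) is only described in a comment, since it needs the external crate.

```rust
use std::fmt;

/// Hypothetical pattern tree (NOT the regex-syntax `Ast`).
#[derive(Clone, Debug)]
enum Pat {
    Lit(String),          // literal text, escaped when rendered
    Word,                 // a word character, rendered as \w
    Seq(Vec<Pat>),        // concatenation
    ZeroOrMore(Box<Pat>), // greedy `*`
    OneOrMore(Box<Pat>),  // greedy `+`
    Capture(Box<Pat>),    // capturing group
}

impl fmt::Display for Pat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Pat::Lit(text) => {
                for c in text.chars() {
                    // Backslash-escape characters that are regex metacharacters.
                    if r"\.+*?()|[]{}^$#&-~".contains(c) {
                        write!(f, "\\")?;
                    }
                    write!(f, "{}", c)?;
                }
                Ok(())
            }
            Pat::Word => write!(f, r"\w"),
            Pat::Seq(parts) => parts.iter().try_for_each(|p| write!(f, "{}", p)),
            // Non-capturing groups keep repetition scoped to the whole sub-pattern.
            Pat::ZeroOrMore(p) => write!(f, "(?:{})*", p),
            Pat::OneOrMore(p) => write!(f, "(?:{})+", p),
            Pat::Capture(p) => write!(f, "({})", p),
        }
    }
}

/// Mirrors the structure of the Swift email example from the question.
fn email_pattern() -> Pat {
    let word = Pat::OneOrMore(Box::new(Pat::Word));
    let dot = Pat::Lit(".".to_string());
    Pat::Seq(vec![
        Pat::Capture(Box::new(Pat::Seq(vec![
            Pat::ZeroOrMore(Box::new(Pat::Seq(vec![word.clone(), dot.clone()]))),
            word.clone(),
        ]))),
        Pat::Lit("@".to_string()),
        Pat::Capture(Box::new(Pat::Seq(vec![
            word.clone(),
            Pat::OneOrMore(Box::new(Pat::Seq(vec![dot, word]))),
        ]))),
    ])
}

// With the regex crate as a dependency, the rendered string would then be
// compiled with something like `Regex::new(&email_pattern().to_string())`.
```

Calling `email_pattern().to_string()` yields the concrete pattern string, which the engine then has to parse again; that round trip is the annoyance described above.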

If you think regex-syntax is not easy to use, just write one

Haven't tried, but those examples look promising.


I would not recommend directly using regex-syntax for this. Especially not its Ast type. I'm certain it would be quite tedious. Its Hir type would be better, but still pretty tedious. It's likely you could define some helper routines without much effort to make it easier though.

Yes, this is intentional. The surface of compatibility is only at the concrete syntax. Otherwise regex-syntax would have to be a public dependency of regex, which would couple their evolution.

Indeed not.


Would you expect something a bit more quote!-like? E.g., a meta-syntax for injecting Rust-generated syntax into otherwise parsed regex syntax? Short of getting that fancy, I'm not sure how much better you can do than Ast / Hir and still have some level of safety, at least having only looked at the docs. For example, it looks like you could use parse to get any complex static chunks separately, then stitch them together with the dynamic parts?

I don't know. I just focus on the concrete syntax. I leave conveniences to others. But the human_regex project linked above looks more convenient to use than Hir for example.

I think you should try writing a non-trivial regex with Ast. I've actually done that because I wrote the test suite for regex-syntax. It is not fun. It's because the Ast concerns itself with a whole bunch of little details that you really do not want to care about. The Hir type does too. Who cares about the fact that you need to box stuff just so that you can write a recursive data type? I'll tell you who doesn't care about that: people writing regexes. :wink: There also aren't shortcuts for things that are defaults. For example, you always have to specify greediness when writing out a repetition operator with Hir.
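As a rough illustration of those two pain points (boxing and mandatory greediness), and of how small helper routines can hide them, here is a std-only sketch. The `Node` type and the helpers `lit`, `repeat`, `one_or_more`, `zero_or_more`, and `lazy` are all hypothetical; this is not the real `Hir` API, only a toy type with the same kind of detail.

```rust
/// Hypothetical mini-IR (NOT regex-syntax's `Hir`): repetition needs a Box
/// for the recursive field and an explicit `greedy` flag every time.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Literal(char),
    Concat(Vec<Node>),
    Repeat {
        min: u32,
        max: Option<u32>, // None means unbounded
        greedy: bool,
        sub: Box<Node>,
    },
}

/// Helper: turn a string into a concatenation of literal characters.
fn lit(text: &str) -> Node {
    Node::Concat(text.chars().map(Node::Literal).collect())
}

/// Helper: hides the Box and defaults to greedy matching.
fn repeat(min: u32, max: Option<u32>, sub: Node) -> Node {
    Node::Repeat { min, max, greedy: true, sub: Box::new(sub) }
}

fn one_or_more(sub: Node) -> Node {
    repeat(1, None, sub)
}

fn zero_or_more(sub: Node) -> Node {
    repeat(0, None, sub)
}

/// Opt out of the greedy default; leaves non-repetition nodes unchanged.
fn lazy(node: Node) -> Node {
    match node {
        Node::Repeat { min, max, sub, .. } => Node::Repeat { min, max, greedy: false, sub },
        other => other,
    }
}
```

Written by hand, `one_or_more(lit("ab"))` is a `Node::Repeat` with `min: 1`, `max: None`, `greedy: true`, and a boxed `Concat` of two literals; the helpers exist so that callers never spell that out.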

