2nd project: regular expression searcher

After kind of finishing my first project, tetrii, and a short rest I've started the second, a regular expression searcher. It is another non-trivial problem that I've done before in another language, so I'm familiar with an algorithm to do it.

This is a much better choice - no glib, with its messy ownership and Rc<RefCell<_>> mess. I already learned a lot, and had to rewrite it once to get over the last hump (replacing the Boxes containing my Node structs with enums, in order to be able to recover the full struct rather than just the dyn trait). I'd be interested to find out if there is anything I am missing in the language that would let me improve the code.

Currently it is still in an early stage - it only has the RE parser to build the parse tree, there is no code to walk it yet. And the existing code has not been run, though it does build. Earlier versions ran well enough to confirm that the design, once implemented correctly, should work. It is located here. Thanks in advance for any help and advice.

Great to hear from you! Here a a few comments, might add more later.

Peekable.peeked could be a VecDeque to support both O(1) pop_front, push_front and push_back. But in general I like your design, passing around a &mut Peekable.

You have many instances of if (x.is_some()) { x.unwrap() }, which is a footgun and more idiomatically written with a match statement. Some people say that unwrap should only be used in "should never happen" situations. This also applies to things like if prev.node_type() == NODE_OR { prev.or_mut_ref().unwrap(); }, which can better be written using a match.

Your or_ref, or_mut_ref and and_mut_ref are a little strange because their return value is wrapped in an Option, but it can never be None. Instead, the function panics, which would be unexpected for a function returning Option. It would be more idiomatic to return None instead of panicking.

desc should propably be replaced by impl Debug for all tructs.

In fn reps, you can replace let (min, max): (usize, usize); match next {...} with let (min, max) = match {...} and remove the assignments from each branch.


Thank you, this is a big help.

The Options in the access function should have been removed. It isn't an API but is for internal use, and any wrong type is a programming error, so it should be a panic!(). The Options remain from earlier code. I've removed them.

I was all set to defend my use of "if prev_node() == "... but went back and looked, and found using match is actually cleaner and easier to read than the way I originally wrote it. Score one for Rust.

Is unwrap() a bad thing? I try to use it only after checking the value, usually by a "?" in the previous assignment line. I see where it can be a bad habit to get into, but it also seems like a good way to catch programming errors, like the panic! in the

The reason for desc() rather than Debug is to get the indenting right for nested nodes - for And and Or I want to indent the child output, and didn't see a way to do that using Debug. Implementing Debug for Node means that the node structures all do print debug output.

I thought about VecDeque, and actually put it in, but then backed it out. I like using an iterator rather than a vector, though in this case a regexp won't be long enough to make a difference. What finally decided me was progress() - I recall last time several infinite loops during debugging, and I like the idea of a sanity check to confirm that progress is being made. Still, it is a good reminder to check what is available in a language rather than writing everything myself

Checking and then unwrapping is the anti-pattern. If you always check that the Option is Some in the first place, then you should always be using pattern matching instead, because that can't accidentally start panicking if you refactor the code. Why choose the possible (unintended) panic when you could go for the safe option?

If you use unwrap() on something that can only ever be Some unless a bug happens, the unwrapping is fine, but then why would you check it explicitly beforehand? These two simply don't go well together.

The alternative representation of derived Debug impls does that. Either dbg!(expr) or println!("{:#?}", expr), etc. will correctly indent children of a data structure.

1 Like

The code is finished, if anyone would care to comment. It is supposed to be a complete project, including documentation and testing.

I did learn a lot while doing it, but there are some things I still need to learn. For one, I still have no real feel for marking lifetimes. My technique is (still) to let the compiler give errors and suggestions, and then see if I can figure out how to make them go away.

The code is here: GitHub - russellyoung/regexp-rust: Second Rust project: regular expression searcher . Besides the basic searching part it also includes a few API functions to access matched strings easily, and an interactive interface where REs and text strings can be stored and experimented on, including dumping out parse trees and walk paths.

I'd also like to thank those above who commented on earlier versions. So many times my designs were limited by my (lack of) knowledge, and hints or ideas from people with more experience led me to understanding I wouldn't have had otherwise. With no one in my area to talk with about it I only had online as a resource. StackOverflow helped on concrete problems and rust-lang for concepts.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.