I'd like to parse time expressions of the form hh:mm such as 3:15 or 12:30 to tuples (3, 15) or (12, 30), respectively. Here's the code I came up with (storing the Regex in a struct to avoid repeated compilation):
pub struct TimeParser {
re: Regex,
}
impl TimeParser {
pub fn new() -> Self {
let re = Regex::new("^([0-9]+):([0-5][0-9])$").unwrap();
TimeParser { re }
}
pub fn parse(&self, time: &str) -> Option<(usize, usize)> {
match self.re.captures(time) {
Some(matches) => {
if matches.len() == 3 {
let hours = match matches.get(1) {
Some(hours) => hours.as_str().parse().unwrap_or(0),
None => 0,
};
let minutes = match matches.get(2) {
Some(minutes) => minutes.as_str().parse().unwrap_or(0),
None => 0,
};
Some((hours, minutes))
} else {
None
}
}
None => None,
}
}
}
My haunch is that the method parse could be shorter; but I didn't figure out a way to make it more concise yet. Any ideas?
You can use the ? operator to simplify error handling:
pub fn parse(&self, time: &str) -> Option<(usize, usize)> {
let matches = self.re.captures(time)?;
let hours = matches.get(1)?.as_str().parse().ok()?;
let minutes = matches.get(2)?.as_str().parse().ok()?;
Some((hours, minutes))
}
But it would probably be better to fix the regex (as @akrauze suggests, above), and use .unwrap() and/or indexing syntax for the operations that can never fail:
pub fn parse(&self, time: &str) -> Option<(usize, usize)> {
let matches = self.re.captures(time)?;
let hours = matches[1].parse().unwrap();
let minutes = matches[2].parse().unwrap();
Some((hours, minutes))
}
Second @mbrubeck's code snippet is great, and I would totally accept that code in a review. However if you enjoy more functional/monadic coding style, you can do:
pub fn parse(&self, time: &str) -> Option<(usize, usize)> {
self.re
.captures(time)
.map(|matches| {
matches
.get(1)
.zip(matches.get(2))
.expect("regex asserts that this will not fail")
})
.map(|(hours, minutes)| {
(
hours
.as_str()
.parse()
.expect("regex asserts that this will not fail"),
minutes
.as_str()
.parse()
.expect("regex asserts that this will not fail"),
)
})
}
Main differences beetween this and your logic (other than imperative/functional approch) is that this doesn't need to handle None case of matches.get like you do (you should have probably use unwrap anyway, since you have checked length), and since we know that parsing cannot fail (that is the whole point of matching first with regex) I just called expect on parsing results.
As a final design note, representing hours and minutes as usize wrong. Use usize when you are working with pointer offsets/collection indexes (same thing if you thing about it), and in other cases use specific integer types. u8 should be fine for you.
Thanks for pointing this out! However, that was intentional, I just failed to communicate my real intention: I'd like to sum up billable hours, which are always in hours/minutes and not in days, so 123:55 is valid in my case.
I think this is the closest to what I would write, but the parsing step can fail! So this will panic for some user inputs.
This is how I would write it:
use regex::Regex;
pub struct TimeParser {
re: Regex,
}
impl TimeParser {
pub fn new() -> Self {
let re = Regex::new("^([0-9]+):([0-5][0-9])$").unwrap();
TimeParser { re }
}
pub fn parse(&self, time: &str) -> Option<(usize, usize)> {
let (_, [hours, minutes]) = self.re.captures(time)?.extract();
Some((hours.parse().ok()?, minutes.parse().ok()?))
}
}
fn main() {
let p = TimeParser::new();
assert_eq!(p.parse("123456789:59"), Some((123456789, 59)));
// In the original code, this returned `Some((0, 59))`,
// but `None` seems better.
assert_eq!(p.parse("99999999999999999999999:59"), None);
assert_eq!(p.parse("0:00"), Some((0, 0)));
assert_eq!(p.parse("00"), None);
assert_eq!(p.parse("0"), None);
assert_eq!(p.parse(""), None);
}
This makes use of Captures::extract, which is a relatively new addition (in the last year). It lets you get the values matching a capture group a little more concisely. And it will prevent you from getting it wrong at compile time. (It won't work if you have conditional capture groups or if the number of groups can change based on the match, for example.)
Only on pathological input where hours amount doesn't fit unto usize, right? If either hours or minutes are not an unsigned integer, the regex would not match in the first place.
I would actually write this differently. There's no need for a separate parser type; however, the bare tuple representation is unclear, a dedicated type would be nicer. It could then have a FromStr impl with a real error type: Playground
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct BilledTime {
hours: usize,
minutes: usize,
}
#[derive(Clone, Debug, Error)]
enum TimeParseError {
#[error("input does not match expected format")]
InvalidFormat,
#[error(transparent)]
ParseInt(#[from] ParseIntError),
}
impl FromStr for BilledTime {
type Err = TimeParseError;
fn from_str(input: &str) -> Result<Self, Self::Err> {
static RX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new("^(?P<hours>[01]?[0-9]|2[0-3]):(?P<minutes>[0-5][0-9]|60)$").unwrap()
});
let (_, [hours, minutes]) = RX.captures(input)
.ok_or(TimeParseError::InvalidFormat)?
.extract();
let hours: usize = hours.parse()?;
let minutes: usize = minutes.parse()?;
Ok(BilledTime { hours, minutes })
}
}