I have a file that contains numbers in a format like x-y,z-w, where x, y, z and w are the numbers.
My first parser for this looked like this:
fn parse_1(input: &[u8]) -> Vec<((u8, u8), (u8, u8))> {
    input
        .split(|&c| c == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| {
            // Split on ',' and then on '-', flattening into one stream of four numbers.
            let mut nums = line.split(|&c| c == b',').flat_map(|set| {
                set.split(|&c| c == b'-').map(|num| {
                    // Each number is one or two ASCII digits.
                    num.get(1)
                        .map_or(num[0] - b'0', |c| (num[0] - b'0') * 10 + c - b'0')
                })
            });
            (
                (nums.next().unwrap(), nums.next().unwrap()),
                (nums.next().unwrap(), nums.next().unwrap()),
            )
        })
        .collect()
}
This code runs in around 20 microseconds for a 1000-line input.
Someone on Discord suggested that I rewrite it to parse the input sequentially. That code looks like this:
fn parse_2(input: &[u8]) -> Vec<((u8, u8), (u8, u8))> {
    input
        .split(|&c| c == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| {
            // Peel off one field at a time, left to right: e1 '-' e2 ',' e3 '-' e4.
            let mut line = line.splitn(2, |&c| c == b'-');
            let e1 = line.next().unwrap();
            let rest = line.next().unwrap();
            let mut line = rest.split(|&c| c == b',');
            let e2 = line.next().unwrap();
            let rest = line.next().unwrap();
            let mut line = rest.splitn(2, |&x| x == b'-');
            let e3 = line.next().unwrap();
            let e4 = line.next().unwrap();
            // Each field is one or two ASCII digits.
            (
                (
                    e1.get(1)
                        .map_or(e1[0] - b'0', |c| ((e1[0] - b'0') * 10 + c - b'0')),
                    e2.get(1)
                        .map_or(e2[0] - b'0', |c| ((e2[0] - b'0') * 10 + c - b'0')),
                ),
                (
                    e3.get(1)
                        .map_or(e3[0] - b'0', |c| ((e3[0] - b'0') * 10 + c - b'0')),
                    e4.get(1)
                        .map_or(e4[0] - b'0', |c| ((e4[0] - b'0') * 10 + c - b'0')),
                ),
            )
        })
        .collect()
}
This code runs in around 13 microseconds. They also said that sequential parsers like this are generally a bit faster. I don't understand how or why. I tried looking at the Godbolt output for the two functions, but it went over my head.
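To make the idea of "sequential" parsing concrete, here is a minimal sketch of what it might look like taken one step further: a single pass over the bytes, with no nested splits or iterator adapters. The name parse_seq is hypothetical and the function has not been benchmarked against the two versions above, so it is only an illustration of the technique, not a claim about speed. It assumes well-formed input (every non-empty line is x-y,z-w) and that every number fits in a u8.

```rust
/// Hypothetical single-pass parser (illustrative only, not benchmarked).
/// Assumes each non-empty line is `x-y,z-w` and every number fits in a u8.
fn parse_seq(input: &[u8]) -> Vec<((u8, u8), (u8, u8))> {
    let mut out = Vec::new();
    let mut nums = [0u8; 4]; // the four numbers of the current line
    let mut idx = 0; // which of the four numbers is being read
    let mut cur = 0u8; // value accumulated from the digits seen so far
    let mut seen_digit = false; // distinguishes real lines from empty ones

    for &b in input {
        match b {
            b'0'..=b'9' => {
                // Shift the previous digits left and append the new one.
                cur = cur * 10 + (b - b'0');
                seen_digit = true;
            }
            b'-' | b',' => {
                // A separator ends the current number.
                nums[idx] = cur;
                idx += 1;
                cur = 0;
                seen_digit = false;
            }
            b'\n' => {
                // A newline ends the fourth number; empty lines are skipped.
                if seen_digit {
                    nums[idx] = cur;
                    out.push(((nums[0], nums[1]), (nums[2], nums[3])));
                }
                idx = 0;
                cur = 0;
                seen_digit = false;
            }
            _ => {} // ignore anything else, e.g. '\r'
        }
    }

    // Handle a final line that has no trailing newline.
    if seen_digit {
        nums[idx] = cur;
        out.push(((nums[0], nums[1]), (nums[2], nums[3])));
    }
    out
}
```

Under those assumptions, calling parse_seq(b"2-4,6-8\n") should give one entry, ((2, 4), (6, 8)). Unlike parse_1 and parse_2, this sketch accepts numbers of any length that fit in a u8, and it will panic on a line with more than three separators.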