Performance difference between parsing input in two different ways?

I have a file that contains numbers in the format `x-y,z-w`, where x, y, z, and w are numbers.

My parser for this looks like this:

fn parse_1(input: &[u8]) -> Vec<((u8, u8), (u8, u8))> {
    input
        .split(|&c| c == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| {
            let mut nums = line.split(|&c| c == b',').flat_map(|set| {
                set.split(|&c| c == b'-').map(|num| {
                    num.get(1)
                        .map_or(num[0] - b'0', |c| (num[0] - b'0') * 10 + c - b'0')
                })
            });

            (
                (nums.next().unwrap(), nums.next().unwrap()),
                (nums.next().unwrap(), nums.next().unwrap()),
            )
        })
        .collect()
}

Playground Link

This code runs in around 20 microseconds for a 1000 line input.

Someone on Discord suggested that I rewrite it to parse the input sequentially. That code looks like this:

fn parse_2(input: &[u8]) -> Vec<((u8, u8), (u8, u8))> {
    input
        .split(|&c| c == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| {
            let mut line = line.splitn(2, |&c| c == b'-');
            let e1 = line.next().unwrap();

            let rest = line.next().unwrap();
            let mut line = rest.split(|&c| c == b',');
            let e2 = line.next().unwrap();

            let rest = line.next().unwrap();
            let mut line = rest.splitn(2, |&x| x == b'-');
            let e3 = line.next().unwrap();
            let e4 = line.next().unwrap();

            (
                (
                    e1.get(1)
                        .map_or(e1[0] - b'0', |c| ((e1[0] - b'0') * 10 + c - b'0')),
                    e2.get(1)
                        .map_or(e2[0] - b'0', |c| ((e2[0] - b'0') * 10 + c - b'0')),
                ),
                (
                    e3.get(1)
                        .map_or(e3[0] - b'0', |c| ((e3[0] - b'0') * 10 + c - b'0')),
                    e4.get(1)
                        .map_or(e4[0] - b'0', |c| ((e4[0] - b'0') * 10 + c - b'0')),
                ),
            )
        })
        .collect()
}

Playground Link

This code runs in around 13 microseconds. They also said that sequential parsers like this are generally a bit faster, but I didn't understand how or why. I tried looking at the godbolt output for the two functions, and it went over my head.

In the first version, the first thing you do for each line is find the comma character, which is in the middle of the line. So you have to do a bunch of reads and comparisons to find the comma, and then you immediately read those same bytes again to parse the first pair of numbers. You're basically duplicating work.

It's possible you could get a similar benefit from transforming the split on the newline character in the same way.
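Taken to its conclusion, a fully sequential version walks the input exactly once with a cursor, consuming digits and delimiters as it goes. Just a sketch to illustrate the idea (`parse_digits` and `parse_one_pass` are made-up names, not your code):

```rust
// Consume a run of ASCII digits starting at `*i`, then skip the single
// delimiter byte that follows ('-', ',', or '\n').
fn parse_digits(bytes: &[u8], i: &mut usize) -> u8 {
    let mut n = 0u8;
    while *i < bytes.len() && bytes[*i].is_ascii_digit() {
        n = n * 10 + (bytes[*i] - b'0');
        *i += 1;
    }
    *i += 1; // step over the delimiter (or past the end)
    n
}

// One forward pass over the whole input; every byte is read exactly once.
fn parse_one_pass(input: &[u8]) -> Vec<((u8, u8), (u8, u8))> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'\n' {
            i += 1; // skip empty lines
            continue;
        }
        let a = parse_digits(input, &mut i);
        let b = parse_digits(input, &mut i);
        let c = parse_digits(input, &mut i);
        let d = parse_digits(input, &mut i);
        out.push(((a, b), (c, d)));
    }
    out
}
```

Here there is no backtracking at all: no pass to find the delimiter followed by a second pass to parse the digits.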

2 Likes

Did you get these measurements from criterion or bencher?

If not, the numbers may be unreliable. Microbenchmarks are hard to implement properly. Simple timing of code run once will be very noisy. Results are often very misleading due to caches, power states of the CPU, and pre-emptive multitasking. If you run a benchmark for less than a few seconds and don't eliminate outliers, you may be measuring the wrong thing.
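To make the outlier point concrete, here's a rough sketch of repeating a measurement and taking the median, so one slow run (cache miss, context switch) can't dominate. This is only an illustration, not a replacement for criterion, which does all of this and more; `work()` is a stand-in for your parser:

```rust
use std::hint::black_box;
use std::time::Instant;

// Stand-in workload; in a real benchmark this would be parse_1 / parse_2.
fn work() -> u64 {
    (0..1_000u64).map(black_box).sum()
}

// Run the workload `iters` times per sample, collect several samples,
// and report the median per-iteration time in nanoseconds.
fn median_nanos_per_iter(iters: u32, samples: usize) -> u128 {
    let mut per_iter: Vec<u128> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iters {
                black_box(work());
            }
            start.elapsed().as_nanos() / iters as u128
        })
        .collect();
    per_iter.sort_unstable();
    per_iter[per_iter.len() / 2]
}
```

`black_box` keeps the optimizer from deleting the work being timed; a single cold first sample is also naturally discarded by the median.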

2 Likes
  • Try reserving capacity in the Vec (use extend instead of collect). The parsing is relatively simple, and you may be spending time just growing the vector.

  • Avoid unwrap. Panicking is a significant side effect, and the possibility of it inhibits optimizations. Error handling is actually faster! You don't even need an error type; ? works in functions returning Option.
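Putting both suggestions together, a sketch could look like this (the 1000 in `with_capacity` is assumed from your stated input size; in general you'd estimate it, e.g. by counting newlines):

```rust
// Returns None on malformed input instead of panicking via unwrap.
fn parse_checked(input: &[u8]) -> Option<Vec<((u8, u8), (u8, u8))>> {
    // Reserve up front so the Vec never reallocates while parsing.
    let mut out = Vec::with_capacity(1000);
    for line in input.split(|&c| c == b'\n').filter(|l| !l.is_empty()) {
        let mut nums = line
            .split(|&c| c == b',')
            .flat_map(|set| set.split(|&c| c == b'-'))
            .map(|num| {
                // One- or two-digit number; None if the field is empty.
                num.get(1).map_or(num.first().map(|&b| b - b'0'), |c| {
                    Some((num[0] - b'0') * 10 + c - b'0')
                })
            });
        // `?` works in functions returning Option, so no unwrap is needed.
        out.push((
            (nums.next()??, nums.next()??),
            (nums.next()??, nums.next()??),
        ));
    }
    Some(out)
}
```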

I got these numbers with `test::Bencher`. My benchmark code looks like this:

#[bench]
fn bench(t: &mut test::Bencher) {
    t.iter(|| {
        let v = parse_2(&[49, 57, 45, 52, 55, 44, 49, 56, 45, 57, 57]); // b"19-47,18-99"
        test::black_box(v);
    })
}

(As input, it was parsing the contents of a file, read with include_bytes!(), which was 1000 lines long.)

1 Like

Noted :+1:
I'll update it to use a vector with reserved capacity and I'll try to remove unwrap where I can. Thank you!

This makes a lot of sense and thank you for explaining it so well!