Is there a problem with my benchmark code? (Execution seems too fast)

I wrote a benchmark to test the speed of generating simple web pages using the Sailfish template engine, pulldown_cmark, and TOML for frontmatter data.

I can hardly believe the code is this fast, so I'm wondering if I made a mistake somewhere.

The benchmark code is:

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use plugin_system::{control, MARKDOWN};

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("control 100.000", |b| {
        b.iter(|| control::run(black_box(MARKDOWN), black_box(100_000)))
    })
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

And the benchmarked code is as follows:

use extract_frontmatter::Extractor;
use pulldown_cmark::{html, Parser};
use sailfish::TemplateOnce;
use toml::Value;

#[derive(TemplateOnce)]
#[template(path = "benchmark.stpl")]
struct Template<'a> {
    title: &'a str,
    author: &'a str,
    description: &'a str,
    body: String,
}

pub fn run(markdown: &'static str, times: i32) {
    for _ in 0..times {
        generate_page(markdown)
    }
}

fn generate_page(markdown: &'static str) {
    let mut fm = Extractor::new(markdown);
    fm.select_by_terminator("---");
    let (front_matter, document): (Vec<&str>, &str) = fm.split();

    let fm_str = front_matter.join("\n");
    let mut fm_val: Value = fm_str.parse().unwrap();
    let fm_tbl = fm_val.as_table_mut().unwrap();

    let title = fm_tbl.remove("title").unwrap();
    let author = fm_tbl.remove("author").unwrap();
    let description = fm_tbl.remove("description").unwrap();

    let parser = Parser::new(document);

    let mut html = String::new();
    html::push_html(&mut html, parser);

    let ctx = Template {
        title: title.as_str().unwrap(),
        author: author.as_str().unwrap(),
        description: description.as_str().unwrap(),
        body: html,
    };

    ctx.render_once().unwrap();
}

MARKDOWN is a small Markdown document. I use a &'static str instead of loading files from the file system to exclude IO time.

The benchmark result is:

Benchmarking control 100.000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 84.6s, or reduce sample count to 10.
control 100.000         time:   [841.15 ms 841.96 ms 842.99 ms]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

If I understand the output correctly, 'control 100.000' generates 100,000 HTML documents in roughly 842 ms.

Could this be correct?

Normally when using criterion, you just ask it to benchmark a single iteration, since criterion handles the looping for you to get an accurate time. Can you compare with the time to just generate one HTML document?

Also, it looks like the compiler could optimize away the run function entirely, right? Since it doesn't do anything?
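For reference, the single-iteration setup suggested here could look like the following sketch. It assumes the same `plugin_system` crate layout as the original post and simply passes `1` as the count, letting criterion choose how often to call the closure (this is a `cargo bench` harness, not a standalone program):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use plugin_system::{control, MARKDOWN};

pub fn single_page(c: &mut Criterion) {
    c.bench_function("control 1 page", |b| {
        // One page per iteration; criterion handles the repetition
        // and statistics itself.
        b.iter(|| control::run(black_box(MARKDOWN), black_box(1)))
    });
}

criterion_group!(benches, single_page);
criterion_main!(benches);
```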


To prevent the compiler from optimizing the function away, you should return the resulting String from your run function. b.iter(...) will use black_box internally to make sure the output is actually calculated.
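The same idea can be illustrated with `std::hint::black_box` from the standard library (stable since Rust 1.66): feeding the result through a black box forces the optimizer to actually compute it. The `generate_page` below is a hypothetical stand-in for the real rendering code, not the actual implementation:

```rust
use std::hint::black_box;

// Stand-in for the real generate_page: in the real code this parses
// frontmatter, renders Markdown, and fills a template.
fn generate_page(markdown: &str) -> usize {
    markdown.len() * 2
}

fn run(markdown: &str, times: i32) -> usize {
    let mut bytes = 0;
    for _ in 0..times {
        // Summing the results gives the loop an observable effect.
        bytes += generate_page(markdown);
    }
    bytes
}

fn main() {
    // black_box marks the value as "used", so the whole computation
    // cannot be removed as dead code.
    let total = black_box(run(black_box("# hello"), 100));
    println!("{total}");
}
```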


Normally when using criterion, you just ask it to benchmark a single iteration, since criterion handles the looping for you to get an accurate time. Can you compare with the time to just generate one HTML document?

@droundy: Thank you. I should have mentioned that I'm benchmarking a plugin system I'm building (similar to a middleware system or a build tool; basically, something that processes a stream of data). The code in the first post is the control I'm comparing my plugin system against, so I can't remove the internal iteration.

Also, it looks like the compiler could optimize away the run function entirely, right? Since it doesn't do anything?

To prevent the compiler from optimizing the function away, you should return the resulting String from your run function. b.iter(...) will use black_box internally to make sure the output is actually calculated.

@droundy and @Riateche: I now let generate_page return the length of the generated string:

fn generate_page(markdown: &'static str) -> usize {
    /* ... */
    ctx.render_once().unwrap().len()
}

And run adds everything together and returns the result:

pub fn run(markdown: &'static str, times: i32) -> usize {
    let mut bytes = 0;
    for _ in 0..times {
        bytes = bytes + generate_page(markdown)
    }
    bytes
}

Would that prevent the compiler from optimizing the function away?

It only changed the result slightly (+2.6%).

Yes, that should prevent the code from being optimized out completely. However, depending on what generate_page does, the optimizer might be smart enough to see that it will return the same value every time, allowing it to call it just once instead of calling it every time through the loop.

What result do you get if you pass 1 instead of 100_000 for the iteration count?
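One way to rule out that hoisting, without changing `generate_page`'s signature, is to pass the input through `std::hint::black_box` on every iteration, so the optimizer cannot prove the call is loop-invariant. A sketch, again with a hypothetical stand-in for `generate_page`:

```rust
use std::hint::black_box;

// Stand-in for the real rendering work.
fn generate_page(markdown: &str) -> usize {
    markdown.len() * 2
}

fn run(markdown: &str, times: i32) -> usize {
    let mut bytes = 0;
    for _ in 0..times {
        // black_box makes the input opaque on each iteration, so the
        // optimizer cannot hoist the call out of the loop.
        bytes += generate_page(black_box(markdown));
    }
    bytes
}

fn main() {
    println!("{}", run("abcd", 1000));
}
```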

@mbrubeck: The output for "1", "100", and "1000" is as follows:

control 1               time:   [8.1793 us 8.2754 us 8.3967 us]
                        change: [-99.015% -98.990% -98.967%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

control 100             time:   [868.88 us 877.56 us 887.56 us]
                        change: [+3.3629% +4.9518% +6.5157%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

control 1000            time:   [8.4183 ms 8.5174 ms 8.6368 ms]
                        change: [-9.7957% -6.8986% -4.0572%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  7 (7.00%) high severe

So everything seems to look okay.
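The three results are indeed consistent with linear scaling: the per-page cost comes out around 8–9 µs at every iteration count. A quick sanity check of the measured midpoints (hard-coded from the criterion output above):

```rust
// Per-page time in microseconds, given a total time in seconds
// and a page count.
fn per_page_us(total_secs: f64, pages: f64) -> f64 {
    total_secs / pages * 1e6
}

fn main() {
    println!("{:.2} us", per_page_us(8.2754e-6, 1.0));    // control 1
    println!("{:.2} us", per_page_us(877.56e-6, 100.0));  // control 100
    println!("{:.2} us", per_page_us(8.5174e-3, 1000.0)); // control 1000
}
```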

depending on what generate_page does, the optimizer might be smart enough to see that it will return the same value every time, allowing it to call it just once instead of calling it every time through the loop.

To prevent this, I now pass the iteration index to generate_page and add that to the return value:

pub fn run(markdown: &'static str, times: i32) -> usize {
    let mut bytes = 0;
    for i in 0..times {
        bytes = bytes + generate_page(markdown, i as usize)
    }
    bytes
}

fn generate_page(markdown: &'static str, i: usize) -> usize {
    /* ... */
    ctx.render_once().unwrap().len() + i
}

This doesn't seem to change anything, however (as expected from the result above):

control 1               time:   [8.7039 us 8.8875 us 9.0945 us]
                        change: [+3.9369% +6.3796% +8.8751%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

control 100             time:   [846.41 us 856.64 us 869.48 us]
                        change: [-5.6674% -3.8057% -1.7947%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe

control 1000            time:   [8.4736 ms 8.5711 ms 8.6907 ms]
                        change: [-1.1419% +0.6313% +2.5488%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high severe

It sure looks like your function is able to parse and render a document in about 9 microseconds. Perhaps this is not too surprising, since it is within the range of benchmarks for several of the Markdown rendering libraries in this benchmark: GitHub - rsms/markdown-wasm: Very fast Markdown parser and HTML generator implemented in WebAssembly, based on md4c

or this benchmark: GitHub - 1Password/markdown-benchmarks: Benchmarking markdown libraries


@mbrubeck: Wow. I think I need to start worrying less about performance (and benchmarking more often). I'm planning to create a static website generator that can generate 100,000 pages in under 10 seconds (just as a nice goal), and it seems that should be possible.
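For what it's worth, that goal checks out against the measurements: at roughly 9 µs per page, 100,000 pages need well under one second of pure rendering time, leaving plenty of the 10-second budget for IO. A back-of-the-envelope sketch:

```rust
// Estimated wall time for a page count, using the ~9 us/page figure
// measured in this thread.
fn estimated_secs(pages: u64, us_per_page: f64) -> f64 {
    pages as f64 * us_per_page / 1e6
}

fn main() {
    println!("{:.2} s", estimated_secs(100_000, 9.0)); // prints "0.90 s"
}
```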

Thanks everybody!

