Why is this Rust loop ~3x slower when writing to disk?

Hi! I'm new to Rust, and I've heard about the great community here :)

I'm re-writing a CLI tool that I wrote in C and I'm surprised by the following result: when I process a text file and write to /dev/null, my Rust code is as fast as my C code (a tad faster, actually!); but when I write to a file on disk, my Rust code is about 3x slower. Why would this be? Things I think I'm doing "right":

  • I'm locking the Stdout struct before the inner loops
  • I'm using Vec<u8> for ascii input, which is comparable to char * performance in C
  • I'm not using println! or related formatting, because I've read there is a performance penalty there
  • I'm comparing my C code with cargo build --release results

The following code is also available as a WIP on GitHub: wordtreefoundation/ngram-tools, at rust-tools-question

    io::stdin().read_to_end(&mut buffer).unwrap();
    let result = normalize::normalize_ascii(&buffer);

    let stdout = io::stdout();
    let mut out_handle = stdout.lock();
    for sentence in result.split(|c| c == &b'.') {
        if sentence.len() > 0 {
            let words: Vec<&[u8]> = sentence.split(|c| c == &b' ').collect();
            for ngram in words.windows(number) {
                let mut first_word_written = false;
                for word in ngram {
                    if first_word_written {
                        out_handle.write_all(&[b' ']).unwrap();
                    } else {
                        first_word_written = true;
                    }
                    out_handle.write_all(word).unwrap();
                }
                out_handle.write_all(&[b'\n']).unwrap();
            }
        }
    }

Is write_all not the best way to be sending data to stdout? What else can I check?

Thanks!

Writing one or a few characters at a time will be slow, because each write takes the lock and possibly makes a system call (I think .lock() is a lie, since this doesn't deadlock).

You can wrap your out_handle in io::BufWriter to make writes buffered, and lower the overhead.
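A minimal sketch of that suggestion, assuming the loop is pulled out into a helper that accepts any writer. The names write_ngrams and print_ngrams are hypothetical and not part of the original code.

```rust
use std::io::{self, BufWriter, Write};

/// Writes every `n`-word window of `words` as one space-separated line.
/// (`write_ngrams` is a hypothetical helper name, not from the original code.)
fn write_ngrams<W: Write>(out: &mut W, words: &[&[u8]], n: usize) -> io::Result<()> {
    // `windows` panics on a size of 0, so treat that as "nothing to write".
    if n == 0 {
        return Ok(());
    }
    for ngram in words.windows(n) {
        for (i, word) in ngram.iter().enumerate() {
            if i > 0 {
                out.write_all(b" ")?;
            }
            out.write_all(word)?;
        }
        out.write_all(b"\n")?;
    }
    Ok(())
}

/// Runs `write_ngrams` against locked stdout through a `BufWriter`.
fn print_ngrams(words: &[&[u8]], n: usize) -> io::Result<()> {
    let stdout = io::stdout();
    // The small writes land in BufWriter's in-memory buffer first and reach
    // the stdout handle in larger chunks.
    let mut out = BufWriter::new(stdout.lock());
    write_ngrams(&mut out, words, n)?;
    // Flush explicitly so a write error is reported instead of being lost on drop.
    out.flush()
}
```

Because write_ngrams only needs a Write implementation, the same loop can be pointed at a Vec<u8> or a file for testing and timing.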

From what I've seen, there is a performance penalty because stdout is unconditionally line-buffered, and there is currently no way around this: writing a newline character in any way triggers a flush. (There was a PR to make it line-buffered only when connected to a terminal, but it stalled and hasn't been picked back up.)

To see if that's the case here, try constructing a BufWriter<File> around "/dev/null" for comparison.
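A rough sketch of that comparison, assuming a Unix-like system where "/dev/null" exists. The helper names open_buffered and write_lines are made up for illustration.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Opens `path` for writing and wraps it in a block-buffered writer.
/// (`open_buffered` is a hypothetical helper name.)
fn open_buffered<P: AsRef<Path>>(path: P) -> io::Result<BufWriter<File>> {
    // `File::create` truncates an existing regular file; for a device such as
    // /dev/null it simply opens it for writing.
    let file = File::create(path)?;
    Ok(BufWriter::new(file))
}

/// Writes `lines` newline-terminated lines and returns the number of bytes written.
fn write_lines<W: Write>(out: &mut W, lines: usize) -> io::Result<usize> {
    let line: &[u8] = b"some words\n";
    let mut written = 0;
    for _ in 0..lines {
        out.write_all(line)?;
        written += line.len();
    }
    // Push whatever is still sitting in the buffer to the underlying writer.
    out.flush()?;
    Ok(written)
}
```

Timing write_lines against open_buffered("/dev/null") and against a locked io::stdout() redirected to /dev/null should show whether the newline flushes account for the difference.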

stdout in Rust already has a BufWriter (more specifically, a LineWriter).
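To make the difference concrete, here is a small sketch that counts how often each wrapper calls write on the writer underneath it. CountingSink is a stand-in invented for this example, not real stdout, and each counted call is only a proxy for a system call.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// A sink that counts how many times the wrapper above it calls `write`.
/// (Hypothetical stand-in for a real file descriptor.)
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Emits `lines` short lines piece by piece, like the loop in the question.
fn emit<W: Write>(out: &mut W, lines: usize) -> io::Result<()> {
    for _ in 0..lines {
        out.write_all(b"some")?;
        out.write_all(b" ")?;
        out.write_all(b"words")?;
        out.write_all(b"\n")?;
    }
    out.flush()
}

/// Number of underlying writes when the sink is wrapped in a `LineWriter`.
fn calls_with_line_writer(lines: usize) -> io::Result<usize> {
    let mut out = LineWriter::new(CountingSink { calls: 0 });
    emit(&mut out, lines)?;
    Ok(out.get_ref().calls)
}

/// Number of underlying writes when the sink is wrapped in a `BufWriter`.
fn calls_with_buf_writer(lines: usize) -> io::Result<usize> {
    let mut out = BufWriter::new(CountingSink { calls: 0 });
    emit(&mut out, lines)?;
    Ok(out.get_ref().calls)
}
```

With a LineWriter the sink is hit at least once per line, while a BufWriter with its default capacity only passes data down when its buffer fills or is flushed, so for short output the count stays far lower.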

It's not a lie -- it just uses a reentrant mutex.
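A quick way to see the reentrancy, as a sketch: take the lock twice on the same thread and write through both guards. The function name lock_stdout_twice is made up for this example.

```rust
use std::io::{self, Write};

/// Takes the stdout lock twice on the same thread and writes through both guards.
/// (`lock_stdout_twice` is a hypothetical name used only for this demonstration.)
fn lock_stdout_twice() -> io::Result<()> {
    let stdout = io::stdout();
    let mut outer = stdout.lock();
    // With a non-reentrant mutex this second lock would block forever,
    // because the same thread already holds the first guard.
    let mut inner = stdout.lock();
    inner.write_all(b"inner guard\n")?;
    outer.write_all(b"outer guard\n")?;
    Ok(())
}
```

The function returns instead of hanging, which is why writing through the plain handle while holding a guard on the same thread doesn't deadlock either. Other threads still wait until every guard is dropped.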

This is it! You've nailed it. Thank you for the explanation.

I commented out the newline character (out_handle.write_all(&[b'\n']).unwrap();) and my Rust code is now 2X faster than C.

So the unconditional line buffering is adding an approximate 6X speed cost. At least I know the problem. It sounds like the solution is still a ways out? Any workarounds?

For anyone else looking for this PR, I think this is the one @ExpHP is talking about: io::Stdout should use block bufferring when appropriate · Issue #60673 · rust-lang/rust · GitHub

Answering my own question here for future help-seekers:

You can fairly easily replace the standard io::stdout() with the grep-cli variant used by ripgrep: grep_cli::stdout - Rust

I'm getting about a 10X improvement in speed.
