Efficient stdout: it's buffers all the way down

Hi everyone,

I am currently toying around with a high-throughput FizzBuzz, inspired by this post: fastest code - High throughput Fizz Buzz - Code Golf Stack Exchange. std::io::Stdout is known to be line-buffered, and the workaround I found online is to wrap it (or rather a StdoutLock) in a BufWriter. However, it seems to me that although this reduces the number of syscalls, one still pays for searching the buffer for \n, as well as for an unneeded copy from the BufWriter into the LineWriter's buffer. To get around that, I hacked together the code below and wrapped it in the BufWriter instead, but to my surprise the performance was not much different (though, to be honest, I didn't benchmark it in isolation). Could someone help me understand which buffers (and operations) bytes go through before they are printed to the terminal? I tried figuring it out by reading the std code, but it is pretty involved.
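
For reference, the usual workaround looks roughly like this. It's a sketch, not the Code Golf solution: `fizzbuzz` and `run` are hypothetical names, and the 64 KiB capacity is an arbitrary choice.

```rust
use std::io::{self, BufWriter, Write};

/// Writes FizzBuzz for 1..=n, one entry per line, to any writer.
fn fizzbuzz<W: Write>(out: &mut W, n: u64) -> io::Result<()> {
    for i in 1..=n {
        match (i % 3, i % 5) {
            (0, 0) => out.write_all(b"FizzBuzz\n")?,
            (0, _) => out.write_all(b"Fizz\n")?,
            (_, 0) => out.write_all(b"Buzz\n")?,
            _ => writeln!(out, "{i}")?,
        }
    }
    Ok(())
}

/// Lock stdout once and put a large BufWriter in front of it, so the
/// LineWriter inside Stdout only ever receives big chunks.
fn run(n: u64) -> io::Result<()> {
    let mut out = BufWriter::with_capacity(1 << 16, io::stdout().lock());
    fizzbuzz(&mut out, n)?;
    // Flush explicitly so write errors are reported instead of ignored on drop.
    out.flush()
}
```

Keeping `fizzbuzz` generic over `Write` also makes it easy to test against a `Vec<u8>`.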

The hacked stdout:

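/// Unbuffered handle that hands bytes straight to write(2) on fd 1 (needs the libc crate).
struct UnixStdoutRaw;

impl std::io::Write for UnixStdoutRaw {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // SAFETY: `buf` is a valid, initialized slice for the duration of the call.
        let res = unsafe { libc::write(libc::STDOUT_FILENO, buf.as_ptr().cast(), buf.len()) };
        if res == -1 {
            // Turn errno into an io::Error.
            Err(std::io::Error::last_os_error())
        } else {
            // A non-negative ssize_t always fits in usize.
            Ok(res as usize)
        }
    }

    fn flush(&mut self) -> std::io::Result<()> {
        // Nothing is buffered here, so there is nothing to flush.
        Ok(())
    }
}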
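
If you'd rather avoid `unsafe` and the libc dependency, roughly the same bypass can be sketched with std alone, assuming a Unix target and Rust 1.66+ (for `std::os::fd`). `raw_stdout` and `write_lines` are hypothetical names; this is not what std's `Stdout` does internally, it just skips the LineWriter the same way `UnixStdoutRaw` does.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::os::fd::AsFd;

/// Hypothetical helper (Unix only): a `File` on a duplicate of stdout's fd.
/// Writing to a `File` issues write(2) directly, with no LineWriter in between.
fn raw_stdout() -> io::Result<File> {
    // Flush anything std has already buffered, so output stays in order.
    io::stdout().flush()?;
    // Duplicate fd 1 into a descriptor we own; dropping the File closes only the copy.
    let owned = io::stdout().as_fd().try_clone_to_owned()?;
    Ok(File::from(owned))
}

/// Writes the numbers 1..=n, one per line, through a 64 KiB BufWriter.
fn write_lines(n: u32) -> io::Result<()> {
    let mut out = BufWriter::with_capacity(1 << 16, raw_stdout()?);
    for i in 1..=n {
        writeln!(out, "{i}")?;
    }
    // Push out whatever is still sitting in the BufWriter before it is dropped.
    out.flush()
}
```

Duplicating the descriptor (rather than building a `File` from fd 1 directly) keeps ownership clean: the `File` closes its own copy and never closes the real stdout.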

Here is my theory on why it is almost as efficient.

If you are using a BufWriter, it will only send (fairly) large writes to Stdout. That makes the line buffering cheaper per byte (and the locking too*).

Stdout's line buffering only searches backward from the end for the last newline, and then writes everything up to and including that newline directly to the output, without copying. The extra buffering only becomes a problem if the text you write to Stdout has very few newlines.

(see linewriter.rs and linewritershim.rs)
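
To make that strategy concrete, here is a deliberately simplified, hypothetical line-buffered writer (`SimpleLineWriter` is a made-up name). It is not std's `LineWriterShim`, which also deals with partial writes, a bounded buffer and other edge cases; it only shows the backward search and which bytes end up being copied.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified line-buffered writer (illustration only).
struct SimpleLineWriter<W: Write> {
    inner: W,
    tail: Vec<u8>, // bytes after the last newline, waiting for one
}

impl<W: Write> Write for SimpleLineWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Search backward from the end: with newline-dense text this stops
        // after a few bytes, however large `buf` is.
        match buf.iter().rposition(|&b| b == b'\n') {
            None => {
                // No newline at all: everything is buffered (this is a copy).
                self.tail.extend_from_slice(buf);
            }
            Some(idx) => {
                // Emit the pending partial line, then pass everything up to and
                // including the last newline straight through, without copying.
                self.inner.write_all(&self.tail)?;
                self.tail.clear();
                self.inner.write_all(&buf[..=idx])?;
                // Only the bytes after the last newline are copied.
                self.tail.extend_from_slice(&buf[idx + 1..]);
            }
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.write_all(&self.tail)?;
        self.tail.clear();
        self.inner.flush()
    }
}
```

For example, writing `b"a\nb\nc"` forwards `a\nb\n` immediately and keeps only `c` buffered until the next newline or an explicit flush.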

*: If you just wrap a Stdout, it locks the output on every write. If you instead call lock() to get a StdoutLock, the lock is held throughout, which removes that overhead apart from locking once at the start and unlocking at the end.
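
A minimal sketch of the difference described in that footnote (the function names are made up for illustration):

```rust
use std::io::{self, Write};

/// Each write on a plain `Stdout` acquires and releases the stdout lock.
fn per_call_locking(lines: &[&str]) -> io::Result<()> {
    let mut out = io::stdout();
    for line in lines {
        out.write_all(line.as_bytes())?; // lock + unlock on every call
    }
    out.flush()
}

/// Locking once: every write goes through the same `StdoutLock` guard,
/// which is released when `out` is dropped at the end of the function.
fn single_lock(lines: &[&str]) -> io::Result<()> {
    let mut out = io::stdout().lock();
    for line in lines {
        out.write_all(line.as_bytes())?;
    }
    out.flush()
}
```

Both produce the same output; only the number of lock acquisitions differs.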

Thanks for your answer.

It's true that the backward search for the newline means that, for large buffers, the cost per byte of output tends to zero. However, isn't there still one extra copy?

It actually turns out that there isn't, except on Windows*:

The LineWriter wraps a BufWriter, just with some special handling for newlines. To write the data, it calls BufWriter::write [1], which calls write_cold [2] when the data doesn't fit in the buffer's remaining space (the buffer is 1024 bytes for Stdout). There [3], a write that is at least as large as the whole buffer is passed directly to the output, without copying.

[1]: rust/io/buffered/bufwriter.rs:516 at e60e19b · rust-lang/rust (github.com)
[2]: rust/io/buffered/bufwriter.rs:355 at e60e19b · rust-lang/rust (github.com)
[3]: rust/io/buffered/bufwriter.rs:364 at e60e19b · rust-lang/rust (github.com)
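
This can be observed from outside with a writer that records the size of each write it receives. `Recorder` and `demo` are hypothetical; the capacity of 1024 just mirrors the Stdout figure above, and the exact split relies on std's current BufWriter behavior rather than a documented guarantee.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical inner writer that records the length of every write it receives.
#[derive(Default)]
struct Recorder {
    writes: Vec<usize>,
}

impl Write for Recorder {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes.push(buf.len());
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Small writes are copied into the buffer; a write at least as large as the
/// buffer flushes what is pending and is then handed to the inner writer as-is.
fn demo() -> io::Result<Vec<usize>> {
    let mut w = BufWriter::with_capacity(1024, Recorder::default());
    w.write_all(&[b'x'; 10])?; // copied into the buffer; inner sees nothing yet
    assert!(w.get_ref().writes.is_empty());
    w.write_all(&[b'y'; 4096])?; // flushes the 10 pending bytes, then 4096 directly
    w.flush()?;
    Ok(w.get_ref().writes.clone())
}
```

The inner writer ends up seeing one 10-byte write and one 4096-byte write: the large chunk never passed through the BufWriter's buffer.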

*: On Windows, when Stdout is attached to a console, the bytes are re-encoded as UTF-16 [4] and then passed to WriteConsoleW. Otherwise there is no copy: on Unix, for instance, the bytes are passed straight to libc::write [5].

[4]: rust/sys/windows/stdio.rs:153 at e60e19b · rust-lang/rust (github.com)
[5]: rust/sys/unix/fd.rs:132 at e60e19b · rust-lang/rust (github.com)
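
As a rough illustration of why the console path costs an extra copy, the UTF-8 to UTF-16 conversion on its own looks like this. It's only a conceptual sketch: `to_utf16` is a hypothetical name, returning `None` on invalid UTF-8 is a simplification of mine, and std's console code handles more cases than this.

```rust
/// Conceptual sketch: decode the UTF-8 bytes and re-encode them into a
/// separate UTF-16 buffer, which is the extra copy on the console path.
fn to_utf16(bytes: &[u8]) -> Option<Vec<u16>> {
    let s = std::str::from_utf8(bytes).ok()?; // invalid UTF-8 -> None (simplification)
    // Every character is visited once and written into a new Vec<u16>.
    Some(s.encode_utf16().collect())
}
```

Characters outside the Basic Multilingual Plane, such as emoji, become two UTF-16 code units (a surrogate pair).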

Thanks a lot for taking the dive into std's code! Everything is much clearer now.
