I am currently toying around with a high-throughput FizzBuzz, inspired by this post: fastest code - High throughput Fizz Buzz - Code Golf Stack Exchange. std::io::Stdout is known to be line-buffered, and the usual workaround I found online is to wrap it (or rather a StdoutLock) in a BufWriter. However, it seems to me that while this may reduce the frequency of syscalls, one still pays the cost of searching for \n in the buffer, as well as an unneeded copy between the BufWriter and the line buffer. To circumvent that, I hacked together the piece of code below, which I wrap in a BufWriter, but to my surprise the performance was not really different (though, to be honest, I didn't benchmark it in isolation). Could someone help me understand which buffers (and operations) bytes go through before being printed to the terminal? I tried to figure it out by reading the std source, but it is pretty involved.
The hacked stdout:
```rust
// Unit struct: writes bytes straight to fd 1 with no user-space buffering.
struct UnixStdoutRaw;

impl std::io::Write for UnixStdoutRaw {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // SAFETY: buf is a valid, initialized slice for the duration of the call.
        let res = unsafe { libc::write(libc::STDOUT_FILENO, buf.as_ptr() as _, buf.len()) };
        if res == -1 {
            Err(std::io::Error::last_os_error())
        } else {
            Ok(res.try_into().unwrap())
        }
    }

    fn flush(&mut self) -> std::io::Result<()> {
        // Nothing to flush: every write goes directly to the kernel.
        Ok(())
    }
}
```
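For reference, here's roughly how I drive it. This sketch uses a std-only equivalent of the raw writer (a File built from fd 1, so no libc crate needed); the 64 KiB capacity and the fizzbuzz helper are just illustrative choices, not anything prescribed:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::mem::ManuallyDrop;
use std::os::fd::FromRawFd;

// Classic FizzBuzz line for one number (helper for the sketch).
fn fizzbuzz(i: u64) -> String {
    match (i % 3, i % 5) {
        (0, 0) => "FizzBuzz".to_string(),
        (0, _) => "Fizz".to_string(),
        (_, 0) => "Buzz".to_string(),
        _ => i.to_string(),
    }
}

fn main() -> std::io::Result<()> {
    // SAFETY: fd 1 is stdout; ManuallyDrop keeps it from being closed on drop.
    let raw = ManuallyDrop::new(unsafe { File::from_raw_fd(1) });
    // Large user-space buffer; File does no line buffering of its own,
    // so flushes here are plain write(2) calls with no newline search.
    let mut out = BufWriter::with_capacity(1 << 16, &*raw);
    for i in 1..=100 {
        writeln!(out, "{}", fizzbuzz(i))?;
    }
    out.flush()
}
```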
Here is my theory on why it is almost as efficient.
If you are using a BufWriter, it will only send (fairly) large writes to Stdout. That lets the line buffering work efficiently (and makes the locking cheaper too*).
Stdout's line buffering only searches for a newline backward from the end and then directly writes everything out (without copying) up to and including that newline. The extra buffering only becomes a problem if the text you are writing to Stdout has very few newlines.
*: If you are just wrapping a Stdout struct, it locks the output on each write. If you instead lock it yourself, getting a StdoutLock, it holds the lock throughout, so you only pay for locking once at the very start and unlocking once at the end.
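A minimal sketch of the held-lock version (write_nums is just an illustrative helper; it is generic over Write so the same code works against any sink):

```rust
use std::io::{stdout, Write};

// Write a few lines to any sink; generic so it works with a lock or a Vec.
fn write_nums(mut w: impl Write) -> std::io::Result<()> {
    for i in 1..=5 {
        writeln!(w, "{i}")?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Without the explicit lock, each writeln! would lock and unlock Stdout.
    // Holding the StdoutLock pays for the lock once for the whole loop.
    let stdout = stdout();
    write_nums(stdout.lock())
}
```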
It's true that the backward search for the newline means that, for large buffers, the cost per byte of output goes to zero. However, there is still one more copy, isn't there?
It actually turns out that there isn't, except on Windows:*
The LineWriter wraps a BufWriter, just with some special handling for the newlines. To write the data, it calls BufWriter::write [1], which then calls write_cold [2] if the write is bigger than the buffer (which is 1024 bytes for Stdout). There [3], it then just writes it directly to the output, without copying.
*: On Windows, when Stdout is to a console/terminal, the bytes are re-encoded as UTF-16 [4], then given to WriteConsoleW. Otherwise, there is no copy and the bytes are just passed to libc::write [5].