Strange write behavior from Stdout

mycroft · April 14, 2022, 8:49pm

While converting a random trivial tool from Python to Rust, I wanted to improve the performance by doing larger I/O. So I wrapped the input and output in BufReader/BufWriter:

  let mut f = BufReader::with_capacity(1024*1024, f);
  let mut g = BufWriter::with_capacity(1024*1024, std::io::stdout());
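For reference, here is a self-contained sketch of that setup. It assumes a pass-through tool, since the post doesn't show the actual processing. The input path comes from the first command-line argument, and each chunk from the 1 MiB `BufReader` is handed to the 1 MiB `BufWriter` over `std::io::stdout()` with `write_all`. The function name `pump` is made up for illustration, and it is generic so it can also be run on in-memory buffers.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

const CAP: usize = 1024 * 1024; // 1 MiB, as in the snippet above

/// Pass-through copy using 1 MiB buffers on both sides. Generic over
/// Read/Write so it can be exercised with in-memory data as well.
fn pump<R: Read, W: Write>(input: R, output: W) -> io::Result<u64> {
    let mut f = BufReader::with_capacity(CAP, input);
    let mut g = BufWriter::with_capacity(CAP, output);
    let mut total = 0u64;
    loop {
        let chunk = f.fill_buf()?; // refills the up-to-1 MiB buffer when empty
        if chunk.is_empty() {
            break; // EOF
        }
        // A real tool would transform the data here; this sketch copies it.
        g.write_all(chunk)?;
        let n = chunk.len();
        f.consume(n);
        total += n as u64;
    }
    g.flush()?;
    Ok(total)
}

fn main() -> io::Result<()> {
    let path = env::args().nth(1).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "usage: tool <input-file>")
    })?;
    let n = pump(File::open(path)?, io::stdout())?;
    eprintln!("copied {} bytes", n);
    Ok(())
}
```

Run under `strace -e trace=read,write`, the reads should look like the ones below; the writes still go through `Stdout`.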

What I'm (kind of obviously) expecting here is to see a smooth sequence of 1MB reads and writes. Indeed when I strace it, the reads are good:

read(3, "\0\377\377\377\377\377\377\377\377\377\377\0\0\2\0\2\0\0\10\0\0\0\10\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\v\0\1\4\377\3\1\4\v\0\1\4\377\3\1\4\v\0\1\4\377\3\1\4\v\0\1\4\377\3\377\7"..., 1048576) = 1048576
read(3, "\27\tD\2\10\4\6\5\364\344\0\0X^\10\t\0\0dJ\0\0\0\0\354\302\7\371R\5\6\t"..., 1048576) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
...

But the writes are, at first blush, all over the place:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1047804) = 1047804
write(1, "\17#\4\256\24\0Q2\356\363\21\3\0324\35\362\335 \360\37\24\0\332 %\374\322.\26\372\324T"..., 772) = 772
write(1, "@\0\5 \302\36\23\356\357^\25\16\313A\20#@\0\256B\377\360\nF\20\316\f\24@\340\r\23"..., 1048259) = 1048259
write(1, "\0\20\0\1\0\0\0!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!"..., 317) = 317
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0 \2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0"..., 1040391) = 1040391
...

When I analyzed the output further, I found that the writes are occurring in clusters that add up to 1MB. But I can't explain why it doesn't just write that amount in one call.

I then modified the program to take a second file name and use File::create() instead. In that case, I see the 1MB writes I expected:

write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(4, "@\0\5 \302\36\23\356\357^\25\16\313A\20#@\0\256B\377\360\nF\20\316\f\24@\340\r\23"..., 1048576) = 1048576
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0 \2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0"..., 1048576) = 1048576
write(4, "\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363{\357{\357Z\353"..., 1048576) = 1048576
write(4, "\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200"..., 1048576) = 1048576
...

So this appears to be some weird property of Stdout? Before I rabbit hole into that implementation, can someone explain what is going on here and how I can get the result I expect?

For reference, I'm using Rust 1.57.0 on Ubuntu 20.04 x86_64.

mycroft · April 14, 2022, 10:26pm

I've discovered that this has to do with Stdout's line buffering. It seems there are no good, portable workarounds.

quinedot · April 15, 2022, 12:09am

Here's an issue about it.

quinedot · April 15, 2022, 12:15am

If you can't find a crate but want to do it yourself, I'd probably start by looking at ripgrep as linked in that issue. BurntSushi is a team member and makes great stuff.

mycroft · April 15, 2022, 12:18am

Thanks. I've now found several PRs and projects related to this. Unfortunately none of them have been moving forward. (Similarly with the workarounds I have to do to read fixed-length chunks and handle EOF properly, something libc's fread() does trivially.)
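A minimal sketch of that kind of workaround, assuming the goal is fread()-like semantics: fill the buffer unless EOF arrives first, and report the short count. `Read::read_exact` isn't a drop-in replacement, because on a short final chunk it returns an `UnexpectedEof` error and leaves the buffer contents unspecified. The helper name `read_full` is hypothetical, not a std API.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads until `buf` is full or EOF is reached, retrying on `Interrupted`.
/// Returns how many bytes were stored, like fread() does in C.
fn read_full<R: Read>(r: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF: return the short count
            Ok(n) => filled += n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut input = stdin.lock();
    let mut chunk = vec![0u8; 1024 * 1024];
    loop {
        let n = read_full(&mut input, &mut chunk)?;
        if n == 0 {
            break;
        }
        // Process &chunk[..n] here; only the last chunk can be short.
        eprintln!("chunk of {} bytes", n);
        if n < chunk.len() {
            break;
        }
    }
    Ok(())
}
```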

What really boggles me here is the why in this specific case. The point of line buffering is to ensure that when a newline is written, the output goes out immediately. Since my own BufWriter is passing down large chunks and (mostly) bypassing Stdout's internal buffer anyway, just writing all the data immediately would meet this expectation. There's no need to split it up.

Now obviously fixing that behavior wouldn't speed up repeated small println!() using the standard buffer as cited in some PRs.

More comprehensively, it seems like std could use an equivalent to libc's setvbuf(), allowing one to change the buffering mode or the buffer size, without having to create an extra BufWriter like I did. I did see some work in this direction, creating a switchable buffer layer, but it seems to be stalled.
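To make the idea concrete, here is a rough sketch of what a switchable, setvbuf()-like layer could look like. Everything in it (`Mode`, `Switchable`, `set_mode`, the `Sizes` recorder) is hypothetical and is not the stalled std work mentioned above. In `Line` mode, a write containing a newline is appended to whatever is pending and sent in a single `write_all`, which is the "just write it all immediately" behavior described earlier in this post, at the cost of copying large blocks into the buffer.

```rust
use std::io::{self, Write};

/// Hypothetical buffering policies, named after C's setvbuf() modes.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode {
    Line, // like _IOLBF
    Full, // like _IOFBF
}

/// Hypothetical switchable buffer layer; not a std type.
struct Switchable<W: Write> {
    inner: W,
    buf: Vec<u8>,
    cap: usize,
    mode: Mode,
}

impl<W: Write> Switchable<W> {
    fn new(inner: W, mode: Mode, cap: usize) -> Self {
        Switchable { inner, buf: Vec::with_capacity(cap), cap, mode }
    }

    /// Rough setvbuf() analogue: flush what is pending, then switch mode and size.
    fn set_mode(&mut self, mode: Mode, cap: usize) -> io::Result<()> {
        self.flush_buf()?;
        self.mode = mode;
        self.cap = cap;
        Ok(())
    }

    fn flush_buf(&mut self) -> io::Result<()> {
        if !self.buf.is_empty() {
            self.inner.write_all(&self.buf)?;
            self.buf.clear();
        }
        Ok(())
    }

    fn get_ref(&self) -> &W {
        &self.inner
    }
}

impl<W: Write> Write for Switchable<W> {
    // Simplified error handling: if the inner writer fails, bytes already
    // copied into `buf` stay queued. A production version must be stricter.
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        if self.mode == Mode::Line && data.contains(&b'\n') {
            // Pending bytes plus this write go out together: no split at the newline.
            self.buf.extend_from_slice(data);
            self.flush_buf()?;
        } else if self.buf.len() + data.len() <= self.cap {
            self.buf.extend_from_slice(data);
        } else {
            self.flush_buf()?;
            if data.len() >= self.cap {
                self.inner.write_all(data)?; // large block: bypass the buffer
            } else {
                self.buf.extend_from_slice(data);
            }
        }
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flush_buf()?;
        self.inner.flush()
    }
}

/// Stand-in for a file descriptor that records the size of every write call.
#[derive(Default)]
struct Sizes(Vec<usize>);

impl Write for Sizes {
    fn write(&mut self, b: &[u8]) -> io::Result<usize> {
        self.0.push(b.len());
        Ok(b.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut out = Switchable::new(Sizes::default(), Mode::Line, 1024);
    out.write_all(b"partial ")?; // no newline yet: buffered
    out.write_all(b"line\n")?; // newline: pending + new bytes in one write
    out.set_mode(Mode::Full, 1 << 20)?;
    out.write_all(&vec![b'\n'; 3 << 20])?; // Full mode: one large write
    out.flush()?;
    println!("{:?}", out.get_ref().0);
    Ok(())
}
```

A real implementation could use `Write::write_vectored` to send the pending bytes and a large incoming block together without copying.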

stonerfish · April 15, 2022, 12:35am

If your target is Unix, have you tried writing to a file, but opening the file /dev/stdout instead of using std::io::stdout?
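A sketch of that suggestion, assuming a Unix-like system where `/dev/stdout` exists. It opens the path with `OpenOptions::new().append(true)` rather than `File::create`, because `create` truncates, which can clobber a file that stdout was redirected to with `>>`. Opening `/dev/stdout` can still fail for some kinds of stdout (for example, a socket on Linux), so treat this as best-effort. The helper name `open_output` is illustrative.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Opens `path` for appending (no truncation) behind a 1 MiB BufWriter.
fn open_output<P: AsRef<Path>>(path: P) -> io::Result<BufWriter<File>> {
    let file = OpenOptions::new().append(true).open(path)?;
    Ok(BufWriter::with_capacity(1024 * 1024, file))
}

fn main() -> io::Result<()> {
    // Unix-only path; writes bypass std's Stdout and its LineWriter.
    let mut g = open_output("/dev/stdout")?;
    g.write_all(b"written via /dev/stdout\n")?;
    g.flush()
}
```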

mycroft · April 15, 2022, 12:37am

On Unix I can also do:

+  let g: File = unsafe{std::os::unix::io::FromRawFd::from_raw_fd(1)};
+  let mut g = BufWriter::with_capacity(1024*1024, g);

But I'd prefer a portable method.
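One caveat with that snippet: the `File` created by `from_raw_fd(1)` owns the descriptor and closes fd 1 when it is dropped, even though std's `Stdout` still uses fd 1. A sketch that avoids the close, assuming Unix, wraps the `File` in `std::mem::ManuallyDrop` and hands the `BufWriter` a `&File`, which implements `Write`. The helper name `write_to_fd1` is made up for illustration. Don't mix this with `println!` unless `Stdout` is flushed first, or output can come out of order.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::mem::ManuallyDrop;
use std::os::unix::io::FromRawFd;

/// Writes `data` to fd 1 through a 1 MiB BufWriter, bypassing std's Stdout.
fn write_to_fd1(data: &[u8]) -> io::Result<()> {
    // SAFETY: fd 1 is open for the life of the process. ManuallyDrop keeps
    // this File from closing it, because std's Stdout still uses fd 1 too.
    let file = ManuallyDrop::new(unsafe { File::from_raw_fd(1) });
    // `&File` implements `Write`, so the BufWriter only borrows the File.
    let mut g = BufWriter::with_capacity(1024 * 1024, &*file);
    g.write_all(data)?;
    g.flush()
}

fn main() -> io::Result<()> {
    write_to_fd1(b"hello from fd 1\n")
}
```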

quinedot · April 15, 2022, 12:40am

I agree it's not ideal. How exactly are you outputting, incidentally? I see write_fmt uses write_all,[1] so it's somewhat surprising too. It's hard to see through all the layers of abstraction in the implementation, though.

[1] Potentially. The comment is odd on a trait, as implementors don't have to use write_all.

mycroft · April 15, 2022, 12:43am

I'm using write_all() on binary blobs. Ain't got time for writing output loops unnecessarily.

mycroft · April 15, 2022, 12:50am

Honestly, even just exposing the underlying File from Stdout would create a trivial workaround: one could just make a new BufWriter and forget about the Stdout buffer. I'm not sure whether there's some portability constraint that makes this difficult. The fact that I'm passing an immutable File to BufWriter suggests that there should be no issues with mutable borrowing.

quinedot · April 15, 2022, 12:52am

Looking closer at your strace, yeah, that's probably as good as it will get with the current stdlib -- flush the line buffer, then write the blob, leading to intermingled short and long writes.
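To see that mechanism without strace, here is a sketch that stacks a `BufWriter` on top of `std::io::LineWriter` (the type `Stdout` wraps internally) over a dummy writer that records the size of each call. `Recorder`, `payload`, `through_line_writer`, and `direct` are made-up names, and the exact split depends on std's current `LineWriter` implementation, so treat the numbers as illustrative.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Stand-in for the raw stdout descriptor: accepts every byte in one call
/// and records each call's size, the way strace shows write(1, ...).
#[derive(Default)]
struct Recorder {
    sizes: Vec<usize>,
    data: Vec<u8>,
}

impl Write for Recorder {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.sizes.push(buf.len());
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Binary-ish payload with a `\n` at every index i where i % 100 == 37.
fn payload(len: usize) -> Vec<u8> {
    (0..len).map(|i| if i % 100 == 37 { b'\n' } else { b'x' }).collect()
}

/// A `cap`-byte BufWriter over a LineWriter, like BufWriter over stdout().
fn through_line_writer(data: &[u8], cap: usize) -> io::Result<Recorder> {
    let mut w = BufWriter::with_capacity(cap, LineWriter::new(Recorder::default()));
    for piece in data.chunks(64) {
        w.write_all(piece)?;
    }
    // Each into_inner() flushes that layer's buffer on the way out.
    let line_writer = w.into_inner()?;
    Ok(line_writer.into_inner()?)
}

/// The same BufWriter straight over the "file", like the File::create() case.
fn direct(data: &[u8], cap: usize) -> io::Result<Recorder> {
    let mut w = BufWriter::with_capacity(cap, Recorder::default());
    for piece in data.chunks(64) {
        w.write_all(piece)?;
    }
    Ok(w.into_inner()?)
}

fn main() -> io::Result<()> {
    let data = payload(3 * 4096);
    println!("via LineWriter: {:?}", through_line_writer(&data, 4096)?.sizes);
    println!("direct:         {:?}", direct(&data, 4096)?.sizes);
    Ok(())
}
```

With a 4 KiB `BufWriter`, the LineWriter run should report pairs such as 4038 and 58 that add up to 4096: the head of one chunk up to its last newline, then that chunk's tail, flushed just before the next chunk's head. The direct run reports plain 4096-byte writes. That is the same pattern as the strace in the first post, scaled down.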

quinedot · April 15, 2022, 12:59am

You could #[cfg] your way through it with generics or dyn Write, if you don't want to take the time to find a properly portable approach for every platform.
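A sketch of that approach, assuming Rust 1.63 or later (newer than the 1.57 mentioned above) for the safe descriptor duplication. On Unix it duplicates stdout's descriptor with `AsFd::as_fd` and `BorrowedFd::try_clone_to_owned`, so no `unsafe` is needed and dropping the `File` closes only the duplicate. Elsewhere it falls back to `io::stdout()`, which still goes through the internal `LineWriter`. The names `block_buffered_stdout` and `raw_stdout` are hypothetical.

```rust
use std::io::{self, BufWriter, Write};

/// A block-buffered stdout of `cap` bytes, chosen per platform.
fn block_buffered_stdout(cap: usize) -> io::Result<BufWriter<Box<dyn Write>>> {
    Ok(BufWriter::with_capacity(cap, raw_stdout()?))
}

#[cfg(unix)]
fn raw_stdout() -> io::Result<Box<dyn Write>> {
    use std::fs::File;
    use std::os::unix::io::AsFd;
    // Duplicate stdout's descriptor; the File owns only the duplicate.
    let owned = io::stdout().as_fd().try_clone_to_owned()?;
    Ok(Box::new(File::from(owned)))
}

#[cfg(not(unix))]
fn raw_stdout() -> io::Result<Box<dyn Write>> {
    // Fallback: writes still pass through Stdout's internal LineWriter.
    Ok(Box::new(io::stdout()))
}

fn main() -> io::Result<()> {
    let mut out = block_buffered_stdout(1024 * 1024)?;
    out.write_all(b"block-buffered output\n")?;
    out.flush()
}
```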

system · July 14, 2022, 1:00am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
