While porting a small tool from Python to Rust, I wanted to improve performance by doing larger I/O operations, so I wrapped the input and output in BufReader and BufWriter with 1 MiB buffers:
let mut f = BufReader::with_capacity(1024*1024, f);
let mut g = BufWriter::with_capacity(1024*1024, std::io::stdout());
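For anyone who wants to reproduce this, a minimal sketch of the setup might look like the following. The `copy_buffered` helper, the 4096-byte chunk loop, and the command-line handling are stand-ins assumed for illustration; the real tool does its own processing between reading and writing.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};

// 1 MiB buffer capacity, as in the two lines above.
const CAP: usize = 1024 * 1024;

/// Hypothetical helper: copy `input` to `output` through 1 MiB buffers,
/// moving data in small chunks the way a simple tool might.
fn copy_buffered<R: Read, W: Write>(input: R, output: W) -> io::Result<u64> {
    let mut f = BufReader::with_capacity(CAP, input);
    let mut g = BufWriter::with_capacity(CAP, output);
    let mut chunk = [0u8; 4096]; // assumed chunk size, not from the real tool
    let mut total = 0u64;
    loop {
        let n = f.read(&mut chunk)?;
        if n == 0 {
            break; // end of input
        }
        g.write_all(&chunk[..n])?;
        total += n as u64;
    }
    g.flush()?; // push out whatever is still sitting in the BufWriter
    Ok(total)
}

fn main() -> io::Result<()> {
    // Assumed usage: `tool <input-file>`, with output going to stdout.
    let path = std::env::args().nth(1).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "usage: tool <input-file>")
    })?;
    let f = File::open(path)?;
    copy_buffered(f, io::stdout())?;
    Ok(())
}
```

Running something like this under `strace -e trace=read,write` with stdout redirected should show the pattern of system calls in question.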
I expected to see a steady sequence of 1 MiB (1048576-byte) reads and writes. When I strace the program, the reads are indeed what I expected:
read(3, "\0\377\377\377\377\377\377\377\377\377\377\0\0\2\0\2\0\0\10\0\0\0\10\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\v\0\1\4\377\3\1\4\v\0\1\4\377\3\1\4\v\0\1\4\377\3\1\4\v\0\1\4\377\3\377\7"..., 1048576) = 1048576
read(3, "\27\tD\2\10\4\6\5\364\344\0\0X^\10\t\0\0dJ\0\0\0\0\354\302\7\371R\5\6\t"..., 1048576) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
...
But the writes look irregular at first glance:
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1047804) = 1047804
write(1, "\17#\4\256\24\0Q2\356\363\21\3\0324\35\362\335 \360\37\24\0\332 %\374\322.\26\372\324T"..., 772) = 772
write(1, "@\0\5 \302\36\23\356\357^\25\16\313A\20#@\0\256B\377\360\nF\20\316\f\24@\340\r\23"..., 1048259) = 1048259
write(1, "\0\20\0\1\0\0\0!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!\4!"..., 317) = 317
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0 \2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0"..., 1040391) = 1040391
...
When I analyzed the output further, I found that the writes occur in clusters that each add up to exactly 1 MiB (the first two pairs above are 1047804 + 772 and 1048259 + 317 bytes). But I can't explain why each cluster isn't written in a single call.
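As a quick sanity check on that observation, the sizes of the first four write() calls in the trace can be summed pairwise. This is only a sketch over the numbers visible above; the `pair_sums` helper is a name made up for illustration, and the fifth write is left out because its partner is elided in the trace.

```rust
/// Sizes of the first four write() calls from the strace output above.
const WRITE_SIZES: [usize; 4] = [1_047_804, 772, 1_048_259, 317];

/// Hypothetical helper: sum each consecutive pair of write sizes.
fn pair_sums(sizes: &[usize]) -> Vec<usize> {
    sizes.chunks(2).map(|pair| pair.iter().sum()).collect()
}

fn main() {
    let one_mib = 1024 * 1024;
    for sum in pair_sums(&WRITE_SIZES) {
        // Each pair should match the BufWriter capacity exactly.
        println!("{} bytes, equals 1 MiB: {}", sum, sum == one_mib);
    }
}
```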
I then modified the program to take a second file name and write to a file opened with File::create() instead of stdout. In that case, I see the 1 MiB writes I expected:
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(4, "@\0\5 \302\36\23\356\357^\25\16\313A\20#@\0\256B\377\360\nF\20\316\f\24@\340\r\23"..., 1048576) = 1048576
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0 \2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0"..., 1048576) = 1048576
write(4, "\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363\234\363{\357{\357Z\353"..., 1048576) = 1048576
write(4, "\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200\0\200"..., 1048576) = 1048576
...
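For reference, the file-output variant could be sketched roughly as below, so the two sinks can be compared with the same BufWriter in front of them. The `open_sink` helper, the argument handling, and the example payload are assumptions for illustration and not the exact code I ran.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

const CAP: usize = 1024 * 1024;

/// Hypothetical helper: wrap either a newly created file or stdout
/// in a 1 MiB BufWriter, depending on whether a path was given.
fn open_sink(path: Option<&str>) -> io::Result<BufWriter<Box<dyn Write>>> {
    let raw: Box<dyn Write> = match path {
        Some(p) => Box::new(File::create(p)?),
        None => Box::new(io::stdout()),
    };
    Ok(BufWriter::with_capacity(CAP, raw))
}

fn main() -> io::Result<()> {
    // Assumed usage: `tool [output-file]`; without a path, stdout is used.
    let out_path = std::env::args().nth(1);
    let mut g = open_sink(out_path.as_deref())?;
    g.write_all(b"example payload\n")?;
    g.flush()
}
```

The only difference between the two cases is the writer underneath the BufWriter, which is what points at Stdout itself.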
So this appears to be a property of Stdout specifically. Before I go down the rabbit hole of reading its implementation, can someone explain what is going on here and how I can get single 1 MiB writes to stdout as well?
For reference, I'm using Rust 1.57.0 on Ubuntu 20.04 x86_64.