Rust beginner notes & questions

I would argue that the BufRead/Read2 approach is always better, but seeing why takes some insight into API design. An incredibly common "learning curve" I see goes like this:

Q: I wrote this I/O code! It's slow! Can someone help, please?
A: You're using Read, but you're making too many kernel calls because you're consuming a few bytes at a time. You should use BufReader. This "lesson" is right there in the docs, in the very first example.
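That standard answer, sketched: wrap the File in a BufReader so std batches the kernel reads for you (the file path here is just a placeholder for the example):

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn main() -> io::Result<()> {
    // Unbuffered: a bare File issues a kernel read for every small request
    // the consumer makes.
    // Buffered: BufReader pulls large chunks into an internal buffer and
    // hands bytes out from there.
    let file = BufReader::new(File::open("profile.txt")?); // placeholder path
    for line in file.lines() {
        println!("{}", line?);
    }
    Ok(())
}
```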

This was the underlying root cause of the pre-Firefox Mozilla browser's poor performance for years. Practically all the I/O it did was line-by-line and unbuffered, and it flushed between lines when writing text files like the JavaScript-based profile contents. Never mind the terrible performance; this was downright dangerous. The Mozilla suite also handled POP3/IMAP email, and tens of thousands of people -- including me -- lost all of their mail data because the suite would shred their profile on exit by cutting files in half. Oops. I submitted a bug ticket, discovered it was one of dozens, and was promptly ignored along with everyone else for a decade. Thousands of desperate people had submitted comments along the lines of "Please help! I've lost all my emails!!!".

The Mozilla team basically ignored this because it was just too hard to dig through all of the I/O code littered throughout the codebase and carefully ensure that all of it was appropriately buffered, that it flushed only at safe transaction points, and that file-replace operations were atomic. Thankfully, Firefox now uses SQLite for most of this I/O, which doesn't have these issues.

So what's the true root cause here? The issue is that the "read a buffer / write a buffer" API -- the simple common denominator -- is a trap. It's a pit full of pointy spikes. It's the C/C++ approach. Professionals fall into it all the time; the entire Mozilla team did. It's not efficient for reads, it's dangerous for writes, and it doesn't scale even to user-mode applications like Firefox, let alone high-performance servers. Like I was saying earlier in this thread, it can't even handle memory-mapped file I/O, which dates back to at least the 90s.
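To make the mmap point concrete, here's a rough sketch (it assumes the third-party memmap2 crate, which is not part of std): a mapped file is already one big addressable slice, so a lend-me-a-slice API can expose it without copying, while a fill-my-buffer API has to memcpy it into the caller's buffer regardless.

```rust
use std::fs::File;
use memmap2::Mmap; // third-party crate, shown only for illustration

fn open_mapped(path: &str) -> std::io::Result<Mmap> {
    let file = File::open(path)?;
    // Safety: the mapping is only valid as long as no other process
    // truncates or rewrites the file underneath us.
    let map = unsafe { Mmap::map(&file)? };
    // Mmap derefs to [u8]: the whole file is already a borrowable slice,
    // which a BufRead-style API could hand out with no extra copy.
    Ok(map)
}
```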

What the "API user" actually wants from any I/O is typically: "Give me as much data is efficiently available right now, and I'll see how much I can consume, most likely all of it. Don't stop reading just because I'm processing data."

Read doesn't do this. You provide a pre-constrained buffer of some fixed size for each call, which means you have to guess what a good size is, and your guess will be wrong. If you make the buffer too big, the inherent copy in the API will blow through your L1/L2 CPU caches and your performance will be bad. If you ask for too little, you will spam the kernel with calls and your performance will be terrible. If you try to layer things on top of each other (ZipStream on ChunkStream on CryptoStream), you will have an absolute nightmare holding onto the bytes left unconsumed by the various layers as each reaches the end of its role. As the consumer of this API, you find that everything you do is difficult and likely to end badly.
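To illustrate, here is roughly what the Read-style consumer ends up writing. The task (counting newline-delimited records) and the buffer size are made up for the example; the point is the guessed size plus the hand-rolled bookkeeping for bytes each pass didn't consume:

```rust
use std::io::Read;

// Hypothetical task: count newline-delimited records from any `Read` source.
// Two problems: BUF_SIZE is a guess, and the unconsumed tail has to be
// carried across iterations by hand -- the same bookkeeping every layered
// stream ends up reimplementing.
const BUF_SIZE: usize = 8 * 1024; // too small: kernel-call spam; too big: cache thrash

fn count_records<R: Read>(mut src: R) -> std::io::Result<usize> {
    let mut buf = [0u8; BUF_SIZE];
    let mut leftover: Vec<u8> = Vec::new(); // partial record from the previous pass
    let mut records = 0;

    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // EOF; `leftover` may still hold an unterminated record
        }
        leftover.extend_from_slice(&buf[..n]);
        // Strip out every complete record, keep the partial tail for next time.
        while let Some(pos) = leftover.iter().position(|&b| b == b'\n') {
            records += 1;
            leftover.drain(..=pos);
        }
    }
    Ok(records)
}
```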

There is no scenario, ever, where Read is truly easier for the API user. A single call versus two calls (fill_buf plus consume) may seem like a "lighter-weight" API, but it just leads to poor performance, unnecessary copies, cache thrashing, and even lost data and crying users. Always. Every time. Everywhere. To the point that the Mozilla team failed to fix that bug for a decade.

Sure, it's possible that superhuman developers never fall into this trap. I admit I fell into it at least a few times as a junior developer, and I bet everyone reading this forum did at one point or another.

Meanwhile, the BufRead/Read2 style of API design allows the system with the knowledge -- the platform I/O library -- to make the judgement call about the best buffer size. The user can provide a minimum and allow the platform to deliver that plus a best-effort extra on top. That best effort can dynamically grow to the entire file if mmap is available, or most of the file if mmap is available but the platform is 32-bit. The API user can then wrap this in something that consumes the input byte-by-byte, such as a decompressor, without having to worry about the number of kernel calls. Similarly, the default non-tokio version can still use async I/O behind the scenes without the consumer being forced into an async API themselves. It all just... works by default, as long as it is the default.
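Read2 is hypothetical, but std's existing BufRead already shows the shape: fill_buf hands back whatever the reader decided was efficient to make available, and consume reports how much was actually used. Here is the same record-counting task from above, with the buffer-size guessing and leftover bookkeeping gone (a sketch, not a benchmark):

```rust
use std::io::{self, BufRead};

fn count_records<R: BufRead>(mut src: R) -> io::Result<usize> {
    let mut records = 0;
    loop {
        let used = {
            // "Give me as much data as is efficiently available right now."
            let available = src.fill_buf()?;
            if available.is_empty() {
                break; // EOF
            }
            records += available.iter().filter(|&&b| b == b'\n').count();
            available.len() // we consumed everything we were shown
        };
        // Tell the reader how much was actually taken; it decides what to
        // prefetch next and how big its internal buffer should be.
        src.consume(used);
    }
    Ok(records)
}
```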
