I'm curious about BufWriter

On a basic level I understand the purpose and advantages of using BufWriter. When we have some data being frequently written to a file or something we can throw it in the buffer and then write in larger blocks. A few things I want to know are:

If we just create a BufWriter::new() with the default capacity (8 KiB) and begin writing to it, does it automatically flush (e.g. write to the file) when it hits 8 KiB?

Is that what flushing even is?

Are there practical limits to size? 8 KiB seems like a decent size, but what if I wanted to have a 5 MB buffer, for example? (I'm aware the docs say it's better not to use BufWriter for large writes.)

Thanks

There exists BufWriter::with_capacity().

It will invoke Write::write on the underlying writer, yes. It won't, however, call Write::flush on it, if that's what you mean. This will make a difference for some writers, though at least on Windows/Linux it shouldn't make a difference for File, since its Write::flush is a no-op on them.
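To make the write/flush distinction concrete, here is a minimal sketch. CountingWriter and counts are hypothetical names made up for this example, and the 16-byte capacity is only chosen so the buffer fills quickly. It counts how often BufWriter calls write and flush on the inner writer, with and without an explicit flush.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical writer that only counts calls; not part of std.
#[derive(Default)]
struct CountingWriter {
    writes: usize,  // calls to Write::write
    flushes: usize, // calls to Write::flush
    bytes: usize,   // total bytes received
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}

/// Writes 40 bytes in 4-byte pieces through a 16-byte BufWriter.
/// Returns (writes, flushes, bytes) as seen by the inner writer.
fn counts(explicit_flush: bool) -> io::Result<(usize, usize, usize)> {
    let mut w = BufWriter::with_capacity(16, CountingWriter::default());
    for _ in 0..10 {
        w.write_all(b"abcd")?;
    }
    if explicit_flush {
        // Drains the buffer, then calls flush on the inner writer.
        w.flush()?;
    }
    let inner = w.get_ref();
    Ok((inner.writes, inner.flushes, inner.bytes))
}
```

Calling counts(false) should report some write calls but zero flush calls, with the last few bytes still sitting in the buffer. counts(true) should report exactly one flush call and all 40 bytes delivered.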

Not really, except for the fact that you'll get diminishing returns the bigger the buffer is, with the downside of increased memory usage.
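For the 5 MB case from the question, a rough sketch could look like the following. big_buffer_demo is a hypothetical helper, the size is taken as 5 MiB, and a Vec<u8> stands in for the real destination so the example does not touch the file system.

```rust
use std::io::{self, BufWriter, Write};

/// Returns (buffer capacity, bytes pending before flush, bytes in the
/// inner Vec after flush).
fn big_buffer_demo() -> io::Result<(usize, usize, usize)> {
    const FIVE_MIB: usize = 5 * 1024 * 1024;

    // The whole buffer is allocated up front, so this costs 5 MiB of memory.
    let mut w = BufWriter::with_capacity(FIVE_MIB, Vec::new());

    w.write_all(b"hello")?;
    // A small write just sits in the buffer until it is flushed.
    let pending = w.buffer().len();

    w.flush()?;
    let delivered = w.get_ref().len();

    Ok((w.capacity(), pending, delivered))
}
```

Whether the larger buffer actually helps is something to measure for your workload; the sketch only shows that the API accepts it.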


BufWriter exists to amortize costs of small writes. It doesn't apply to large writes because those large writes end up being throughput-limited in the underlying storage and would cost an extra memory copy on top (from the written slice to the buffer, from the buffer to the underlying writer).

So if you're only doing large writes you don't need it. If the underlying writer is very cheap to call (perhaps because it has its own internal buffer) you also don't need it.
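As a small illustration of the amortization point, this sketch compares how many times the underlying writer is called for many one-byte writes, with and without BufWriter. CallCounter, direct_calls and buffered_calls are hypothetical names for this example, and the counter stands in for a writer whose write call is expensive, such as one that makes a system call.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for an expensive writer; counts write calls.
struct CallCounter {
    calls: usize,
}

impl Write for CallCounter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// n one-byte writes straight to the writer: one call per byte.
fn direct_calls(n: usize) -> io::Result<usize> {
    let mut w = CallCounter { calls: 0 };
    for _ in 0..n {
        w.write_all(b"x")?;
    }
    Ok(w.calls)
}

/// The same writes through a default-sized BufWriter.
fn buffered_calls(n: usize) -> io::Result<usize> {
    let mut w = BufWriter::new(CallCounter { calls: 0 });
    for _ in 0..n {
        w.write_all(b"x")?;
    }
    // Push out whatever is still buffered before reading the count.
    w.flush()?;
    Ok(w.get_ref().calls)
}
```

With n = 1000, the direct version makes 1000 calls, while the buffered version should make far fewer, since the bytes are handed over in larger blocks.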

flush should have the semantics of turning any internal writer state into actual IO where possible. E.g. if you have a chain of compress (internal buffer) -> encrypt -> buffer -> File, then it might finish the current compression block, encrypt that (assuming it's some stream cipher) and flush the buffer to the File, i.e. pass the data to the operating system. It does not guarantee that the data gets persisted to disk, since the OS has its own buffering. It's just meant to make the data leave the current process. I write "might" because in the end it depends on how all the writers in the chain implement flushing.
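If you do need the data on disk and not just handed to the OS, one option is to follow the flush with File::sync_all. A minimal sketch, where write_durably is a hypothetical helper name:

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes data to path and asks the OS to persist it before returning.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let file = File::create(path)?;
    let mut w = BufWriter::new(file);

    w.write_all(data)?;

    // Step 1: move buffered bytes out of the process and into the OS.
    w.flush()?;

    // Step 2: ask the OS to push its own buffers to the storage device.
    w.get_ref().sync_all()?;

    Ok(())
}
```

The order matters here: calling sync_all before flush would sync the file while some bytes are still sitting in the BufWriter.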
