Why is using the `read_lines()` iterator much slower than using `read_line()`?

I'd say the "more efficient" bit is because, if I've got a 10 GB file, reading it entirely into memory would require about 10 GB of RAM, whereas using `for line in buf_reader.lines()` and throwing `line` away at the end of each iteration uses memory proportional to the longest line.
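In code, that streaming pattern looks roughly like this (a minimal sketch; the file name and the `ERROR` filter are just placeholders):

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn main() -> io::Result<()> {
    let file = File::open("big.log")?; // placeholder path
    let buf_reader = BufReader::new(file);

    // Each iteration yields (and then drops) one freshly allocated String,
    // so peak memory stays proportional to the longest line, not the file.
    for line in buf_reader.lines() {
        let line = line?;
        // ... process `line` here ...
        if line.contains("ERROR") {
            println!("{line}");
        }
    }
    Ok(())
}
```

That per-iteration `String` allocation is also where the slowdown in the question comes from, as the table below makes clearer.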

So what we're seeing is that there are multiple approaches, each with its own strengths and weaknesses.

| Method | Convenience | Access | Memory Usage |
|---|---|---|---|
| `std::fs::read_to_string()` | It's just a string | Random access | Allocates a buffer the same size as the file |
| `buf_reader.lines()` | Iterator of strings | Streaming, can hold on to strings for random access | Allocates one string per line |
| `buf_reader.read_line(&mut buffer)` | Manual buffer management | Streaming only | Single buffer the size of the longest line |
| mmap | You can get a `&str`, but it's unsafe and platform-dependent | Random access | Zero[1] |

That table isn't perfect, but you can see there's a general tradeoff between performance and convenience: a naive `std::fs::read_to_string()` is the most convenient, and memory-mapped files are the most performant, especially for larger files.
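To make the `read_line()` row concrete, here's roughly what that variant looks like (a sketch; the path is a placeholder). Reusing one buffer instead of allocating a fresh `String` per line is the usual answer to why it outruns `lines()`:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

fn main() -> io::Result<()> {
    let file = File::open("big.log")?; // placeholder path
    let mut buf_reader = BufReader::new(file);

    // One buffer, reused for every line: read_line() appends into it,
    // so we have to clear() it ourselves at the top of each iteration.
    let mut buffer = String::new();
    loop {
        buffer.clear();
        // read_line() returns Ok(0) at EOF; the trailing '\n' is kept.
        if buf_reader.read_line(&mut buffer)? == 0 {
            break;
        }
        // ... process `buffer` here ...
    }
    Ok(())
}
```

The buffer grows once to the length of the longest line and then stays put, so after warm-up the loop does no allocation at all.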

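And for the mmap row, a sketch using the third-party `memmap2` crate (memory mapping isn't in std; the path is again a placeholder):

```rust
use std::fs::File;

// Assumes the third-party `memmap2` crate.
use memmap2::Mmap;

fn main() -> std::io::Result<()> {
    let file = File::open("big.log")?; // placeholder path

    // Safety: the mapping is only sound as long as nothing else
    // truncates or rewrites the file underneath us.
    let mmap = unsafe { Mmap::map(&file)? };

    // No copies from the program's point of view; the OS pages the
    // file in and out on demand. `Mmap` derefs to `&[u8]`.
    if let Ok(text) = std::str::from_utf8(&mmap) {
        println!("{} bytes mapped", text.len());
    }
    Ok(())
}
```

Note that `from_utf8` has to validate the whole mapping, which faults in every page, so for really large files you'd often work with byte slices instead.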

  1. From the perspective of your program and OOMs. The OS will manage paging parts of the file into and out of memory for you.
