Question on performance of Seek trait


#1

Hi there, after more than a year using Rust, I just realized we have a Seek trait (what a day xD), and I want to use it to read files and skip some data. I’m using a BufReader for that. My question is: if I read, for instance, 10 bytes from a file and then call seek(SeekFrom::Start(15)), will the BufReader reload the file from the start and skip 15 bytes, or will it simply skip 5 more bytes?

In the latter case, it would be the same as calling seek(SeekFrom::Current(5)), which would make more sense than loading the file again. Is that the case?

Thanks!


#3

From the docs of BufReader:

Seeking always discards the internal buffer, even if the seek position
would otherwise fall within it. This guarantees that calling
.into_inner() immediately after a seek yields the underlying reader at
the same position.

Other than that, it just calls seek on the underlying reader.
It’s entirely up to the underlying reader how seek is implemented. For a File I’d expect it to call something like libc fseek or similar.
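
To check the original question empirically, here is a minimal sketch. It assumes an in-memory Cursor stands in for the file, and the helper name positions_after_seek is made up for illustration. It reads 10 bytes, performs the given seek, and reports the position returned by seek together with the next byte read.

```rust
use std::io::{self, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: read 10 bytes, seek to `target`, then return the
/// position reported by `seek` and the next byte read from the stream.
fn positions_after_seek(target: SeekFrom) -> io::Result<(u64, u8)> {
    // Assumption: a Cursor over the bytes 0..64 stands in for a real file,
    // so the byte stored at each offset equals the offset itself.
    let data: Vec<u8> = (0u8..64).collect();
    let mut reader = BufReader::new(Cursor::new(data));

    // Consume the first 10 bytes through the BufReader.
    let mut first = [0u8; 10];
    reader.read_exact(&mut first)?;

    // Either an absolute or a relative seek, depending on the caller.
    let pos = reader.seek(target)?;

    // Read one more byte to see where we actually ended up.
    let mut next = [0u8; 1];
    reader.read_exact(&mut next)?;
    Ok((pos, next[0]))
}
```

Calling it with SeekFrom::Start(15) and with SeekFrom::Current(5) should give the same result in both cases, position 15 and byte value 15. Neither variant re-reads the data from the beginning; the seek is forwarded to the underlying reader and the buffer is dropped.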


#4

I see that it is using lseek64(). I haven’t been able to find information on its performance. Does reading suffer much if it is called every now and then?


#5

If you do it often it will probably hurt a bit, not because of the seek itself but because the internal buffer of BufReader is discarded every time. AFAIK the default buffer is 8 KiB. That means the next read after every single seek refills the buffer with up to 8 KiB from the underlying reader. That seems excessive if you only need a few bytes.
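
A rough way to see the effect is to count how often the underlying reader is actually hit. The sketch below is only an illustration: CountingReader and underlying_reads are hypothetical names, and a Cursor over 64 KiB of zeros stands in for the file. It reads one byte and skips one byte, four times, either by seeking or by reading the skipped byte.

```rust
use std::io::{self, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical wrapper that counts calls to `read` on the inner reader.
struct CountingReader<R> {
    inner: R,
    reads: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.reads += 1;
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.inner.seek(pos)
    }
}

/// Read one byte and skip one byte, four times, and return how many
/// reads reached the underlying reader.
fn underlying_reads(use_seek: bool) -> io::Result<usize> {
    // Assumption: an in-memory Cursor stands in for a real file.
    let source = CountingReader { inner: Cursor::new(vec![0u8; 64 * 1024]), reads: 0 };
    let mut reader = BufReader::new(source);
    let mut byte = [0u8; 1];
    for _ in 0..4 {
        reader.read_exact(&mut byte)?; // the byte we actually want
        if use_seek {
            // Skip one byte by seeking; this discards the buffer.
            reader.seek(SeekFrom::Current(1))?;
        } else {
            // Skip one byte by reading it; the buffer stays intact.
            reader.read_exact(&mut byte)?;
        }
    }
    Ok(reader.get_ref().reads)
}
```

With use_seek set to true, one would expect 4 underlying reads (one buffer refill per seek), compared with a single read when the skipped bytes are simply read and thrown away.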

Personally, I don’t agree with the reasoning for why BufReader::seek behaves as it does, but that’s unlikely to change. A BufReader with better support for seek would probably be a major win if you seek a lot.
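
For what it’s worth, newer versions of the standard library (1.53 and later, as far as I know) have BufReader::seek_relative, which keeps the buffer when the target still falls inside it. A minimal sketch of the difference, where the helper name buffered_after is made up and the capacity is fixed at 8192 bytes just to make the numbers predictable:

```rust
use std::io::{self, BufRead, BufReader, Cursor, Seek, SeekFrom};

/// Hypothetical helper: fill the buffer, consume 10 bytes, skip 5 more,
/// and return how many bytes are still buffered afterwards.
fn buffered_after(relative: bool) -> io::Result<usize> {
    // Assumption: a Cursor over 64 KiB of zeros stands in for a real file.
    let source = Cursor::new(vec![0u8; 64 * 1024]);
    let mut reader = BufReader::with_capacity(8192, source);

    reader.fill_buf()?; // load the first 8192 bytes into the buffer
    reader.consume(10); // pretend 10 bytes have been read

    if relative {
        // Stays inside the buffer, so nothing is discarded.
        reader.seek_relative(5)?;
    } else {
        // Same target position, but the buffer is always discarded.
        reader.seek(SeekFrom::Current(5))?;
    }
    Ok(reader.buffer().len())
}
```

Here buffered_after(true) should leave 8177 bytes in the buffer (8192 minus the 10 consumed and the 5 skipped), while buffered_after(false) should leave 0. So for small forward or backward skips, seek_relative avoids the refill described above.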

I’ve also seen cases where seek is horribly slow, especially in combination with writing, not so much with reading. But this was on IBM AIX with the JFS file system (I think the problem is the journal/log).