Is std::fs::Seek O(1) or fs-dependent?

Consider the following statements:

  1. there exists a file-system / OS / cpu-config / ssd-brand combo where Bitcask can not make useful O(1) time start read/write guarantees

  2. there exists a file-system / OS / cpu-config ssd-brand combo where Bitcask can make useful O(1) time start read/write guarantees

You have definitely proved #1 by providing a concrete example. It sounds like you are trying to argue that #2 is impossible, which I think is very difficult to do.

AFAIK, #2 is possible, but not practical to pursue in general-purpose system. It would hold if the system is specifically designed to run only in hard-real-time contexts, but in other cases is just doesn't worth the time and so would probably not be achieved deliberately.

In order for this to be productive I think you need to be more specific about what a “useful guarantee” is.

It seems to me that Bitcask makes a bunch of useful guarantees - that writes are append-only, that reads require exactly one, contiguous read(), and so on. Those are O(1)-type guarantees. They're not wallclock time or predictability guarantees, but that's not what O(1) is implying anyway.

If you want a wallclock guarantee, that's a property of the system, of which Bitcask is only a small part.

4 Likes

The issue is not the question but that I posted to the wrong forum. In a forum dedicated to using low-latency KV stores, the community would be focused on "under what conditions this could work", but here, the focus seems to be "pick extreme outliers to prove it can not work in general."

The focus is to read the question literally, I think, without trying to guess the context. Of course, in KV-stores community the context would be already provided.

2 Likes

Proving this in the positive requires only one configuration. Proving this in the negative requires accounting for all possible configurations; yet, for whatever reason, the focus here is on the latter.

Expanding a bit on @Cerber-Ursi 's point: Your original question, as reflected in the title of the thread, is asking whether this can be made to work in general. If instead you had titled it something like "How to achieve bounded disk seeks," I suspect the conversation would have gone differently.

1 Like

Ah, I think I see the issue here now. Somewhere in this thread, say at Is std::fs::Seek O(1) or fs-dependent? - #13 by RAOF in particular with

I saw the original question as having been resolved and the discussion shifting towards:

I guess people skipped that message.

Hmmm.... If we want to endure anything about anything we cannot rely on assumptions.

I can sense where the confusion lies in the question.

Bottom line is that no application can make guarantees about anything. One has to take into account the underlying operating system and whatever resources it is using.

After all, simply getting from one instruction to the next in any program under most common operating systems has no upper bound on the time it may take. What if an interrupt occurs? What if the processes is suspended? What if the machine goes to sleep?

I think the gap between the responses here vs Rust in low latency environments shows that:

  1. it is very hard to engineer stuff

  2. it is very easy to find an edge case to argue about (i.e. impossible to make that guarantee because, what if, world goes into WW3 and nukes all the CPU production facilities)

1 ) If the seek is in the current loaded page in memory, it's just updating a pointer, that's O1.
2 ) Then you go to the file system that looks up in block cash, and if your in there, it's a pointer on another page. Still pretty much O1.
3) Block cache hit fails and you need to go to disk to load the next block. This may trigger a page flush to get space in the block cash, in which case your waiting for writes to disk for dirty pages, then you need to read the disk to get the new page. This hits the physical disk cache on the SATA drive, where the new page may be present or not, so you wait for disk to cycle it's cache.
Assuming an SSD there's no physical movement to wait for, but the bad block detection and recovery algorithm, is not O1, and that randomizes where data is on the disk, and that can interrupt the cache refresh in the SSD firmware.

And to address your original query, yes the filesystem matters as well, as each block on disk has pointer to the next block on disk. But how contiguous the blocks are, and how well they translate to pages in memory depends on the type of filesystem. Most modern filesystems are journaling, so they actually have two strings of disk blocks, one for committed writes, and one for dirty blocks that have been written to. Figuring out if your reading a dirty or committed block, and traversing those pointers is not O1 in most cases.

Back in the day we might expect seek() to wind a tape forwards or even backwards until it got to the right place. That might take a minute or more. Then subsequent reads would be much faster.

Point is, the moment you call out to the OS you cannot make any guarantees about throughout or latency.

But as I think I said earlier, on modern CPU's and operating systems you cannot even make any guarantees of how long it takes to get from one instruction of your program to the next.

If you want to make real-time assurances you need control of the whole stack from application, through OS to hardware.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.