I want to bypass the OS in-memory cache for I/O operations and do direct I/O. I looked at some code from ChatGPT and other forums and found that setting libc::O_DIRECT in OpenOptions might be the answer, but it does not seem to exist in libc = "0.2". Can someone guide me towards the recommended way of doing direct I/O in Rust?
Have you checked the docs? It exists just fine.
Unless you know what you are doing, you should probably not use O_DIRECT. Mixing O_DIRECT with regular accesses or mmap on the same file is likely to result in corruption. So is resizing a file, or allocating not-yet-allocated blocks in a file, that is also accessed with O_DIRECT, because O_DIRECT doesn't enforce cache coherence with metadata changes and regular reads/writes. It also puts restrictions on the alignment of your reads and writes. See also Clarifying Direct IO's Semantics - Ext4, linux - What does O_DIRECT really mean? - Stack Overflow and (for a rant on why Linus dislikes O_DIRECT) https://yarchive.net/comp/linux/o_direct.html.
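To make the alignment restriction concrete: with O_DIRECT on Linux, the buffer address, the file offset and the transfer length generally all have to be multiples of the device's logical block size. Below is a minimal sketch of an aligned buffer built only on std. AlignedBuf is a hypothetical helper name, and the 4096-byte alignment in the usage note is an assumption; the real requirement depends on the filesystem and device.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ops::{Deref, DerefMut};
use std::slice;

/// Hypothetical helper: a zeroed heap buffer whose start address is a
/// multiple of `align`, usable as the target of aligned reads and writes.
pub struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    /// `align` must be a power of two and `len` a non-zero multiple of it.
    pub fn new(len: usize, align: usize) -> Self {
        assert!(
            align.is_power_of_two() && len > 0 && len % align == 0,
            "len must be a non-zero multiple of a power-of-two alignment"
        );
        let layout = Layout::from_size_align(len, align).expect("invalid layout");
        // SAFETY: the layout has a non-zero size (checked above).
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Self { ptr, layout }
    }
}

impl Deref for AlignedBuf {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialized bytes that we own.
        unsafe { slice::from_raw_parts(self.ptr, self.layout.size()) }
    }
}

impl DerefMut for AlignedBuf {
    fn deref_mut(&mut self) -> &mut [u8] {
        // SAFETY: same as above, and `&mut self` guarantees unique access.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```

Something like AlignedBuf::new(8192, 4096) can then be passed as a &mut [u8] to the usual read calls. The file offset and the length of each read or write still have to be aligned separately.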
It wasn't specified which platform you expect it to exist on, but O_DIRECT is not available everywhere. It doesn't exist if you are building for macOS, Windows, etc.
I can see it in the docs, but I am seriously not able to import it. It keeps saying: cannot find value O_DIRECT in crate libc (not found in libc).
Actually, I want finer-grained control over the file because I am implementing a database and don't want to rely on the OS caching.
The main OSes I'd like to run it on are Linux and macOS.
What platform (target) are you building for? It is not present on macOS, as @parasyte said. Search for F_NOCACHE to see how to bypass the fs cache on macOS (I haven't tried it).
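For reference, an untested sketch of what the F_NOCACHE route could look like. It assumes the libc crate is a dependency on macOS, and set_no_cache is a hypothetical helper name. On other targets the sketch just reports that the flag is unsupported.

```rust
use std::fs::File;
use std::io;

/// Hypothetical helper: ask macOS not to keep this file's data in the cache.
#[cfg(target_os = "macos")]
pub fn set_no_cache(file: &File) -> io::Result<()> {
    use std::os::unix::io::AsRawFd;
    // SAFETY: the descriptor stays valid for as long as `file` is borrowed.
    let rc = unsafe { libc::fcntl(file.as_raw_fd(), libc::F_NOCACHE, 1) };
    if rc == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(())
    }
}

/// Fallback for targets that have no F_NOCACHE.
#[cfg(not(target_os = "macos"))]
pub fn set_no_cache(_file: &File) -> io::Result<()> {
    Err(io::Error::new(
        io::ErrorKind::Unsupported,
        "F_NOCACHE is only available on macOS",
    ))
}
```

Unlike O_DIRECT, this is applied to a file that is already open, so it would be called right after OpenOptions::open rather than passed as an open flag.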
Even on Linux it must be set on OpenOptions through the platform-specific OpenOptionsExt trait.
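// libc::O_DIRECT is not defined for every Unix target (macOS lacks it),
// so gate on Linux rather than on `unix`. Requires the `libc` crate.
#[cfg(target_os = "linux")]
pub mod file {
    use std::fs::OpenOptions;
    use std::os::unix::fs::OpenOptionsExt;

    /// Adds O_DIRECT to the flags passed to open(2).
    pub fn open_direct(opts: &mut OpenOptions) -> &mut OpenOptions {
        opts.custom_flags(libc::O_DIRECT)
    }
}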
Keep in mind that on Linux O_DIRECT shouldn't be read as "bypass cache" -- it's more low-level than that. O_DIRECT basically means "set up a DMA call to perform this read/write operation". If you just use O_DIRECT and treat it as performing regular reads/writes you'll very quickly end up with broken files and partially read buffers.
You'll need to pair O_DIRECT with O_SYNC to get the operations to block and wait for completion before returning.
Also, O_DIRECT is a platform-specific extension; it can mean different things on different platforms.
While O_DIRECT was introduced for databases, in the linked thread Linus said:
Side note: the only reason O_DIRECT exists is because database people are
too used to it, because other OS's haven't had enough taste to tell them
to do it right, so they've historically hacked their OS to get out of the
way.
Elsewhere he suggests using vmsplice/splice to move pages from/to the page cache without copying and then using madvise/fadvise to hint the kernel into reading the page from the disk into the page cache when reading and flushing it to the disk without leaving a copy in the page cache when writing. This way the page cache still handles correct synchronization wrt other operations on the same filesystem without affecting performance much.
There is also a patchset currently being reviewed that adds a new preadv2/pwritev2 flag (RWF_UNCACHED) to hint that the data shouldn't be cached, while still using the page cache for synchronization: [PATCHSET v5 0/17] Uncached buffered IO - Jens Axboe
Thanks @bjorn3 for this information, which I didn't know! I will go through the link you shared. My major concern was not persistence or synchronization but keeping the pages under my control and managing them the way I want. I have a rather complicated merging process which requires reading data block by block. So I wanted to read each block directly into memory and just use references to slices inside it, without extra copying from the page cache to my program and vice versa. Do you happen to know whether there is any sensible way of doing this in Rust for Mac and Linux?