I'll be storing data in a multi-gigabyte file where every X KiB represents a "page". The memory manager will dynamically load these pages into RAM, drop them again, and perform related operations. I have a general understanding of memory paging and of how e.g. databases do file-based reads and writes. However, I become indecisive about these lower-level operations in Rust when I start translating the theory into practice.
Q: So my general question is: how should custom, file-based memory paging be implemented in a Rust application, so that the application can control what gets loaded into and evicted from actual memory (and when), decide when fsync should be triggered, and bypass the OS's virtual memory/mmap? Here are some ideas:
I could pretend that Rust itself is magically smart enough, simply use a HashMap<PageIndex, PageSrc>, load data through read_at, and add some eviction mechanism on top, but I have a bad feeling already :). Would #[repr(align(N))] help here?
I could use the paging abstractions provided by the x86_64 and aarch64 crates maintained by the OSDev and HermitCore communities. But what about other platforms?
In the case of WebAssembly, paging should probably be provided by the runtime itself, and you would then operate on pages through exposed functions (e.g. read_page, write_page); working with raw linear memory doesn't feel right.
It doesn't directly answer your question, but I'd recommend reading about how you can implement your own allocator in Rust. The Writing an OS in Rust series has an excellent explanation.
Later on, the series goes through paging and its implementation. That part isn't strictly necessary for what you are doing (you can just ask your OS for pages; they have to implement it themselves because they are writing the OS), but it's good background information.
After that, they build their own global allocator on top of the paging interface that was just created. This is very close to what you are trying to achieve, except where they request pages of memory from their FrameAllocator, you'll use your own file-based implementation.
You don't necessarily need to implement the GlobalAlloc trait and register your allocator as Rust's global allocator, though. If you want to use the allocator for just part of your application, consider implementing the (nightly) Allocator trait and using constructors like Vec::new_in() to create specific Vecs (or Boxes, or whatever) backed by your allocator.
Just think of core::arch::wasm32::memory_grow() as allocating a new page (or pages) placed at the end of your current address space.
That's effectively what happens, except instead of your system's virtual memory manager mapping that part of virtual memory to some physical memory, it's your WebAssembly runtime doing the mapping.
The Introduction to Paging post was actually the only blog I came across that goes deeper into this topic. Nobody seems to be wrestling with these questions, which adds a few points to my skepticism about the path I should take. It does feel like I might be over-engineering something that Rust already provides; it's just not completely clear whether it does, or how. All these low-level questions get me into a search loop and the days are passing by :).
OK, so after some more thinking and a decent sleep, I came to some conclusions that I'd like to share here, in the hope that they help others looking for similar answers:
A. I find option #2 a bit too heavy: going down this path basically means bypassing both Rust's and the OS's facilities, so it doesn't make much sense if your app runs on a general-purpose OS anyway. I assume the benefits are not worth the time invested. I also assume that the OS itself optimizes memory management for the hardware underneath, so it shouldn't be the developer's domain. Well, I guess it depends, but I'm close, right?
B. Regarding #3: since 4 GB is the most a 32-bit pointer can address, and that is what WebAssembly currently supports (64-bit memory is still in the making), I find these limits unsuited for running "serious" services (e.g. a database) inside WASM. It's better to provide a streaming API tailored for that purpose.
C. And finally, #1. Rust's memory management is pretty awesome, and I must say that working on so many different things makes you forget all these tiny details that are only conditionally important. You forget most of them after the first read :).
I could force myself to write Rust code "correctly" and thus "correctly" manage what lives on the stack and what lives on the heap. This lets me eliminate random data placement and read data sequentially, which is fast. Rust also manages memory alignment automatically, which means that, potentially, I don't even need #[repr] attributes anywhere. With that, the fast-memory questions are answered, I guess.
What about the so-called non-volatile storage? I realized that my indecisiveness mainly came from the layer between Rust and the OS (Unix in my case). The interaction between the file system and Rust's memory layer is somewhat hidden from us, or at least I can't find many in-depth discussions about it online - the source code is your friend :). Anyhow, accessing data on disk means working with basically three kinds of pages - hardware, OS, and application. Since we "know" how to manage application pages the "right" way, we just need to bypass the OS and talk to the hardware more directly. I see that std::fs::OpenOptions allows you to pass custom flags - O_DIRECT in this case - controlling how a file is accessed. With that, I can bypass the OS's page cache and take control of when data actually hits the disk. This is also resolved now, I guess.
Do you think my claims are correct? Could I do it even better?