I just had this idea for a memory allocator that would allocate memory from a large static array, where the array can be saved to a file and reloaded the next time the program runs. This assumes that the address of the static array does not change when the program is run again. Is that valid? Or is there another way to get a chunk of memory at a fixed address?
It would (ideally) need to interact with the operating system a bit similar to memmap, so pages are loaded on demand on a page fault, and only modified pages are saved.
I don't believe so. Address space layout randomization (ASLR) means that your binary (and therefore any static variables it contains) will be loaded at a random location in memory, which is a pretty standard security feature provided by the kernel. Imagine an attack where I craft some machine code which loads a string (e.g. "rm -rf /\0") and jumps to the address of libc's system() function. With ASLR, this attack isn't possible because the address of system() isn't known ahead of time.
I'm pretty sure you can actually ask the mmap syscall to put memory at a particular location.
The mmap function signature is: void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
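So just pass a non-null addr and the kernel will try to map a block of memory at that location. Without the MAP_FIXED flag the address is only a hint, so check the pointer that mmap returns.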
Note that due to ASLR (previous reply) you cannot put something like Box<dyn Trait> into that array; after a reload, the saved vtable pointer will point to garbage.
UPD: the second problem you'll run into is uninitialized memory (such as in padding of (u16, u8)). You'll be pretty much forced to mmap that file, since <File as Write>::write wants the full slice to be initialized, or allocate only bytemuck::Pod types.
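To make the padding point concrete, here is a minimal std-only sketch. `Record`, `encode`, `decode` and `padding_bytes` are names invented for this example, and the 4-byte size of (u16, u8) is what rustc produces on common targets, not something the language promises. Encoding field by field is one way to avoid ever reading a padding byte; bytemuck::Pod types are the other route mentioned above.

```rust
use std::mem::size_of;

/// Hypothetical record with the same fields as the `(u16, u8)` tuple above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Record {
    pub id: u16,
    pub tag: u8,
}

impl Record {
    /// Encode field by field (little-endian), so no padding byte is ever read.
    pub fn encode(&self) -> [u8; 3] {
        let id = self.id.to_le_bytes();
        [id[0], id[1], self.tag]
    }

    /// Rebuild the record from the three payload bytes.
    pub fn decode(bytes: [u8; 3]) -> Record {
        Record {
            id: u16::from_le_bytes([bytes[0], bytes[1]]),
            tag: bytes[2],
        }
    }
}

/// Number of bytes in `T` beyond `payload` bytes of real data.
/// For `(u16, u8)` the payload is 3 bytes, the rest is padding.
pub fn padding_bytes<T>(payload: usize) -> usize {
    size_of::<T>() - payload
}
```

Calling `padding_bytes::<(u16, u8)>(3)` reports the byte that a raw dump of the tuple would have to read while it is still uninitialized.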
Ok, I notice: "Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities."
But Rust mostly solves the issue of memory corruption. Still, it seems it would not be straightforward due to ASLR.
If I understand correctly, functionally it becomes like a binary database or swap memory. This is interesting because, besides being persistent like a database, it can be used to create applications that start using the disk once they hit a predefined limit on real RAM, allowing heavy applications (e.g. an AI model) to run on devices with minimal RAM. The advantage over the OS's built-in memory swap feature is the ability to implement custom algorithms that are more performant for the use case.
Rkyv uses relative pointers rather than fixed addresses, so its binary data can be mmapped straight from a file. Perhaps that technique could be used.
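A hand-rolled sketch of the relative pointer idea, to show why it survives being loaded at a different address. This is not rkyv's API or its on-disk format; the layout (a 4-byte offset at position 0, a 4-byte length at position 4, the text after that) and all function names are assumptions made up for the example.

```rust
use std::convert::{TryFrom, TryInto};

/// Store at `at` a 4-byte little-endian offset leading from `at` to `target`.
pub fn write_rel_ptr(buf: &mut [u8], at: usize, target: usize) {
    let offset = i32::try_from(target as i64 - at as i64).expect("offset must fit in i32");
    buf[at..at + 4].copy_from_slice(&offset.to_le_bytes());
}

/// Follow the relative pointer stored at `at`; returns the target position
/// inside `buf`, or `None` if the data is truncated or points outside it.
pub fn read_rel_ptr(buf: &[u8], at: usize) -> Option<usize> {
    let raw: [u8; 4] = buf.get(at..at.checked_add(4)?)?.try_into().ok()?;
    let target = at as i64 + i64::from(i32::from_le_bytes(raw));
    usize::try_from(target).ok().filter(|&t| t <= buf.len())
}

/// Hypothetical archive layout: [rel ptr to text][text length][text bytes].
pub fn build_archive(text: &str) -> Vec<u8> {
    let mut buf = vec![0u8; 8];
    buf[4..8].copy_from_slice(&(text.len() as u32).to_le_bytes());
    buf.extend_from_slice(text.as_bytes());
    write_rel_ptr(&mut buf, 0, 8); // the text starts right after the header
    buf
}

/// Read the text back using only positions relative to the buffer itself.
pub fn read_text(buf: &[u8]) -> Option<&str> {
    let start = read_rel_ptr(buf, 0)?;
    let len_raw: [u8; 4] = buf.get(4..8)?.try_into().ok()?;
    let len = u32::from_le_bytes(len_raw) as usize;
    std::str::from_utf8(buf.get(start..start.checked_add(len)?)?).ok()
}
```

Because nothing in the buffer is an absolute address, a copy of the bytes (or a mapping of the same file at another base address) resolves to the same text.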
Hmm. As someone who just spent the last few days writing my own GC, this doesn't strike me as an impossible task, provided: 1) you rely on index-based allocation, rather than hard-coded static addressing; 2) you forego any and all trait objects (Box<dyn ...>); 3) you implement the [de]serialization at startup/shutdown; and 4) you don't expect/require the pointers into the heap post-reload to match exactly those pre-reload. Such a constraint would never fly.
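Points 1 and 3 can be sketched together with a tiny arena where links are u32 indices instead of addresses. `Arena`, `Node`, `NIL` and the 8-bytes-per-node encoding are all made up for illustration; a real allocator would also need free lists, typed slots and so on.

```rust
use std::convert::TryInto;

/// Hypothetical "null" index.
pub const NIL: u32 = u32::MAX;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Node {
    pub value: u32,
    pub next: u32, // index of the next node in the arena, or NIL
}

#[derive(Debug, Default, PartialEq)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Allocate a node and return its index, which stays valid after a reload.
    pub fn alloc(&mut self, value: u32, next: u32) -> u32 {
        self.nodes.push(Node { value, next });
        (self.nodes.len() - 1) as u32
    }

    /// Walk the list starting at `idx`, collecting values.
    pub fn values_from(&self, mut idx: u32) -> Vec<u32> {
        let mut out = Vec::new();
        while let Some(node) = self.nodes.get(idx as usize) {
            out.push(node.value);
            if out.len() == self.nodes.len() {
                break; // guards against cycles in a corrupted file
            }
            idx = node.next;
        }
        out
    }

    /// Serialize every node as two little-endian u32 values.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.nodes.len() * 8);
        for n in &self.nodes {
            out.extend_from_slice(&n.value.to_le_bytes());
            out.extend_from_slice(&n.next.to_le_bytes());
        }
        out
    }

    /// Rebuild the arena; returns `None` if the length is not a multiple of 8.
    pub fn from_bytes(bytes: &[u8]) -> Option<Arena> {
        if bytes.len() % 8 != 0 {
            return None;
        }
        let nodes = bytes
            .chunks_exact(8)
            .map(|c| Node {
                value: u32::from_le_bytes(c[0..4].try_into().unwrap()),
                next: u32::from_le_bytes(c[4..8].try_into().unwrap()),
            })
            .collect();
        Some(Arena { nodes })
    }
}
```

The reloaded Vec will almost certainly sit at a different address than before, which is fine here: only the indices have to match, which is point 4.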
The part #3 might be the trickiest one. How do you imagine yourself interacting with the reloader? If the first thing you plan to do in main is to check for your offloaded static memory file and load it as necessary, you'll need a separate piece of code that does just that. If that piece itself happens to rely on the heap, once it's done you'll be effectively swapping your entire heap in one go.
Suppose that's done and over with. How would you branch on the existing/non-existing execution paths? If you did happen to fill in the heap from a file, you'll need some way of figuring out which indexed chunk on the heap belongs to which data type; before any fn on_data(data: ?).
If you have an exact structure in mind, you might be better off hard coding one single enormous struct Heap, itself possibly global/static behind some LazyLock; containing each and every field matching each and every piece of data you'll be relying on at runtime. When serializing, dump it all as one giant #[repr(C)]-compatible [u8] slice into a file; when reloading put it back into place and use it through your code as-is. If you don't know the exact structure, where exactly are you going to be branching to? Suspension at an arbitrary point in the program means suspending the entire stack, as well as saving the current ISP. Both will have to be offset by whatever origin your OS decided to assign them when loading your executable into memory pre-runtime; and you will need some way to fetch all of the stack frames pre-suspension, for your caching as well.
Doing all of the above correctly is one hell of a challenge. Since the byte ordering is machine specific, your heap caching will be inherently local. Some of the aforementioned bits might be outright impossible to do on some OSes, if I remember my security basics right. So, there you go.
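For the "one enormous struct Heap" variant, a minimal sketch could look like this. The fields of `Heap` are placeholders I picked so that the struct has no padding (the const assertion checks the size I expect); `dump` and `load` are hypothetical names. The bytes are in native byte order, so as noted above the file is only meaningful on the same kind of machine.

```rust
use std::mem::size_of;

/// Hypothetical fixed-layout heap: integers only, fields ordered so that
/// `#[repr(C)]` inserts no padding (8 + 16 + 8 = 32 bytes, alignment 8).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Heap {
    pub counter: u64,
    pub scores: [u32; 4],
    pub flags: [u8; 8],
}

// Compile-time check that the layout really has no padding bytes.
const _: () = assert!(size_of::<Heap>() == 32);

/// View the whole struct as bytes and copy them out (native byte order).
pub fn dump(heap: &Heap) -> Vec<u8> {
    // SAFETY: `Heap` is repr(C), holds only integers and has no padding
    // (checked above), so all `size_of::<Heap>()` bytes are initialized.
    let bytes = unsafe {
        std::slice::from_raw_parts(heap as *const Heap as *const u8, size_of::<Heap>())
    };
    bytes.to_vec()
}

/// Put the bytes back into a `Heap`; rejects input of the wrong length.
pub fn load(bytes: &[u8]) -> Option<Heap> {
    if bytes.len() != size_of::<Heap>() {
        return None;
    }
    // SAFETY: the length matches, every bit pattern is a valid `Heap`
    // (integers only), and `read_unaligned` tolerates any alignment.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const Heap) })
}
```

This only works because every field accepts any bit pattern. As soon as the struct contains a bool, an enum, a reference or a Box, loading raw bytes like this is no longer sound.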
Yes, the idea is you could use standard fast data structures such as HashMap and BTreeMap (perhaps with "repr(C)", so the layout is stable across recompilations, although I guess vtables are not ever going to be stable) to have some kind of in-memory database, but due to being allocated from "Saved" memory, it will still be there when the program is stopped and restarted.
It appears that ASLR can be disabled for an individual executable on Linux at least, so maybe that isn't an insuperable problem.
There are time travel debuggers, like rr, which are based on the principle that a single-core program behaves the same way if all the syscalls behave the same. To make this work, they need to map executable images and allocations at the same address every run. So it's clearly possible.
It is hard to write a memory-mapped file to disk without synchronization. It does not make a good allocator in the sense that you need to synchronize allocation and deallocation against the (very expensive) file writes. Consider that during recovery you will need an indication and ID system to determine which allocations are valid and which allocations belong to what 'value' you want to recover. In particular, with mmap you are at the mercy of an unknown page caching mechanism with very uncontrolled write-back rules, which is a hazardous environment to slot any atomic commit mechanism into. Anyway, I already investigated this problem space a bit and found that being able to snapshot known-good configurations with an out-of-process spooling and recovery system seemed more promising:
Edit: but really that is for the sake of playing around. Using an in-memory database like sqlite is probably better in any production scenario.
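For comparison, the simplest in-process way I know to get an all-or-nothing snapshot is write-to-a-temp-file-then-rename. This is not the out-of-process spooling system mentioned above, just a sketch of the "only ever expose known-good snapshots" idea; `save_snapshot` and `load_snapshot` are made-up names, and it assumes the temp file and the target live on the same filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to a sibling temp file, flush it to disk, then rename it
/// over `path`. Readers see either the old snapshot or the new one.
pub fn save_snapshot(path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(data)?;
        file.sync_all()?; // make sure the bytes hit the disk before the rename
    }
    fs::rename(&tmp, path)
}

/// Read the last committed snapshot.
pub fn load_snapshot(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}
```

Unlike an mmap with kernel-controlled write-back, the commit point here is explicit (the rename). A fully durable version would also have to sync the containing directory, which this sketch skips.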
LISP has something like this. Basically while you're in the REPL you can do something like "save everything in memory right now to a file" and you can restore the REPL state later. You can write whole programs this way, since the saved state includes both code and data.
There's also WebAssembly, where all pointers get compiled down to what are effectively relative indices.
Another possible strategy is traversing over everything in the allocator and re-writing all the pointers when storing/retrieving from disk. You'd need a way to find pointers in the allocated objects, which you could do by requiring them to implement an unsafe trait. I think there's a Rust GC implementation that does something similar.
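A rough sketch of what that pointer-rewriting pass could look like. I don't know how the GC crate alluded to above does it; the `Relocate` trait, `ListNode` and `rebase` are invented here, and addresses are kept as plain usize values so the example stays free of real pointer dereferences.

```rust
/// Hypothetical trait for objects stored in the persistent heap.
///
/// # Safety
/// Implementors must report every address they store into the heap,
/// exactly once; a missed address would dangle after a reload.
pub unsafe trait Relocate {
    fn for_each_addr(&mut self, f: &mut dyn FnMut(&mut usize));
}

/// Example object: a list node holding the address of the next node (0 = null).
pub struct ListNode {
    pub value: u32,
    pub next_addr: usize,
}

unsafe impl Relocate for ListNode {
    fn for_each_addr(&mut self, f: &mut dyn FnMut(&mut usize)) {
        if self.next_addr != 0 {
            f(&mut self.next_addr);
        }
    }
}

/// Rewrite every stored address from a heap based at `old_base`
/// to the same heap now based at `new_base`.
pub fn rebase<T: Relocate>(objects: &mut [T], old_base: usize, new_base: usize) {
    for obj in objects.iter_mut() {
        obj.for_each_addr(&mut |addr: &mut usize| {
            *addr = (*addr).wrapping_sub(old_base).wrapping_add(new_base);
        });
    }
}
```

On load you would call `rebase` with the base address recorded in the file and the address the heap actually got this time.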
It can also be disabled per-allocation via the first argument to mmap/VirtualAlloc, although you have to be lucky enough not to collide with an existing allocation (a decently safe bet on 64 bit systems).
Just a reminder: I think disabling security mechanisms is not good, because it creates entry points for vulnerabilities. I suggest an alternative approach. I'm also trying to create this allocator right now. I call it SwapAllocator :]
I wouldn't say that is exactly true in this case. Suppressing ASLR means it might be easier for an attacker to exploit a vulnerability ( an error in the program ), but it doesn't create the fundamental vulnerability, which would be some kind of Undefined Behaviour ( typically a buffer overrun ).
The only way to have security is to have a correct program. Rust helps a lot with that, ASLR not so much in my opinion ( when you are using Rust, for a C or C++ program it could be justified ).
I meant that it facilitates a vulnerability entry point, because what would have been stopped by ASLR is no longer halted. It is very easy to introduce mistakes in unsafe Rust.
Stick to safe Rust then, that prevents the buffer overruns.
ASLR doesn't fix buffer overrun security issues, it only may make it harder to exploit the issue, like having a speed-bump to slow down burglars after they already ransacked your home
An exploit is still perfectly possible. If anything ASLR might give you a false sense of security.
In the real world, bad actors tend to exploit human mistakes ( phishing, impersonation, things like that ).
what's the point of doing that instead of just loading/creating a file at startup and then just writing data inside it normally?
also general advice: do not ever blindly trust data loaded from disk to match the memory layout of your types. doing so will just inflict pain on you when changing your code and make you vulnerable to attacks
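a minimal sketch of what "not blindly trusting" can mean in practice: put a small header in front of the payload and refuse to load anything that doesn't match. the magic value, the version number, the 12-byte header layout and all names below are made up for the example.

```rust
use std::convert::TryInto;

pub const MAGIC: [u8; 4] = *b"HEAP"; // hypothetical file signature
pub const VERSION: u32 = 1; // bump whenever the type layout changes

#[derive(Debug, PartialEq)]
pub enum LoadError {
    TooShort,
    BadMagic,
    VersionMismatch { found: u32 },
    LengthMismatch,
}

/// Prefix `payload` with magic, version and payload length (little-endian).
pub fn wrap(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + payload.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&VERSION.to_le_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Check the header and hand back the payload only if everything matches.
pub fn unwrap_checked(file: &[u8]) -> Result<&[u8], LoadError> {
    if file.len() < 12 {
        return Err(LoadError::TooShort);
    }
    if file[0..4] != MAGIC {
        return Err(LoadError::BadMagic);
    }
    let version = u32::from_le_bytes(file[4..8].try_into().unwrap());
    if version != VERSION {
        return Err(LoadError::VersionMismatch { found: version });
    }
    let len = u32::from_le_bytes(file[8..12].try_into().unwrap()) as usize;
    let payload = &file[12..];
    if payload.len() != len {
        return Err(LoadError::LengthMismatch);
    }
    Ok(payload)
}
```

this only catches stale or truncated files. it does nothing against a file crafted by an attacker, so the payload itself still has to be parsed and validated rather than reinterpreted as your types.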
Not everything can be done in safe Rust. The allocator you're about to make, even before the application itself, already uses unsafe, e.g. memmap is unsafe.
I made it clear previously, that's what I refer to. You are disabling a layer of security for something that isn't really worth it, which could be done without disabling that security layer. On one hand, you want security by using Rust, but on the other hand, you don't want security by disabling security layers, isn't that a contradiction?
ASLR stops attackers from reliably jumping to a specific function or shellcode in memory, which prevents the impact from being fatal and highly damaging if something unwanted happens
I am not about to make this allocator, I am just thinking about whether it is possible, and whether it could offer benefits, such as superior performance.