Persistent data of a relatively complex type


#1

I have been looking for a way to persist data across sessions that is held in a non-trivial type (i.e. not just &[u8]) and to have this updated in as close to real time as possible, with no performance penalty (or at least a minimal one). To do this I have been looking at mmap, as this seems the obvious choice.

So for example, take an mmap crate (say, mmap) and an example type:

struct AType {
    data: Vec<AnotherType>,
}

struct AnotherType {
    one: [u8; 32], // with type alias
    two: u64,
    three: Vec<SomeOtherType>,
    .....
}

So the issue is how to allow access to an mmap holding such a type in an automated manner. It quickly seems to become a non-trivial issue. If it was a simple type like a huge string of bytes then it seems simpler. However, it’s still a bit unclear really. When you have a vector, for instance, I assume the container itself holds a pointer (plus length and capacity) on the stack and allocates and puts the values on the heap. If mmap was just like memory then it would need to manage that as well. That is where the problems come in, unless I have strayed way off track (very possible).

ofc I could serialise and write to such a file, but that seems to be the opposite of what I am looking for.

So the question (at last) is how do you really use mmap for complex types?

Added to this confusion (I am sorry) is mmap itself. I note that leveldb, for instance, switched from mmap to plain writes out to files due to bugs in OS X :frowning:

So the next question is: in Rust at the moment, what would be the most efficient way to persist and update data in real time for such a container while staying as close to RAM speeds as possible? A good place to get some further insight is (as usual) @BurntSushi’s excellent work on finite state transducers, with a really good blog post to accompany it.

Sorry for the tome here, I hope you can help.


#2

First, mmap can’t really work on complex types, or at least on any structures that use the heap. Vec is going to use the heap to hold the list of objects, and the heap is outside of your mmap space.
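To make that concrete, here is a small sketch (the helper name `elements_inside_handle` is made up for illustration) that checks whether a Vec's elements live inside the bytes of the Vec value itself. They do not: the Vec value is only a small handle, so copying those bytes into a mapped file would store a pointer, not the data.

```rust
use std::mem::size_of;

/// Returns true if the first element of `v` is stored inside the bytes of
/// the `Vec` value itself. For a heap-backed Vec this is never the case:
/// the value only holds a pointer, a length and a capacity.
fn elements_inside_handle<T>(v: &Vec<T>) -> bool {
    let handle_start = v as *const Vec<T> as usize;
    let handle_end = handle_start + size_of::<Vec<T>>();
    let data_start = v.as_ptr() as usize;
    (handle_start..handle_end).contains(&data_start)
}

/// Size in bytes of the Vec handle, independent of how many elements it holds.
fn handle_size<T>() -> usize {
    size_of::<Vec<T>>()
}
```

For a `Vec<u64>` with a thousand elements, `handle_size::<u64>()` stays a few machine words while the elements occupy 8000 bytes somewhere else on the heap.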

So to do this you would need to create an allocator on top of the mmap space so that you can allocate space for any objects you want to save in the mmap. You also have to remember that any object referencing another object in the space needs to do it by mmap offset rather than memory location. This is because the file may get mapped to a different location in the virtual address space each time you run the program.

This means if an object you are persisting needs a Vec, you need to create an mmap version of the vector that allocates its array inside the mmap space and saves the offset of it.
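A rough sketch of that idea follows. Everything in it is hypothetical: `Region` uses a plain `Vec<u8>` as a stand-in for the mapped bytes (a real program would get the slice from an mmap crate), the allocator is a simple bump allocator that never frees, and `OffsetVec` only stores `u64` values. In a real design the `OffsetVec` header would itself have to live inside the region.

```rust
/// Stand-in for the mapped region (assumption: a real program would map a file).
struct Region {
    bytes: Vec<u8>,
    next_free: usize, // bump pointer: offset of the first unused byte
}

/// A vector of u64 stored in the region, described purely by offsets.
#[derive(Clone, Copy, Debug, PartialEq)]
struct OffsetVec {
    start: usize, // offset of element 0 from the start of the region
    len: usize,
    cap: usize,
}

impl Region {
    fn new(size: usize) -> Self {
        Region { bytes: vec![0; size], next_free: 0 }
    }

    /// Bump-allocate `n` bytes; returns the offset, or None if the region is full.
    fn alloc(&mut self, n: usize) -> Option<usize> {
        let start = self.next_free;
        let end = start.checked_add(n)?;
        if end > self.bytes.len() {
            return None;
        }
        self.next_free = end;
        Some(start)
    }

    fn new_vec(&mut self, cap: usize) -> Option<OffsetVec> {
        let start = self.alloc(cap.checked_mul(8)?)?;
        Some(OffsetVec { start, len: 0, cap })
    }

    fn push(&mut self, v: &mut OffsetVec, value: u64) -> Option<()> {
        if v.len == v.cap {
            // Grow: allocate a bigger array in the region and copy the old one.
            // The old array is simply abandoned (a bump allocator cannot free).
            let new_cap = v.cap.checked_mul(2)?.max(1);
            let new_start = self.alloc(new_cap.checked_mul(8)?)?;
            self.bytes.copy_within(v.start..v.start + v.len * 8, new_start);
            v.start = new_start;
            v.cap = new_cap;
        }
        let at = v.start + v.len * 8;
        self.bytes[at..at + 8].copy_from_slice(&value.to_le_bytes());
        v.len += 1;
        Some(())
    }

    fn get(&self, v: &OffsetVec, i: usize) -> Option<u64> {
        if i >= v.len {
            return None;
        }
        let at = v.start + i * 8;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.bytes[at..at + 8]);
        Some(u64::from_le_bytes(buf))
    }
}
```

Because `OffsetVec` holds offsets rather than addresses, the same header stays valid if the bytes end up at a different address on the next run, which is the property a pointer-based Vec cannot give you.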

My worry about all this would be abnormal termination of the program: what happens, what are you guaranteed to get written to disk, and what may be lost? I feel like you would be calling whatever the flush equivalent is A LOT to ensure that important changes get written to disk and to avoid corruption of the whole space. E.g. your vector reallocates its list of elements, new elements are added, then the application crashes, the computer shuts off, or there is a panic. Did the vector get written to disk? Is it pointing at the new or the old array? Was the new object that was added written to disk? Was the change in the vector size written to disk? For those last two you hope the answer is either yes for both or no for both; if one is yes and the other is no, you are in trouble.

For your second question, I don’t know exactly what you are doing, but if I needed to store objects on disk every time they changed and wanted to avoid slowing down the main application with this, I’d probably create a “writer” thread to which objects get queued so they are saved to disk in the background. This background thread would spend the time serializing the objects so they could be saved to disk.
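A minimal sketch of such a writer thread, using only the standard library. The `Record` type, the hand-rolled `encode` format and the `spawn_writer` helper are all invented for illustration; the sink can be a `File`, a `BufWriter`, or (as in a test) a `Vec<u8>`.

```rust
use std::io::Write;
use std::sync::mpsc;
use std::thread;

/// Hypothetical record type standing in for whatever needs persisting.
#[derive(Clone, Debug)]
struct Record {
    id: u64,
    payload: Vec<u8>,
}

/// Hand-rolled encoding: id, payload length, payload (little-endian).
fn encode(r: &Record) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + r.payload.len());
    out.extend_from_slice(&r.id.to_le_bytes());
    out.extend_from_slice(&(r.payload.len() as u64).to_le_bytes());
    out.extend_from_slice(&r.payload);
    out
}

/// Spawns the background writer. The main thread sends (clones of) records
/// through the returned Sender; dropping every Sender ends the thread, which
/// then flushes and hands the sink back through the JoinHandle.
fn spawn_writer<W: Write + Send + 'static>(
    mut sink: W,
) -> (mpsc::Sender<Record>, thread::JoinHandle<std::io::Result<W>>) {
    let (tx, rx) = mpsc::channel::<Record>();
    let handle = thread::spawn(move || {
        // Records arrive in the order they were sent, so the order on disk
        // matches the order of changes made by the sending thread.
        for record in rx {
            sink.write_all(&encode(&record))?;
        }
        sink.flush()?;
        Ok(sink)
    });
    (tx, handle)
}
```

The serialization cost is paid on the background thread; the main thread only pays for the clone and the channel send.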

The key thing to remember is order: you want to make sure that after an abnormal termination the state on disk is still readable. If the in-memory representations of the objects are immutable or easily cloneable, this is easiest to ensure. You would just add the object, or a clone of it, to the write queue.

You also might have to deal with “checkpoints” to mark when the disk state is good and when it’s not. Consider:

A references B and B references A. And you want to change this to A references C, C references A, and B references nobody. Your disk state isn’t “stable” until the new versions of A, B and C are all committed to disk. If something happens before then, your disk file is holding a bad state. Depending on your objects you may be able to structure them and change them in a way that avoids the checkpoints, I don’t know. But it’s something that needs to be considered carefully.
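One way to picture the checkpoint idea is an append-only log where a marker entry says "everything before this point is consistent". The sketch below is an assumption about how that could look, not a description of any particular library: the `LogEntry` type and `recover` function are hypothetical, and the log is an in-memory slice rather than a file.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum LogEntry {
    /// `object` now references `target` (None means it references nobody).
    SetRef { object: char, target: Option<char> },
    /// Everything logged before this entry forms a consistent state.
    Checkpoint,
}

/// Rebuilds the state from the log, ignoring any entries written after the
/// last checkpoint (they belong to a batch that never finished).
fn recover(log: &[LogEntry]) -> HashMap<char, Option<char>> {
    // Index of the last checkpoint; with no checkpoint nothing is trusted.
    let committed_end = log
        .iter()
        .rposition(|e| *e == LogEntry::Checkpoint)
        .unwrap_or(0);

    let mut state = HashMap::new();
    for entry in &log[..committed_end] {
        if let LogEntry::SetRef { object, target } = entry {
            state.insert(*object, *target);
        }
    }
    state
}
```

With the A/B/C example: if the process dies after logging the new A and C but before logging B and the checkpoint, `recover` still returns the old, consistent A-B pair.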


#3

This is my current thinking. Write the structure to disk in smaller blocks (say one every X valid entries) and keep these in a directory, written via some kind of writer object (some buffered writer type at least). Also hold indexes that point to the blocks known to be in a good state.

At least something like this with a sort of transaction to only commit to the index when a block is good? I suspect looking more into leveldb etc. may be worth some time.
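A sketch of that "commit to the index only when the block is good" step, using the common write-to-temp-file-then-rename pattern from the standard library. The file names (`block-N.dat`, `index`) and function names are made up for illustration. Two caveats: the atomicity of `rename` over an existing file is a POSIX property, and a fully durable version would also need to sync the containing directory, which this sketch does not do.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to a temporary file, syncs it, then renames it over `path`,
/// so readers see either the old file or the complete new one.
fn write_atomically(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // ask the OS to push the data to the storage device
    fs::rename(&tmp, path)
}

/// Writes block `block_no`, and only once that succeeded updates the index
/// to name it as the latest good block.
fn commit_block(dir: &Path, block_no: u64, data: &[u8]) -> io::Result<()> {
    write_atomically(&dir.join(format!("block-{}.dat", block_no)), data)?;
    write_atomically(&dir.join("index"), &block_no.to_le_bytes())
}

/// Reads the index; None if there is no usable index yet.
fn last_good_block(dir: &Path) -> io::Result<Option<u64>> {
    match fs::read(dir.join("index")) {
        Ok(bytes) if bytes.len() == 8 => {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&bytes);
            Ok(Some(u64::from_le_bytes(buf)))
        }
        Ok(_) => Ok(None),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

If the process dies between the two `write_atomically` calls, the new block file exists but the index still names the previous block, so a restart picks up the last state that was fully committed.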

Although, in saying that, I suspect a vector with an allocator similar to boost interprocess may be worth looking at. It does leave the crash / state question though, as you point out in the first part of your answer. There are also the OS X issues, and the question of why grep, leveldb etc. moved away from mmap files.


#4

The directory idea with good state blocks sounds good.

And I would look into exactly what issue leveldb encountered with mmap on OSX, maybe you can live with it, maybe not.

I was going to say you should track down what guarantees each OS you want to support gives you in the case of an abnormal termination. However, thinking about it, if you want to handle the “power outage” situation you pretty much have to assume you could have partial writes between any flushes. On one hand that seems complex, but on the other it is probably a “solved” problem; I wonder if someone has already designed a general algorithm/process for that situation.
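One widely used way to cope with partial writes is to frame every record with its length and a checksum, and on startup to stop reading at the first record that does not verify. The sketch below assumes an in-memory `Vec<u8>` as the log, and uses FNV-1a as a simple stand-in checksum (real systems typically use a CRC); the function names are hypothetical.

```rust
/// FNV-1a 64-bit hash, used here only as a simple stand-in for a CRC.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Appends one framed record: 4-byte length, 8-byte checksum, payload.
fn append_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Returns every record up to (not including) the first one that is
/// truncated or fails its checksum, i.e. the first torn write.
fn read_valid_records(log: &[u8]) -> Vec<&[u8]> {
    const HEADER: usize = 12;
    let mut records = Vec::new();
    let mut pos = 0;
    while log.len() - pos >= HEADER {
        let mut len_buf = [0u8; 4];
        len_buf.copy_from_slice(&log[pos..pos + 4]);
        let mut sum_buf = [0u8; 8];
        sum_buf.copy_from_slice(&log[pos + 4..pos + HEADER]);

        let start = pos + HEADER;
        let end = match start.checked_add(u32::from_le_bytes(len_buf) as usize) {
            Some(end) if end <= log.len() => end,
            _ => break, // payload was cut short
        };
        let payload = &log[start..end];
        if checksum(payload) != u64::from_le_bytes(sum_buf) {
            break; // payload is present but damaged
        }
        records.push(payload);
        pos = end;
    }
    records
}
```

Anything after the last valid record is treated as never having been written, which turns "partial writes between any flushes" into "lose at most the tail of the log".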