Write struct partially on disk

Hi !

I have a very big memory object (which I want to keep in memory), like:

#[derive(Deserialize, Serialize)]
pub struct Tiles<T: Entity> {
    items: Vec<T>,
}

I use bincode to write/load it from disk. But when I modify a little part, like:

self.items[42] = new_value;

When I want to save it, I have to write the whole thing to disk again. Is there a crate, or some other way, to write only the changed bytes at the right place on disk?

This is not a pure Rust question, but I'm confident Rust permits a powerful way to do it :slight_smile:

When large amounts of data need to be changed on disk in pieces, use a database and store the pieces as rows. SQLite is a good one to start with if you haven't used databases before. This applies to almost all languages, since programming languages very rarely have a built-in database. There are many Rust crates for accessing databases.

6 Likes

Hi @jumpnbrownweasel,

I used SQLite in a previous project. I thought of it, but I need the fastest access time possible to my data.

I know that SQLite is very good in terms of performance, but I'm making the assumption that it can't be better than an in-memory vector. Maybe I'm wrong? Or is there no middle ground?

If you read the data from SQLite you can still create an in-memory vector, if that's what you think is necessary to get the performance you need.

The point of using a database is being able to update it on disk without rewriting the entire file -- that's what you said you wanted.

If you don't use a database for that, you could split it up into multiple files so you can write smaller files. When writing smaller pieces, you will have to serialize each part separately -- this is true whether they are separate files or separate rows in a database. How to separate it into smaller pieces is the thing you need to figure out, since how you do this is application specific.

2 Likes

Have you considered mutably mmap-ing the file into memory? There appear to be a lot of crates that offer help with that (I have not used any of them).

Yes, you are.

First, SQLite (as almost any serious database) has in-memory caching built-in.

Second, even if it didn't, you could make it yourself.

You can always just collect a small diff of the most recent changes and issue the corresponding queries (while potentially keeping a separate copy of your entire data in memory, if you wish.) How do you think the frontend of a web application manages to update the state without having to store and re-write the entire (potentially multi-TB) data every time?

I'm not familiar with mutable mmap-ing, I will take a look, thanks.

I know how to use a database, and from my experience they can be very performant. But I also know that a serialization step is required for input/output, which has a performance impact.

I will run some tests, thanks!

Aren't you doing serialization with bincode anyway?


Another thing you haven't mentioned is whether you need to be able to read the data from disk after a crash. If so, you definitely need a database for that. If you crash while writing data, data on disk may be inconsistent (partially written). Recovering from that situation is the problem that databases and transactions are made to solve.

2 Likes

My critical performance point is read access to the data, so there is no serialization cost while my data is in RAM.

You have a good point about data consistency during disk writes. I will benchmark read-access performance and consider the solution of keeping a separate copy of my entire data in memory.

Again, caching. That was the whole point of my previous post.