Managing Large Data between Memory and Disk

Hi,

I am coming to Rust from high-level languages, so I might be overthinking this (or not :) ).

I expect to have a big, detailed data structure that represents a geographical map. It could very well be bigger than the available memory on the machine (hence the disk requirement). The application needs to load this map in pieces, let the user work on the currently loaded segment, and then write that segment back to disk (with any changes) before loading the next one.

  • Is this a relatively common task that has a standard solution? (Maybe not with a map, but with any data structure in general.)
  • Are there APIs/functions/crates that can help me do this easily?
  • Any advice/gotchas here? (E.g. I read here that I shouldn't be reading and writing to the same file, etc.)

Thanks

Others have done similar things before you; many databases do something like this. A very interesting tool for this kind of thing is a memory map (e.g. memmap), which lets you access the contents of a file as if it were in memory, without actually holding all of it in RAM. You can read about a project using memory maps here.

Regarding reading and writing to the same file, you are probably forced to do that if the file is very large. The main alternative is to split it into many smaller files. One disadvantage of a single large file is that you can't "insert" bytes in the middle in a way that extends the file without shifting everything after the insertion point to the right, which means rewriting the rest of the file. If you want to keep it in one file, you need to be able to perform your modifications by either overwriting ranges in the middle of the file or appending to the end of the file.
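
To make the "overwrite in the middle or append at the end" idea concrete, here is a minimal sketch using only the standard library. It assumes the map is stored as fixed-size segments in one file. The names `SEGMENT_SIZE`, `open_store`, `segment_count`, `read_segment`, `write_segment` and `append_segment` are made up for illustration, and a real map format would likely need variable-size records plus an index, which this does not attempt.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical fixed segment size in bytes (an assumption for this sketch).
pub const SEGMENT_SIZE: usize = 4096;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, msg.to_string())
}

/// Open (or create) the data file for both reading and writing,
/// without truncating existing contents.
pub fn open_store(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)
}

/// Number of whole segments currently stored in the file.
pub fn segment_count(file: &File) -> io::Result<u64> {
    Ok(file.metadata()?.len() / SEGMENT_SIZE as u64)
}

/// Load one segment into memory; only this segment occupies RAM.
pub fn read_segment(file: &mut File, index: u64) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; SEGMENT_SIZE];
    file.seek(SeekFrom::Start(index * SEGMENT_SIZE as u64))?;
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Overwrite an existing segment in place. The file length does not change,
/// so nothing after the segment has to be moved.
pub fn write_segment(file: &mut File, index: u64, data: &[u8]) -> io::Result<()> {
    if data.len() != SEGMENT_SIZE {
        return Err(invalid("segment has the wrong size"));
    }
    if index >= segment_count(file)? {
        return Err(invalid("segment index out of range"));
    }
    file.seek(SeekFrom::Start(index * SEGMENT_SIZE as u64))?;
    file.write_all(data)
}

/// Append a new segment at the end of the file and return its index.
pub fn append_segment(file: &mut File, data: &[u8]) -> io::Result<u64> {
    if data.len() != SEGMENT_SIZE {
        return Err(invalid("segment has the wrong size"));
    }
    let index = segment_count(file)?;
    file.seek(SeekFrom::Start(index * SEGMENT_SIZE as u64))?;
    file.write_all(data)?;
    Ok(index)
}
```

A typical loop would call `read_segment`, let the user edit the returned `Vec<u8>`, then pass it to `write_segment` with the same index before loading the next one. Because every segment has the same size, an edit never needs to grow a region in the middle of the file, which is exactly the limitation described above.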

Thank you!

Just use a database, really. SQLite (by means of the rusqlite crate) is a good default choice. Don't try to (re-)implement a database from scratch using memory mapping – it's a lot of hard work.
