Here is what I understand you want to achieve:
- When the game is started, the full region state is initially on disk.
- The game engine loads region state from disk whenever it feels like it.
- Region state should be prefetched before it is actually needed, otherwise the game will need to block on data load, which is Very Bad(tm).
- While the player is inside of a region, it may (and will likely) alter the region state.
- When the region is not needed anymore, and unlikely to be needed in the near future, the game writes it back to disk if it has been modified, then discards it from RAM.
- The region state should not be accessible to the main thread during this process in order to avoid data races.
- To avoid blocking the main thread, you want to offload region I/O to a dedicated thread.
From this perspective, I find it useful to think about regions as state machines which can be in at least 4 different states (which the RegionManager API could expose as an enum).
- Saved: The region is only on disk, and there is no outstanding I/O request for it.
- Loading: The main thread has requested a region load, which is in progress.
- Loaded: The region is in RAM and usable by the main thread.
- Saving: The main thread has requested a region save, which is in progress.
I think this is the minimal number of states which you can afford to use, because if you have only Saved and Loaded, then it becomes hard to avoid duplicate or contradictory I/O requests from the main thread.
As we will later see, you can make the region manager more efficient in various ways by adding more region states.
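As a rough sketch, the four states above could be written as a Rust enum. The variant names come from the list above; the two helper methods are only my illustration of what a RegionManager API might expose, not something you need.

```rust
/// The four minimal region states described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RegionState {
    /// Only on disk, with no outstanding I/O request.
    Saved,
    /// A load has been requested and is in progress.
    Loading,
    /// In RAM and usable by the main thread.
    Loaded,
    /// A save has been requested and is in progress.
    Saving,
}

impl RegionState {
    /// True when the main thread may touch the region data.
    /// (Hypothetical helper, not part of the original description.)
    pub fn is_accessible(self) -> bool {
        matches!(self, RegionState::Loaded)
    }

    /// True when an I/O request is outstanding for this region.
    /// (Hypothetical helper, not part of the original description.)
    pub fn has_pending_io(self) -> bool {
        matches!(self, RegionState::Loading | RegionState::Saving)
    }
}
```

With only Saved and Loaded there would be no way to answer `has_pending_io`, which is exactly what you need in order to avoid sending a second request for a region that is already in flight.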
From this state machine based perspective, the job of your region manager is to hold the active state of each region, and to handle the following game events originating from your game threads:
- Prefetch: Main thread determined that a region will be needed in the near future
- Access: Main thread needs to access a region now (player interacts with it directly)
- Discard: Main thread determined that a region will not be needed in the near future
- LoadFinished: I/O thread has completed a load
- LoadFailed: I/O thread has failed a load
- SaveFinished: I/O thread has completed a save
- SaveFailed: I/O thread has failed a save
At a minimum, the I/O thread must support receiving load and save requests, and sending back notifications when a load or save has completed or failed.
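To make that minimal contract concrete, here is one way the I/O thread could look using only `std::sync::mpsc` channels and `std::fs`. Treat it as a sketch under assumptions: `RegionId`, the `region_<id>.bin` file naming, the raw `Vec<u8>` payload and the `spawn_io_thread` function are all made up for the example.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Hypothetical region identifier.
pub type RegionId = u32;

/// Requests sent from the main thread to the I/O thread.
#[derive(Debug)]
pub enum IoRequest {
    Load(RegionId),
    Save(RegionId, Vec<u8>),
}

/// Notifications sent back by the I/O thread.
#[derive(Debug, PartialEq)]
pub enum IoReply {
    LoadFinished(RegionId, Vec<u8>),
    LoadFailed(RegionId, String),
    SaveFinished(RegionId),
    SaveFailed(RegionId, String),
}

/// Assumed on-disk layout: one file per region inside `dir`.
fn region_path(dir: &Path, id: RegionId) -> PathBuf {
    dir.join(format!("region_{}.bin", id))
}

/// Spawns the I/O thread and returns the request sender and reply receiver.
pub fn spawn_io_thread(dir: PathBuf) -> (Sender<IoRequest>, Receiver<IoReply>) {
    let (req_tx, req_rx) = mpsc::channel::<IoRequest>();
    let (reply_tx, reply_rx) = mpsc::channel::<IoReply>();
    thread::spawn(move || {
        // The loop ends once every request sender has been dropped.
        for request in req_rx {
            let reply = match request {
                IoRequest::Load(id) => match fs::read(region_path(&dir, id)) {
                    Ok(bytes) => IoReply::LoadFinished(id, bytes),
                    Err(e) => IoReply::LoadFailed(id, e.to_string()),
                },
                IoRequest::Save(id, bytes) => match fs::write(region_path(&dir, id), bytes) {
                    Ok(()) => IoReply::SaveFinished(id),
                    Err(e) => IoReply::SaveFailed(id, e.to_string()),
                },
            };
            // Stop if the main thread no longer listens for replies.
            if reply_tx.send(reply).is_err() {
                break;
            }
        }
    });
    (req_tx, reply_rx)
}
```

The main thread would poll the reply receiver with `try_recv` once per frame, and only call the blocking `recv` when it is stuck waiting on an Access.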
Here is one possible strategy, which I think is not too far from what you described above. Notice that there are some blanks that we need to fill in:
- Prefetch handler:
  - If the region is Saved, send load request to I/O thread and transition to Loading.
  - If the region is Loading or Loaded, we are already doing the right thing.
  - If the region is Saving: ??? NOW WHAT ???
- Access handler:
  - If the region is Saved, the prefetcher has screwed up. Send a load request and...
  - If the region is Loading, the load was not fast enough. Wait for Loaded state.
  - If the region is Loaded, the game can proceed normally.
  - If the region is Saving: ??? NOW WHAT ???
- Discard handler:
  - If the region is Saved or Saving, we are already doing the right thing.
  - If the region is Loaded, send save request to I/O thread and transition to Saving.
  - If the region is Loading: ??? NOW WHAT ???
- LoadFinished handler:
  - Debug assertion: Should be in the Loading state.
  - Transition to Loaded state and notify the main thread in case it's waiting.
- LoadFailed handler:
  - Debug assertion: Should be in the Loading state.
  - Transition back to Saved state and ??? NOW WHAT ???
  - Note that the main thread may be blocking on this load!
- SaveFinished handler:
  - Debug assertion: Should be in the Saving state.
  - Transition to the Saved state and discard the region.
- SaveFailed handler:
  - Debug assertion: Should be in the Saving state.
  - Transition back to Loaded state and ??? NOW WHAT ???
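The strategy above maps fairly directly onto a pure transition function, which is easy to unit test. In this sketch the `Action` enum and the `step` function are my own invention, the blanks are kept as an explicit `Unresolved` action instead of being silently decided, and `SendLoadThenWait` is only my guess at how the Access-on-Saved case ends.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State { Saved, Loading, Loaded, Saving }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Prefetch, Access, Discard,
    LoadFinished, LoadFailed, SaveFinished, SaveFailed,
}

/// What the caller must do after a transition (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Nothing,
    SendLoad,
    /// Assumption: after a late load request, the main thread waits.
    SendLoadThenWait,
    SendSave,
    WaitForLoaded,
    NotifyMainThread,
    DropFromRam,
    /// One of the "??? NOW WHAT ???" blanks, deliberately left open.
    Unresolved,
}

pub fn step(state: State, event: Event) -> (State, Action) {
    match (state, event) {
        // Prefetch handler
        (State::Saved, Event::Prefetch) => (State::Loading, Action::SendLoad),
        (State::Loading | State::Loaded, Event::Prefetch) => (state, Action::Nothing),
        (State::Saving, Event::Prefetch) => (state, Action::Unresolved),
        // Access handler
        (State::Saved, Event::Access) => (State::Loading, Action::SendLoadThenWait),
        (State::Loading, Event::Access) => (state, Action::WaitForLoaded),
        (State::Loaded, Event::Access) => (state, Action::Nothing),
        (State::Saving, Event::Access) => (state, Action::Unresolved),
        // Discard handler
        (State::Saved | State::Saving, Event::Discard) => (state, Action::Nothing),
        (State::Loaded, Event::Discard) => (State::Saving, Action::SendSave),
        (State::Loading, Event::Discard) => (state, Action::Unresolved),
        // I/O completion handlers
        (State::Loading, Event::LoadFinished) => (State::Loaded, Action::NotifyMainThread),
        (State::Loading, Event::LoadFailed) => (State::Saved, Action::Unresolved),
        (State::Saving, Event::SaveFinished) => (State::Saved, Action::DropFromRam),
        (State::Saving, Event::SaveFailed) => (State::Loaded, Action::Unresolved),
        // A completion event in any other state is a logic error.
        (_, Event::LoadFinished | Event::LoadFailed | Event::SaveFinished | Event::SaveFailed) => {
            debug_assert!(false, "unexpected {:?} in state {:?}", event, state);
            (state, Action::Nothing)
        }
    }
}
```

Because the match is exhaustive, the compiler will point at every (state, event) pair that needs a decision whenever you add a new state later.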
The first series of ??? NOW WHAT ??? (in the Prefetch, Access and Discard handlers) requires some kind of cancellation support to be handled efficiently. Unfortunately, not every platform supports disk I/O cancellation, and it will be fallible even on those that do, because your cancellation request may come too late.
If cancellation is not supported, or if it fails, you will probably want to schedule a re-load/discard to be executed after the ongoing save/load has completed. Not only is this inefficient, you will also need to think carefully about how it interacts with the state machine model above.
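One possible way to fit the "schedule it for later" fallback into the state machine is to add two extra states that remember the follow-up action. This is only a sketch of that idea: the `LoadingThenDiscard` and `SavingThenLoad` states and the handler functions are hypothetical names, and it does not attempt cancellation at all.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Saved,
    Loading,
    Loaded,
    Saving,
    /// Hypothetical: a discard arrived while the load was in progress.
    LoadingThenDiscard,
    /// Hypothetical: a prefetch arrived while the save was in progress.
    SavingThenLoad,
}

/// Request to forward to the I/O thread, if any.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Request { Load, Save }

pub fn on_prefetch(state: State) -> (State, Option<Request>) {
    match state {
        State::Saved => (State::Loading, Some(Request::Load)),
        // Cannot load yet: remember to reload once the save is done.
        State::Saving => (State::SavingThenLoad, None),
        // The scheduled discard is no longer wanted.
        State::LoadingThenDiscard => (State::Loading, None),
        other => (other, None),
    }
}

pub fn on_discard(state: State) -> (State, Option<Request>) {
    match state {
        State::Loaded => (State::Saving, Some(Request::Save)),
        // Cannot discard yet: remember to do it once the load is done.
        State::Loading => (State::LoadingThenDiscard, None),
        // The scheduled reload is no longer wanted.
        State::SavingThenLoad => (State::Saving, None),
        other => (other, None),
    }
}

pub fn on_load_finished(state: State) -> (State, Option<Request>) {
    match state {
        State::Loading => (State::Loaded, None),
        // Run the deferred discard, following the minimal model where a
        // discard always saves first.
        State::LoadingThenDiscard => (State::Saving, Some(Request::Save)),
        other => (other, None),
    }
}

pub fn on_save_finished(state: State) -> (State, Option<Request>) {
    match state {
        State::Saving => (State::Saved, None),
        // Run the deferred reload.
        State::SavingThenLoad => (State::Loading, Some(Request::Load)),
        other => (other, None),
    }
}
```

Note the inefficiency mentioned above: a region that is prefetched during a save gets written out and then immediately read back in.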
You will also want to set up some kind of mechanism to prevent the game from entering a rapid succession of prefetches and discards, which would cause lots of cancellation and hurt performance. For example, if your prefetches use a "load the regions next to the player" logic, then you will want to make sure that discards only occur when a region is three neighbours away from the player. If you discard when a region is only two neighbours away, then Bad Things will happen when the player is sitting on a region boundary and a region constantly alternates between being a nearest neighbour of the player and being two neighbours away.
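The gap between the prefetch distance and the discard distance is a form of hysteresis. Here is a small sketch of it, assuming that regions sit on a 2D grid with integer coordinates and that "neighbours away" means Chebyshev distance; the coordinate type, the constants and the function names are illustrative.

```rust
/// Hypothetical grid coordinate of a region.
pub type RegionCoord = (i32, i32);

/// Regions at most this far from the player get prefetched.
pub const PREFETCH_RADIUS: i32 = 1;
/// Regions at least this far from the player get discarded.
pub const DISCARD_RADIUS: i32 = 3;

/// Chebyshev distance: number of neighbour steps, diagonals included.
pub fn distance(a: RegionCoord, b: RegionCoord) -> i32 {
    (a.0 - b.0).abs().max((a.1 - b.1).abs())
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Prefetch,
    /// Dead zone: leave the region in whatever state it is in.
    Keep,
    Discard,
}

pub fn decide(player: RegionCoord, region: RegionCoord) -> Decision {
    let d = distance(player, region);
    if d <= PREFETCH_RADIUS {
        Decision::Prefetch
    } else if d >= DISCARD_RADIUS {
        Decision::Discard
    } else {
        Decision::Keep
    }
}
```

A player stepping back and forth across one region boundary only ever moves a given region between the Prefetch and Keep bands, so no discard is triggered.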
The second series of ??? NOW WHAT ??? (in the LoadFailed and SaveFailed handlers) concerns I/O errors. One possibility is to abort on errors, which is ugly, but simple, and may be a good bet considering that you are writing a game (whose continued operation is not critical) and relying on disk I/O (which fails rarely). If you decide not to do so, then you will need to find a way to unblock the main thread when it is blocking on a load that failed, and also to figure out what should happen then.
If I were you, I would probably just abort on disk I/O errors in this case.
Now, onto some other topics which you may want to think about, and answers to your earlier questions.
If a region has been loaded, but not modified, you do not need to write it back to disk and can discard it immediately. This can happen when, for example, you prefetch game regions, but the player ends up changing her mind and not going there.
If you think that this case is worth handling, you can do so in the state machine model by splitting the Loaded state into LoadedClean and LoadedDirty states, where a region transitions from Clean to Dirty on its first write. Clean regions do not need to be written back to disk and can be discarded from RAM directly.
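In code, the split could look something like the sketch below. The state names are the ones suggested above; the `on_write` and `on_discard` functions and the boolean "send a save request" flag are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Saved,
    Loading,
    /// In RAM and identical to the on-disk copy.
    LoadedClean,
    /// In RAM and modified since it was loaded.
    LoadedDirty,
    Saving,
}

/// Called whenever the main thread modifies the region.
pub fn on_write(state: State) -> State {
    match state {
        State::LoadedClean | State::LoadedDirty => State::LoadedDirty,
        other => {
            debug_assert!(false, "write to a region that is not loaded: {:?}", other);
            other
        }
    }
}

/// Returns the new state and whether a save request must be sent.
pub fn on_discard(state: State) -> (State, bool) {
    match state {
        // Nothing to write back: drop the region from RAM right away.
        State::LoadedClean => (State::Saved, false),
        State::LoadedDirty => (State::Saving, true),
        // Saved and Saving need no action; Loading is still the open
        // question discussed earlier and is left untouched here.
        other => (other, false),
    }
}
```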
Backpressure is a feature which aims to correctly handle the scenario where the main thread sends many more requests to the I/O thread than it can handle. Given your description above ("writing a couple dozen files once every few minutes"), it does not seem to be an issue, unless your writes are large.
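If it ever does become an issue, the standard library already gives you a simple form of backpressure: a bounded `std::sync::mpsc::sync_channel`, whose `try_send` fails instead of blocking when the queue is full. A minimal sketch, where the `IoRequest` and `Submit` types and the `submit` function are hypothetical:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical request type, keyed by a numeric region id.
#[derive(Debug, PartialEq)]
pub enum IoRequest {
    Load(u32),
    Save(u32),
}

/// Outcome of trying to enqueue a request without blocking.
#[derive(Debug, PartialEq)]
pub enum Submit {
    Queued,
    /// The queue is full; the request is handed back to the caller.
    QueueFull(IoRequest),
    /// The I/O thread has dropped its receiver.
    IoThreadGone,
}

/// Creates a request queue holding at most `capacity` pending requests.
pub fn request_queue(capacity: usize) -> (SyncSender<IoRequest>, Receiver<IoRequest>) {
    sync_channel(capacity)
}

/// Enqueues a request, never blocking the calling (main) thread.
pub fn submit(tx: &SyncSender<IoRequest>, request: IoRequest) -> Submit {
    match tx.try_send(request) {
        Ok(()) => Submit::Queued,
        Err(TrySendError::Full(r)) => Submit::QueueFull(r),
        Err(TrySendError::Disconnected(_)) => Submit::IoThreadGone,
    }
}
```

On `QueueFull`, the main thread gets to choose what to do: retry on the next frame, drop a low-priority prefetch, or fall back to a blocking `send` for a save that must not be lost.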
I agree with you that Tokio may not be the right fit for you here, given the extra constraints that you have provided. Since your problem seems to be common in game development, another possible direction to look at is existing Rust game engines: how do they handle state loading and saving? Do they, perhaps, already provide a building block which is close to what you have in mind here?