Doing I/O in another thread

Say I'm making a game that has a bunch of giant cubes, called regions, made out of smaller cubes, called blocks (this totally isn't a Minecraft clone). Something like this:

type Region = Box<[Block; 32 * 32 * 32]>;

// Loads regions synchronously when asked to, and unloads any regions that have been left unused for over a minute.
struct RegionManager {
    errors: mpsc::Receiver<Error>,
    save: mpsc::Sender<(Coords, Region)>,
    regions: HashMap<Coords, ActiveRegion>,
    // ...
}

Making a new RegionManager would spawn a new thread that listens for regions from the manager and then writes them, and RegionManager would have a wrapper around errors.try_iter() so that the calling code can pick up any errors. Is this overcomplicated/unnecessary, and is there a better way to do it?
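For concreteness, here's roughly the constructor I have in mind (just a sketch: write_region stands in for the actual serialization, and I've left out the other fields):

use std::{collections::HashMap, io::Error, sync::mpsc, thread};

impl RegionManager {
    fn new() -> RegionManager {
        let (save_tx, save_rx) = mpsc::channel::<(Coords, Region)>();
        let (err_tx, err_rx) = mpsc::channel::<Error>();

        // The writer thread owns the receiving end of the save channel and
        // reports any failures back through the error channel.
        thread::spawn(move || {
            for (coords, region) in save_rx {
                if let Err(e) = write_region(coords, &region) {
                    // If the main thread has already gone away, there is
                    // nobody left to tell.
                    let _ = err_tx.send(e);
                }
            }
        });

        RegionManager {
            errors: err_rx,
            save: save_tx,
            regions: HashMap::new(),
        }
    }

    // Wrapper so the game loop can drain any pending I/O errors.
    fn io_errors(&self) -> mpsc::TryIter<'_, Error> {
        self.errors.try_iter()
    }
}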

I guess your goal here is to avoid blocking on I/O in the main thread. Am I correct? Also, is this disk or network I/O?

Anyway, operating from this initial guess, here are some thoughts:

  • How does the caller thread know when an I/O call has succeeded, and conversely how are I/O errors associated with the corresponding I/O requests?
  • How do you plan to react to errors?
  • How do you plan to handle backpressure when the amount of I/O is consistently larger than what the underlying I/O device can absorb for an extended period of time?

You may or may not be interested in Tokio's new support for blocking I/O. As a network library, it has good built-in support for backpressure and error handling. However, its design is known to be somewhat hard to approach, and it may be overkill if you do not need its core async network I/O functionality.

Yeah, your guess is right, and it's disk I/O, which I heard has poor cross-platform support. This is what my plan was:

Regarding the first and second questions, a region is saved when either:

  • it is unloaded because it was unused for a period of time.
  • the caller requests a save of all regions.

I'm not really sure how the game would react to either of these erroring, besides either logging the error and moving on (which was my plan) or just crashing to avoid corruption. If it's just logging errors, I'm pretty sure the error itself is enough information, but you probably have better ideas on how to react.

Regarding the third question, I don't know what backpressure is, but is it a problem when writing a couple dozen files once every few minutes? I mainly just want to avoid lag spikes when saving (which Minecraft is notorious for).

Also, Tokio looks cool, but it doesn't seem like it would work well with my existing game loop, and it certainly does seem to be a bit overkill for some simple file I/O. Like I said, my main concern is that the game should not noticeably freeze while saving.

Here is what I understand you want to achieve:

  • When the game is started, the full region state is initially on disk.
  • The game engine loads region state from disk whenever it feels like it.
    • Region state should be prefetched before it is actually needed, otherwise the game will need to block on data load, which is Very Bad(tm).
  • While the player is inside a region, they may (and likely will) alter the region state.
  • When the region is not needed anymore, and unlikely to be needed in the near future, the game writes it back to disk if it has been modified, then discards it from RAM.
    • The region state should not be accessible to the main thread during this process in order to avoid data races.
  • To avoid blocking the main thread, you want to offload region I/O to a dedicated thread.

From this perspective, I find it useful to think about regions as state machines which can be in at least 4 different states (which the RegionManager API could expose as an enum).

  • Saved: The region is only on disk, and there is no outstanding I/O request for it.
  • Loading: The main thread has requested a region load, which is in progress.
  • Loaded: The region is in RAM and usable by the main thread.
  • Saving: The main thread has requested a region save, which is in progress.

I think this is the minimum number of states you can get away with, because with only Saved and Loaded it becomes hard to avoid duplicate or contradictory I/O requests from the main thread.
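As a minimal sketch of what the RegionManager could track per region (just illustrative; here I let Loaded own the region data, but you could store it elsewhere):

enum RegionState {
    /// On disk only, no outstanding I/O request.
    Saved,
    /// A load has been requested from the I/O thread and has not completed yet.
    Loading,
    /// In RAM and usable by the main thread.
    Loaded(Region),
    /// A save has been requested from the I/O thread and has not completed yet.
    Saving,
}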

As we will later see, you can make the region manager more efficient in various ways by adding more region states.


From this state machine based perspective, the job of your region manager is to hold the active state of each region, and to handle the following game events originating from your game threads:

  • Prefetch: Main thread determined that a region will be needed in the near future
  • Access: Main thread needs to access a region now (player interacts with it directly)
  • Discard: Main thread determined that a region will not be needed in the near future
  • LoadFinished: I/O thread has completed a load
  • LoadFailed: I/O thread has failed a load
  • SaveFinished: I/O thread has completed a save
  • SaveFailed: I/O thread has failed a save

At a minimum, the I/O thread must support receiving load and save requests, and sending back notifications when a load or save has completed or failed.
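Concretely, the two channels between the region manager and the I/O thread could carry messages along these lines (a sketch; note that SaveFailed hands the region back so it can return to the Loaded state):

// Requests from the region manager to the I/O thread.
enum IoRequest {
    Load(Coords),
    Save(Coords, Region),
}

// Notifications from the I/O thread back to the region manager,
// matching the last four events above.
enum IoResponse {
    LoadFinished(Coords, Region),
    LoadFailed(Coords, std::io::Error),
    SaveFinished(Coords),
    SaveFailed(Coords, Region, std::io::Error),
}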


Here is one possible strategy, which I think is not too far from what you described above. Notice that there are some blanks that we need to fill in:

  • Prefetch handler:
    • If the region is Saved, send load request to I/O thread and transition to Loading.
    • If the region is Loading or Loaded, we are already doing the right thing.
    • If the region is Saving: ??? NOW WHAT ???
  • Access handler:
    • If the region is Saved, the prefetcher has screwed up. Send a load request and...
    • If the region is Loading, the load was not fast enough. Wait for Loaded state.
    • If the region is Loaded, the game can proceed normally.
    • If the region is Saving: ??? NOW WHAT ???
  • Discard handler:
    • If the region is Saved or Saving, we are already doing the right thing.
    • If the region is Loaded, send save request to I/O thread and transition to Saving.
    • If the region is Loading: ??? NOW WHAT ???
  • LoadFinished handler:
    • Debug assertion: Should be in the Loading state.
    • Transition to Loaded state and notify the main thread in case it's waiting.
  • LoadFailed handler:
    • Debug assertion: Should be in the Loading state.
    • Transition back to Saved state and ??? NOW WHAT ???
    • Note that the main thread may be blocking on this load!
  • SaveFinished handler:
    • Debug assertion: Should be in the Saving state.
    • Transition to the Saved state and discard the region.
  • SaveFailed handler:
    • Debug assertion: Should be in the Saving state.
    • Transition back to Loaded state and ??? NOW WHAT ???
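To make the dispatch concrete, here is roughly what the Prefetch handler could look like, written against the RegionState and IoRequest sketches above (assuming the manager stores regions: HashMap<Coords, RegionState> and a sender io_requests: mpsc::Sender<IoRequest>, that Coords is a small Copy key, and that regions with no map entry count as Saved):

fn on_prefetch(&mut self, coords: Coords) {
    match self.regions.get(&coords) {
        // Only on disk: ask the I/O thread to load it.
        None | Some(RegionState::Saved) => {
            self.io_requests.send(IoRequest::Load(coords)).unwrap();
            self.regions.insert(coords, RegionState::Loading);
        }
        // Already doing the right thing.
        Some(RegionState::Loading) | Some(RegionState::Loaded(_)) => {}
        // ??? NOW WHAT ??? -- see the discussion of cancellation below.
        Some(RegionState::Saving) => unimplemented!(),
    }
}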

The first series of ??? NOW WHAT ??? requires some kind of cancellation support to be handled efficiently. Unfortunately, not every platform supports disk I/O cancellation, and it is fallible even on those that do, because your cancellation request may come too late.

If cancellation is not supported, or if it fails, you will probably want to schedule a re-load/discard to be executed after the ongoing save/load has completed. Not only is this inefficient, but you will also need to think carefully about how it interacts with the state machine model above.

You will also want to set up some kind of mechanism to prevent the game from entering a rapid succession of prefetches and discards, which would cause lots of cancellations and hurt performance. For example, if your prefetches use a "load the regions next to the player" logic, then you will want to make sure that discards only occur when a region is three neighbours away from the player. If you discard when a region is only two neighbours away, then Bad Things will happen when the player is sitting on a region boundary and a region constantly alternates between being a nearest neighbour of the player and being two neighbours away.

The second series of ??? NOW WHAT ??? concerns I/O errors. One possibility is to abort on errors, which is ugly but simple, and may be a good bet considering that you are writing a game (whose continued operation is not critical) and relying on disk I/O (which fails rarely). If you decide not to do so, then you will need to find a way to un-block the main thread when it is blocking on a load that failed, and also to figure out what should happen then.

If I were you, I would probably just abort on disk I/O errors in this case.


Now, onto some other topics which you may want to think about, and answers to your earlier questions.

If a region has been loaded, but not modified, you do not need to write it back to disk and can discard it immediately. This can happen when, for example, you prefetch game regions, but the player ends up changing her mind and not going there.

If you think that this case is worth handling, you can do so in the state machine model by splitting the Loaded state into LoadedClean and LoadedDirty states, transitioning from Clean to Dirty whenever the main thread writes to the region. Clean regions do not need to be written back to disk and can be discarded from RAM directly.
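In the enum sketched earlier, that just means replacing Loaded with two variants; the Discard handler then skips the save request for clean regions:

enum RegionState {
    Saved,
    Loading,
    LoadedClean(Region), // in RAM, identical to the copy on disk
    LoadedDirty(Region), // in RAM, modified since it was loaded
    Saving,
}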

Backpressure is a mechanism for correctly handling the scenario where the main thread sends far more requests to the I/O thread than it can handle. Given your description above ("writing a couple dozen files once every few minutes"), it does not seem to be an issue here, unless your writes are large.
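For reference, if it ever did become a concern, the standard library already gives you a crude form of backpressure through bounded channels. A sketch (the capacity of 16 is an arbitrary number picked for illustration):

use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

// At most 16 save requests may be in flight. When the queue is full, `send`
// blocks until the I/O thread catches up, while `try_send` returns
// `TrySendError::Full` so the game loop can keep the region and retry on a
// later frame instead of stalling.
fn save_channel() -> (SyncSender<(Coords, Region)>, Receiver<(Coords, Region)>) {
    sync_channel(16)
}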

I agree with you that Tokio may not be the right fit for you here, given the extra constraints that you have provided. Since your problem seems to be common in game development, another possible direction to look at is existing Rust game engines: how do they handle state loading and saving? Do they, perhaps, already provide a building block which is close to what you have in mind here?

Consider using a threadpool (or multiple) for I/O - the returned futures are handles the caller can use to monitor progress and detect errors. You could have one threadpool (with a single thread) per backing file to allow multiple I/O operations in parallel.
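For instance, with the futures-cpupool crate (just one option; write_region here is a placeholder for whatever does the actual write), you'd create the pool once with CpuPool::new(1) and the save path would look roughly like:

use futures_cpupool::{CpuFuture, CpuPool};
use std::io;

fn spawn_save(pool: &CpuPool, coords: Coords, region: Region) -> CpuFuture<(), io::Error> {
    // The returned future is the caller's handle to the operation: poll it
    // (or call .wait() on it) to find out whether the save succeeded.
    pool.spawn_fn(move || write_region(coords, &region))
}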

@vitalyd I am not sure if this would be a good idea in this project, for two reasons:

  • IO operations are rare (@quadrupleslap mentioned one IO op every couple of minutes), so the extra complexity of allowing concurrent execution does not seem warranted.
  • It may add synchronization complexity if the files that are read from and written to are not independent.

Wow, thank you for the comprehensive response!

  • I'll deal with load errors by adding another Bad state that's like a null-object, where it's just painted solid red or something, so the player is able to try reloading it.
  • I have an idea to avoid rapid loading and unloading: guarantee that every region is kept for at least a minute after it was last read. This also sort-of solves save errors - just emit the error message and keep the region to retry a minute later, at least until you OOM.
  • This scheme is fine when the pointers to the region data can be moved around between threads, but what about periodically flushing all changes to disk in the background? The only obvious way I can think of doing this is by adding another request type and mutexing all the regions, but that sounds horrible.

@vitalyd That's an interesting idea, but isn't disk I/O sequential anyway?

Sure, maybe not. A single-threaded threadpool can be used instead. The biggest benefit is that you get a future back, and that's your handle to the operation. One can certainly build a future impl themselves over a channel, but that's more work.

My assumption was the files are independent because a region can only pertain to a single file. But I indeed don't know the actual file layout, so this was just a general suggestion.

You mean serial? When you write to a file, you're unlikely to actually submit any I/O at that moment - you'll be hitting some sort of page cache, although that's OS dependent. But letting the kernel see more I/O payload can allow it to schedule the I/O better. This is really moot given you're writing a couple dozen files every few minutes, so this aspect can be disregarded (as I mentioned to @HadrienG above as well).

As you correctly point out, this use case is a bit different from the "write back regions to disk before discarding" use case, because to implement this you need to be able to (conceptually) concurrently save and modify region data.

I can think of two strategies for doing this:

  1. If the region data is lightweight, you can just make a copy of it and send it to the I/O thread while modifying the original.
  2. If the region data is heavyweight, you can use some sort of concurrent copy-on-write mechanism which only makes a copy of the region data if the main thread tries to modify the region as it is being snapshotted.

In both cases, you will need a more complex state machine (or an alternate design) to handle the notion of snapshots, because you now need to distinguish between the cases where data must be discarded after being saved and those where it will remain in RAM.
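For the second strategy, the standard library can actually get you quite far on its own: keep each loaded region behind an Arc, hand a clone of the Arc to the I/O thread as the snapshot, and route the main thread's writes through Arc::make_mut, which only copies the data while a snapshot is still alive. A sketch (assuming Block is Clone):

use std::sync::Arc;

type SharedRegion = Arc<[Block; 32 * 32 * 32]>;

// Taking a snapshot is just a reference count bump; the I/O thread can then
// serialize its copy of the Arc at its leisure.
fn snapshot(region: &SharedRegion) -> SharedRegion {
    Arc::clone(region)
}

// The main thread keeps writing through make_mut: while no snapshot is
// outstanding this is an ordinary in-place write, and only while the I/O
// thread still holds a snapshot does it copy the region first.
fn set_block(region: &mut SharedRegion, index: usize, block: Block) {
    Arc::make_mut(region)[index] = block;
}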

What does your gameplay look like? Is it a character moving around the world? Is there user "think time" or other idle cycles?

When you auto-flush all changes periodically, you should consider doing it incrementally, ideally when a region(s) is not "used". As the player moves around the world, dirty regions going into the background can be scheduled for saving, either with a snapshot like @HadrienG suggested or by moving them to the background thread and moving them back when they're persisted.

The trick is to find ways to hide latent operations from the player, when they're "not looking".

Another option to consider is a pseudo transaction-log-like scheme - as changes are made to regions (or perhaps only every N'th change, or only "important" changes), you send that change/delta to the background thread; the background thread applies the change to its representation of the Region, and persists it. One option here might be to maintain mmaps of the region file(s) and apply the change directly to them - that somewhat mitigates the need to "duplicate" Region values.

Whether this is suitable or not depends on what exactly a Region really is, and whether such deltas/change records are possible (i.e. they can be modeled efficiently, and aren't just a "clone" of the Region).
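To illustrate the shape such a change record could take (a sketch only; whether deltas stay small depends on what a Block really is and what kinds of edits the game allows):

// A delta is tiny compared with a full Region, so sending one per edit (or
// batching a frame's worth) is cheap.
enum RegionDelta {
    SetBlock { x: u8, y: u8, z: u8, block: Block },
    Fill(Block), // e.g. clearing or filling a whole region at once
}

// What actually travels to the background thread, which applies it to its
// own representation of the region (or directly to an mmap of the region
// file) and persists it.
struct RegionChange {
    region: Coords,
    delta: RegionDelta,
}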

A region, right now, is literally just a Box<[Block; 32 * 32 * 32]> where Block is an integer. I don't think mmap will work if I want to compress things with, e.g., LZ4 or a simple RLE. But the only reason I need to persist changes periodically is to reduce data loss if I get SIGKILL'd, because otherwise it'll be saved normally as it gets unloaded. It's a bit risky, but ignoring the possibility of a force-quit won't be the end of the world.

Assuming 64-bit integers, that's 262 KB per region. So if you only keep in RAM the region which the player is sitting on and two layers of nearest neighbours (5 x 5 x 5 = 125 regions), then assuming an unrealistic worst-case scenario where every block is dirty, the total volume of data that you need to save when performing a full snapshot is 125 x 262 KB = 32.8 MB.

These days, the order of magnitude for RAM bandwidth on low-end machines is 10 GB/s. This means that making a copy of that state takes at worst 3.3 ms. If your performance target is 60 Hz, your frame time budget is ~17 ms. So the time it takes to initiate a copy-based snapshot from the main thread is not negligible, but still something that you can probably afford: you may just drop a frame occasionally. Therefore, I think simple copy-based snapshots could serve your needs well.
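Spelled out, the arithmetic is:

32 x 32 x 32 blocks x 8 bytes = 262,144 bytes ≈ 262 KB per region
5 x 5 x 5 = 125 regions x 262 KB ≈ 32.8 MB per full snapshot
32.8 MB / 10 GB/s ≈ 3.3 ms, out of a ~17 ms frame budget at 60 Hz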

I would like to stress that the analysis above is a worst-case one. If you're talking about a Minecraft clone, then the neighbouring regions on the vertical axis (above and below the player) will not be dirtied very often, as players only have a limited ability to interact with what's above and below them. So a dirty/clean region optimization should spare you a lot of data traffic already.

But if you want to further reduce the risk of dropping a frame due to snapshots, then as counterintuitive as this may sound, you may benefit from taking snapshots more often. That's because the more often you take a snapshot, the fewer regions the player will have dirtied since the previous one, and the more effective a clean/dirty optimization will be. So you will trade more total snapshotting overhead for a reduced average snapshot latency, which in video games is generally what you want.

All that being said...

Many AAA games consider it okay to have stop-the-world auto-save features. If you manage to have an auto-save mechanism which does not momentarily freeze the game, then you're already above average. I think most players would be very happy with occasionally skipping a frame during auto-save, if that is the best that you can achieve.

Oh, one more thing. If your goal is to be resilient to SIGKILL, you will also need to think carefully about what happens if the game is killed at any point during the data saving process.

In an ideal world, you'd be able to write the new game state alongside the old one, and then make it the current state through an atomic filesystem operation. But this may require some contortions in your saved game data, such as fitting everything in a single file (since on most filesystems, renaming a single file is atomic, but moving multiple files isn't).
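The usual way to get that atomic switch is to write the new save to a temporary file in the same directory, flush it, then rename it over the old one, since a rename within a single filesystem is atomic on mainstream platforms. A sketch with std (for full durability on Linux you may also want to fsync the parent directory, which I left out):

use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

fn save_atomically(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Write the new state alongside the old one...
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    // ...make sure it has actually reached the disk...
    file.sync_all()?;
    drop(file);
    // ...and only then atomically replace the old save with the new one.
    fs::rename(&tmp, path)?;
    Ok(())
}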

Taking this a little farther, you should consider actually storing all the data in a SQLite database. When you go to save, start a new transaction and commit when everything is written. This pretty much solves the multitude of filesystem and disk-flushing problems that plague programs that constantly write data, especially cross-platform. If you're careful about how you use transactions, SQLite can actually be a really fast data store. You should see if it can work for you. I'd store it as one region per row, and overwrite changed regions entirely. (And since you're storing dimensional data, the R*Tree extension might be useful.)
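A sketch of that layout with the rusqlite crate, assuming a table created as CREATE TABLE regions (x INTEGER, y INTEGER, z INTEGER, data BLOB, PRIMARY KEY (x, y, z)), and leaving the actual serialization of a region into bytes up to you:

use rusqlite::{params, Connection, Result};

fn save_all(conn: &mut Connection, regions: &[(i32, i32, i32, Vec<u8>)]) -> Result<()> {
    // One transaction per save: either every region is written, or none is.
    let tx = conn.transaction()?;
    for &(x, y, z, ref data) in regions {
        tx.execute(
            "INSERT OR REPLACE INTO regions (x, y, z, data) VALUES (?1, ?2, ?3, ?4)",
            params![x, y, z, data],
        )?;
    }
    tx.commit()
}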

When you start looking at compression, keep it simple at first (just compress the blocks array as a stream), and think about data striding as your first technique to improve compression rates, i.e. whether you save it in x,y,z order or z,y,x order etc. The order can heavily influence the ratios of general stream compressors. I've seen differences of around 5x on data like this just by changing the stride order, especially at a low compression level for speed. (Hint: a (zx|xz)y order can be good when it's mostly flat.) And as a next step, instead of rolling your own custom encoding to try to improve compressibility, first see if you can transform your data to fit into a 2D PNG. PNG can do up to 64 bits per pixel (a Block, for you) and has a lot of tried-and-tested, highly optimized, and tunable compression methods baked in (that work remarkably well for 3D data too). If you can encode your blocks into 64 bits or less (never mind colors) it could be a big win in compression ratio for comparatively little work, and you can just as easily save that output in SQLite.
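To make the striding idea concrete, here is a sketch of re-ordering the flat block array before handing it to a stream compressor (assuming Block is Copy and the array is currently stored x-major, i.e. index = (x * 32 + y) * 32 + z; swap the loop order to try other strides):

fn restride_for_compression(blocks: &[Block; 32 * 32 * 32]) -> Vec<Block> {
    let mut out = Vec::with_capacity(blocks.len());
    // Emit in y-major order instead of x-major; if y is the vertical axis and
    // the terrain is mostly flat, runs of identical blocks get much longer
    // and compress better.
    for y in 0..32 {
        for x in 0..32 {
            for z in 0..32 {
                out.push(blocks[(x * 32 + y) * 32 + z]);
            }
        }
    }
    out
}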

And finally, you might be able to use a copy-on-write region data structure when saving, so it doesn't induce a copy operation for all regions, just the ones that the engine needs to modify while saving is occurring. I'm less familiar with this though; someone else might be able to give better recommendations for a data structure that can flip between mutable (normal use in-engine) and COW (during saving) and back (including whether that's recommended at all).

I hope I've given you some ideas at least. :)
