Async contamination problems

For the last year and a half, I've been writing a Rust client for a virtual world. It's coming along OK. Here's some video. The old viewers are mostly C++, with a lot of async-like coroutines. They're bottlenecked on the main thread and can't use modern multi-core CPUs effectively.

So my design is multi-threaded. It uses the Rend3, WGPU, and Vulkan layers to handle the GPU. These allow me to send new textures and meshes to the GPU from threads other than the rendering thread. The rendering thread is just a short loop that calls Rend3's render function over and over. That thread runs at the highest CPU priority, to keep the frame rate from dropping under load.
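Here's a minimal sketch of that dedicated thread. `render_frame()` is a hypothetical stand-in for the call into Rend3, and the `thread_priority` crate used to raise priority is my assumption, not necessarily what the real code does:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for one call into Rend3's render graph.
fn render_frame() {
    // ... build the frame's render graph and submit it to the GPU ...
}

fn spawn_render_thread(running: Arc<AtomicBool>) -> std::thread::JoinHandle<()> {
    std::thread::spawn(move || {
        // Raise this thread's priority so frame pacing survives heavy
        // update work on other cores (assumed crate: thread_priority).
        let _ = thread_priority::set_current_thread_priority(
            thread_priority::ThreadPriority::Max,
        );
        // The entire render thread: call the renderer over and over.
        while running.load(Ordering::Relaxed) {
            render_frame();
        }
    })
}
```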

All the update work is done outside that thread. Network messages come in, assets get fetched and decompressed, moving objects get moved, animations run, the scene graph is constantly being modified, and assets are loaded into and deleted from the GPU. Several CPU cores are kept busy with this when the viewpoint is moving and assets are being loaded and released.

Locking is fine-grained, at the viewable object level within the scene. Locks are never held for more than a few lines of code.
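As a sketch of what that means in practice (types and field names invented for illustration), each viewable object carries its own lock, and a critical section is only a few lines:

```rust
use std::sync::Mutex;

struct ObjectState {
    position: [f32; 3],
    current_lod: u8,
}

struct ViewableObject {
    // One lock per object, not one big lock over the whole scene.
    state: Mutex<ObjectState>,
}

fn move_object(obj: &ViewableObject, new_pos: [f32; 3]) {
    // The critical section is a few lines, so contention stays low.
    let mut state = obj.state.lock().unwrap();
    state.position = new_pos;
} // lock released here
```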

I'd previously tried an architecture where one thread owned the scene graph and received messages from the other threads. But that resulted in excessive complexity. A message comes in: "OK, here's LOD 3 of asset N." But while that event was sitting on the queue, the camera moved, and now we want LOD 2. So that update has to be discarded and a new request generated. This kind of loose coupling gets complicated. With fine-grained locking, I eliminated all of that.
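To make that concrete, here's a hypothetical sketch of the staleness check the owning thread was forced into (all names invented for illustration):

```rust
enum SceneMsg {
    AssetReady { asset_id: u64, lod: u8 },
}

// Invented stand-ins for illustration.
fn desired_lod(_asset_id: u64) -> u8 { 2 } // what the camera wants *now*
fn install_asset(_asset_id: u64, _lod: u8) {}
fn request_asset(_asset_id: u64, _lod: u8) {}

fn handle(msg: SceneMsg) {
    let SceneMsg::AssetReady { asset_id, lod } = msg;
    // The camera may have moved while this message sat on the queue,
    // so the LOD that arrived can already be stale.
    if lod == desired_lod(asset_id) {
        install_asset(asset_id, lod);
    } else {
        // Discard the stale update and re-request at the LOD wanted now.
        request_asset(asset_id, desired_lod(asset_id));
    }
}
```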

OK, fine. It's all working.

Then the author of Rend3 decided to support WebAssembly. It's nice to be able to run in a browser. The trouble is that the WASM architecture is really a way to do JavaScript-style things faster. It has JavaScript's concurrency model, not Rust's. Each thread is a "Web Worker", which, I think, is a separate process. They share some memory, and within that memory you can use atomic operations. Mutex-type locks are (not sure about this) really compare-and-swap plus a spinlock plus a timeout.

So the normal approach to WASM is to use "async" and avoid doing much locking. Every crate that has a lock inside needs a special async version. So parts of Rend3 were converted to "async", using Tokio. And because async land insists on being in charge, there's been a transition from a library to a "framework" (you call a library; a framework calls you). I haven't fully figured out the changes yet.

Async land doesn't really like compute-bound work. Modern game-type workloads, though, are multiple CPUs and a GPU running flat out, all sharing the same scene graph. Async is a bad match for that model. We don't often see AAA game titles running in a browser; that's part of why.

An async solution would take months of work and would probably take a performance hit on non-browser platforms. For me, running in a browser might be nice, but is not essential. So, huge rework, increased complexity, and reduced performance.

Someone else ran into this problem with the Bevy game engine. No one has an answer for them. So switching to Bevy won't help.

Any ideas?

Spinlocks have to be used on the main thread, as you aren't allowed to block the main thread. On web workers there are the wait and notify WASM instructions, which work similarly to futexes (which are used for locks on at least Linux), I believe.

spinlock

No, at least not on non-WASM platforms. It's a compare-and-swap, and if you have to wait, you wait until someone pings you that the lock status has changed, then try again. After some amount of time, less than 1 ms, that loop exits, and there's a mini CPU dispatcher that decides who runs next. It's complicated, but not too bad if you usually take the happy path.
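The general shape of that is roughly the following; a hedged sketch of the fast path, not any particular library's actual implementation:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct SpinThenWaitLock {
    locked: AtomicBool,
}

impl SpinThenWaitLock {
    fn lock(&self) {
        // Happy path: one compare-and-swap, no waiting at all.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Contended path: spin briefly; a real implementation parks
            // the thread (futex / wait-notify) after a bounded spin and
            // retries when pinged that the lock status changed.
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
        // A real implementation would notify one parked waiter here.
    }
}
```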

Things where you block normally, though, like channels, need different handling. I use crossbeam_channel, but would probably have to convert to an async channel such as tokio::sync::mpsc.
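A minimal sketch of what that conversion looks like, with an invented message type (the blocking version works on native threads; the async version yields to the executor instead of blocking):

```rust
// Blocking version, as used today with crossbeam_channel.
fn blocking_worker(rx: crossbeam_channel::Receiver<u64>) {
    while let Ok(asset_id) = rx.recv() { // blocks the OS thread
        println!("fetch asset {asset_id}");
    }
}

// Async equivalent with tokio::sync::mpsc.
async fn async_worker(mut rx: tokio::sync::mpsc::Receiver<u64>) {
    while let Some(asset_id) = rx.recv().await { // suspends the task instead
        println!("fetch asset {asset_id}");
    }
}
```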

Can anyone give an example of something successful that's heavily compute-bound, shares a lot of data between threads, needs multiple CPUs, and runs on Tokio async?

Trying to figure out how to work under all these limitations.

It looks like most of my threading could work under Tokio in multithreaded mode. I'm not sure about WASM. Can you pass an async_std::sync::Arc to another worker under WASM?

Please note that you should not be mixing Tokio with async-std. Note also that async_std::sync::Arc is a re-export of std::sync::Arc.

Generally, Tokio actually recommends that you put anything compute-bound on rayon rather than on Tokio. See for example this blog post. You can put compute-bound stuff directly in Tokio, but if you do, you should probably start two Tokio runtimes: one for your normal async stuff and one for your compute-bound stuff.
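For reference, the usual shape of that pattern is to hand the heavy work to rayon and await the result over a oneshot channel. This sketch assumes the rayon and tokio crates; the workload itself is invented:

```rust
async fn sum_on_rayon(data: Vec<u8>) -> u64 {
    let (tx, rx) = tokio::sync::oneshot::channel();
    rayon::spawn(move || {
        // Compute-bound work runs on rayon's thread pool, so it never
        // stalls Tokio's reactor threads.
        let result: u64 = data.iter().map(|&b| b as u64).sum();
        let _ = tx.send(result); // receiver may have been dropped; ignore
    });
    rx.await.expect("rayon task dropped the sender")
}
```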
