As some of you may know, I’m currently writing a safe wrapper around Vulkan. I have already solved a lot of problems, but I’ve been stuck on one of them for a few days. I’m posting this here because I’m starting to get desperate about it, and I’d like to hear suggestions from other people.
Description of the problem
Vulkan is a low level API that can control the GPU. Contrary to older graphical APIs, Vulkan is very low level and gives you lots of responsibilities. One of these responsibilities is to handle memory safety.
Vulkan has this concept of queues on the GPU to which you submit commands. A queue is very similar to a CPU thread, and the same challenges arise. Take this for example:
We submit some tasks for queues 1 and 2. One of the tasks of queue 2 writes to a buffer, and one of the tasks of queue 1 reads from the same buffer. If the two command executions happen to overlap, you get a data race.
To prevent this, Vulkan provides semaphores:
Now the memory safety problem is solved. With the Vulkan API, this is done in three phases:
- Create a semaphore object.
- When you submit the command that writes to the buffer, you must pass the semaphore and ask the implementation to signal it at the end of the task.
- When you submit the command that reads from the buffer, you must pass the semaphore and ask the implementation to wait for it to be signaled before starting.
Furthermore, there are a few constraints:
- You can’t signal a semaphore once and wait upon it multiple times. One signal = one wait.
- You should minimize the number of semaphores that are created, because it can be expensive. Semaphores can be reused once they have been waited upon.
- Even if the write and the read happen on the same queue, a semaphore must still be used. This is because the implementation is allowed to overlap the execution of commands within a queue.
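The constraints above can be captured in a toy model (plain Rust, no real Vulkan calls; this Semaphore type is hypothetical, not the API of my wrapper): a signal is consumed by exactly one wait, after which the object can be reused.

```rust
// Hypothetical model of Vulkan's semaphore semantics, NOT the real API:
// a semaphore is signaled exactly once and consumed by exactly one wait.
#[derive(Default)]
struct Semaphore {
    signaled: bool,
}

impl Semaphore {
    /// Called by the implementation when the signaling task finishes.
    fn signal(&mut self) {
        assert!(!self.signaled, "one signal = one wait: already signaled");
        self.signaled = true;
    }

    /// Called before the waiting task starts; consumes the signal,
    /// after which the semaphore can be reused.
    fn wait(&mut self) {
        assert!(self.signaled, "waiting on a semaphore that was never signaled");
        self.signaled = false; // reusable once it has been waited upon
    }
}
```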
The problem I have is: how to handle semaphores in my Vulkan wrapper? This may not look difficult, but it is. Continue reading to see why.
Idea #1: lock each resource exclusively every time
The idea is to assign a semaphore to each resource. Whenever a task uses that resource, it must wait for the corresponding semaphore at the beginning of the task (except the first time the resource is used) and signal it at the end.
This should work, but the problem is this:
Just like a Mutex compared to a RwLock, we’re going to waste a lot of time with resources that are accessed concurrently, which happens often in graphics programming.
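A toy model of this scheme (plain Rust, no real Vulkan calls; all names are hypothetical) shows why it behaves like a Mutex: every task, read or write alike, must wait for the previous task's semaphore, so even reads that could run concurrently get chained one after another.

```rust
// Hypothetical sketch of idea #1: one semaphore per resource, and every
// task waits on it at the start and signals a fresh one at the end.
struct Resource {
    // Semaphore id signaled by the last task, or None if never used yet.
    pending: Option<u64>,
}

struct Scheduler {
    next_sem: u64,
    serialized_tasks: u64, // tasks that were forced to run one-by-one
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { next_sem: 0, serialized_tasks: 0 }
    }

    /// Submit a task using `res`, whether it reads or writes:
    /// wait on the previous semaphore, then signal a new one.
    fn submit(&mut self, res: &mut Resource) {
        if res.pending.take().is_some() {
            // Must wait for the previous task: no overlap, even for reads.
            self.serialized_tasks += 1;
        }
        res.pending = Some(self.next_sem);
        self.next_sem += 1;
    }
}
```

Three concurrent reads of the same resource end up fully serialized under this model, which is exactly the waste described above.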
Idea #2: since a mutex-like isn’t good, let’s try a rwlock-like
For each resource, keep track of whether it is currently accessed mutably or immutably, just like a RwLock on the CPU.
However this approach doesn’t work in practice, because you have to allocate the semaphores and signal them when you submit the write command. This means that at the time when you submit the write command, you have to know the number of read commands that are going to be executed. This is really not practical.
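A small sketch (plain Rust, hypothetical names, no real Vulkan calls) makes the problem concrete: the write submission must allocate and signal one semaphore per future reader, so the number of reads has to be known up front.

```rust
// Sketch of why idea #2 is impractical: the write must plan for every
// read that will follow it (hypothetical model, not my wrapper's API).
struct WriteSubmission {
    read_semaphores: Vec<u64>, // one semaphore per expected future read
}

fn submit_write(expected_reads: usize, next_id: &mut u64) -> WriteSubmission {
    // Allocate and signal one semaphore per read we predict will happen.
    let read_semaphores = (0..expected_reads)
        .map(|_| {
            let id = *next_id;
            *next_id += 1;
            id
        })
        .collect();
    WriteSubmission { read_semaphores }
}

fn submit_read(w: &mut WriteSubmission) -> Result<u64, &'static str> {
    // Each read consumes one of the semaphores signaled by the write.
    w.read_semaphores
        .pop()
        .ok_or("no semaphore left: the write didn't plan for this read")
}
```

If a third read shows up after the write only planned for two, there is no semaphore left for it to wait on.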
Another approach would be to always create several semaphores, and use only some of them when they are needed. But as stated in the constraints, we should avoid over-allocating semaphores. We need more than one semaphore per queue, because each read needs to signal a different semaphore in the case where it is followed by a write.
Idea #3: specialize what happens for each resource
Most of the time, resources follow certain patterns.
For example a buffer that contains a 3D model is usually only ever modified once, then only read. This means that we can just create one semaphore per queue, signal them all in the write, and then the first time each queue reads from the buffer we make it wait on the appropriate semaphore.
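The write-once-then-read-only pattern can be sketched like this (plain Rust toy model, hypothetical names): the single write signals one semaphore per queue, and only the first read on each queue needs to wait.

```rust
use std::collections::HashSet;

// Hypothetical model of the "write once, then read only" pattern of
// idea #3: the initial write signals one semaphore per queue, and the
// first read from each queue waits on that queue's semaphore.
struct WriteOnceBuffer {
    // Queues whose semaphore has not been waited upon yet.
    queues_not_yet_synced: HashSet<u32>,
}

impl WriteOnceBuffer {
    /// The single write signals one semaphore for each queue.
    fn write(queue_count: u32) -> Self {
        WriteOnceBuffer {
            queues_not_yet_synced: (0..queue_count).collect(),
        }
    }

    /// Returns true if this read must wait on a semaphore,
    /// i.e. if it is the first read from this queue.
    fn read(&mut self, queue: u32) -> bool {
        self.queues_not_yet_synced.remove(&queue)
    }
}
```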
It could be argued that the first two solutions don’t work well because they are too generic. By using one algorithm for each pattern, we can do things correctly.
However this conflicts with the way an application is usually organized. The fact that the code that creates resources needs to know in advance exactly how each resource is going to be used makes things really difficult to organize. For example you can no longer use a pool of textures, since you would have to know, for each individual texture, how it is going to be used at the time when it is created.
Idea #4: let the library user handle this and add checks
One thing that would work for sure is to let the user create, signal, and wait for semaphores manually. But since the library has to be safe to use, correct semaphore usage has to be checked.
The problem with this is that the checks would add overhead, and not just any overhead: an overhead that duplicates information the user already gives us.
Furthermore, this also makes the API impractical. It would tie different parts of the application together, because for example the code that submits the write command would need to know how the resource is going to be used afterwards.
Handling synchronization automatically really brings a lot of benefits compared to manual handling.
Idea #5: leave that part of the API unsafe
Same as the previous point, but without the part where you check that it’s correct. This is obviously the C++ approach.
Handling synchronization is a major point of the Vulkan API, and if this part is left unsafe you might as well leave the entire library unsafe. For example the entire design of command buffers must be adapted to these synchronization issues. In the end only very few functions would be safe.
Idea #6: check the online literature about this topic
Since Vulkan was only released a month ago, there are no good resources about it yet. As far as I know, only three game developers have ported their games to Vulkan so far, and all three said that they haven’t yet optimized their engines for next-gen APIs because they still need to support older ones. For example the “The Talos Principle” developers report a framerate 30% lower than with the older API.
As said in the opening, I really don’t know what to do. Please provide some remarks and suggestions!