Designing Vulkan Pipeline + Shader Layout Agreement

This is a proc macro + build script problem. I think I have a solution; sharing it in case there are other ideas worth attempting.

Context

Pipelines are where layout agreement becomes fully decided. The Slang compiler can emit metadata about the structs a shader uses (for both the SPIR-V and MSL backends). When we compose pipeline stages, each stage must reference a shader source, allowing us to match all input layouts (uniforms, push constants, buffer types) against the shader metadata.

Plan of Attack

To convert any mismatch into a compile time error, the following scheme is devised:

  1. build.rs invokes slangc to emit metadata and compute hashes representing structure and field layouts
  2. Rust structures use proc macros to emit const functions that evaluate to a hash of the structure layout
  3. pipeline stages emit const functions that call the const functions of their fields, creating a composite hash of the entire stage
  4. the shader metadata must have evaluated to the same hashes, or an emitted const evaluation will fail
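The steps above can be sketched as a const-eval check. This is only a hypothetical illustration of what the expansion might look like: the FNV-1a hash, the field-descriptor strings, and the names `push_constants_layout_hash` / `SHADER_SIDE_HASH` are all made up here, not real macro output.

```rust
// Sketch only: a const FNV-1a hash over a struct's field layout.
const fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01B3);
        i += 1;
    }
    hash
}

const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;

// What a hypothetical #[derive(LayoutHash)] might expand to for
// `struct PushConstants { time: f32, resolution: [f32; 2] }`:
// each field contributes its name, type, and offset to the hash.
const fn push_constants_layout_hash() -> u64 {
    let mut h = FNV_OFFSET;
    h = fnv1a(b"time:f32@0", h);
    h = fnv1a(b"resolution:[f32;2]@4", h);
    h
}

// Stand-in for the hash build.rs would write out from slang metadata
// (computed the same way here purely for illustration).
const SHADER_SIDE_HASH: u64 = push_constants_layout_hash();

// The emitted check: a layout mismatch fails the build at const eval.
const _: () = assert!(push_constants_layout_hash() == SHADER_SIDE_HASH);
```

Because both sides reduce to a single `u64`, types reused across many shaders deduplicate to the same check for free.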

Agreement between pipelines seems to be a matter of buffers, images, and their barriers, so it remains an expression over pipeline stages rather than requiring more composition on top.

Since the const function only exists after macro expansion, any structure of structures will have to emit a const check that assumes the child's const check exists. Limitations like this are still compile-time failures and not too confusing.

Even Better Ways to Feed rust-analyzer?

The only way I can think of to emit span-level errors is to write extra pre-checks to some file during builds, but those will go stale. Otherwise the macro expansions themselves cannot know what is going on with the child structs, right?

Avoiding Generation

I decided against code generation because I think it would actually degrade the ergonomics, forcing us to import specific Rust types and complicating deduplication of types used across many shaders (hashes deduplicate naturally). We would also have to wait on slangc for updated Rust types instead of just typing out the fields.

For the first draft of a struct, I plan to include a macro that creates a Rust type from a Slang shader source and type name. Expanding it in the IDE can then "write" the annoying first copy, while maintenance can use the const checks to enforce that agreements stay contractually bound.

The downside of relying on such generation all the time is, again, that Slang might reuse types, and we only know the layouts from their use in a shader.

The padding varies somewhat from type to type, and we can handle this with different annotations that produce MyTypeUniform etc., as other crates handling this problem do. If there is a disagreement, the user can combine the Slang path and type name in a macro call and generate the similar-but-Slang-distinct types where necessary.
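As a concrete illustration of such variant types, here is the std140-style case: a vec3 is 16-byte aligned on the shader side, so the same logical struct gets a tightly packed host version and a padded uniform version. `Light` and `LightUniform` are made-up names for this sketch.

```rust
// Host-side type: fields pack at natural alignment.
#[allow(dead_code)]
#[repr(C)]
struct Light {
    position: [f32; 3], // offset 0
    color: [f32; 3],    // offset 12 on the host side
}

// Uniform-buffer variant: std140 aligns each vec3 to 16 bytes,
// so explicit padding fields make the Rust layout match.
#[allow(dead_code)]
#[repr(C)]
struct LightUniform {
    position: [f32; 3], // offset 0
    _pad0: f32,         // std140 bumps the next vec3 to offset 16
    color: [f32; 3],    // offset 16
    _pad1: f32,         // round the struct out to a 16-byte multiple
}
```

The layout hashes of the two variants differ, which is exactly what lets the const checks catch someone uploading the packed version into a uniform buffer.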

Recruiting Contributors

All of this design work is for MuTate, and people who want to gain experience or a useful technical artifact will find support for any useful work, including simple things like finishing the clap interface for the included DSP workbench.

Not a direct reply, really, but since I've been playing around with them recently: do extensions like VK_EXT_descriptor_buffer (and soon descriptor heaps) that push bindless further help reduce layout mismatching?

It feels like with dynamic rendering, vertex pulling, device dispatch, and all the direct device memory access features we're getting pretty close to pipelines being not much more than a pure shader program object, at least for desktop.

Yesterday I found several extensions and strategies that reduce the need for padding. Layout mismatches are about field order as well as padding, though, so we always care about layout.

As for getting (mostly) rid of descriptor churn, it boils down to the big descriptor arrays strategy.

Yes. We can look at a pipeline as a function. Sharing image and buffer indexes is sort of like manually passing the arguments. If we squint, these two activities combine to make something like a single big Lisp expression.

Do tune in. I'm going to prototype pipeline wiring, using macros to decide how to compose pipelines and typestates to verify semantic correctness (what is in the buffer now?).
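A minimal sketch of what I mean by typestates here, with made-up states (`Raw`, `Culled`) and passes: the buffer's semantic state rides along as a zero-sized type parameter, so handing a buffer to a pass that expects different contents is a compile error.

```rust
use std::marker::PhantomData;

// Hypothetical semantic states for illustration.
struct Raw;    // freshly allocated, contents undefined
struct Culled; // written by a hypothetical culling pass

// The state is a zero-sized type parameter: no runtime cost.
struct Buffer<State> {
    index: u32, // index into the bindless descriptor array
    _state: PhantomData<State>,
}

// The culling pass consumes a Raw buffer and rebrands it as Culled.
fn cull_pass(buf: Buffer<Raw>) -> Buffer<Culled> {
    // (record the culling dispatch here)
    Buffer { index: buf.index, _state: PhantomData }
}

// Drawing only accepts a buffer the culling pass has produced;
// `draw_pass(&raw_buffer)` would not compile.
fn draw_pass(buf: &Buffer<Culled>) -> u32 {
    buf.index
}
```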

Ergonomics first. Guard rails when there's something to guard :wink:

I'm nowhere near experienced enough with current APIs to have any real idea what it would look like, but every time I've messed with them I've had this idea in the back of my head: the "true form" of the modern renderer is a "rendering language" that not only covers binding and vertex layouts across host and device, but also subsumes the shader language itself (at least to some extent) and the render graph on the other side. That would let you transition the same data across domains several times in the same expression (e.g. g-buffers, clustered lighting) and have it extract the efficient dispatch and sync implementation (including device dispatch, if available).

Of course, this is not only a fairly absurd amount of design and implementation work if you go all the way, but I've also got no idea if there's even a reasonable way to define an interface to it, on both the dispatch/scene-data side and the hardware side. Not to mention integration with existing code!


Having enough Lisp experience, I think it's safe to say that all programs have a "true form": a DSL that describes only that program. The semantic compression goes on until we reach a fixed point where we're trading regular code for macro code nearly 1:1.

One of the ergonomic strategies I just cooked up is to do something similar to Nix's callPackage (a true marvel of laziness and other sin). In Nix, functions name their arguments. If those names are bound in scope, callPackage just inserts the named values from scope into the function call. This is convenient because Nix argument lists are long and evaluation is lazy.

Where such evil becomes useful for rendering: pipelines truly must be completely explicit about their inputs. If we know all of the inputs, and "passing" them is somewhat of a farce because we're just handing over buffer indexes into a descriptor array where the data already exists (after some pipeline barrier, during a moment in time), do we actually want to pass them at all?

The filth of callPackage makes a lot of sense. While we can certainly override some named inputs, for the most part, if the callees know what they need, the scope can just pull the right things out of the hat and put them into the "call." The next trick is to reasonably restrict "the hat."
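A tiny sketch of that wiring in plain Rust, with all names and types hypothetical: each pipeline declares its input names, a scope map plays "the hat," and explicit overrides take precedence, with anything still unbound reported as an error before recording.

```rust
use std::collections::HashMap;

// Index into the bindless descriptor array where the resource already lives.
type ResourceIndex = u32;

// A pipeline declares the names of its inputs,
// like a Nix function naming its arguments.
struct PipelineDecl {
    name: &'static str,
    inputs: &'static [&'static str],
}

// callPackage-style resolution: overrides win, then the ambient scope,
// and anything still unbound is an error before anything is recorded.
fn call_from_scope(
    decl: &PipelineDecl,
    scope: &HashMap<&'static str, ResourceIndex>,
    overrides: &HashMap<&'static str, ResourceIndex>,
) -> Result<Vec<ResourceIndex>, String> {
    decl.inputs
        .iter()
        .map(|name| {
            overrides
                .get(name)
                .or_else(|| scope.get(name))
                .copied()
                .ok_or_else(|| format!("{}: unbound input `{}`", decl.name, name))
        })
        .collect()
}
```

Restricting "the hat" then amounts to deciding what the caller is allowed to put into `scope` in the first place.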

We declare what we want to evaluate. Steps that depend on each other "call" each other. Steps that can run in parallel evaluate separately. When there's more than one valid input, we might need some explicit bindings. Otherwise, we just presume that callees have made enough stuff available in scope. Since pipelines are functions, we can sketch it in Lisp:

(let ((bound2 (pipeline4)))
  (pipeline8 bound2)
  (pipeline7
   bound2 (let ((bound1 (pipeline1)))
     (pipeline5 bound1 (pipeline3))
     (pipeline6 bound1 (pipeline2)))))

This isn't quite as lazy a sketch as callPackage (which would look completely flat). It gives us a lot of information to decide dependencies without declaring them explicitly (since the dependents already must know too much). If the programmer doesn't have a good idea of the order they are describing, the macro can reveal the problem before runtime. More bindings can resolve ambiguity. Call structure determines which outputs could be in scope and keeps the macro from having to search the entire visual.
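For instance, the macro could lower the Lisp expression above into a dependency table and run an ordinary Kahn-style topological sort, which both yields a valid recording order and detects the inconsistent case. The `schedule` function below is a hypothetical sketch of that step, not the real macro.

```rust
use std::collections::HashMap;

// Assumes every pipeline appears as a key in `deps`; the value lists
// the pipelines whose outputs it consumes.
fn schedule(deps: &HashMap<&'static str, Vec<&'static str>>) -> Option<Vec<&'static str>> {
    // indegree = number of unsatisfied dependencies per pipeline
    let mut indegree: HashMap<&'static str, usize> =
        deps.iter().map(|(&k, v)| (k, v.len())).collect();
    let mut order = Vec::new();
    while order.len() < deps.len() {
        // every pipeline whose dependencies are all satisfied can record now
        let ready: Vec<&'static str> = indegree
            .iter()
            .filter(|&(_, &d)| d == 0)
            .map(|(&k, _)| k)
            .collect();
        if ready.is_empty() {
            return None; // cycle: no consistent recording order exists
        }
        for node in ready {
            indegree.remove(node);
            order.push(node);
            // satisfy this dependency for everyone waiting on `node`
            for (&k, v) in deps.iter() {
                if v.contains(&node) {
                    if let Some(d) = indegree.get_mut(k) {
                        *d -= 1;
                    }
                }
            }
        }
    }
    Some(order)
}
```

Pipelines that become ready in the same round are exactly the ones that could record in parallel.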

We can use some const-eval checks to push the moments of failure earlier without onerous type contracts. The question we're answering: can the runtime decide the composition for this visual, or is it inconsistent, before anything is recorded to a command buffer?

Well time to begin vibe coding our way towards the truth preservation of the concrete :spiral_shell:

Yeah, the main benefit I was hoping for from such a rendering language is that you're thinking about data flow instead of descriptor indexes and buffers; but the descriptors and buffers didn't just go away, and they certainly didn't become less important. At some point the rubber has to meet the road!

The reason I was thinking about such a language isn't that it would (theoretically!) make any particular engine or rendering strategy nicer, though; it's that it would make it easier to experiment with different rendering strategies and enable higher-level debugging tooling for them. The ability to quickly iterate and inspect is far more valuable than simple terseness, though they are closely related.

Of course, what you would most likely end up with, without a lot of luck and care, is simply another leaky abstraction that takes more effort to wrangle into doing what you actually want. I'd need to get a lot more experience actually implementing different techniques before I'd even think of tackling anything like this, even if I did think I had the endurance to pull off a constrained version. A basic render graph format gets you most of the way there!

Rather, making a DSL is about picking a few favored tactics that reduce the need to specify things (because it all works the same way), then making it easy to spam a few really powerful tactics like SoA while staying extremely ergonomic. The reduced language never does as much as what it reduces.

Alone, it is said that those of our kind suffer, separated from the glory of the Khala. But none of us are ever truly alone.

The strategy I'm using is to focus on getting concrete things working as fast as possible, with ergonomics second and compile-time safety etc. last, because those are easier to add at the end once there's a productive API already doing concretely cool stuff.

The behaviors and dataflows are starting to look really clear. One of the definitive challenges is crossfading render styles: different render techniques that accept common inputs and have their mutations of state either crossfaded by weighted interleaving of work on the same input buffers, or reduced with weighting over independent updates, things like that.

The main challenge for writing macros is going to be building the small things in isolation, where the inputs can't be inferred. It's not until we get to pipelines that we can even know most decisions. Only compositions of pipelines can begin to fully leverage opportunities to infer correct argument bindings.

I think I'm at the point where design work is getting back ahead of concrete implementation, and it's just time to crank out lots of code again. I have some extra code I can share in gists for anyone interested. It's pretty certain that crossfading render styles will yield some extraordinary toys for other rendering.

Very rough draft of the kind of macros I want to make, encoded in Lisp, but imagine JSON style if you prefer. (Are Lisp-style expressions in macros welcome? I can't recall any popular macros that use that syntax :cactus: )