Rust beyond CPUs (for hardware such as TPUs, GPUs, accelerators, etc.)

So, I was wondering if there are projects (or communities) within the larger Rust ecosystem working on bringing Rust to a more diverse range of hardware, such as TPUs, GPUs, or custom accelerators. I primarily work in the systems domain (embedded + high performance), but I've noticed an increasing shift of compute towards specialized hardware—things that are not CPUs—for various reasons like parallel processing, efficiency, and dedicated memory architectures.

I believe this trend is partly due to the AI/ML boom we're witnessing. It got me thinking: are individuals in the Rust ecosystem also observing this shift? I'm aware of one or two projects working towards this, such as rust-gpu. However, I don't think these projects cover the entire spectrum.

The second part of my question may be more suitable for a Rust compiler forum, but I'll include it here for completeness' sake.

P.S.: My knowledge of compilers is that of a novice, so bear with me: how challenging would it be to take MIR (Rust's mid-level intermediate representation) and lower it to something like MLIR before emitting LLVM IR, considering MLIR is what every fancy new piece of hardware is targeting?

2 Likes

In theory, that's easy. In practice, the compiler was initially written with many assumptions about commodity hardware baked in, so even porting it to AVR, for example (a Harvard architecture, with separate address spaces for data and code), proved challenging and took several years to complete.

Also, that's not even the hard part. The hard parts are:

  • how to provide the platform-specific stuff that's needed for the port to be useful (this includes, e.g., the OS interfaces on personal computers; on specialized hardware, it would probably involve communicating with proprietary drivers and toolchains)
  • how to emit code that's optimized or optimizable in a way that makes it worth running on a GPU/TPU. You can pretend they are just another kind of CPU, but to be actually useful for hardware acceleration, you have to uphold all sorts of specialized memory access patterns and instruction scheduling, just to name a couple of challenges.
4 Likes

I for one don't want to write GPU kernels in Rust and have to deal with the Rust aliasing rules for shared memory. It seems like a really poor fit for many kinds of computation. However I would like to write host code in Rust rather than C++. It would be really convenient to use tokio to manage a GPU in an async way the same way you might use storage or the network. The last time I looked, I couldn't find GPU bindings that supported DMAs soundly.

Thank you for the note on AVR. I did wonder why it took a while.

how to emit code that's optimized or optimizable in a way that makes it worth running on a GPU/TPU. You can pretend they are just another kind of CPU, but to be actually useful for hardware acceleration, you have to uphold all sorts of specialized memory access patterns and instruction scheduling, just to name a couple of challenges.

I presume the answer to this is most likely going to be 'no,' but just to confirm: can we not just add the specialized parts (memory access patterns, synchronization primitives, instruction scheduling) to a subset of Rust's core library and, say, create profiles? I can envision this happening in some nascent way with SIMD today.
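For what it's worth, the SIMD part of that vision already exists in a limited form: `core::arch` exposes vendor intrinsics gated on the target architecture, so platform-specific compute does surface through the core library. A minimal sketch (the intrinsics are real `std::arch` items; the `add4` helper itself is made up for illustration):

```rust
// Explicit SIMD on x86_64 via core::arch, with a portable fallback.
// SSE2 is part of the x86_64 baseline, so no runtime detection is needed here.
#[cfg(target_arch = "x86_64")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    unsafe {
        use std::arch::x86_64::*;
        let mut out = [0.0f32; 4];
        // Load both arrays into 128-bit registers, add lane-wise, store back.
        _mm_storeu_ps(
            out.as_mut_ptr(),
            _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr())),
        );
        out
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // Scalar fallback for other architectures.
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

fn main() {
    let s = add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
    println!("{s:?}");
}
```

This is the "profile" model in miniature: the same source compiles everywhere, but the fast path only exists where the hardware supports it.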

I for one don't want to write GPU kernels in Rust and have to deal with the Rust aliasing rules for shared memory.

:laughing:

Instruction scheduling is something the compiler performs while optimizing the IR and emitting machine code. I cannot fathom how you would achieve it merely by means of the standard library.

Synchronization primitives can certainly be (and are actually) implemented in a platform-specific way.

By "memory access patterns" I also meant codegen tailored to the architecture, which is also not something the standard library does. At the low level, it's performed by the compiler; at the high level, it's performed by the programmer implementing an accelerated algorithm.

It's easy: you use raw pointers and/or UnsafeCell. If you want shared mutability, you have to opt out of Rust's memory safety anyway.
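A minimal sketch of what that opt-out looks like (`SharedSlot` is a made-up name; the safety contract lives entirely in the comments, which is exactly the point):

```rust
use std::cell::UnsafeCell;

// Hypothetical shared-memory slot: UnsafeCell opts the field out of the
// aliasing rules, much as you'd want for GPU-style shared buffers.
struct SharedSlot<T> {
    inner: UnsafeCell<T>,
}

// SAFETY: callers promise to synchronize access externally (e.g. via
// barriers); the type system no longer checks this for us.
unsafe impl<T: Send> Sync for SharedSlot<T> {}

impl<T: Copy> SharedSlot<T> {
    fn new(v: T) -> Self {
        Self { inner: UnsafeCell::new(v) }
    }

    /// SAFETY: no other access to this slot may overlap the call.
    unsafe fn write(&self, v: T) {
        *self.inner.get() = v;
    }

    /// SAFETY: no write to this slot may overlap the call.
    unsafe fn read(&self) -> T {
        *self.inner.get()
    }
}

fn main() {
    let slot = SharedSlot::new(0u32);
    // Sound here only because this thread is the sole accessor.
    unsafe { slot.write(42) };
    println!("{}", unsafe { slot.read() });
}
```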

2 Likes

You may or may not be aware of WebGPU, which uses WGSL to write shaders, and also supports GPGPU pipelines.

As others have noted, using it effectively is a whole other matter, but that's more an artifact of how the hardware works than of the software.

I've only recently started looking at WebGPU so I can't personally say yet how useful it will be in practice for GPGPU purposes, but I figured I'd at least mention it.

4 Likes

There's the rust-hdl crate, if you're into FPGAs.

(I would have preferred a backend that generates VHDL, but that's just a matter of taste.)

5 Likes

Thank you. This is pretty intriguing. I looked it up; it looks like the author has embarked on a rewrite: rhdl.

4 Likes

Have you actually done this?

CUDA is really powerful. Is there a Rust GPU project that even comes close with all the associated tooling?

I did some preliminary exploration on this. The issue back then (I believe) was that while you could build compute pipelines with wgpu, the ecosystem was focused on rendering. I'm not sure if things have changed since.

1 Like

Done what? Used interior mutability primitives? Yes.

I don't think so, or at least I don't know of one. However, I don't see how the existence of a good-enough platform support layer is relevant to the inherent properties of the language. How "powerful" CUDA is doesn't change the fact that you have to use interior mutability to avoid UB if you need shared mutability.

1 Like

Yes, there are a number of competent GPU wrappers (see GPU Computing), and the ecosystem for GPU-driven graphics is very exciting (see 3D Rendering | Are we game yet?).

2 Likes

To write something substantial that executes on a GPU?

Are the GPGPU wrappers that mature? Their own documentation seems at odds with this; the Rust CUDA project says as much ("The project is still in early development, expect bugs, safety issues, and things that don't work"). To pick one example: async DMAs are unsafe, and you are on your own to ensure the future is polled to completion before dropping or modifying the buffer.
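For concreteness, one ownership discipline that could make such an API sound is sketched below: the in-flight transfer owns the buffer, so dropping your handle can never free memory the device is still writing to; you only get the buffer back after completion. Everything here is hypothetical (`DmaTransfer` is invented, and a thread stands in for the DMA engine):

```rust
use std::thread::{self, JoinHandle};

// Hypothetical sound async-DMA wrapper: the transfer owns the buffer
// for its entire lifetime, so no alias can observe a half-written state.
struct DmaTransfer {
    // Stand-in for the DMA engine: a thread that fills the buffer.
    worker: JoinHandle<Vec<u8>>,
}

impl DmaTransfer {
    /// Start a "transfer": ownership of `buf` moves into the engine,
    /// so the caller cannot drop or modify it mid-flight.
    fn start(mut buf: Vec<u8>) -> Self {
        let worker = thread::spawn(move || {
            for b in buf.iter_mut() {
                *b = 0xAB; // simulated device write
            }
            buf
        });
        Self { worker }
    }

    /// Block until completion; only then does the caller get the
    /// buffer back.
    fn wait(self) -> Vec<u8> {
        self.worker.join().expect("dma worker panicked")
    }
}

fn main() {
    let dma = DmaTransfer::start(vec![0u8; 4]);
    let buf = dma.wait();
    println!("{buf:?}");
}
```

The same move-ownership shape can be expressed with futures instead of a blocking `wait`, but then cancellation (dropping the future) is exactly where the unsoundness the quote describes creeps back in.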

The trouble is usually: comprehensive platform support, edge cases related to Rust's type system, additional domain-specific feature requests, and working safely against proprietary code.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.