Feedback on our Rust Documentation for HPC Users

Hi everyone!

I am in charge of the Rust language at NERSC and Lawrence Berkeley National Laboratory. In practice, that means that I make sure the language, along with good relevant up-to-date documentation and key modules, is available to researchers using our supercomputers.

That's more than 10,000 users worldwide doing research on a wide array of topics (physics, cosmology, materials science, artificial intelligence, etc.); you name it, we have it, and we might even have helped someone get a Nobel prize in it.

Right now, these users might be competent in high-performance computing and numerical simulation, but most of them are unfamiliar with Rust (C++ and Python would be the two main programming languages here). My goal is to make users who might benefit from Rust aware of its existence, and to make their lives as easy as possible by pointing them to the resources they might need. A key part of that is our Rust documentation.

I'm reaching out here to know if anyone has suggestions to improve the documentation (crates I might have missed, corrections to mistakes, etc.). I'll take anything to try and help bring Rust to that community 🙂

4 Likes

In the "Installing Rust" section, you write:

Once loaded, you should be able to use cargo as usual.

But the previous section and your post here suggest that this is meant significantly for an audience who doesn't know what “usual” is — so it might be worth adding a hyperlink to Hello, Cargo! so that people in that position can get onto the usual introductory path.

Note that the module does not let you change the Rust version and channel via rustup. You can switch channels (i.e., from stable to nightly) using a rust-toolchain.toml file in your project or by explicitly adding +nightly to your cargo calls.

As someone who does know how Rust tooling works, this is an almost self-contradictory claim because rust-toolchain.toml and +nightly are rustup features. Perhaps you mean “… change the Rust version and channel via rustup override”? Specifying that would help users know exactly what to expect.
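(For readers who have not seen one before, here is a minimal rust-toolchain.toml sketch pinning the nightly channel. Note that rustup is what reads this file when it drives cargo, which is exactly why the quoted sentence reads as self-contradictory.)

```toml
# rust-toolchain.toml, placed at the project root.
# rustup reads this file and selects the named toolchain
# whenever cargo or rustc is invoked inside the project.
[toolchain]
channel = "nightly"
```

The one-off alternative mentioned in the docs, cargo +nightly build, does the same selection for a single invocation.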

… (i.e. using the target-cpu=native build flag) …

… (i.e., data races are a compile-time bug in Rust) …

These should be “e.g.” instead of “i.e.”.

CPU Parallelism … In particular, we recommend that you take a look at the rayon and crossbeam libraries.

I think it may also be worth mentioning that you can achieve a lot of parallelism using solely std tools. std::thread::scope can be used if you already know how to break your problem into threads and don't need rayon's more powerful and DWIM features, and std::sync::mpsc is a fine choice of channel if you don't need the additional features of crossbeam.
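To make that concrete, here is a minimal std-only sketch (function and variable names are illustrative) that combines both: scoped threads sum disjoint chunks of a slice and report partial results over an mpsc channel, with no external crates involved.

```rust
use std::sync::mpsc;
use std::thread;

// Sum a slice on up to `n_threads` scoped threads, collecting
// partial sums through a std::sync::mpsc channel.
fn parallel_sum(data: &[u64], n_threads: usize) -> u64 {
    let chunk_size = ((data.len() + n_threads - 1) / n_threads).max(1);
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for chunk in data.chunks(chunk_size) {
            let tx = tx.clone();
            // Scoped threads may borrow `data` directly, because the
            // scope guarantees they finish before this function returns.
            s.spawn(move || {
                let partial: u64 = chunk.iter().sum();
                tx.send(partial).unwrap();
            });
        }
    });
    drop(tx); // close the channel so the receiving iterator terminates
    rx.iter().sum()
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("{}", parallel_sum(&data, 4)); // prints 500500
}
```

For a fixed, known decomposition like this, the std version is arguably easier to audit than pulling in a dependency.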

(rayon can be a good introduction to Rust parallelism because it's so easy to use — but it has subtleties, like how you can accidentally deadlock by blocking in a Rayon task, and quite a large API surface with lots of traits.)

The Rust compiler has support for SIMD instructions, both architecture-specific (on stable) and portable (currently restricted to nightly).

The “architecture-specific” link is many versions old.

I think it would also be worth mentioning that the Rust compiler is capable of autovectorization, so it is not necessary to use explicit SIMD for simple cases.
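For instance, a plain AXPY-style loop like the sketch below is the kind of code the optimizer typically autovectorizes on its own at opt-level 3 (and more aggressively with -C target-cpu=native), with no explicit SIMD intrinsics:

```rust
// y[i] += a * x[i], written as a plain safe loop.
// Iterating over zipped slices gives the compiler the bounds
// information it needs, which makes autovectorization easy.
fn axpy(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x.iter()) {
        *yi += a * *xi;
    }
}

fn main() {
    let x = vec![1.0_f32; 8];
    let mut y = vec![2.0_f32; 8];
    axpy(3.0, &x, &mut y);
    println!("{:?}", y); // every element is now 5.0
}
```

Inspecting the generated assembly (e.g. on the Compiler Explorer) is an easy way for HPC users to verify that vectorization actually happened.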

See the Rust Cookbook and Blessed, …

The name of this site is “Blessed.rs”, not “Blessed”.


This may be more of a pet peeve than good advice, but if I were writing this page, I’d replace most occurrences of “crate” with “library”.

  • Technically, all Rust libraries are crates but not all crates are libraries.
  • Using already-known language may help people get oriented faster or be put off less by Rust’s strangenesses.

Much of the Rust community would probably disagree with me here, though.

6 Likes

Thank you for the detailed review! I will try and update the documentation today to integrate your feedback.

One good question you raise is who we expect to read the doc. The key profiles I expect are people with high-performance computing experience (C++, OpenMP, MPI, etc.) but little to no knowledge of Rust (this would be a first introduction to whether Rust might fit their use-case), and people with some Rust experience but little Rust-for-HPC experience (those would need to be pointed at HPC-specific crates and tools they might not be aware of). Plus interns and grad students, a lot of interns and grad students (who know little, but are really motivated and willing to try newer technologies, as long as you point them in the right direction).

Regarding the use of "crate" vs "library", I think I would default to "crate" if I expected all readers to be familiar with Rust but, since some will definitely not be existing Rust users, "library" might be a better fit here.

Documentation updated!

1 Like

Thank you for taking the lead to bring Rust into the HPC domain.

I spend a meaningful chunk of my day job on optimizing Rust code for high-throughput and low-latency applications. From my perspective, your documentation is a good starting point given that the target audience is assumed to be unfamiliar with the Rust ecosystem.

Here are a few additional aspects to consider adding:

Linux io_uring gives a meaningful performance boost w.r.t. networking and disk IO. You can leverage it either by using a binding or via the monoio runtime. I would not recommend using io_uring directly unless the developer is familiar with the IO intricacies of modern (5.x / 6.x) Linux kernels.

The monoio runtime leverages a thread-per-core architecture (which is different from work stealing in Tokio) and can easily be 2x to 3x faster than Tokio for balanced workloads (i.e., no irregular bursts).

Then, for IPC with core affinity, you most certainly want to look into the Rust port of the Disruptor ring buffer.
As a matter of fact, io_uring in the Linux kernel is itself implemented with a C version of a ring buffer.

There is a good blog post explaining the concept.

Much of the Disruptor was pioneered by LMAX, so the original documentation is still worth reading as a reference, even though it refers to the original implementation in Java.

Rust has one of the fastest finite state transducer (FST) implementations, and given that HPC experiments tend to index humongous troves of data, you definitely want to add it to the list. This crate is actually used by some antivirus companies to quickly scan hundreds of billions of indexed URLs.

GitHub - BurntSushi/fst: Represent large sets and maps compactly with finite state transducers.

These are just suggestions which I deem sensible in the HPC context assuming you aim for high throughput large data processing in Rust.

Thank you for reaching out to the Rust community and for sharing your documentation.

4 Likes

I, as a reader of docs, was confused by the "crate" vs "library" vs "binary" terms. I think using the words "library crate" or "binary crate" helps teach that a "crate" is a compiled unit, and that there are different kinds. A crate can then be referred to as just a "crate" or a "library" in the rest of the text, after it has been introduced as a "library crate" first.

1 Like

I think defaulting to "library" is the right choice here; people following crates.io links will soon run into the word "crate", but by then they are in the wider Rust ecosystem, and more generalist documentation would be the best place to answer their questions.

Thank you for the feedback!

I think that most of those libraries are too specialized for the scope of this page as few of our users are I/O-bound in the traditional sense.

Most communication bound codes here would be waiting on MPI (inter-node communications, which are optimized to death).

We have some very data-focused users (e.g., in biology / genomics, I believe) who would benefit from those tools, but I expect them to be familiar with those primitives and able to locate the corresponding Rust crates easily.

People wanting to make the best use of the hardware provided would mostly ask us how to use all cores, how to use the GPUs, and how to scale up to several nodes gracefully (then we might tell them that their multi-node multi-core application is spending most of its time loading data in step 1 of their pipeline...).

It is sometimes not as easy as you'd like to find crates when starting cold. So I encourage you to consider adding this, if you have any room for specialized sections in the doc.

Also, it helps to have recommendations from people who have used them, like the ones @marvin-hansen gave. People often post here asking for recommendations.

1 Like

I see it as a balancing act: if I put too much content on the page then people are less likely to consult individual pieces of content.

Here I believe the use-case is rare enough to not be worth it. That decision was guided, in part, by the fact that we do not talk about io_uring in other parts of the doc, despite it being useful to existing data-focused non-Rust codes.

We do have some dedicated I/O pages, but those tend to focus on using higher-level interfaces (e.g., HDF5, with implementations able to use MPI-IO effectively to make the most of our distributed file system) and on profiling / tooling to identify problems (at which point users can reach out to us for specialized help, and if any problem ends up being recurrent then we will add documentation to help people encountering it).

3 Likes