Nine Rules for Writing Python Extensions in Rust & Rayon

A year ago I got fed up with runtime compatibility hell from C++/OpenMP in our Python bioinformatics package. This article lays out what I learned by upgrading to Rust:

Nine Rules for Writing Python Extensions in Rust (Towards Data Science)

Along with suggestions about PyO3, file layout, memory allocation, testing, etc., the article gives examples of:

  • using Rayon/ndarray::parallel while returning all errors
  • letting users control the number of parallel threads
  • creating nice errors with thiserror and translating those Rust errors to nice Python errors
  • translating Python dynamic types to Rust generic functions
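The thread-count bullet can be sketched with only the standard library. The lookup order (explicit argument, then an environment variable, then the OS default) and the variable name BED_READER_NUM_THREADS are assumptions for illustration, not necessarily what Bed-Reader does, and std::thread::scope stands in for Rayon so the example runs without dependencies.

```rust
use std::env;
use std::num::NonZeroUsize;
use std::thread;

// Hypothetical environment variable name, for illustration only.
const NUM_THREADS_ENV: &str = "BED_READER_NUM_THREADS";

/// Pick a thread count: an explicit positive request wins, then a positive
/// value in the environment variable, then the OS's available parallelism.
fn resolve_num_threads(requested: Option<usize>) -> usize {
    if let Some(n) = requested.filter(|&n| n > 0) {
        return n;
    }
    if let Some(n) = env::var(NUM_THREADS_ENV)
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
    {
        return n;
    }
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

/// Sum `values` using at most `num_threads` scoped worker threads.
fn parallel_sum(values: &[f64], num_threads: usize) -> f64 {
    let n = num_threads.max(1);
    // The chunk length is never zero, so `chunks` cannot panic.
    let chunk_len = ((values.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<f64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}

fn main() {
    let n = resolve_num_threads(Some(4));
    let data: Vec<f64> = (1..=100u32).map(f64::from).collect();
    println!("{} threads, sum = {}", n, parallel_sum(&data, n));
}
```

With Rayon, the resolved count would typically be passed to rayon::ThreadPoolBuilder::new().num_threads(n).build(), and the parallel work would then run inside the resulting pool's install method.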

Some of this might be of interest to anyone using Rayon for parallel array processing.

The Rust-powered Python package is Bed-Reader, and it is open source. Thank you, Rust & Rayon, for letting me escape from C++/OpenMP runtime compatibility hell!


This is quite a nice write-up.

To ask you an oddball question: what was the most frustrating part of writing Bed-Reader? Were there any moments when you ran into something and had to stop and look out the window for half an hour to figure out how to solve it?


Michael,

Thanks for the question! Three frustrating parts of creating the Rust extension come to mind:

  • "nice" Rust errors - It took me a long time to figure out how to get nice errors in Rust. I wanted to return both system errors (e.g., a file that can't be found when opened) and custom errors (e.g., a file in the wrong format), and I wanted every error to have a clear message. As mentioned in the article, I ended up using the "thiserror" crate to create an enum called BedErrorPlus. BedErrorPlus wraps std::io::Error, ThreadPoolBuildError, and BedError, and BedError contains all my custom errors. This seems like the way everyone should handle errors, but I found only one resource that explained it (Rust: Structuring and handling errors in 2020 - Nick's Blog and Digital Garden).

  • par_azip! and returning errors -- ndarray::parallel's par_azip! macro is, I think, the most readable way to data-parallelize array-related code. However, its documentation and examples are very sparse, maybe it should be part of Rayon rather than only ndarray::parallel, and it has no direct way to return errors. (The article gives a workaround for returning errors.)

  • traits for numerics -- it took me a long time to figure out which traits a generic function needed in order to cover i8, f32, and f64. After a lot of trial and error, I ended up with Copy + Default + From<i8> + Debug + Sync + Send.
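For the "nice" errors bullet, here is a std-only sketch of the BedErrorPlus/BedError shape. With thiserror, each enum would simply derive Error, with #[error("...")] attributes supplying the messages, #[error(transparent)] forwarding a wrapped error's message, and #[from] generating the From conversions; the sketch writes those impls by hand so it runs without the crate. The BedError variants, messages, and the check_magic helper are hypothetical, and the Rayon ThreadPoolBuildError variant is omitted to avoid the dependency.

```rust
use std::error::Error;
use std::fmt;
use std::fs::File;
use std::io::{self, Read};

/// Custom errors; these variants are hypothetical examples.
#[derive(Debug)]
enum BedError {
    IllFormed(String),
    IndexTooBig { index: usize, count: usize },
}

impl fmt::Display for BedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BedError::IllFormed(path) => write!(f, "Ill-formed file: '{}'", path),
            BedError::IndexTooBig { index, count } => {
                write!(f, "Index {} is too big for {} items", index, count)
            }
        }
    }
}

impl Error for BedError {}

/// One error type for callers: either a system error or a custom one.
#[derive(Debug)]
enum BedErrorPlus {
    IOError(io::Error),
    BedError(BedError),
}

impl fmt::Display for BedErrorPlus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Pass the inner message through unchanged, like #[error(transparent)].
        match self {
            BedErrorPlus::IOError(e) => fmt::Display::fmt(e, f),
            BedErrorPlus::BedError(e) => fmt::Display::fmt(e, f),
        }
    }
}

impl Error for BedErrorPlus {}

// These From impls are what #[from] would generate; they let `?` convert.
impl From<io::Error> for BedErrorPlus {
    fn from(e: io::Error) -> Self {
        BedErrorPlus::IOError(e)
    }
}

impl From<BedError> for BedErrorPlus {
    fn from(e: BedError) -> Self {
        BedErrorPlus::BedError(e)
    }
}

/// Hypothetical helper: open `path` and check that it starts with `expected`.
fn check_magic(path: &str, expected: &[u8]) -> Result<(), BedErrorPlus> {
    let mut file = File::open(path)?; // io::Error becomes BedErrorPlus::IOError
    let mut buf = vec![0u8; expected.len()];
    file.read_exact(&mut buf)?; // a too-short file also surfaces as an io::Error
    if buf != expected {
        return Err(BedError::IllFormed(path.to_string()).into());
    }
    Ok(())
}

fn main() {
    // 0x6c 0x1b 0x01 are the magic bytes at the start of a PLINK .bed file.
    match check_magic("no_such_file.bed", &[0x6c, 0x1b, 0x01]) {
        Ok(()) => println!("ok"),
        Err(e) => println!("Error: {}", e),
    }
}
```

Because both wrapped types convert via From, a function returning Result<_, BedErrorPlus> can use `?` on file operations and on custom checks alike.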
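For the par_azip! bullet, the following sketch shows one general pattern for returning errors from a data-parallel loop: give each row its own Result slot, let the parallel workers fill those slots, and reduce them to a single Result afterward. It uses std::thread::scope in place of par_azip! so it runs without ndarray or Rayon; the decode rule (codes must be 0, 1, or 2) and the RowError type are hypothetical, and this is not necessarily the workaround the article uses.

```rust
use std::thread;

/// Hypothetical per-row error for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum RowError {
    BadValue { row: usize, value: i8 },
}

/// Decode one row of i8 codes into f64; codes outside 0..=2 are treated
/// as errors (a hypothetical rule for illustration).
fn decode_row(row: usize, codes: &[i8], out: &mut [f64]) -> Result<(), RowError> {
    for (&code, slot) in codes.iter().zip(out.iter_mut()) {
        if !(0i8..=2).contains(&code) {
            return Err(RowError::BadValue { row, value: code });
        }
        *slot = f64::from(code);
    }
    Ok(())
}

/// Decode a row-major `codes` matrix in parallel, returning the first
/// error (in row order) if any row fails.
fn decode_all(
    codes: &[i8],
    ncols: usize,
    out: &mut [f64],
    num_threads: usize,
) -> Result<(), RowError> {
    assert!(ncols > 0 && codes.len() == out.len() && codes.len() % ncols == 0);
    let nrows = codes.len() / ncols;
    let n = num_threads.max(1);
    let rows_per_thread = ((nrows + n - 1) / n).max(1);
    // One Result slot per row; each thread writes only its own rows' slots.
    let mut results: Vec<Result<(), RowError>> = vec![Ok(()); nrows];
    thread::scope(|s| {
        let chunks = codes
            .chunks(rows_per_thread * ncols)
            .zip(out.chunks_mut(rows_per_thread * ncols))
            .zip(results.chunks_mut(rows_per_thread))
            .enumerate();
        for (chunk_idx, ((code_chunk, out_chunk), result_chunk)) in chunks {
            s.spawn(move || {
                let rows = code_chunk.chunks(ncols).zip(out_chunk.chunks_mut(ncols));
                for (i, ((row_codes, row_out), slot)) in
                    rows.zip(result_chunk.iter_mut()).enumerate()
                {
                    *slot = decode_row(chunk_idx * rows_per_thread + i, row_codes, row_out);
                }
            });
        }
    });
    // After the parallel loop, reduce the per-row slots to one Result.
    results.into_iter().collect()
}

fn main() {
    // 3 rows x 3 columns; the 7 in row index 2 is invalid.
    let codes: Vec<i8> = vec![0, 1, 2, 2, 1, 0, 0, 7, 1];
    let mut out = vec![0.0; codes.len()];
    match decode_all(&codes, 3, &mut out, 2) {
        Ok(()) => println!("decoded: {:?}", out),
        Err(e) => println!("error: {:?}", e),
    }
}
```

With ndarray::parallel, a comparable approach might be to zip an extra per-row array of Results into the par_azip! call and reduce it after the loop; that adaptation is an assumption, not something the post spells out.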
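The trait-bound bullet can be illustrated with a small generic function. From<i8> is what lets one body produce i8, f32, and f64, since the standard library implements From<i8> for all three. The decode function, the -127 missing-value sentinel, and the dtype-string dispatch in decode_dyn are hypothetical; in a real PyO3 extension the dispatch would more likely be on the concrete NumPy array type the caller passes in rather than on a string.

```rust
use std::fmt::Debug;

/// Convert i8 codes to T, substituting `missing` wherever `missing_code` appears.
/// From<i8> converts each code, Copy + Default fill the buffer, and
/// Debug + Sync + Send are kept to match the bounds quoted in the post.
fn decode<T>(codes: &[i8], missing_code: i8, missing: T) -> Vec<T>
where
    T: Copy + Default + From<i8> + Debug + Sync + Send,
{
    let mut out = vec![T::default(); codes.len()];
    for (slot, &code) in out.iter_mut().zip(codes) {
        *slot = if code == missing_code { missing } else { T::from(code) };
    }
    out
}

/// Hypothetical result type: one variant per output dtype the caller may request.
#[derive(Debug, PartialEq)]
enum Decoded {
    I8(Vec<i8>),
    F32(Vec<f32>),
    F64(Vec<f64>),
}

/// Hypothetical dispatcher: map a NumPy-style dtype name to one
/// monomorphized instance of the generic `decode`.
fn decode_dyn(codes: &[i8], dtype: &str) -> Result<Decoded, String> {
    const MISSING: i8 = -127; // hypothetical sentinel
    match dtype {
        "int8" => Ok(Decoded::I8(decode(codes, MISSING, MISSING))),
        "float32" => Ok(Decoded::F32(decode(codes, MISSING, f32::NAN))),
        "float64" => Ok(Decoded::F64(decode(codes, MISSING, f64::NAN))),
        other => Err(format!("unsupported dtype: {}", other)),
    }
}

fn main() {
    let codes = [0i8, 1, 2, -127];
    println!("{:?}", decode_dyn(&codes, "float64"));
    println!("{:?}", decode_dyn(&codes, "int8"));
    println!("{:?}", decode_dyn(&codes, "uint16"));
}
```

Each match arm instantiates decode for a different T at compile time, which is one way to turn a dynamically typed request into statically typed Rust code.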

Yours,
Carl
