Learn Rust the Dangerous Way - the unsafe-first tutorial

Hi folks!

When I'm trying to convince my friends in the low-level, high-performance software community to learn Rust, I frequently find people bouncing off the high-level nature of the tutorials. I think part of the problem is that Rust introductions tend to present unsafe like an afterthought, or an embarrassment. This is probably the right move for teaching someone coming from Java or Python, for example, but for C programmers, unsafe is what they're used to. C programmers have been watching languages present themselves as "C replacements" for decades, but each one falls short, so they're rightfully skeptical.

I thought I'd try something different.

Learn Rust the Dangerous Way takes an unsafe-first approach to the language, by converting a heavily optimized, pointer-casting, SSE-using C program into equivalent Rust, and then incrementally introducing Rust's powers on top of that.

The program gets faster as it gets safer, mostly by giving the compiler more elbow room. This may not surprise y'all, but I think it's surprising to a lot of C programmers, who associate "safe" with the performance hit you sometimes take when moving to languages like Java.

I'd love to get some feedback, corrections, etc. on what I've posted so far. (Thanks @mww and @Minoru for the early feedback and editing help!)

Teasers:

  • Part 6 will cover using auto-vectorization to get a faster program with less work / no unsafe.
  • I intend for part 7 to cover packed_simd, though I'm currently fighting with it a bit.

Also consider wide for implementing SIMD.

Just a couple thoughts I had while reading through.


Under You can't write C in just any ol' language:

Note 1: What's with that allow nonsense?

You could use #[allow(bad_style)] instead of explicitly naming all three lints. It'd also help reinforce the point that Rust has an opinion around naming.

Rust won't stop us from doing this, but it won't go out of its way to help us either. In general, given a dangerous option and an equally good safe option, Rust will try to nudge you toward the safe option by making it easier to use. In keeping with the theme of being explicit, doing dangerous stuff requires more typing, so it's harder to do by accident.

I really like this statement! It isn't mentioned often, but having these subtle speed bumps to help make the safer code easier to use is something I really like about how people write unsafe Rust.

int main(int argc, char *argv[]){

    offset_Momentum(solar_Bodies);
    output_Energy(solar_Bodies);
    for(intnative_t n=atoi(argv[1]); n--;
        advance(solar_Bodies));

    output_Energy(solar_Bodies);
}

Is that for loop missing a closing )?


Making safe things from unsafe parts:

Experimentally, the naive approach to switching these variables to locals — replacing static with let — costs about 5%!

Wow! I wasn't expecting that. I guess clearing out a couple hundred bytes on every call to advance() really kills performance. Wouldn't an identical optimisation to static variables be to create the array in main() and pass down a reference to the &mut [Interactions; 3] buffer?

Edit: Lol, never mind. Looks like you did exactly that.
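
For anyone skimming the thread, here is a minimal sketch of that "allocate once in the caller, pass a reference down" shape. The names, the buffer size, and the body of `advance` are placeholders of mine, not the tutorial's actual code:

```rust
// Hypothetical size; the tutorial derives its own count from the body count.
const INTERACTIONS: usize = 10;

// 16-byte aligned scratch array, loosely modelled on the tutorial's Align16.
#[derive(Clone, Copy)]
#[repr(align(16))]
struct Align16([f64; INTERACTIONS]);

// The scratch space is borrowed from the caller, so `advance` neither touches
// a `static mut` nor zeroes a fresh array on every call.
fn advance(scratch: &mut [Align16; 3], step: f64) {
    for axis in scratch.iter_mut() {
        for slot in axis.0.iter_mut() {
            *slot += step; // placeholder for the real physics
        }
    }
}

// Allocates the buffer once, then reuses it for every step.
fn run(steps: usize) -> f64 {
    let mut scratch = [Align16([0.0; INTERACTIONS]); 3];
    for _ in 0..steps {
        advance(&mut scratch, 1.0);
    }
    scratch[0].0[0]
}

fn main() {
    println!("{}", run(4));
}
```

The buffer is initialised exactly once in `run`, which is the part that matters for the cost being discussed.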

let mut solar_Bodies = STARTING_STATE; // <-- new!

Why did you choose to make solar_Bodies a constant instead of declaring the array as a local variable? I feel like having one local variable would be simpler than a constant definition and a local variable initialized using that constant.

Switching from using an unsafe static mut for the solar system state, to a local ( nbody-5 ), won us back most of the performance we lost.

Do you know why that may have happened? I would have thought they'd be identical, seeing as you're either using a pointer to static memory or a pointer to a local higher up in the stack. Maybe it has something to do with caching because the stack is always hot?

Edit: Never mind, looks like you answered that question too :grin:


As a side note, I like how you've put an emphasis on local reasoning over global reasoning. I think it's really important, and one of the things which reduces cognitive load when writing unsafe Rust.

I'd be curious to see what happens when you remove SIMD and let LLVM do vectorizing and similar SIMD optimisations for you (i.e. using -march=native).


Hi!

I've had that one turn people off -- I think the use of the word "bad" is too opinionated.

Nope! Read it carefully; the last ) is at the end of the call to advance. This is a reason why I dislike the C for loop -- it's too flexible.
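
As a rough illustration of why that loop is easy to misread: the C version does all of its work in the loop header, while a Rust translation would most likely move the call into the body. This is only a sketch with a stand-in `advance` that counts calls, not the tutorial's function:

```rust
// Stand-in for the real `advance`; it only counts how often it was called.
fn advance(calls: &mut u64) {
    *calls += 1;
}

// Equivalent of `for (n = N; n--; advance(...));` in C:
// the call runs exactly `n` times.
fn run(n: usize) -> u64 {
    let mut calls = 0;
    for _ in 0..n {
        advance(&mut calls);
    }
    calls
}

fn main() {
    println!("{}", run(5));
}
```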

Yeah, rustc has a couple of bugs around setting up stack frames with large arrays, and while I have repeatedly hit them, I haven't taken the time to isolate and report them. (I bet they're already known.)

That's a great question! I felt like it kept the logic of main clearer to not have a 70 line array literal stuck inside it. This might just be a personal habit of mine.

It also reduced the diff against the C code. :wink:
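
A small sketch of that layout, assuming a cut-down `Body` type and two made-up bodies rather than the tutorial's real table:

```rust
// Cut-down, hypothetical body type; the tutorial's struct has more fields.
#[derive(Clone, Copy)]
struct Body {
    position: [f64; 3],
    mass: f64,
}

// The long literal lives outside `main`, as a constant.
const STARTING_STATE: [Body; 2] = [
    Body { position: [0.0, 0.0, 0.0], mass: 1.0 },
    Body { position: [1.0, 0.0, 0.0], mass: 0.001 },
];

// Returns (x of the local copy after mutation, x in the untouched constant).
fn shifted_x(dx: f64) -> (f64, f64) {
    let mut solar_bodies = STARTING_STATE; // mutable local, initialised from the constant
    for body in solar_bodies.iter_mut() {
        body.position[0] += dx;
    }
    (solar_bodies[1].position[0], STARTING_STATE[1].position[0])
}

fn main() {
    println!("{:?} (first mass: {})", shifted_x(2.0), STARTING_STATE[0].mass);
}
```

Mutating the local never changes the constant, so `main` stays short without giving up a mutable working copy.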

Thanks! I'm glad that came through.

Well, this is a bit of a spoiler, but: I have good news, and bad news.

Good news: simplifying the code and relying on auto-vectorization produces a significantly (~30%) faster program, despite using sqrt instead of rsqrt + Newton-Raphson, testing on a Skylake machine (which is the only machine I own).

Bad news: not with target-cpu=native. If you let the compiler target AVX, it fails to vectorize the magnitude loop, falling back to vsqrtss instead of vsqrtps, which hurts performance significantly. So targeting core2 (and thus SSE2) produces a faster program.

Fortunately for me, since I'm writing an article rather than a program, this provides a segue into discussing the downside of auto-vectorization: that it's magic and sometimes won't work. Hence packed_simd and friends.
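
To give a feel for the kind of loop involved, here is a sketch of a plain magnitude loop that an optimizer may or may not vectorize. The array length and names are my assumptions, not the Part 6 code, and whether the square root ends up packed or scalar depends on the compiler version and on the target passed via `-C target-cpu=...`; inspecting the output of `--emit asm` is the only way to know.

```rust
// Hypothetical interaction count; chosen only for illustration.
const N: usize = 12;

// For each interaction, computes dt / |d|^3 from the position deltas,
// using a plain `sqrt` and no explicit SIMD.
fn magnitudes(dx: &[f64; N], dy: &[f64; N], dz: &[f64; N], dt: f64) -> [f64; N] {
    let mut out = [0.0; N];
    for i in 0..N {
        let d2 = dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i];
        out[i] = dt / (d2 * d2.sqrt());
    }
    out
}

fn main() {
    let m = magnitudes(&[3.0; N], &[4.0; N], &[0.0; N], 125.0);
    println!("{:?}", m);
}
```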


Ewwww....

This sounds like an LLVM bug. Maybe it's missing an optimisation or generating poor quality code.

I had a feeling something like that would happen. With Rust and modern C++ we often leave a lot of the performance tweaking up to the optimiser, which is normally fine because the people working on LLVM and GCC are really smart... But on the other hand, it isn't the most deterministic way of making sure something really is a zero-cost abstraction.


This is a cool series - while I don't think going unsafe first is the best way to teach Rust to a general audience, I think taking that route could be a really good way to convince experienced C/C++ developers of the value that Rust adds.

For exactly this reason, that lint has now been renamed to nonstandard_style. bad_style is now just an alias for backwards compatibility.
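
A small sketch of the two spellings, using made-up items that mimic the C-style identifiers from the tutorial:

```rust
// Spelling out the three style lints one by one.
#[allow(non_snake_case, non_upper_case_globals, non_camel_case_types)]
mod explicit {
    pub type intnative_t = isize;
    pub static solar_Mass: f64 = 1.0; // placeholder value
    pub fn output_Energy(steps: intnative_t) -> f64 {
        solar_Mass * steps as f64
    }
}

// One lint group that covers the same three lints.
#[allow(nonstandard_style)]
mod grouped {
    pub type intnative_t = isize;
    pub static solar_Mass: f64 = 1.0; // placeholder value
    pub fn output_Energy(steps: intnative_t) -> f64 {
        solar_Mass * steps as f64
    }
}

fn main() {
    println!("{} {}", explicit::output_Energy(3), grouped::output_Energy(3));
}
```

Both modules compile without naming warnings; the second just needs less typing.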


Awesome blog post series, I like how you've detailed everything going step-by-step!


Regarding

In the current version of Rust there may indeed be no difference, but it would still be unspecified and thus non-guaranteed behavior: currently #[repr(Rust)] unions have essentially no guarantees whatsoever with regard to layout: the different variants, for instance, are not even guaranteed to start at the same offset (i.e., some variant could have prefix padding)! The initialization / validity invariants are unspecified as well.

That's why, until #[repr(Rust)] unions get some defined semantics, the best course is to never use them, which makes the #[repr(C)] attribute mandatory.
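
A minimal sketch of what that looks like in practice. The union below is a hypothetical stand-in with plain array fields, not the tutorial's definition:

```rust
// With #[repr(C)], every field of the union starts at offset 0,
// so the two views below are guaranteed to overlap exactly.
#[repr(C)]
union Interactions {
    scalars: [f64; 4],
    pairs: [[f64; 2]; 2],
}

// Writes through one field and reads the same bytes back through the other.
fn reinterpret(scalars: [f64; 4]) -> [[f64; 2]; 2] {
    let u = Interactions { scalars };
    // SAFETY: both fields are f64 arrays of the same size, and the layout
    // is pinned down by #[repr(C)].
    unsafe { u.pairs }
}

fn main() {
    println!("{:?}", reinterpret([1.0, 2.0, 3.0, 4.0]));
}
```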


What about Cell?

When talking about C-to-Rust translations, at some point a C programmer will be annoyed at &mut _ being overly restrictive and not matching the "mutable reference" intuition from classic programming languages. And that's when &Cell<_> can shine: imho C's "I don't wanna think about threads right now" way of thinking fits the semantics of Cell pretty well.

For instance, maybe this "let's try with Cell" approach could be used to attempt a different form of "stage 5" for your code (although, I admit, I haven't really thought through all the intricacies it could lead to). That is, the "stage 4" of your translation is the one with optimal performance, but it requires using static mut (which is indeed wildly unsafe, and I expect it to be deprecated or even removed in a future edition). In that case, keeping the array "global" but within a thread_local! { static rather than a static mut, and making the velocity field become [Cell<f64>; 3], should lead to a compiling solution (which gets to be reentrant(-safe) despite not being thread-safe) that I would love to see benchmarked.
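
To make the idea concrete, here is a rough, unbenchmarked sketch of a thread-local with Cell fields. The single `VELOCITY` array and the helper names are my own simplification, not a translation of the tutorial's data structures:

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread state: one velocity vector with Cell components.
    static VELOCITY: [Cell<f64>; 3] = [Cell::new(0.0), Cell::new(0.0), Cell::new(0.0)];
}

// Mutates the "global" through a shared reference; no `unsafe`, no `&mut`.
fn accelerate(delta: [f64; 3]) {
    VELOCITY.with(|v| {
        for (component, d) in v.iter().zip(delta.iter()) {
            component.set(component.get() + d);
        }
    });
}

// Copies the current value out of the thread-local.
fn velocity() -> [f64; 3] {
    VELOCITY.with(|v| [v[0].get(), v[1].get(), v[2].get()])
}

// Resets the state, applies the same delta twice, and returns the result.
fn demo() -> [f64; 3] {
    VELOCITY.with(|v| v.iter().for_each(|c| c.set(0.0)));
    accelerate([1.0, 2.0, 3.0]);
    accelerate([1.0, 2.0, 3.0]);
    velocity()
}

fn main() {
    println!("{:?}", demo());
}
```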


I just went through the first part of the tutorial and I'm loving it so far :slight_smile:

However, I can't get it to compile:

error: expected `:`, found `=`
   --> src/main.rs:139:32
    |
139 |     static mut position_Deltas = [Align16([0.; ROUNDED_INTERACTIONS_COUNT]); 3];
    |                                ^ expected `:`

Also, in Section two: calculating distances there is this text:

The C code is using a more complex for loop this time. I've un-indented the code, remember that we're inside an outer loop still.

But this is not the case as far as I can see in the original C code.

All static variables need to provide their type, it's actually part of the language syntax.

Does static mut position_Deltas: [Align16; 3] = ... work?

It does work :slight_smile:
I changed

static mut position_Deltas = [Align16([0.; ROUNDED_INTERACTIONS_COUNT]); 3];
static mut magnitudes = Align16([0.; ROUNDED_INTERACTIONS_COUNT]);

to

static mut position_Deltas: [Align16; 3] = [Align16([0.; ROUNDED_INTERACTIONS_COUNT]); 3];
static mut magnitudes: Align16 = Align16([0.; ROUNDED_INTERACTIONS_COUNT]);

Unfortunately I don't get the same output after compiling and running the program. I'm not 100% sure I wasn't the one who screwed up somewhere, though.

In any case, the explanations and the code breakdown are already good enough for me to follow the tutorial through.

Thanks everyone! First round of responses...

@17cupsofcoffee, I'm delighted to hear this! We need to work on our humility, and this seems like a great step.

@Yandros, There doesn't appear to be a lint for this -- should there be?

I'm also a little disappointed that static mut produced such better code on x86. My suspicion is that thread_local, which would generate gs-relative addressing, wouldn't perform as well, but I can test it. Lately I've been working mostly on ARM, where stack-relative addressing tends to be cheaper because of the lack of a compact way to do absolute addressing.

Fortunately, this seems to be an artifact of how the tables are accessed -- I have a simpler version that performs better, which will appear in Part 6.

I agree that Cell winds up being important, but I'm not sure this is the best place to introduce it, because using it in a thread local relies on generics and other concepts I haven't introduced yet. (I also wouldn't use it here in idiomatic Rust.)

@andresovela, you've found two editing mistakes on my part! It looks like the first one isn't present in the nbody-1.rs file that's linked toward the end of part 1, since I compile and test that one. I suggest downloading it and trying it.

I've fixed both issues in the article. Thanks!

Uh-oh! I'm worried about this one. Can you post the output you do get? Can you compare against the nbody-1.rs full program?

The algorithm is kind of sensitive to where parentheses appear in floating point expressions. There's also some += and -= that will corrupt the results if swapped. (I know this because I screwed both up when I was initially transcribing the program.)
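
A tiny standalone illustration of the parenthesization point (not taken from the n-body code): floating point addition is not associative, so moving parentheses can change the low bits of a result, and a simulation that runs for many steps will amplify that.

```rust
// Same three operands, two groupings.
fn left_first(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

fn right_first(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

fn main() {
    let (a, b, c) = (0.1, 0.2, 0.3);
    // The two results differ in the last bit.
    println!("{:?} vs {:?}", left_first(a, b, c), right_first(a, b, c));
}
```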


I went through the program line by line and I found my mistake. Now I do get the same output :slight_smile:

I haven't read the tutorial yet (and I'm not the target audience), but I wanted to quickly say thank you for writing this!!! We've tried to make the book background-agnostic, which means we're often not presenting the material in the best way for every reader. We need more background-specific resources like this!


@cbiffle, thanks for posting that brilliant article.

I'm not sure of the premise about C programmers though. As an old time embedded systems programmer, in C and other languages before that, I can appreciate their value system when selecting a language: Native compilation, no run-time overheads, small binaries, performance, and above all control of what is what.

As such, my first experiments in Rust were exactly that: reimplementing some C programs in Rust, so as to evaluate performance, code size, etc. Of course I produced what is probably very bad, non-idiomatic Rust code that looked like my C. I was immediately impressed by how Rust met all the requirements I mentioned above.

But I had no "unsafe" anywhere. I just rearranged things a bit until the compiler was happy.

For example:

This FFT. 32 bit integer maths only. Originally written in Spin for the Parallax Propeller. Then PASM and C: https://github.com/ZiCog/fftbench

This anagram finding challenge: https://github.com/ZiCog/insane-british-anagram-rust/tree/master

This conversion of a C solution to the Project Euler problem #256 "Tatami-Free Rooms": https://github.com/ZiCog/tatami-rust

Anyway, love the article, I always learn a lot from everything you write.


I'm a few days late, but if you'd like to put this in the https://github.com/rust-tutorials GitHub org I'd be happy to have you.


I'd just like to say this is very well-written and +1 the sentiment that this fills a very real gap in the landscape of Rust education/documentation.

Aside from the #[allow(...)] thing that others mentioned already, the only potential improvement that jumped out to me is this: I'm of the opinion that we should avoid saying things like "this unsafe code is safe, but that unsafe code is unsafe" because it makes "unsafe" into a very murky term that implicitly switches between two or more meanings multiple times in a sentence, when it really needs to be a crystal clear term. Admittedly, this is not a settled issue with a clear community consensus, and for all I know I might be in the minority here, but the nature of unsafety is so central to this book that it's probably worth thinking about whether sentences like "Here's an aside on when unsafe is safe" should use some other word like "correct".


EDIT: spotted one other small thing in part 4:

Because the union is defined in the same file as advance , putting pub on the accessors doesn't actually do anything ... For the purposes of this tutorial, I'm keeping everything in one file.

I believe you can declare a mod inside that file, and then the rest of the file really would be forced to use the pub accessors. I'm not sure if that's a net pedagogical improvement, but it's probably worth considering.
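
Something along these lines, as a sketch only. The union and accessor names are hypothetical stand-ins, not the ones from part 4:

```rust
mod interactions {
    // Fields are private to this module, so code outside it cannot
    // touch them directly, even though it lives in the same file.
    #[repr(C)]
    pub union Interactions {
        scalars: [f64; 4],
        pairs: [[f64; 2]; 2],
    }

    impl Interactions {
        pub fn new() -> Self {
            Interactions { scalars: [0.0; 4] }
        }

        pub fn as_scalars(&mut self) -> &mut [f64; 4] {
            // SAFETY: both fields are f64 arrays of the same size,
            // and every bit pattern is a valid f64.
            unsafe { &mut self.scalars }
        }

        pub fn as_pairs(&mut self) -> &mut [[f64; 2]; 2] {
            // SAFETY: same reasoning as in `as_scalars`.
            unsafe { &mut self.pairs }
        }
    }
}

use interactions::Interactions;

// Code out here has to go through the accessors.
fn demo() -> f64 {
    let mut buf = Interactions::new();
    buf.as_scalars()[2] = 3.0;
    // `buf.scalars` would not compile here: the field is private.
    buf.as_pairs()[1][0]
}

fn main() {
    println!("{}", demo());
}
```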

Moderator note: I removed a tangent about the nonstandard_style lint's name. Please don't hijack the thread, especially for bikeshedding.


I like the emphasis on 'whatever you could do in C, Rust can do it too', it's a much-needed approach.

However, it may be a good idea to change the order of lessons around a bit. The way it looks right now, I'd expect the typical C programmer to look at the first few diffs, then conclude "Rust is way too verbose/complicated, I already know C and can handle memory just fine, because I'm a Good Programmer™", which means that most of them are never going to get to the good parts.

Also, a quick primer on Rust's variable declaration syntax may be a good idea. For someone who only ever used C, asm and Bash, something like : [f64; 3] may look like utter gibberish, and type inference may be mistaken for dynamic typing.
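
Such a primer could be as short as the sketch below (my own example, with rough C equivalents in the comments):

```rust
// C: double sum3(const double values[3])
fn sum3(values: [f64; 3]) -> f64 {
    // No annotation, but the type is still fixed at compile time:
    // `total` is inferred to be f64 from how it is used and returned.
    let mut total = 0.0;
    for v in values {
        total += v;
    }
    total
}

fn main() {
    // `name: Type` instead of C's `Type name`.
    // C: double position[3] = {1.0, 2.0, 3.0};
    let position: [f64; 3] = [1.0, 2.0, 3.0];

    // Inferred as usize; variables are immutable unless declared `mut`.
    let count = position.len();
    // count = "three"; // rejected at compile time: wrong type, and not `mut`

    println!("{} values, sum {}", count, sum3(position));
}
```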


The standard language a lot of us are trying to standardize around is "sound" versus "unsound" for functions. A block is unsound if it does something illegal, and sound if it doesn't break any rules.

For a safe function, in order to be sound, it must be sound for all possible inputs (and state, if relevant) producible in safe code. For an unsafe function, it must be sound over all documented supported inputs (and state). Unsafe functions also get the distinction of having sound and unsound invocations.

And then the final step would be "unsafe but unmarked and private" which is a safe function that is sound for some inputs but not all, but can still be fine if encapsulated in code which is always sound. (But many people would tell you to mark it unsafe for greater explicitness anyway.)
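
A small sketch of the first two cases, using helper names I made up for illustration:

```rust
/// Reads `data[index]` without a bounds check.
///
/// # Safety
/// The caller must guarantee `index < data.len()`. Calls that respect this
/// are sound invocations; calls that don't are unsound ones.
unsafe fn read_at(data: &[f64], index: usize) -> f64 {
    // SAFETY: forwarded from this function's documented precondition.
    unsafe { *data.get_unchecked(index) }
}

/// Safe function: it has to be sound for every slice safe code can pass in,
/// so it checks the precondition itself before calling `read_at`.
fn first_or_zero(data: &[f64]) -> f64 {
    if data.is_empty() {
        0.0
    } else {
        // SAFETY: the slice is non-empty, so index 0 is in bounds.
        unsafe { read_at(data, 0) }
    }
}

fn main() {
    println!("{} {}", first_or_zero(&[]), first_or_zero(&[2.5, 1.0]));
}
```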
