Learn Rust the Dangerous Way - the unsafe-first tutorial

There’s also heavy overlap with Ruby’s Enumerable module and Rust’s Iterator trait.

At least, that’s where I first became familiar with functional-ish combinators in a reasonably mainstream language. Apart from the usual computer science focus on Lisp and Scheme in college, I didn’t have much exposure to such things.

That is indeed the plan.

But it's not easy. When your whole life has been sequence, selection and iteration, expressed as 'if', 'for', etc. in many different languages, you do not immediately think of 'iter', 'map', 'enumerate', 'fold', 'collect', 'zip' and so on.

These things have not existed in my world. They are not what our computers do down at the machine instruction level. They are costly abstractions that have no place in systems that need to be small, performant and portable.

Rust of course changes all that by bringing them to the realm of compiled systems programming languages. And hence, here I am trying to get acclimatized.
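
For anyone making the same transition, here is a minimal sketch of how the familiar indexed loop lines up with the combinators named above. The function names (dot_loop, dot_iter, even_slots) are invented for illustration and come from nowhere in particular.

```rust
/// Dot product written as the familiar index-based loop.
fn dot_loop(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len().min(b.len());
    let mut sum = 0.0;
    for i in 0..n {
        sum += a[i] * b[i];
    }
    sum
}

/// The same computation using `iter`, `zip`, `map` and `fold`.
/// `zip` stops at the shorter input, matching the `min` above.
fn dot_iter(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| x * y)
        .fold(0.0, |acc, product| acc + product)
}

/// `enumerate` pairs each element with its index; `collect` gathers
/// the surviving elements into a new `Vec`.
fn even_slots(values: &[i32]) -> Vec<i32> {
    values
        .iter()
        .enumerate()
        .filter(|(i, _)| i % 2 == 0)
        .map(|(_, v)| *v)
        .collect()
}
```

Calling dot_loop and dot_iter on the same slices should give the same answer; whether the two compile to the same machine code is something to check in the generated assembly rather than take on faith.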

Funny you should mention F#. That is a new kid on the language block. Not any use in my world. But, whilst watching a presentation on functional programming the other day I started to realize where many things in Rust were coming from.

Interesting, I didn't think I had done that -- what are you referring to?

Or are you talking about some docs unrelated to this thread?

No, no, sorry if I was unclear. Your article certainly does not do that.

I was generally referring to the Rust docs, what I have read about Rust around the net, and discussions I have tried to follow here.

Actually my question was perhaps the opposite, why the need to kick-start C programmers into Rust with "unsafe"?

As a long-time C user, I was quickly sold on Rust matching C-like performance after a few experiments, without ever needing "unsafe".

Thanks for this post, it was really interesting and I didn't expect the performance to be faster.
According to GitHub I've been using Rust since 2015, but I've never tried to use raw pointers this way :sweat_smile:

Great tutorial, I enjoyed it! One thing I noticed:

In a couple of places, you transmute from [MaybeUninit<T>; N] to [T; N], using two stack variables where the C code only uses one. It seems you should either transmute a reference instead, or tell the reader that you're actually making a copy, which the compiler will probably elide.
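
To make the two options concrete, here is a rough sketch using a fixed u32 array rather than the article's actual types. Both function names are hypothetical. The first moves the buffer through transmute by value, which is the "second stack variable" case; the second reinterprets the storage in place through a pointer cast.

```rust
use std::mem::{self, MaybeUninit};

/// By-value version: the filled buffer is moved into a new `[u32; 4]`.
/// Semantically this is a copy, even if the optimizer removes it.
fn squares_by_value() -> [u32; 4] {
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }
    // SAFETY: every element was written in the loop above, and
    // `MaybeUninit<u32>` has the same size and layout as `u32`.
    unsafe { mem::transmute::<[MaybeUninit<u32>; 4], [u32; 4]>(buf) }
}

/// By-reference version: the same storage is viewed as `[u32; 4]`
/// without introducing a second array variable.
fn sum_of_squares_in_place() -> u32 {
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }
    // SAFETY: all four elements are initialized, and the pointer is
    // valid and suitably aligned for `[u32; 4]`.
    let view: &[u32; 4] = unsafe { &*buf.as_ptr().cast::<[u32; 4]>() };
    view.iter().sum()
}
```

Whether the by-value copy actually survives optimization depends on the compiler and flags, so it is worth confirming in the generated code.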

Personally I have always found it better style to mark any function that can exhibit UB as unsafe, even if it's private. Otherwise, you run the risk of introducing UB in the future when you forget the function isn't actually safe.
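
A small sketch of that convention, with made-up names: the private helper is declared unsafe and documents its contract, so every call site has to spell out why the contract holds.

```rust
/// Private helper that skips the bounds check.
///
/// # Safety
/// `idx` must be less than `data.len()`; anything else is undefined behavior.
unsafe fn read_unchecked(data: &[f64], idx: usize) -> f64 {
    // SAFETY: the caller guarantees `idx < data.len()`.
    unsafe { *data.get_unchecked(idx) }
}

/// Safe public wrapper: it establishes the invariant before calling the helper.
pub fn last_or_zero(data: &[f64]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // SAFETY: `data` is non-empty, so `data.len() - 1` is in bounds.
    unsafe { read_unchecked(data, data.len() - 1) }
}
```

Had read_unchecked been an ordinary fn with an unsafe block inside, a later caller could pass an out-of-range index without the compiler or a reviewer getting any hint that something needs checking.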

As a slight aside - one tutorial that I would love to read is a stream of consciousness of someone optimizing some code:

  • How they generate machine code that tells them useful information
  • How they profile
  • How they interpret the generated machine code (do they use LLVM IR or asm? how do they zero in on the relevant part of the code?)
  • How to test for performance regression in automated testing
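
As a very rough starting point for the first and last bullets, and not a substitute for the tutorial being asked for, here is a sketch under a few assumptions. The kernel and helper names are invented. Marking the kernel #[inline(never)] usually keeps it as a separate symbol, which makes it easier to find in the output of something like `cargo rustc --release -- --emit asm`. The timing helper is a crude wall-clock loop; a real regression test would want a proper benchmarking harness and a stable machine.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Kernel under study (hypothetical). `#[inline(never)]` asks the compiler
/// to keep it as its own function so it can be located in the assembly.
#[inline(never)]
fn sum_of_squares(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum()
}

/// Runs `f` `iters` times and returns the total elapsed wall-clock time.
fn time_it<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed()
}

/// Returns the kernel's result together with a rough timing.
fn measure() -> (f64, Duration) {
    let data: Vec<f64> = (0..1024).map(|i| i as f64).collect();
    let result = sum_of_squares(&data);
    // `black_box` discourages the optimizer from hoisting or deleting
    // the call whose cost is being measured.
    let elapsed = time_it(1_000, || {
        black_box(sum_of_squares(black_box(data.as_slice())));
    });
    (result, elapsed)
}
```

The Duration returned by measure is only meaningful relative to other runs on the same machine; comparing it against a hard-coded threshold in CI tends to be flaky.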

So, I've actually started on such a piece. The main issue is that it turns out to be a lot of work! Mostly around preparing visual aids, code snippets, etc. As you've probably noticed on this thread, several people have found errors in the original code snippets from my article; this is because different versions of the code got their wires crossed during the editing process.

When I teach this sort of thing to new hires, we pair-program. I almost feel like "optimization livestreaming" might be interesting, except that video's not terribly accessible, and is arguably even more work to edit and produce.

I also need to look into adding a diff-viewer to my blog software (I'm using Zola). I think colored annotated diffs, like a code review, would be easier to follow for lots of small changes than my current format.

I like the idea of live-streaming. I don't think it needs to be edited - sometimes looking at the rough edges can be very good for learning!

I would like to thank you, @cbiffle, for saving me a lot of effort. I was going to write a blog post at some point covering pretty much what you did!

I have a C background myself, and felt the same way as your intro about those tutorials when I first saw the potential of Rust. This was also around the version 1.15 timeframe, so there were far fewer libraries than there are now.

When I wanted to rewrite a personal project in Rust, there was no library for what I needed. So I decided to port a 14,000-line C library to Rust.

It was pretty much your approach, but in my case, I used an automatic code generation tool. It generated unsafe Rust that behaved identically to the original C.

(And I do mean exactly identical. For example, it declared every array as mem::uninitialized() and then explicitly called libc::memcpy to initialize it.)
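
For readers who haven't seen that style, here is a sketch of the pattern in present-day terms. It is not the tool's actual output: mem::uninitialized() has since been deprecated in favor of MaybeUninit, and std::ptr::copy_nonoverlapping stands in for libc::memcpy so the snippet needs no external crate. Both function names are invented.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Mirrors the translated-C shape: declare uninitialized storage,
/// then copy bytes into it.
fn copy_like_c(src: &[u8; 8]) -> [u8; 8] {
    let mut dst = MaybeUninit::<[u8; 8]>::uninit();
    // SAFETY: both pointers are valid for 8 bytes, and they cannot
    // overlap because `dst` is a fresh local. After the copy every
    // byte of `dst` is initialized.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().cast::<u8>(), src.len());
        dst.assume_init()
    }
}

/// The same operation once the unsafe scaffolding is removed.
fn copy_idiomatic(src: &[u8; 8]) -> [u8; 8] {
    *src
}
```

Having both versions side by side, with tests that compare them, is one way to convert such code a function at a time.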

Once I worked around a few bugs, I had a 100% unsafe library that linked with and passed all the C code's tests. I began hacking my way toward safety, running the tests to check for mistakes, and debugging the problems I found. (That was a very good learning experience all by itself!)

After two years of weekend work, I made really good progress, and learned a lot... but never finished it. It was obsoleted by a much better, much safer library.

I thought about writing a blog post on this, but pretty much everything I was going to say was covered in the series! The only exceptions are a couple of "horror stories" I encountered along the way, which were either my own mistakes with unsafe or how quirky C is.

Anyway, as I said up top: thank you, @cbiffle!

In chapter 5, you say that the cfg causes a confusing error message when SSE2 is not available, but by adding these two lines you can define a clear message:

#[cfg(not(target_feature = "sse2"))]
compile_error!("Your computer doesn't support sse2");

The code also has the advantage of being more understandable and should not pose any performance problems.
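
Here is a slightly expanded sketch of the same idea. Two things in it are assumptions added for illustration: the extra target_arch condition, so the snippet still builds on non-x86 machines, and the built_with_sse2 helper. Note that target_feature describes the target being compiled for, which is not necessarily the machine running the compiler.

```rust
// Stop the build with a readable message when an x86-64 target has SSE2
// disabled. The `target_arch` condition is an addition for this sketch.
#[cfg(all(target_arch = "x86_64", not(target_feature = "sse2")))]
compile_error!("this program requires SSE2; enable the `sse2` target feature");

/// Hypothetical helper: reports what the compile-time check saw.
/// `cfg!` is evaluated at compile time and expands to `true` or `false`.
fn built_with_sse2() -> bool {
    cfg!(target_feature = "sse2")
}
```

Because the check is resolved entirely at compile time, it adds nothing to the generated program.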

I'd actually love to read unsafe horror stories. Learning from others' mistakes is a great way to learn, and especially for unsafe code I can see myself learning a lot from it. I can't possibly make every mistake myself to learn something. :grin:

This might even be a good idea for a thread in this forum where everyone can share their unsafe-gone-wrong horror stories.

First, big thanks @cbiffle. This tutorial was very helpful, even though I'm not coming to Rust from a C/C++ background. For me, the main value of this tutorial was that it helped put unsafe into a more appropriate perspective.
From my initial reading of the TRPL book (and a couple of other books) I got the impression that unsafe is a rather obscure feature of the language that I would almost never need, except perhaps for some very serious systems hacking (which is beyond my skills and mental capabilities anyhow). But later I learned that using unsafe is not so difficult, is necessary in some cases (the infamous doubly linked list, FFI, ...), and is not so uncommon. So I think that rather than making unsafe obscure, people should learn how to use it properly and put it into the right perspective. This tutorial is a great step in that direction: don't be afraid of unsafe, use it when necessary, but try to localize it and explain how the unsafe code maintains the required safety invariants.

I'm looking forward to the next parts and would especially like to hear about packed_simd. I looked at their nbody example and it looks quite promising (simple code, and using SIMD for 3D vectors feels more "natural"), and its performance is great (approx. 25% faster), so more expert insight on this topic would be welcome.

@cbiffle - I guess you will touch on it in the next part, but I think it would be good to create a wrapper type for __m128d and implement the add, sub, mul and div operations for it. It makes the resulting code so much nicer, and I think everybody who reads your tutorial already understands the concept of operator overloading (or if not, it's rather trivial to grasp), so it would not distract from the main theme at all. Rather, it would make the code much more readable; nested _mm functions are quite messy.
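
Something along these lines, as a minimal sketch and not a suggestion for the article's exact design. The F64x2 name, the simd module, and the lanewise macro are all invented here. The code is gated on x86-64 and assumes SSE2 is enabled for the target, which is the default there.

```rust
#[cfg(target_arch = "x86_64")]
mod simd {
    use std::arch::x86_64::*;
    use std::ops::{Add, Div, Mul, Sub};

    /// Hypothetical newtype over two packed `f64` lanes.
    #[derive(Clone, Copy)]
    pub struct F64x2(__m128d);

    impl F64x2 {
        pub fn new(lo: f64, hi: f64) -> Self {
            // SAFETY: assumes SSE2 is enabled for the target.
            // `_mm_set_pd` takes the high lane first.
            F64x2(unsafe { _mm_set_pd(hi, lo) })
        }

        /// Returns the lanes as `[lo, hi]`.
        pub fn to_array(self) -> [f64; 2] {
            let mut out = [0.0f64; 2];
            // SAFETY: `out` has room for two `f64` values, and the
            // unaligned store has no alignment requirement.
            unsafe { _mm_storeu_pd(out.as_mut_ptr(), self.0) };
            out
        }
    }

    /// Implements one lane-wise operator in terms of one intrinsic.
    macro_rules! lanewise {
        ($op_trait:ident, $method:ident, $intrinsic:ident) => {
            impl $op_trait for F64x2 {
                type Output = F64x2;
                fn $method(self, rhs: F64x2) -> F64x2 {
                    // SAFETY: assumes SSE2 is enabled for the target.
                    F64x2(unsafe { $intrinsic(self.0, rhs.0) })
                }
            }
        };
    }

    lanewise!(Add, add, _mm_add_pd);
    lanewise!(Sub, sub, _mm_sub_pd);
    lanewise!(Mul, mul, _mm_mul_pd);
    lanewise!(Div, div, _mm_div_pd);
}
```

With that in place, an expression like _mm_add_pd(a, _mm_mul_pd(b, c)) can be written as a + b * c on F64x2 values, and the unsafe blocks are confined to the wrapper.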

A thread sounds like a good idea to me, @Phlopsi!

If it gets traction, it might also be a set of data the Unsafe Rust Guidelines WG would be interested in. It could highlight what people try to do, but invoke UB because they did it wrong or missed an assumption in the unsafe contract.

Based on the notes I took for my blog post, I can contribute three "horror stories" without much trouble. Two of them are specific to the C interop I was doing, but perhaps they will still have value.

They are currently titled:

  1. The Single-Threaded Program Deadlock
  2. The Sloppy Copy C API
  3. The Box-ing match Between C and Rust Ownership

Mention me if you start a thread, and I'll write up and link some Gists!

Hi again! Thanks for all the feedback, I really appreciate it.

The promised part 6, dealing with autovectorization in naive Rust code, is now online! Spoiler alert: the program just got quite a bit faster.

Whoever's been posting links to Reddit/HackerNews -- thank you! It seems a lot of people are finding the articles from there. I don't use either site, so I appreciate your help.

Nice :slight_smile: small typo in:

Second, it messed up the compiler's ability to reason about k. Right now, the compiler is smart enough to look at the way k is maintained and determine that it's always in bounds for position_deltas[x];

I think you meant position_deltas[k].

You're right! Fixed. Thanks!

A small typography hiccup: here, minuses weren't replaced by the em dash:

That's hardly changed from part 5 -- in fact, I only changed some spelling and capitalization to match Rust conventions.

Great work, thank you very much for writing all of this :+1:
