Learn Rust the Dangerous Way - the unsafe-first tutorial

It does work :)
I changed

static mut position_Deltas = [Align16([0.; ROUNDED_INTERACTIONS_COUNT]); 3];
static mut magnitudes = Align16([0.; ROUNDED_INTERACTIONS_COUNT]);

to

static mut position_Deltas: [Align16; 3] = [Align16([0.; ROUNDED_INTERACTIONS_COUNT]); 3];
static mut magnitudes: Align16 = Align16([0.; ROUNDED_INTERACTIONS_COUNT]);
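For anyone who hits the same error: unlike `let`, a `static` item never infers its type, so the annotation is required. Here is a minimal sketch of the fixed form. The `Align16` definition, the constant's value, the uppercase name, and the `store_and_load` helper are assumptions for illustration, not code copied from the tutorial.

```rust
// Assumed value for illustration only.
const ROUNDED_INTERACTIONS_COUNT: usize = 10;

// Assumed definition: a 16-byte-aligned wrapper around an array of f64.
#[derive(Clone, Copy)]
#[repr(align(16))]
struct Align16([f64; ROUNDED_INTERACTIONS_COUNT]);

// The `: Align16` annotation is mandatory on a static item.
static mut MAGNITUDES: Align16 = Align16([0.; ROUNDED_INTERACTIONS_COUNT]);

/// Stores `value` in slot `i`, then reads it back (hypothetical helper).
///
/// # Safety
/// No other thread may access `MAGNITUDES` during the call.
unsafe fn store_and_load(i: usize, value: f64) -> f64 {
    unsafe {
        MAGNITUDES.0[i] = value;
        MAGNITUDES.0[i]
    }
}

fn main() {
    // Fine here because this program is single-threaded.
    let v = unsafe { store_and_load(3, 1.5) };
    println!("{v}");
}
```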

Unfortunately, I don't get the same output after compiling and running the program. I'm not 100% sure that I wasn't the one who screwed up somewhere, though.

In any case, the explanations and the code breakdown are already good enough for me to follow the tutorial through.

Thanks everyone! First round of responses...

@17cupsofcoffee, I'm delighted to hear this! We need to work on our humility, and this seems like a great step.

@Yandros, there doesn't appear to be a lint for this -- should there be?

I'm also a little disappointed that static mut produced so much better code on x86. My suspicion is that thread_local, which would generate gs-relative addressing, wouldn't perform as well, but I can test it. Lately I've been working mostly on ARM, where stack-relative addressing tends to be cheaper because of the lack of a compact way to do absolute addressing.

Fortunately, this seems to be an artifact of how the tables are accessed -- I have a simpler version that performs better, which will appear in Part 6.

I agree that Cell winds up being important, but I'm not sure this is the best place to introduce it, because using it in a thread local relies on generics and other concepts I haven't introduced yet. (I also wouldn't use it here in idiomatic Rust.)
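For readers wondering what the Cell-in-a-thread-local alternative looks like, here is a minimal sketch. The names `MAGNITUDE` and `accumulate` are made up for illustration; this is not code from the tutorial, and it says nothing about how the two approaches compare in performance.

```rust
use std::cell::Cell;

thread_local! {
    // Each thread gets its own copy, so mutating it needs no `unsafe`.
    static MAGNITUDE: Cell<f64> = Cell::new(0.0);
}

/// Adds `delta` to this thread's value and returns the new total.
fn accumulate(delta: f64) -> f64 {
    MAGNITUDE.with(|m| {
        m.set(m.get() + delta);
        m.get()
    })
}

fn main() {
    accumulate(1.5);
    println!("{}", accumulate(2.0));
}
```

Note how `thread_local!` and `Cell<f64>` already drag in macros, generics, and closures, which is the pedagogical cost mentioned above.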

@andresovela, you've found two editing mistakes on my part! It looks like the first one isn't present in the nbody-1.rs file that's linked toward the end of part 1, since I compile and test that one. I suggest downloading it and trying it.

I've fixed both issues in the article. Thanks!

Uh-oh! I'm worried about this one. Can you post the output you do get? Can you compare against the nbody-1.rs full program?

The algorithm is kind of sensitive to where parentheses appear in floating point expressions. There's also some += and -= that will corrupt the results if swapped. (I know this because I screwed both up when I was initially transcribing the program.)
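To illustrate the parentheses point: floating-point addition is not associative, so regrouping the same operands can change the last bits of the result. This is a small sketch with arbitrary values, not an excerpt from the n-body program.

```rust
/// Groups the sum as (a + b) + c.
fn sum_left(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Same operands, grouped as a + (b + c).
fn sum_right(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}

fn main() {
    let (a, b, c) = (0.1, 0.2, 0.3);
    // The two groupings round differently, so the printed values differ.
    println!("{:?}", sum_left(a, b, c));
    println!("{:?}", sum_right(a, b, c));
}
```

In a simulation that iterates many steps, a difference like this can grow until it shows up in the printed digits.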


I went through the program line by line and I found my mistake. Now I do get the same output :)

I haven't read the tutorial yet (and I'm not the target audience), but I wanted to quickly say thank you for writing this!!! We've tried to make the book background-agnostic, which means we're often not presenting the material in the best way for every reader. We need more background-specific resources like this!


@cbiffle, thanks for posting that brilliant article.

I'm not sure of the premise about C programmers though. As an old time embedded systems programmer, in C and other languages before that, I can appreciate their value system when selecting a language: Native compilation, no run-time overheads, small binaries, performance, and above all control of what is what.

As such, my first experiments in Rust were exactly that: reimplementing some C programs in Rust to evaluate performance, code size, etc. Of course I produced what is probably very bad, non-idiomatic Rust code that looked like my C. I was immediately impressed by how Rust met all the requirements I mentioned above.

But I had no "unsafe" anywhere. I just rearranged things a bit until the compiler was happy.

For example:

This FFT. 32 bit integer maths only. Originally written in Spin for the Parallax Propeller. Then PASM and C: https://github.com/ZiCog/fftbench

This anagram-finding challenge: ZiCog/insane-british-anagram-rust on GitHub, a Rust program to find anagrams in the Debian british-english-insane dictionary file.

This conversion of a C solution to the Project Euler problem #256 "Tatami-Free Rooms": https://github.com/ZiCog/tatami-rust

Anyway, love the article, I always learn a lot from everything you write.


I'm a few days late, but if you'd like to put this in the Rust Tutorials GitHub org, I'd be happy to have you.


I'd just like to say this is very well-written and +1 the sentiment that this fills a very real gap in the landscape of Rust education/documentation.

Aside from the #[allow(...)] thing that others mentioned already, the only potential improvement that jumped out to me is this: I'm of the opinion that we should avoid saying things like "this unsafe code is safe, but that unsafe code is unsafe" because it makes "unsafe" into a very murky term that implicitly switches between two or more meanings within a single sentence, when it really needs to be a crystal clear term. Admittedly, this is not a settled issue with a clear community consensus, and for all I know I might be in the minority here, but the nature of unsafety is so central to this book that it's probably worth thinking about whether sentences like "Here's an aside on when unsafe is safe" should use some other word like "correct".


EDIT: spotted one other small thing in part 4:

Because the union is defined in the same file as advance, putting pub on the accessors doesn't actually do anything ... For the purposes of this tutorial, I'm keeping everything in one file.

I believe you can declare a mod inside that file, and then the rest of the file really would be forced to use the pub accessors. I'm not sure if that's a net pedagogical improvement, but it's probably worth considering.
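A rough sketch of that idea, using a plain struct instead of the tutorial's union to keep it short. The `body` module, the `Position` type, and its accessor are hypothetical names, not the tutorial's.

```rust
mod body {
    /// The field is private to this module, so code elsewhere in the
    /// same file has to go through the `pub` functions below.
    pub struct Position {
        coords: [f64; 3],
    }

    impl Position {
        pub fn new(x: f64, y: f64, z: f64) -> Self {
            Position { coords: [x, y, z] }
        }

        /// Public accessor for the first coordinate.
        pub fn x(&self) -> f64 {
            self.coords[0]
        }
    }
}

fn main() {
    let p = body::Position::new(1.0, 2.0, 3.0);
    // `p.coords` would not compile here: the field is private to `body`.
    println!("{}", p.x());
}
```

Everything still lives in one file, but the privacy boundary is now the inline module rather than the file.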

Moderator note: I removed a tangent about the nonstandard_style lint's name. Please don't hijack the thread, especially for bikeshedding.


I like the emphasis on 'whatever you could do in C, Rust can do it too', it's a much-needed approach.

However, it may be a good idea to change the order of lessons around a bit. The way it looks right now, I'd expect the typical C programmer to look at the first few diffs, then conclude "Rust is way too verbose/complicated, I already know C and can handle memory just fine, because I'm a Good Programmer™", which means that most of them are never going to get to the good parts.

Also, a quick primer on Rust's variable declaration syntax may be a good idea. For someone who only ever used C, asm and Bash, something like : [f64; 3] may look like utter gibberish, and type inference may be mistaken for dynamic typing.
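Something along these lines might be enough for such a primer. It is only a sketch with made-up variable names, comparing against the C a reader would already know.

```rust
/// Sums the three components of a fixed-size array.
fn sum(v: [f64; 3]) -> f64 {
    v[0] + v[1] + v[2]
}

fn main() {
    // `name: Type` order; roughly C's `double position[3] = {1, 2, 3};`
    let position: [f64; 3] = [1.0, 2.0, 3.0];

    // No annotation: the compiler infers `[f64; 3]` at compile time.
    let velocity = [0.5, 0.5, 0.5];

    // Inference is still static typing. This line would be rejected:
    // let oops: i32 = velocity; // error[E0308]: mismatched types

    println!("{}", sum(position) + sum(velocity));
}
```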


The language a lot of us are trying to standardize on is "sound" versus "unsound" for functions. A block is unsound if it does something illegal, and sound if it doesn't break any rules.

For a safe function, in order to be sound, it must be sound for all possible inputs (and state, if relevant) producible in safe code. For an unsafe function, it must be sound over all documented supported inputs (and state). Unsafe functions also get the distinction of having sound and unsound invocations.

And then the final step would be "unsafe but unmarked and private" which is a safe function that is sound for some inputs but not all, but can still be fine if encapsulated in code which is always sound. (But many people would tell you to mark it unsafe for greater explicitness anyway.)
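To make those categories concrete, here is a small sketch. The function names are invented for illustration, and the slice example is my own choice rather than anything from the tutorial.

```rust
/// Safe function: must be sound for every input safe code can produce.
fn first_or_zero(xs: &[u32]) -> u32 {
    xs.first().copied().unwrap_or(0)
}

/// Unsafe function: only has to be sound for inputs its contract allows.
///
/// # Safety
/// `xs` must not be empty.
unsafe fn first_unchecked(xs: &[u32]) -> u32 {
    // SAFETY: the caller promises `xs` has at least one element.
    unsafe { *xs.get_unchecked(0) }
}

/// Safe wrapper that only ever makes sound invocations.
fn first_checked(xs: &[u32]) -> Option<u32> {
    if xs.is_empty() {
        None
    } else {
        // SAFETY: we just checked that `xs` is non-empty.
        Some(unsafe { first_unchecked(xs) })
    }
}

fn main() {
    println!("{}", first_or_zero(&[]));
    println!("{:?}", first_checked(&[7, 8]));
    // An unsound invocation would be `unsafe { first_unchecked(&[]) }`:
    // it compiles, but violates the documented contract.
}
```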


"unsafe", "unsound", that is the same to me. How confusing can we make this?

At the end of the day checks are done by the compiler, or they are not done. "checked"/"unchecked".


Yeah, I agree, actually. I went back and forth on phrasing there. To some degree, I like the juxtaposition of things like "making safe out of unsafe," but I also see that it could be confusing.

I'm a fan of the sound/unsound distinction that @CAD97 raised, and I don't think it's excessively jargony. I'll play with the wording.

I think that's a real risk, but I'm also not sure that I can reach someone who's approaching the tutorial looking for reasons to stop reading it. I could be more forceful in Part 1 pointing out that the code is going to get worse before it gets better, I suppose.

I agree that the words are uncomfortably similar. Unchecked/checked isn't the distinction we're trying to make, though -- it's between these cases:

  1. This bit of code is written using safe parts (in the Rust sense) and is sound. (Which should be true of all safe code, but sometimes safe code has bugs.)
  2. This bit of code is written using unsafe parts and is unsound (which is usually what happens the first time I try to write something using unsafe).
  3. This bit of code is written using unsafe parts and is nevertheless sound.

Both 2 and 3 are "unchecked" in the sense that the compiler doesn't have our backs.
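The three cases can be sketched side by side. The names below are hypothetical, and the unsound version is shown only for contrast and is never called.

```rust
/// Case 1: built from safe parts, and sound.
fn get_safe(xs: &[f64], i: usize) -> Option<f64> {
    xs.get(i).copied()
}

/// Case 2: built from unsafe parts, and unsound. The signature is safe,
/// yet nothing stops a caller from passing an out-of-bounds `i`.
#[allow(dead_code)]
fn get_unsound(xs: &[f64], i: usize) -> f64 {
    unsafe { *xs.get_unchecked(i) }
}

/// Case 3: built from unsafe parts, and nevertheless sound, because the
/// bounds check guarantees the unchecked access is always in range.
fn get_sound(xs: &[f64], i: usize) -> Option<f64> {
    if i < xs.len() {
        // SAFETY: `i` was just checked against `xs.len()`.
        Some(unsafe { *xs.get_unchecked(i) })
    } else {
        None
    }
}

fn main() {
    let xs = [1.0, 2.0];
    println!("{:?} {:?}", get_safe(&xs, 5), get_sound(&xs, 1));
}
```

The compiler accepts cases 2 and 3 equally; only a human reading the code can tell them apart.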

I think the safe/sound distinction appeals to me because of the English expression "safe and sound" meaning that something is comfortably secure (and probably also warm and cozy). With Rust, achieving "safe and sound" is relatively easy.

...now I feel like I want a Rust sweater.


Also, "soundness" is a term of art from mathematics.


I'm not sure if you're offering that as an argument in favor or against the use of the term; in general, I've specifically tried to avoid using math jargon in these articles. The existence of a non-math analogy for soundness makes me more inclined to use it, though.

When used with a specific meaning in parts of logic and proof theory, "sound" is not mathematical jargon, but a mathematical term, just like "function", "real number", "complex number", "group", "field", "self-adjoint".

Unfortunately, I am not experienced enough in that part of logic to see if the term is appropriate here.

I was puzzled for a moment by "mathematical jargon", but then I figured out that "pathological", "elegant", "folklore", etc., are examples of mathematical jargon.


Hm. I didn't intend jargon as a value judgment, but it sounds like you heard it that way. By "jargon," I mean terms that are opaque or difficult to understand for people outside of a particular field or profession. I'm explicitly trying to avoid using math terminology that isn't commonly used in programming, so that the tutorials remain accessible to people (like me) without formal math training. And so "function" is okay but I'm not using terms like "field," "ring," "isomorphic," or (say) terms from category theory.

Hope that makes sense. I'm not attacking math, just aiming for a target audience who may not be comfortable with it.

Simple reference on the application in logic: Soundness in Logic { Philosophy Index }

The term sound is most frequently used to describe whether or not an argument is valid and has true premises, thereby guaranteeing the truth of its conclusion.

So our application to soundness of safe code at least aligns with the definition in logic, in that it's true (here the predicate is the lack of undefined behavior) over all possible inputs.

But I also agree that the colloquial definition is what matters for coining a term. And here the colloquial sense applies just as well as the logical one:

1. in good condition; not damaged, injured, or diseased.
2. based on reason, sense, or judgment.


"safe", "sound" whatever. Nobody outside of mathematics knows if that is some technically defined meaning or just casual mathematician slang.

For the same reason can we avoid words like "monad", "monoid", "functor" when talking about Rust?

At the end of the day, the source code is mechanically checked for memory aliasing problems or it is not. If not, a human has to check it.

So "unsafe" means not mechanically checked by the compiler. If I understand Rust correctly.

That does not mean it is actually unsafe. It means you have to trust a human rather than the compiler.


It's that specific difference that we're trying to capture in a safe/sound distinction. Even if casual usage of the terms is casual, it definitely helps if written material consistently uses two different terms for the different meanings.

Even pure synonyms are noticed when used for disjoint (but related) concepts consistently. If we save the reader even one moment of wondering "is this 'unsafe as in human-checked' or 'unsafe as in causes UB'?", then the term split is worth it. It doesn't even matter whether the reader consciously registers the distinction, as long as we can bias them towards assuming the correct case from the start.

I understand the position of "terminology doesn't matter that much, context will figure it out," but I don't agree with it. (Maybe it's my formal background coming through.) Terminology exists to help the reader understand faster, and we should especially try to make understanding easier around unsafe (as in human-checked) code.

That's why "jargon" isn't great: you either know the meaning or you don't, and if you don't, it's worse than a more ambiguous but more obvious term.

Oh, and one more thing:

That's not correct. Code in an unsafe block is just as checked as code not in an unsafe block. It "just" gives you the superpowers of 1) dereferencing raw pointers, 2) reading union fields, 3) accessing static mut variables, and 4) calling other unsafe APIs. This gives you the power to break rules upheld mechanically in not-unsafe code, but nothing changes about the safe subset of the language.
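A small sketch of what that means in practice. The `bump` function is a made-up example; the commented-out lines show errors the compiler still reports inside an unsafe block.

```rust
/// Increments the value behind a raw pointer and returns the new value.
///
/// # Safety
/// `p` must be non-null, aligned, and valid for reads and writes.
unsafe fn bump(p: *mut i32) -> i32 {
    unsafe {
        // Dereferencing a raw pointer is only allowed in `unsafe` code.
        *p += 1;
        *p
    }
}

fn main() {
    let mut n = 41;
    // SAFETY: the pointer comes from a live `&mut i32`.
    let result = unsafe { bump(&mut n) };
    println!("{result}");

    // The usual checks still run inside `unsafe`. Neither of these compiles:
    // unsafe {
    //     let s = String::from("hi");
    //     let t = s;
    //     println!("{s}");      // error[E0382]: borrow of moved value
    //     let x: i32 = "hello"; // error[E0308]: mismatched types
    // }
}
```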
