Learn Rust the Dangerous Way - the unsafe-first tutorial

I'd just like to say this is very well-written and +1 the sentiment that this fills a very real gap in the landscape of Rust education/documentation.

Aside from the #[allow(...)] thing that others mentioned already, the only potential improvement that jumped out to me is this: I'm of the opinion that we should avoid saying things like "this unsafe code is safe, but that unsafe code is unsafe" because it makes "unsafe" into a very murky term, implicitly switching between two or more meanings multiple times in a sentence, when it really needs to be a crystal clear term. Admittedly, this is not a settled issue with a clear community consensus, and for all I know I might be in the minority here, but the nature of unsafety is so central to this book that it's probably worth thinking about whether sentences like "Here's an aside on when unsafe is safe" should use some other word like "correct".


EDIT: spotted one other small thing in part 4:

Because the union is defined in the same file as advance, putting pub on the accessors doesn't actually do anything ... For the purposes of this tutorial, I'm keeping everything in one file.

I believe you can declare a mod inside that file, and then the rest of the file really would be forced to use the pub accessors. I'm not sure if that's a net pedagogical improvement, but it's probably worth considering.
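To make the inline-module idea concrete, here is a minimal sketch. The names `storage`, `Bits`, `from_float`, `as_raw` and `as_float` are made up for illustration and are not the tutorial's actual union or accessors. It assumes only that union fields, like struct fields, are private to their module unless marked pub.

```rust
// Hypothetical names throughout; this is not the tutorial's code.
mod storage {
    /// Two views over the same 8 bytes. The fields have no `pub`, so they
    /// are private to this module even though everything is in one file.
    pub union Bits {
        float: f64,
        raw: u64,
    }

    impl Bits {
        pub fn from_float(value: f64) -> Self {
            Bits { float: value }
        }

        /// Reads the bytes as an integer.
        pub fn as_raw(&self) -> u64 {
            // SAFETY: all 8 bytes are initialized and every bit pattern
            // is a valid u64.
            unsafe { self.raw }
        }

        /// Reads the bytes as a float.
        pub fn as_float(&self) -> f64 {
            // SAFETY: all 8 bytes are initialized and every bit pattern
            // is a valid f64.
            unsafe { self.float }
        }
    }
}

use storage::Bits;

fn main() {
    let bits = Bits::from_float(1.0);
    // Outside the module, direct field access is rejected because the
    // field is private, so the accessors are the only way in:
    // let raw = unsafe { bits.raw };
    println!("{:#018x}", bits.as_raw());
    println!("{}", bits.as_float());
}
```

With the module boundary in place, the pub on the accessors does real work: the rest of the file can no longer reach the fields directly.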

Moderator note: I removed a tangent about the nonstandard_style lint's name. Please don't hijack the thread, especially for bikeshedding.

4 Likes

I like the emphasis on 'whatever you could do in C, Rust can do it too', it's a much-needed approach.

However, it may be a good idea to change the order of lessons around a bit. The way it looks right now, I'd expect the typical C programmer to look at the first few diffs, then conclude "Rust is way too verbose/complicated, I already know C and can handle memory just fine, because I'm a Good Programmer™", which means that most of them are never going to get to the good parts.

Also, a quick primer on Rust's variable declaration syntax may be a good idea. For someone who only ever used C, asm and Bash, something like : [f64; 3] may look like utter gibberish, and type inference may be mistaken for dynamic typing.
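For what it's worth, such a primer could be quite short. The sketch below is only a suggestion of what it might cover; the function name `add` and the variable names are invented for the example.

```rust
// `name: Type` puts the type after the name. `[f64; 3]` reads as
// "array of three f64", roughly what C spells `double a[3]`.
fn add(a: [f64; 3], b: [f64; 3]) -> [f64; 3] {
    // Bindings are immutable unless declared `mut`.
    // No annotation here: the compiler infers `[f64; 3]` at compile time
    // because `out` is the value this function returns.
    let mut out = [0.0; 3];
    for i in 0..3 {
        out[i] = a[i] + b[i];
    }
    out
}

fn main() {
    // Explicit annotation.
    let position: [f64; 3] = [1.0, 2.0, 3.0];
    // Inferred, but just as static: this is `[f64; 3]` and stays that way.
    let velocity = [0.5, 0.25, 0.125];
    // The next line would be rejected at compile time with a type mismatch,
    // which is the difference from dynamic typing:
    // let wrong: [f64; 3] = [1, 2, 3];
    let moved = add(position, velocity);
    println!("{:?}", moved);
}
```

The point for a C reader is that inference only saves typing; every binding still has exactly one type, fixed at compile time.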

7 Likes

The standard language a lot of us are trying to standardize around is "sound" versus "unsound" for functions. A block is unsound if it does something illegal, and sound if it doesn't break any rules.

For a safe function, in order to be sound, it must be sound for all possible inputs (and state, if relevant) producible in safe code. For an unsafe function, it must be sound over all documented supported inputs (and state). Unsafe functions also get the distinction of having sound and unsound invocations.

And then the final step would be "unsafe but unmarked and private" which is a safe function that is sound for some inputs but not all, but can still be fine if encapsulated in code which is always sound. (But many people would tell you to mark it unsafe for greater explicitness anyway.)
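A small sketch of the three kinds of function described above, using unchecked slice indexing as the unsafe operation. All function names here are hypothetical; only `get_unchecked` is a real standard library method.

```rust
/// Unsafe function: sound only over its documented inputs.
///
/// # Safety
/// `index` must be less than `data.len()`.
unsafe fn read_unchecked(data: &[f64], index: usize) -> f64 {
    // SAFETY: the caller promises that `index` is in bounds.
    unsafe { *data.get_unchecked(index) }
}

/// Safe function: has to be sound for every input safe code can produce,
/// so it checks the bound itself before making a sound invocation.
fn read_or_zero(data: &[f64], index: usize) -> f64 {
    if index < data.len() {
        // SAFETY: `index < data.len()` was checked just above.
        unsafe { read_unchecked(data, index) }
    } else {
        0.0
    }
}

/// "Unsafe but unmarked and private": a safe signature that is only sound
/// for non-empty slices. It is fine as long as every caller in this file
/// upholds that, but many people would mark it `unsafe` anyway.
fn first_unmarked(data: &[f64]) -> f64 {
    // Unsound invocation if `data` is empty.
    unsafe { read_unchecked(data, 0) }
}

fn main() {
    let data = [1.0, 2.0, 3.0];
    println!("{}", read_or_zero(&data, 1));
    println!("{}", read_or_zero(&data, 99));
    // Caller upholds the hidden requirement: `data` is not empty.
    println!("{}", first_unmarked(&data));
}
```

Calling `read_unchecked(&data, 99)` would compile inside an unsafe block but would be an unsound invocation, which is exactly the case the "unsound" label is meant to name.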

8 Likes

"unsafe", "unsound", that is the same to me. How confusing can we make this?

At the end of the day checks are done by the compiler, or they are not done. "checked"/"unchecked".

1 Like

Yeah, I agree, actually. I went back and forth on phrasing there. To some degree, I like the juxtaposition of things like "making safe out of unsafe," but I also see that it could be confusing.

I'm a fan of the sound/unsound distinction that @CAD97 raised, and I don't think it's excessively jargony. I'll play with the wording.

I think that's a real risk, but I'm also not sure that I can reach someone who's approaching the tutorial looking for reasons to stop reading it. I could be more forceful in Part 1 pointing out that the code is going to get worse before it gets better, I suppose.

I agree that the words are uncomfortably similar. Unchecked/checked isn't the distinction we're trying to make, though -- it's between these cases:

  1. This bit of code is written using safe parts (in the Rust sense) and is sound. (Which should be true of all safe code, but sometimes safe code has bugs.)
  2. This bit of code is written using unsafe parts and is unsound (which is usually what happens the first time I try to write something using unsafe).
  3. This bit of code is written using unsafe parts and is nevertheless sound.

Both 2 and 3 are "unchecked" in the sense that the compiler doesn't have our backs.

I think the safe/sound distinction appeals to me because of the English expression "safe and sound" meaning that something is comfortably secure (and probably also warm and cozy). With Rust, achieving "safe and sound" is relatively easy.

...now I feel like I want a Rust sweater.

10 Likes

Also, "soundness" is a term of art from mathematics.

1 Like

I'm not sure if you're offering that as an argument in favor or against the use of the term; in general, I've specifically tried to avoid using math jargon in these articles. The existence of a non-math analogy for soundness makes me more inclined to use it, though.

When used with a specific meaning in parts of logic and proof theory, "sound" is not mathematical jargon, but a mathematical term, just like "function", "real number", "complex number", "group", "field", "self-adjoint".

Unfortunately, I am not experienced enough in that part of logic to see if the term is appropriate here.

I was puzzled for a moment by "mathematical jargon", but then I figured out that "pathological", "elegant", "folklore", etc., are examples of mathematical jargon.

1 Like

Hm. I didn't intend jargon as a value judgment, but it sounds like you heard it that way. By "jargon," I mean terms that are opaque or difficult to understand for people outside of a particular field or profession. I'm explicitly trying to avoid using math terminology that isn't commonly used in programming, so that the tutorials remain accessible to people (like me) without formal math training. And so "function" is okay but I'm not using terms like "field," "ring," "isomorphic," or (say) terms from category theory.

Hope that makes sense. I'm not attacking math, just aiming for a target audience who may not be comfortable with it.

Simple reference on the application in logic: http://www.philosophy-index.com/logic/terms/soundness.php

The term sound is most frequently used to describe whether or not an argument is valid and has true premises, thereby guaranteeing the truth of its conclusion.

So our application to soundness of safe code at least aligns with the definition in logic, in that it's true (here the predicate is the lack of undefined behavior) over all possible inputs.

But I also agree that the colloquial definition is what matters for coining a term. And here it applies just as well, which lets the same word double as the logical term.

1. in good condition; not damaged, injured, or diseased.
2. based on reason, sense, or judgment.

1 Like

"safe", "sound" whatever. Nobody outside of mathematics knows if that is some technically defined meaning or just casual mathematician slang.

For the same reason can we avoid words like "monad", "monoid", "functor" when talking about Rust?

At the end of the day the source code is mechanically checked for memory aliasing problems or it is not. If not, a human has to check it.

So "unsafe" means not mechanically checked by the compiler. If I understand Rust correctly.

That does not mean it is actually unsafe. It means you have to trust a human rather than the compiler.

4 Likes

It's that specific difference that we're trying to capture in a safe/sound distinction. Even if casual usage of the terms is casual, it definitely helps if written material consistently uses two different terms for the different meanings.

Even pure synonyms are noticed when used for disjoint (but related) concepts consistently. If we save even one moment of thinking about whether this is "unsafe as in human-checked" or "unsafe as in causes UB", then the term split is worth it. It doesn't even matter if it's a conscious acknowledgement in the reader if we can bias them towards assuming the correct case from the start.

I understand the position of "terminology doesn't matter that much, context will figure it out," but I don't agree with it. (Maybe it's my formal background coming through.) Terminology exists to help the reader understand faster, and we should especially try to make understanding easier around unsafe (as in human-checked) code.

That's why "jargon" isn't great: you either know the meaning or you don't, and if you don't, it's worse than a more ambiguous but more obvious term.

Oh, and one more thing:

That's not correct. Code in an unsafe block is just as checked as code not in an unsafe block. It "just" gives you a handful of superpowers, including 1) dereferencing raw pointers, 2) using union fields, and 3) calling other unsafe APIs. This gives you the power to break rules upheld mechanically in not-unsafe code, but nothing changes about the safe subset of the language.
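A quick way to see this for yourself, as a rough sketch (the function name `read_first_then_push` is invented for the example): the commented-out lines are ordinary safe code, and the compiler rejects them inside the unsafe block exactly as it would outside.

```rust
fn read_first_then_push() -> (i32, usize) {
    let mut numbers = vec![10, 20, 30];
    let ptr: *const i32 = numbers.as_ptr();

    let first = unsafe {
        // Type checking and borrow checking still run in here. If these
        // three lines were uncommented, compilation would fail because
        // `numbers` cannot be borrowed as mutable while `borrowed` is live:
        // let borrowed = &numbers[0];
        // numbers.push(40);
        // println!("{borrowed}");

        // The only thing in this block that needs `unsafe` is the raw
        // pointer dereference.
        // SAFETY: `numbers` is non-empty and has not been modified or
        // reallocated since `ptr` was taken.
        *ptr
    };

    numbers.push(40);
    (first, numbers.len())
}

fn main() {
    let (first, len) = read_first_then_push();
    println!("first = {first}, len = {len}");
}
```

What the compiler cannot check is the SAFETY comment: whether `ptr` is still valid at the point of the dereference is the part a human has to get right.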

6 Likes

I feel that having a distinction between "this requires an unsafe block" and "this is undefined behaviour" is quite important, and unsafe vs unsound has served this purpose quite nicely for me.

10 Likes

I'm not a native English speaker, and "sound" was part of my non-technical vocabulary meaning exactly what you want to mean by it. So I think its non-technical meaning is specific enough to use it in Rust tutorials trying to avoid technical jargon from math and logic.

7 Likes

Me neither: "folklore" or "pathological" are examples of mathematical jargon, without value judgment :)


Update 2019-12-24. Hoping to clarify a bit: "corner case" is jargon, "abstract class" is a term.

Unsoundness is what happens when unsafety goes wrong.

7 Likes

I think everyone here is making this more complicated than it has to be. Undefined behavior (UB) can be expressed simply by stating that code behaves undefined and defined behavior can simply be referred to as code behaving defined. There's no reason to say unsafe (that means you're taking a risk, not that it always goes wrong) or unsound (a term I never hear except when talking about compilers).

Safe code always behaves defined, even in an unsafe environment. Unsafe code behaves defined, as long as its constraints aren't violated, otherwise it behaves undefined.

Often I will use the phrases "undefined behaviour" and "not undefined behaviour", because I feel that the word "defined" is awkward, because it clashes with a different use, namely the behaviour defined by the documentation of the library.

1 Like

That's where the terms specified and unspecified behavior come in.