Question of terminology: unsafe vs undefined behaviour

"unsafe" sounds worse than "undefined behaviour".
For example: if I had to buy an "unsafe" dog or an "undefined behaviour" dog, I'd probably go with the latter.

However, in Rust, "unsafe" is much "better" than "undefined behaviour" right?

In particular, every case of "undefined behaviour" sounds like: "if you do this, we can't make any guarantee whatsoever about what your program does"

On the other hand, "unsafe" "merely" means "yeah, this looks dangerous, but trust me (the programmer), this is okay"

So it seems to imply that

  • competent programmer + "unsafe" = everything is still fine

  • regardless of skill + "undefined behaviour" = all hell breaks loose

Is this correct?

In particular, "unsafe" blocks often seem to mean "hey_compiler_trust_me_this_is_okay" blocks

"Unsafe" may not be the best name, but all the alternatives that match better are way too long for a keyword.

Whenever Rust refers to "Safe" it's talking exclusively about memory safety. Unsafe blocks are simply areas of code where the compiler doesn't do any lifetime checks and you the programmer are responsible for making sure all those invariants are respected.

This explains a lot -- i.e. why this tutorial I'm reading has unsafe blocks for anything that passes a pointer to a C api.

I'd take the unsafe dog. The unsafe dog could bite you if you handle it incorrectly, but it won't if you don't. The undefined-behavior dog, on the other hand, is an unsafe dog that has already been treated the wrong way, and it will definitely bite you, no matter what you do afterwards. You just don't know where and when.

Why?

If the dog has defined, dog-like behavior, you can ensure it is safe. Stick a muzzle on it, for example.

If the dog has undefined behavior you don't know what to do to make it safe. It could explode or do any unexpected thing.

I'd add that Rust also ensures thread safety.

That isn't quite true. All checks that are performed in safe code are still performed in unsafe code. Unsafe blocks allow access to certain features that the Rust compiler can't check. The programmer must ensure the safety of those features, and in some cases they can be used to bypass the usual checks the compiler performs. Specifically, the five things you can do in unsafe code that you can't do in safe code are:

  • Dereference a raw pointer
  • Call an unsafe function
  • Access or modify a static mut variable
  • Implement an unsafe trait
  • Read a field of a union
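
To make the list concrete, here is a minimal sketch that touches each of the five operations once. All names (`COUNTER`, `read_through`, `AlwaysZeroable`, `IntOrFloat`, `five_things`) are made up for illustration; only the language features themselves come from the list above.

```rust
// Hypothetical names throughout; only the language features are real.
static mut COUNTER: u32 = 0;

// An unsafe function: the caller must guarantee `ptr` is valid for reads.
unsafe fn read_through(ptr: *const i32) -> i32 {
    // Dereference a raw pointer.
    unsafe { *ptr }
}

// An unsafe trait: implementors promise something the compiler cannot check.
unsafe trait AlwaysZeroable {}

// Implement an unsafe trait.
unsafe impl AlwaysZeroable for u32 {}

union IntOrFloat {
    int: u32,
    float: f32,
}

fn five_things() -> (i32, u32, u32) {
    let x = 7;
    // Call an unsafe function. The pointer comes from a live reference, so it is valid.
    let a = unsafe { read_through(&x) };

    // Access a static mut variable. Sound only if no other thread touches it.
    let b = unsafe {
        COUNTER = COUNTER + 1;
        COUNTER
    };

    // Read a union field. Both fields are 4 bytes and any bit pattern is a valid u32.
    let u = IntOrFloat { float: 1.0 };
    let c = unsafe { u.int };

    (a, b, c)
}
```

Everything outside the `unsafe` blocks and items is still checked exactly as in ordinary safe code.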

Allow me to refer back to my quote of the week from a few weeks back, in a slightly modified form:

Undefined behaviour is what happens when unsafety goes wrong.

Of course my quote was actually about the word unsoundness, but they mean the same thing — one is just something you are, while the other is something you have. E.g. something is unsound if it has undefined behaviour.

So the answer to your question is yes, because undefined behaviour is when things have already gone wrong. Basically there are some things you just shouldn't do, such as reading an array out of bounds or dereferencing a null pointer. Doing one of these things invokes undefined behaviour.

Unsafe on the other hand just refers to the ability to do a few extra things, and these things are unsafe because the compiler can't check if you invoke undefined behaviour with them. Note that this doesn't mean you're allowed to invoke undefined behaviour — if you break your promise to your compiler, you will be punished with undefined behaviour.

Regarding the turning off lifetime checks, that's a common misconception. References, lifetimes and borrowchecking are completely unaffected by unsafe blocks — it's just that you get the ability to dereference raw pointers, which are a type that has no lifetimes on it in the first place.
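
A small sketch of that point, with a hypothetical function name (`bump_via_raw`): the only thing the `unsafe` block unlocks here is the raw-pointer dereference, and the commented-out snippet would still be rejected by the borrow checker even though it sits inside `unsafe`.

```rust
// Hypothetical helper for illustration.
fn bump_via_raw(value: &mut i32) -> i32 {
    // Creating a raw pointer is safe; the pointer type carries no lifetime.
    let raw: *mut i32 = value;
    unsafe {
        // Dereferencing is the extra power. It is sound here because
        // `raw` was just derived from a live, exclusive reference.
        *raw += 1;
        *raw
    }
}

// The borrow checker is not switched off inside `unsafe`.
// This would still fail to compile:
//
//     let mut v = vec![1, 2, 3];
//     let first = &v[0];
//     unsafe { v.push(4); } // error[E0502]: cannot borrow `v` as mutable
//     println!("{first}");
```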

The terminology is less than ideal because these two terms have completely different origins and constraints. In particular, Rust didn't get to choose "undefined behavior".

unsafe is a Rust keyword. As a keyword, which some code needs to use several thousand times, it can't be super long. And in practice, "memory safety" is the only kind of safety that Rust-the-language understands, so "unsafe" always means that kind of safety. Thread safety is a special case of memory safety, and all the other kinds of "safety" I'm aware of are impossible to define at the level of a programming language.

"undefined behavior" comes from C/C++ standardese. Consider this (in?)famous, um... definition?

3 Terms and definitions [intro.defs]
...
3.27 [defns.undefined]
undefined behavior
behavior for which this document imposes no requirements

Not exactly unambiguous to the non-expert.

AFAIK this term is only a popular, semi-standard cross-language term because a) C/C++ have been the only important languages with optimizing compilers for a long time, and b) you cannot use C/C++ in practice without a lot of language-lawyering. Unfortunately, misconceptions about precisely what UB means in theory or in practice and what a healthy attitude around UB looks like are about as widespread as the term itself (I recommend https://www.youtube.com/watch?v=yG1OZ69H_-o on that subject).

Whenever UCG gets to the point that we begin seriously thinking about enshrining their UB rules in official Rust RFCs/the Reference/an actual spec, we should bikeshed the hell out of this "UB" term. But it's still the best term to use for this concept in the short term, because it's the most widely understood term that exists today, and Rust is clearly going to need that concept no matter how the UCG works out.

The Nomicon is best for now; change around the text and UB is code that "allows any of the following things."

I tend to think of unsafe as a separate language. What you want is analogies.

Safe is a dragon, while unsafe is the dog.
Safe is a crab, while unsafe is a hedgehog.
You can feed the crab anything and it will never die. Some food may kill the hedgehog.

There is then the term unsoundness.
The hedgehog creates mythical food that can kill the crab.

The whole model of Rust safety relies on it being impossible to trigger Undefined Behavior from within non-unsafe Rust, which is the reason why unsafe Rust is so called: it is the part of the language where Undefined Behavior may be "created".

  • This property is key in that if, for instance, a Rust program segfaults, then some unsafe {} block or unsafe impl is responsible for it, and that's what should be audited.

TL;DR: if something has Undefined Behavior, it means it was already unsafe to begin with.

But one can write unsafe code carefully and in a sound manner. For instance, the std lib itself does so.
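
As a rough illustration of "unsafe but sound" (a made-up function, not taken from the standard library): the check before the `unsafe` block is what makes the unchecked accesses fine for every possible input, so callers in safe code cannot trigger UB through it.

```rust
/// Returns the first and last elements, or `None` for an empty slice.
/// Hypothetical example function.
fn first_and_last(items: &[u32]) -> Option<(u32, u32)> {
    if items.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so indices 0 and len - 1 are in bounds.
    unsafe {
        Some((
            *items.get_unchecked(0),
            *items.get_unchecked(items.len() - 1),
        ))
    }
}
```

Remove the `is_empty` check and the same function becomes unsound: calling it with an empty slice would be undefined behaviour.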

One thing I think that it is helpful to remember is that if there is undefined behavior, that UB can only be caused by unsafe blocks, FFI, or unsafe impls of unsafe traits; however, the problem could be in non-unsafe code in the sense that things in the same module that are private that should be upholding invariants that the unsafe code relies upon are not correctly doing their job. So, I personally would say the correct thing to say is:

This property is key in that if, for instance, a Rust program segfaults, then some code in a module that contains an unsafe {} block or unsafe impl, including the unsafe block or impl itself, is responsible for it, and that's what should be audited. "Audited" in this context should mean:

  • Each unsafe block or impl has documented the invariants and contractual requirements that must be upheld to call the block without UB
    • Each module containing unsafe impls and/or blocks is audited to ensure ALL of the invariants and contractual requirements the unsafe code requires are upheld under all circumstances under which it is possible to invoke the unsafe code or impl. ALSO, all of the invariants/contractual requirements spelled out actually do make sense and can be manually proven to disallow UB.
  • Each unsafe block or impl has documented the invariants that it must uphold after leaving the block (or even during the lifetime of the block) to ensure it does not create UB
    • Each unsafe block/impl can be manually audited to see that it in fact DOES uphold the necessary invariants & contracts AND that the invariants/contracts as specified actually are sound (i.e. do not permit UB).
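
A sketch of why the audit has to cover the whole module and not just the `unsafe` block. The type `SmallBuf` and its methods are hypothetical. The `unsafe` block in `as_slice` is only sound because `push`, which is entirely safe code, never lets `len` exceed the array size, and because the private field keeps code outside the module from breaking that.

```rust
mod small_buf {
    /// Hypothetical fixed-capacity buffer.
    pub struct SmallBuf {
        data: [u8; 4],
        // Invariant: len <= data.len(). Private, so only this module can break it.
        len: usize,
    }

    impl SmallBuf {
        pub fn new() -> Self {
            SmallBuf { data: [0; 4], len: 0 }
        }

        /// Safe code, but part of the audit: it must maintain the invariant.
        /// Returns false when the buffer is full.
        pub fn push(&mut self, byte: u8) -> bool {
            if self.len == self.data.len() {
                return false;
            }
            self.data[self.len] = byte;
            self.len += 1;
            true
        }

        pub fn as_slice(&self) -> &[u8] {
            // SAFETY: relies on len <= data.len(), which every function
            // in this module is required to uphold.
            unsafe { self.data.get_unchecked(..self.len) }
        }
    }
}
```

If `push` forgot its bounds check on `len`, the bug would be in safe code, yet the undefined behaviour would surface in `as_slice`.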

Niko once called that unsafe boundary the Tootsie Pop model. Also see the followup post.

That's exactly what they are supposed to mean. Later, if you suddenly experience a memory leak, you know that it's because you did something unsafe earlier.

Memory leaks are not considered undefined behaviour :) E.g. see mem::forget.
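
A minimal sketch of that, with hypothetical names (`Noisy`, `leak_in_safe_code`): `std::mem::forget` is a safe function, so a value's destructor can be skipped, and its heap allocation leaked, without writing `unsafe` anywhere.

```rust
use std::cell::Cell;

// Hypothetical type that counts how many times its destructor runs.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
    _payload: Vec<u8>, // heap allocation that is leaked when the value is forgotten
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn leak_in_safe_code() -> u32 {
    let drops = Cell::new(0);

    // Normal drop: the destructor runs and the Vec is freed.
    drop(Noisy { drops: &drops, _payload: vec![0; 16] });

    // Forgotten: the destructor never runs and the Vec's buffer is leaked.
    // No `unsafe` is needed for this.
    std::mem::forget(Noisy { drops: &drops, _payload: vec![0; 16] });

    // Number of destructor runs observed.
    drops.get()
}
```

Two values were created, but the counter only ever sees one destructor run.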

So, is there anything more final on these points now, or is this something still being worked through in the UCG-WG? Is there anywhere where the latest status as to what constitutes the "unsafe boundary" is defined? For some reason, I thought it was the "module", but (re)reading the above two posts makes me think it isn't as clear-cut as that (at least with respect to optimizations).

I'm not sure of the current status for UCG, but searching his blog did turn up a few more posts that mention the Tootsie Pop model: