How is locking a Mutex not unsafe?

From https://doc.rust-lang.org/stable/std/sync/struct.Mutex.html

The exact behavior on locking a mutex in the thread which already holds the lock is left unspecified.

This seems like a fancy way of saying the behavior is undefined.

1 Like

The following sentence is:

However, this function will not return on the second call (it might panic or deadlock, for example).

(emphasis mine)
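To make the quoted sentence concrete, here is a minimal sketch of the situation. It deliberately does not call lock twice (that would hang or panic, depending on the platform); instead it uses try_lock, the non-blocking way to see that the lock is already held. The function names are made up for illustration, and the WouldBlock result for a same-thread attempt is what current std implementations return as far as I know, not something the quoted docs spell out.

```rust
use std::sync::{Mutex, TryLockError};

/// Returns `true` if a second, non-blocking lock attempt on the same thread
/// is refused while the first guard is still alive.
fn second_lock_is_refused() -> bool {
    let m = Mutex::new(0_u32);
    let _guard = m.lock().unwrap();
    // Calling `m.lock()` again here would never return: it might deadlock
    // or panic. `try_lock` reports the contention instead of blocking.
    let refused = matches!(m.try_lock(), Err(TryLockError::WouldBlock));
    refused
}

/// Locks, releases, and locks again. This is always fine because the first
/// guard is dropped before the second `lock` call.
fn relock_after_release() -> u32 {
    let m = Mutex::new(1_u32);
    {
        let mut guard = m.lock().unwrap();
        *guard += 1;
    } // guard dropped here, lock released
    let value = *m.lock().unwrap();
    value
}
```

Either way, everything here is safe code: the worst case of getting it wrong is a hang or a panic, not memory corruption.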

4 Likes

The distinction is that it could do anything [1] except undefined behavior.


  1. other than return, as further documented

2 Likes

That still seems like undefined behavior to me. Returning isn't okay, but doing anything else is, it seems.

1 Like

I guess the issue is that I view "undefined" and "unspecified" as synonyms.

They aren't. Furthermore, panicking and deadlocking are safe (they don't lead to memory corruption).

1 Like

The documentation does not say that those are the only two options though. Sending rude messages to your grandma would not contradict the documentation.

2 Likes

Sure, it could do that, so long as it panicked or deadlocked later. However, sending rude messages to your grandma is a specific, well-defined behavior. Undefined behavior means your code could do anything at any time for any reason, or no reason at all. If your code unconditionally tries to reentrantly lock a mutex twice, it is guaranteed to execute everything that came before it exactly as it's supposed to, before doing whatever the unspecified behavior happens to be. If your code unconditionally triggers undefined behavior, then nothing about the execution of your code is guaranteed. In that scenario, it would be perfectly valid for the compiler to delete whatever code you wrote and replace it with one that only sends hate mail to your grandma.

There are plenty of safe ways to trigger unspecified behavior, and by definition, there are no safe ways to trigger undefined behavior. Heck, there are plenty of scenarios where deliberately invoking unspecified behavior is useful, like calling Iterator::next again on an iterator over a channel's incoming values after it has already returned None. There is never a reason to trigger undefined behavior in reachable code.
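A small sketch of the channel case, assuming the iterator meant here is the one returned by Receiver::try_iter (the function name poll_channel_twice is just for illustration). The Iterator docs leave it up to each implementation what next does once it has returned None; this particular iterator simply resumes when a new value shows up.

```rust
use std::sync::mpsc;

/// Polls the same non-blocking channel iterator before and after a send.
fn poll_channel_twice() -> (Option<i32>, Option<i32>) {
    let (tx, rx) = mpsc::channel();
    let mut pending = rx.try_iter();

    // Nothing has been sent yet, so the iterator reports "finished".
    let first = pending.next();

    tx.send(7).unwrap();

    // Same iterator, called again after `None`: it yields the new value.
    let second = pending.next();

    (first, second)
}
```

Calling poll_channel_twice() gives (None, Some(7)). Relying on that is fine for this type, it just isn't something you can assume of iterators in general.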

7 Likes

Indeed, that's the issue. Why do you view them as synonyms? They are different words with different meanings.

True. But doing anything “strange” before that point is reached is not allowed.

Undefined behavior, on the other hand, is allowed to time-travel. A full normative Rust specification doesn't exist yet, but Rust shares the LLVM backend with C++, and in C++ this is very explicitly allowed:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

IOW: if your program triggers undefined behavior, then it may behave randomly not just after that point, but before it, too.

With unspecified behavior it may do different things, but only after that point. Big difference.

5 Likes

Depends on your definition of “reachable”. It's perfectly fine to have an unsafe API which does UB in the, presumably, reachable code (if the documentation explains how to avoid it). In fact there's a function which exists specifically to never be called!

It's still very useful and important, because it's a binding promise to the compiler: yes, that code could be reached, but hey, act as if it couldn't be… I promise that this code path will never be used!
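As a rough sketch of that kind of promise, assuming the function alluded to is std::hint::unreachable_unchecked (a later reply in this thread names it). The suit_name functions are hypothetical and only there to show the shape: an unsafe function whose docs state the precondition, plus a safe wrapper that does the check itself.

```rust
use std::hint::unreachable_unchecked;

/// Maps a suit index to its name.
///
/// # Safety
/// `index` must be in `0..4`. Passing any other value is undefined behavior.
unsafe fn suit_name(index: u8) -> &'static str {
    match index {
        0 => "clubs",
        1 => "diamonds",
        2 => "hearts",
        3 => "spades",
        // SAFETY: the caller promised `index < 4`, so this arm is never hit.
        _ => unsafe { unreachable_unchecked() },
    }
}

/// Safe wrapper: performs the bounds check before relying on the promise.
fn suit_name_checked(index: u8) -> Option<&'static str> {
    if index < 4 {
        // SAFETY: `index < 4` was checked just above.
        Some(unsafe { suit_name(index) })
    } else {
        None
    }
}
```

The UB-causing call is syntactically present in the program, but as long as every caller keeps the documented precondition, it is never actually executed.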

I agree this is essentially undefined behavior. Undefined behavior is often described as "nasal demons". "Rude messages to your grandma" is also a good description.

In a lot of places Rust documentation has this very special notion of "unspecified behavior", or as it is sometimes called "anything goes other than undefined behavior". For all practical purposes this is much closer to what in C++ would be called "undefined" than "unspecified".

It's essentially: undefined behavior except it will not corrupt your RAM, but it may do pretty much any other nasty thing other than corrupting your RAM.

I wish Rust documentation would just call it "memory unsafety" or something like that, rather than "undefined behavior".

Not by C++ terminology. If it said "it may panic, hang, or send a rude message to your grandma" it would be well defined but unspecified behavior. But if it's "I have no idea what will happen except RAM will be fine" then it's not well defined behavior, and it really requires some stretch of semantics to say "oh but it will not corrupt the RAM, so the options are limited, so it's well defined".

I think some of this confusion comes from the fact that LLVM has this concept of "undef" or "poison" (i.e. explicitly invoked undefined behavior), and compiler people wrongly conclude that if undef or poison can't happen then there is no undefined behavior.

The only real way to avoid undefined behavior is to define the behavior (and not in an extremely open ended way).

I think Mutex should just say it will deadlock or panic.

3 Likes

How? That's what C++ calls unspecified behavior:

From what I'm seeing Rust uses the same approach: any implementation needs to pick one kind of behavior (it may panic all the time or send angry letters to grandma every time) and stick to it… and there is no need to document that choice, the only requirement is to be consistent.

No, it's not like that. Calling free on a pointer which was already freed is undefined behavior because we can't predict what will happen. And there is no undef or poison in sight.

But calling lock on a mutex which is already locked doesn't do anything that drastic. Any implementation must pick one behavior and use it consistently. And there are even some limitations on what's allowed! But the exact choice is not documented and can be different in different implementations (including code compiled with different compiler options; those are different implementations).

Rust very explicitly marks certain behaviors as unspecified. E.g. if you have an enum with two variants, foo and bar, then a transmute from Option<Enum> to an integer (of the appropriate size) will always produce something, but it's not specified what exactly it will produce. On purpose. You are not supposed to rely on that.
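A minimal sketch of the enum case. The names Choice, raw_byte and raw_bytes are invented for the example, and it assumes Option<Choice> is one byte wide, which is what current rustc does but is not something I'd treat as promised (transmute refuses to compile if the sizes differ).

```rust
use std::mem::transmute;

#[derive(Clone, Copy)]
enum Choice {
    Foo,
    Bar,
}

/// Reinterprets an `Option<Choice>` as its raw byte.
fn raw_byte(value: Option<Choice>) -> u8 {
    // Only compiles if both types have the same size; that `Option<Choice>`
    // fits in a single byte is an assumption about the current compiler.
    unsafe { transmute::<Option<Choice>, u8>(value) }
}

/// Raw bytes for `None`, `Some(Foo)` and `Some(Bar)`, in that order.
fn raw_bytes() -> [u8; 3] {
    [
        raw_byte(None),
        raw_byte(Some(Choice::Foo)),
        raw_byte(Some(Choice::Bar)),
    ]
}
```

You always get three bytes back, and they have to differ from each other, but which numbers they are is the compiler's business and may change between versions.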

How is Mutex different?

3 Likes

Undefined Behaviour is a very technical term, which Rust has borrowed from C++. John Regehr has a good series of articles (3 parts) about that concept, and also take a look at a series by Chris Lattner (of LLVM fame).

The TL;DR is that unspecified behaviour is anything that a pure Python program can do. This includes basically anything which you could care about, including sending angry mail and erasing your hard drive. But importantly, some weird behaviours are excluded. You can't prove that all numbers are even. You can't time travel. You can't have something both true and false simultaneously. As you see, the lack of UB is an extremely useful property, even for an arbitrarily broken program.

Yes, you may have horrible bugs, but your program's behaviour is fundamentally reproducible, and thus debuggable. At least in principle, you can capture all of its interactions with the operating system (all syscalls, all I/O), rerun the program with the same environment simulated, and get the same results. With UB, that's not true. Simply recompiling your program may give a program with absolutely different behaviour.

Integer overflow in C++ doesn't corrupt your RAM directly, but it does make your program undefined (which may corrupt RAM, or may reduce all your code to a noop, or make it erase your hard drive when the planets are aligned). Reading uninitialized memory also doesn't corrupt memory, but it does turn your program into garbage. Same with mutating through a & reference.
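For contrast, a short sketch of the Rust side, where integer overflow in safe code is never UB: plain + either panics (debug builds) or wraps (release builds by default), and the explicit methods below let you pick the outcome. The function name overflow_outcomes is just for illustration.

```rust
/// Shows three explicit ways to handle `i32::MAX + 1`.
fn overflow_outcomes() -> (Option<i32>, i32, (i32, bool)) {
    let max = i32::MAX;
    (
        max.checked_add(1),     // None: overflow is reported, no value
        max.wrapping_add(1),    // wraps around to i32::MIN
        max.overflowing_add(1), // wrapped value plus an overflow flag
    )
}
```

Here overflow_outcomes() returns (None, i32::MIN, (i32::MIN, true)) on every build, because each method has a single defined result.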

6 Likes

That's what I mean by "stretch of semantics". In reality, every time C++ specs say "unspecified behavior" it explicitly lists the things that can happen and says the implementation is to pick one of those. It's never "anything in the world except X can happen".

C++ has no such requirement that unspecified behavior be consistent. It does not have to be consistent. It just has to be one of the options listed. For example, the order of evaluation of function arguments in C++ is unspecified. It does not have to be consistent between calls.

The difference is that in the case of enum casting there is a list of things that can happen, but in the case of Mutex there is only a list of things that cannot happen.

1 Like

This seems very similar to the issue a few weeks back about (iirc) hash containers where the user-provided Hash didn't uphold the requirements. That ended up with the unspecified behavior being documented to be bounded to the container and its user-provided contents.

5 Likes

The problem is that Mutex delegates to the system mutex, and we just can't provide any guarantees about its behaviour. All kinds of operating systems exist.

Well… that's more than can be said about the C++ mutex. It only says that you can call the mutex's lock if you don't own it. What happens if you try to call it while holding it remains a complete mystery.

Good point. I guess the right words would be “it should be possible to document the behavior” (but there is no need to actually document it).

Sure, but C++ is not claiming that locking a mutex twice is well defined behavior. It's undefined behavior in C++.

If that's true, then it should be unsafe on those operating systems where locking a mutex twice can burn down your house potentially or whatever. I don't think it's really true, at least in some abstract model we assume of the OS.

1 Like

See also: Using std::collections and unsafe: anything can happen - documentation - Rust Internals

4 Likes

That's not the definition of reachable I was going for. If unreachable_unchecked is reached, it's undefined behavior. Whenever an action that could potentially cause undefined behavior is done, it acts as a promise to the compiler that the thing that causes UB is never actually reached. There are different ways of ensuring this (i.e., documenting an unsafe function's prerequisites, or doing the necessary checks yourself), but if your program ever does have reachable undefined behavior, it is by definition malformed and could do anything.

As Rust defines it, the key difference between unspecified and undefined behavior is that unspecified behavior is predictable. A double lock of a mutex emailing your grandma is definitely unexpected, but your code will still do exactly what it's supposed to do up until that point, and if you were to run it again, it would still email your grandma at the same point. With undefined behavior, there is no such guarantee. Additionally, there are plenty of consequences of invoking undefined behavior other than memory corruption. Even on wasm, where 0 is a readable and writable address, dereferencing a null pointer in Rust is undefined behavior, and can thus cause your code to unpredictably do something unexpected, because rustc assumes that every branch leading to that dereference can't be reached.

If your code invokes undefined behavior, it means anything can happen at any time for any reason or no reason at all. If your code invokes unspecified behavior, that means some specific thing chosen by the implementation will happen, and its effects are guaranteed to be specific and predictable, but not portable.

6 Likes