How is locking a Mutex not unsafe?

Regardless of the precise definition of "undefined behavior", I think there is clearly a confusion in the docs of a different sort:

  • "Undefined behavior" talks about some language constructs which have ill-defined behavior, i.e. it's about certain invalid things in the source code or certain invalid operations in the Rust abstract machine.

  • What distinguishes Rust is that it promises that as long as you don't use undefined behavior, you are guaranteed certain properties (local variables you aren't mutably sharing won't get modified, etc). These properties are typically called something like "memory safety".

  • Therefore, something like HashMap should not be saying "function will not result in undefined behavior". That's confusing causes and effects. Using undefined behavior is the cause. The effect is memory unsafety.

Users don't even care what language constructs std source code uses. It's using magic features anyway, might as well be written in another language. So what HashMap should be promising is that its functions still guarantee memory safety even if your Eq is broken, rather than talking about not using undefined behavior inside HashMap implementation.
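To make that promise concrete, here is a minimal sketch of what "broken Eq" could look like from the user's side. `BrokenKey` and `broken_eq_demo` are hypothetical names made up for illustration, and the exact length and lookup result are unspecified by the docs; the only thing relied on is that nothing here needs `unsafe` and the map stays memory safe.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type whose `Eq` is deliberately broken:
/// no value ever compares equal, not even to itself.
struct BrokenKey(u32);

impl PartialEq for BrokenKey {
    fn eq(&self, _other: &Self) -> bool {
        false // violates reflexivity, which `Eq` requires
    }
}

impl Eq for BrokenKey {}

impl Hash for BrokenKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.hash(state);
    }
}

/// Inserts the "same" key twice and looks it up again.
/// Returns the map length and whether the lookup found anything.
fn broken_eq_demo() -> (usize, bool) {
    let mut map = HashMap::new();
    map.insert(BrokenKey(1), "first");
    map.insert(BrokenKey(1), "second");
    let found = map.get(&BrokenKey(1)).is_some();
    (map.len(), found)
}

fn main() {
    let (len, found) = broken_eq_demo();
    // The exact values are unspecified; only the absence of UB is promised.
    println!("len = {len}, lookup found = {found}");
}
```

The results may be surprising (duplicate entries, failed lookups), but that is a logic error, not memory unsafety.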

Distinguishes it from what? It's the exact same guarantee C or C++ gives.

This would immediately prompt questions from folks like me (with a background in C++): why not just say that it doesn't trigger undefined behavior? Does it mean Rust uses some strange and unusual definition of UB? Or maybe it guarantees some safety even in the presence of UB? Or why the heck is something that looks like “no UB”, smells like “no UB” and for all intents and purposes says “no UB” not called “no UB”?

It's the perpetual question of whether you want your documentation to be easily understandable by laymen or by “trained professionals”.

I think Rust developers made the right choice: for decades C and C++ developers (and to a lesser degree developers who used other non-managed languages like Ada or Object Pascal) used “no UB” as shorthand for “a program which has bounded, somewhat predictable behavior”, while “this is UB” was used as shorthand for “that code shouldn't exist, we refuse to even think about how it should perform, just go and fix it”.

You may like it or dislike it, but that's how it's used. And while there may be some people who don't know what “undefined behavior” means yet do know what “memory safety” means, is that such a large group? Is it worth optimizing the documentation for this somewhat small group?

Perhaps that wasn't the best example. There are other guarantees that C++ doesn't give. For example: there are no dangling references.

References in Rust and C++ are a bit different, but creating a dangling one is undefined behavior in both languages.

edit: fixed borked quote

That's what the C++ developer has to guarantee, not the compiler. Similarly, the Rust developer must guarantee that there are no simultaneous &T and &mut T references pointing to the same object.

Sure, in Rust you can only trigger UB from within an unsafe code block (soundness pledge), but otherwise there is absolutely no difference: undefined behavior is something the developer has to avoid, somehow, not something that you can reason about.

Is it really? I thought this is valid C++ code, as long as you never dereference your dangling reference:

  #include <vector>
  int main() {
    std::vector<int> v(100);
    int &a = v[0];
    v.resize(1000);  // may reallocate; `a` then dangles but is never read
  }

True. Rust has fewer undefined behaviors, but they are stricter. In C++ it's UB to access a dangling reference; in Rust it's UB to create one.

This is related to a difference in lifetimes: in C++ lifetimes are lexical, which means “normal” code often creates dangling references and pointers (which are then just never accessed). In Rust lifetimes are non-lexical, which makes it feasible to declare even the creation of a live dangling reference undefined behavior.
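For comparison, a rough Rust counterpart of the C++ vector snippet above might look like the sketch below. The function name is made up for illustration. As I understand it, safe Rust accepts this only because the first borrow is no longer used once `resize` runs; keeping it alive across the call is rejected at compile time.

```rust
/// Reads the first element before and after a resize.
fn first_before_and_after_resize() -> (i32, i32) {
    let mut v = vec![7; 100];

    let a = &v[0];
    let before = *a; // last use of `a`, so its borrow ends here

    v.resize(1000, 0); // may reallocate the buffer

    // println!("{a}"); // uncommenting this is rejected (error E0502)

    let a = &v[0]; // re-borrow from the (possibly moved) buffer
    (before, *a)
}

fn main() {
    let (before, after) = first_before_and_after_resize();
    println!("{before} {after}");
}
```

So in safe code the dangling reference never comes into existence; you would need unsafe code (raw pointers, `transmute` and the like) to create one.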

The rules are, indeed, subtly different, but the metarules (rules about the rules) are the same: if your program triggers undefined behavior then you have to go and fix it; if it merely triggers unspecified or implementation-defined behavior then you can still meaningfully reason about what it would or wouldn't do.

Yep, my bad, dangling references are legal to create in C++, and in some limited circumstances they're even legal to use, although it's pretty much always a bad idea. Still, if you do use that dangling reference and trigger UB, anything can happen at any time for any reason. Again, contrast this to unspecified behavior, where it will behave in a manner you can predict and reason about.

That sounds reasonable. I suggest you make a PR to change the docs.

Because the C++ terminology is terrible. The more we use meaningful terms instead of their word soup and acronym salad, the better. Trained professionals should know what "memory safety" means either way, while for everyone else at least there is a clearly distinct and easily googlable term, rather than several almost but not quite the same words.

Are you seriously arguing that the people who don't know what Undefined Behaviour means are somehow a small group? Hint: all the people writing JavaScript, Python and Java don't know that, unless they are also proficient with C++.

It's worth optimizing the document. I don't know C++, but it's hard to understand what this means

Memory safety is a different concept than undefined behavior. While basically anything that would be considered memory unsafety is also undefined behavior, there are plenty of ways to trigger UB without causing memory unsafety (well, UB can cause memory unsafety but that's beside the point). Reaching a call to unreachable_unchecked is hardly memory unsafety, but it is still UB, and triggering that UB is still liable to cause SIGILLs, time travel, silently taking the wrong branch, or any other behavior, all without needing to mess up any memory at all.
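A small sketch of the difference, with hypothetical function names: `unreachable!()` is the safe variant that panics if it is ever reached, while `std::hint::unreachable_unchecked` is an `unsafe fn` whose caller promises the call can never be reached. Here that promise happens to hold, so both versions are sound.

```rust
use std::hint::unreachable_unchecked;

/// Safe version: reaching the last arm would panic, never cause UB.
fn parity_checked(n: u32) -> &'static str {
    match n % 2 {
        0 => "even",
        1 => "odd",
        _ => unreachable!(),
    }
}

/// Unchecked version: reaching the last arm would be UB.
fn parity_unchecked(n: u32) -> &'static str {
    match n % 2 {
        0 => "even",
        1 => "odd",
        // SAFETY: `n % 2` is always 0 or 1 for a `u32`.
        _ => unsafe { unreachable_unchecked() },
    }
}

fn main() {
    println!("{} {}", parity_checked(3), parity_unchecked(4));
}
```

If the SAFETY comment were wrong, no memory would need to be touched for the program to misbehave, which is the point made above.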

But how many of them would understand what memory safety even means in that context? They probably know what a memory safe language is.

But what is a memory safe function? How does it differ from a memory unsafe function? I wouldn't know, I have never seen these terms used precisely like that.

And the only way I can internalize for myself what a memory safe function means is, I guess, well, a sound function which can't trigger undefined behavior even if it's abused and fed problematic input.

But how would you teach that to someone who has no idea what undefined behavior and a sound function are?

Do we even have any place where someone explains what a memory safe function is? In any language, in any documentation, not necessarily C++ or Rust documentation?

At least that word soup is defined, somewhere. You are proposing to use entirely ad-hoc terminology which is not used (AFAIK, anyway) by anyone.

This may not be a bad idea if soundness and undefined behavior were replaced with some other, easier to understand (but for whom?) term. IDK. But how do you know you wouldn't end up where Richard Feynman did?

I mean this:

I think maybe we should do something like what Rust did once: just replace the bad term? Rust uses the word unsafe to mark something which was, traditionally, called the TCB (trusted computing base).

I once worked on a project which had a TCB, and it caused issues similar to what “undefined behavior” causes. 9 out of 10 newcomers would naturally gravitate to the trusted directory and ignore untrusted. Because, hey, it's trusted, it has to be good! And it wasn't easy to explain why they shouldn't mess around there or try to move code from the untrusted to the trusted directory.

Rust uses unsafe for its TCB, which immediately triggers the correct associations. Sure, it picked it up from C#, but it was absolutely the right decision.

Maybe replacing it with “forbidden code” or “invalid construct” or something like this would be enough?

Because right now “undefined behavior”, “unspecified behavior” and “implementation-defined behavior” look extremely similar (because they had looked similar in a world where compiler optimizations were severely limited), but in today's world it's really important to somehow separate “undefined behavior” from “unspecified behavior” and “implementation-defined behavior” in people's minds.

Because their effects are so drastically, radically, different.

I didn't say they are the same thing. I said not using undefined behavior is the cause and memory safety is the effect. If you execute unreachable_unchecked, you no longer have memory safety because any guarantee about that goes out of the window.

HashMap's deliverable is memory safety, and the mechanism it uses is avoiding undefined behavior inside its implementation, but that's an implementation detail. It could in theory use something considered undefined behavior in the language spec, with rustc making some special exception just for it.

So the point is, memory safety is the external deliverable.

You could say that the same is true for every other safe function in std, and that's true. HashMap just chooses to underline that aspect in the docs because users might have concerns just because Eq says "must" in its docs.

Except no one knows what “memory safety” even is when applied to functions, and thus it's not known whether HashMap delivers it or not. Note that the list of explicitly permitted behaviors includes the following: panics, incorrect results, aborts, memory leaks, and non-termination, and that list is not even exhaustive.

I don't know what a layman would think when you say that a certain function is memory safe, but I, for one, wouldn't think that a function which may cause memory leaks and/or never return is all that “memory safe”.

It may be “safe” in Rust terms (as in: it doesn't trigger undefined behavior) but it definitely does things which one wouldn't expect from HashMap (in normal operation mode).

You just say that using that function cannot violate memory safety. It's a whole program property, but it also depends on every individual operation being sound.

I think the concept of memory safety is already very central for Rust. It's mentioned in the first paragraph of Wikipedia about Rust. Leaking memory and looping are not considered memory unsafe. Having dangling references on the other hand is considered memory unsafe. So, for instance, HashMap lookups will never give you dangling references.
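As a small illustration that leaking is allowed in safe Rust: `Box::leak` is an ordinary safe function in std. The sketch below (the function name `leak_counter` is made up) leaks an allocation on purpose and keeps using it; the memory is never freed, yet no reference ever dangles.

```rust
/// Leaks a heap allocation on purpose and returns a reference
/// that stays valid for the rest of the program.
fn leak_counter() -> &'static mut i32 {
    Box::leak(Box::new(41))
}

fn main() {
    let counter = leak_counter();
    *counter += 1; // still valid: the allocation is never freed
    println!("{counter}");
}
```

Wasteful, maybe, but nothing here can read freed or uninitialized memory, which is why it doesn't count as memory unsafety in Rust's sense.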

Perhaps this concept could be exposed more prominently in the reference.

It's a paragraph which claims that Rust enforces memory safety. And you want to add a sentence which says that a certain function is memory safe.

Would a layman even understand that it means anything at all if he or she comes from a managed language?

I'm not so sure.

But in managed languages they are considered unsafe (and a bug in the runtime). Granted, they have to use a very funny definition of “memory leak” for that property to hold (e.g. an endlessly growing queue is not a memory leak according to them), but I'm afraid that's a detail too subtle for someone who doesn't know what “undefined behavior” is to distinguish.

Maybe. Frankly, “undefined behavior” is just a bad term. It's just too mild.

It's like an electrical appliance which has a “don't plug into more than 110V, may cause thermal damage” sign when in reality it explodes with enough power to demolish the whole city where you live if you plug it into a 220V outlet.

UB is the term used to describe "any guarantee about that goes out of the window".

For example, unreachable_unchecked could be compiled as a loop {}. That is a perfectly memory-safe manifestation of UB.

Well that's true, but it doesn't contradict what I said. "Memory unsafety" doesn't mean memory will definitely be corrupted. If you don't know whether or not it will be corrupted, that is already not memory safe. In order to be "memory safe", you need a guarantee.