Ambiguity of "unsafe" causes confusion in understanding

I tried to turn this into a Pre-RFC to see what others think about it. Feel free to chime in: "[Pre-RFC] Another take at clarifying `unsafe` semantics" on Rust Internals.

1 Like

Perhaps "trustme" instead of "unsafe"? "unchecked" also makes sense. But really once you "get it", it doesn't matter quite so much. It's just during the learning process.

2 Likes

I would be okay with having two different keywords, however...

New users shouldn't even be aware that unsafe exists. Even experienced Rust programmers should rarely use unsafe.

After reading your example thread, I think the issue isn't with the unsafe keyword, the issue is a misunderstanding of the entire concept of unsafety in Rust.

As others have pointed out in that thread, unsafe isn't a "lint" that you can just turn off and ignore. It's rarely appropriate to call C APIs directly without any sort of conversion or checking.

Writing safe wrappers for unsafe code is idiomatic and highly recommended, but those wrappers need to carefully guarantee that the code is safe (by using conversions, checks, or programmer knowledge). Writing correct unsafe code is difficult, and requires advanced knowledge of Rust.
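To make the "checks" part concrete, here is a minimal sketch of a safe wrapper. The function name `get_or` is made up for illustration; `get_unchecked` is the standard slice method, and its only precondition is that the index is in bounds.

```rust
/// Returns the element at `index`, or `fallback` if `index` is out of range.
///
/// This function is safe to call with any arguments: the bounds check
/// below establishes the precondition that `get_unchecked` requires.
/// (`get_or` is a hypothetical name used only for this sketch.)
pub fn get_or(values: &[i32], index: usize, fallback: i32) -> i32 {
    if index < values.len() {
        // SAFETY: `index < values.len()` was checked just above, which is
        // exactly the precondition documented for `get_unchecked`.
        unsafe { *values.get_unchecked(index) }
    } else {
        fallback
    }
}
```

Calling `get_or(&[10, 20, 30], 3, -1)` returns the fallback rather than reading out of bounds. The caller never writes `unsafe`, because the wrapper has taken on the job of proving the call correct.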

Therefore, I think changing the keyword won't fix the problem at all. Instead, the solution is to have better documentation that explains what unsafe means, and how to write correct unsafe code. Perhaps it would be good to have unsafe functions in the documentation automatically provide a link to the Book.

9 Likes

Precisely my thinking as well. You can use any keyword you want, terse or verbose - if the concept, rules, docs and so on aren’t up to snuff, it won’t matter. It’s like C++11 memory order variants - they have different names for different semantics but the semantics require elaborate explanation; the names are just familiar mnemonics to use once the semantics are understood.

3 Likes

I agree. The more this gets debated, the more I'm convinced that changing or otherwise supplementing the "unsafe" keyword will accomplish nothing. In fact, even the people suggesting different keywords tend to make rather circuitous arguments that don't seem grounded in a clear understanding of how unsafe, undefined behavior, and wrapping unsafe code differ. I think you all are correct. What is needed is more explanation of unsafe, with specific examples.

1 Like

The sad reality is that all C code is unsafe, because C doesn't have enforced automatic exhaustive memory safety checking. So indeed, however upset those developers might be, the code they write is still unsafe all the way around.

You appear to be confusing "unsafe" with "has a memory management bug". There may be pieces of unsafe code that do not actually have a memory management bug. In fact, we want (and hope) that no pieces of unsafe code have memory management bugs. Still, this doesn't practically rule out the possibility of the existence of such bugs.

All in all, unsafe means "this could potentially have a memory management bug if written or changed without enough caution", and not "this definitely/unconditionally has a memory management bug".

3 Likes

It absolutely does — that's the whole point of writing a safe wrapper around an unsafe function. The wrapper can validate the parameters, dynamically ensure all other kinds of invariants which are necessary for the call to the underlying unsafe function to always be correct, and only then call the function. This is a fundamental kind and use of abstraction in Rust.
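A rough sketch of that pattern, with invented names (`read_at` and `read_checked` are not from any library): the unsafe function states its contract, and the safe wrapper validates its parameters so that the contract always holds before the call.

```rust
/// Reads the `index`-th `u32` starting at `ptr`.
///
/// # Safety
/// `ptr` must point to at least `index + 1` initialized, properly aligned
/// `u32` values that remain valid for the duration of the call.
unsafe fn read_at(ptr: *const u32, index: usize) -> u32 {
    // SAFETY: guaranteed by this function's own contract (see above).
    unsafe { *ptr.add(index) }
}

/// Safe wrapper: validates `index` first, and only then calls `read_at`.
pub fn read_checked(data: &[u32], index: usize) -> Option<u32> {
    if index >= data.len() {
        return None;
    }
    // SAFETY: `data.as_ptr()` points to `data.len()` initialized, aligned
    // `u32`s borrowed for the whole call, and `index < data.len()`.
    Some(unsafe { read_at(data.as_ptr(), index) })
}
```

With this shape, `read_checked(&[1, 2, 3], 3)` yields `None` instead of undefined behavior, and no safe caller can violate the contract of `read_at`.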

1 Like

Please, let's not add more keywords for the same concept. I think it would only cause more confusion. I still remember when I first learned about unsafe in Rust — and it wasn't anything remotely hard to understand, but I would have found it pretty annoying if I had had to memorize two different keywords instead of one.

This is because unsafe actually has one very clear and specific meaning: "doing this is dangerous because many of Rust's safety rules are not enforced".

Now the "doing this" part can be different – it can be an FFI function call, it can be dereferencing a raw pointer, it can be the creation of a string from bytes that are not verified to be UTF-8, it can be any expression, really. But exactly what kind of expression is unsafe doesn't really matter — it's the same concept all the way down. An unsafe function is unsafe because it requires preconditions that the compiler can't verify. This is true whether you are implementing the function, "looking from the inside out", or calling it, "looking from the outside in".
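As an illustration of the "inside out" and "outside in" views, here is a small sketch built on the UTF-8 case mentioned above. The names `as_str_unchecked` and `ascii_as_str` are made up for this example; `std::str::from_utf8_unchecked` and `<[u8]>::is_ascii` are standard library functions.

```rust
/// Reinterprets `bytes` as a string slice without checking it.
///
/// # Safety
/// `bytes` must be valid UTF-8. The compiler cannot verify this.
pub unsafe fn as_str_unchecked(bytes: &[u8]) -> &str {
    // Looking from the inside out: the body relies on the caller's promise.
    // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}

/// Returns the bytes as a string if they are pure ASCII.
pub fn ascii_as_str(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        // Looking from the outside in: the caller discharges the promise.
        // SAFETY: every ASCII byte sequence is also valid UTF-8.
        Some(unsafe { as_str_unchecked(bytes) })
    } else {
        None
    }
}
```

Both uses of the keyword point at the same precondition: the declaration announces it, and the block at the call site claims it has been met.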

And an unsafe block communicates the identical issue, too: "there's stuff in this block that is not verified by the compiler, so you need to be extra careful".

I don't think there's a fundamental difference between function declarations, calls to unsafe functions, or any other unsafe expression in this regard – they are two (or three, or however many you can think of) aspects of the exact same issue: the compiler averts its eyes and lets you do several dangerous things.

1 Like

I've come to agree that adding to or changing the "unsafe" keyword will not help in the understanding of the concept; however, I disagree that declaring unsafe traits and functions is the same as implementing unsafe traits and calling unsafe functions (dereferencing a pointer can really be thought of as calling an unsafe function as well; it's just a function that is built-in/intrinsic to the compiler).

The only reason I suggested adding differentiation for these cases was the confusion that seems apparent among those new to Rust. They tend to think (based on questions and threads I've seen, like the one I pointed to) that an "unsafe" function needs to be called from inside an "unsafe" block merely so that the compiler won't complain. What they miss is that the "unsafe" function has pre-conditions/post-conditions that must be upheld by the caller, and that putting the call inside an "unsafe" block is your way of telling the compiler, "Yes, I know there is an unsafe contract to uphold here, and yes, I have in fact done the necessary work to uphold that contract, so you can allow me to call this function that has an unsafe contract". More importantly, just putting the call inside an "unsafe" block so that the compiler won't complain, without understanding and upholding the necessary contracts, accomplishes nothing except creating code that will more than likely exhibit undefined behavior and have security and correctness issues.
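A small sketch of the difference, using `char::from_u32_unchecked` from the standard library (its contract: the argument must be a valid Unicode scalar value). The function name `digit_char` is invented for this example, and the anti-pattern is left commented out on purpose.

```rust
// Anti-pattern: the block silences the compiler, but nothing justifies it.
// Any `code` that is not a valid scalar value makes this undefined behavior.
//
//     fn to_char(code: u32) -> char {
//         unsafe { char::from_u32_unchecked(code) }
//     }

/// Converts a decimal digit (0 through 9) to its character.
pub fn digit_char(digit: u32) -> Option<char> {
    if digit > 9 {
        return None;
    }
    let code = u32::from(b'0') + digit;
    // SAFETY: `code` lies in 0x30..=0x39 because `digit <= 9` was checked
    // above, and every value in that range is a valid Unicode scalar value.
    Some(unsafe { char::from_u32_unchecked(code) })
}
```

The two versions use the same keyword and the same call. The only difference is whether the author did the work the contract demands, which is why a `// SAFETY:` comment stating that work is a common convention.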

The important point that I thought might be possible to solidify in the minds of newbies with the differentiation was exactly this. But, after reading the responses to this thread (and other threads spun off from it), I'm skeptical that any combination of keywords will be any better than just "unsafe".

2 Likes

This is something that I think is exactly the wrong message about unsafe. Importantly, it isn't true. "unsafe" does not allow just anything. It allows four specific things that are otherwise not allowed:

  • dereferencing raw pointers
  • calling "unsafe" functions
  • implementing "unsafe" traits
  • accessing a mutable static (aka Global) variable

Yes, technically, because calling an "unsafe" function or dereferencing a raw pointer can do arbitrary things, it is somewhat true that "unsafe" allows anything, but that is not the point. Interestingly, things like borrow checking continue to function even inside unsafe blocks. All other compiler checks are upheld as well.
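Here is a sketch of what that means in practice. The function `bump_first` is a made-up example. The commented-out snippet is rejected by the borrow checker (error E0499) even though the second borrow sits inside an `unsafe` block; the only thing the block unlocks in the real function is the raw pointer dereference.

```rust
// Still rejected, `unsafe` or not: two overlapping mutable borrows.
//
//     let first = &mut values[0];
//     unsafe {
//         let second = &mut values[0]; // error[E0499]
//         *second += 1;
//     }
//     *first += 1;

/// Increments the first element through a raw pointer and returns it.
pub fn bump_first(values: &mut [i32]) -> Option<i32> {
    if values.is_empty() {
        return None;
    }
    // Creating a raw pointer is safe; only dereferencing it is not.
    let ptr: *mut i32 = values.as_mut_ptr();
    // SAFETY: the slice is non-empty, so `ptr` points to a valid,
    // initialized, aligned `i32`, and the `&mut` borrow gives this
    // function exclusive access for the duration of the call.
    unsafe {
        *ptr += 1;
        Some(*ptr)
    }
}
```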

Again, the important thing to solidify in the minds of newbies is:

  • if you call an unsafe function inside an unsafe block, you must ensure you've done whatever is necessary to uphold the pre-conditions of that function; otherwise, you could, and likely will, have undefined behavior. Conversely, an "unsafe" function must be well-behaved as long as its pre-conditions are met; otherwise, that function is buggy.
  • if you dereference a raw pointer, you must first have ensured that the pointer does in fact point to accessible memory that is initialized with the appropriate content, at the appropriate alignment, for the data type you are dereferencing to. If not, it is undefined behavior.
  • if you read or write a mutable static variable, you must ensure either that yours is the only thread doing so (guaranteed), or that you have done appropriate synchronization, for example with a mutex.
  • if you implement an unsafe trait for your struct, you must ensure that the combination of fields on your struct and their access modifiers guarantees that it is not possible, through any combination of current or future safe traits, to violate the contract requirements of that unsafe trait.
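
The obligations in the list above can be sketched in one place. Every name here (`read_through_pointer`, `HITS`, `HITS_LOCK`, `record_hit`, `AllZeroOk`, `Point`, `zero_value`) is invented for illustration, and the trait's contract is an assumption chosen to keep the example small.

```rust
use std::sync::Mutex;

/// Raw pointer: the pointer comes from a live reference, so it is
/// non-null, aligned, and points to an initialized `u64`.
pub fn read_through_pointer(value: &u64) -> u64 {
    let ptr: *const u64 = value;
    // SAFETY: `ptr` was derived from `value`, which is borrowed for the
    // whole call, so the pointee is valid and initialized.
    unsafe { *ptr }
}

/// Mutable static: all access is serialized by `HITS_LOCK`.
static mut HITS: u32 = 0;
static HITS_LOCK: Mutex<()> = Mutex::new(());

pub fn record_hit() -> u32 {
    let _guard = HITS_LOCK.lock().unwrap();
    // SAFETY: `HITS` is only ever touched in this function, and only while
    // `HITS_LOCK` is held, so no two threads access it at the same time.
    unsafe {
        HITS += 1;
        HITS
    }
}

/// Unsafe trait (hypothetical contract).
///
/// # Safety
/// Implementors guarantee that the all-zero bit pattern is a valid value.
pub unsafe trait AllZeroOk: Sized {}

#[derive(Debug, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// SAFETY: both fields are plain integers, for which zero is a valid value.
unsafe impl AllZeroOk for Point {}

/// Unsafe function call: relies on the trait's contract.
pub fn zero_value<T: AllZeroOk>() -> T {
    // SAFETY: `T: AllZeroOk` promises that all-zero bytes form a valid `T`.
    unsafe { std::mem::zeroed() }
}
```

Note where the burden sits in each case: on the caller for the pointer and the static, and on the author of the `unsafe impl` for the trait. If `Point` later gained a field such as a reference, the `unsafe impl` would become wrong without the compiler noticing.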
2 Likes

I have amended the wording in my response, but I feel that is splitting hairs, and I don't think the exact phrasing is the point (or was in what I wrote).

1 Like

Yes, but, in this case, it is really important for clarity to split the hairs (IMHO). But, please, don't take anything I said as a criticism of what you said. More of an amplification/clarification.

To be absolutely clear, it would have to be a longer word:

potentially_unsafe

You can't say "verified", because the compiler doesn't know that; that's the whole point.

I think 'unsafe' is fine. It draws attention, and people will ask what it means.

2 Likes