Ambiguity of "unsafe" causes confusion in understanding

It seems like a lot of people new to Rust get confused by the dual-use meanings of the "unsafe" keyword:

  1. Declaring a Trait or Function has an unsafe contract that must be upheld to implement/call
  2. Declaring that contracts have been upheld

For example, see this person's confusion in this post: Can we get rid of the unsafe wrapper of OS functions?

It would perhaps be better if different keywords were used. For example, use "unsafe contract" (or just "unsafe") for function/trait definitions and "verified contract" (or just "verified") for "unsafe" blocks and trait impls.

This has been bike-shedded to death, I'm sure, but since it seems to be a significant source of confusion about the "unsafety of Rust", perhaps it is worth another go-round of discussion in the community, and perhaps an RFC? Especially given the new/upcoming 2018 Edition, where keyword deprecation could be reasonably accommodated.

Any appetite for that?

Cross-Posted to Internals: Ambiguity of "unsafe" causes confusion in understanding - Rust Internals

3 Likes

I think there would be just as many questions about what exactly “verified” and “unsafe” mean. I think this is a topic where a single keyword won’t make it obvious anyway. And at that point, having separate keywords for the two aspects is needless language churn, IMO.

4 Likes

I think it could remove some confusion, mainly by separating the two different meanings of the same keyword. But removing the 'unsafe' keyword would create compatibility problems.

As a naming alternative, I suggest attested for blocks, keeping unsafe for functions.

Maybe we should keep using the unsafe keyword in the function body, and replace the unsafe keyword in the declaration with "attested" or "trusted"? For example, if an external function were declared like this:

pub trusted fn CreateWindowExA(
    dwExStyle: DWORD,
    lpClassName: LPCSTR,
    lpWindowName: LPCSTR,
    dwStyle: DWORD,
    x: c_int,
    y: c_int,
    nWidth: c_int,
    nHeight: c_int,
    hWndParent: HWND,
    hMenu: HMENU,
    hInstance: HINSTANCE,
    lpParam: LPVOID,
) -> HWND;

Then calling CreateWindowExA in the function body would not require an unsafe wrapper, as it has been declared trusted.

That would be backwards to my mind.

3 Likes

Then calling CreateWindowExA in the function body would not require an unsafe wrapper, as it has been declared trusted.

CreateWindowExA does not have a safe function signature - it's only safe if the caller has validated its inputs.

For example, this hypothetical code with a "trusted" CreateWindowExA routine would compile, but still have memory safety issues:

fn creates_a_window(class_name: &str) {
    let handle = CreateWindowExA(
        0, // default Window style
        class_name.as_ptr(), // MEMORY SAFETY ERROR: LPCSTR has a contract that the string is null terminated!
        ...);
}

&str is not null-terminated in Rust, ergo you've introduced a memory safety issue. It doesn't make sense to mark CreateWindowExA as "safe" because it is not safe in any sense of the word. It's up to the programmer to make sure they're calling it with safe inputs. Requiring the unsafe block when calling CreateWindowExA here is important - an unsafe block says that:

  1. The programmer is now manually verifying the safety of the program.
  2. If you're auditing the program for memory safety issues, then you need only validate that your unsafe blocks are upholding the memory safety requirements of the system, rather than auditing the whole system.

You MUST NOT allow the possibility of memory unsafety to leak outside of the unsafe block. If you're trying to present a safe interface to code that is unsafe, the result must be an interface that will not cause memory safety issues for any valid input to the interface - it is allowed to result in runtime errors (return an Err or panic), but it may not cause memory safety issues.
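To make that rule concrete, here is a minimal sketch (not from the thread; the function name is hypothetical) of a safe interface over unsafe internals: the raw-pointer read is unsafe inside, but no input a caller can supply will cause memory unsafety, because invalid input is rejected with a runtime error (None) instead:

```rust
// Safe interface over an unsafe operation: out-of-range indices yield
// `None` (a runtime error), never undefined behaviour.
fn byte_at(data: &[u8], index: usize) -> Option<u8> {
    if index >= data.len() {
        return None; // error path, not memory unsafety
    }
    // SAFETY: `index` was bounds-checked above, so the pointer arithmetic
    // and dereference stay within the slice.
    Some(unsafe { *data.as_ptr().add(index) })
}

fn main() {
    let data = [10u8, 20, 30];
    assert_eq!(byte_at(&data, 1), Some(20));
    assert_eq!(byte_at(&data, 9), None); // rejected, no UB
}
```

An auditor only needs to check the unsafe block against the bounds check directly above it, not every caller of `byte_at`.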

5 Likes

Indeed! I think this is the first Rust RFC I worked up the courage to comment on :slight_smile:

2 Likes

I like the "assume_safe" instead of "unsafe" for blocks that someone in that RFC comments suggested. Although, I think a lot of understanding could be gained by being more explicit:

  • "unsafe" function/trait defs: "unsafe contract fn"/"unsafe contract trait"
    • also, require saying "unsafe external fn" rather than just "external fn" (don't compile otherwise)
  • "unsafe" blocks: "unsafe contract honored { ... }"
  • "unsafe" trait impt: "impl unsafe contract trait ... for ..."

NOTE: In the above, "contract" and "honored" would both be non-optional contextual keywords ("contract" after "unsafe" and before "trait"/"fn"; "honored" after "contract" and before the block's opening brace).

For the 2018 Edition you could have the following:

  • make "unsafe contract" and "unsafe contract honored" optional instead of "unsafe" alone for their respective uses
  • have a lint that defaulted to "warn" for lone "unsafe" without "contract/contract honored" as appropriate (with option to disable warning or error instead)
  • For the next Edition (2020???) make it a hard compile error to have "unsafe" alone without either the appropriate "contract" or "contract honored" as appropriate.
  • Seems like it could easily be fixed with "Rustfix"

In your example, unsafety is introduced in the preparation of function call parameters, not in the function call itself. So the example is not a legitimate counterexample.

It is absolutely a legitimate counterexample. You should spend some time reviewing the unsafe section of The Book. Notably:

  1. Dereferencing a raw pointer

CreateWindowExA is (obviously) dereferencing all pointers you pass to it. There's nothing unsafe about getting a pointer from an object. There is absolutely unsafety involved in dereferencing a pointer. When you put "unsafe" around the call to CreateWindowExA, you're asserting that, though the pointers could point to anything, you've validated that they only point to memory that is valid in the context of the call to CreateWindowExA.
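The LPCSTR case can be sketched in safe-to-run form (the `c_strlen` routine is a hypothetical stand-in for how a C API walks an LPCSTR, not a real OS function):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Stand-in for a C routine reading an LPCSTR: it walks the pointer until
// it finds a NUL byte, so the caller must guarantee NUL termination.
unsafe fn c_strlen(mut p: *const c_char) -> usize {
    let mut n = 0;
    while *p != 0 {
        // dereferencing a raw pointer: the operation that requires `unsafe`
        n += 1;
        p = p.add(1);
    }
    n
}

fn main() {
    // CString guarantees NUL termination, unlike &str.
    let s = CString::new("hello").unwrap();
    // SAFETY: `s` is NUL-terminated and outlives the call.
    let len = unsafe { c_strlen(s.as_ptr()) };
    assert_eq!(len, 5);
}
```

Passing `some_str.as_ptr()` instead would compile but read past the end of the string, which is exactly the bug in the `creates_a_window` example above.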

1 Like

You are right. I now agree with you.

2 Likes

But here is my true problem. From the Book I know that we can define a new wrapper function, for example:

pub fn WRAP_CreateWindowExA(parameters) -> HWND {
    unsafe { CreateWindowExA(parameters) }
}

And in your function you can call this wrapper function WRAP_CreateWindowExA without further wrapping it in an "unsafe" block. But this practice:

  1. still does not reduce any unsafety, and
  2. introduces a further level of function call, which costs performance.

So my hope is to find a perfect way of calling operating system functions that:

  1. is as concise and programmer-friendly as possible,
  2. does not lose performance, and
  3. does not hurt safety in other parts of the program.

The current way of writing this is either to use an "unsafe" block wherever the OS function is called, which is not programmer friendly, or to use a wrapper function, which loses performance.

This is merely exposing a safe API over inherently unsafe internal code - as far as the caller is concerned, it’s a safe function. To that end, your wrapper needs to ensure that’s the case, whichever way is appropriate to achieve that (eg validating parameters).

There’s virtually zero chance of performance loss - the function will be either inlined automatically or you can otherwise encourage (or force) the compiler to inline. The biggest performance loss is likely calling into the FFI itself because that guarantees no inlining. However, you don’t really have a choice in this case.
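As a sketch of the zero-overhead-wrapper point (both functions here are hypothetical, standing in for an FFI primitive and its safe wrapper): marking the wrapper `#[inline]` lets the compiler erase the extra call entirely.

```rust
// Hypothetical unsafe primitive (stand-in for an FFI call).
unsafe fn raw_get(data: &[u8], i: usize) -> u8 {
    *data.get_unchecked(i) // no bounds check: caller must guarantee i < len
}

// Safe wrapper; #[inline] is a hint (use #[inline(always)] to force it),
// and in practice the wrapper is typically optimized away.
#[inline]
fn get(data: &[u8], i: usize) -> Option<u8> {
    if i < data.len() {
        // SAFETY: bounds checked just above.
        Some(unsafe { raw_get(data, i) })
    } else {
        None
    }
}

fn main() {
    let v = vec![1u8, 2, 3];
    assert_eq!(get(&v, 2), Some(3));
    assert_eq!(get(&v, 3), None);
}
```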

4 Likes

You can define a new wrapper function that just passes arguments blindly from the "wrapper" to the target function, but you've introduced unsafety into the program: you've violated the memory safety semantics of Rust. See what I said above:

You MUST NOT allow the possibility of memory unsafety to leak outside of the unsafe block.

So, your hypothetical WRAP_CreateWindowExA function should also be marked as unsafe - a caller could violate memory safety by passing in certain arguments.

A safe wrapper for CreateWindowExA would be something like:

use std::ffi::CString;

fn create_window(class_name: &str, ...) -> Window {
    let null_terminated_class_name = CString::new(class_name).unwrap();

    unsafe {
        Window::from_raw(CreateWindowExA(
            0, // default window type
            null_terminated_class_name.as_ptr(), // This is memory safe now - class_name has been null terminated!
            ...))
    }
}

3 Likes

Generally, don't worry about this in Rust (or any other compiled language with a modern compiler). The compiler will inline the call if it's worth it, and won't if it's not. Only worry about function call overhead if you've actually benchmarked, examined the compiler output, and decided the whole construct needs to be inlined in source. Otherwise it's premature optimization.

3 Likes

@senator, it might help to think of “unsafe” as simply: compiler cannot enforce the memory safety guarantees, for whatever reason - it’s on you, the author, to do it manually.

With FFI, the reason is fairly obvious - you’re calling into a black box the compiler has no knowledge of.

With unsafe traits, it’s a matter of the compiler (or the language) not being able to express the requirement. For instance, the Sync marker trait; compiler will auto-implement this for compound types if their innards are Sync but sometimes you’ll need to do this yourself (eg you hold raw ptrs to FFI structures). It’s again similar in that the compiler cannot see into foreign code.
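The "hold raw ptrs to FFI structures" case from the paragraph above might be sketched like this (the handle type and its claimed thread-safety are purely illustrative assumptions):

```rust
use std::sync::Arc;
use std::thread;

// A handle holding a raw pointer to some foreign structure. Raw pointers
// are neither Send nor Sync, so the compiler will not derive these.
struct FfiHandle {
    ptr: *mut u8,
}

// SAFETY: this is the manual assertion described above -- correct only if
// the foreign library really does allow sharing across threads. The
// compiler cannot see into the foreign code to verify it.
unsafe impl Send for FfiHandle {}
unsafe impl Sync for FfiHandle {}

fn main() {
    let handle = Arc::new(FfiHandle { ptr: std::ptr::null_mut() });
    let shared = Arc::clone(&handle);
    // Without the unsafe impls above, this spawn would not compile.
    let t = thread::spawn(move || shared.ptr.is_null());
    assert!(t.join().unwrap());
}
```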

std::io::Read::initializer() is another interesting case, but ultimately boils down to the same thing: compiler cannot see into an opaque black box. The gist here is you’ll likely end up calling some FFI code (eg libc to a syscall) to perform the actual read I/O.

Most, if not all, unsafety stems from the “simple” fact the compiler ultimately cannot see things.

2 Likes

I've always wished the designers had chosen "unchecked" instead of the "unsafe" keyword, i.e. unchecked by the compiler and (hopefully) confirmed safe by other means.

There's quite a lot of safety-critical C code out there whose designers would bristle at the idea that their code is unsafe. Especially since there are plenty of programming/logic errors that can still exist in code that is safe by Rust's strict definition of the term. To me, "unsafe" implies a value judgement that is unjustified and can lead to dogmatic thinking (similar to how some people like to declare language features "evil" without reference to the underlying engineering tradeoffs).

3 Likes

That is actually the naming convention which Ada uses, so there is prior art in this direction :slight_smile: On the other hand, speaking personally as a heavy user of unsafe code, I am quite happy that Rust uses the "unsafe" terminology, for the following reasons:

  • It directly relates to the underlying concepts of memory-safety, type-safety and thread-safety, and there is a lot of prior art for calling languages which guarantee these properties "safe languages".
  • It puts a big red scary warning on something which people should find scary and avoid whenever there is a reasonable alternative (who wants to debug undefined behaviour in 2018?).
  • For similar reasons, it attracts attention during code review, which is good because manually checked code must also be manually verified by others.

6 Likes

To me, the issue is not so much "unsafe" as THE keyword vs. something else like "unchecked", "unverified", or whatever; it is the difference between "declaring an unsafe contract" and "implementing code which must uphold the invariants of unsafe contracts". Using the same word for both sides of this causes confusion about the overall concept of "unsafe". That is why I suggested something like:

  • When declaring/defining "unsafe" contracts (like functions and traits):
    • trait - "unsafe contract trait ...."
    • func - "unsafe contract fn ...."
    • external/FFI - "unsafe contract external fn"
  • When using/implementing/calling something that defines an unsafe contract:
    • trait - "impl unsafe contract honored trait ... for ..."
    • unsafe block - "unsafe contract honored { ..... }"

Something like this makes absolutely clear the distinction between defining an unsafe contract and writing code that must uphold the implied contract.

To my mind this would likely result in a lot less confusion for those learning Rust and would make the overall "unsafe" story much more coherent.

What I proposed is kind of verbose, but verbosity WRT unsafe is probably a feature, not a bug.

1 Like

There was a discussion about this on internals some time ago. One comment was that there is actually a third dimension to unsafe in current usage: "properties that unsafe code can rely on". Think of traits like Send and Sync, or TrustedLen.

These require special care since unsafe code is not normally allowed to trust safe code (e.g. relying on Eq to be reflexive and transitive for memory safety is a bug). Currently, they are also annotated using unsafe.

To clarify this, one could speak of "unsafe preconditions" (user must provide these for memory safety) and "unsafe postconditions" (user can rely on these for memory safety). And keep unsafe blocks for manually checked code regions. The idea would be that you need to use an unsafe block in order to do anything potentially unsafe, including calling a function with an unsafe precondition, or returning from a function with an unsafe postcondition. Shorthands would be easy to find: "unsafe pre/post".
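An "unsafe postcondition" can be sketched with today's syntax (the trait and method names here are hypothetical, invented for illustration): implementors promise the property, and unsafe code elsewhere relies on it.

```rust
// Unsafe postcondition: implementors promise `exact_len` is accurate,
// and unsafe code may rely on that promise. The compiler cannot check
// the promise, hence the trait itself is marked unsafe.
unsafe trait ExactLen {
    fn exact_len(&self) -> usize;
}

// SAFETY: a slice's len() really is exact.
unsafe impl ExactLen for [u8] {
    fn exact_len(&self) -> usize {
        self.len()
    }
}

// Unsafe code relying on the postcondition: if an implementor lied about
// the length, `get_unchecked` would read out of bounds.
fn sum(data: &[u8]) -> u32 {
    let mut total = 0u32;
    for i in 0..data.exact_len() {
        // SAFETY: ExactLen promises i < data.len() throughout this loop.
        total += u32::from(unsafe { *data.get_unchecked(i) });
    }
    total
}

fn main() {
    assert_eq!(sum(&[1, 2, 3]), 6);
}
```

This is why such traits must be `unsafe trait` even though implementing them involves no unsafe operations: the danger is in a wrong implementation, not in the implementation's body.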

This distinction would solve some longstanding problems with unsafe, for example the fact that you need an unsafe block to implement an unsafe fn (this is not necessary if that unsafe is linked to a precondition: you may not use the precondition in your code). It could also be gradually introduced in existing code, by first making the pre/post annotations optional, then later making them mandatory.

Is this truly necessary at this stage of Rust's evolution, though, considering the amount of churn involved for existing code? I would not know...

2 Likes