Is 'unsafe' code a good thing?

So I have been learning about and writing Rust code for a short while now. I really liked the idea of guaranteed safety, and I was also interested in the concepts of ownership, lifetimes, and so on. Just like pure functional programming forces you to think about computing differently (if you are coming from an imperative background), Rust's ownership system forces a kind of mental shift. When I first encountered the concept of unsafe code in the Rust book, my reaction was something like "this looks like a cop-out and I shall try to never use it".
But then for my first semi-serious project, I wanted to employ an ECS design pattern, and ended up using Legion as the backbone. I soon encountered requirements that couldn't be met (using Legion) without upsetting the borrow checker, and the author's advice was to make use of unchecked/unsafe facilities. Is this a common feature of API design in Rust?
As a Rust beginner, I'd like some help building an informed opinion about APIs that expose unsafe functions. It's one thing to have unsafe internal code, but this is about unsafe code exposed in the public API of a library.

From my current perspective, given my limited knowledge right now, it seems like there are 2 competing thoughts battling it out in my head:

  1. Library designers should always be able to write performant safe code with useful abstractions that cater for all intended use cases. Any unsafe function that is used to overcome the restrictions imposed by the borrow checker should be considered a design failure.
  2. There will always be some requirements that cannot be satisfied in a performant and safe way. Exposing unsafe functions in the API is the only way to enable users to satisfy these requirements whilst maintaining good performance.

I suppose a 3rd possibility is that right now unsafe code is more common than it theoretically needs to be given the current state of the compiler. I see that major enhancements to the compiler are still ongoing. Maybe one day, the language will evolve to the point that unsafe code will be deprecated?

I am a Rust beginner, trying to generate an informed opinion on this. Any advice/comments are welcome.

6 Likes

Well, in my experience, there are two needs for unsafe code.

  1. Something that is by nature dangerous, such as directly accessing hardware when writing a device driver, or using a C library (because you depend on the quality, or lack thereof, of that library).

  2. Something that makes code simpler/faster. You can consider this to be the case where the borrow checker is unnecessarily restrictive and rejects code that you know is safe, even though the compiler cannot prove it. If the borrow checker were smart enough, and let you declare your intent clearly, it would allow what you're writing and it would not need to be unsafe. (A classic example of this case is sketched at the end of this post.)

#1 is always going to stay. #2 can be chipped away at by improving the intelligence of the compiler, but we may never get 100% of the way there. As #2 improves, the density of unsafe code should decrease.

So it may not be the library author that fails to avoid unsafe, but maybe the compiler is simply not yet smart enough.
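To make case #2 concrete, here is a sketch modeled on the standard library's slice::split_at_mut (not the exact std source): the borrow checker cannot see that the two halves are disjoint, so the implementation uses unsafe internally, while the function it exposes is safe.

// A sketch modeled on std's slice::split_at_mut (not the real std source):
// the safe function establishes the invariant, the unsafe block relies on it.
fn split_at_mut<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    assert!(mid <= len);
    let ptr = slice.as_mut_ptr();
    // SAFETY: the two halves are disjoint and both lie within the original slice
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

Written purely in safe code, handing out two &mut borrows into the same slice would be rejected, even though this particular pattern is sound; the unsafe block is how the author says "I have checked this myself".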

11 Likes

Unsafe code is and always will be necessary, but you should do your very best to encapsulate it.

Let me give an example: The Vec type in Rust contains unsafe code, and fundamentally cannot avoid it, but you can still use a vector without an unsafe block. This is because the api provided by Vec encapsulates the unsafe code in a way that provides a safe api.

You might ask why the raw allocator api cannot be safe even though the api provided by Vec can be. The basic idea is that the memory allocator is more powerful than the api provided by Vec, and the reason Vec can be safe is that it only provides a subset of the capabilities that the raw allocator provides. For example, a different subset of the raw memory allocator api gives you a BTreeMap, another gives you a LinkedList, and another gives you an Rc.

There are many different ways to carve out a subset of the raw memory allocator that can be safe, but the entire allocator api cannot be. Unsafe code is what allows you to take this very powerful unsafe api and carve out a safe subset.

Without unsafe to do this, every possible safe subset of every unsafe api would have to be hard-coded in the compiler somehow, which isn't really any more safe than doing it with unsafe code. The correctness still has to be verified; it's just that the verification is done by compiler authors instead of library authors.
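To make the "carve out a safe subset" idea concrete, here is a toy sketch of my own (TinyVec is a made-up name, and this is nothing like the real Vec source): the unsafe code lives inside the type, and the only methods exposed are ones that keep its invariant true.

use std::mem::MaybeUninit;

// Toy fixed-capacity vector: unsafe inside, safe api outside.
// Invariant: the first `len` slots of `data` are always initialized.
struct TinyVec<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> TinyVec<T, N> {
    fn new() -> Self {
        // SAFETY: an array of MaybeUninit does not require initialization
        Self { data: unsafe { MaybeUninit::uninit().assume_init() }, len: 0 }
    }

    // Safe: refuses to write past the capacity, keeping the invariant intact.
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.data[self.len] = MaybeUninit::new(value);
        self.len += 1;
        Ok(())
    }

    // Safe: the bounds check is what makes `assume_init_ref` sound here.
    fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: slots below `len` were initialized by `push`
            Some(unsafe { self.data[index].assume_init_ref() })
        } else {
            None
        }
    }
    // (a real version would also implement Drop to run destructors for stored values)
}

The unsafe blocks never leave the type; a user only ever sees push and get, which cannot be misused to read uninitialized memory.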

13 Likes

My takeaway from this is that API designs should not expect users to call unsafe functions to perform typical operations. All such unsafe functions should be safely encapsulated; only in very atypical cases should users need to call unsafe functions themselves. Do you agree?

2 Likes

As an almost-beginner with still limited knowledge, I look at it entirely the opposite way around.

  1. Library designers are expected to write useful functionality with as much performance as they can muster. As such, they may well need to use "unsafe". Use of "unsafe" is absolutely necessary in many situations; classic cases are that the compiler cannot check what goes on when you call a C function, or what any hardware you are interfacing with does (a small sketch of this is further down in this post).

  2. A library's API should not expect the user to use "unsafe". Of course there will be exceptions.

So for me, "unsafe" should never be seen in application code. It should be encapsulated in libraries that provide safe APIs, Alice's Vec being a good example.
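To illustrate the C-interop case from point 1, here is a minimal sketch (c_string_len is a made-up wrapper; strlen is the ordinary C library function): the call into C has to be unsafe because the compiler cannot see what the C code does, but the wrapper that application code calls is safe.

use std::ffi::CStr;
use std::os::raw::c_char;

extern "C" {
    // declaration of the C function; the compiler has to take our word for it
    fn strlen(s: *const c_char) -> usize;
}

// Safe wrapper: CStr guarantees a valid, NUL-terminated pointer,
// which is exactly the precondition strlen needs.
fn c_string_len(s: &CStr) -> usize {
    unsafe { strlen(s.as_ptr()) }
}

Application code then calls c_string_len and never writes unsafe itself.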

I have no idea what goes on with the Legion ECS system, but my gut tells me that if one is going to build such a system, it should encapsulate all the unsafe bits it needs. The user should never have to write "unsafe" to use it; requiring that seems all wrong to me.

10 Likes

Sure, it is best to design apis that can be used safely.

1 Like

Also be careful when people suggest use of "unsafe" features to get maximal performance.

During this past year I have had a number of discussions here about creating performant Rust code, basically trying to match the performance of C.

In my experience so far there has always been a way to reach C performance without using "unsafe".

Many suggestions that did use "unsafe" features were quicker, but there has always been an even quicker safe alternative.

My conclusion is that if you find yourself using "unsafe" for performance you have probably missed something. Admittedly that something may not be obvious.

alice is good at this.

4 Likes

I wouldn't say it's good or bad. It's just a tool, which has its uses and has its risks.

unsafe is also a critical component of what makes Rust what it is: a high-performance language with low-level control and a lot of safety guarantees.

It's not possible in general to prove whether any arbitrary code is safe — that is a variant of the Halting Problem. Because of that impossibility, we have low-level languages with weak safety (choosing all the way towards unsafe), and we have safe languages with less performance and control (banning all unsafe). If you remove unsafe from Rust, it won't have the best of both worlds any more.

3 Likes

I have heard this kind of statement many times and it has always puzzled me.

Certainly, if "unsafe" wraps some interaction with the hardware or another language, the compiler does not have enough information to know whether it is sound or not. Perhaps a human can judge it to be OK with the knowledge they have of the external system. This is not a Halting Problem kind of problem.

But what about code that is totally knowable by the compiler? Do we still need unsafe there? As far as I know we do sometimes.

Fair enough, a human can inspect it and judge it to be OK.

There is then a paradox for me. If that "unsafe" section is some kind of Halting Problem kind of problem, then surely a human cannot verify it either?

Let's say we have code like:

let badptr = 0 as *const u8;
if odd_perfect_number_exists() {
    // dereferencing a null pointer is undefined behaviour, but only if this branch ever runs
    unsafe { *badptr };
}

Even if the compiler can see and "understand" 100% of that code, it would have to crack an unsolved problem in mathematics to say whether this code is unsafe or not.

If the condition is false, then the code is safe. If you want to be able to analyze arbitrary code, you can't forbid or ignore such cases. For example, you need to analyze dynamic conditions to prove that Rust's Cow runs its destructor only when that is safe to do.
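As a rough illustration of that Cow point (MyCow is a simplified stand-in, not the real std type): whether anything needs to be dropped depends on a value that only exists at runtime.

enum MyCow<'a> {
    Borrowed(&'a str),
    Owned(String),
}

fn consume(c: MyCow<'_>) {
    // Dropping `c` must free the String in the Owned case and must not touch
    // the borrowed data in the Borrowed case; which one applies depends on the
    // runtime value of the discriminant, not on anything the types say.
    drop(c);
}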

The Halting Problem is a name derived from the proof that arbitrary code can't be analyzed. It's not specifically about halting: lots of other analysis problems can be reduced to that problem by making them depend on whether part of a program will halt (such as whether a loop that stops on the first odd perfect number will stop or not).

There is an analyzable subset of code, e.g. while true {} obviously never halts, and while false {} obviously always does. But that is not all code, and there's a proof that you will never be able to analyze all code. Rust can obviously analyze the safe subset of Rust. It is possible to analyze more than that, but this usually gets exponentially more difficult (e.g. if you try to symbolically execute all possible paths through a function with all possible values), so there has to be a line somewhere between what is analyzed and where analysis has to give up.

18 Likes

I agree. "good" and "bad" are emotive terms for a technical feature that is actually essential in some cases.

We do want to minimize the risk of our programs failing though. To that end I don't want to write "unsafe" into my application code. I don't like to operate machines with the guards off! If I have to, I want "unsafe" wrapped up nicely somewhere, not scattered all around my code. Preferably I want it to be in libraries written by people who know what they are doing, vetted by others and tested by even more users.

Yes indeed.

But that, and the rest of what you said, applies to the human reader as well.

Hence my suggestion of a paradox here.

For me, unsafe code is a tool like a chainsaw. Should we peel an apple with a chainsaw? No, it would be better to use a safer tool like a kitchen knife. Should we recommend a chainsaw to people new to woodworking? No, they should get used to small carving knives first. Should we cut down trees with a carving knife? No, that would be far too slow; it would be better to call an existing tree-cutting service organized by experienced people. You can't find such a service for your use case? Then please make one yourself so others can benefit from it.

4 Likes

I'm having doubts about that. As far as I can tell, "unsafe" is not a tool to enable you to do things faster; it's to enable you to do things that are otherwise logically impossible, like calling out to C or dealing with hardware.

As I think I mentioned above, so far I have always managed to match the speed of C/C++ without using unsafe.

Does anyone have an example where using "unsafe" is required to achieve performance that cannot be achieved without it?

Performance benefits mainly come from implementing data structures that aren't possible to implement exclusively in safe code.

As a current example, indexmap is a surprisingly competitive HashMap implementation for being entirely based on safe code using Vec. However, using hashbrown (which is the implementation powering the std HashMap) is unambiguously more performant, due to being a highly optimized implementation using raw pointer manipulation.

4 Likes

This is a good reaction!

I usually think of unsafe code as an escape hatch you can use when you are doing tricky things with memory (channels, allocators, BTreeMap, Vec<T>, etc.) or need to access external code that the compiler can't verify.

I don't usually buy into the "we need unsafe because performance" argument. You are often trading correctness for perceived performance, when the optimiser would have been able to (provably correctly) make those optimisations anyway. These uses of unsafe have a much higher burden of proof than the previous case.

Likewise, using unsafe to work around the borrow checker should be met with skepticism... If the borrow checker thinks something you are doing is sketchy, I'd be inclined to listen to it.

I like this analogy... I work in engineering, and sometimes we'll need to do things that are normally unsafe in order to complete a job. For example, imagine needing to remove the guarding from a piece of machinery while it is running so you can see which part isn't spinning properly.

In this case it's not practical/possible to do the job with all the safety measures in place, so it's up to the human to make sure they don't hurt themselves.

1 Like

An interesting piece to look at here is Pin. As far as I can tell, its real purpose is to allow the ownership of an object to change while there are still references or pointers to it. There’s nothing that makes this theoretically unsound, except that transferring ownership has traditionally involved copying the value to a different place in memory.

As noted in its docs, doing anything useful with Pin requires unsafe code: There’s no other way to convince the compiler to move an object with active references. If you don’t want to do that, there’s no need for the Pin at all. This is exactly “using unsafe to work around the borrow checker.”

I would not say that this is accurate. One way to think of it is as a way to add more guarantees to a reference, e.g. a &mut T guarantees that the pointee does not move while the reference exists, whereas a Pin<&mut T> guarantees that the pointee will never move again, even after the reference goes away.

1 Like

Right, but is there any situation where this would happen that doesn’t arise from a change in ownership? You’ve accurately described what Pin<&mut T> does; I’m trying to understand why it exists, which is a subtly different question.

I should’ve probably said ‘pointers’ rather than ‘references’, though, as those are two distinct things in Rust.

It depends a bit on your definition of "transfer ownership", e.g. does transferring ownership of a Box<T> also transfer ownership of the T? If the answer to that is no, then you cannot ever transfer ownership of a value that has been pinned. The Pin type exists to make self-referential structs sound. Self-referential structs are normally unsound as all moves are just a memcpy, so any pointers the struct stores into itself would not be updated when the struct is moved, but if you know that the struct is never moved, that's not a problem.
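Here is a tiny sketch of that problem (SelfRef is a made-up type, purely for illustration):

// A self-referential struct: `ptr_to_value` is meant to point at `value`
// inside the same struct.
struct SelfRef {
    value: u32,
    ptr_to_value: *const u32,
}

fn make() -> SelfRef {
    let mut s = SelfRef { value: 7, ptr_to_value: std::ptr::null() };
    s.ptr_to_value = &s.value;
    // Returning moves `s` (a memcpy): `value` now lives at a new address,
    // but the stored pointer still holds the old one, so it already dangles.
    s
}

If the value were pinned, i.e. guaranteed never to move again after the pointer is created, the stored pointer would stay valid.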

To tie this back to the discussion in this thread, note that a self-referential type would never be usable in safe code without something like Pin (or heap allocation), because the soundness relies on the owner of the object making a promise to never move the object. It is possible to define a pin! macro that makes use of variable shadowing to ensure that the value can never be moved again, allowing safe code to pin something to the stack.
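A sketch of that shadowing trick, modeled on the pin_mut!/pin! macros from the futures ecosystem rather than copied from them:

macro_rules! pin {
    ($x:ident) => {
        // move the value into a fresh binding, then shadow the name with a pinned
        // reference; the original value can no longer be named, so it can never move
        let mut $x = $x;
        #[allow(unused_mut)]
        let mut $x = unsafe { ::std::pin::Pin::new_unchecked(&mut $x) };
    };
}

fn main() {
    let value = String::from("pinned on the stack");
    pin!(value);
    // `value` is now a Pin<&mut String>; moving the underlying String
    // out of it is no longer possible in safe code.
    println!("{}", &*value);
}

The unsafe call to new_unchecked is justified precisely because the shadowing makes it impossible for safe code to ever move the original value afterwards, which is the guarantee Pin is about.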

So Pin is actually an example of library code that gives the compiler the ability to verify something that the compiler was not able to verify previously. Now that the Pin api exists, a user can, in safe code, pin some value and have the compiler verify that the value does in fact not move later. Once the value is pinned, the user can obtain a Pin<&mut T> to that value, and by passing a pinned reference to other functions, those other functions know that the argument is definitely pinned and can rely on that guarantee in unsafe code.

As an example, the standard library mutex stores the value in a heap allocation, which is necessary because some OS apis take pointers directly to the mutex; if the mutex object were moved, that would invalidate those pointers. Now that Pin exists, it would be possible to rewrite the mutex to instead take a pinned reference when locking it, and by doing this, the user promises that they won't move the mutex object, making it perfectly fine to give the OS pointers into objects owned by the caller, instead of using a heap allocation to ensure the pointers remain valid.
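A purely hypothetical sketch of the API shape being described here (InlineMutex is a made-up name, and this is not how std::sync::Mutex is written):

use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical mutex whose OS-facing state lives inline instead of in a Box.
struct InlineMutex<T> {
    os_state: u32,       // stand-in for whatever the OS primitive actually needs
    value: T,
    _pin: PhantomPinned, // opts out of Unpin so pinning is a real promise
}

impl<T> InlineMutex<T> {
    fn new(value: T) -> Self {
        InlineMutex { os_state: 0, value, _pin: PhantomPinned }
    }

    // Locking requires a pinned reference: the caller has promised never to move
    // the mutex, so a real implementation could hand the OS a pointer to os_state.
    fn lock<'a>(self: Pin<&'a mut Self>) -> &'a mut T {
        // SAFETY: we only reach `value` through the reference; `self` is never moved
        unsafe { &mut self.get_unchecked_mut().value }
    }
}

fn main() {
    // std::pin::pin! (available in newer Rust) pins the mutex on the caller's stack
    let mut m = std::pin::pin!(InlineMutex::new(5));
    *m.as_mut().lock() += 1;
    assert_eq!(*m.as_mut().lock(), 6);
}

The point is that lock can only be called once the caller has pinned the mutex, so the promise "this object will never move" is checked by the compiler rather than enforced with a heap allocation.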

3 Likes

In the context of formal computability theory, we're usually talking about a single algorithm that can (in this case) prove the absence of undefined behavior for any Rust program whatsoever with zero false positives and zero false negatives. Such an algorithm is just mathematically/logically impossible, and that's never going to change. The general principle here is Rice's theorem, which in some sense is a generalization of the famous halting problem.

What is possible is proving soundness for any program in a sufficiently restricted subset of Rust while allowing false negatives (as well as false positives if you make mistakes in your unsafe code), which is exactly what rustc does today.

The reason this doesn't apply to humans is simply that humans aren't algorithms. They can guess, and try creative solutions, and invent novel proofs of their own. But doing that usually takes a lot longer than we want for our rustc compile times. In practice, the human task is usually coming up with various "restricted subsets" and acceptable "false negatives" that are algorithmically solvable, proving that they are, and then deciding which ones we want to use to do our day-to-day compilation proofs.

(if you want to learn about all the details I'm sweeping under the rug with phrases like "sufficiently restricted subset of Rust", that's a branch of mathematics called "computability theory")

4 Likes