Difficulty Grasping the Concept of Unsafe in Rust

This morning, I read a Rust book that I borrowed from the library. I went over the section on unsafe in Rust several times, but I still couldn't quite grasp it. In the afternoon, I spent some time searching online for explanations in Chinese to help me understand unsafe better, but I'm still struggling.

Hello, everyone. Do you find it difficult to understand and use unsafe in Rust?


Any formal system that attempts to verify some property of a computer program, such as safety, must have either false negatives or false positives (or both). Because of this, Rust adopts a hybrid approach:

  • Safe Rust is automatically checked by the compiler for memory safety and a few other things. In order to do this, it forbids some operations that are useful but can’t be accounted for in the automatic analysis.
  • The unsafe keyword gives you access to these operations. But, because you’ve stepped over the compiler’s guardrails, you have to rely on your own analysis to maintain safety.
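To make the second bullet concrete, here is a minimal sketch of one such operation: dereferencing a raw pointer. The function name `bump` is made up for illustration. Creating the raw pointer is allowed in safe Rust; only the dereference needs `unsafe`.

```rust
/// Hypothetical helper: adds 1 to `value` by writing through a raw pointer.
fn bump(value: &mut i32) {
    // Creating a raw pointer is allowed in safe Rust.
    let ptr: *mut i32 = value;

    // SAFETY: `ptr` was just created from a live, exclusive reference,
    // so it is valid, aligned, and nothing else accesses the value
    // while we write through it.
    unsafe {
        *ptr += 1;
    }
}

fn main() {
    let mut value = 10;
    bump(&mut value);
    assert_eq!(value, 11);
}
```

The compiler accepts the dereference only because of the `unsafe` block; the comment records the analysis that the compiler can no longer do for you.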

It depends on the situation. I’ve got a pretty good understanding of how to deal with FFI, for example, but would need someone more knowledgeable to review any low-level thread-synchronization code I attempted to write.


The basic idea is that there are invariants that the language / compiler relies upon. Arguably the most fundamental is that &mut _ references are unique / not aliased. But there are many others, like &_ is always aligned and non-null; the bitwise value of a bool is 1 or 0; etc.
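A small sketch of how the compiler visibly relies on two of these invariants. Because a reference can never be null, the null bit pattern is free to represent `None`, so `Option<&T>` is no bigger than `&T`. A raw pointer may be null, so it gets no such treatment.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // References are never null, so `None` can reuse the null bit pattern.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // `NonNull` makes the same promise for a pointer type.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());

    // A raw pointer may be null, so `Option` needs extra space for `None`.
    assert!(size_of::<Option<*const u8>>() > size_of::<*const u8>());

    // A `bool` is always stored as 0 or 1.
    assert_eq!(false as u8, 0);
    assert_eq!(true as u8, 1);
}
```

If unsafe code ever produced a null reference or a `bool` with some other bit pattern, these layout assumptions would be violated, which is why doing so is undefined behavior.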

When you use safe Rust, the compiler[1] proves that your program is, to some extent, correct -- it contains no undefined behavior (UB). It might still have logic errors, but no UB. In Rust that includes "memory safety" errors like use-after-free and reading of uninitialized data, as well as other things like data races. It knows how to analyze safe Rust and prove that all the invariants hold. If the compiler can't prove it, you get an error (borrow checker error, Sync not implemented error, ...).

But the compiler can't prove that every correct program is, in fact, correct.[2] And since Rust is a systems language that allows access to the system libraries, allocators, assembly, and so on, this is considered an unacceptable limitation.[3] So Rust gives programmers a way to "turn off" the proof engine: unsafe.

But this does not change the invariants. Instead, it is now the responsibility of the programmer to uphold the invariants.[4]
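As a rough illustration of "the programmer upholds the invariant": a `&str` must always contain valid UTF-8. The standard library function `std::str::from_utf8_unchecked` skips that validation, so the caller has to guarantee it some other way. The helper name `ascii_to_str` below is made up for this sketch.

```rust
/// Hypothetical helper: views `bytes` as a `&str`, but only if they are ASCII.
fn ascii_to_str(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        // SAFETY: every ASCII byte sequence is valid UTF-8, so the
        // "a `str` is always valid UTF-8" invariant still holds.
        Some(unsafe { std::str::from_utf8_unchecked(bytes) })
    } else {
        None
    }
}

fn main() {
    assert_eq!(ascii_to_str(b"hello"), Some("hello"));
    assert_eq!(ascii_to_str(&[0xff, 0xfe]), None);
}
```

The invariant did not go away inside the `unsafe` block. The `is_ascii` check is what makes the call sound; without it, passing invalid bytes would be undefined behavior.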

In my opinion, unsafe Rust is very hard to get right, at least, beyond some trivialities like vec.get_unchecked(0) or perhaps well-trod areas like FFI. So yes, I would say it is difficult to utilize correctly. You must understand a lot and be conservative to get it right. Hopefully this will improve over time, with a more thorough specification of the language and more compiler guidance (warnings) for unsafe code. Currently, you are largely on your own.

It is definitely not something you should reach for whenever you hit a borrow check error, say.[5][6]

A big benefit of unsafe code is encapsulation. If you have a program that exhibits UB, you know that the cause is in some unsafe block somewhere.[7] If you do not use unsafe yourself, you know it's not your fault -- it's a bug in some dependency.

The ideal is for unsafe to be encapsulated in targeted, well-tested, and battle-hardened libraries. Then the vast majority of Rust programs need not contain any local unsafe; instead they build on the libraries.
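Here is a minimal sketch of what such encapsulation can look like, assuming a made-up function `get_two_mut`. The borrow checker will not let you hold two `&mut` references into the same slice by indexing, but a small function can check the indices and then use raw pointers internally. Callers only ever see a safe signature.

```rust
/// Hypothetical helper: mutable references to two different elements.
/// Returns `None` if the indices are equal or out of bounds.
fn get_two_mut<T>(slice: &mut [T], a: usize, b: usize) -> Option<(&mut T, &mut T)> {
    if a == b || a >= slice.len() || b >= slice.len() {
        return None;
    }
    let base: *mut T = slice.as_mut_ptr();
    // SAFETY: both indices are in bounds and distinct, so the two
    // references point at different elements and never alias. Their
    // lifetime is tied to the exclusive borrow of `slice`.
    unsafe { Some((&mut *base.add(a), &mut *base.add(b))) }
}

fn main() {
    let mut values = [1, 2, 3];
    if let Some((first, last)) = get_two_mut(&mut values, 0, 2) {
        std::mem::swap(first, last);
    }
    assert_eq!(values, [3, 2, 1]);
}
```

All the reasoning about aliasing lives in one small function that can be reviewed and tested on its own; `main` contains no `unsafe` at all.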


  1. rustc anyway ↩︎

  2. nothing can, it's undecidable ↩︎

  3. In contrast with more "managed languages" that make UB impossible, modulo compiler/runtime-environment errors. ↩︎

  4. Some invariants have to hold everywhere, others can be temporarily broken... it's a large topic I can't cover completely here. ↩︎

  5. ↩︎

  6. Confession: Like many, I thought I knew better when first learning Rust. I did not. Rust UB is not directly aligned with any other language's UB (and UB is defined at the language level, pre-compilation). ↩︎

  7. or a compiler bug ↩︎


There are various uses for unsafe code. Unsafe means the compiler cannot check that your program is well-defined, so you have to do that yourself. It marks a change of responsibility.

It can be used for low-level operations (such as memory allocation), for calling code written in other languages (often C), or for pointer operations that the compiler cannot check (raw pointers).
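For the memory allocation case, a minimal sketch using the standard `std::alloc` functions might look like this. The function name `roundtrip_through_heap` is made up; normally you would just use `Box` or `Vec`, which wrap this kind of code for you.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: stores `value` in manually allocated heap
/// memory, reads it back, and frees the memory again.
fn roundtrip_through_heap(value: u64) -> u64 {
    let layout = Layout::new::<u64>();
    // SAFETY: the layout has non-zero size, the pointer is checked for
    // null before use, it is written before it is read, and it is freed
    // exactly once with the same layout it was allocated with.
    unsafe {
        let ptr = alloc(layout) as *mut u64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(value);
        let read_back = ptr.read();
        dealloc(ptr as *mut u8, layout);
        read_back
    }
}

fn main() {
    assert_eq!(roundtrip_through_heap(42), 42);
}
```

Every rule in the safety comment is something the compiler cannot check here: forgetting `dealloc` leaks memory, and calling it twice or reading before writing is undefined behavior.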

As an example, I recently implemented Cursors for BTreeMap. You could do this by storing positions, but it would mean that every cursor operation would have to start from the root of the tree to get a mutable reference to a leaf node, which is inefficient. By using raw pointers, the pointer to the leaf node can be cached, which is far more efficient. The code is here:


/// Cursor that allows mutation of map keys, returned by [`CursorMut::with_mutable_key`].
pub struct CursorMutKey<'a, K, V, A: Tuning> {
    map: *mut BTreeMap<K, V, A>,
    leaf: *mut Leaf<K, V>,
    index: usize,
    stack: StkVec<(*mut NonLeaf<K, V>, usize)>,
    _pd: PhantomData<&'a mut BTreeMap<K, V, A>>,
}

Here map and leaf are raw (unchecked) pointers (stack also holds raw pointers). This means you can have multiple mutable pointers pointing to different parts of the BTreeMap, which is not allowed with references in safe Rust. Care has to be taken when updating the map so that pointers are not invalidated, as the compiler cannot check this (it is possible, though, to run the code under Miri, an interpreter that detects many kinds of undefined behavior).
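A much smaller sketch of the same pattern, not the actual BTreeMap code: the "tree" here is just a `Vec<Vec<i32>>`, and the type `LeafCursor` is made up for illustration. The cursor caches a raw pointer to one leaf, and `PhantomData` ties it to an exclusive borrow of the whole structure.

```rust
use std::marker::PhantomData;

/// Hypothetical cursor that caches a raw pointer to one "leaf".
struct LeafCursor<'a> {
    leaf: *mut Vec<i32>,
    // Makes the cursor behave as if it held `&'a mut` to the whole tree.
    _pd: PhantomData<&'a mut Vec<Vec<i32>>>,
}

impl<'a> LeafCursor<'a> {
    /// Looks up the leaf once and caches a pointer to it.
    fn new(tree: &'a mut Vec<Vec<i32>>, leaf_index: usize) -> Option<Self> {
        let leaf: *mut Vec<i32> = tree.get_mut(leaf_index)?;
        Some(LeafCursor { leaf, _pd: PhantomData })
    }

    /// Appends to the cached leaf without searching the tree again.
    fn push(&mut self, value: i32) {
        // SAFETY: the tree is exclusively borrowed for `'a`, so nobody
        // else can touch it, and this cursor never resizes the outer
        // `Vec`, so the cached pointer stays valid.
        let leaf: &mut Vec<i32> = unsafe { &mut *self.leaf };
        leaf.push(value);
    }
}

fn main() {
    let mut tree = vec![vec![1], vec![2]];
    let mut cursor = LeafCursor::new(&mut tree, 1).expect("leaf exists");
    cursor.push(3);
    cursor.push(4);
    assert_eq!(tree, vec![vec![1], vec![2, 3, 4]]);
}
```

The safety argument depends on a rule the compiler cannot see: if the cursor ever pushed onto the outer `Vec`, it could reallocate and leave `leaf` dangling.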

An alternative approach would be to store the nodes for a BTreeMap in a few (potentially) large Vecs, and use indexing. Indexing can often be used as a safe alternative to raw pointers.
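To show the indexing idea on something smaller than a BTreeMap, here is a sketch of a linked list whose nodes live in one `Vec` and refer to each other by index. The `Node` and `List` types are made up for this example, and no `unsafe` is needed.

```rust
/// One list node; `next` is an index into `List::nodes`, not a pointer.
struct Node {
    value: i32,
    next: Option<usize>,
}

/// Hypothetical singly linked list stored in a single `Vec`.
struct List {
    nodes: Vec<Node>,
    head: Option<usize>,
}

impl List {
    fn new() -> Self {
        List { nodes: Vec::new(), head: None }
    }

    /// Adds a node at the front; the old head becomes its successor.
    fn push_front(&mut self, value: i32) {
        let index = self.nodes.len();
        self.nodes.push(Node { value, next: self.head });
        self.head = Some(index);
    }

    /// Walks the list from the head by following indices.
    fn to_vec(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut current = self.head;
        while let Some(index) = current {
            let node = &self.nodes[index];
            out.push(node.value);
            current = node.next;
        }
        out
    }
}

fn main() {
    let mut list = List::new();
    list.push_front(1);
    list.push_front(2);
    list.push_front(3);
    assert_eq!(list.to_vec(), vec![3, 2, 1]);
}
```

A wrong index here causes a panic from the bounds check rather than undefined behavior, which is the trade-off: a little run-time checking in exchange for staying in safe Rust.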


Thank you. For me it is not easy to understand.

Thank you. Maybe it is because I am a beginner, but unsafe is not easy for me to understand.

Thank you very much. I read your message, but I still do not understand unsafe well.

Maybe try to ask more specific questions. It is kind of hard to gauge if it is your experience level, programming in general or a language barrier that is preventing you from understanding the concept of unsafe.

In any case, if you have a hard time understanding it maybe just ignore it for the moment. It will 100% not be important for you right now and any of the code you will write in the foreseeable future will not need you to use unsafe.


Here is a simple example that may help. But as others have said, remember that learning unsafe Rust is not necessary to use Rust. Unsafe can be learned if you have a need for it in the future, after you're comfortable with safe (normal) Rust.

Before I give the example I need to emphasize that this is not something you should do. It is just an example of something very simple to illustrate the idea of using unsafe.

Let's say we have a vector or slice of i32 numbers and we want to calculate the sum of these numbers.

fn sum(numbers: &[i32]) -> i32 {
    let mut sum: i32 = 0;
    for i in 0..numbers.len() {
        sum = sum + numbers[i];
    }
    return sum;
}

fn main() {
    let numbers: Vec<i32> = vec![1, 2, 3];

    assert_eq!(6, sum(&numbers));
}

This works correctly. But there is something inefficient about it.

The expression numbers[i] is used to get each number. For safety, this expression will check that i is less than numbers.len(). This is called a bounds check. Rust does a bounds check because it would be unsafe to try to return a value from numbers when i is greater than or equal to numbers.len().

But we know that i is always less than numbers.len() because i is computed using the range expression 0..numbers.len(). So there is no need for the bounds check in the numbers[i] expression.

So how can I avoid the bounds check in every numbers[i] expression? I can use an unsafe method, get_unchecked. This method does not do a bounds check, so it is unsafe. Because it is unsafe I must enclose the expression in unsafe { ... } and I should always add a comment explaining why it is actually safe in this situation.

// Safety: `i` is always less than `numbers.len()` in the range above.
sum = sum + unsafe { numbers.get_unchecked(i) };

The general concept is that the Rust compiler doesn't know that i is always less than numbers.len(), but I, the programmer, do know this. So I can use an unsafe method that does not do a bounds check in order to make the program more efficient.
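Putting the pieces together, the whole function could look roughly like this. The name `sum_unchecked` is only used here to tell it apart from the first version.

```rust
/// Sums the numbers without a bounds check on each access.
fn sum_unchecked(numbers: &[i32]) -> i32 {
    let mut sum: i32 = 0;
    for i in 0..numbers.len() {
        // SAFETY: `i` comes from `0..numbers.len()`, so it is always
        // less than `numbers.len()`.
        sum += unsafe { *numbers.get_unchecked(i) };
    }
    sum
}

fn main() {
    let numbers: Vec<i32> = vec![1, 2, 3];
    assert_eq!(6, sum_unchecked(&numbers));
}
```

If someone later changed the range to `0..=numbers.len()`, the safe version would panic, but this version would read past the end of the slice, which is undefined behavior. That is the responsibility you take on.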

(Please note that this is not a realistic problem because Rust can probably optimize the original code to avoid the bounds check. And there are other ways to sum numbers and avoid the bounds check. I'm using this example only because it is simple to understand.)
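One of those other ways, as a sketch: iterate over the slice instead of indexing it. The iterator never produces an out-of-range position, so there is no index to check and no `unsafe` to write. The name `sum_iter` is only for this example.

```rust
/// Sums the numbers using an iterator instead of indexing.
fn sum_iter(numbers: &[i32]) -> i32 {
    numbers.iter().sum()
}

fn main() {
    let numbers: Vec<i32> = vec![1, 2, 3];
    assert_eq!(6, sum_iter(&numbers));
}
```

This is the version you would normally write in real code.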

Making something more efficient is just one reason that unsafe can be useful. There are many reasons to use unsafe that others have mentioned.

For example, Rust itself implements the numbers[i] expression using unsafe code, but it does the bounds check to ensure it is safe.
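A simplified sketch of that idea, not the actual standard library source: a safe function that does the bounds check itself and only then calls the unchecked method. The name `checked_get` is made up for illustration.

```rust
/// Hypothetical safe wrapper: checks the index, then skips the
/// (now redundant) second check inside.
fn checked_get(numbers: &[i32], i: usize) -> Option<&i32> {
    if i < numbers.len() {
        // SAFETY: we just checked that `i` is in bounds.
        Some(unsafe { numbers.get_unchecked(i) })
    } else {
        None
    }
}

fn main() {
    let numbers = [10, 20];
    assert_eq!(checked_get(&numbers, 1), Some(&20));
    assert_eq!(checked_get(&numbers, 2), None);
}
```

The caller cannot misuse `checked_get` no matter what index is passed, so the function is safe to call even though it contains `unsafe` inside.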


I see. Thank you. I have never worked as a software engineer. I am learning and improving my coding to prepare to become one in the future.

Okay. Thank you. Now I understand a little about unsafe, but not completely. Maybe after several years I will understand unsafe completely. Best wishes to you.


Difficult for me? Yes.

The parts I understand took a long time to sink in. What I've learned is that it's best for me to avoid using it altogether.

Unsafe exists for a number of valid reasons, as noted earlier:

Reading in data from a program written in a different language (C or C++, or anything else that is not Rust) is a classic example of how the unsafe mechanism allows you to do things that you cannot do within the walled garden of Safe Rust.
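A rough sketch of what that can look like. C code usually hands over a string as a pointer to NUL-terminated bytes, and Rust cannot know whether that pointer is valid. The helper name `c_string_to_owned` is made up, and in `main` a `CString` created in Rust stands in for the pointer that real C code would provide.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: copies a C string into an owned Rust `String`.
///
/// # Safety
/// `ptr` must point to a valid NUL-terminated string that stays alive
/// and unmodified for the duration of the call.
unsafe fn c_string_to_owned(ptr: *const c_char) -> String {
    // SAFETY: guaranteed by this function's own safety contract.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_string_lossy().into_owned()
}

fn main() {
    // Stand-in for a string owned by C code.
    let from_c = CString::new("hello from C").expect("no interior NUL bytes");

    // SAFETY: `from_c` is a valid NUL-terminated string and outlives the call.
    let text = unsafe { c_string_to_owned(from_c.as_ptr()) };
    assert_eq!(text, "hello from C");
}
```

Once the data has been copied into a `String`, the rest of the program can go back to ordinary safe Rust.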

So the others on this thread understand unsafe, and you and I don't. Why am I chiming in when I admittedly do not understand it? I just wanted to let you know that lots of us do not understand unsafe well enough to use it, *ahem*, safely.

In fact, you can get pretty far, even in a professional context, using safe Rust. A lot of the benefits of unsafe are still available to you through third-party crates, exposed to the user through safe APIs. My employer likes the fact that I use safe Rust (actually, they insist).

It's OK if we just use safe Rust. One day I am sure we'll get there, and then you and I can answer the questions on this forum.


I see. Okay. Thank you so much. I still don't understand 'unsafe' in Rust. By the way, I've heard that no one has a good command of C++. I learned C++ and know it is not easy to understand all the knowledge in C++.


The way I understand unsafe is that the compiler will refrain from doing its normally strict checks and expects you, the code writer, to take over the task of making sure the code you write is safe.
It's just a delegation of responsibility.

Watch the YouTube video. It should clear up your questions.


Thank you.
