Is there a checklist for review of unsafe code?

Hello, I’m new to the Rust community but have been experimenting with Rust for a few weeks now. I’m looking at implementing a data structure that could use some unsafe code to make it performant. However, before doing that, I’ve read through the Rustonomicon and found some good material on what can be done with unsafe and some of the pitfalls.

I was wondering if there is an agreed upon list of things to manually verify through code review when using unsafe. It would seem to me that having a list of all of the checks that are skipped by the compiler for unsafe code would be helpful to make sure manual inspection of such code is thorough.

Thanks,
Tim

2 Likes

I don’t think there is one single widely agreed upon list yet. But https://github.com/rust-lang/unsafe-code-guidelines/ is an effort in that direction, which already produced many interesting discussions. Feel free to read through what they already published, what they are currently discussing in the issue tracker, and get in touch with the authors if you have extra questions!

1 Like

Thank you. I’ll give this a review. My initial impression is that it is very much a work in progress.

1 Like

And I think your initial impression is quite right :slight_smile: It’s the largest effort in that direction that I’m aware of, but is quite far from offering a “production-quality” reference.

Not a checklist, but miri checks for some known problems when interpreting code, so exercising unsafe code under miri may catch bugs.
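For reference, the usual invocation looks something like this (assuming a nightly toolchain; the component and command names may drift over time):

```shell
# Miri ships as a rustup component on nightly
rustup +nightly component add miri

# Interpret the test suite under Miri; UB such as double drops,
# out-of-bounds accesses, or invalid frees aborts with a diagnostic
cargo +nightly miri test
```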

2 Likes

Thanks for the pointer!

In order to make such a checklist, I suspect the way to do it would be to inspect the rustc sources to see how input source is treated differently with respect to semantic analysis when unsafe is used vs. not. From that, one would have a good list of what the compiler isn’t doing for you, so you know where you’re on your own. Of course, I’m generously assuming that the compiler sources are organized in such a manner that a non-expert in compiler internals could actually do this. Again, I’m new, so I realize that’s an assumption.

Tim.

1 Like

I suspect this will have little overlap with what constitutes useful information to the user. The semantics of Rust code inside or outside unsafe are identical. Most of what you’d find would probably be either decisions about when certain optimizations are legal, or decisions about whether the code itself is legal.

…and there are plenty of things you need to worry about in unsafe code that are just plain UB. And I don’t just mean language stuff; standard library functions like std::ptr::read, std::mem::uninitialized, and CString::from_raw have plenty of not-so-obvious footguns in addition to the obvious ones. You could probably fill a book with them!

I’ll try throwing something together with all of the not-so-obvious footguns I’m aware of.

2 Likes

No checks are skipped, there are just the “unsafe superpowers” added to what you can do:

  • Dereference a raw pointer
  • Call an unsafe function or method
  • Access or modify a mutable static variable
  • Implement an unsafe trait

For each of those respectively, you need to (roughly and probably non-exhaustively):

  • Ensure that the pointer is correctly aligned and non-null, that it points to a valid instance of the type, and that any references you hand out have a valid lifetime
  • Ensure that you’re meeting everything in the “Safety” section of the docs of the function/method
  • (This is almost impossible to do right)
  • Ensure that you’re meeting everything in the “Safety” section of the docs of the trait

Which is easy to say, but hard to do since all the code in the module is suspect.

And if you’re writing something unsafe yourself, you need to ensure that your “Safety” section includes all the requirements of the things you do/call that you aren’t ensuring yourself.
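To sketch what that looks like in practice (the function names here are made up for illustration): an unsafe fn states its preconditions in a `# Safety` section, and every caller discharges them with a comment:

```rust
/// Returns a reference to the element at `index` without bounds checking.
///
/// # Safety
///
/// `index` must be strictly less than `slice.len()`.
unsafe fn get_unchecked_demo<T>(slice: &[T], index: usize) -> &T {
    &*slice.as_ptr().add(index)
}

fn first_or_default(slice: &[u32]) -> u32 {
    if slice.is_empty() {
        return 0;
    }
    // SAFETY: we just checked the slice is non-empty, so 0 < slice.len().
    *unsafe { get_unchecked_demo(slice, 0) }
}
```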

4 Likes

Well, I wrote a thing. It’s very incomplete, and quite large because I decided to include code examples. Kind of gnarly to look at without some sort of navbar or a table of contents.

Very long thing

Drop safety

Things to look for:

  • Usage of unsafe in any generic function that doesn’t have T: Copy bounds.
  • Usage of unsafe near code that can panic.

Summary: unsafe code often puts data in a state where it would be dangerous for a destructor to run. The possibility that code may unwind amplifies this problem immensely. Most unsafe code needs to worry about drop safety at some point.

Danger: A value read using std::ptr::read may get dropped twice

(This also applies to <*const T>::read, which is basically the same function)

Incorrect

use std::ptr;

struct ArrayIntoIter<T> {
    array: [T; 3],
    index: usize,
}

impl<T> Iterator for ArrayIntoIter<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self.index {
            3 => None,
            i => {
                self.index += 1;
                Some(unsafe { ptr::read(&self.array[i]) })
            }
        }
    }
}

When the ArrayIntoIter<T> is dropped, all of the elements will be dropped, even though ownership of some of the elements may have already been given away.

For this reason, usage of std::ptr::read must almost always be paired together with usage of std::mem::forget, or, better yet, std::mem::ManuallyDrop (available since 1.20.0) which is capable of solving a broader variety of problems. (In fact, it is impossible to fix the above example using only mem::forget)

Correct

use std::mem::ManuallyDrop;
use std::ptr;

struct ArrayIntoIter<T> {
    array: [ManuallyDrop<T>; 3],
    index: usize,
}

impl<T> Iterator for ArrayIntoIter<T> {
    type Item = T;
    
    fn next(&mut self) -> Option<T> {
        match self.index {
            3 => None,
            i => {
                self.index += 1;
                Some(ManuallyDrop::into_inner(unsafe { ptr::read(&self.array[i]) }))
            }
        }
    }
}

impl<T> Drop for ArrayIntoIter<T> {
    fn drop(&mut self) {
        // Run to completion
        self.for_each(drop);
    }
}

Danger: Closures can panic

Incorrect

use std::ptr;

pub fn filter_inplace<T>(
    vec: &mut Vec<T>,
    mut pred: impl FnMut(&mut T) -> bool,
) {
    let mut write_idx = 0;

    for read_idx in 0..vec.len() {
        if pred(&mut vec[read_idx]) {
            if read_idx != write_idx {
                unsafe {
                    let p = vec.as_mut_ptr();
                    ptr::copy_nonoverlapping(p.add(read_idx), p.add(write_idx), 1);
                }
            }
            write_idx += 1;
        } else {
            drop(unsafe { ptr::read(&vec[read_idx]) });
        }
    }
    unsafe { vec.set_len(write_idx); }
}

When pred() panics, we never reach the final .set_len(), and some elements may get dropped twice.
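One possible patch for this particular hole (a sketch, not the only approach): “poison” the length before running any user code, so that a panic in pred leaks elements rather than double-dropping them, and restore the real length only at the end:

```rust
use std::ptr;

pub fn filter_inplace<T>(
    vec: &mut Vec<T>,
    mut pred: impl FnMut(&mut T) -> bool,
) {
    let len = vec.len();
    // Leak-on-panic: while this runs, the Vec claims to be empty, so an
    // unwind out of `pred` drops nothing (safe, if wasteful).
    unsafe { vec.set_len(0); }
    let base = vec.as_mut_ptr();

    let mut write_idx = 0;
    for read_idx in 0..len {
        unsafe {
            if pred(&mut *base.add(read_idx)) {
                if read_idx != write_idx {
                    ptr::copy_nonoverlapping(base.add(read_idx), base.add(write_idx), 1);
                }
                write_idx += 1;
            } else {
                ptr::drop_in_place(base.add(read_idx));
            }
        }
    }
    // All survivors are packed at the front; re-expose exactly those.
    unsafe { vec.set_len(write_idx); }
}
```

Leaking on unwind is safe, which is why this trick shows up a lot in Vec-manipulating code.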

Danger: Any method on any safe trait can panic

A generalization of the previous point. You can’t even trust clone to not panic!

Incorrect

pub fn remove_all<T: Eq>(
    vec: &mut Vec<T>,
    target: &T,
) {
    // same as filter_inplace
    // but replace   if pred(&mut vec[read_idx])
    //        with   if &vec[read_idx] == target
}

Danger: Drop can panic!

This particularly nefarious special case of the prior point will leave you tearing your hair out.

Still Incorrect:

/// Marker trait for Eq impls that do not panic.
///
/// # Safety
/// Behavior is undefined if any of the methods of `Eq` panic.
pub unsafe trait NoPanicEq: Eq {}

pub fn remove_all<T: NoPanicEq>(
    vec: &mut Vec<T>,
    target: &T,
) {
    // same as before
}

In this case, the line

drop(unsafe { ptr::read(&vec[read_idx]) });

in the else block may still panic. And in this case we should consider ourselves fortunate that the drop is even visible! Most drops will be invisible, hidden at the end of a scope.

Many of these problems can be solved through extremely liberal use of std::mem::ManuallyDrop; basically, whenever you own a T or a container of Ts, put it in a std::mem::ManuallyDrop so that it won’t drop on unwind. Then you only need to worry about the ones you don’t own (anything your function receives by &mut reference).


Pointer alignment

Things to look for: Code that parses &[u8] into references of other types.

Summary: Any attempt to convert a *const T into a &T (or to call std::ptr::read) requires an aligned pointer, in addition to all the other, more obvious requirements.
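As a sketch of the safe pattern: rather than casting a &[u8] into a &u32 (UB whenever the input happens to be misaligned), copy the bytes into a local first:

```rust
/// Parse a little-endian u32 from the front of a byte slice.
///
/// Copying into a local array sidesteps the alignment requirement entirely;
/// alternatively, `<*const u32>::read_unaligned` works on the raw pointer.
fn read_u32_le(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[..4]); // panics (safely) if bytes.len() < 4
    u32::from_le_bytes(buf)
}
```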


CString::from_raw

Things to look for: Any usage of CString::{into_raw, from_raw}.

Summary: As documented, CString::from_raw recomputes the length by scanning for a null byte. What it doesn’t (currently) mention is that this length must match the original length.

I think you’ll be hard pressed to find any C API function that mutates a char * without changing its length!

Incorrect

extern crate libc;

use std::ffi::{CString, CStr};

fn main() {
    let ptr = CString::new("Hello, world!").unwrap().into_raw();
    let delim = CString::new(" ").unwrap();
    
    let first_word_ptr = unsafe { libc::strtok(ptr, delim.as_ptr()) };
    
    assert_eq!(
        unsafe { CStr::from_ptr(first_word_ptr) },
        &CString::new("Hello,").unwrap()[..],
    );
    
    drop(unsafe { CString::from_raw(ptr) });
}

This is incorrect because strtok inserts a NUL byte after the comma in "Hello, world!", causing the CString to have a different length once it is reconstructed. As a result, when the CString is freed, it will pass the wrong size to the allocator.

The fix is to never use these methods. If a C API needs to modify a string, use a Vec<u8> buffer instead.

Correct

extern crate libc;

use std::ffi::{CString, CStr};
use libc::c_char;

fn main() {
    let mut buf = CString::new("Hello, world!").unwrap().into_bytes_with_nul();
    let delim = CString::new(" ").unwrap();

    let first_word_ptr = unsafe {
        libc::strtok(buf.as_mut_ptr() as *mut c_char, delim.as_ptr())
    };

    assert_eq!(
        unsafe { CStr::from_ptr(first_word_ptr) },
        &CString::new("Hello,").unwrap()[..],
    );
}

Also: Store a CString to a local before calling as_ptr()

Just as an aside, there’s another footgun here. If I had written:

Incorrect:

let delim = CString::new(" ").unwrap().as_ptr();

the buffer would have been freed immediately.

Beware usage of std::mem::uninitialized or std::mem::zeroed to produce arbitrary, unknown types

Things to look for: Usage of either std::mem::uninitialized or std::mem::zeroed in a function with a generic type parameter T.

Summary: Sometimes people try to use std::mem::uninitialized as a substitute for T::default() in cases where they cannot add a T: Default bound. This usage is almost always incorrect due to multiple edge cases.

Danger: T may have a destructor

Yep, these functions are yet another instance of our mortal enemy, Drop unsafety.

Incorrect

fn call_function<T>(
    func: impl FnOnce() -> T,
) -> T {
    let mut out: T;
    out = unsafe { std::mem::uninitialized() };
    out = func(); // <----
    out
}

This function exhibits UB because, at the marked line, the original, uninitialized value assigned to out is dropped.

Still Incorrect

fn call_function<T>(
    func: impl FnOnce() -> T,
) -> T {
    let mut out: T;
    out = unsafe { std::mem::uninitialized() };
    unsafe { std::ptr::write(&mut out, func()) };
    out
}

This function still exhibits UB because func() can panic, causing the uninitialized value assigned to out to be dropped during unwind.

Danger: T may be uninhabited

Still incorrect!!

fn call_function<T: Copy>(
    func: impl FnOnce() -> T,
) -> T {
    let mut out: T;
    out = unsafe { std::mem::uninitialized() };
    out = func(); 
    out
}

Here, the Copy bound forbids T from having a destructor, so we no longer have to worry about drops. However, this function still exhibits undefined behavior in the case where T is uninhabited.

/// A type that is impossible to construct.
#[derive(Copy, Clone)]
enum Never {}

fn main() {
    call_function::<Never>(|| panic!("Hello, world!"));
}

The problem here is that std::mem::uninitialized::<Never> successfully returns a value of a type that cannot possibly exist.

Or at least, it used to. Recent versions of the standard library (early Rust 1.3x) include an explicit check for uninhabited types inside std::mem::{uninitialized, zeroed}, and these functions now panic with a nice error message.

How about std::mem::MaybeUninit?

This new type (on the road to stabilization in 1.36.0) has none of the issues listed above.

  • Dropping a MaybeUninit does not run destructors.
  • The type MaybeUninit<T> is always inhabited even if T is not.

This makes it significantly safer.
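For comparison, here is a sketch of the earlier call_function example rewritten on top of MaybeUninit:

```rust
use std::mem::MaybeUninit;

fn call_function<T>(func: impl FnOnce() -> T) -> T {
    // No `T` exists yet; `out` is just uninitialized storage. If `func`
    // panics, `out` is dropped without running any `T` destructor, and if
    // `T` is uninhabited, `func` can never return, so `assume_init` is
    // never reached.
    let mut out = MaybeUninit::<T>::uninit();
    unsafe {
        out.as_mut_ptr().write(func());
        out.assume_init()
    }
}
```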


C-like enums are not FFI-safe

What to look for: enums appearing in signatures of extern fns.

Summary: It is undefined behavior for an enum in rust to carry an invalid value. Therefore, do not make it possible for C code to supply the value of an enum type.

Incorrect:

#[repr(u16)]
pub enum Mode {
    Read = 0,
    Write = 1,
}

extern "C" fn rust_from_c(mode: Mode) { /* ... */ }

Also incorrect:

extern "C" {
    fn c_from_rust(mode: *mut Mode);
}

fn main() {
    let mut mode = Mode::Read;
    unsafe { c_from_rust(&mut mode); }
}
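A common workaround, sketched here with made-up names, is to keep the raw integer type in the extern signature and validate it on the Rust side before ever constructing the enum:

```rust
#[repr(u16)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Read = 0,
    Write = 1,
}

impl Mode {
    /// Validate a raw value coming from C; a `Mode` is only ever built
    /// from one of its declared discriminants.
    fn from_raw(raw: u16) -> Option<Mode> {
        match raw {
            0 => Some(Mode::Read),
            1 => Some(Mode::Write),
            _ => None,
        }
    }
}

#[no_mangle]
pub extern "C" fn rust_from_c(mode: u16) {
    match Mode::from_raw(mode) {
        Some(mode) => { /* ... use `mode` ... */ let _ = mode; }
        None => { /* reject or report the invalid value */ }
    }
}
```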

8 Likes

@ExpHP This is a wonderful collection of unsafe gotchas! Please consider finding a longer-term host for these so that (1) the effort you expended writing it does not get lost in the forum archives, (2) we can implement some navigation to make it more readable, and (3) people can contribute more gotchas to the list as they come up with new ideas.

(Personally, I know I would have some extra gotchas in the area of low-level multi-threading to contribute. Here be data races and shared &muts!)

3 Likes

For now I moved them into an mdbook. Feel free to contribute!

Thank you! This is great stuff!!

Sent you a PR on thread synchronization gotchas :slight_smile:

Another ingredient to unsafe code review IMO is to view unsafe blocks as having proof obligations. Whenever you dereference a raw pointer, it is your job to show that that pointer is aligned, non-NULL, and points to valid data. Whenever you call an unsafe fn, that function should document why it is unsafe and it is your job to show that the assumptions it makes are satisfied.

So, it is good style to have a comment for every unsafe block saying why the proof obligations in this unsafe block are all met at this point in your code. This generally also requires you to carefully document the invariants that are needed to even make the proof obligations work (e.g. for Vec, that invariant would include things like “the first len many elements of buf are initialized”). Once you have invariants, every instruction that changes the data involved in that invariant now also has a proof obligation! (E.g. for Vec, any time you change len you have to argue that the invariant is still maintained.)

This by no means replaces being aware of specific gotchas that can be collected in a checklist. But I believe this methodology will put you in the right mindset to write unsafe code responsibly and with confidence.
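To illustrate the style with a toy example (not production code; for brevity it omits a Drop impl, so remaining elements leak): a fixed-capacity stack whose declared invariant is “the first len slots of buf are initialized”, with every unsafe block and every change to len justified against that invariant:

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Invariant: the first `len` slots of `buf` are initialized.
struct FixedStack<T> {
    buf: [MaybeUninit<T>; 4],
    len: usize,
}

impl<T> FixedStack<T> {
    fn new() -> Self {
        FixedStack {
            // SAFETY: an array of `MaybeUninit` requires no initialization.
            buf: unsafe { MaybeUninit::uninit().assume_init() },
            len: 0,
        }
    }

    fn push(&mut self, value: T) {
        assert!(self.len < 4, "stack full");
        self.buf[self.len] = MaybeUninit::new(value);
        // Invariant maintained: we just initialized slot `len` before
        // bumping the count.
        self.len += 1;
    }

    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: by the invariant, slot `self.len` (the old top) is
        // initialized. Decrementing `len` *first* transfers ownership of
        // that slot to us, so no later drop can see the moved-out value.
        Some(unsafe { ptr::read(self.buf[self.len].as_ptr()) })
    }
}
```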

8 Likes

I really like this idea! I’ve had to do a fair amount of unsafe coding in the past and found having an “assumptions comment” to be a pretty good tactic.

This sort of advice works well when looking at individual units (e.g. a single unsafe block or FFI call), but how do we teach writing/designing “safe” abstractions at a higher level?

My area of interest is FFI code, and at the moment I’m currently writing Rust bindings to the libsignal-protocol-c library. They’ve got a reference counting system where most dumb data objects will “inherit” from signal_type_base (basically just a reference count and destructor pointer). There’s also a top-level global Context which contains things like locks and a vtable for crypto routines. One of the soundness problems I’m concerned about is that one of these reference counted objects may contain a pointer to our Context, the original Context gets dropped, then later on when the reference counted object is destroyed it tries to access the Context and gets a use-after-free.

How should we go about auditing more high-level problems like this? You could look at each individual unsafe block and never notice that the “safe” abstractions written around them were subtly unsound. Interacting with 3rd party C code also tends to complicate things.

1 Like

It’s the same thing, really. Vec is a “safe abstraction at a high level”, is it not? Whenever you define a type, you get to pick the invariant. This is an extremely important step! It is very helpful to be as precise as you can be here. For a Vec with fields len, cap and buf, I would include things like

  • cap >= len
  • buf points to cap * size_of::<T> many bytes of memory that we fully own
  • (buf as *mut T).add(i) points to something satisfying the invariant of T, for i < len.

Then whenever you change the value of a field you have to make sure that the invariant is still maintained, and whenever you use a value of a field you may assume that the invariant holds, and use that for justifying the local unsafe block. If you do this consistently, correctly and exhaustively, safety of your abstraction follows.

Seems like that would come up when looking at either the unsafe code that accesses the Context (you have to argue that this access is legal, where do you get that knowledge/ownership from?), or the unsafe code that drops the Context (you have to argue that you actually exclusively own the context to be allowed to drop it).

Whenever something like this happens, there has to be at least one unsafe block somewhere that does not properly justify what it is doing. Though, there is a caveat here: once you start having extra invariants, looking at just the unsafe blocks themselves is not enough! You have to look at all code that could break the invariant. I go into this in some more detail in this blog post.
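A minimal illustration of that caveat, with hypothetical names: a method containing no unsafe at all can still break the invariant that an unsafe block elsewhere relies on:

```rust
/// Invariant: `pos <= data.len()` (the unsafe block below relies on this).
pub struct Cursor {
    data: Vec<u8>,
    pos: usize,
}

impl Cursor {
    pub fn new(data: Vec<u8>) -> Cursor {
        Cursor { data, pos: 0 }
    }

    /// No `unsafe` here, but this method can break the invariant:
    /// nothing stops `pos` from exceeding `data.len()`.
    pub fn seek(&mut self, pos: usize) {
        self.pos = pos; // missing: assert!(pos <= self.data.len());
    }

    pub fn next_byte(&mut self) -> Option<u8> {
        if self.pos == self.data.len() {
            return None;
        }
        // SAFETY: requires `pos < data.len()`. This only follows from the
        // invariant `pos <= data.len()` — which `seek` fails to uphold, so
        // auditing this block alone is not enough.
        let b = unsafe { *self.data.as_ptr().add(self.pos) };
        self.pos += 1;
        Some(b)
    }
}
```

Auditing next_byte in isolation looks fine; the bug lives in seek, which is exactly why the whole privacy boundary around the invariant has to be reviewed.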

4 Likes

When you put it that way, you make a good point. I guess if I want to ensure my interface to the underlying C library is sound, I’ll need to check each of my types and come up with a list of constraints, then check each block of unsafe code to make sure those constraints are held.

This part makes things a lot more interesting/annoying. I’m guessing I’ll need to read through the original C library’s source code to see how it’s using arguments when calling a function or constructing an object (e.g. is that pointer a &mut T or a Box<T>?). Then if the C struct keeps a pointer to a particular thing I’ll need to make sure the wrapper struct contains a Pin<Box<T>> to make sure it lives long enough and never moves.

Yeah, all of the C code virtually exists “inside” the unsafe block that you need to write to call an FFI function.

You only need pinning if data is still under control of your Rust-side caller. Most of the time, just passing ownership or a mutable reference in is enough.