Idea: Types of unsafety

Recently, I was working on a program using SIMD, and I wrote some code similar to this:

#![cfg(target_feature = "neon")]

use core::arch::aarch64::*;

// in another file
trait Block: Sized {
    fn compress(src: &[u8]) -> Option<Self>;
    // ...
}

struct NeonBlock(u8x16);

impl Block for NeonBlock {
    fn compress(src: &[u8]) -> Self {
        // SAFETY: This module is only compiled if Neon is available.
        Some(Self(unsafe { vld1q_u8(src.as_ptr) }))
    }
    // ...
}           

It took me a little while to realize that <NeonBlock as Block>::compress was unsound, since it'd read out of bounds if given a slice with fewer than 16 bytes. This happened because vld1q_u8 is both memory-unsafe and unsafe on certain targets (which I'll abbreviate as ISA-unsafe from here on), but I had only considered its ISA-unsafety when I was writing that code. (I didn't immediately notice the out-of-bounds read since in the program I was writing Block::compress is allowed to assume that the input slice is the correct length, but it's safe so it still can't UB when it isn't.)

One solution I thought of for this is to annotate unsafes with the types of unsafety they allow. For example, vld1q_u8 would be declared as unsafe("isa", "memory"). Then, if you try to use it in an unsafe("isa") block (which is what I would have used in the above code), you'd get a compiler error since you hadn't opted into memory unsafety.

The three main kinds of unsafety (that I can think of) are:

  • Thread unsafety, like std::env::set_var, or Send and Sync
  • Memory unsafety, like Vec::set_len
  • ISA unsafety, like most SIMD intrinsics

It's also possible to have multiple kinds of unsafety. For example, raw pointer loads and stores are both thread and memory-unsafe, and SIMD loads and stores have all three kinds of unsafety. (Maybe anything memory-unsafe should also be thread-unsafe, since if you can do weird things with memory you can probably do weird things with other threads' memory.)

Plain unsafe blocks would allow any of these kinds of unsafe operations to be used within them, while plain unsafe functions would only be usable in plain unsafe blocks. I think unsafe("memory", "isa", "thread") shouldn't be considered equivalent to plain unsafe, since that's incompatible with adding additional types of unsafety.

Do you think this a good idea, and did I miss anything?

You adding complexity. Adding generic tags does not audit that conditions are met which has to be on a case by case basis.

Also, don't use unsafe. :innocent:

The safety requirements for unsafe {} code should be encoded in the respective // SAFETY: ... comment. The compiler cannot reason about and hence cannot check those. If it could, we wouldn't need unsafe in the first place.
I believe your fallacy was, that you did not encode the (correct) safety constraints correctly in the respective safety comment, which you should.

If you have different safety requirements for different targets, your code should reflect this as well and provide different impls for each of those targets, each having the respective safety requirements written in their respective // SAFETY: ... comments..

1 Like

I agree that this isn't particularly necessary, considering that thread and memory safety are the only two "real" kinds here (IIUC, the only danger in ISA-unsafety is bad instructions getting executed, leading to violations of one of the other two safeties).

Really, you're just describing invariants, of which there are infinite.

I don't think that's a useful comment. Unsafe is useful. There's a reason Rust bothered to make unsafe easy to audit - because it's necessary.

Of course, don't use unsafe when you can avoid it. But that's very often not the case.

2 Likes

The idea reminds me of this thread:

Redirecting the comment against "using unsafe in the first place", maybe in this case one could separate the unsafeness documentation into two discrete functions?

  1. Standalone unsafe function documenting the input data requirements
  2. Trait implementation stating the platform feature requirement

Of course, this doesn't address the "need to identify all invariants upfront" issue.