Language: annotate `unsafe` with invariants?

Hi,

I was wondering whether it would make sense for Rust to offer an "extension" of unsafe whereby we could annotate it with something denoting the invariants that it expects from callers.

For example, something like unsafe<'a> would denote the invariant 'a that callers must assert (which would be documented somewhere).

This idea comes from the following code snipped: assume that we store an array of utf8-encoded u8's of a fixed size size as &[u8].

The function to get a specific element i is

impl MyUtf8Array {
    fn value(&self, i: usize) -> &str {
        let slice = &self.values[i * self.size..(i+1) * self.size];
        // soundness: we checked that every item is valid utf8 on the constructor
        return unsafe {std::str::from_utf8_unchecked(slice)}
    }
}

we also provide an extra method,

impl MyUtf8Array {
    /// # Safety
    /// `i` must be smaller than `self.values.len() / self.size`;
    unsafe fn value_unchecked(&self, i: usize) -> &str {
        // soundness: the invariant.
        let slice = self.values.get_unchecked(i * self.size..(i+1) * self.size);
        // soundness: we checked that every item is valid utf8 on the constructor
        return std::str::from_utf8_unchecked(slice)
    }
}

the issue is that, on the second example, there are two independent invariants (say 'a and 'b) that we must uphold, and the fn unsafe masks them both.

If the from_utf8_unchecked's unsafe would be annotated (e.g. 'b), we could write things like

impl MyUtf8Array {
    /// # Safety
    /// `i` must be smaller than `self.values.len() / self.size`;
    unsafe<'a> fn value_unchecked(&self, i: usize) -> &str {
        // soundness: the invariant.
        let slice = self.values.get_unchecked(i * self.size..(i+1) * self.size);
        // soundness: we checked that every item is valid utf8 on the constructor
        return unsafe<'b> {std::str::from_utf8_unchecked(slice)}
    }
}

for backward compatibility, when a user writes unsafe without any annotation, we would assume that it covers all invariants inside the block it annotates.

This could make it easier to read which invariants each part of the code base covers. We could even use things like 'a < 'b to indicate that invariants 'b covers all invariants 'a.

Not sure, trying to express a problem whereby a block of unsafe masks certain invariants just because we have no mechanism to express the existence of different invariants in the language.

2 Likes

It's certainly an interesting idea.

Introducing new syntax for this is probably overkill, but what about adding a #[safe] attribute to the block that documents invariants/assumptions?

let x: u32 = 42;
let x_ptr = &x as *const u32;

#[safe(
    reason = "The pointer points to a valid integer",
    requires = "!x_ptr.is_null()"
)]
unsafe {
    assert_eq!(42, x_ptr.read());
}

https://crates.io/crates/safe

2 Likes

Naming local things remotely isn't a great pattern. Is the idea to help reviewers by forcing some sort of indicator of exactly which function calls are unsafe?

I think unsafe in unsafe could help, in combination with keeping unsafe blocks as small as possible.

impl MyUtf8Array {
    /// # Safety
    /// `i` must be smaller than `self.values.len() / self.size`;
    unsafe fn value_unchecked(&self, i: usize) -> &str {
        /* n.b. cannot use `unsafe` without an `unsafe` block in here */
        // soundness: the invariant.
        let slice = unsafe { self.values.get_unchecked(i * self.size..(i+1) * self.size) };
        // soundness: we checked that every item is valid utf8 on the constructor
        unsafe { std::str::from_utf8_unchecked(slice) }
    }
}

Although in this particular case, it's pretty clear from the function names (_unchecked) that there's an invariant to follow up on.

1 Like

One thing that doesn't help with, though, is the "there are three invariants but you only checked two of them" problem, which is the core thing I'd like from a feature like how I interpreted the OP.

Strawman: the function could have

#[safety_conditions(
    aligned_pointer = "The pointer `p` must be correctly aligned.",
    n_bytes = "`p` must be the start of `n` bytes of unitialized memory",    
)]

And then the lint would require that the unsafe block in the caller would have, say,

#[safety_reasons(
    aligned_pointer = "it came from a reference, so it's aligned correctly",
    n_bytes = "we static_assert that the type is large enough"
)]

where if either of them is missing it lints that you forgot to check something.

(That attribute on the function could even generate a #[doc] for the # Safety section of the rustdoc.)

3 Likes

Maybe I should have been more explicit:

we have no mechanism to express the existence of different invariants to the compiler in the language

Yes, comments are the current documentation of unsafe. imo the fact that many of such comments exist out there (with even "rules" on how to document unsafe inline and in docs), hints that people have been formalizing how to express these invariants.

The idea here would be to try to come up with a mechanism to express invariants (or set of invariants) to the compiler / linter itself, so that it can guide users whenever some invariant is missing.

I think I've seen a few of those type of crates : contracts and libhoare before it. I think I saw another one which seemed quite good and acted at compile time and was designed to specifically for unsafe code but can't remember its name.

Are you thinking of Dependent Types?

This is where a value's type can be influenced by the value itself. The simplest example is an array which knows its length at the type level. You could imagine writing code like this:

// If we combine an array of length N and an array of length M, we get 
// an array of length N+M.
fn combine_arrays(first: Array<N>, second: Array<M>) -> Array<N + M> { ... }

// by using the min() function we know certain things about the 
// return value
fn min(first: u32, second: u32) -> u32 
where 
  return_value <= first,
  return_value <= second { ... }

// Make sure indexing only compiles when your index is within bounds
// at compile time.
impl Index for Array<N> {
  fn index(&self, index: usize) -> &T where index <= N { .... }
}

// We can make certain assertions about the arguments and return
// value and the compiler will fail to typecheck if they aren't satisfied.
fn get_ptr(array: Array<N>, index: usize) -> *const T
where
  index <= N,
  return_value != null,
  self.start() <= return_value < self.end(),
{ ... }

I know people have asked about adding a full dependent type system to Rust in the past, but it hasn't proceeded past the RFC stage. As alluded to by the syntax of my example you can get a limited form of this using const generics, but that probably won't take it as far as you like.