Should I use unsafe merely to encourage users to maintain invariants?

I'm writing some code that wants to rely on an invariant that the compiler can't check, so I'm writing a type which enforces that invariant during construction. However, I also want code that already happens to uphold the invariant to be able to use the functions which require that type.

If the invariant I want to rely on is not upheld, there shouldn't really be any memory unsafety; at worst there might be panics due to out-of-bounds indexes. Should I use unsafe fn to allow creating instances of my type without checking the invariants? These previous threads seem to suggest that using unsafe in this way is fine and won't cause UB, but I'm still unsure enough to ask here.

As a simplified example, let's use the property of whether a Vec is sorted:

struct Sorted<T>(Vec<T>);

impl<T: Ord> Sorted<T> {
    fn new(mut vec: Vec<T>) -> Self {
        vec.sort();
        Sorted(vec)
    }

    unsafe fn new_unchecked(vec: Vec<T>) -> Self {
        Sorted(vec)
    }
}

Is it a good idea to use unsafe on new_unchecked? If I thought I was going to make new_unchecked a lot more complicated later, would it still be a good idea?

4 Likes

No, this isn't a good use of unsafe. By allowing this we water down what unsafe means, which is dangerous because then code that is really unsafe won't be given the attention it needs. By labeling it unchecked you are telling the user to make sure that their vector is sorted, and if they have a logic bug, that's on them.
*_unchecked is the safe way to say: there are some invariants that you need to check, but they are not world-ending, UB-inducing dragons if you don't check them; just some code won't work as expected.
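
For instance, a minimal sketch of that approach for the Sorted example (the exact doc wording here is just illustrative):

struct Sorted<T>(Vec<T>);

impl<T: Ord> Sorted<T> {
    /// Constructs a `Sorted` without sorting or checking.
    ///
    /// If `vec` is not actually sorted, later operations may give
    /// wrong answers or panic, but there is no undefined behavior,
    /// so this does not need to be an `unsafe fn`.
    fn new_unchecked(vec: Vec<T>) -> Self {
        Sorted(vec)
    }
}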

12 Likes

I agree with @RustyYato here -- Rust has a bunch of functions that can easily do bad things, but they're not memory-unsafe so they're not unsafe. (mem::forget and Box::leak being two obvious examples.)

We already get enough people thinking "how bad could it be?" and doing UB; please keep unsafe meaning memory-unsafe only, to reinforce that unsafe really means unsafe. (The usual example I hear for this one is people thinking that str needing to be UTF-8 isn't actually about memory safety -- but it is, because code that processes strings takes advantage of that guarantee in ways that break memory safety if the string isn't UTF-8, such as processing subsequent bytes based on the leading byte of a multi-byte sequence.)

The name being somewhat longer is sufficient to discourage people from using the unchecked version. (You can also consider debug_asserting the condition even in the "unchecked" version.)
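
For instance (a sketch, assuming the Sorted<T> newtype from the opening post):

struct Sorted<T>(Vec<T>);

impl<T: Ord> Sorted<T> {
    /// Skips the sort; the caller is expected to pass data that is
    /// already sorted.
    fn new_unchecked(vec: Vec<T>) -> Self {
        // Catch misuse in debug builds without paying for the check
        // in release builds.
        debug_assert!(vec.windows(2).all(|w| w[0] <= w[1]));
        Sorted(vec)
    }
}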

7 Likes

I'm going to disagree and hope someone will explain to me the difference between the proposed Sorted<T> and something like String.

  • Sorted<T> is a Vec<T> that has the additional invariant of being sorted. String is a Vec<u8> that has the additional invariant of being valid UTF-8.
  • Unsafe code is allowed to assume that a String contains only valid UTF-8. It seems reasonable and potentially useful to allow unsafe code to assume that a Sorted<T> is sorted.
  • String::from_utf8_unchecked is an unsafe fn because it allows you to violate String's invariant. I don't see why Sorted::new_unchecked is different: it allows you to violate Sorted's invariant.
  • There's nothing fundamentally unsafe about a Vec<u8> being non-UTF-8; it's just part of String's contract that it is UTF-8. Again, there's nothing fundamentally unsafe about a Vec being unsorted, but it still seems reasonable for a collection type to have a contract that it just is sorted.

By marking the function unsafe, you're not "watering down" unsafe, you're just signaling that calling the function requires manually upholding an invariant that the compiler can't check for you -- that's unsafe's entire raison d'être, no?
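
To make that concrete, here's a hypothetical method (my own illustration, not anything from the thread's actual code) whose unsafe block is only sound if the sortedness invariant holds; in a design like this, new_unchecked genuinely would have to be unsafe:

struct Sorted<T>(Vec<T>);

impl Sorted<usize> {
    /// Sums `buf[i]` for every index `i` stored in `self`.
    ///
    /// The safety argument: because the vec is sorted ascending, the
    /// last element is the maximum, so checking only the last element
    /// against `buf.len()` proves every element is in bounds. That
    /// argument collapses if the vec is not actually sorted, and then
    /// `get_unchecked` is genuine undefined behavior.
    fn sum_selected(&self, buf: &[u64]) -> u64 {
        let Some(&max) = self.0.last() else { return 0 };
        assert!(max < buf.len());
        self.0
            .iter()
            .map(|&i| unsafe { *buf.get_unchecked(i) })
            .sum()
    }
}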

6 Likes

The OP explicitly stated that this is not assumed. Unless there is a plan to rely on it, marking the function unsafe demands additional guarantees from users that will never actually be used.

1 Like

That's not accurate. A more accurate statement, as @scottmcm discussed above, would be, "String::from_utf8_unchecked is an unsafe fn because it allows you to violate an invariant that is relied on internally in order to uphold memory safety." Notice that it isn't just any old invariant; there must be something to tie that invariant to undefined behavior if it's violated in order for unsafe to be correct.

As an example, if Sorted internally made assumptions about the ordering of elements in a way that memory safety would be compromised if the elements weren't in the right order, then it would be permissible to have an unsafe fn new_unchecked. Otherwise, if it's just an invariant but no memory safety is violated (as indicated by the OP), then a plain fn new_unchecked is suitable. (And I would even document that safety is not compromised if the invariant isn't upheld in the docs for that constructor.)

13 Likes

My only answer to this argument is that if you can't rely on an invariant while writing unsafe code, it doesn't seem like much of an invariant. More like a.. suggestion. But I concede the point.

2 Likes

I'm not entirely certain. A security flaw is worse than a segfault. Allowing code to go off the intended state machine (whether that's the compiler's, or one the developer built themselves -- think writing your own interpreter) is not something you want a user of your library to be able to do without explicitness.

Whether the invariant is relied on internally is a black box / implementation detail. Not marking the function unsafe also prohibits changing it to unsafe later on a stable API.

I can't remember who said this before, but basically the bottom line about Rust is: Freedom from undefined behavior, even in the presence of bugs in safe code. This is in contrast to, say, C++, which only protects you from UB in the absence of bugs, period.

So yes. Unsafe code cannot rely on invariants being upheld by arbitrary code. It can only rely on invariants being upheld by specific code, e.g.:

pub fn get(slice: &[i64], index: usize) -> i64 {
    assert!(index < slice.len());
    unsafe {
        // SAFETY: the assert above guarantees `index` is in bounds.
        slice.as_ptr().add(index).read()
    }
}

The unsafe code above relies on:

  • The precondition checked above it. (i.e. safe code in the same crate)
  • The invariants of &[T].
  • The correctness of the implementation of &[T] in the standard library. (i.e. upstream code)

If it turns out that e.g. the third bullet fails to hold, we can simply fix the standard library. But we can't fix downstream code.


Now contrast with BTreeMap. Ideally, an implementation of Ord::cmp should:

  • Be transitive.
  • Be antisymmetric.
  • Have reflexive equality.
  • Have definitions of e.g. lt and eq that match.

...and it probably shouldn't panic. But unsafe code in BTreeMap cannot rely on any of these invariants! If somebody writes a rogue Ord impl, the implementation of BTreeMap still must not invoke undefined behavior.
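
To see what that means in practice, here is a small sketch (mine, not from BTreeMap's actual implementation) that hands a deliberately inconsistent Ord to the standard BTreeMap. No unsafe appears anywhere, so whatever misbehavior results, it cannot be undefined behavior:

use std::cmp::Ordering;
use std::collections::BTreeMap;

// Hypothetical type with a deliberately inconsistent `Ord` impl.
#[derive(PartialEq, Eq)]
struct Chaotic(u32);

impl PartialOrd for Chaotic {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Chaotic {
    fn cmp(&self, _other: &Self) -> Ordering {
        // Always claims "less than", violating transitivity and
        // consistency with `Eq`.
        Ordering::Less
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Chaotic(1), "a");
    map.insert(Chaotic(1), "b");
    // No `unsafe` appears anywhere above, so whatever happens here
    // (wrong lookups, a surprising length, even a panic) must not be
    // undefined behavior.
    println!("len = {}", map.len());
}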

However, let's say you really do have some unsafe code that needs to rely on an invariant being held by arbitrary code. For instance, suppose you wanted some FastBTreeMap that was able to assume that the Ord impl was correct. To this end, Rust has unsafe traits:

pub struct FastBTreeMap<T> {
    ...
}

/// Marker trait for types that correctly uphold
/// the invariants of Ord, and whose Ord impls
/// do not panic.
///
/// # Safety
/// It is undefined behavior to implement this trait
/// on a type whose Ord impl panics, or that fails
/// to be transitive or antisymmetric.
pub unsafe trait TrustedOrd: Ord { }

impl<T: TrustedOrd> FastBTreeMap<T> {
    // all the standard map methods...
}

Now, if somebody writes a rogue Ord impl and wants to use it in FastBTreeMap, they have to use unsafe:

struct Naughty;

impl PartialEq for Naughty {
    fn eq(&self, other: &Naughty) -> bool { false }
}

impl PartialOrd for Naughty { /* similar nonsense */ }
impl Eq for Naughty { }
impl Ord for Naughty { /* similar nonsense */ }

unsafe impl TrustedOrd for Naughty { }

fn main() {
    // use Naughty in a FastBTreeMap
}

And so as you can see, this is no longer a case of unsafe code relying on the correctness of just any arbitrary code. The downstream code was forced to write unsafe and acknowledge the threat of undefined behavior.

19 Likes

The risk of using unsafe fn to uphold arbitrary invariants is that it undermines the purpose of unsafe, which is to make it possible to locate the errors that lead to bad behavior. One way it does this is by making any invalid trait impl potentially lead to undefined behavior, since some unsafe code could be relying on a proper implementation. (Implying that almost every trait should be an unsafe trait, and... where does it stop?)

I would recommend that anyone who wants to read unsafe code, and has not yet read the Rust Nomicon, read it first.

You can do so here.

2 Likes

You've fallen prey to the common misconception that UB causes segfaults. If that happened reliably, it would be defined behaviour -- see stack overflow, for example, which we make safe by defining it to abort the program, perhaps via a segfault.

A segfault is the best possible outcome of UB. But it's far more insidious than that -- the real problems are when it bypasses security checks (like happened to Linux once), reads the wrong memory (see Heartbleed), etc.

10 Likes

(O.T. I was thinking explicitly about stack overflow being considered safe, but chose to stick with brevity. Of course you have no idea when code that invokes UB will turn out to be exploitable.)

I think some of the replies are going off-track. The topic is about extending the use of unsafe to cover things Rust itself considers safe. (The reverse -- "this is unsafe, but mark it safe" -- is a big no, always.)

If you only care about what the language itself considers safe, then sticking to the Nomicon is fine; it says:
"Rust is otherwise quite permissive with respect to other dubious operations. Rust considers it "safe" to:"
"Delete the production database"
But Rust has no concept of a production database, or of a security flaw (that's a bit broad, so say an information leak, e.g. of a password or security key), or of ACID.
Library authors do have these concepts.

Maybe _unchecked is enough to make such call sites searchable, but unsafe does give explicitness up front and discourages willful misuse. (Think of it like #[must_use].)

1 Like

I interpreted the OP as asking a question about what the correct thing to do is. Make no mistake about it, there is indeed an unambiguous correct thing to do here in the context of how Rust is written today, and that's to not use unsafe for things that are unrelated to undefined behavior.

The separate question of whether this "should" be correct or not has been debated quite a bit in the past via multiple distinct issues. The two most prominent of those, that I can think of, were the decisions to explicitly make leaking safe and to explicitly make catching panics safe. However, it would be a gross mischaracterization to say that Rust (or the standard library) does not care about either memory leaks or panic safety, because both things are quite well protected in Rust programs. (Leaks are almost always correctly handled by Drop and panic safety is upheld via the UnwindSafe trait.) They simply do not fall under the category of unsafe, which is exclusively the territory of undefined behavior.
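
(Both decisions are visible in the standard library's signatures today; a trivial illustration, using only safe code:)

use std::panic;

fn main() {
    // Leaking is a safe operation: no `unsafe` block is required.
    let v = vec![1, 2, 3];
    std::mem::forget(v);

    // Catching a panic is also a safe operation.
    let result = panic::catch_unwind(|| panic!("boom"));
    assert!(result.is_err());
}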

13 Likes

Note, BTW, that the following code from @Ryan1729's opening post is not sufficient for unsafe code to be able to trust that a Sorted is actually sorted.

    fn new(mut vec: Vec<T>) -> Self {
        vec.sort();
        Sorted(vec)
    }

That's because the Ord implementation can be wrong, so the caller needs to use unsafe somewhere to promise that it's actually correct.

(@ExpHP already talked about this, but I wanted to re-emphasize it. The unsafe trait TrustedOrd in their post is one way to make the "promise" I mention in the previous paragraph.)
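
For example, a sketch of how that could be wired together, reusing the hypothetical TrustedOrd trait from @ExpHP's post (none of this is standard-library API):

/// Hypothetical marker trait from @ExpHP's post: implementors promise
/// that their `Ord` impl is correct and does not panic.
pub unsafe trait TrustedOrd: Ord {}

pub struct Sorted<T>(Vec<T>);

impl<T: TrustedOrd> Sorted<T> {
    /// With a trusted `Ord`, sorting really does establish the
    /// invariant, so unsafe code elsewhere may rely on it.
    pub fn new(mut vec: Vec<T>) -> Self {
        vec.sort();
        Sorted(vec)
    }

    /// # Safety
    /// `vec` must already be sorted according to `T`'s `Ord`;
    /// unsafe code is allowed to rely on this.
    pub unsafe fn new_unchecked(vec: Vec<T>) -> Self {
        Sorted(vec)
    }
}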

6 Likes
