Some APIs should have an unsafe version

When using VecDeque and BTreeMap in std::collections, I frequently feel the need of an unsafe version of the APIs that return Option or Result. For example,

fn main() {
    let mut deque = std::collections::VecDeque::new();
    deque.push_back(1usize);
    match deque.pop_front() {
        None=>unreachable!(),
        Some(_)=>todo!(),
    }
}

For me, this code works, but 1) has performance compromise when checking whether the popped value is None at runtime (maybe it got optimized away in this toy example but in general case this is not decidable) and 2) there is something aesthetically unsatisfying about having to write the runtime check when I would rather hand a proof to compiler to tell it that deque can not be empty, or, as a second choice, I could at least assume it in the compiler.

I am not sure this would be a misuse of unsafe, but I am thinking about an api of VecDeque

pub unsafe fn pop_front_unsafe(&mut self)->T;
// as a comparison, the pop_front method's signature is as follows:
// pub fn pop_front(&mut self)->Option<T>{}

which is unsafe because the behaviour is undefined when the VecDeque is empty.

With this API, we can write

fn main() {
    let mut deque = std::collections::VecDeque::new();
    let number = unsafe {
        deque.push_back(1usize);
        deque.pop_front_unsafe()
    };
    todo!()
}

This has some advantages:

  1. The assumption is explicitly made, rather than through an indirection stating that if it returns an Option the Option is exactly of variant Some not `None and
  2. No runtime check.
  3. A clear distinction between maintaining the consistent state of the data structure and its permissible operations (among which unsafe belongs to latter, if I am using unsafe correctly). This makes the data structure more extensible. For example, we could have :
pub fn push_and_pop_if_full<T>(&mut self,t:T)->Option<T> {
    unsafe {
        let capacity = self.capacity();
        self.push_back(t);
        if self.len() > capacity {
            Some(self.pop_front_unsafe())
        } else {
            None
        }
    }
}

As far as I know, the above would not be possible without pop_front_unsafe for a user that does not define struct VecDeque.

Some possible problems I see in doing this:

  1. It is not what unsafe is supposed to do, in which case, I can agree to use other syntactical construct/keyword.
  2. This will make the users of the data structure too powerful and give them too much flexibility.
  3. The APIs would still be too restrictive for many cases anyway, and ultimately we need the VecDeque owner to implement the actual best AND safe APIs for us. For example, we might consecutively insert n elements and then pop the front if needed until its capacity is not exceeded. To efficiently implement this might require touching the buf private field directly.

To make it short, it seems to me that it might be better if we have two (or more) levels of APIs, where at some higher levels the consistent state is maintained with a set of safe and well-defined operations and at lower levels the more fine-grained control which allows for composition of more complex operations that is also safe but more efficient, whose enumeration is a mathematically undecidable problem but whose individual cases could be relatively easy.

What do you think?

There is unsafe unreachable_unchecked in std::hint - Rust, which at least covers the "runtime check" issue.

4 Likes

also, there's unwrap_unchecked for Option and Result:

fn main() {
    let mut deque = std::collections::VecDeque::new();
    deque.push_back(1usize);
    let _ = unsafe { deque.pop_front().unwrap_unchecked() };
}
7 Likes

I would love to see the benchmark where the overhead of the emptyness check or pop_front matters. Mind you there's going to be case distinctions anyways to handle the wrap-around in the VecDeque's ring buffer.

As someone already mentioned, you can already unsafely assert there's no None return value using unreachable_unchecked (edit: or even more conveniently unwrap_unchecked which was also mentioned now, good call @nerditation!) so unless that doesn't optimize as good as it could, the benchmark should already bepossible. And as we all (should) know, unsafe code is only ever worth all the potential troubles it entails if you can prove that it measurably improves performance.

As for APIs, there's not only the burden of them being ever useful but commonly useful. So besides demonstrating that an unsafe unchecked pop_front is ever useful at all, one would need to demonstrate that scenarios where it can be useful are at least somewhat common, i. e. not so incredibly rare that those use cases couldn't get by using unreachable_unchecked instead, and also not so rare that the existence of the unsafe API wouldn't have a majority of users who didn't actually need to be unsafe to begin with.

8 Likes

I have the opposite problem with Vec::remove.
It checks if index is in bounds, but just panics instead of returning None.

3 Likes

The existence of this method actually brings a question for me, which is,

Why do we even allow an unwrap API without being marked unsafe to exist in the first place? Don't we always semantically need to put an axiom (which is what I translate unsafe keyword into in my mind) in a compiler to actually do the unwrapping?

Panics and aborts are memory safe, or speaking more generally, panics (and aborts) don't violate any of Rust's safety guarantees. Panicking or aborting is sometimes the only way to avoid such violations, in fact. Or looping forever I suppose.

4 Likes

In non-generic contexts, returning early with some dummy or garbage values is another alternative. Like the dreaded -1 error values. Oh the horrors of (imagining) an Option<u8> -> i16 unwrapping method that returns -1 for None. Did I say Option<u8> -> i16? I meant Option<i16> -> i16, because who needs unambiguity!

1 Like

Even for something like Vec::pop_back I'd be very surprised if it mattered, because the code needs to load the length regardless in order to decrement it, and once the value is in a register anyway, the check is essentially free.

That said, unwrap_unchecked seemingly doesn't optimize out the length check in practice: https://rust.godbolt.org/z/Wf3c4e33M

It looks like the same underlying issue as `.next().unwrap_unchecked()` on Iterator doesn't optimize as expected · Issue #107681 · rust-lang/rust · GitHub, which apparently on the LLVM side boils down to In a function with a `noundef` return value, replace `ret undef` with `unreachable` · Issue #60717 · llvm/llvm-project · GitHub, which has a patch to fix it but needs clang+libstdc++ to stop being UB before it can land.

5 Likes

The meta-answer here is that unsafe is only for things that make it impossible to see the behaviour in a test. (You can't test for UB in the Rust program itself, because if you hit UB then the program can do whatever it wants -- including appearing to pass the test. Thus you need outside-the-program things to check for it, like MIRI.)

There's an infinite number of correctness problems you can still have in safe Rust code. But at least you can test for them, log about them, etc.

3 Likes

As a question from someone trying to learn, would it be more correct to write

fn main() {
    let mut deque = std::collections::VecDeque::new();
    let _ = unsafe { 
        deque.push_back(1usize);
        deque.pop_front().unwrap_unchecked() 
    };
}

because that single statement pop_front is still unsafe but together with push_back the safety issue is well contained? Or should we just wrap around the call to unsafe function like in yours?

Generally, it is recommended to keep operations which are not unsafe out of an unsafe block, to highlight which operations are actually unsafe and need careful attention, and avoid e.g. accidentally doing another unsafe operation. But this is a readability judgement call. The semantics of the code are the same either way.

1 Like

To add to @kpreid's answer, it's also strongly recommended to add a // Safety: ... comment above the unsafe block which explains why the code is sound (e.g. we just pushed an item into the list, so you'll always get a value when popping).

For example,

queue.push(...);

// Safety: we just pushed an item onto the queue
let item = unsafe { queue.pop_front().unwrap_unchecked() };

Something to keep in mind is that it's quite frowned upon to be using unsafe code when there isn't a pressing need for it (e.g. because it's something innately unsafe like FFI) or if you can't show why not being able to optimise out that branch is a deal breaker in terms of performance. Often, excessive use of unsafe is one of the first things that will jump out at people when reviewing your code and if it's not obvious why the unsafe was needed, they'll get the impression that you are a bit of a cowboy.

The problem with excessive unsafe is that you erode away a lot of Rust's safety guarantees. Sure, you might know that a particular assumption is correct at the time of writing, but by using an unsafe operation you are promising to the compiler that you'll make sure the assumption holds for perpetuity. Doing this once or twice may be justified, but as we've seen by the billions of bugs introduced into code due to incorrect assumptions in the last 70 years, humans aren't always the best at making sure their assumptions are a) correct, and b) stay correct after refactoring or when a new developer takes over.

6 Likes

In particular, every well designed unsafe operation should list the requirements on the caller to keep the operation safe, and every unsafe block should justify why those requirements are met in this block.

unsafe in rust has a specific definition, it's not a "casual" term that you can use to describe anything you think is "bad" or "wrong". safety and correctness are orthogonal concepts. unsafe code can be proved to be correct, while safe code might do the wrong thing.

let v = vec![42];
// safe, and correct
let x = v[0];
// safe, but WRONG
// will panic at runtime, but will NOT trigger MIRI
let x = v[1];
// safe, nothing wrong to a raw pointer itself
let p: *const i32 = v.first().unwrap();
// unsafe, yet correct, safety is ensured by contract:
// SAFETY: v is non-empty with long lifetime, `p` is derived from a valid `&i32` reference
let x = unsafe { p.read() };
// unsafe, and is clearly UB, and MIRI will catch this one.
// can not write to pointer whose provenance is shared immutable reference
// although the code might do "something" without crash at runtime
unsafe { p.cast_mut().write(666); }
// unsafe, and also UB, pointer out of bound is invalid
// it probably will crash at runtime
let x = unsafe { p.offset(0x12345678).read() };
4 Likes

Kind-of tangent to the actual topic, but wouldn't it be better to have a safe method that inserts an element and returns a reference to it? Something like:

fn main() {
    let mut deque = std::collections::VecDeque::new();
    let _item = deque.push_back_get(1usize);
    // do something with _item
}

It's almost the same I would say:

fn runtime_checked_version<T>(deque : &mut std::collections::VecDeque<T>, t:T)->&T{
	deque.push_back(t);
	match deque.back() {
		None=>panic!("Trust me dynamically this is impossible"),
		Some(t)=>t,
	}
}

fn compiletime_assumed_version<T>(deque: &mut std::collections::VecDeque<T>, t:T)->&T {
	// SAFETY: Trust me statically this will always be safe
	unsafe{ 
		deque.push_back(t); 
		deque.back().unwrap_unchecked() 
	}
}

Reminds me of when I first started using Rust to write a game engine type thing. I automatically gravitated towards spamming Results and ? because that was less code to write and seems more "correct" than unwrap(). But that caused me a lot of pain later on because having panics on site would be much more helpful, especially during the learning phase.

Therefore having an easy to write unwrap() is crucial to Rust's success, even if it's not the best thing to do most of the time.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.