Is `alloc` allowed to violate a methods safety requirements?

Consider the current definition of Vec::split_off:

    pub fn split_off(&mut self, at: usize) -> Self
    where
        A: Clone,
    {
        #[cold]
        #[cfg_attr(not(panic = "immediate-abort"), inline(never))]
        #[track_caller]
        #[optimize(size)]
        fn assert_failed(at: usize, len: usize) -> ! {
            panic!("`at` split index (is {at}) should be <= len (is {len})");
        }

        if at > self.len() {
            assert_failed(at, self.len());
        }

        let other_len = self.len - at;
        let mut other = Vec::with_capacity_in(other_len, self.allocator().clone());

        // Unsafely `set_len` and copy items to `other`.
        unsafe {
            self.set_len(at);
            other.set_len(other_len);

            ptr::copy_nonoverlapping(self.as_ptr().add(at), other.as_mut_ptr(), other.len());
        }
        other
    }

it calls set_len before it initialized any of others elements, yet the safety requirements on set_len clearly state:

The elements at old_len..new_len must be initialized.

Is there a reason I'm not seeing that this isn't technically UB?

Should this be fixed?

It is not UB, because UB would only arise if there was some intermediate operation performed, which was optimised based on Vec's safety invariants. The same way having a dangling pointer is not UB, until some operation that is only valid for a non-dangling, non-null pointer is performed.

Code inside a module is allowed to rely on implementation-detail properties of other code inside that module even if the public documentation of those methods is more restrictive. After all, they change together so split_off can change at the same time as set_len.

Well, you have to be technically specific :stuck_out_tongue: It's clearly not language UB, since it didn't violate any language rules, as running MIRI against it can show.

(But it might be library UB.)


That said, if the change is trivial and it better follows always leak first, even if things can't panic, changing it sounds fine.

I would expect such a PR to not just change the code but also improve the safety comments, though -- which would IMHO be more valuable than just whether it meets the docs on set_len.

3 Likes

:smiley: since we're being technically specific MIRI can't show code doesn't contain language UB, (similar to the fact that no amount of tests can show your code is bug free) it can only show that code contains language UB that it's able to detect.

But yea I didn't consider both methods are in the same module nor to differentiate between language and library UB.

MIRI can show that an execution doesn't encounter language UB (at least for the things it can detect, which is at least most), so since you can easily run a program that hits the "new_len is greater than the initialized part" case you can see that that execution doesn't trigger language UB.

What it can't show that code is sound, because soundness is a property of the API, rather than of a particular execution.

Obligatory link to The 'Tootsie Pop' model for unsafe code · baby steps -- basically, all unsafe code depends on the correctness of other code in the same module, even the safe code.

2 Likes

append() does the same.
The example in copy_nonoverlapping() though uses the more cautious approach. This function never panicking is a requirement of not being UB. (const since 1.83.0 but doc reads like in-general there can be panics when just called as regular runtime function.)

set_len is a library concept, which the library defines for its public interface. The library itself doesn't have to follow it in its private implementation, as long as it doesn't do anything to violate language-level safety rules.

1 Like

I understand the second part, but I have no idea how is that related to “Tootsie Pop” and how imagining Tootsie Pop may help you to understand anything.

When dealing with unsafe my own metaphor is of construction site and a crane that operates there with three clearly separate parts: “do not enter area” (where dangerous machinery is in use and civilians are not allowed to enter), the “control cabin” (safe area where the most dangerous part is a pot of tea, but which contains levers that can make the dangerous machinery misbehave and thus is also restricted) and the rest of the world where civilians are allowed.

A more rigorous safety requirement would probably state it in terms of type invariants – constraints that can (and often must be) be temporarily violated by the implementation as long as they're restored before control leaves the abstraction boundary.

Standard library code in general observes the same "as if" rule as the compiler: any rule can be broken as long as the user code provably cannot see it happen.

2 Likes