How do you make miri understand joined allocations?

i'm implementing a "path" data structure. path in the vector graphics sense, an array of verbs and an array of points.

the memory layout looks as follows:

struct PathMemory {
    header: PathHeader,
    verbs:  [Verb;  header.num_verbs],
    points: [Point; header.num_points],
}

now, since we obviously can't define such a struct in rust (though it would be nice if we could), this layout is implemented manually using unsafe.

the user primarily interacts with &'a PathHeader (which is exposed under the alias Path).

now to the issue:
i have a method verbs(&self) -> &[Verb] implemented on PathHeader, which, according to miri, triggers undefined behavior. (it just works by offsetting the pointer derived from &self and aligning it. the returned slice contains the correct values.)

the issue seems to be that &self only borrows the bytes of the header, but the verbs are after the header, of course. - if i'm reading this log correctly:

error: Undefined Behavior: trying to retag from <3947> for SharedReadOnly permission at alloc1950
[0x20], but that tag does not exist in the borrow stack for this location
   --> /Users/leddoo/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library
/core/src/slice/raw.rs:100:9
    |
100 |         &*ptr::slice_from_raw_parts(data, len)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |         |
    |         trying to retag from <3947> for SharedReadOnly permission at alloc1950[0x20], but t
hat tag does not exist in the borrow stack for this location
    |         this error occurs as part of retag at alloc1950[0x20..0x24]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but
 the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borro
ws.md for further information
help: <3947> was created by a SharedReadOnly retag at offsets [0x14..0x44]
   --> /Users/leddoo/dev/rug/rug/src/path.rs:267:41
    |
267 |         let ptr: *const Verb = cat_next(self, size_of::<Self>());
    |                                         ^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::slice::from_raw_parts::<'_, rug::path::Verb>` at /Users/leddoo/.rustup/t
oolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/raw.rs:100:9: 
100:47
note: inside `rug::path::PathData::verbs`
   --> /Users/leddoo/dev/rug/rug/src/path.rs:268:9
    |
268 |         core::slice::from_raw_parts(ptr, self.num_verbs as usize)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `main`
   --> examples/api/src/main.rs:13:22
    |
13  |     println!("{:?}", path.verbs());
    |                      ^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrac
e


here's the implementation of cat_next:

#[inline(always)]
pub unsafe fn cat_next<T, U>(base: *const T, base_size: usize) -> *const U {
    unsafe { cat_next_ex(base, base_size, core::mem::align_of::<U>()) }
}

#[inline(always)]
pub unsafe fn cat_next_ex<T, U>(base: *const T, base_size: usize, next_align: usize) -> *const U {
    let result = ceil_to_multiple_pow2(base as usize + base_size, next_align);
    #[cfg(miri)] {
        // miri doesn't like int->ptr casts.
        let delta = result - base as usize;
        return (base as *const u8).add(delta) as *const U;
    }
    #[cfg(not(miri))] {
        return result as *const U;
    }
}

#[inline(always)]
pub fn ceil_to_multiple_pow2(x: usize, n: usize) -> usize {
    debug_assert!(is_pow2(n));
    (x + (n-1)) & !(n-1)
}

i've considered modeling the path memory like so:

struct PathData {
    header: PathHeader,
    blob: [u8],
}

this doesn't really work for me however, as it would make Path (a &PathData in this case) a fat pointer. - though i suppose i could do that in cfg(miri) builds only :thinking:


any ideas for how i could make miri understand what's going on?

actually, maybe this is a bug in my code :thinking:

the &self of a Vec doesn't include the bytes of the vec's values either, but somehow miri understands Vec, and not just the stdlib implementation.

(though again, the resulting slice is correct.)

just for completeness, here's the verbs function:

impl PathHeader {
    #[inline(always)]
    pub fn verbs(&self) -> &[Verb] { unsafe {
        let ptr: *const Verb = cat_next(self, size_of::<Self>());
        core::slice::from_raw_parts(ptr, self.num_verbs as usize)
    }}
}

Note that Vec very carefully stays in pointer-land most of the time, to avoid shrinking what it can address. See Vec: avoid creating slices to the elements by RalfJung ¡ Pull Request #61114 ¡ rust-lang/rust ¡ GitHub

1 Like

what i mean is, in Vec::index, for example, a &Vec<T> is turned into a &T.
but the &Vec<T> doesn't actually cover the bytes of the T - that's behind a NonNull<T>.

similarly in this case, the &PathHeader doesn't cover the bytes of the verbs.
the only difference seems to be that, in case of the path, there's just one allocation, and in case of the vec, there are two.

unless i'm doing something wrong here, ThinVec should, by that logic, have the same issues with miri.

edit1: but ThinVec doesn't have the issue. okay, let's see how they do it.
edit2: hmm, they do exactly the same thing. only difference appears to be that i use a &Header, while they have a struct with a NonNull<Header>.

okay, so that was the issue.

making Path a struct with a NonNull<PathHeader> fixes the issue.

that's kinda annoying though, cause that means PathBuf, which owns the actual data, can't Deref to Path, cause Path no longer is a reference.

okay, i'm going to close this.

having a custom struct as the "reference version" is the correct/clean thing to do. (also don't have to leak the header in the public interface)

and that's going to be annoying until this issue is resolved: https://github.com/rust-lang/rfcs/issues/997
(actually, ownership transfer is not what i'm after, but someone commented about the Foo and "borrowed Foo" idea)

Disclaimer: I am a member of T-opsem but speak here as myself and am not representing the team.

This is exactly

and you can read that issue's surrounding discussion for more information. In short, this pattern with references cannot be accepted under Miri's default Stacked Borrows model — reborrowing as &PathHeader restricts the pointer provenance to only being usable for the bytes of PathHeader — and you need to either use raw pointers (i.e. make your Path<'a> type (NonNull<PathHeader>, PhantomData<&'a PathHeader>) instead) or opt in to the newer, more experimental Tree Borrows model[1].

There are two standard SB-compatible workarounds for this:

  • Have PathRef<'a> be the thin pointer, but it also dereferences to a Path fat type which holds a [u8] slice. &Path is expected to be transient. (This is essentially the pattern my crates erasable and slice-dst exist to support, spelling PathRef as Thin<&Path>.)
  • Have Path be an opaque/extern type, and any actual functionality is done by "laundering" the pointer through exposed provenance via ptr2int2ptr casts (type_ascribe!(self, &Path) as *const Path as usize as *const PathHeader). Exposed provenance is angelic nondeterminism to use whichever previously exposed provenance works, and is mostly supported by Miri (though with a warning and not perfect support). This is essentially how bitvec gets away with its &BitSlice APIs.

  1. As an overly simplified summary of the situation, SB is a reasonably good bet for the strictest dynamic borrow model Rust might formally adopt, and TB is much closer to the most permissive model that justifies the optimization properties Rust wants to maintain for references. (E.g. LLVM noalias, roughly C restrict; that active &mut don't alias other active pointers and that active & don't alias active writable pointers.) â†Šī¸Ž

6 Likes

i just saw you added some more stuff, thanks!

though i'm not quite sure how it would apply to my case.

here's my setup:

// this owns the actual path data.
// (paths are immutable, this also implements `Arc` logic)
pub struct PathBuf<A: Alloc> {
    data: NonNull<PathHeader>,
    alloc: A,
}

// this is used by all the apis that take paths,
// so they don't have to be allocator generic, which would be annoying.
// this also implements all the path logic (like iteration), so it would be
// nice, if an owned `PathBuf` could deref to `Path`.
pub struct Path<'a> {
    data: NonNull<PathHeader>,
    phantom: PhantomData<&'a ()>,
}

i guess, i could use repr(C), so the pointers overlap, and then use that second type_ascribe magic thing to implement deref?