Unsafe code review request for the inout crate

In RustCrypto we would like to generalize over in-place and buffer-to-buffer modes of operation in our cipher crates. For this purpose I've designed the inout crate which contains several custom reference types built on top of raw pointers. As expected, the crate contains a significant amount of unsafe code. I've done my best to provide safe and robust API without sacrificing potential performance and to my best knowledge the crate does not contain any unsound code, but it would be nice to get a double check from a pair of fresh eyes.

You can find the crate code in this PR: inout: initial crate by newpavlov · Pull Request #675 · RustCrypto/utils · GitHub

Thank you in advance!

1 Like

For everyone who wants to read the documentation of that crate:

AFAICT the best way is something like

git clone -b add_inout https://github.com/RustCrypto/utils.git && cd utils/inout && cargo doc --open
impl<'a, T> InOutBuf<'a, T> {
    /// Create `InOutBuf` from a single mutable reference.
    pub fn from_mut(val: &mut T) -> Self {
        Self {
            in_ptr: val as *mut T as *const T,
            out_ptr: val as *mut T,
            len: 1,
            _pd: PhantomData,
        }
    }

seems like a lifetime might be missing?

1 Like

Seems like a type like InOut<'a, T> could also be implemented with an enum, with variants Same(&'a mut T) and Separate(&'a T, &'a mut T). I suppose performance reasons are why this crate exists (and uses unsafe code)? If so, at least an API-soundness could be achieved by providing an alternative unsafe-free implementations with enums. Perhaps with feature-gates. This way an oversight like a missed lifetime annotation becomes impossible.

It’d also be a good instrument to demonstrate the usefulness of the unsafe code in the first place (since performance comparison becomes trivial).


Edit:

While we’re already at it, looking at something like

            in_ptr: val as *mut T as *const T,
            out_ptr: val as *mut T,

I’m not sure, but it might be necessary to create the pointer first, once, and then copy the pointer.

Thank you! I've applied your patch with the lifetime fix.

It would introduce branching into the very heart of our hot loop code and additional register usage to store the enum tag. Granted, it's a very easy case for branch predictor, but I still would like to get the best possible performance at this level even at the cost of a more complex code.

Also it would make harder to wrap API into C API.

I don't think so. val has type &mut T, so its address should be stable. Though to be extra-safe I will create a temporary *mut T as you suggest.

Just my intuition on how stacked-borrows work. Creating the second pointer from the mutable reference might invalidate the first one. Certainly it can’t hurt to create the pointer first and copy it.

Testing something like

fn main() {
    let mut x = 42;
    let y = &mut x;
    let z1: *const i32 = y;
    let z2: *mut i32 = y;

    let _x = unsafe { *z1 };
    unsafe { *z2 = 1 };
    let _x = unsafe { *z1 };
}

miri seems to agree with

MIRIFLAGS="-Zmiri-track-raw-pointers" cargo +nightly miri run

Compared to


fn main() {
    let mut x = 42;
    let y = &mut x;
    let z: *mut i32 = y;
    let z1: *const i32 = z;
    let z2: *mut i32 = z;

    let _x = unsafe { *z1 };
    unsafe { *z2 = 1 };
    let _x = unsafe { *z1 };
}

Hm, the following code passes MIRI without issues:

fn main() {
    let mut x = 42;
    let y = &mut x;
    let z1 = y as *mut i32 as *const i32;
    let z2 = y as *mut i32;

    let _x = unsafe { *z1 };
    unsafe { *z2 = 1 };
    let _x = unsafe { *z1 };
}

Interesting.

Just to check, are you running with MIRIFLAGS="-Zmiri-track-raw-pointers"?

Yes, I tested it.

In any case, the difference is subtle enough that it’s still worth taking the clearly uncontroversial approach of creating the pointer first and copying that, IMO :slight_smile:

2 Likes

I think the operational difference is that you've written let z: *mut _ = y;, as opposed to let z = y as *mut _;. The former "uses" y as a mutable reference, the latter only as a pointer.

I don't think these should behave differently, though. Perhaps this is a side effect of what allows &raw to work? (Or rather, what allows &place as *const _ to not manifest a reference.)

cc @RalfJung

From my tests, the relevant difference seems to be whether a *mut T is created first, or &mut T “directly” turns into *const T.

That holds with my hypothesis; (x: &mut _) as *const _ coerces the &mut to *mut before casting to *const. Or maybe it coerces to & and then casts to *const, I'm not sure.

Either way, it seems that let y = x as *mut _ is not doing an SB retag operation, whereas let y: *mut _ = x is.

Maybe I wasn’t clear. I can observe no difference in behavior of miri between let y: *mut _ = x and let y = x as *mut _ at all. The only difference I can observe is between whether *const _ is created from &mut _ directly or whether *mut _ is created first. Any choice between explicit casting with as or coercing with a type signature seems to be insignificant. I haven’t tested / compared with creating &_ from &mut _ first to get a *const _. I won’t be doing any further tests on this today.

1 Like

Those seem to generate the same MIR, so that should not make a difference.

That definitely makes a difference: the generated MIR for the latter case will turn &mut T into &T first and then turn that into *const T.

*const vs *raw does matter when converting from an &mut; with *const the result is a read-only pointer that may not be used for writing. Basically, when you cross the boundary between references and pointers, you have to use the right type (*const vs *mut, and alkso the right T to determine how much memory is accessible by this raw ptr, and interior mutability in case of *const). Afterwards, the raw ptr type no longer matters. Also see:

1 Like

Okay, so the “turn into &T first” part makes sense. What I’m not fully understanding is how

fn main() {
    let mut x = 42;
    let y = &mut x;
    let z1 = y as *mut i32;
    let z2 = y as *mut i32;
    unsafe { *z1 = 1 };
    unsafe { *z2 = 1 };
    unsafe { *z1 = 1 };
    unsafe { *z2 = 1 };
}

is not a problem according to miri. What is the value being coerced into *mut i32 here when writing y as *mut i32? It doesn’t seem to be a re-borrow, because

fn main() {
    let mut x = 42;
    let y = &mut x;
    let z1 = &mut x;
    let z1 = z1 as *mut i32;
    let z2 = &mut x;
    let z2 = z2 as *mut i32;
    unsafe { *z1 = 1 };
    unsafe { *z2 = 1 };
    unsafe { *z1 = 1 };
    unsafe { *z2 = 1 };
}

is not accepted by MIRIFLAGS="-Zmiri-track-raw-pointers" cargo +nightly miri run. Is this difference genuine or is the former code falsely accepted by miri? Or is y as *mut i32 somehow special in not consuming the passed reference and also reproducibly creating copies of the same pointer that are allowed to alias? This seems a bit dangerous to me because re-borrowing mutable references happens implicitly all the time, so for example

fn main() {
    let mut x = 42;
    let y = &mut x;
    let z1 = y as &mut i32 as *mut i32;
    let z2 = y as *mut i32;
    unsafe { *z1 = 1 };
    unsafe { *z2 = 1 };
    unsafe { *z1 = 1 };
    unsafe { *z2 = 1 };
}

is rejected by miri, too.

FWIW, I find this surprising as well. The only explanation would be that the MIR lowering (it looks like y as *mut i32 acts as a ptr::addr_of_mut!(*y)) wouldn't count as a "full usage" of y, whereby not activating its assertion of unicity, and directly yielding a pointer with "Shared ReadWrite" provenance.

It mainly means that (y) as i32 features different semantics than those of {y} as *mut i32 or { &mut *y } as *mut i32 which, I think, can be legitimately surprising, but it also kind of makes sense when we think about it.

If foo as *mut i32 is really supposed to be equivalent to ptr::addr_of_mut!(*y), and not equivalent to { &mut *y } as *mut i32, then perhaps, we have to add as-casting (to a pointer type) to the list of place expression contexts?