Understanding safety limitations of slice::from_raw_parts

Hi everyone,

I have this Rust function:

extern "C" fn foo(ptr: *const c_char, len: c_uint) {
    let s = unsafe { slice::from_raw_parts(ptr as *const u8, len as usize) };
    ...
}

I checked every safety requirement of slice::from_raw_parts for my case, and they all hold except for this one:

The entire memory range of this slice must be contained within a single allocated object! Slices can never span across multiple allocated objects. See below for an example incorrectly not taking this into account.

I have no way to guarantee this in my C code.
The only thing that is guaranteed is that ptr points to len contiguous bytes. They may come from multiple malloc allocations, but they are contiguous.

My question is: what exactly does "allocated object" mean, and why does this safety requirement exist?

I'm not sure it is ever possible to guarantee this, even in C, to be honest. And even if it is, I'm not sure that it isn't UB in C to treat multiple malloc allocations as a single contiguous blob of bytes. On the stack that's definitely disallowed.


I don't think this is a real problem to worry about. malloc never guarantees that it will hand out two adjacent blocks of memory. In practice an allocator usually stores metadata before each allocation, which keeps two allocations from being contiguous, or it uses size buckets that leave unused padding around allocations, which also makes them non-contiguous in most cases. C code can never rely on two malloc calls lining up as one contiguous region.

The warning is mostly about Rust-specific hacks that try to undo slice::split_at, and with malloc that issue doesn't come up. C code doing pointer arithmetic to "split" a single memory block returned by malloc doesn't create separate allocations for Rust's purposes.
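To make the "splitting one block is fine" point concrete, here is a minimal sketch. The helper name split_via_raw is made up for illustration. It derives both halves from the base pointer of one buffer, so each from_raw_parts call stays inside a single allocated object.

```rust
use std::slice;

/// Hypothetical helper: splits one buffer into two halves using raw-pointer
/// arithmetic. Both halves lie inside the single allocation behind `buf`.
fn split_via_raw(buf: &[u8], mid: usize) -> (&[u8], &[u8]) {
    assert!(mid <= buf.len());
    // The base pointer is derived from the whole buffer, so it may be used
    // to reach any byte of that buffer.
    let base = buf.as_ptr();
    // SAFETY: `base` is valid for `buf.len()` bytes, `mid <= buf.len()`,
    // and both ranges stay within the one allocation that backs `buf`.
    unsafe {
        (
            slice::from_raw_parts(base, mid),
            slice::from_raw_parts(base.add(mid), buf.len() - mid),
        )
    }
}
```

What this sketch deliberately does not do is take two unrelated slices that happen to sit next to each other and glue them into one; that is the case the documentation warns about.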


It's LLVM terminology. Here are some quotes from about halfway down this page.

In particular, ptr::offset will cause us a lot of trouble, because it has the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to not have dealt with this instruction, here's the basic story with GEP: alias analysis, alias analysis, alias analysis. It's super important to an optimizing compiler to be able to reason about data dependencies and aliasing.

[...]

When you use GEP inbounds, you are specifically telling LLVM that the offsets you're about to do are within the bounds of a single "allocated" entity. The ultimate payoff being that LLVM can assume that if two pointers are known to point to two disjoint objects, all the offsets of those pointers are also known to not alias (because you won't just end up in some random place in memory). LLVM is heavily optimized to work with GEP offsets, and inbounds offsets are the best of all, so it's important that we use them as much as possible.

[...]

These cases are tricky because they come down to what LLVM means by "allocated". LLVM's notion of an allocation is significantly more abstract than how we usually use it. Because LLVM needs to work with different languages' semantics and custom allocators, it can't really intimately understand allocation. Instead, the main idea behind allocation is "doesn't overlap with other stuff". That is, heap allocations, stack allocations, and globals don't randomly overlap. Yep, it's about alias analysis. As such, Rust can technically play a bit fast and loose with the notion of an allocation as long as it's consistent.
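As a rough illustration of the in-bounds rule from the quotes, the sketch below contrasts the pointer method add, which requires the result to stay within the same allocated object, with wrapping_add, which only computes an address and has no such requirement. The function names read_at and address_past are hypothetical.

```rust
/// Hypothetical helper: reads element `i` through an in-bounds offset.
fn read_at(data: &[u32], i: usize) -> Option<u32> {
    if i >= data.len() {
        return None;
    }
    // SAFETY: `i < data.len()`, so the offset stays inside the allocation
    // behind `data`, which is what `add` requires.
    Some(unsafe { *data.as_ptr().add(i) })
}

/// Hypothetical helper: computes an address that may lie far outside `data`.
/// `wrapping_add` is a safe method with no in-bounds requirement; the
/// resulting pointer is never dereferenced here, only turned into a number.
fn address_past(data: &[u32], elems: usize) -> usize {
    data.as_ptr().wrapping_add(elems) as usize
}
```

Computing an out-of-bounds address with wrapping_add is allowed, but dereferencing such a pointer, or building a slice from it, would still be undefined behaviour.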

If the resulting slice spans memory from more than one malloc allocation, it's language-level (or at least library-level) UB. You may be able to "get away with it" if there's no way for LLVM to know that.
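Since Rust cannot check the single-allocation rule at runtime, one option is to write it into the contract that the C caller must uphold and to check only what can be checked. The following is a sketch under assumptions: bytes_from_c and foo_checksum are hypothetical names, the checksum body merely stands in for the elided body of foo, and treating a null pointer as an empty slice is a design choice made here, not something the original code does.

```rust
use std::os::raw::{c_char, c_uint};
use std::slice;

/// Hypothetical helper: builds a byte slice from a C pointer/length pair.
///
/// # Safety
/// If `ptr` is non-null and `len > 0`, the caller must guarantee that `ptr`
/// points to `len` readable bytes inside ONE allocation, and that the bytes
/// are not mutated while the returned slice is alive. Rust cannot verify
/// any of this at runtime.
unsafe fn bytes_from_c<'a>(ptr: *const c_char, len: c_uint) -> &'a [u8] {
    // `from_raw_parts` requires a non-null pointer even for length 0,
    // so the null and empty cases are handled before calling it.
    if ptr.is_null() || len == 0 {
        return &[];
    }
    // SAFETY: upheld by the caller as described in the contract above.
    unsafe { slice::from_raw_parts(ptr as *const u8, len as usize) }
}

/// Hypothetical stand-in for `foo`: returns a wrapping sum of the bytes.
extern "C" fn foo_checksum(ptr: *const c_char, len: c_uint) -> c_uint {
    // SAFETY: the C caller is responsible for the contract of `bytes_from_c`.
    let s = unsafe { bytes_from_c(ptr, len) };
    s.iter().fold(0, |acc, &b| acc.wrapping_add(b as c_uint))
}
```

The same sentence about the single allocation would also belong in the C header that declares the function, so that the requirement is visible on the calling side.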

