What is the difference between ptr::offset() and ptr::wrapping_offset()?

These two methods seem to do almost the same thing, except that one is safe and the other is not.

From a C programmer's perspective, offset arithmetic on a pointer is as simple as an add/sub operation on usize type.

So why does Rust have two distinct offset methods? What is the concern?

The statement below comes from the doc of ptr::wrapping_offset():

The resulting pointer “remembers” the allocated object that self points to; it must not be used to read or write other allocated objects.

In other words, let z = x.wrapping_add((y as usize) - (x as usize)) does not make z the same as y even if we assume T has size 1 and there is no overflow: z is still attached to the object x is attached to, and dereferencing it is Undefined Behavior unless x and y point into the same allocated object.

The statement, "let z = x.wrapping_add((y as usize) - (x as usize)) does not make z the same as y even if we assume T has size 1 and there is no overflow", confused me a lot.

I have written a simple toy program for testing:

fn main() {
    let s: &str = "123456789";
    let x: *const u8 = s.as_ptr();
    let y: *const u8 = (s.as_ptr() as usize + 4) as *const u8;
    let off: isize = (y as isize) - (x as isize);
    println!("off {:?}", off);
    let z = x.wrapping_offset(off);
    println!("x {:p}, y {:p}, z {:p}", x, y, z);
}

Output:

off 4
x 0x102ff6920, y 0x102ff6924, z 0x102ff6924

This tells me that z now points to the same address as y. What does "z is still attached to the object x is attached to" mean?

C has the same limitations.

I know that C has such a limitation. But in C, all code is "unsafe": the programmer can cast pointers to integers, do the math on those, and then convert the integers back to pointers. The C programmer is obliged to make sure the pointer is valid.

In Rust, however, dereferencing a raw pointer is certainly "unsafe". But why is ptr::offset() "unsafe" as well? To me, ptr::offset() can return any address that is the result of arithmetic on raw pointers, and that is safe because we don't dereference it.

I cannot work out the difference between wrapping_offset() and offset(). Maybe some examples can explain it.

First of all, with the LLVM backend they produce different LLVM IR.

Casting to an integer, doing some math, and casting back is NOT the same as pointer arithmetic. You can see that the two produce different LLVM IR even at -O3.

In C, plain pointer arithmetic is itself UB if it produces a pointer outside the bounds of the object (more precisely, beyond one past its end). That's why ptr::offset() is itself an unsafe fn. wrapping_offset() preserves the pointer's provenance even if the result is not a valid pointer into the same object, but dereferencing such a result is still UB.
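A minimal sketch of that distinction, assuming a five-byte array as the allocated object. The function names `third_via_offset` and `out_and_back` are hypothetical and exist only for illustration:

```rust
/// Reads element 2 using `offset`. The call itself is `unsafe` because the
/// result must stay inside the allocation (or one past its end).
fn third_via_offset(data: &[u8; 5]) -> u8 {
    let p: *const u8 = data.as_ptr();
    // SAFETY: index 2 is in bounds of the 5-byte array.
    unsafe { *p.offset(2) }
}

/// Leaves the allocation with `wrapping_offset`, then comes back.
/// Computing the out-of-bounds pointer is safe; only the read needs `unsafe`.
fn out_and_back(data: &[u8; 5]) -> u8 {
    let p: *const u8 = data.as_ptr();
    // Far out of bounds: fine to compute, but it must not be dereferenced.
    let far = p.wrapping_offset(1_000);
    // Back in bounds and still derived from `p`, so it may be read.
    let back = far.wrapping_offset(-998);
    // SAFETY: `back` points at index 2 of `data` and carries its provenance.
    unsafe { *back }
}

fn main() {
    let data = [10u8, 20, 30, 40, 50];
    println!("{}", third_via_offset(&data));
    println!("{}", out_and_back(&data));
    // By contrast, `data.as_ptr().offset(1_000)` would be undefined behavior
    // on its own, even if the result were never dereferenced.
}
```

Both functions read the same element; the difference is only in which step carries the in-bounds obligation.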

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2263.htm

https://www.ralfj.de/blog/2020/12/14/provenance.html

I think that I have reached that point.

The final assembly is identical, but the intermediate LLVM IR is different. The two methods reach the same result in different ways; the point is to help the compiler optimize the code as much as possible.

Many programmers think that. They're wrong, which is a general theme that many of Ralf's blog posts discuss.

I've never actually encountered a time where wrapping_offset was the right choice. While it may be safe, that doesn't mean it's necessarily useful. It's like how let p = 7 as *const i32; is safe, but there really isn't anything useful you can do with it.

Basically, I'm not aware of any situation in which *p.wrapping_offset(i) works but *p.offset(i) doesn't. It's just a matter of whether you hit the UB in the offset call or in the *.

This is clearer when you look at the inverse function. There's an offset_from, and there used to be a wrapping_offset_from, but it was removed (deprecate wrapping_offset_from by RalfJung · Pull Request #73580 · rust-lang/rust · GitHub) for being too niche to bother existing.
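For reference, a small sketch of the inverse operation with `offset_from`, assuming both pointers are derived from the same slice. The helper name `distance` is hypothetical:

```rust
/// Returns the signed distance, in elements, from index `i` to index `j`,
/// measured with raw pointers derived from the same slice.
fn distance(data: &[u32], i: usize, j: usize) -> isize {
    assert!(i <= data.len() && j <= data.len());
    let base: *const u32 = data.as_ptr();
    // SAFETY: `i` and `j` are at most `data.len()`, so both pointers stay
    // inside the allocation or exactly one past its end.
    let (a, b) = unsafe { (base.add(i), base.add(j)) };
    // SAFETY: `a` and `b` are derived from the same allocation and are a
    // whole number of `u32` elements apart.
    unsafe { b.offset_from(a) }
}

fn main() {
    let data = [1u32, 2, 3, 4, 5];
    println!("{}", distance(&data, 1, 4));
    println!("{}", distance(&data, 4, 1));
}
```

Like `offset`, `offset_from` places its same-allocation requirement on the call itself rather than on a later dereference.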

To work from your code, rather than linking to the topic as the others did:
Your s was the only allocated object in the program, but the doc is referring to two different ones.

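fn main() {
    let sx: &str = "123456789";
    // Different contents, so the two literals cannot be merged into one allocation.
    let sy: &str = "abcdefghi";
    let x: *const u8 = sx.as_ptr();
    let y: *const u8 = (sy.as_ptr() as usize + 4) as *const u8;
    let off: isize = (y as isize) - (x as isize);
    println!("off {:?}", off);
    // z has the same address as y, but is still attached to sx's allocation.
    let z = x.wrapping_offset(off);
    println!("x {:p}, y {:p}, z {:p}", x, y, z);
    // Undefined behavior: z is dereferenced outside the object it was derived from.
    println!("{}", unsafe { *z });
}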

y and z still show the same address. Dereferencing z is undefined behaviour. It may "work" on the current compiler, but some random code change (which you expect or assume is unrelated) or a change of compiler may no longer give the same output, or the program will fail to compile if you're lucky.
You can run the above on https://play.rust-lang.org/ with Miri, which reports the error.

The only one I can think of is the " 'launder' / replace provenance of a ptr" pattern:

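struct Provenance<T>(*const T);

// Inherent impls on raw pointers are not allowed, so use an extension trait.
trait PtrExt<T> {
    fn provenance(self) -> Provenance<T>;
    fn with_provenance(self, provenance: Provenance<T>) -> *const T;
}

impl<T> PtrExt<T> for *const T {
    fn provenance(self) -> Provenance<T> {
        Provenance(self)
    }

    fn with_provenance(self, provenance: Provenance<T>) -> *const T {
        let Provenance(inner) = provenance;
        // "inner + (self - inner) ≈ self", computed in bytes so it holds for any T
        let diff = (self as usize).wrapping_sub(inner as usize);
        inner.cast::<u8>().wrapping_add(diff).cast::<T>()
    }
}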

Btw, regarding the OP, I find the documentation on wrapping_… to be quite fantastic already:

So that gives you the only meaningful use of a wrapping_… operation that goes out of bounds: for such an offset to be reversed at some point before any actual dereference or other unsafe operation.


There also seems to be the case of, say, mmapping a file larger than 2 GiB (isize::MAX) on a 32-bit architecture. In that case you can't directly index into that mmap buffer of u8s with ptr::add() once the index goes beyond 2 GiB: while you'd technically be within the boundaries of the allocated object, the pointer arithmetic required would simply not be supported by the optimized .add() implementation of the backend, such as LLVM. (FWIW, u8_buffer.add(idx / 2).add(idx / 2).add(idx % 2) would be another way of circumventing that limitation, and one that would still take advantage of telling the compiler that we remain within the same allocation.)

I suppose you could use it if you wanted a "one-indexed" array by subtracting one from the pointer.
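A rough sketch of that idea, assuming a plain slice as the backing storage. The helper name `get_one_indexed` is hypothetical:

```rust
/// Reads element `i` of `data` using 1-based indexing.
fn get_one_indexed(data: &[i32], i: usize) -> i32 {
    assert!(i >= 1 && i <= data.len(), "index out of range");
    // One element *before* the start of the slice. `sub(1)` would be
    // undefined behavior here; `wrapping_sub(1)` is safe to compute.
    let base: *const i32 = data.as_ptr().wrapping_sub(1);
    // Adding an index in 1..=len brings the pointer back in bounds.
    let elem = base.wrapping_add(i);
    // SAFETY: `elem` points at `data[i - 1]` and is derived from `data`.
    unsafe { *elem }
}

fn main() {
    let data = [7, 8, 9];
    println!("{} {}", get_one_indexed(&data, 1), get_one_indexed(&data, 3));
}
```

The out-of-bounds `base` is never dereferenced; every read goes through a pointer that has been moved back inside the slice.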

Or if the iteration just stops somewhere beyond one past the end, that's OK with wrapping_add too. I think that's the example in the doc that I added.

I.e., iteration with an end one past the last element is OK with both methods; if you step by more than 1 and the end lies further out, use wrapping_offset.
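A sketch of that strided iteration, modelled loosely on the example in the wrapping_add documentation. The helper name `every_nth` is hypothetical:

```rust
/// Collects every `step`-th byte of `data`, starting with the first.
fn every_nth(data: &[u8], step: usize) -> Vec<u8> {
    assert!(step > 0);
    let mut out = Vec::new();
    let mut ptr: *const u8 = data.as_ptr();
    // Round the length up to a multiple of `step`; the end pointer may
    // then lie more than one past the end, which `add` would not allow.
    let rounded_len = (data.len() + step - 1) / step * step;
    let end_rounded_up = ptr.wrapping_add(rounded_len);
    while ptr != end_rounded_up {
        // SAFETY: `ptr` is reached in whole steps from the start and is
        // strictly before `end_rounded_up`, so it is inside `data`.
        out.push(unsafe { *ptr });
        ptr = ptr.wrapping_add(step);
    }
    out
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    println!("{:?}", every_nth(&data, 2));
}
```

With five elements and a step of 2, the end pointer sits at offset 6, so only the wrapping variant can compute it without undefined behavior.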

You can read more in the RFC; LLVM is indeed the motivator.

Oh, that makes a great citation:

almost no code should actually be using wrapping offsets

https://rust-lang.github.io/rfcs/1966-unsafe-pointer-reform.html#overload-operators-for-more-ergonomic-offsets
