Raw pointer evolution, why not use references?

I have repeatedly worked with C APIs like this:

struct ApiWrapper {
   inner: X,
}
unsafe { 
    let p = c_api_create();
    if p.is_null() {
         //handle errors    
         return;  
    }
    api_wrapper.inner = p;
}

In other words, the C API returns a non-null pointer, and this pointer can then be passed to other API calls.
The memory is allocated internally by the C library with malloc, so it is properly aligned.
At first I used *mut CType as is. Then I found out that there is std::ptr::NonNull,
and I used that type instead. But now I wonder: why not just use a mutable reference, &mut CType?
Some methods of NonNull are unsafe while &mut CType is safe to use, and I know that
this is a non-null, properly aligned pointer to an allocated chunk of memory.
So is there any reason to use NonNull instead of a mutable reference?

For example, because C APIs usually don't respect Rust's aliasing rules. If you naïvely wrap such a library using references, then you can end up with aliasing mutable references, which is UB.


Do you have an example of how this can happen?

All the ways I can imagine (a function accepting a const pointer to CType and modifying the contents of CType;
c_api_create saving the pointer to CType in global memory and modifying it in a thread running in parallel with the Rust code) are also UB in terms of the C language.

This, at least, is not UB in C. Modifying a const variable is UB, but you can take a const pointer to a non-const variable.

Also, in the C world it's common to share non-const pointers everywhere and document that the data must not be modified without proper synchronization. Many people think it's not worth putting the const keyword everywhere, as it guarantees nothing.


Interesting data point: CXX (a bridge between Rust and C++) translates non-const C++ methods not as taking &mut self, but self: Pin<&mut Self>, to avoid accidental swapping (in C/C++ you're not allowed to move objects as freely as in Rust). See the extern "C++" chapter of the CXX documentation (Rust ♡ C++).

Back to returning a &mut reference as a handle: what lifetime would you give it? The naive solution would give you &'static mut, which is certainly not good.

Perhaps Pin<Box<X, CustomAlloc>> could work then? (Here CustomAlloc would be an allocator that calls the correct c_api_free on dealloc.) I'm not sure whether it's an idiomatic or recommended solution in Rust, but in C++ unique_ptr<X, CustomDealloc> is very often used to wrap C APIs into nice handles.


As I remember, the phrase in the standard for this case is "may be UB", for cases like read-only memory,
but let's imagine that the C code contains neither UB nor "may be UB".
That problem is not Rust-specific: I would have the same problems if I used C instead of Rust to call the C library.
Let's say I use an "ideal C library". What advantages does NonNull<CType> bring over &mut CType?

You definitely can't have fn c_api_create() -> &mut Type, even in Rust. It's semantically incorrect for functions creating new objects, and wouldn't compile.

Rust references are not an equivalent of C pointers. They have a very different role. They are temporarily borrowing data owned somewhere else. They can't exist on their own. You can't create a new object without an owner, and you can't have a reference that doesn't borrow from any specific location in user code (a reference that "borrows" from the heap is a Box, not &).


It's not very useful, but it is possible to write such a function in safe Rust:

fn c_api_create<'a>() -> &'a mut String {
    Box::leak(Box::new(String::from("hello!")))
}

And a special case:

pub fn foo<'a, T>() -> &'a mut [T] {
    &mut []
}

When I said "can't", I meant it in the same sense as "you can't eat soup with a fork". Yes, actually, you can eat soup with a fork if you freeze the soup into ice cubes, or spin the fork really fast…


But what is the difference between a custom allocator written purely in Rust and this case?

In an allocator you mark address space as "used" and return a "raw" pointer,
and then at some "upper" level you get a reference.
So in the standard library this looks like:

extern "C" fn malloc(...) -> *mut c_void
let b = Box::new();//<- malloc inside
let r: &mut = &mut *b;

In the case of a C library:

extern "C" fn c_api_create() -> *mut CType;
let p = c_api_create();
let r: &mut = &mut *p;

So what is the difference?

Or another example: let's say the C API returns a pointer to a non-opaque type:

#[repr(C)]
struct Foo { field: c_int }
extern "C" fn f() -> *mut Foo;

How can you read field in Rust without creating a temporary reference?
So is all Rust code that works with #[repr(C)] incorrect?

No, you're mixing up temporary loans (what Rust calls "references") with "passing by reference" from other languages. In Rust, the type for returning new heap-allocated mutable objects by reference is Box, not &mut.

&mut doesn't give you an object, it gives you a temporary permission to access someone else's object. This distinction is invisible in C where * is used for both, but it matters in Rust.

You could have extern "C" fn f() -> Box<Foo> if you ensured it used Rust's allocator. Box<Foo> is FFI-safe. Option<Box<Foo>> is also FFI-safe, and is represented as a nullable pointer (assuming Foo is "sized", so that the pointer is not a fat pointer).
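To make the Box point concrete, here is a minimal sketch. The names foo_create and foo_free are made up for illustration, and both sides are assumed to use Rust's allocator; a real export would also need a no_mangle attribute, which is left out here.

```rust
use std::mem::size_of;

#[repr(C)]
pub struct Foo {
    field: i32,
}

// Hypothetical constructor exposed to C: ownership of the allocation
// moves to the caller, who sees a plain non-null pointer.
pub extern "C" fn foo_create() -> Box<Foo> {
    Box::new(Foo { field: 10 })
}

// Hypothetical destructor: C hands the pointer back (possibly NULL),
// and dropping the Box frees it with Rust's allocator.
pub extern "C" fn foo_free(foo: Option<Box<Foo>>) {
    drop(foo);
}

// Creates a Foo, reads its field, and gives it back to be freed.
pub fn roundtrip() -> i32 {
    let foo = foo_create();
    let value = foo.field;
    foo_free(Some(foo));
    value
}

// Option<Box<Foo>> uses the null pointer for None, so it has the
// same size as a raw pointer.
pub fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<Foo>>>() == size_of::<*mut Foo>()
}

fn main() {
    assert!(option_box_is_pointer_sized());
    assert_eq!(roundtrip(), 10);
    foo_free(None); // a null pointer is accepted and simply ignored
}
```

This only works when the memory really comes from Rust's allocator; a pointer obtained from the C library's malloc should not be turned into a Box.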


So the code below is UB?

#[repr(C)]
struct Foo {
    field: i32,
}

fn main() {
    unsafe {
        let p = libc::malloc(std::mem::size_of::<Foo>()) as *mut Foo;
        let foo = &mut *p;
        foo.field = 10;
    }
}

No, because your &mut isn't leaving a function scope. This is a very different situation from let p: &mut Foo = c_api_create();. Merely returning &mut to heap data wouldn't be UB, but it would be UB if you tried to free(p), because &mut must always point to valid allocated data, so it's impossible to free it (in Rust you can drop data only after all references to it are gone).
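A minimal sketch of that pattern, assuming std::alloc as a stand-in for malloc/free so that it runs without the libc crate: the raw pointer is what gets freed, and the &mut only exists inside an inner scope that ends before the deallocation.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

#[repr(C)]
struct Foo {
    field: i32,
}

fn scoped_borrow() -> i32 {
    let layout = Layout::new::<Foo>();
    unsafe {
        // The raw pointer plays the role of the owning handle.
        let p = alloc(layout) as *mut Foo;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        // Initialize the memory before any reference to it is created.
        p.write(Foo { field: 0 });

        let value = {
            // Short-lived exclusive loan; it ends with this block.
            let foo: &mut Foo = &mut *p;
            foo.field = 10;
            foo.field
        };

        // No reference is live any more, so freeing through the
        // raw pointer does not invalidate a borrow.
        dealloc(p as *mut u8, layout);
        value
    }
}

fn main() {
    assert_eq!(scoped_borrow(), 10);
}
```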

So is this UB?

#[repr(C)] struct Foo { field: i32 }
struct Boo<'a> {
   foo: &'a mut Foo
}
impl<'a> Boo<'a> {
   unsafe fn new() -> Result<Self, ...> {
       let p = libc::malloc(std::mem::size_of::<Foo>()) as *mut Foo;
       if p.is_null() { return Err(AllocationFailure); }
       Ok(Self { foo: &mut *p })
   }
}

impl<'a> Drop for Boo<'a> {
   fn drop(&mut self) { unsafe { libc::free(self.foo as *mut Foo as *mut libc::c_void); } }
}

Yes.

& in Rust literally means "you can't free it". Borrowed types are ones that you don't memory-manage. Owned types are ones that you can destroy, but & means you don't own it.

You could use NonNull to have exclusive ownership, and free it. You could use a bare pointer, and free it. You could hold a pointer in Boo, and lend it out as a reference to others (get_foo(&self) -> &Foo), but you can't call free on an object you don't own.
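A rough sketch of that layout follows. c_api_create and c_api_free are hypothetical stand-ins written in Rust so the example runs on its own (a real binding would declare them in an extern "C" block), and it assumes the library does not keep or touch the pointer behind the wrapper's back.

```rust
use std::ptr::NonNull;

#[repr(C)]
pub struct CType {
    value: i32,
}

// Hypothetical stand-in for the C constructor.
fn c_api_create() -> *mut CType {
    Box::into_raw(Box::new(CType { value: 0 }))
}

// Hypothetical stand-in for the C destructor.
// Safety: `p` must come from `c_api_create` and be freed only once.
unsafe fn c_api_free(p: *mut CType) {
    drop(unsafe { Box::from_raw(p) });
}

// The wrapper owns the allocation through a NonNull, not a reference.
pub struct ApiWrapper {
    inner: NonNull<CType>,
}

impl ApiWrapper {
    pub fn new() -> Option<Self> {
        // NonNull::new returns None if the pointer is null.
        let inner = NonNull::new(c_api_create())?;
        Some(ApiWrapper { inner })
    }

    // References are only lent out, tied to the borrow of the wrapper.
    pub fn get(&self) -> &CType {
        unsafe { self.inner.as_ref() }
    }

    pub fn get_mut(&mut self) -> &mut CType {
        unsafe { self.inner.as_mut() }
    }
}

impl Drop for ApiWrapper {
    fn drop(&mut self) {
        // All loans have ended by the time drop runs, so freeing is fine.
        unsafe { c_api_free(self.inner.as_ptr()) }
    }
}

pub fn demo() -> i32 {
    let mut w = ApiWrapper::new().expect("allocation failed");
    w.get_mut().value = 42;
    w.get().value
}

fn main() {
    assert_eq!(demo(), 42);
}
```

The unsafe NonNull methods end up confined to a few lines inside the wrapper, while callers only see safe, short-lived references.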

I don't think you have understood what the post you replied to was saying. In C, whether mutating through a pointer is UB or not depends only on the memory being pointed to (whether it is read-only or not). But in Rust, whether mutating through a reference is allowed or not depends on the permissions granted to that reference through a chain leading back to the original data (this is what is called "pointer provenance").

In C, it is not UB to take a const CType*, cast it to CType*, and mutate through it, as long as the CType itself is mutable. C compilers don't and cannot do any optimizations by assuming const T* can't be mutated through.

You can get C-pointer-like semantics in Rust by using raw pointers instead of references. FFI usually gives you raw pointers, anyway.
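As a small illustration, fields of a #[repr(C)] struct can be read and written through the raw pointer itself, without ever creating a &Foo or &mut Foo. In this sketch f is a hypothetical stand-in for the C function, implemented with Box so the example is self-contained.

```rust
use std::ptr;

#[repr(C)]
struct Foo {
    field: i32,
}

// Hypothetical stand-in for a C function returning a heap-allocated Foo.
fn f() -> *mut Foo {
    Box::into_raw(Box::new(Foo { field: 7 }))
}

fn read_and_update() -> (i32, i32) {
    let p = f();
    unsafe {
        // Reading through the raw pointer copies the i32 out;
        // no reference to Foo is created.
        let before = (*p).field;

        // addr_of_mut! gives a *mut i32 to the field, again without
        // going through a reference.
        let field_ptr = ptr::addr_of_mut!((*p).field);
        field_ptr.write(before + 1);
        let after = field_ptr.read();

        // Hand the allocation back to Box so it is freed.
        drop(Box::from_raw(p));
        (before, after)
    }
}

fn main() {
    assert_eq!(read_and_update(), (7, 8));
}
```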


I don't think so.
From the point of view of a correct program, it is not important.
For example:

int read_prop1(const Obj * const obj);
int read_prop2(const Obj * const obj);
int x1 = read_prop1(obj);
int x2 = read_prop2(obj);

Yes, from the C language's point of view this may not be UB, even if read_prop2 changes the state of obj,
and prop1 in particular. But from the point of view of a correct program, this would be completely unexpected
for the programmer and would cause bugs even in a C program that uses such an API.
So I think it is not important to consider the "may be UB" case here.

But in my code, Boo owns the object.

No, you don't. & is a loan, not a pointer. & means you don't own it. It's not just a side effect, but a primary purpose of this feature. Using it for owned pointers is a semantic error.

Just like in C, void* is not a number, even though it compiles down to one and even supports arithmetic. Even though printf("you have %d apples", NULL + 7) may seem to work, it's bogus, and a misuse of the type system.

In Rust & is an absolute opposite of an owning pointer. It exists to signify impossibility of freeing the data.