How to deal best with size_t and bindgen

The issue is that some platforms have pointers that are bigger than the address space. One such platform is the CHERI architecture, which I had never heard of before. There, the address space is 64 bits, but pointers carry extra information to make them non-forgeable. The pointers are therefore 128 bits in size, while the maximum object size is 2^64. As Rust guarantees that usize can store a pointer value, usize must be 128 bits wide, yet size_t will be 64 bits.

To a language with memory safety, like (safe) Rust, such concepts might seem unimportant, but we can't know whether such platforms will become relevant in the future. Thus, I think, people are right that size_of::<usize>() isn't necessarily size_of::<c_size_t>().

However, I feel there are bigger problems with these platforms, because a lot of Rust code would bloat usize (which is returned by Vec::len, for example) by 100%: usize is guaranteed to be able to store a pointer value (128 bits here), but often only needs to store sizes (as the name usize suggests).

I found a comment from @kornel which resembles how I feel about this issue:

Rust already made a mistake of assuming uintptr_t == size_t, but maybe it should try to back out from it instead of solidifying it further?

(in Issue #1400 on libc crate)

But then there is the argument of backward compatibility (which requires that usize is big enough to store a pointer). I just hope this won't cause serious trouble for Rust in the future. As of right now, agreed, I think it's safe to assume that size_t is the same size as usize, but I would like my code to keep working in the future too.

The thread on IRLO which you referenced is maybe this one.

Disregarding any considerations on core language design, I would come to the following conclusions:

  • bindgen could translate size_t to u32/u64/u128, or to usize on platforms where sizeof(size_t) == sizeof(uintptr_t). I feel like translating it to usize would cause fewer problems. When dealing with Rust and FFI, there is always the risk of writing code that compiles fine on your own platform but fails to compile on others, because the mapping from C types to Rust's primitive integer types is done via type aliases. A mismatch won't always cause compile-time errors, as I also figured out in this thread.
  • The clean way seems to be either fixing bindgen to emit the same type as std::ffi::c_size_t (which is still unstable!) or simply using the size_t emitted by bindgen, i.e. working with bindings::size_t, where bindings is the module containing the generated bindings. When translating the integer into the Rust world, usize is used, so we could convert the returned integer with .try_into().unwrap() (or assert!(size_of::<usize>() >= size_of::<bindings::size_t>()); value as usize).

This is my real-life code as of now:

fn cursor_get_current_value_count<K, V, C>(
    &mut self,
    cursor: &Cursor<K, V, C>,
) -> Result<usize, io::Error>
where
    K: ?Sized + Storable,
    V: ?Sized + Storable,
    C: Constraint,
{
    cursor.backend.assert_txn_backend(self);
    unsafe {
        // TODO: use c_size_t when stabilized
        let mut count = MaybeUninit::<lmdb::size_t>::uninit();
        check_err_code(lmdb::mdb_cursor_count(
            cursor.backend.inner,
            count.as_mut_ptr(),
        ))?;
        Ok(count.assume_init().try_into().unwrap())
    }
}

I hope the try_into().unwrap() will be zero-cost.