The issue is that some platforms have pointers that are bigger than the address space. One of these platforms apparently is the CHERI architecture, which I have never heard of before. Here, the address space is 64 bits, but the pointers carry extra information to make them non-forgeable. Therefore the pointers are 128 bits in size, but the maximum object size is 2^64 bytes. As Rust guarantees that usize can store a pointer value, usize must be 128 bits wide. Yet size_t will be 64 bits.
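To see how the two sizes relate on a given target, a small check like the following can be compiled. It is only a sketch: the helper name is made up, and the expectation that both sizes are equal holds on the mainstream targets I know of, not necessarily on CHERI-like ones.

```rust
use std::mem::size_of;

/// Returns (size of usize, size of a thin raw pointer) in bytes for the
/// target this is compiled for. The function name is hypothetical.
fn usize_and_pointer_size() -> (usize, usize) {
    (size_of::<usize>(), size_of::<*const u8>())
}

fn main() {
    let (usize_bytes, ptr_bytes) = usize_and_pointer_size();
    println!("usize: {} bytes, *const u8: {} bytes", usize_bytes, ptr_bytes);
}
```

On a platform like the one described above, both numbers would be 16 while size_t would stay at 8 bytes.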
To a language with memory safety, like (safe) Rust, concepts like these might be unimportant, but we can't know whether such platforms will become relevant in the future. Thus, I think people are right that size_of::<usize>() isn't necessarily equal to size_of::<c_size_t>().
However, I feel like there are bigger problems with these platforms: a lot of Rust code would bloat up, because every usize (which is returned by Vec::len, for example) would double in size. usize is guaranteed to be able to store a pointer value (128 bits) but often only needs to store sizes (as the name usize suggests).
I found a comment from @kornel which roughly matches how I feel about that issue:
"Rust already made a mistake of assuming uintptr_t == size_t, but maybe it should try to back out from it instead of solidifying it further?"
(in Issue #1400 on the libc crate)
But then there is the argument of backward compatibility (which requires that usize is big enough to store pointers). I just hope this won't cause serious trouble for Rust in the future. As of right now, agreed, I think it's safe to assume that size_t is the same as usize, but I would like my code to work in the future too.
The thread on IRLO which you referenced is maybe this one.
Disregarding any considerations on core language design, I would come to the following conclusions:
- bindgen could translate size_t to u32/u64/u128, or usize on platforms where sizeof(size_t) == sizeof(ptrdiff_t). I feel like translating it to usize would cause fewer problems. When dealing with Rust and FFI, there's always the problem of writing code that compiles well on your own platform but fails to compile on other platforms. That's because the mapping from the C types to Rust's primitive integer types is done via type aliases. A mismatch won't always cause compile-time errors, as I also figured out in this thread.
- The clean way seems to be either to fix bindgen to return the same type as used in std::ffi::c_size_t (which is still unstable!) or to simply use size_t as emitted by bindgen, i.e. work with bindings::size_t, where bindings is the module containing the created bindings. When translating the integer to the Rust world, usize is used, so we could convert the returned integer with .try_into().unwrap() (or assert!(size_of::<usize>() >= size_of::<bindings::size_t>()); value as usize).
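The two conversions from the last point could look like the sketch below. The bindings module and its size_t alias are stand-ins for what bindgen generates (I assume u64 here, as on a typical 64-bit target), and the helper names are made up for illustration.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Stand-in for the module generated by bindgen (assumption: 64-bit target,
/// where size_t is emitted as an alias for u64).
#[allow(non_camel_case_types)]
mod bindings {
    pub type size_t = u64;
}

/// Checked conversion: panics if the value does not fit into usize.
fn to_usize_checked(value: bindings::size_t) -> usize {
    usize::try_from(value).expect("size_t value does not fit into usize")
}

/// Assert-then-cast conversion: the cast is lossless as long as usize is at
/// least as wide as size_t, which the assertion verifies.
fn to_usize_cast(value: bindings::size_t) -> usize {
    assert!(size_of::<usize>() >= size_of::<bindings::size_t>());
    value as usize
}

fn main() {
    let count: bindings::size_t = 42;
    println!("{} {}", to_usize_checked(count), to_usize_cast(count));
}
```

On targets where both types have the same width, I would expect both variants to compile down to a plain move, but I have not verified that.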
This is my real-life code as of now:
fn cursor_get_current_value_count<K, V, C>(
    &mut self,
    cursor: &Cursor<K, V, C>,
) -> Result<usize, io::Error>
where
    K: ?Sized + Storable,
    V: ?Sized + Storable,
    C: Constraint,
{
    cursor.backend.assert_txn_backend(self);
    unsafe {
        // TODO: use c_size_t when stabilized
        let mut count = MaybeUninit::<lmdb::size_t>::uninit();
        check_err_code(lmdb::mdb_cursor_count(
            cursor.backend.inner,
            count.as_mut_ptr(),
        ))?;
        Ok(count.assume_init().try_into().unwrap())
    }
}
I hope the try_into().unwrap() will be zero-cost.