I'm dealing with a C FFI API where a callback can be set up, which gets a 'user data' void pointer passed. The pointer can be set to an arbitrary value when setting up the callback. The C code doesn't touch the pointer; it just passes it back into the callback.
If I want to make use of that 'user data', what obviously should work is to box my data and set
ptr = Box::into_raw(my_user_data)
Inside the callback I get my data back with
let my_user_data = Box::from_raw(ptr).
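For reference, the boxing round trip described above could be sketched roughly like this. The helper names into_user_data and from_user_data are made up for illustration, and the sketch assumes the callback consumes the pointer exactly once:

```rust
use std::ffi::c_void;

/// Moves `data` to the heap and returns an untyped pointer that can be
/// handed to C as the 'user data' argument. (Hypothetical helper name.)
fn into_user_data<T>(data: T) -> *mut c_void {
    Box::into_raw(Box::new(data)) as *mut c_void
}

/// Takes ownership of the data back and frees the heap allocation.
///
/// # Safety
/// `ptr` must come from `into_user_data::<T>` with the same `T`,
/// and must not be used again afterwards.
unsafe fn from_user_data<T>(ptr: *mut c_void) -> T {
    // Rebuild the Box, then move the value out of it.
    unsafe { *Box::from_raw(ptr as *mut T) }
}
```

This costs one heap allocation and one deallocation per callback registration, which is the overhead the question is about.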
But I am wondering if the boxing (heap allocation) can be avoided, when the size of my data is not larger than the size of a pointer, by transmuting the data into the pointer. As I understand it, any bit pattern is valid for a *mut c_void, so it sounds possible to do that without creating undefined behavior. E.g. like this, using a union:
use std::ffi::c_void;
use std::mem::ManuallyDrop;

union DataTrans<T> {
    data: ManuallyDrop<T>,
    ptr: *mut c_void,
}

fn data_to_ptr<T>(data: T) -> *mut c_void {
    if std::mem::size_of::<DataTrans<T>>() > std::mem::size_of::<*mut c_void>() {
        panic!("data type too large");
    }
    let mut u = DataTrans{ ptr: std::ptr::null_mut() }; // initialize all pointer bytes first
    u.data = ManuallyDrop::new(data); // then overwrite them with the data
    unsafe{ u.ptr }
}

unsafe fn ptr_to_data<T>(ptr: *mut c_void) -> T {
    let u = DataTrans::<T>{ptr};
    unsafe{ ManuallyDrop::into_inner(u.data) }
}

fn main() {
    let value: u16 = 12345;
    let ptr = data_to_ptr(value);
    let value: u16 = unsafe{ ptr_to_data(ptr) };
    println!("value={}", value);
}
Would that work (in the general case for arbitrary T)?
But what about uninitialized memory? E.g. structs with padding… I don’t know the rules for C off the top of my head, but *mut c_void on the Rust side certainly doesn’t support this, as far as I remember.
E.g.
fn main() {
    let value: (u16, u8) = (12345, 67);
    let ptr = data_to_ptr(value);
    let value: (u16, u8) = unsafe{ ptr_to_data(ptr) };
    println!("value={:?}", value);
}
error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
  --> src/main.rs:15:13
   |
15 |     unsafe{ u.ptr }
   |             ^^^^^ using uninitialized data, but this operation requires initialized memory
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside `data_to_ptr::<(u16, u8)>` at src/main.rs:15:13: 15:18
note: inside `main`
  --> src/main.rs:25:15
   |
25 |     let ptr = data_to_ptr(value);
   |               ^^^^^^^^^^^^^^^^^^
You can probably try to restrict it to types that don’t have this issue, e.g. something like using bytemuck::Pod.
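To illustrate what such a restriction might look like without pulling in a crate, here is a minimal sketch using a hand-rolled marker trait. NoPadding is a hypothetical stand-in for something like bytemuck::Pod, not the real trait, and it is only implemented for a few primitive integers here:

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Hypothetical marker trait: implementors promise that the type has no
/// padding bytes and that every bit pattern is a valid value.
unsafe trait NoPadding: Copy + Send + 'static {}
unsafe impl NoPadding for u8 {}
unsafe impl NoPadding for u16 {}
unsafe impl NoPadding for u32 {}
unsafe impl NoPadding for i32 {}

fn data_to_ptr<T: NoPadding>(data: T) -> *mut c_void {
    assert!(size_of::<T>() <= size_of::<usize>(), "data type too large");
    // Start from fully initialized bits, then overwrite the leading bytes.
    let mut bits: usize = 0;
    unsafe {
        std::ptr::copy_nonoverlapping(
            &data as *const T as *const u8,
            &mut bits as *mut usize as *mut u8,
            size_of::<T>(),
        );
    }
    bits as *mut c_void
}

fn ptr_to_data<T: NoPadding>(ptr: *mut c_void) -> T {
    assert!(size_of::<T>() <= size_of::<usize>(), "data type too large");
    let bits = ptr as usize;
    // Read the leading bytes back; any bit pattern is valid for `T`.
    unsafe { std::ptr::read_unaligned(&bits as *const usize as *const T) }
}
```

Because every byte copied out of a NoPadding value is initialized, the resulting pointer never contains uninitialized bytes. A type like (u16, u8) simply cannot be passed, since nothing implements the trait for it.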
Yes, I noticed. That's why I initialized the pointer before writing to the data field of the union in my example. After that, Miri is quiet for me (not that that is sufficient proof of correctness, of course).
Yes, I forgot to say: in my case the callback will only be called once, but possibly on a different thread. So a T: Send bound would be needed as well.
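The "called once, possibly on another thread" contract can be encoded in the bounds. A rough sketch for the boxed variant follows; the names trampoline and make_callback and the callback signature are assumptions for illustration, not the actual C API:

```rust
use std::ffi::c_void;

/// Hypothetical trampoline with the kind of signature a C library might
/// expect for its callback.
///
/// # Safety
/// `user_data` must come from `make_callback::<F>` and be passed exactly once.
unsafe extern "C" fn trampoline<F>(user_data: *mut c_void)
where
    F: FnOnce() + Send + 'static,
{
    // Take ownership back; the closure is dropped after it has run.
    let f = unsafe { Box::from_raw(user_data as *mut F) };
    f();
}

/// Returns the function pointer and the user data pointer to register.
/// `FnOnce` reflects "called once"; `Send` allows the call to happen on
/// another thread.
fn make_callback<F>(f: F) -> (unsafe extern "C" fn(*mut c_void), *mut c_void)
where
    F: FnOnce() + Send + 'static,
{
    (trampoline::<F>, Box::into_raw(Box::new(f)) as *mut c_void)
}
```

The same Send + 'static bound would apply to T in the non-boxing data_to_ptr approach.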
In case you didn’t notice, my code example uses your API here – unchanged[1]. The issue is that the (u16, u8) tuple contains a padding byte, and there’s nothing you can do externally to prevent that (generically, in a way that supports all types).
Ouch. Thanks. That's the kind of issue I asked the question about in the first place.
That's unfortunate.
But thinking about it: the point here is not actually to require the padding to have a known value, but only to prevent the compiler from treating the situation as UB. Isn't there a way to tell the compiler that it is not UB? Assembly, maybe?
Creating any value with uninitialized bytes in non-padding positions is always UB. The only escape hatch is something like MaybeUninit, where all bytes are essentially padding, since it is a union that also has a zero-sized () variant.
That's an interesting question: what happens if you declare a zero-byte assembly sequence that accepts a MaybeUninit in a register and then returns a “frozen” value in the exact same register, which is presumably (as far as Rust is concerned) now initialized… would that act as the equivalent of “freeze” for LLVM?
Miri doesn't support inline assembly, so we couldn't use it to check…
I did a bit more reading and found a comment here about a surprising hardware feature, e.g. of Itanium, which actually checks by itself whether data is initialized. That would mean that the compiler really has to initialize the data, not just assume it is initialized...
It's undefined in C/C++: if you wrote an f32 to memory and try to access it as an i32… you have UB. That restriction, too, exists in hardware, and you don't even need an exotic and expensive hardware platform to face it: the good old IBM 5150 can trip you up with an 8087.
Because on the 8086+8087 combo the 8087 writes its result to memory independently of and asynchronously from the 8086! If you don't explicitly fwait for that process to finish… you can read garbage from memory.
Yet type punning is defined in Rust – which means Rust cannot be made to work quickly and reliably on an IBM 5150 with a co-processor.
Yes. I wouldn't be particularly sad if we could get the freeze feature at the cost of dropping Itanium support. Apparently not everyone agrees, though, which seems to be part of the reluctance to implement it.
For punning the situation was easier, I assume, because no one had any serious interest in Rust on the 8086.
You can turn this into a compile-time check if you wrap it in a const { } block, like this:
const {
    if std::mem::size_of::<DataTrans<T>>() > std::mem::size_of::<*mut c_void>() {
        panic!("data type too large");
    }
}
I may be missing something, but why do you want to store your data in a pointer, instead of directly working with the DataTrans union? You could erase the generic and define the union like this:
I think I read that fields of a union are not assumed to be fully initialized, so handling them via pointers should be safe, but I'm too lazy to check. You can be conservative and use an array of MaybeUninit<u8>. You'll need some out-of-band information to distinguish pointer-sized data from a true pointer, but you also need it in your approach.
That's easiest, I agree. Unfortunately, I want to use this in a place where the memory allocation that Box performs has a significant performance impact, so I am trying to find a way to avoid it.
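As a rough sketch of the type-erased union idea with a MaybeUninit<u8> array, something along these lines might work. The names UserData, from_small and as_small are hypothetical, and this only covers the Rust side: the value still has to reach the C API as a void pointer, where the padding problem discussed above remains.

```rust
use std::ffi::c_void;
use std::mem::{size_of, MaybeUninit};

const PTR_SIZE: usize = size_of::<*mut c_void>();

/// Non-generic union: either a real pointer or pointer-sized raw bytes.
/// Which one it holds must be tracked out of band.
#[repr(C)]
#[derive(Clone, Copy)]
union UserData {
    ptr: *mut c_void,
    bytes: [MaybeUninit<u8>; PTR_SIZE],
}

impl UserData {
    fn from_ptr(ptr: *mut c_void) -> Self {
        UserData { ptr }
    }

    /// Stores a small value inline; padding bytes may stay uninitialized,
    /// which is fine for `MaybeUninit<u8>`.
    fn from_small<T: Copy>(data: T) -> Self {
        assert!(size_of::<T>() <= PTR_SIZE, "data type too large");
        let mut u = UserData { bytes: [MaybeUninit::uninit(); PTR_SIZE] };
        unsafe {
            std::ptr::copy_nonoverlapping(
                &data as *const T as *const MaybeUninit<u8>,
                u.bytes.as_mut_ptr(),
                size_of::<T>(),
            );
        }
        u
    }

    /// # Safety
    /// `self` must have been created by `from_small::<T>` with the same `T`.
    unsafe fn as_small<T: Copy>(&self) -> T {
        assert!(size_of::<T>() <= PTR_SIZE, "data type too large");
        unsafe { std::ptr::read_unaligned(self.bytes.as_ptr() as *const T) }
    }
}
```

The bytes field is never read as an initialized integer or pointer, so padding in T does not need a defined value here.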