How to coerce a rust uninitialized type to mut mut T

felipebalbi · December 15, 2022, 7:20am

Hi folks,

I'm attempting to create a binding to Linux's libtraceevent library. Bindgen has generated the bindings correctly and I'm attempting to write some tests in order to guide me to the design of a safe API on top of the bindgen generated code.

A particular function expects a *mut *mut tep_handle argument as shown below:

extern "C" {
    pub fn tep_parse_format(
        tep: *mut tep_handle,
        eventp: *mut *mut tep_event,
        buf: *const ::std::os::raw::c_char,
        size: ::std::os::raw::c_ulong,
        sys: *const ::std::os::raw::c_char,
    ) -> tep_errno;
}

tep_event will be allocated and initialized as part of tep_parse_event(). My current attempt is the following:

        unsafe {
            let mut handle: ::std::mem::MaybeUninit<tep_handle> = ::std::mem::MaybeUninit::uninit();
            let mut event: ::std::mem::MaybeUninit<tep_event> =
                ::std::mem::MaybeUninit::uninit().assume_init();
            let event_system = CString::new("irq").unwrap();

            let errno = tep_parse_format(
                handle.as_mut_ptr(),
                (&mut event.as_mut_ptr()) as *mut *mut tep_event,
                dynamic_string,
                dynamic_string_len as u64,
                event_system.as_ptr(),
            );

            assert_eq!(errno, tep_errno::TEP_ERRNO__SUCCESS);
        }

But this is failing with a segfault:

'tests::test_parse_dynamic_string_event'` (signal: 11, SIGSEGV: invalid memory reference)

How can I implement this correctly?

H2CO3 · December 15, 2022, 7:29am

You are passing the address of a temporary. Declare the pointer as a variable instead, and pass its address directly.

See, you shouldn't create a value of type tep_event. The double-pointer pattern in C is usually meant for functions that allocate a heap-allocated object and "return" a pointer to it by writing through the outer pointer (since the real return value of the function is used for something else, likely an error code).

So what you should probably do is:

let mut event_ptr: *mut tep_event = ptr::null_mut();
tep_parse_format(…, &mut event_ptr, …);

On an unrelated note, MaybeUninit::uninit().assume_init() is instant UB.

H2CO3 · December 15, 2022, 7:34am

Sadly, that's not enough – OP would still be passing the address of a temporary here: &mut event.as_mut_ptr()

felipebalbi · December 15, 2022, 7:35am

This is what really worked. Declaring those as null_mut() made everything happy. Thanks

Here's the resulting code for reference:

        unsafe {
            let handle: *mut tep_handle = ::std::ptr::null_mut();
            let mut event: *mut tep_event = ::std::ptr::null_mut();
            let event_system = CString::new("irq").unwrap();

            let errno = tep_parse_format(
                handle,
                &mut event,
                dynamic_string,
                dynamic_string_len as u64,
                event_system.as_ptr(),
            );

            assert_eq!(errno, tep_errno::TEP_ERRNO__SUCCESS);

            tep_free(handle);
        }

semicoleon · December 15, 2022, 7:38am

Ahh yeah you're right!

H2CO3 · December 15, 2022, 7:38am

It's not the null, you could have initialized the pointer with literally anything, because it will be overwritten. It's the fact that you passed the address of an existing variable (and not a temporary) that makes it work.

By the way, are you sure that the handle can be null? It'd be necessary to read the documentation for this function. If it's an input parameter, then it's likely expected to point to a valid object.

felipebalbi · December 15, 2022, 7:42am

Right, I got that much. It's just that null_mut() is easy enough to remember for future tests

Yup, pretty sure. Here's the original test from the C library:

static void parse_cpumask(const char *format, void *data, int size,
			  const char* expected_raw, const char* expected)
{
	struct tep_format_field *field;
	struct tep_event *event;
	struct tep_record record;

	record.data = data;
	record.size = size;

	CU_TEST(tep_parse_format(test_tep, &event,
				 format, strlen(format),
				 CPUMASK_EVENT_SYSTEM) == TEP_ERRNO__SUCCESS);
[...]

As you can see, event is used completely uninitialized. Also, the documentation has this little nugget of information:

The tep_parse_format() function does the same as tep_parse_event(). The only difference is in the extra eventp argument, where the newly created event structure is returned.

With a matching example:

char *buf;
int size;
struct tep_event *event = NULL;
buf = read_file("/sys/kernel/tracing/events/ftrace/print/format", &size);
if (tep_parse_event(tep, buf, size, "ftrace") != 0) {
	/* Failed to parse the ftrace print format */
}

if (tep_parse_format(tep, &event, buf, size, "ftrace") != 0) {
	/* Failed to parse the ftrace print format */
}

steffahn · December 15, 2022, 8:12am

By the way the correct direct translation of this would have been to have a let mut tep_event = MaybeUninit::<*mut teo_event>::uninit();, then pass tep_event.as_mut_ptr() to the function call. The function initializes the value in the MaybeUninit, so afterwards, you're allowed to call tep_event.assume_init() to obtain the pointer of type *mut tep_event that was (logically) returned.

Of course, initializing the pointer with something like null is syntactically a bit lighter, but that makes sense since it avoids the need to manually work with uninitialized data. Uninitialized variables in Rust can never be accessed (e. g. read, or referenced, i. e. borrowed), even in unsafe code, so there is some overhead in how the MaybeUninit might become necessary to use in unsafe cases. But of course, the situation in C (and even more so in C++) is actually worse, because it's way too easy to skip initialization there.

steffahn · December 15, 2022, 8:16am

That's only almost universally true. For MaybeUninit<T> it's actually false if T is a zero-sized type. Similarly, I believe it should also be false if the type is allowed to be completely uninitialized, or does (or can) consist entirely of padding. And this is the case here because, AFAICT (and presumably accidentally), MaybeUninit<MaybeUninit<tep_event>> is used, the inner type MaybeUninit<teo_event> is allowed to be completely uninitialized.

H2CO3 · December 15, 2022, 8:16am

Yes, but I'm not talking about that – it's clearly an out argument. I'm talking about the handle, which is the first argument. I don't see it declared in the quoted snippets, so I don't know how it's initialized. I was just trying to raise awareness about this being another potential pitfall, anyway.

felipebalbi · December 15, 2022, 8:17am

Heh, sorry. Misread.

The tep_handle is global static:

static struct tep_handle *test_tep;

Which will be zero initialized during .bss initialization.

felipebalbi · December 15, 2022, 8:21am

Interesting. That's very nice. After tep_parse_format() initializes event, any following function call that requires a *mut tep_event can simply call event.assume_init().

H2CO3 · December 15, 2022, 9:10am

I'm aware, but in this specific case, it's true.

steffahn · December 15, 2022, 11:09am

Is it? In my reply I was arguing that it isn't UB here, because assume_init is called on a MaybeUninit<MaybeUninit<_>>.

steffahn · December 15, 2022, 11:16am

Typically, one would probably put it into a new variable, so assume_init only needs to be called once, which makes the unsafe code usage easier to review. The new variable even could use the same name and shadow the previous one.

Also, in case tep_parse_format needed to be called in multiple places, a nice approach would be a wrapper function around the tep_parse_format function that could handle this stuff, make sure that only when there is no error that could have precluded the initialization, assume_init is called, and maybe returns a more rusty Result return type containing the *mut tep_event.

2e71828 · December 15, 2022, 11:28am

The MaybeUninit docs use this same pattern to get an array of MaybeUninits, with this justification:

The assume_init is safe because the type we are claiming to have initialized here is a bunch of MaybeUninits, which do not require initialization.

As this appears in the official docs, it seems to be defining the behavior for this case, thus providing an exception to the insta-UB rule.

H2CO3 · December 15, 2022, 12:05pm

I was referring to this line in OPs original code:

let mut event: ::std::mem::MaybeUninit<tep_event> =
                ::std::mem::MaybeUninit::uninit().assume_init();

~~It looks like tep_event is a C FFI struct proper, and not an alias for MaybeUninit.~~

Ah, nevermind, it's creating a MaybeUninit, so you are right.

steffahn · December 15, 2022, 12:47pm

Of course, the guideline “don’t ever write MaybeUninit::uninit().assume_init()” still applies to this case, too, since creating the MaybeUninit<tep_event> directly would be much preferable.

system · March 15, 2023, 12:47pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Rust FFI, mut mut and C help	11	2333	February 10, 2022
How to pass `mut mut ::std::os::raw::c_char` to C function?	12	1883	December 22, 2021
How does rust pass or bind to a mutable reference help	10	1562	May 18, 2022
Scope for structs and raw pointers for FFI/C library help	4	475	May 10, 2021
`&mut T` to `&T` coercion help	6	666	August 29, 2019

How to coerce a rust uninitialized type to *mut *mut T

Related Topics

How to coerce a rust uninitialized type to mut mut T