How to coerce a rust uninitialized type to *mut *mut T

Hi folks,

I'm attempting to create a binding to Linux's libtraceevent library. Bindgen has generated the bindings correctly and I'm attempting to write some tests in order to guide me to the design of a safe API on top of the bindgen generated code.

A particular function expects a *mut *mut tep_handle argument as shown below:

extern "C" {
    pub fn tep_parse_format(
        tep: *mut tep_handle,
        eventp: *mut *mut tep_event,
        buf: *const ::std::os::raw::c_char,
        size: ::std::os::raw::c_ulong,
        sys: *const ::std::os::raw::c_char,
    ) -> tep_errno;
}

tep_event will be allocated and initialized as part of tep_parse_event(). My current attempt is the following:

        unsafe {
            let mut handle: ::std::mem::MaybeUninit<tep_handle> = ::std::mem::MaybeUninit::uninit();
            let mut event: ::std::mem::MaybeUninit<tep_event> =
                ::std::mem::MaybeUninit::uninit().assume_init();
            let event_system = CString::new("irq").unwrap();

            let errno = tep_parse_format(
                handle.as_mut_ptr(),
                (&mut event.as_mut_ptr()) as *mut *mut tep_event,
                dynamic_string,
                dynamic_string_len as u64,
                event_system.as_ptr(),
            );

            assert_eq!(errno, tep_errno::TEP_ERRNO__SUCCESS);
        }

But this is failing with a segfault:

'tests::test_parse_dynamic_string_event'` (signal: 11, SIGSEGV: invalid memory reference)

How can I implement this correctly?

You are passing the address of a temporary. Declare the pointer as a variable instead, and pass its address directly.

See, you shouldn't create a value of type tep_event. The double-pointer pattern in C is usually meant for functions that allocate a heap-allocated object and "return" a pointer to it by writing through the outer pointer (since the real return value of the function is used for something else, likely an error code).

So what you should probably do is:

let mut event_ptr: *mut tep_event = ptr::null_mut();
tep_parse_format(…, &mut event_ptr, …);

On an unrelated note, MaybeUninit::uninit().assume_init() is instant UB.

4 Likes

Sadly, that's not enough – OP would still be passing the address of a temporary here: &mut event.as_mut_ptr()

This is what really worked. Declaring those as null_mut() made everything happy. Thanks :slight_smile:

Here's the resulting code for reference:

        unsafe {
            let handle: *mut tep_handle = ::std::ptr::null_mut();
            let mut event: *mut tep_event = ::std::ptr::null_mut();
            let event_system = CString::new("irq").unwrap();

            let errno = tep_parse_format(
                handle,
                &mut event,
                dynamic_string,
                dynamic_string_len as u64,
                event_system.as_ptr(),
            );

            assert_eq!(errno, tep_errno::TEP_ERRNO__SUCCESS);

            tep_free(handle);
        }

Ahh yeah you're right!

It's not the null, you could have initialized the pointer with literally anything, because it will be overwritten. It's the fact that you passed the address of an existing variable (and not a temporary) that makes it work.

By the way, are you sure that the handle can be null? It'd be necessary to read the documentation for this function. If it's an input parameter, then it's likely expected to point to a valid object.

Right, I got that much. It's just that null_mut() is easy enough to remember for future tests :slight_smile:

Yup, pretty sure. Here's the original test from the C library:

static void parse_cpumask(const char *format, void *data, int size,
			  const char* expected_raw, const char* expected)
{
	struct tep_format_field *field;
	struct tep_event *event;
	struct tep_record record;

	record.data = data;
	record.size = size;

	CU_TEST(tep_parse_format(test_tep, &event,
				 format, strlen(format),
				 CPUMASK_EVENT_SYSTEM) == TEP_ERRNO__SUCCESS);
[...]

As you can see, event is used completely uninitialized. Also, the documentation has this little nugget of information:

The tep_parse_format() function does the same as tep_parse_event(). The only difference is in the extra eventp argument, where the newly created event structure is returned.

With a matching example:

char *buf;
int size;
struct tep_event *event = NULL;
buf = read_file("/sys/kernel/tracing/events/ftrace/print/format", &size);
if (tep_parse_event(tep, buf, size, "ftrace") != 0) {
	/* Failed to parse the ftrace print format */
}

if (tep_parse_format(tep, &event, buf, size, "ftrace") != 0) {
	/* Failed to parse the ftrace print format */
}

By the way the correct direct translation of this would have been to have a let mut tep_event = MaybeUninit::<*mut teo_event>::uninit();, then pass tep_event.as_mut_ptr() to the function call. The function initializes the value in the MaybeUninit, so afterwards, you're allowed to call tep_event.assume_init() to obtain the pointer of type *mut tep_event that was (logically) returned.

Of course, initializing the pointer with something like null is syntactically a bit lighter, but that makes sense since it avoids the need to manually work with uninitialized data. Uninitialized variables in Rust can never be accessed (e. g. read, or referenced, i. e. borrowed), even in unsafe code, so there is some overhead in how the MaybeUninit might become necessary to use in unsafe cases. But of course, the situation in C (and even more so in C++) is actually worse, because it's way too easy to skip initialization there.

1 Like

That's only almost universally true. For MaybeUninit<T> it's actually false if T is a zero-sized type. Similarly, I believe it should also be false if the type is allowed to be completely uninitialized, or does (or can) consist entirely of padding. And this is the case here because, AFAICT (and presumably accidentally), MaybeUninit<MaybeUninit<tep_event>> is used, the inner type MaybeUninit<teo_event> is allowed to be completely uninitialized.

1 Like

Yes, but I'm not talking about that – it's clearly an out argument. I'm talking about the handle, which is the first argument. I don't see it declared in the quoted snippets, so I don't know how it's initialized. I was just trying to raise awareness about this being another potential pitfall, anyway.

Heh, sorry. Misread.

The tep_handle is global static:

static struct tep_handle *test_tep;

Which will be zero initialized during .bss initialization.

1 Like

Interesting. That's very nice. After tep_parse_format() initializes event, any following function call that requires a *mut tep_event can simply call event.assume_init().

I'm aware, but in this specific case, it's true.

Is it? In my reply I was arguing that it isn't UB here, because assume_init is called on a MaybeUninit<MaybeUninit<_>>.

Typically, one would probably put it into a new variable, so assume_init only needs to be called once, which makes the unsafe code usage easier to review. The new variable even could use the same name and shadow the previous one.

Also, in case tep_parse_format needed to be called in multiple places, a nice approach would be a wrapper function around the tep_parse_format function that could handle this stuff, make sure that only when there is no error that could have precluded the initialization, assume_init is called, and maybe returns a more rusty Result return type containing the *mut tep_event.

The MaybeUninit docs use this same pattern to get an array of MaybeUninits, with this justification:

The assume_init is safe because the type we are claiming to have initialized here is a bunch of MaybeUninits, which do not require initialization.

As this appears in the official docs, it seems to be defining the behavior for this case, thus providing an exception to the insta-UB rule.

I was referring to this line in OPs original code:

let mut event: ::std::mem::MaybeUninit<tep_event> =
                ::std::mem::MaybeUninit::uninit().assume_init();

It looks like tep_event is a C FFI struct proper, and not an alias for MaybeUninit.

Ah, nevermind, it's creating a MaybeUninit, so you are right.

Of course, the guideline β€œdon’t ever write MaybeUninit::uninit().assume_init()” still applies to this case, too, since creating the MaybeUninit<tep_event> directly would be much preferable.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.