What is your recommendation on variable structure length handling on `windows` crate?

Hello!
I've spent last weekend writing USB-HID handler for Contour Multimedia Controller PRO v2 on Windows using Microsoft's official windows crate. It works (at least HID part of it), but is ugly AF. (I don't even mention x86_64-pc-windows-gnu vs x86_64-pc-windows-msvc headaches when binding with vlc crate...).
My question is: how do you handle variable-length structures in Rustic (a.k.a safe) way?

I mean, windows C source is like that:

typedef struct tagRAWINPUT {
  RAWINPUTHEADER header;
  union {
    RAWMOUSE    mouse;
    RAWKEYBOARD keyboard;
    RAWHID      hid;
  } data;
} RAWINPUT, *PRAWINPUT, *LPRAWINPUT;

// ...

typedef struct tagRAWHID {
  DWORD dwSizeHid;
  DWORD dwCount;
  BYTE  bRawData[1];
} RAWHID, *PRAWHID, *LPRAWHID;

Rust translation is straightforward:

pub struct RAWINPUT {
  pub header: RAWINPUTHEADER,
  pub data: RAWINPUT_0,
}

pub union RAWINPUT_0 {
  pub mouse: RAWMOUSE,
  pub keyboard: RAWKEYBOARD,
  pub hid: RAWHID,
}

// ...

pub struct RAWHID {
   pub dwSizeHid: u32,
   pub dwCount: u32,
   pub bRawData: [u8; 1],
}

but the actual in-memory structure has variable length, denoted by respective dwSize* (&friends) fields (a sizeof(RAWINPUTHEADER) + dwSizeHid*dwCount in this case).

I worked this around with an ugly

union RAWINPUT_wrapper{
  pub inner: RAWINPUT,
  pub raw: [u8;512],
}

and some hokey-pokey unsafe{} to access real bRawData contents from its raw[] shadow. I could probably give up union and use std::slice::from_raw_parts instead, not making it anything safer in the progress...

As windows crate is infested with [u8; 1] stubs - is there an official/Rustic way to handle this? Did I miss something in crate's documentation? Or is this "we gave you a raw access, writing a safe wrapper is on you"?

I don't know if that helps you, but you can create your own dynamically sized types (DST) by having a DST as the last field of a struct.

pub struct RAWHID {
   pub dwSizeHid: u32,
   pub dwCount: u32,
   pub bRawData: [u8],
}

this comes with the usual caviat of having to put them behind some kind of pointer.
Maybe something like this

pub union RAWINPUT_0 {
  pub mouse: RAWMOUSE,
  pub keyboard: RAWKEYBOARD,
  pub hid: Box<RAWHID>,
}

But I guess that would defeat the purpose of the union.

I have no control over the incoming data (these are incoming Win32 API window messages). Unfortunately your RAWINPUT_0 has different memory layout (hid being pointer vs embedded union variant).

I'm not a great help but I'll ask for my own curiosity.

I only understood now, that the 'rust translations' above are the actual representations of the data in the windows crate and not just your own translation.
Hmm, to me it looks like this might be UB. A [u8: 1] only has a size of 1. I don't know if we can assume data after it to be initialized when it got initialized by c. Can you maybe post which function returns the RAWINPUT or how it gets initialized?

I have my Win code at home. If you insist on your eyes bleeding, in the evening I'll push my whole app to GitHub and link repo here.

I guess it's best if you read the documentation on the windows api and then use the rust api accordingly.

E.g. from the microsoft docs:

Remarks

The handle to this structure is passed in the lParam parameter of WM_INPUT.

To get detailed information -- such as the header and the content of the raw input -- call GetRawInputData.

To read the RAWINPUT in the message loop as a buffered read, call GetRawInputBuffer.

To get device specific information, call GetRawInputDeviceInfo with the hDevice from RAWINPUTHEADER.

Raw input is available only when the application calls RegisterRawInputDevices with valid device specifications.

Although I was a bit surprised to see that this is win32 stuff. You can then look at for example GetRawInputBuffer in the windows docs and the rust crate docs

Rust:

Function windows::Win32::UI::Input::GetRawInputBuffer

[−]

pub unsafe fn GetRawInputBuffer(
    pdata: Option<*mut RAWINPUT>,
    pcbsize: *mut u32,
    cbsizeheader: u32
) -> u32

C++:

Syntax

C++Copy

UINT GetRawInputBuffer(
  [out, optional] PRAWINPUT pData,
  [in, out]       PUINT     pcbSize,
  [in]            UINT      cbSizeHeader
);

You can see here for example that you only pass a pointer to RAWINPUT. So I suspect that you never directly handle RAWINPUT but only somehow get a pointer to it and then handle the data with one of the functions listed above.
This also makes more sense in the C++ world. AFAIK you need to allocate memory on the heap if you want such dynamically sized structs.

Yeah. I'll see if GetRawInputBuffer can get raw HID data (as opposed to mouse/kbd shown in examples). This might be the way. Thank you.

These APIs are very common in Windows. The sane pattern is you call it with null, it tells you how much to allocate, you allocate that size, then call it again with that pointer.

As opposed to the pattern far more common in POSIX where it returns a newly allocated buffer, and you have to call the documented deallocator, it has some small efficiency benefits, but it's mostly just a pain in the ass.

The main troubles you'll have are:

  • It's a dynamic size, so you have to create your own owning handle
  • You need to ensure the correct alignment for the type: so no cheating with Vec<u8>! You'd be getting UB is you tried to get a &mut into it anyway.
  • the standard Rust allocator wants the allocated size when you're freeing it, so you need to keep that around too - or maybe try to recover it from the dwSizeHid, but that seems iffy.

Sigh... I know how to use the the API. I have the HID messages subscribed, delivered, dissected and processed properly.

I just asked if there's a cleaner way than a pile of union{}-s and unsafe{}-s.

I'm then quite confused: you are talking to an API that takes those types in it's FFI, so any safe access is going to involve creating a wrapper. Creating said wrapper involves creating an owning handle, as otherwise the safe code could access it after free. Creating the owning handle is thus pretty much the whole task?

Perhaps you could explain what sort of safe API you expect to be using, and what's not clear about creating it.

Ok, my repo is at GitHub - p-kraszewski/shuttle-pro-rs: Rust handler for Shuttle Pro V2 multimedia controller . You may ignore the whole lib/hid part (it was for another approach, without Windows message loop, with blocking reads). You need to switch to x86_64-pc-windows-gnu and have VLC SDK installed and set-up in environment variables to link the VLC part.

RAWINPUT wrapper:

union RawInputWrapper {
    ri: RAWINPUT,
    data: [u8; 1024],
}

Pulling data from it:

// Prepare memory to be filled
let mut data: RawInputWrapper = unsafe { mem::zeroed() };
let mut size = mem::size_of::<RawInputWrapper>() as u32;

// Fill the memory with data obtained from OS
let rc = unsafe {
                GetRawInputData(
                    HRAWINPUT(lparam.0),
                    RID_INPUT,
                    Some(&mut data as *mut RawInputWrapper as *mut ::core::ffi::c_void),
                    &mut size,
                    mem::size_of::<RAWINPUTHEADER>() as u32,
                )
            };
// ...

// Copy internal data to fresh, known structure
let hiddata = unsafe { *(data.ri.data.hid.bRawData.as_ptr() as *const ContourHidEvent) };

// EXTRA QUESTION: Can I rewrite it as 
// let hiddata = unsafe { data.ri.data.hid.bRawData
//                        .as_ptr() 
//                        as *const ContourHidEvent 
//                        as &ContourHidEvent 
// };
// Do I avoid copying? if "data" lives long enough, is Rust borrow checker fine with it? Or
// do I cause some UB?

// Show it
println!("HID: {:X?}/{}", hiddata, unsafe { data.ri.data.hid.dwCount });

This works perfectly fine as-is, my gut says it is just very un-Rustic, full of multi-stage casts (OTOH I actually love that Rust requires you to cast using small steps to make you at least think if conversion makes sense at all), unsafe keywords, and unholy unions.

Do you have any hints or guidelines on how to prepare safe wrappers consistent with the rest of the Rust ecosystem? Can you suggest some crates that dealt with this subject properly, as a reference?

I don't understand why you insist on trying to stack allocate this instead of allocating memory on the heap like @simonbuchan explained here

Well you are casting, there is going to be at least that much unsafe. I would instead recommend using ptr.cast() and ptr.as_ref() instead of as, these will at least document the safety guarantees you need to uphold (which should be fine here, from a cursory look)

Stack allocation is fine, modulo it having enough space, and being more expensive to copy around, of course. It's just not that much more work to allocate and free yourself, if you're going to be doing the safe wrapper.

Writing safe wrappers is a whole book, but the TLDR is just put all the unsafe code and the safe interface wrapping them in a separate module, so you can minimize the amount of code that you need to check for following the documented safety requirements. Commonly a safe wrapper type that only exists if some safety requirement is met is used to simplify the safe API, here this would be that you only return the RawInputWrapper if it was successfully initialized.

You can then provide accessors or even implement Deref to provide access to the inner fields your safe code uses.

But I'm not sure all that is worth the effort in this case?

This might be UB as currently written. Under Stacked Borrows the reference created to bRawData will only have permission to access the single array element declared in the type, which the raw pointer will inherit

I don't think that's a concern: access to all the remaining actual elements will have the same provenience, so they can't alias, and the FFI contract requires that they are all part of the same object and all are initialized. I'm not too familiar with the interaction of the formal SB model and the FFI safety requirements though...

Ok, I now see how you get problems in actually using the RAWINPUT.
Anyways, I tried writing down an example with allocating on the heap (untested). Probably has some errors.

let mut size = 0;
let res = unsafe {
                GetRawInputData(
                    HRAWINPUT(lparam.0),
                    RID_INPUT,
                    None
                    &mut size,
                    mem::size_of::<RAWINPUTHEADER>() as u32,
                )
            };
if res != 0 {
   // handle error
}

// allocate data
let data = Box::<[u8]>::new_uninit_slice(size as usize);
let size_written = unsafe {
                GetRawInputData(
                    HRAWINPUT(lparam.0),
                    RID_INPUT,
                    Some(data.as_mut_ptr().cast::<core::ffi::c_void>()),
                    &mut size,
                    mem::size_of::<RAWINPUTHEADER>() as u32,
                )
            };

if size_written == (UINT)-1 {  // not actual rust code. Didn't care how to figure out this value in ruyst
    // handle error case
}

if size_written != size {
    // probably also an error ?
}

let data = unsafe { data.assume_init() };
let raw_input = Box::into_raw(data).cast::<RAWINPUT>();

// Probably UB as mentioned by  @semicoleon
let hiddata = unsafe { *(raw_input.data.hid.bRawData.as_ptr().cast::<ContourHidEvent>()) };


// For proper deallocation
unsafe { Box::from_raw(raw_input  as *mut [u8]) };  // doesn't work. See edit below.

EDIT

I just realized deallocation with box doesn't work since raw_input does not have the same size as the original data.

It does seem to fail miri

error: Undefined Behavior: attempting a read access using <2151> at alloc1095[0x21], but that tag does not exist in the borrow stack for this location
   --> src\bin\main.rs:101:28
    |
101 |     let hiddata = unsafe { *(data.ri.data.hid.bRawData.as_ptr() as *const ContourHidEvent) };
    |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                            |
    |                            attempting a read access using <2151> at alloc1095[0x21], but that tag does not exist in the borrow stack for this location
    |                            this error occurs as part of an access at alloc1095[0x20..0x26]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2151> was created by a SharedReadOnly retag at offsets [0x20..0x21]
   --> src\bin\main.rs:101:30
    |
101 |     let hiddata = unsafe { *(data.ri.data.hid.bRawData.as_ptr() as *const ContourHidEvent) };
    |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `main` at src\bin\main.rs:101:28: 101:91

Getting a raw pointer to the whole union and then offsetting into the field works (though you should probably use memoffset or something instead of this monstrosity)

let hiddata = unsafe {
        *(ptr.cast::<u8>().offset(
            addr_of!(data.ri.data.hid.bRawData)
                .cast::<u8>()
                .offset_from(ptr.cast()),
        ) as *const ContourHidEvent)
    };

Interesting, I need to play more with MIRI. I wonder if it works going through a ref to data (say, for a self pointer) it can re-borrow from?

In this case it should as long as you move from references to raw pointers before you move inside the union, since data covers the entire memory allocation for the union. In cases where the target data is "behind" the struct it wouldn't work, as the reference only gets access to the struct even if the original raw pointer had access to those trailing bytes.

I should also note this is all assuming Stacked Borrows, some of this may not apply under Tree Borrows.

So if it was heap allocated the type would presumably have to use a trailing [u8] slice instead of [u8;1]? Interesting.