Working with C - allocating structs on the stack

A typical pattern for accessing a C API involves allocating a struct that holds the API's state and then calling an init() function of some sort, passing it a pointer to the struct. There are variations on this theme, but it's a pretty common one to encounter when dealing with a C library. Compared to other languages, it's (relatively) straightforward to generate bindings in Rust to call C functions; you did well there. This is a snippet from code that runs:

{
    // you can't just alloc a struct on the stack without initing.
    // every single field inside the struct must be initted, even if you know you're
    // going to call an init function to set the struct to a known good state.
    // make sure to set it to writable with mut too, once you decl something, all you
    // can do is read from it by default.
    // welcome to Rust
    let mut hash_state: SHA256_CTX = SHA256_CTX {
        h: [0; 8],
        Nl: 0,
        Nh: 0,
        data: [0; 16],
        num: 0,
        md_len: 0,
    };

    do_hash_init(&mut hash_state);
}

fn do_hash_init( h: &mut SHA256_CTX ) {
    unsafe {
       let rc = SHA256_Init(h);
       if rc != 1 {
          panic!("SHA256_Init() failed, reason = {}", rc);
       }
    }
}
  1. I really really really would rather not know what exactly is in that C struct, and the API provides a call to initialize it to a known good state anyway, so zeroing it all out field by field by field is not only a pain but pointless as well. If the API changes and they add more fields to it, that's yet another place where changes have to be made, and another place where bugs can creep in.

  2. If zeroing it out is required no matter what, is there a terser way of doing it? What if that struct had 5x as many fields inside it? Are you really going to require that they all get initialized one by one?

  3. Yes, there's probably a crate for that. That's not the point, the point is legacy C code that you are obligated to keep using for reasons.

Use MaybeUninit<SHA256_CTX> until initialized.


Addressing only this part of the picture:

If zeroed instances are valid instances, then the Rust struct should be given an implementation of the Default trait. If zero isn't a sensible Default then write a zeroed() associated function that does return the zeroed instance.

In general in Rust, types should be given associated functions and trait implementations that serve the needs that come up when using those types.
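As a minimal sketch of that suggestion, here is a hypothetical stand-in for the bindgen-generated SHA256_CTX (the field names and integer widths are assumptions copied from the snippet at the top, not checked against OpenSSL's headers). Because every field is an integer or an integer array, #[derive(Default)] already yields the all-zero value, and a zeroed() associated function just names that intent:

```rust
// Hypothetical stand-in for the bindgen-generated struct; layout assumed.
#[allow(non_camel_case_types, non_snake_case)]
#[repr(C)]
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct SHA256_CTX {
    pub h: [u32; 8],
    pub Nl: u32,
    pub Nh: u32,
    pub data: [u32; 16],
    pub num: u32,
    pub md_len: u32,
}

impl SHA256_CTX {
    /// All-zero instance. With integer-only fields this equals the derived
    /// `Default`; the separate name makes "zeroed" explicit at call sites.
    pub fn zeroed() -> Self {
        Self::default()
    }
}

fn main() {
    let a = SHA256_CTX::default();
    let b = SHA256_CTX::zeroed();
    assert_eq!(a, b);
    assert_eq!(a.h, [0; 8]);
}
```

Deriving Default also speaks to the "5x as many fields" worry: adding fields later needs no change to the initialization code, as long as each new field's type implements Default.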

Well,

  1. That worked. Thank you.
  2. That's not very pretty looking, at least to me. But it works and it does what I need it to.
  3. That's not very obvious either. Shouldn't something like that be part of the language itself, instead of hidden away in some corner of a library? What magic is going on behind the scenes to make the compiler stop growling? Do I want to know or is this a "the pony he comes" descent into madness?
    let mut foo: MaybeUninit<SHA256_CTX> = MaybeUninit::uninit();
    unsafe { do_hash_init(&mut *foo.as_mut_ptr()); }
    let mut hash_state = unsafe { foo.assume_init() };

First of all: make do_hash_init take a *mut SHA256_CTX. Don't create a &mut SHA256_CTX until everything is initialized (more practically, just wait until after assume_init). This is UB as it creates a &mut to uninitialized memory:

    unsafe { do_hash_init(&mut *foo.as_mut_ptr()); }
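A sketch of the corrected flow, with do_hash_init taking a *mut SHA256_CTX so that no reference to uninitialized memory is ever created. Since the real OpenSSL call can't be linked in a standalone example, mock_sha256_init is a hypothetical Rust stand-in that writes every field through the pointer and returns 1 like SHA256_Init; the struct layout is likewise assumed. Making do_hash_init an unsafe fn (it writes through a caller-supplied raw pointer) is this sketch's own choice:

```rust
use std::mem::MaybeUninit;

// Assumed layout, mirroring the bindgen-generated struct in the question.
#[allow(non_camel_case_types, non_snake_case, dead_code)]
#[repr(C)]
pub struct SHA256_CTX {
    pub h: [u32; 8],
    pub Nl: u32,
    pub Nh: u32,
    pub data: [u32; 16],
    pub num: u32,
    pub md_len: u32,
}

// Hypothetical stand-in for the C `SHA256_Init`: writes every field through
// the raw pointer (h uses the standard SHA-256 initial hash values) and
// returns 1 on success.
unsafe extern "C" fn mock_sha256_init(c: *mut SHA256_CTX) -> i32 {
    unsafe {
        c.write(SHA256_CTX {
            h: [
                0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
            ],
            Nl: 0,
            Nh: 0,
            data: [0; 16],
            num: 0,
            md_len: 32,
        });
    }
    1
}

/// Takes a raw pointer because the memory behind it may still be
/// uninitialized; no `&mut SHA256_CTX` is created here.
///
/// # Safety
/// `h` must be valid for writes of one `SHA256_CTX`.
unsafe fn do_hash_init(h: *mut SHA256_CTX) {
    let rc = unsafe { mock_sha256_init(h) };
    if rc != 1 {
        panic!("SHA256_Init() failed, reason = {}", rc);
    }
}

fn new_hash_state() -> SHA256_CTX {
    let mut hash_state = MaybeUninit::<SHA256_CTX>::uninit();
    unsafe {
        do_hash_init(hash_state.as_mut_ptr());
        // Sound only because the init call above wrote every field.
        hash_state.assume_init()
    }
}

fn main() {
    let hash_state = new_hash_state();
    assert_eq!(hash_state.h[0], 0x6a09e667);
    assert_eq!(hash_state.md_len, 32);
}
```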

As for your questions, working with pointers in Rust could definitely use some improvement, but it's also not terribly common unless you're doing FFI or writing your own base collection or synchronization types or the like.

MaybeUninit is part of core, and practically every Rust program uses core; to do without it, you'd have to implement a bunch of lang items on unstable. That is to say, core might as well be part of the language.

That said, MaybeUninit itself isn't magic. It's an untagged union of a zero-sized type and a ManuallyDrop<T>. You could write your own: you start out as the ZST variant, fill in the memory using pointer writes, and then read it out as the other variant, unwrapping it with ManuallyDrop::into_inner.
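To make "you could write your own" concrete, here is a toy MyMaybeUninit<T> (a hypothetical name; a sketch of the idea, not the actual std source) built from exactly those two pieces:

```rust
use std::mem::ManuallyDrop;
use std::ptr::addr_of_mut;

/// Toy version of `MaybeUninit<T>`: an untagged union of a zero-sized
/// "nothing here yet" variant and the value itself.
union MyMaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

impl<T> MyMaybeUninit<T> {
    /// Start out as the zero-sized variant; the bytes of `value` are uninitialized.
    fn uninit() -> Self {
        MyMaybeUninit { uninit: () }
    }

    /// Raw pointer to where the value lives, obtained without reading it.
    /// `ManuallyDrop<T>` is `#[repr(transparent)]`, so the cast keeps the layout.
    fn as_mut_ptr(&mut self) -> *mut T {
        unsafe { addr_of_mut!(self.value) as *mut T }
    }

    /// # Safety
    /// The value must have been fully initialized through `as_mut_ptr`.
    unsafe fn assume_init(self) -> T {
        // Read the other variant, then unwrap it.
        ManuallyDrop::into_inner(unsafe { self.value })
    }
}

fn main() {
    let mut slot = MyMaybeUninit::<[u32; 4]>::uninit();
    let p = slot.as_mut_ptr();
    unsafe {
        p.write([1, 2, 3, 4]); // fill in the memory with a pointer write
        assert_eq!(slot.assume_init(), [1, 2, 3, 4]);
    }
}
```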

ManuallyDrop is a part of the language (a type with lang-level properties), but that doesn't seem to matter in this particular case (everything looks like plain ol' data).

unsafe "stops the growling" but also produces UB if held wrong. You're taking on the onus of upholding Rust's safety and validity invariants. The compiler will trust you got it right. I'd agree that getting unsafe Rust right is harder than ideal.


A few bits of discussion which may help clarify why things are the way they are:

This is a place where C and Rust differ. In C, adding new fields to a struct is (by itself[1]) an API-compatible change. In Rust, adding new fields to a structure with all public fields is an API-breaking change.

If you want to maintain the right to add new members to the context structure, then the structure should be #[non_exhaustive] on the Rust side of the bridge. This removes the ability for code other than your bindings crate to initialize the structure with a struct literal.

If you want a zeroed instance, you can unsafely create one with fn std::mem::zeroed. If you just want any default value, trait Default is derivable.

The standard library isn't intended to be exhaustive, so a safe "give me a zeroed instance" is left to the third-party ecosystem. trait bytemuck::Zeroable is the most commonly used option (and is derivable); the same crate also provides trait bytemuck::Pod to enable doing more "plain ol' data" byte mucking safely.
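To show how a trait can make zeroing safe, here is a hand-rolled Zeroable (a hypothetical analogue of the idea behind bytemuck::Zeroable, not that crate's code). The unsafe part moves to the impl, where someone promises once that all-zero bytes are a valid value; every call site of zeroed() is then safe. The SHA256_CTX layout is the same assumption as in the opening snippet:

```rust
/// Implementing this unsafe trait is a promise that the all-zeros bit
/// pattern is a valid value of the type.
unsafe trait Zeroable: Sized {
    fn zeroed() -> Self {
        // SAFETY: upheld by whoever wrote the `unsafe impl`.
        unsafe { std::mem::zeroed() }
    }
}

// Assumed layout, mirroring the snippet at the top of the thread.
#[allow(non_camel_case_types, non_snake_case, dead_code)]
#[repr(C)]
#[derive(Debug, PartialEq)]
struct SHA256_CTX {
    h: [u32; 8],
    Nl: u32,
    Nh: u32,
    data: [u32; 16],
    num: u32,
    md_len: u32,
}

// SAFETY: every field is an integer or an integer array, so all zeros is valid.
unsafe impl Zeroable for SHA256_CTX {}

fn main() {
    let ctx = SHA256_CTX::zeroed(); // no `unsafe` at the call site
    assert_eq!(ctx.md_len, 0);
}
```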

Performance-wise, the compiler/optimizer is typically good at seeing you're zero-initializing all of the state and turning it into a memset like you'd write in C instead of a bunch of piecewise initialization. This doesn't simplify the source, but is meaningful w.r.t. the "one by one" note.

If you don't want to initialize the value before calling the FFI init/constructor, then you need to use MaybeUninit. The reason is safety[2]: it's safe to use a variable but unsound to use uninitialized data, so variables must always be initialized (for their type), and dealing with uninitialized data is gated behind unsafe APIs.

The moveit crate is probably overkill for a C library, but it offers an interface to make in-place address-aware object construction safe.

The core library (typically used via std) is essentially part of the language. C maintains a reasonably firm split between the language and libc. More modern languages (including even C++[3]) aren't as strict, letting the language interact with the standard library.

For Rust, the most immediately obvious example is for loops existing and using the Iterator trait and Option enum. You can perhaps say those are less hidden due to being available by default without an import. But as a counter to that, 99%+ of Rust code has no need of MaybeUninit and is fine with initializing values in the safe manner. Doing FFI is the exception, and even then MaybeUninit is typically just an optimization over providing a default initialization.
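As a small illustration of a for loop leaning on library items, here is a for loop next to a hand-written approximation of what it expands to. The real expansion is a loop with a match on Iterator::next rather than this while let, so treat it as a sketch of the shape, not the compiler's exact output:

```rust
/// Sums a slice twice: once with a `for` loop, once with an approximation of
/// what the `for` loop desugars to (IntoIterator + Iterator::next + Option).
fn sum_both(v: &[i32]) -> (i32, i32) {
    let mut sugared = 0;
    for x in v {
        sugared += x;
    }

    let mut desugared = 0;
    let mut iter = IntoIterator::into_iter(v);
    while let Some(x) = Iterator::next(&mut iter) {
        desugared += x;
    }

    (sugared, desugared)
}

fn main() {
    assert_eq!(sum_both(&[1, 2, 3]), (6, 6));
}
```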


  1. But in typical C fashion, it's extremely easy to cause UB unless you're extremely careful, entirely manually. It's also ABI-breaking for any function that passes the struct by value, so it's essentially not backwards compatible in the C world of dynamic linking by default. In short, non-opaque C types are typically considered ABI-frozen and not allowed to change. When they do change, it's typically made obvious by explicit struct versioning. ↩︎

  2. It's not just safety; the language assumes that values are initialized and it's UB to use them without initializing them. The exact extent of this is undecided, but it's certainly stricter than C w.r.t. uninitialized values, and even just moving a value requires it to be initialized for its type. ↩︎

  3. C++ is a fun example. Even if you presume that the entirety of the STL can be implemented with just plain C++ and libc, C++ declares it UB to define any names within namespace std, so it's impossible to actually implement the STL in standard C++. ↩︎


You could check out how the OpenSSL API is wrapped in a Rust API in the openssl crate.

Specifically, the SHA API is defined here.

The C binding is defined in the openssl-sys crate (in the same git repository) and is aliased as ffi in the openssl crate's code. For the C binding concerning SHA, you can look here.


What, in particular, is ugly about this? You create a piece of uninitialized memory, then you initialize it. That's exactly what is being done. The code 100% expresses the intent. It doesn't get any prettier than this working with a low-level, weakly-typed language such as C.

It's not hidden, it's well-documented. You asked, @quinedot answered, and now you know it. You could just as easily have found MaybeUninit by googling "rust create uninitialized variable" or something. It's not like it would have been easier to discover a language feature than a type in the standard library. MaybeUninit is common knowledge, but like anything, you will come to know it by using it.

If you are merely complaining that you have to make efforts in order to learn the language and the library, then sorry, that's not a valid complaint, that's just how things are.

And re: "in the language vs in the library":

  1. the answer is "no". If something can be in the library, then it explicitly shouldn't be a separate language feature. The point of the language-vs-library distinction is that the language provides a small, closed set of compositional primitives, on which arbitrary other code can be built. Otherwise, every unforeseen niche use case would need its own language feature, and there would be no limit.
  2. MaybeUninit technically does use a language feature for its implementation. It relies on the fact that a union can hold one of several types and that its fields don't automatically get dropped. Therefore, an uninitialized state can be represented by creating a union { (), T }, and it won't get dropped unless you explicitly read a T out of it.

Hmm. Now that you suggested it.

	// well the pony he comes still, but daintily
	let mut hash_state: SHA256_CTX = unsafe { zeroed() };
	do_hash_init(&mut hash_state);

There we go. Readable understandable code, down to two lines. Somewhat inefficient (still writing redundant zeroes) but I think I'd rather have the readability. Sometimes you do want to go the other way on things and I'll keep that MaybeUninit in mind.

Line count is not a good measure of how understandable some code is. Now your code is wide open to unsoundness in a way that you won't even notice, ever: if the type changes in any subtle way so that the all 0s layout ceases to be a valid representation of it, then you've suddenly got silent undefined behavior. This is practically the worst way you can implement this.

By the way, the equivalent code with MaybeUninit looks like this:

let mut hash_state: MaybeUninit<SHA256_CTX> = MaybeUninit::uninit();
do_hash_init(hash_state.as_mut_ptr());
let hash_state = unsafe { hash_state.assume_init() };

That's 3 lines, if we are still competing in the International Line Counting Championship.


C structs don't have validity invariants, but reading uninit memory is still UB. This means that, in general, using MaybeUninit::zeroed() is safer than MaybeUninit::uninit() if you're doing FFI. You never know when the library author decides to skip writing some fields and grants you UB.
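A sketch of that advice: start from MaybeUninit::zeroed() so that any field the C side forgets to write is zero rather than uninitialized. partial_init is a hypothetical stand-in for a "lazy" C init function that writes only some fields, and the struct layout is assumed as before:

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Assumed layout, mirroring the snippet at the top of the thread.
#[allow(non_camel_case_types, non_snake_case, dead_code)]
#[repr(C)]
struct SHA256_CTX {
    h: [u32; 8],
    Nl: u32,
    Nh: u32,
    data: [u32; 16],
    num: u32,
    md_len: u32,
}

// Hypothetical C-style init that only writes `h` and `md_len`,
// leaving every other field untouched.
unsafe extern "C" fn partial_init(c: *mut SHA256_CTX) -> i32 {
    unsafe {
        addr_of_mut!((*c).h).write([1; 8]);
        addr_of_mut!((*c).md_len).write(32);
    }
    1
}

fn init_from_zeroed() -> SHA256_CTX {
    // All bytes start as zero, so fields skipped by `partial_init`
    // hold 0 instead of uninitialized memory.
    let mut state = MaybeUninit::<SHA256_CTX>::zeroed();
    unsafe {
        let rc = partial_init(state.as_mut_ptr());
        assert_eq!(rc, 1, "init failed");
        // Sound: every field is an integer and every byte is initialized.
        state.assume_init()
    }
}

fn main() {
    let ctx = init_from_zeroed();
    assert_eq!(ctx.num, 0); // never written by `partial_init`
    assert_eq!(ctx.md_len, 32);
}
```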

