Zero initialize a large complex struct as global without code overhead

Hello, I have been developing for microcontrollers in C for 16 years, but only started learning Rust about a month ago.
In C, I sometimes have large, complex global control blocks, with dozens of fields and/or nested structs/unions, that are assumed to be initialized to 0. The startup code does this simply by zeroing .bss.

An example is the "micropython" project. Its central control block "mp_state" is composed of many control fields, and it even lets users add new fields through a macro that expands to extra fields. mp_state is a global variable and is used all over the micropython code.

However, it seems that in Rust I'm not allowed to leave a struct uninitialized or to initialize it only partially. After some struggling I arrived at the ugly but working code below:

#![feature(untagged_unions)]
use core::mem;
#[derive(Debug)] // For print
#[repr(C)] // (C,packed) to squeeze
struct Global {
    state:i16,
    t:[u8;3],
    cnt:u16,
    du8: u8,
}
union UGlobal {
    _align:u64,
    dat:Global,
}
static mut g:UGlobal = unsafe { 
    UGlobal{_align:0}};

As you can see, I "cheat" the compiler with a union and initialize only the primitive-typed field.

Another drawback I dislike is that I have to write "g.dat.xxx" to access the fields, whereas in C I can use g.xxx directly.

I would appreciate any suggestions to make this better!

Please format your code properly.


You could use MaybeUninit to initialize with zeros, or start uninitialized and then call assume_init when initialization is done.

use core::mem::MaybeUninit;

#[repr(C)]
struct Global {
    state: i16,
    t: [u8; 3],
    cnt: u16,
    du8: u8,
}

fn init() -> Global {
    unsafe {
        MaybeUninit::zeroed().assume_init()
    }
}

In many cases, the compiler can optimize away most of the code for zero-initializing a struct. For example, this:

#[derive(Default)]
#[repr(C, packed)]
pub struct Global {
    state:i16,
    t:[u8;3],
    cnt:u16,
    du8: u8,
}

pub fn global() -> Global {
    Global::default()
}

Compiles down to this (just like the unsafe version above):

example::global:
        xor     eax, eax
        ret

Compiler Explorer link


Many thanks!
However, I am not allowed to write

static mut g:Global = unsafe {Global::default()}

The compiler says that default() is not a const function and can't be evaluated at compile time.

Many thanks!
However, I'm not allowed to use init() to initialize a global variable:

static mut g:Global = unsafe{init()}; // error: init() is not a const function

I'm also not allowed to write "const fn init() -> Global"; I get the error "calls in constant functions are limited to constant functions, tuple structs and tuple variants".

Ah, for a static, the simplest solution is to write out the fields like this:

static mut g: Global = Global { state: 0, t: [0; 3], cnt: 0, du8: 0 };

This will be evaluated at compile time and stored as static data in your binary; no initialization will happen at run time.

If this is too much boilerplate (perhaps you have a large number of fields, or many different structs), you can write a macro to generate a const fn initializer that does the same thing.
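As a rough sketch of the macro idea, assuming you are willing to write each field's zero value next to its type: the macro name zeroed_struct! and the constant name ZERO below are made up for illustration, they are not from any crate. The macro emits the struct plus a const with every field written out, so the compiler still checks that nothing is missing.

```rust
// Hypothetical helper macro: declares a struct and a `ZERO` constant in
// which every field is set to the zero value written next to its type.
macro_rules! zeroed_struct {
    (
        $(#[$meta:meta])*
        $vis:vis struct $name:ident {
            $($fvis:vis $field:ident : $ty:ty = $zero:expr),* $(,)?
        }
    ) => {
        $(#[$meta])*
        $vis struct $name {
            $($fvis $field: $ty),*
        }

        impl $name {
            /// All-zero value, evaluated at compile time.
            pub const ZERO: Self = Self { $($field: $zero),* };
        }
    };
}

zeroed_struct! {
    #[derive(Debug, Clone, Copy, PartialEq)]
    #[repr(C)]
    pub struct Global {
        state: i16 = 0,
        t: [u8; 3] = [0; 3],
        cnt: u16 = 0,
        du8: u8 = 0,
    }
}

// A constant is a valid static initializer, so no start-up code runs.
static G: Global = Global::ZERO;

fn main() {
    println!("{:?}", G);
}
```

A static mut accepts the same initializer. The sketch uses an immutable static only to keep the example free of unsafe.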

Thank you again! So it sounds like there is no one-to-one mapping from C.
The global has only one instance, so a macro that is expanded only once seems odd.
Forgive me, I do know that mutable globals are error-prone (and that I should avoid them in new Rust code); I just have to deal with them in existing C code.
Yes, this "toy" struct is a heavily simplified demo; the real ones are much more complex and contain nested (unnamed) unions/structs.
For reference, here is the struct:

// This structure hold runtime and VM information.  It includes a section
// which contains root pointers that must be scanned by the GC.
typedef struct _mp_state_vm_t {
    //
    // CONTINUE ROOT POINTER SECTION
    // This must start at the start of this structure and follows
    // the state in the mp_state_thread_t structure, continuing
    // the root pointer section from there.
    //

    qstr_pool_t *last_pool;

    // non-heap memory for creating an exception if we can't allocate RAM
    mp_obj_exception_t mp_emergency_exception_obj;

    // memory for exception arguments if we can't allocate RAM
    #if MICROPY_ENABLE_EMERGENCY_EXCEPTION_BUF
    #if MICROPY_EMERGENCY_EXCEPTION_BUF_SIZE > 0
    // statically allocated buf (needs to be aligned to mp_obj_t)
    mp_obj_t mp_emergency_exception_buf[MICROPY_EMERGENCY_EXCEPTION_BUF_SIZE / sizeof(mp_obj_t)];
    #else
    // dynamically allocated buf
    byte *mp_emergency_exception_buf;
    #endif
    #endif

    #if MICROPY_KBD_EXCEPTION
    // exception object of type KeyboardInterrupt
    mp_obj_exception_t mp_kbd_exception;
    #endif

    // dictionary with loaded modules (may be exposed as sys.modules)
    mp_obj_dict_t mp_loaded_modules_dict;

    // pending exception object (MP_OBJ_NULL if not pending)
    volatile mp_obj_t mp_pending_exception;

    #if MICROPY_ENABLE_SCHEDULER
    mp_sched_item_t sched_queue[MICROPY_SCHEDULER_DEPTH];
    #endif

    // current exception being handled, for sys.exc_info()
    #if MICROPY_PY_SYS_EXC_INFO
    mp_obj_base_t *cur_exception;
    #endif

    #if MICROPY_PY_SYS_ATEXIT
    // exposed through sys.atexit function
    mp_obj_t sys_exitfunc;
    #endif

    // dictionary for the __main__ module
    mp_obj_dict_t dict_main;

    // these two lists must be initialised per port, after the call to mp_init
    mp_obj_list_t mp_sys_path_obj;
    mp_obj_list_t mp_sys_argv_obj;

    // dictionary for overridden builtins
    #if MICROPY_CAN_OVERRIDE_BUILTINS
    mp_obj_dict_t *mp_module_builtins_override_dict;
    #endif

    // include any root pointers defined by a port
    MICROPY_PORT_ROOT_POINTERS

    // root pointers for extmod

    #if MICROPY_REPL_EVENT_DRIVEN
    vstr_t *repl_line;
    #endif

    #if MICROPY_PY_OS_DUPTERM
    mp_obj_t dupterm_objs[MICROPY_PY_OS_DUPTERM];
    #endif

    #if MICROPY_PY_LWIP_SLIP
    mp_obj_t lwip_slip_stream;
    #endif

    #if MICROPY_VFS
    struct _mp_vfs_mount_t *vfs_cur;
    struct _mp_vfs_mount_t *vfs_mount_table;
    #endif

    #if MICROPY_PY_BLUETOOTH
    mp_obj_t bluetooth;
    #endif

    //
    // END ROOT POINTER SECTION
    ////////////////////////////////////////////////////////////

    // pointer and sizes to store interned string data
    // (qstr_last_chunk can be root pointer but is also stored in qstr pool)
    byte *qstr_last_chunk;
    size_t qstr_last_alloc;
    size_t qstr_last_used;

    #if MICROPY_PY_THREAD
    // This is a global mutex used to make qstr interning thread-safe.
    mp_thread_mutex_t qstr_mutex;
    #endif

    #if MICROPY_ENABLE_COMPILER
    mp_uint_t mp_optimise_value;
    #if MICROPY_EMIT_NATIVE
    uint8_t default_emit_opt; // one of MP_EMIT_OPT_xxx
    #endif
    #endif

    // size of the emergency exception buf, if it's dynamically allocated
    #if MICROPY_ENABLE_EMERGENCY_EXCEPTION_BUF && MICROPY_EMERGENCY_EXCEPTION_BUF_SIZE == 0
    mp_int_t mp_emergency_exception_buf_size;
    #endif

    #if MICROPY_ENABLE_SCHEDULER
    volatile int16_t sched_state;
    uint8_t sched_len;
    uint8_t sched_idx;
    #endif

    #if MICROPY_PY_THREAD_GIL
    // This is a global mutex used to make the VM/runtime thread-safe.
    mp_thread_mutex_t gil_mutex;
    #endif
} mp_state_vm_t;

The most interesting part is:

    // include any root pointers defined by a port
    MICROPY_PORT_ROOT_POINTERS

That macro expands to some extra lines of legal C struct field declarations.

By the way, it sounds like in Rust I am also not allowed to nest structs/unions or leave them unnamed (except in enums). Once more, this is a common trick in a lot of existing C code.

Yeah you will ultimately have to write out every field to initialize your global. And yes, you can't nest unions and structs directly in structs without naming them like you can in C.
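To make that concrete, here is a minimal sketch of the named-nesting style. All type and field names (Payload, Sched, Control, CTRL) are invented for illustration and are not taken from MicroPython. Where C would put an anonymous union or struct inline, Rust needs a separate named type and a named field, and the const initializer spells out every level.

```rust
// Illustrative types only. In C the union could be anonymous and nested
// inline; in Rust it is a named type reached through a named field.
#[repr(C)]
#[derive(Clone, Copy)]
union Payload {
    word: u32,
    bytes: [u8; 4],
}

#[repr(C)]
struct Sched {
    state: i16,
    len: u8,
    idx: u8,
}

#[repr(C)]
struct Control {
    sched: Sched,
    payload: Payload,
}

impl Control {
    // Every field is written out, so this can be used in a static initializer.
    const fn zeroed() -> Self {
        Control {
            sched: Sched { state: 0, len: 0, idx: 0 },
            payload: Payload { word: 0 },
        }
    }
}

static CTRL: Control = Control::zeroed();

fn main() {
    // Reading a union field is unsafe because the compiler cannot know
    // which field was last written. Here all four bytes were set via `word`.
    let bytes = unsafe { CTRL.payload.bytes };
    println!("{} {} {} {:?}", CTRL.sched.state, CTRL.sched.len, CTRL.sched.idx, bytes);
}
```

The cost compared with C is the extra path segment (CTRL.sched.state rather than CTRL.state); the layout under #[repr(C)] is the same as the equivalent named C declaration.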

Will the Rust core team add language extensions to make the path from C to Rust smoother?

Have you tried c2rust?

Thanks for the hint! I'm a brand-new, self-taught newcomer to Rust from the embedded C world, and I'm not yet familiar with the Rust ecosystem. I'll look into it.
In fact, I don't have to rewrite very fundamental C code, but as a student I at least hope to learn some of the subtle differences.

I could conceive of conveniences being added, such as a way to initialize your type with zeros, but not via language extensions. Such changes would become part of the core language.
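For readers on newer toolchains: as far as I know, MaybeUninit::zeroed (and core::mem::zeroed) became callable in const contexts in Rust 1.75, so a zeroed static can be written without listing the fields. The sketch below assumes such a toolchain and assumes that all-zero bytes are a valid value for every field, which holds for the toy struct from this thread but not for types containing references or NonZero integers.

```rust
use core::mem::MaybeUninit;

#[repr(C)]
struct Global {
    state: i16,
    t: [u8; 3],
    cnt: u16,
    du8: u8,
}

// SAFETY: every field is an integer or an array of integers, for which the
// all-zero bit pattern is a valid value. This would not be sound for a
// struct holding references or `NonZero*` integers.
static mut G: Global = unsafe { MaybeUninit::zeroed().assume_init() };

fn main() {
    // `static mut` provides no synchronization; this sketch assumes
    // single-threaded access, as on a bare-metal main loop.
    unsafe {
        G.cnt += 1;
    }
    // Copy the fields out by value instead of taking a reference to the static.
    let (state, cnt) = unsafe { (G.state, G.cnt) };
    println!("state = {state}, cnt = {cnt}");
}
```

Fields are accessed as G.xxx directly, with no wrapper union in between, which is close to what the C code does.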

That's exciting!
I believe this would encourage many embedded C users to embrace Rust! So would support for nested structs/unions.
By the way, the struct is from the MicroPython project for microcontrollers: "https://github.com/micropython/micropython/blob/cae77daf003212684a84b1b3a331d45564a0c286/py/mpstate.h#L109".
I think this project makes heavy use of C language and C preprocessor tricks.

I tried c2rust online and it's amazing; only C conditional compilation can still be a problem.