Dereferencing pointers to structs returned by C functions / Danger of misalignment?

Hi all,

I have a question regarding FFI and alignment. I call a C function from Rust, which returns a pointer to a struct cmach_lua_t. This one:

typedef struct {
    lua_State *L;
    size_t mem_used;
    size_t mem_limit;
    void *execlimit_error;
} cmach_lua_t;

The signature of the C function is:

cmach_lua_t *cmach_lua_new()

Now I would assume that such an interface is not unusual. My question is: Do I have any guarantee that the pointer returned by such a C function is properly aligned? I guess generally: no. Which would mean I cannot safely dereference that pointer, right?

However, let's assume the function internally does:

cmach_lua_t *cmach_lua_new() {
    cmach_lua_t *S;
    /* … */
    S = calloc(1, sizeof(cmach_lua_t));
    if (!S) return NULL;
    /* … */
    return S;
}

Will the calloc then assure the structure is properly aligned? And is the alignment done by calloc sufficient to meet Rust's alignment rules? Couldn't they theoretically (or practically) differ?

Generally, such pointers will always be properly aligned. Alignment is also a requirement for using the pointers in C. The malloc and calloc calls return pointers with sufficient alignment for all types that are possible to write in C.

Strictly speaking, you can use the #[repr(align(...))] attribute to define a Rust type with a larger alignment than what malloc returns, but this likely does not apply in your case.
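
To make the #[repr(align(...))] caveat concrete, here is a minimal sketch. The type `CacheLine` and the helper `satisfies` are made up for illustration, and the 16-byte figure is only an assumption about what a typical 64-bit malloc hands out.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical over-aligned type; not part of the original question.
#[repr(C, align(64))]
pub struct CacheLine {
    pub data: [u8; 16],
}

/// Returns true if `addr` is a multiple of `align`.
pub fn satisfies(addr: usize, align: usize) -> bool {
    addr % align == 0
}

fn main() {
    // The attribute raises the alignment, and the size is rounded up to match.
    println!(
        "align = {}, size = {}",
        align_of::<CacheLine>(),
        size_of::<CacheLine>()
    );

    // An address that is only 16-byte aligned (assumed malloc-style alignment)
    // is not necessarily good enough for CacheLine.
    let addr = 0x1010;
    println!("16-byte aligned: {}", satisfies(addr, 16));
    println!("CacheLine aligned: {}", satisfies(addr, align_of::<CacheLine>()));
}
```

For an ordinary struct like cmach_lua_t none of this applies, since its alignment is just that of its widest field.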

If you're interacting with a C struct, the Rust definition needs to be #[repr(C)].

C has a concept of maximum alignment. calloc returns maximally aligned memory from this perspective. If the C struct doesn't have some extended alignment requirement, it will be aligned.
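
As a rough illustration, a hand-written #[repr(C)] mirror of cmach_lua_t could look like the sketch below. The Rust names (`LuaState`, `CmachLua`, `cmach_lua_layout`) are my own, and mapping size_t to usize is an assumption that holds on mainstream targets; bindgen would normally generate the equivalent for you.

```rust
use std::ffi::c_void;
use std::mem::{align_of, size_of};

/// Opaque stand-in for `lua_State`; only ever used behind a pointer.
#[repr(C)]
pub struct LuaState {
    _private: [u8; 0],
}

/// Hand-written mirror of the C `cmach_lua_t` from the question.
#[repr(C)]
pub struct CmachLua {
    pub l: *mut LuaState,
    pub mem_used: usize,  // size_t (assumed to match usize)
    pub mem_limit: usize, // size_t (assumed to match usize)
    pub execlimit_error: *mut c_void,
}

/// Size and alignment that Rust requires for the mirrored struct.
pub fn cmach_lua_layout() -> (usize, usize) {
    (size_of::<CmachLua>(), align_of::<CmachLua>())
}

fn main() {
    let (size, align) = cmach_lua_layout();
    // Four pointer-sized fields: no padding, pointer alignment.
    println!("size = {}, align = {}", size, align);
}
```

The struct only needs pointer alignment, which is well within what calloc provides.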

So if I see a C library that returns some_type *, then I can assume that this library will return an aligned pointer (or is that risky?), and if I use bindgen, that should add the #[repr(C)] automatically, I guess.

@alice says:

Thus I would assume it's reasonable to expect that a function returning some_type * will give me an aligned pointer, right? (At least aligned enough to match Rust's #[repr(C)] layout?)

P.S.: There was a different post (regarding NonNull) before, which confused me a bit but has since been deleted. I assume NonNull::new does not fix alignment, but functions in C generally will give me aligned pointers unless explicitly documented otherwise. Still, I'm not entirely sure how alignment in C works. The following sentence in calloc()'s man page (jemalloc) confused me:

The malloc() function allocates size bytes of uninitialized memory. The allocated space is suitably aligned (after possible pointer coercion) for storage of any type of object.

Not sure what "after possible pointer coercion" means here.

As usual, there are no guarantees of anything in C about pointers, so in a strict interpretation you need to read the documentation to be sure.

But practically, pointers in C are sufficiently-aligned by convention, and it's likely that only unaligned ones would get called out specifically.

I doubt I'd find any C API documentation that would give me info on alignment of handles being pointed to by returned pointers. But since they need to be properly aligned to use them, I guess they will be "sufficiently-aligned" in most cases.
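
If you would rather not rely on convention alone, a cheap defensive check on the Rust side is possible. The helper below is a hypothetical sketch (the name `checked_ref` is made up); it only rejects null and misaligned addresses and cannot prove that the pointer is otherwise valid.

```rust
use std::mem::align_of;

/// Hypothetical helper: turn a raw pointer received from C into a reference,
/// refusing null or misaligned addresses.
///
/// # Safety
/// If the pointer is non-null and aligned, it must still point to a live,
/// initialized `T` that is not mutated for the lifetime `'a`.
pub unsafe fn checked_ref<'a, T>(ptr: *const T) -> Option<&'a T> {
    if ptr.is_null() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    Some(unsafe { &*ptr })
}

fn main() {
    let value: u32 = 42;
    let good = &value as *const u32;
    // Same address shifted by one byte: misaligned for u32.
    let bad = (good as *const u8).wrapping_add(1) as *const u32;

    unsafe {
        println!("{:?}", checked_ref(good));
        println!("{:?}", checked_ref(bad));
        println!("{:?}", checked_ref(std::ptr::null::<u32>()));
    }
}
```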


Actually I stumbled upon alignment because I want to store values in LMDB, and LMDB does not give any alignment guarantees (except some 2-byte alignment it seems, maybe for UTF-16). So I guess I'll need to use read_unaligned when reading anything whose alignment in Rust is larger than 1 byte (that is, anything other than types like str or [u8]).
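
A minimal sketch of that idea, assuming the stored value arrives as a plain byte slice (the helper name `read_u32_at` is hypothetical and no LMDB API is used here):

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical helper: read a native-endian u32 from `buf` at `offset`
/// without assuming that the address is 4-byte aligned.
pub fn read_u32_at(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(size_of::<u32>())?;
    let bytes = buf.get(offset..end)?;
    // SAFETY: `bytes` covers exactly 4 readable bytes, and read_unaligned
    // places no alignment requirement on the source pointer.
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr() as *const u32) })
}

fn main() {
    // One leading byte so that the u32 starts at an odd offset in the buffer.
    let mut stored = vec![0xFFu8];
    stored.extend_from_slice(&0xDEAD_BEEFu32.to_ne_bytes());

    println!("{:x?}", read_u32_at(&stored, 1));
    println!("{:?}", read_u32_at(&stored, 2)); // not enough bytes left
}
```

For plain integers, u32::from_ne_bytes on a copied 4-byte array is a safe alternative that avoids unsafe entirely.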

I don't understand this idea of the pointer itself being aligned.
Are you sure you are not confusing it with the alignment of the data inside?

In any case, as long as you take into account repr(C) and any packed directives on the C struct, you should be able to replicate the struct with precisely the same size and alignment.

C's sizeof always returns the number of bytes necessary to represent an object (to put it simply, it returns the size of the struct, including padding).
So as long as there are no packed directives, it returns the proper size, with padding suitable for your underlying hardware.
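
To illustrate the padding point from the Rust side, here is a small sketch. The struct `Padded` is hypothetical and assumes that C's int and char map to i32 and i8, which is the case on mainstream targets.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct with the same shape as a C `struct { int a; char b; }`.
#[repr(C)]
pub struct Padded {
    pub a: i32,
    pub b: i8,
}

/// Size and alignment of `Padded` under #[repr(C)].
pub fn padded_layout() -> (usize, usize) {
    (size_of::<Padded>(), align_of::<Padded>())
}

/// Distance in bytes between two consecutive array elements.
pub fn array_stride() -> usize {
    let arr = [Padded { a: 1, b: 2 }, Padded { a: 3, b: 4 }];
    (&arr[1] as *const Padded as usize) - (&arr[0] as *const Padded as usize)
}

fn main() {
    // 5 bytes of fields, rounded up to a multiple of the 4-byte alignment.
    let (size, align) = padded_layout();
    println!("size = {}, align = {}", size, align);
    // The stride equals the padded size, so every element stays aligned.
    println!("stride = {}", array_stride());
}
```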

TL;DR: you can deref this pointer as the underlying struct without any risk of misalignment. So it boils down to writing a Rust struct that is equal in its composition to the C one. (Much better than reading byte by byte.)

P.S. To clarify: malloc/calloc has no idea what data will be stored, so it is designed so that the alignment is valid for any type.

No, I meant the pointer to the struct being aligned (according to the same rules Rust demands for that type when using #[repr(C)]).


Let me provide an example of some C code I have written (to demonstrate my question):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int a;
    char b;
} foo_t;

foo_t *new_foo() {
    foo_t *ptr = calloc(1, sizeof(*ptr));
    ptr->a = 7;
    ptr->b = 12;
    return ptr;
}

void double_foo(foo_t **first, foo_t **second) {
    foo_t *x, *y, *dbl;
    x = new_foo();
    y = new_foo();
    dbl = calloc(2, sizeof(*dbl));
    memcpy(dbl+0, x, sizeof(*x));
    memcpy(dbl+1, y, sizeof(*y));
    free(x);
    free(y);
    memmove((void *)(dbl+1)-1, dbl+1, sizeof(*y));
    *first = dbl;
    *second = (void *)(dbl+1)-1;
}

int main() {
    foo_t *x, *y;
    double_foo(&x, &y);
    printf("x = %p\n", x);
    printf("y = %p\n", y);
    printf("x.a = %i\n", x->a);
    printf("x.b = %hhi\n", x->b);
    printf("y.a = %i\n", y->a);
    printf("y.b = %hhi\n", y->b);
    return 0;
}

(Edit: moved free invocation to be called after memcpy; output is still the same as shown below)

When I compile this C code on my system, I get:

x = 0x800a09000
y = 0x800a09007
x.a = 7
x.b = 12
y.a = 7
y.b = 12

Here, y is not aligned (and apparently I can work with that in C). Yet it's what's returned by double_foo. Maybe this is generally undefined behavior in C though? (I don't know enough about alignment in C to answer that question.)
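
As a worked check of the two addresses printed above, the sketch below computes their remainder modulo the alignment of a #[repr(C)] mirror of foo_t. The names `FooT` and `misalignment` are hypothetical, and int is assumed to be a 4-byte i32.

```rust
use std::mem::align_of;

/// Assumed Rust mirror of the C `foo_t` (int -> i32, char -> i8).
#[repr(C)]
pub struct FooT {
    pub a: i32,
    pub b: i8,
}

/// Remainder of `addr` modulo the alignment of `T`; zero means aligned.
pub fn misalignment<T>(addr: u64) -> u64 {
    addr % align_of::<T>() as u64
}

fn main() {
    // Addresses copied from the program output in this post.
    let x: u64 = 0x800a09000;
    let y: u64 = 0x800a09007;

    println!("x is off by {} bytes", misalignment::<FooT>(x));
    println!("y is off by {} bytes", misalignment::<FooT>(y));
}
```

So y sits 3 bytes past a 4-byte boundary. That the program still printed the expected values on this machine does not by itself show that the access is well-defined.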

I understand that when I use calloc, the data will be aligned. But if I have a general API description that returns a pointer to some struct, can I be sure that the data will be aligned as well (when I don't know how the function internally works)? Apparently not, as the example above shows. But will this ever happen in practice?

"For any type" would be too strong a restriction. AVX-512's aligned load and store instructions require 64-byte-aligned operands, and nothing prevents Intel from adding an AVX-1024 in the future. Some high-performance I/O libraries use page-aligned buffer types to handle memlocks efficiently.

The C standard calls this "fundamental alignment": the minimum alignment required of pointers returned by the malloc family. It is usually the alignment of long double, which on many 64-bit platforms is 16 bytes (128 bits). Note that long double is IEEE-754 quadruple precision only on some targets; on x86-64 it is typically the 80-bit extended format padded to 16 bytes.

I think there is some confusion.
malloc/calloc guarantees that the returned pointer will be valid for any type of data.
This guarantee is necessary because it doesn't accept alignment information, only a size.

The pointer address itself (or let's simply call it an integer) has no relation to the alignment of the underlying data.
In any case, you can safely dereference the pointer as long as you use the correct type, since Rust provides full capability to copy C's type layout.

But if I have a general API description that returns a pointer to some struct, can I be sure that the data will be aligned as well

In general, if you trust the API then yes; otherwise you have to check the source code yourself to know how the data has been written.
In your example, where y is created using calloc, it is a bit difficult to judge whether the second pointer is valid (I assume the implementation would just use a layout similar to an array, but I do not know that for a fact).

For any type would be too strong restriction

malloc cannot follow C's standard on alignment.
Again, there seems to be a misunderstanding. All malloc does is allocate a block of memory that fits the provided size. It cannot know about alignment, hence it is not a question of whether this is restrictive or not: the implementation has to support any type.

malloc is a function defined in the C standard library (libc). Implementations should follow the C standard.

malloc doesn't know anything about what type will be stored in the memory region it returns. That's why stddef.h provides the max_align_t type, whose alignment the pointers returned by malloc must satisfy. If a caller requires more alignment, they should use other allocator functions that take an alignment parameter, like posix_memalign().
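
For comparison, Rust's own allocation API always takes the alignment explicitly through std::alloc::Layout. The sketch below is a Rust analogue of asking for over-aligned memory, not a call to posix_memalign itself; the helper name `alloc_is_aligned` is hypothetical.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

/// Allocate `size` zeroed bytes at the requested alignment, report whether the
/// returned address honours it, then free the block again.
/// Returns None for an invalid layout, a zero size, or allocation failure.
pub fn alloc_is_aligned(size: usize, align: usize) -> Option<bool> {
    // Fails if `align` is zero or not a power of two.
    let layout = Layout::from_size_align(size, align).ok()?;
    if layout.size() == 0 {
        return None; // the global allocator must not be asked for zero bytes
    }
    // SAFETY: the layout has a non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        return None;
    }
    let aligned = (ptr as usize) % align == 0;
    // SAFETY: `ptr` came from alloc_zeroed with exactly this layout.
    unsafe { dealloc(ptr, layout) };
    Some(aligned)
}

fn main() {
    // Request 64 bytes on a 64-byte boundary, more than malloc is required to give.
    println!("{:?}", alloc_is_aligned(64, 64));
    // 3 is not a power of two, so no layout can be built.
    println!("{:?}", alloc_is_aligned(64, 3));
}
```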

I expanded my example to be used by Rust. This is what I added:

[build-dependencies]
cc = "1.0.73"
bindgen = "0.59.2"

I put this in my Cargo.toml. Then I created a build.rs:

fn main() {
    {
        let mut builder = bindgen::Builder::default();
        builder = builder.header("src/c_code.c");
        builder =
            builder.parse_callbacks(Box::new(bindgen::CargoCallbacks));
        let bindings = builder.generate().unwrap();
        let out_path =
            std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out_path.join("c_code_bindings.rs"))
            .unwrap();
    }
    {
        println!("cargo:rerun-if-changed=src/c_code.c");
        let mut config = cc::Build::new();
        config.file("src/c_code.c");
        config.compile("c_code.a");
    }
}

I moved the C example code above into a file named src/c_code.c and provided the following src/main.rs file:

use std::mem::MaybeUninit;

mod bindings {
    #![allow(non_camel_case_types)]
    #![allow(non_snake_case)]
    #![allow(dead_code)]
    #![allow(improper_ctypes)]
    include!(concat!(env!("OUT_DIR"), "/c_code_bindings.rs"));
}

fn main() {
    unsafe {
        let mut first = MaybeUninit::<*mut bindings::foo_t>::uninit();
        let mut second = MaybeUninit::<*mut bindings::foo_t>::uninit();
        bindings::double_foo(first.as_mut_ptr(), second.as_mut_ptr());
        let first = first.assume_init();
        let second = second.assume_init();
        println!("first = {:?}", first);
        println!("second = {:?} (this will be unaligned!)", second);
        println!("first.a = {:?}", (*first).a);
        println!("first.b = {:?}", (*first).b);
        println!("second.a = {:?}", (*second).a);
        println!("second.b = {:?}", (*second).b);
    }
}

The c_code_bindings.rs file contains the following struct definition:

/* … */
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct foo_t {
    pub a: ::std::os::raw::c_int,
    pub b: ::std::os::raw::c_char,
}
/* … */

And the output of my Rust program is:

first = 0x80181e000
second = 0x80181e007 (this will be unaligned!)
first.a = 7
first.b = 12
second.a = 7
second.b = 12

Should that scare me? And if yes, where's the problem? Am I doing things wrong in C? Or is this only a problem in Rust?

You do realize there was life before C11?
In any case I do not think you need to care about case where some C library would require alignment above max_align_t

Should that scare me? And if yes, where's the problem? Am I doing things wrong in C? Or is this only a problem in Rust?

I just noticed, but why do you do *second = (void *)(dbl+1)-1?
When you offset a foo_t* by 1, you move by sizeof(foo_t), hence the address should be aligned, as the struct itself is aligned properly.

I do this simply to demonstrate that a C API might return an unaligned pointer, e.g. if it's a pointer to memory that has been moved, which might happen when storing data in (and retrieving data from) LMDB, for example.

It's just to make the function deliberately return an unaligned pointer for the purpose of demonstrating that this is possible in C.

I would assume my mistake in C is to access that pointer? (But returning one is valid?)

Well sure, that's why I mentioned you have to trust the API. If you cannot trust it, then you can assume the worst.
I usually prefer to see the source code to be sure.
Generally, if the data behind the pointer is itself aligned, then valid C code cannot return an invalid pointer.
If it does, then yes, you're screwed, but at that point it doesn't matter how you read this pointer; it is most likely going to be invalid data.

But the behavior of reading through an unaligned pointer is not defined in C, hence it depends on the hardware. A modern consumer CPU should handle it just fine, but if you work with something esoteric, then that C code would likely blow up before you can even get this pointer.
So if C was able to handle it, Rust should be able to as well.

I would assume my mistake in C is to access that pointer? (But returning one is valid?)

Reading/writing behavior depends on the platform, so if the code only returns the pointer without reading or writing through it, then the C code is valid regardless of platform.

P.S. Depending on the compiler, there are ways to access an unaligned pointer (e.g. for packed data) in a valid manner.

Hmmm, I'm still unsure what to do. Perhaps I should try to ask some more concise questions:

  1. Is it valid in C to return an unaligned pointer?
  2. Is it valid or invalid in C to dereference an unaligned pointer? Or is this platform-dependent?
  3. Is it always considered undefined behavior in Rust to dereference an unaligned pointer?
  4. When returning unaligned pointers in C, can I expect that the API documentation would include a huge warning? If yes, then I would assume I'm generally safe when dereferencing pointers in Rust which came from a C API as long as there is no warning in the documentation.

I do believe posix_memalign predates ANSI C, which was standardized after the introduction of POSIX, but it's hard to prove.

Despite the existence of AVX-512, GNU malloc, jemalloc, mimalloc, and tcmalloc do not always return a 64-byte-aligned pointer when allocating 64 bytes. Are they all invalid implementations?

Is it valid in C to return an unaligned pointer?

Actually, checking the C11 standard, it seems that creating an invalid (misaligned) pointer to an object is considered UB too (although in practice nothing would happen, since it is just math).

Is it valid or invalid in C to dereference an unaligned pointer? Or is this platform-dependent?

I don't think the C11 standard has any special case on that topic, but it is probably UB, as C cannot really guarantee anything once the pointer is invalid.

Is it always considered undefined behavior in Rust to dereference an unaligned pointer?

It is the same as C technically speaking

When returning unaligned pointers in C, can I expect that the API documentation would include a huge warning? If yes, then I would assume I'm generally safe when dereferencing pointers in Rust which came from a C API as long as there is no warning in the documentation.

No, I doubt it. It is most likely a mistake by the author rather than intentional.
But I think it is correct to assume that a returned pointer is valid if you can trust the library author.

I'm not sure I understand you here.
Why would that make them invalid implementations? You seem to misunderstand me. All I'm saying is that malloc was originally designed to work for any valid C type (considering there are compiler extensions for packed structs, I'm sure it would handle those too).

Creating an unaligned pointer in Rust, however, seems to be okay. Otherwise std::ptr::read_unaligned would not need to exist. So it would be legal in Rust but illegal in C? :face_with_raised_eyebrow:

P.S.: std's doc on pointers says:

Working with raw pointers in Rust is uncommon, typically limited to a few patterns. Raw pointers can be unaligned or null. However, when a raw pointer is dereferenced (using the * operator), it must be non-null and aligned.
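
A minimal sketch of what that distinction looks like in practice, using a Rust-only stand-in for the double_foo situation (the names `FooT` and `roundtrip_unaligned` are hypothetical; int is assumed to be i32 and char to be i8): creating the misaligned raw pointer is fine, and std::ptr::read_unaligned / write_unaligned are the way to access through it without using the * operator.

```rust
use std::mem::align_of;
use std::ptr;

/// Assumed Rust mirror of the C `foo_t` (int -> i32, char -> i8).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FooT {
    pub a: i32,
    pub b: i8,
}

/// Store a FooT one byte past the start of an aligned buffer, then read it back.
pub fn roundtrip_unaligned(value: FooT) -> FooT {
    // u64 backing storage gives an 8-byte-aligned base with 16 bytes of room.
    let mut storage = [0u64; 2];
    let base = storage.as_mut_ptr() as *mut u8;
    // Creating the misaligned raw pointer is allowed; dereferencing it with `*` is not.
    let unaligned = unsafe { base.add(1) } as *mut FooT;
    debug_assert!((unaligned as usize) % align_of::<FooT>() != 0);
    // SAFETY: bytes 1..9 of the 16-byte buffer are in bounds, and the
    // *_unaligned functions place no alignment requirement on the pointer.
    unsafe {
        ptr::write_unaligned(unaligned, value);
        ptr::read_unaligned(unaligned)
    }
}

fn main() {
    let foo = FooT { a: 7, b: 12 };
    println!("{:?}", roundtrip_unaligned(foo));
}
```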