Offsetof a field of a "C" struct (to implement an intrusive linked list)

I have some C structs like

struct list_t {
    struct list_t *next;
    struct list_t *prev;
}

struct object_t {
    int data;
    int more_data;
    struct list_t list;
}

The idea is that you can represent a list of objects by pointing next and prev at the list field of other object_ts. To recover a pointer to the object, you then have to subtract the offset of list from the start of object_t. In C you would use offsetof to do this. Is there an equivalent way to do this in Rust?

Since the C abi is stable and well defined, Rust could give me this information. Whether there is a way to get it or not is another matter.

EDIT Worth noting some features of the above linked list

  • Doubly linked.
  • Circular (no beginning and end), although you could use a null pointer if you did want a beginning/end.

The approach I'm going to take for now is to hard-code the offsets.

There isn't really any way to get the offsets besides constructing a raw pointer to each, casting to usize, and subtracting them from each other. Would it not be simpler to have list_t store pointers directly to the object_t, rather than to its third field?

1 Like

Isn't it undefined behavior in Rust to access the int data field using a list_t pointer?

It depends on the exact nature of that pointer, but you are right, that's something to be mindful of:

Given,

use ::std::{cell::Cell as Mut, os::raw::c_int, ptr};

#[derive(Debug, Default)]
#[repr(C)]
struct IntrusiveList {
    next: Mut< Option<ptr::NonNull<IntrusiveList>> >,
    prev: Mut< Option<ptr::NonNull<IntrusiveList>> >,
}

#[repr(C)]
struct Object {
    data: c_int,
    more_data: c_int,
    list: List,
}

then

let obj1 = Object {
    data: 0,
    more_data: 0,
    list: IntrusiveList::default(),
};
let list = &obj1.list;
let obj2 = Object {
    data: 0,
    more_data: 0,
    list: IntrusiveList {
        prev: Some(ptr::NonNull::from(list)).into(),
        .. IntrusiveList::default()
    },
};
list.next.set(Some(ptr::NonNull::from(&obj2.list))); // --+ Pointer
//                                                        | can only
// Get last data                                          | be used
let at_last_data: *const c_int = { //                     | to access
    let mut cur: ptr::NonNull<IntrusiveList> = //         | the data
        list.into() //                                    | stored in
    ; //                                                  | `.list`
    while let Some(next) = //                             |
        unsafe { cur.as_ref() }.next.get() //             |
    { //                                                  |
        cur = next; //                                    |
    } //                                                  |
    use ::memoffset::offset_of; //                        |
    cur .as_ptr() //                                      |
        .cast::<u8>() //                                  |
        .wrapping_sub(offset_of!(Object, list)) //        |
        .wrapping_add(offset_of!(Object, data)) //        |
        .cast::<c_int>() //                               |
}; //                                                     |
unsafe { // UB, ptr is not allowed to access that memory  |
    println!("{}", *at_last_data); // <-------------------+
}

triggers UB.

That being said, if one makes sure a pointer with provenance over the whole Object is used to create the raw pointer to the IntrusiveList, then all is fine:

- list.next.set(Some(ptr::NonNull::from(&obj2.list)));
+ list.next.set(ptr::NonNull::new(
+     <*const _>::cast::<u8>(&obj2)
+         .wrapping_add(::memoffset::offset_of!(Object, list))
+         as *mut IntrusiveList
+ ));
  • Playground (FWIW, it passes MIRI)

  • There are a few alternative ways to write this:

    • One can use the unsafe .add(…) operation rather than .wrapping_add(…), since we are offsetting to an inbounds field;

    • The current implementation of ::memoffset::raw_field! macro seems to yield a pointer with the right provenance:

      list.next.set(ptr::NonNull::new(
          ::memoffset::raw_field!(&obj2, Object, list) as _
      ));
      
    • One could use the soon-to-be-stable ::core::ptr::addr_of!, but know that it is easy to get the wrong provenance with it, see this issue:

      Whilst

      ptr::addr_of!( (*(&obj2 as *const _)).list )
      

      does yield a pointer with provenance spanning over the whole obj2 (FWIW, this is what ::memoffset::raw_field! currently does),

      ptr::addr_of!( obj2.list )
      

      does not!! :warning: :warning: :warning:


FWIW, I think that by writing this post I have incidentally answered the question of field offsetting :upside_down_face:

8 Likes

Note that Rust's struct layout does not promise that field declaration order matches the order in memory (unless the struct is repr(C)).

3 Likes

You may be interested in the instrusive-rs crate, for ideas if nothing else. (Amanieu is on the Rust library team.)

3 Likes

Thanks for all the replies

The decision to use intrusively linked lists is not mine: it is made by the library author. I'm trying to write bindings to the library. I believe intrustive linked lists are used because the code runs hot in a realtime thread, and so removing the extra indirection between the list and its items is worthwhile.

Yes it is important to note that the ABI of rust structs is unstable. You cannot rely on offsets there.

Awesome, thankyou!

Thankyou for your very detailed answer! In addition to learning that these structures are called an "intrusive" linked list, I did not know about provenance of pointers, and need to go and understand that fully. But thanks again for taking the time to write such a detailed answer.

@Yandros For my specific problem, I am handed a pointer to a list_t from a C library and am expected to construct the object_t pointer myself. Are you saying this is impossible to do without UB?

It depends on what you do with the pointer. If you turn a raw pointer to the full thing into a raw pointer to the field, you can turn it back, but if this conversion goes through a reference and back, you may run into trouble because a reference has all sorts of extra guarantees that may prevent you from going back to a pointer to the full thing.

So I am given a raw pointer to the field from C, and am expected to construct a raw pointer to the thing myself.

Context the library I am wrapping provides macros to do some of these things, but they disappear under bindgen.

If you get the raw pointer from C, it should be fine to cast to a u8 raw pointer, subtract the right number of bytes, and cast to pointer to the full thing.

3 Likes

That's great. Am I then allowed to turn that pointer into a reference (if I know a lifetime it will be valid for)?

There are additional requirements beyond just that you know a lifetime that it is valid for, but yes. Would it be a mutable or immutable reference?

So far I'm only looking at accessing immutable data given to me by the library, but mutable access might be required later.

I think I can guarantee exclusive access when I need mutable access.

If it is an immutable reference, the guarantee you need to uphold is that, between any two uses of the same immutable reference, the value has not been modified.

3 Likes

Gotcha, do I use UnsafeCell otherwise (I'm disallowing sending data to other threads)?

The ways you would need to use unsafe-cell in are a bit subtle. The way you should think of it is that, in the eyes of the requirement I mentioned, an UnsafeCell<T> is not considered modified even if the T inside it is.

Be aware that this implies that the guarantee is only turned off if the UnsafeCell is between the reference and the value. Creating an &T to a value inside an UnsafeCell does not turn off the guarantee, because the cell does not lie between the reference and the value. It is only turned off if you have an &UnsafeCell<T>.

The safety requirements of threading are unrelated to the guarantees imposed by references in the sense that the rule that says that you may not perform a data race applies even if you are using raw pointers.

2 Likes

@derekdreery if you use

use ::core::{cell::Cell as Mut, ptr};
use ::std::os::raw::c_int as int;

#[repr(C)]
struct Object {
    data: Mut<int>;
    more_data: Mut<int>;
    list: IntrusiveList;
}
// where
#[repr(C)]
struct IntrusiveList {
    next: Mut< Option<ptr::NonNull<IntrusiveList>> >,
    prev: Mut< Option<ptr::NonNull<IntrusiveList>> >,
}

then you can upgrade a *mut Object to a &Object just fine (provided the lifetime is OK (object is not deallocated in between) and no data races occur). Using Cell to wrap each field is the best way to represent C semantics, and avoid aliasing/immutability-related UB.

No, in my example I was talking about creating (and potentially giving it to the FFI) a pointer to a list with provenance over the .next and .prev fields only. But if you are receiving such a pointer, then it was C's job to imbue it with the right provenance (which, in practice, can be assumed to be always the case). So no issues in that regard.

2 Likes

So, to summarise:

There are semantically 3 types of pointer in rust, *const/*mut, & and &mut. These pointers might be fat, but its not possible to construct fat pointers directly yourself so I'm ignoring this detail. The rules are as follows

For *const/*mut

  • When dereferencing, the pointer must point to a valid (initialized, aligned, live) object
  • The data must be a valid instance of the pointer type (note that this doesn't mean the types have to be identical: a u8 can be read as an i8, because all valid u8s are valid i8s).
  • There must be no chance that, in parallel, (data race):
    • We write to the pointer while another thread reads or writes through the pointer.
    • We read the pointer while another thread writes through the pointer.
  • Extra conditions apply to some pointer methods, like offset.

For &

  • The data must be a valid instance of the pointer type (see above).
  • The data must be valid for the lifetime of the reference.
    • Strictly speaking, it must be valid at time of access, but if you hand a reference to code you don't control, you must guarantee any safe access is valid (i.e. the data must be valid for the lifetime of the reference).
  • The data must not change between reads, unless there is an UnsafeCell between the changed data and the reference (defined as when an the reference points to a type that contains an UnsafeCell, which contains data that changes). In the UnsafeCell case, there must be no data races.
  • No &mut references exist for the object.

For &mut

  • The data must be a valid instance of the pointer type (see above).
  • The data must be valid for the lifetime of the reference (it cannot change because it is not observed anywhere else).
  • No other references (either & or &mut) exist for the object.
  • The data is not observed except through the pointer, this includes pointer::reading through a raw pointer.
  • Raw pointers to the object can exist, as long as they are not dereferenced (is this correct??).

Have I made any mistakes?