Curious about Arc implementation details

ArcInner is defined as:

#[repr(C)]
struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,
}
  1. Why is data T and not *T, a pointer to a T elsewhere on the heap? I could be wrong but I believe C++'s shared_ptr uses the latter approach. As I understand it, when the strong count hits zero, drop_in_place is called on data. Assuming there are weak pointers keeping this ArcInner alive, no memory is freed (unless data's destructor frees stuff elsewhere). So what's the point of calling drop_in_place, to comply with Arc semantics?

  2. Why is data not declared as the first field? It seems to me that there is some unnecessary complexity in calculating alignment and byte offsets in Arc::from_raw. If data were the first field, wouldn't it be possible to simply cast *mut T as *mut ArcInner<T>?


Why is data T and not *T, a pointer to a T elsewhere on the heap?

If you put a pointer there, then every access to the data would require following two pointers: one to the ArcInner and then another to the T. That'd be significantly slower.
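To make the extra hop concrete, here is a minimal sketch that uses Arc<Box<u64>> as a stand-in for an ArcInner whose data field is a pointer. The helper names inline_addresses and boxed_addresses are made up for illustration; only Arc::as_ptr and the deref operators come from std.

```rust
use std::sync::Arc;

/// Returns (address of the Arc's payload, address of the u64 that is finally read).
/// With the value stored inline these are the same address: one pointer hop.
fn inline_addresses(a: &Arc<u64>) -> (usize, usize) {
    let payload = Arc::as_ptr(a) as usize;
    let value = &**a as *const u64 as usize;
    (payload, value)
}

/// Same measurement for Arc<Box<u64>>, which approximates "data: *T".
/// The payload is now the Box (a pointer) and the u64 lives in a second allocation.
fn boxed_addresses(a: &Arc<Box<u64>>) -> (usize, usize) {
    let payload = Arc::as_ptr(a) as usize; // where the Box pointer is stored
    let value = &***a as *const u64 as usize; // where the u64 itself is stored
    (payload, value)
}
```

For Arc<u64> the two addresses coincide, while for Arc<Box<u64>> they differ, so reading the value means loading the Box pointer first and then following it.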

So what's the point of calling drop_in_place, to comply with Arc semantics?

Well, the T has to be dropped some way, or there would be a memory leak. If you mean some other implementation of dropping the T, can you say more?
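As a rough illustration of that behavior, the sketch below drops the last strong reference while a Weak is still alive. The Noisy type and the function name are hypothetical; the Arc and Weak calls are the ordinary std ones.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Weak};

/// Hypothetical payload that records when its destructor runs.
struct Noisy {
    dropped: Arc<AtomicBool>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Drops the last strong reference while a Weak is still alive and reports
/// (did the destructor run?, did upgrade succeed?, strong count seen by the Weak).
fn drop_last_strong_with_weak_alive() -> (bool, bool, usize) {
    let dropped = Arc::new(AtomicBool::new(false));
    let strong = Arc::new(Noisy { dropped: Arc::clone(&dropped) });
    let weak: Weak<Noisy> = Arc::downgrade(&strong);

    // Strong count goes 1 -> 0: the payload's destructor runs here, even though
    // the allocation holding the counters stays alive for the Weak.
    drop(strong);

    let ran = dropped.load(Ordering::SeqCst);
    let upgraded = weak.upgrade().is_some();
    (ran, upgraded, weak.strong_count())
}
```

The destructor has already run by the time the Weak is inspected, and upgrade fails. Only the memory of the ArcInner itself lingers until the last Weak goes away.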

Why is data not declared as the first field?

It can be dynamically sized, and the compiler requires such fields to be the last field. But note that there isn't necessarily any significant added cost to having the ArcInner fields at the front; if you have Arc<MyStruct> then accessing fields of MyStruct is itself adding a constant offset to the pointer, and the optimizer will easily be able to combine the two offsets into one.
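A small sketch of the "last field" rule, using a made-up Header type that mirrors the shape of ArcInner (plain usize counters instead of atomics, purely for illustration):

```rust
use std::sync::Arc;

/// Hypothetical stand-in for ArcInner. Because `data` is the last field,
/// T may be unsized. Moving `data` above the counters would be rejected by
/// the compiler once T: ?Sized is allowed.
#[allow(dead_code)]
#[repr(C)]
struct Header<T: ?Sized> {
    strong: usize,
    weak: usize,
    data: T,
}

/// Works on the dynamically sized form; the slice length travels in the fat pointer.
fn sum_unsized(h: &Header<[u32]>) -> u32 {
    h.data.iter().sum()
}

/// Returns (sum over the unsized header, length of an Arc<[u32]>).
fn demo() -> (u32, usize) {
    let sized: Header<[u32; 3]> = Header { strong: 1, weak: 1, data: [1, 2, 3] };
    // Unsizing coercion: &Header<[u32; 3]> to &Header<[u32]>.
    let unsized_ref: &Header<[u32]> = &sized;
    // Arc itself relies on the same property to support slices and trait objects.
    let arc: Arc<[u32]> = Arc::from(vec![4, 5, 6]);
    (sum_unsized(unsized_ref), arc.len())
}
```

This is why Arc<[T]>, Arc<str> and Arc<dyn Trait> can exist at all: the unsized payload sits at the tail of the allocation, after the fixed-size counters.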


I thought we could just wait for the weak pointers to be dropped, but I just realized that makes no sense as it brings back the problem of reference cycles.

It can be dynamically sized

Ah, totally forgot about that.

Thanks!

Furthermore, consider that even if Arc stored a pointer to the data field instead of a pointer to the start of the allocation, an offset from one to the other would still be required, just on clone and drop instead of in {into|from}_raw and deref.
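For a feel of what that offset looks like, here is a sketch with a mock Inner type (not the real ArcInner, which is private to std). It assumes #[repr(C)] layout, where the data field starts at the header size rounded up to the alignment of T. The names data_offset, measured_offset, CacheLine and roundtrip are invented for this example.

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

/// Mock of the ArcInner shape, with plain usize counters for simplicity.
#[allow(dead_code)]
#[repr(C)]
struct Inner<T> {
    strong: usize,
    weak: usize,
    data: T,
}

/// Expected byte offset of `data`: two counters, rounded up to T's alignment.
fn data_offset<T>() -> usize {
    let header = 2 * size_of::<usize>();
    let align = align_of::<T>();
    (header + align - 1) / align * align
}

/// Offset of `data` measured on an actual value, to check the formula above.
fn measured_offset<T>(inner: &Inner<T>) -> usize {
    (&inner.data as *const T as usize) - (inner as *const Inner<T> as usize)
}

/// An over-aligned payload, which pushes `data` further from the counters.
#[allow(dead_code)]
#[repr(align(64))]
struct CacheLine([u8; 64]);

/// Arc::into_raw hands out a pointer to the data, not to the counters;
/// Arc::from_raw has to step back by the offset to find them again.
fn roundtrip(value: u32) -> u32 {
    let raw: *const u32 = Arc::into_raw(Arc::new(value));
    // SAFETY: `raw` came from Arc::into_raw and is converted back exactly once.
    let arc = unsafe { Arc::from_raw(raw) };
    *arc
}
```

The offset is a compile-time constant for any sized T, so it costs one pointer adjustment wherever it is applied. It only needs the alignment lookup at run time for unsized payloads such as trait objects.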

Also, fwiw, C++ shared_ptr often isn't Arc<*mut T>, but more like (Arc<()>, *mut T). The control block is allocated separately specifically because you can make a shared_ptr from the result of new T, which is already allocated. But if you directly create the shared_ptr, it does use an inline allocation like (Arc<T>, *mut T); this remains somewhat reasonable because free and delete[] mandate the capability to deallocate without knowing the size.
