Given the code snippet from the following question:
use std::rc::Rc;
let vec_var = vec![1.0, 2.0, 3.0];
let foo = Rc::new(vec_var);
let a = Rc::clone(&foo);
let b = Rc::clone(&foo);
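To make the starting point concrete, here is a minimal sketch of what the snippet above leaves behind at run time. The helper name `share_counts` is made up for illustration; `Rc::strong_count` and `Rc::ptr_eq` are standard library functions.

```rust
use std::rc::Rc;

/// Rebuilds the three handles from the snippet and returns the strong count
/// together with whether all handles point at the same allocation.
fn share_counts() -> (usize, bool) {
    let vec_var = vec![1.0, 2.0, 3.0];
    let foo = Rc::new(vec_var);
    let a = Rc::clone(&foo);
    let b = Rc::clone(&foo);
    // Cloning copies only the pointer and bumps the counter in the shared block.
    let same = Rc::ptr_eq(&foo, &a) && Rc::ptr_eq(&foo, &b);
    (Rc::strong_count(&foo), same)
}

fn main() {
    let (strong, same) = share_counts();
    println!("strong = {}, same allocation = {}", strong, same);
}
```

All three handles refer to one heap block that holds both the counters and the `Vec`.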
@thv has made a drawing about the actual memory layout.
struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}
I just wonder why we have the fields in this order.
This memory arrangement favors cloning the Rc over accessing the value: the reference-counting block sits directly at the address the pointer holds.
If I wanted to clone the Rc, then on x86 it would be something like this:
mov eax, [esp+0x10] // Load the address held by the pointer into the eax register
inc dword ptr [eax] // Increment the strong count at offset 0
If I wanted to access the underlying vector:
mov eax, [esp+0x10] // Load the address held by the pointer into the eax register
mov eax, [eax+0x8] // Skip strong and weak (4 bytes each on 32-bit x86); eax now contains the address of the first number (1.0), assuming the Vec keeps its buffer pointer first
As you can see, the data access involves an address-arithmetic step (+0x8).
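The size of that displacement can be measured rather than guessed. The sketch below uses a hypothetical stand-in struct, `CountersFirst`, not the real private `RcBox`. It is marked `#[repr(C)]` so the declared field order is kept, because the default Rust representation does not promise any particular order. `T` is assumed to be `Sized` to keep the example short.

```rust
use std::cell::Cell;

// Hypothetical mirror of the counters-first layout from the question.
#[repr(C)]
struct CountersFirst<T> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

/// Byte offsets of (strong, weak, value) from the start of the struct.
fn counters_first_offsets() -> (usize, usize, usize) {
    let b = CountersFirst {
        strong: Cell::new(1),
        weak: Cell::new(1),
        value: vec![1.0_f64, 2.0, 3.0],
    };
    let base = &b as *const CountersFirst<Vec<f64>> as usize;
    let strong = &b.strong as *const Cell<usize> as usize;
    let weak = &b.weak as *const Cell<usize> as usize;
    let value = &b.value as *const Vec<f64> as usize;
    (strong - base, weak - base, value - base)
}

fn main() {
    let (strong, weak, value) = counters_first_offsets();
    println!("strong at +{}, weak at +{}, value at +{}", strong, weak, value);
}
```

With two `usize` counters in front, the value starts two machine words into the block: 8 bytes on a 32-bit target and 16 bytes on a 64-bit one.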
I guess that for a smart pointer, dereferencing is much more common than cloning, so I wonder whether Rc should optimize for that instead. Swapping the members around:
struct RcBox<T: ?Sized> {
    value: T,
    strong: Cell<usize>,
    weak: Cell<usize>,
}
Cloning:
mov eax, [esp+0x10] // Load the address held by the pointer into the eax register
inc dword ptr [eax+sizeof(T)] // Increment the strong count stored after the value
Accessing the underlying vector:
mov eax, [esp+0x10] // Load the address held by the pointer into the eax register
mov eax, [eax] // eax now contains the address of the first number (1.0)
Maybe on x86 it makes little or no difference, but on other architectures without such sophisticated addressing modes it might matter, if an explicit "add" instruction is needed.
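For comparison, here is a rough sketch of the swapped arrangement, again using a made-up `#[repr(C)]` stand-in (`ValueFirst`) and not the standard library's type. One assumption differs from the struct written above: `T` has to be `Sized` here, because Rust only allows the last field of a struct to be dynamically sized, so a value-first layout with `T: ?Sized` would not compile.

```rust
use std::cell::Cell;

// Hypothetical value-first variant of the layout; `T` must be `Sized`.
#[repr(C)]
struct ValueFirst<T> {
    value: T,
    strong: Cell<usize>,
    weak: Cell<usize>,
}

/// Byte offsets of (value, strong, weak) from the start of the struct.
fn value_first_offsets<T>(value: T) -> (usize, usize, usize) {
    let b = ValueFirst {
        value,
        strong: Cell::new(1),
        weak: Cell::new(1),
    };
    let base = &b as *const ValueFirst<T> as usize;
    let v = &b.value as *const T as usize;
    let s = &b.strong as *const Cell<usize> as usize;
    let w = &b.weak as *const Cell<usize> as usize;
    (v - base, s - base, w - base)
}

fn main() {
    // A one-byte payload: the counter is pushed up to the next aligned slot.
    println!("u8:        {:?}", value_first_offsets(0u8));
    // A 256-byte payload: the counter sits 256 bytes into the block.
    println!("[u64; 32]: {:?}", value_first_offsets([0u64; 32]));
}
```

The value now sits at offset 0, but the displacement of the counters depends on the size and alignment of `T`, so it differs for every instantiation instead of being one fixed constant.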