How expensive is Arc vs Rc vs & reference for accessing data

I understand that Arc is more expensive than Rc because it uses atomics, and that Rc is more expensive than a plain reference because it increments/decrements the reference count at runtime during clone/drop operations.

Another difference is that Arc and Rc allocate on the heap (and allocations are expensive), while a `&` reference can point to stack data. However, this allocation happens only in `::new`, not in clone/drop.

Another difference is that Arc and Rc both add extra memory on the heap (usually 16 bytes for the strong/weak counters). This may matter when you care a lot about cache performance or target embedded systems. However, this extra memory is paid once per allocation, not for every clone of the Arc/Rc.

My question: is there any additional overhead in accessing (not storing or dropping) the data behind an Arc/Rc versus a reference? All three need dereferencing at runtime, but is Arc/Rc dereferencing more expensive?


No, accessing an Rc/Arc is not in itself more expensive. Neither the atomics nor the reference counter is needed to access the inner value, and a reference into the heap isn't inherently slower to follow than a reference to any other memory area.


They (Arc/Rc) compile to identical code over identical* structures. A deref is just a pointer access at a fixed offset; in theory the offset adds overhead versus a plain reference, but CPUs have pipelines and dedicated address-generation hardware that eliminate any slowdown.

* `atomic::AtomicUsize` and `Cell<usize>` compile to identical code as long as you don't access them.

There can also be compiler optimizations that eliminate runtime code. In theory these might apply equally to all three types; in reality we don't know for sure, and nobody is inclined to put in the effort to find out.

I was also wondering about the offset. So it seems to be a theoretical overhead which (maybe?) doesn't occur in practice on common architectures.

The inner definition is here:

#[repr(C)]
struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

Which is used there:

#[cfg_attr(not(test), rustc_diagnostic_item = "Rc")]
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_insignificant_dtor]
pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
}

Thanks, I was actually worried about this offset: whether it adds one more instruction.
For the sake of experiment I went to https://rust.godbolt.org/ and compiled the following code with `-C opt-level=3` (it appears to target x86):

use std::hint::black_box;
pub fn test1() {
    let i = &0;
    black_box(i);
    let b = *i;
    println!("{}", b);
}
use std::rc::Rc;

pub fn test2() {
    let i = Rc::new(0);
    black_box(i.as_ref());
    let b = *i;
    println!("{}", b);
}

So the result for the `let b = *i;` operation in the first case is 2 instructions:

        mov     dword ptr [rsp + 4], 0
        lea     rax, [rsp + 4]

and in second case 3 instructions:

        mov     eax, dword ptr [rbx + 16]
        mov     dword ptr [rsp + 4], eax
        lea     rax, [rsp + 4]

Do you think pipelines eliminate this overhead?


You could write a benchmark to be sure, e.g. using criterion.

Good point. I have a Mac, though, so I can try it on the M1. It would be great if someone tried it on x86.

Thanks for the Rc code. I'll also put the Arc code here for completeness:

#[repr(C)]
struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,
}

I think this depends on the Ordering used for the clone/drop operations (as I read the code, clone uses a Relaxed increment, and drop uses a Release decrement plus an Acquire fence) and on the CPU architecture. It might compile to the same instructions as the Cell version on x86 but not on arm64.

I'm not sure that's the correct code to test with. The &0 makes the literal subject to static promotion. When testing with the following pieces of code, I got three instructions for both:

use std::hint::black_box;
use std::rc::Rc;

pub fn test1() {
    let i = 0;
    let ref_i = &i;
    black_box(ref_i);
    let b = *ref_i;
    println!("{}", b);
}

pub fn test2() {
    let i = Rc::new(0);
    black_box(i.as_ref());
    let b = *i;
    println!("{}", b);
}

the dereference compiling to

        mov     eax, dword ptr [rsp]
        mov     dword ptr [rsp + 4], eax
        lea     rax, [rsp + 4]

in the first case, and to

        mov     eax, dword ptr [rbx + 16]
        mov     dword ptr [rsp + 4], eax
        lea     rax, [rsp + 4]

in the second case.

Since the mov instruction supports all sorts of offsets and multipliers (oh, did I mention it is compute-universal?), I highly doubt there is any kind of statistically significant observable effect distinguishing the two snippets, and I'm practically certain that dereferencing Rc is never going to be a meaningful bottleneck in any non-trivial application you write.

Although it's not done currently, in theory Rc/Arc could store the pointer to T directly and use a negative offset when accessing the ref counts. That would eliminate the offset in the mov instruction.

Great! Thanks for the code correction. I vote this is the solution

The offset is removed when using Rc::into_raw, for example.
