Using Arc<T> correctly in Tokio multi-threaded applications

I'm trying to use the Arc<T> type to share a reference to a data structure that connects me to a remote service (Amazon S3 object storage). I want to configure the connection once, and then share an immutable reference with several Futures, across one or more threads.

I thought that using an Arc<T> would allow me to pass the Arc immutable reference into the async function and share it across separate Future instances. However, I am required to call the clone() method on the Arc for it to work correctly. How is this any different / better than simply cloning the S3 Client struct for each Future?


First, I define an async function that creates some random objects in a bucket, accepting an Arc<Client> parameter.

async fn create_object(s3_client: Arc<Client>, bucket_name: String) {
    // ...
}

In the main function, I create my S3 client using the AWS SDK, and then wrap it in an Arc<T>:

let s3_client = s3::Client::new(&aws_cfg); // This is an aws_sdk_s3::Client struct
// https://docs.rs/aws-sdk-s3/latest/aws_sdk_s3/struct.Client.html

let s3_client_arc = Arc::new(s3_client);

Then I create a couple of threads:

let mut join_handle_list = vec![]; // Holds the JoinHandle instances from spawn()

for _ in 1..=2 {
    let new_future = create_object(s3_client_arc, bucket_name.clone());
    join_handle_list.push(tokio::spawn(new_future));
}

:rotating_light: However, this results in the common error message:

use of moved value: s3_client_arc; value moved here, in previous iteration of loop

The work-around seems to be to call .clone() on the Arc<T>, but this seems to defeat the purpose of using the Arc<T> to wrap the Client, right? :thinking:

:white_check_mark: This compiles successfully:

let mut join_handle_list = vec![]; // Holds the JoinHandle instances from spawn()

for _ in 1..=2 {
    let new_future = create_object(s3_client_arc.clone(), bucket_name.clone());
    join_handle_list.push(tokio::spawn(new_future));
}

What's the correct way to do this? The first example in the documentation seems to indicate that the correct way is to use clone(), but it feels like this is inefficient. Is this more efficient than simply calling .clone() on the Client struct itself, without using Arc<T>?

Cloning an Arc (or Rc) just increments its reference count. It is meant to be very inexpensive. It does not clone the object it contains (the Client).
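If you want to convince yourself, Arc exposes its own bookkeeping: Arc::ptr_eq confirms that two handles share one allocation, and Arc::strong_count shows the count moving. A minimal sketch, with a String standing in for the Client:

```rust
use std::sync::Arc;

fn main() {
    let original = Arc::new(String::from("pretend this is an expensive Client"));
    let second = Arc::clone(&original);

    // Both Arcs point at the same heap allocation...
    assert!(Arc::ptr_eq(&original, &second));

    // ...and the only thing cloning changed is the reference count.
    assert_eq!(Arc::strong_count(&original), 2);

    // Dropping one handle decrements the count; the String is untouched.
    drop(second);
    assert_eq!(Arc::strong_count(&original), 1);
}
```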

2 Likes

Thanks, I had a feeling that might be the case. What would be the easiest way of examining this / proving this out?

Looking at process memory utilization?

I'm an amateur low-level developer, so I'm curious if there's a good way of observing the memory layout and utilization in Rust applications.

let c0 = Client::new();
let c1 = c0.clone();
let c2 = c1.clone();
c0
+---+---+---+---+---+---+---+---+---+---+---+---+
|   Client data ...                             |
+---+---+---+---+---+---+---+---+---+---+---+---+

c1
+---+---+---+---+---+---+---+---+---+---+---+---+
|   Client data ...                             |
+---+---+---+---+---+---+---+---+---+---+---+---+

c2
+---+---+---+---+---+---+---+---+---+---+---+---+
|   Client data ...                             |
+---+---+---+---+---+---+---+---+---+---+---+---+
let c0 = Arc::new(Client::new());
let c1 = c0.clone();
let c2 = c1.clone();
c0
+---+
| A |---+
+---+   |
        |
c1      |
+---+   |
| A |---+
+---+   |
        |
c2      |
+---+   |
| A |---+
+---+   |
        |
        v         (heap somewhere)
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| S | W |  Client data ...                              | 
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Key:

  • Each +---+ is a usize
  • A is an address[1]
  • W is the weak count of the Arc
  • S is the strong count of the Arc
  • Layout is also an implementation detail (not guaranteed)

You have to clone because action is required -- maintaining the counts -- to create a new Arc pointing at the same object. Rust doesn't have copy constructors.

The clone is not a "deep clone" that (recursively) duplicates all data. Rust doesn't have a concept of a deep clone; structs can implement Clone however they want.[2][3]

Some prefer to write Arc::clone(&c0) instead of c0.clone() to emphasize it's a cheap and non-deep clone. (Others find that to be unnecessary noise.)
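The strong and weak counts in the diagram can also be observed directly; a minimal sketch:

```rust
use std::sync::{Arc, Weak};

fn main() {
    let strong = Arc::new(42);
    // A Weak reference does not keep the value alive.
    let weak: Weak<i32> = Arc::downgrade(&strong);

    assert_eq!(Arc::strong_count(&strong), 1);
    assert_eq!(Arc::weak_count(&strong), 1);

    // A Weak must be upgraded to a (temporary) Arc to access the value.
    assert_eq!(weak.upgrade().as_deref(), Some(&42));

    // Once the last strong reference is gone, the value is dropped...
    drop(strong);
    // ...and upgrading fails.
    assert!(weak.upgrade().is_none());
}
```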


  1. IIRC it points to the object directly, but this is an implementation detail ↩︎

  2. As another example, shared references -- &T -- implement Clone by copying the reference only. You never call clone on a shared reference on purpose unless it's part of a generic, but that's what it does -- and when clone is called in a generic context, that's what you want! ↩︎

  3. In the top diagram I assumed you just derived Clone :slightly_smiling_face: ↩︎

4 Likes

I love the ASCII diagrams. Thanks for taking the time to create them! That's pretty much how I was mentally visualizing it. I should have created those myself and just asked for confirmation. :smile:

The weak/strong counters are things I need to read up on, and better understand.

I wrote a rudimentary example program that demonstrates the reduction in memory utilization.

#[tokio::main]
async fn main() {
    example_with_arc().await;
    example_without_arc().await;
}

async fn example_without_arc() {
    println!("\nThreading without using Arc<T>:");

    let p1 = Person::default();
    println!("Size of Person: {0}", std::mem::size_of_val(&p1));

    let p2 = p1.clone();
    println!("Size of Person: {0}", std::mem::size_of_val(&p2));

    let t1 = std::thread::spawn(move || do_something_noarc(p1));
    let t2 = std::thread::spawn(move || do_something_noarc(p2));
    t1.join().unwrap();
    t2.join().unwrap();
}

async fn example_with_arc() {
    println!("Threading using Arc<T>:");

    let p1 = Person::default();
    println!("Size of Person: {0}", std::mem::size_of_val(&p1));

    let p1_arc = std::sync::Arc::new(p1);
    println!("Size of Arc<Person>: {0}", std::mem::size_of_val(&p1_arc));

    let p1_arc_t1 = p1_arc.clone();
    let p1_arc_t2 = p1_arc.clone();
    println!("Size of Arc<Person>: {0}", std::mem::size_of_val(&p1_arc_t1));
    println!("Size of Arc<Person>: {0}", std::mem::size_of_val(&p1_arc_t2));

    let t1 = std::thread::spawn(move || do_something(p1_arc_t1));
    let t2 = std::thread::spawn(move || do_something(p1_arc_t2));
    t1.join().unwrap();
    t2.join().unwrap();
}

// These are plain (non-async) functions: an async fn passed to
// std::thread::spawn would only construct a Future that is never
// polled, so its body would never run.
fn do_something(person: std::sync::Arc<Person>) {
    println!("{0}", person.first_name);
}

fn do_something_noarc(person: Person) {
    println!("{0}", person.first_name);
}

#[derive(Clone)]
struct Person {
    first_name: String,
    last_name: String,
    age: u8,
    description: String,
}

impl Default for Person {
    fn default() -> Self {
        Person {
            first_name: "Trevor".to_string(),
            last_name: "Sullivan".to_string(),
            age: 30,
            description: "A human being".to_string(),
        }
    }
}

The output looks like this:

Threading using Arc<T>:
Size of Person: 80
Size of Arc<Person>: 8
Size of Arc<Person>: 8
Size of Arc<Person>: 8
Trevor
Trevor

Threading without using Arc<T>:
Size of Person: 80
Size of Person: 80
Trevor
Trevor

So, as far as stack memory consumption goes, it's 160 bytes after cloning the Person struct, versus 104 bytes when using a shared reference with Arc<T>. That wouldn't include the increase in memory consumption on the heap as well though, would it?

I'm assuming there's also CPU performance benefits to using stack-allocated pointers like Arc<T> versus cloning items on the heap as well. However, that would be more challenging to measure with such a small program.

You can also note that Arc<T> can be Cloned even when T is not able to be cloned (does not implement Clone):

If we have a struct Client { ... } (with no #[derive(Clone)] or other impl Clone):

let client = Client::new();
let copy = client.clone(); // not valid, compile error

let arc = Arc::new(Client::new());
let copy = Arc::clone(&arc); // equivalent to arc.clone()

it is still possible to clone the Arc even when we can't clone the Client, so both Arcs point to the same, single Client.
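A compilable version of that sketch, with a stand-in Client type (the endpoint field is made up for illustration):

```rust
use std::sync::Arc;

// Deliberately does NOT derive or implement Clone.
struct Client {
    endpoint: String,
}

fn main() {
    let arc = Arc::new(Client { endpoint: "s3.amazonaws.com".to_string() });

    // let copy = (*arc).clone(); // compile error: Client does not implement Clone
    let copy = Arc::clone(&arc); // fine: only the Arc (pointer + counts) is cloned

    // Both handles refer to the same Client.
    assert!(Arc::ptr_eq(&arc, &copy));
    assert_eq!(copy.endpoint, arc.endpoint);
}
```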

2 Likes

It doesn't, no. It's probably 16 bytes.

You had your size 80 structure on the stack, then you moved it to the heap along with a couple counters (16 bytes plus maybe padding for alignment). So 96 bytes on the heap, and every Arc is 8 bytes.

Ideally in an optimized function, you never have the 80 bytes on the stack (but it's not guaranteed). Presumably, most of your code will only see the Arc<Person> by value anyway, in which case they're getting an 8 byte pointer instead of 80 bytes of data. Modulo optimizations, again.
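Those numbers can be double-checked with std::mem::size_of; a quick sketch (figures assume a typical 64-bit target):

```rust
use std::mem::size_of;
use std::sync::Arc;

#[allow(dead_code)]
struct Person {
    first_name: String,
    last_name: String,
    age: u8,
    description: String,
}

fn main() {
    // Three Strings (24 bytes each: pointer + length + capacity) plus a u8,
    // padded up to 8-byte alignment: 73 -> 80 bytes.
    assert_eq!(size_of::<Person>(), 80);

    // The Arc itself is a single pointer.
    assert_eq!(size_of::<Arc<Person>>(), size_of::<usize>()); // 8 bytes

    // The heap block additionally holds the strong and weak counts
    // (8 bytes each), so roughly 16 + 80 = 96 bytes on the heap.
    // (The exact heap layout is an implementation detail.)
}
```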

There's a bunch of other considerations for optimizations or micro-optimization, like speed of cloning, caching, and so on.

However there's also a semantic consideration which is usually important: Arc gives you shared ownership of some resource. If you're just sharing immutable, plain-ol-data around -- some configuration blob -- this isn't hugely important. But if you're trying to share something like an actual connection, it matters a lot. Do you want everyone on the same connection, or a new connection for each? Big difference.
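A toy illustration of that semantic difference, using a request counter as a stand-in for connection state (the Conn type and its send method are made up for the example):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

// Stand-in for a connection that tracks how many requests it has served.
struct Conn {
    requests: AtomicU32,
}

impl Conn {
    fn send(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    // Shared ownership: every handle talks to the SAME connection.
    let shared = Arc::new(Conn { requests: AtomicU32::new(0) });
    let a = Arc::clone(&shared);
    let b = Arc::clone(&shared);
    a.send();
    b.send();
    assert_eq!(shared.requests.load(Ordering::Relaxed), 2); // one conn served both

    // Separate values: each has its own independent state.
    let c1 = Conn { requests: AtomicU32::new(0) };
    let c2 = Conn { requests: AtomicU32::new(0) };
    c1.send();
    c2.send();
    assert_eq!(c1.requests.load(Ordering::Relaxed), 1); // independent counts
    assert_eq!(c2.requests.load(Ordering::Relaxed), 1);
}
```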

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.