Copy-on-send strings

My code deals with a relatively small set of dynamically generated but immutable strings which are passed around a lot, so to reduce the number of heap allocations I wrap these strings in std::rc::Rc. Doing so worked wonderfully for the serial parts of my code, but now I also want to be able to pass these strings between asynchronous tasks, and this doesn't work out of the box since Rc<String> does not implement Send.

I could of course easily resolve this by wrapping my strings in std::sync::Arc instead of std::rc::Rc, but that would reduce my serial performance. Instead, I was hoping I could construct a type which clones like an Rc<String> within the serial parts of my code but which clones by copying when a value is sent between async tasks. Does Rust allow writing such a "copy-on-send" type, and if so, how do you do it?

With unsafe:

use std::rc::Rc;

// Invariant: This is the only Rc (and there is no Weak) pointing to this memory.
pub struct SendRc<T: ?Sized>(Rc<T>);

// SAFETY: This Rc has no clones (invariant).
unsafe impl<T: ?Sized + Send> Send for SendRc<T> {}
// SAFETY: This Rc has no clones (invariant).
unsafe impl<T: ?Sized + Sync> Sync for SendRc<T> {}

impl<T: ?Sized + Clone> SendRc<T> {
    pub fn new(mut rc: Rc<T>) -> Self {
        // Make sure this Rc is unique.
        Rc::make_mut(&mut rc);
        Self(rc)
    }
}
impl<T: ?Sized> SendRc<T> {
    pub fn extract(self) -> Rc<T> {
        self.0
    }
}

Out of curiosity, have you quantified the performance impact or is this just speculation?

It's speculation. And my question is also more about learning something new rather than solving this particular problem.

I understand what the code does, but I don't understand what it is trying to achieve. If an Rc is unique, can't I just call Rc::try_unwrap(rc).unwrap() on it? After passing the value to a different thread, you could create a new Rc around it.

Hmmm, now that I think about it, is this to avoid the costs of allocating the Rc? Then it makes sense to me.
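
For reference, the unwrap-and-rewrap idea could look roughly like this. It is only a sketch, and the helper name send_unique is made up for illustration. Rc::try_unwrap moves the String out without copying the text, but Rc::new on the receiving side allocates a fresh reference-counted box, which is the allocation SendRc avoids.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical helper: moves a uniquely owned `Rc<String>` to another thread
/// by unwrapping it first, and returns the length seen on that thread.
fn send_unique(rc: Rc<String>) -> usize {
    // `try_unwrap` succeeds only if this is the sole strong reference.
    let s: String = Rc::try_unwrap(rc).expect("Rc was shared");
    let handle = thread::spawn(move || {
        // Re-wrapping allocates a new `Rc` box; the string buffer is reused.
        let rc = Rc::new(s);
        rc.len()
    });
    handle.join().expect("thread panicked")
}

fn main() {
    let rc = Rc::new(String::from("hello"));
    println!("{}", send_unique(rc));
}
```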

That would mean some code has to run automatically whenever the value is sent. I don't think this is possible in Rust, but I'm curious if I'm wrong.

What I think could be done is cloning the String manually, this way:

use std::rc::Rc;
use std::thread::{JoinHandle, spawn};

fn sub_task(rc: &Rc<String>) -> JoinHandle<()> {
    // We clone the `String` manually:
    let to_be_sent = String::clone(rc);
    spawn(move || {
        // We can turn the cloned `String` back into `Rc`s:
        let rc3 = Rc::new(to_be_sent);
        let rc4 = rc3.clone();
        // And some debug output:
        println!("Inside thread: {rc3}, {rc4}");
    })
}

fn main() {
    // Let's assume we get an allocated `String` from somewhere:
    let s = String::from("ABC");
    // We turn it into an `Rc`:
    let rc1 = Rc::new(s);
    // We can clone the `Rc` without copying the `String` itself:
    let rc2 = rc1.clone();
    // Let's use one of the `Rc`s for a task that will be executed in another thread:
    let thread = sub_task(&rc1);
    // And some debug output:
    println!("Outside thread: {rc1}, {rc2}");
    // Joining the other thread:
    thread.join().expect("thread panicked");
}

Output:

Outside thread: ABC, ABC
Inside thread: ABC, ABC


P.S.: To make the code more generic, you might also use the * operator instead of calling String::clone (so that it is independent of String and works with any type that is Clone):

-    let to_be_sent = String::clone(rc);
+    let to_be_sent = (**rc).clone();

The first * dereferences the shared reference, the second * dereferences the Rc, thus giving the inner value (which then gets cloned instead of cloning the Rc).
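
Wrapped into a generic function, the double dereference might look like this. The helper name clone_inner is hypothetical and only serves to show that nothing in the expression depends on String.

```rust
use std::rc::Rc;

/// Hypothetical helper: clones the value inside the `Rc`, not the `Rc` itself.
fn clone_inner<T: Clone>(rc: &Rc<T>) -> T {
    // `*rc` is the `Rc<T>`, `**rc` is the `T`; `.clone()` then copies the `T`.
    (**rc).clone()
}

fn main() {
    let rc = Rc::new(vec![1, 2, 3]);
    let copy: Vec<i32> = clone_inner(&rc);
    // The original `Rc` is still the only strong reference.
    assert_eq!(Rc::strong_count(&rc), 1);
    assert_eq!(copy, vec![1, 2, 3]);
}
```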


Another variant that might be more verbose (and less confusing):

+use std::ops::Deref;
 
 fn sub_task(rc: &Rc<String>) -> JoinHandle<()> {
     // We clone the `String` manually:
-    let to_be_sent = String::clone(rc);
+    let to_be_sent = Rc::deref(rc).clone();

If you just create a bunch of them at the start of the program, leaking the strings is not a bad solution. &'static str is pretty easy to pass around.
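
A rough sketch of the leaking approach, assuming the set of strings really is small and bounded (the helper name intern is made up here):

```rust
use std::thread;

/// Hypothetical helper: leaks a `String` to obtain a `&'static str`.
fn intern(s: String) -> &'static str {
    // The buffer is never freed, so this only suits a bounded set of strings.
    Box::leak(s.into_boxed_str())
}

fn main() {
    let name: &'static str = intern(format!("task-{}", 7));
    // `&'static str` is `Copy` and `Send`, so it moves into the thread freely.
    let handle = thread::spawn(move || name.len());
    let len = handle.join().expect("thread panicked");
    println!("{name} has {len} bytes");
}
```

The trade-off is that the memory is never reclaimed, so this does not fit strings that keep being generated over the lifetime of the program.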

In case anyone else has a wet fuse like myself, here's a complete example of how (I believe) this type is meant to be used: Rust Playground

Yes, that is the idea: you wrap it before sending and unwrap it on the other thread.
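
A std-only sketch of that wrap-then-unwrap flow, using a thread and an mpsc channel instead of tokio so it runs without extra crates. It reuses the SendRc definition from earlier in the thread, trimmed to the Send impl; the helper round_trip is hypothetical.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

// Invariant: this is the only Rc (strong or weak) pointing to this memory.
pub struct SendRc<T: ?Sized>(Rc<T>);

// SAFETY: relies on the uniqueness invariant established in `new`.
unsafe impl<T: ?Sized + Send> Send for SendRc<T> {}

impl<T: Clone> SendRc<T> {
    pub fn new(mut rc: Rc<T>) -> Self {
        // Clones the value into a fresh allocation if the Rc is shared.
        Rc::make_mut(&mut rc);
        Self(rc)
    }
}

impl<T: ?Sized> SendRc<T> {
    pub fn extract(self) -> Rc<T> {
        self.0
    }
}

/// Hypothetical helper: sends `rc` to a worker thread and returns what it saw.
fn round_trip(rc: Rc<String>) -> String {
    let (tx, rx) = mpsc::channel::<SendRc<String>>();
    let worker = thread::spawn(move || {
        // Unwrap on the receiving thread, then use it like any other Rc.
        let rc: Rc<String> = rx.recv().expect("sender dropped").extract();
        let rc2 = Rc::clone(&rc);
        format!("{rc} {rc2}")
    });
    // Wrap on the sending thread just before the value crosses over.
    tx.send(SendRc::new(rc))
        .unwrap_or_else(|_| panic!("receiver dropped"));
    worker.join().expect("thread panicked")
}

fn main() {
    println!("{}", round_trip(Rc::new(String::from("hello world"))));
}
```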

This is how I would do it with deref:

 use std::rc::Rc;
+use std::ops::Deref;
 
-// Invariant: This is the only Rc pointing to this memory.
-#[derive(Debug)]
-pub struct SendRc<T: ?Sized>(Rc<T>);
-
-// SAFETY: This Rc has no clones (invariant).
-unsafe impl<T: ?Sized + Send> Send for SendRc<T> {}
-// SAFETY: This Rc has no clones (invariant).
-unsafe impl<T: ?Sized + Sync> Sync for SendRc<T> {}
-
-impl<T: ?Sized + Clone> SendRc<T> {
-    pub fn new(mut rc: Rc<T>) -> Self {
-        // Make sure this Rc is unique.
-        Rc::make_mut(&mut rc);
-        Self(rc)
-    }
-}
-impl<T: ?Sized> SendRc<T> {
-    pub fn extract(self) -> Rc<T> {
-        self.0
-    }
-}
-
 #[tokio::main]
 async fn main() {
-    let (sender, receiver) = tokio::sync::oneshot::channel::<SendRc<String>>();
+    let (sender, receiver) = tokio::sync::oneshot::channel::<String>();
     sender
-        .send(SendRc::new(Rc::new("hello world".to_string())))
+        .send(Rc::deref(&Rc::new("hello world".to_string())).clone())
         .unwrap();
-    println!("{}", receiver.await.unwrap().extract().as_ref());
+    println!("{}", Rc::new(receiver.await.unwrap()).as_ref());
 }

It's less code, and I'm not sure there's any disadvantage. (Unless of course you don't need the Rc in the originating thread, in which case moving it is cheaper than cloning.)

Whether you should wrap and send an Rc that has a count of 1 or clone the String depends on whether you need clones of the Rc in the originating thread. If Rc::strong_count is greater than one, SendRc cannot hand over the existing allocation: Rc::make_mut in SendRc::new clones the value, so you pay for a copy either way.
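
To see what Rc::make_mut (the call inside SendRc::new) does in each case, here is a small sketch that checks whether the allocation address changed. The helper name make_unique_copies is made up for this illustration.

```rust
use std::rc::Rc;

/// Hypothetical helper: reports whether `Rc::make_mut` had to copy the value.
fn make_unique_copies(mut rc: Rc<String>) -> bool {
    let before = Rc::as_ptr(&rc);
    Rc::make_mut(&mut rc);
    // A changed address means the value was cloned into a new allocation.
    before != Rc::as_ptr(&rc)
}

fn main() {
    // Count of 1: the allocation is reused as-is.
    let unique = Rc::new(String::from("ABC"));
    assert!(!make_unique_copies(unique));

    // Count of 2: the value is cloned, the other Rc keeps the old allocation.
    let shared = Rc::new(String::from("ABC"));
    let _other = Rc::clone(&shared);
    assert!(make_unique_copies(shared));
}
```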

The problem with both of these solutions is that they require explicit action. A detail which I haven't mentioned is that actually my code isn't written in terms of Rc<String>, but in terms of

struct MyString {
    inner: Rc<String>
}

and I would have loved it if the Rc in this type were just an implementation detail, i.e. if the type behaved as if it were

struct MyString {
    inner: String
}

But it seems this is not possible.

In cases such as this one, use:

This performance cost is probably much, much less than you think.
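
For comparison, this is roughly what the Arc-based option from the original question would look like for the MyString wrapper. It is a sketch under assumptions: the Arc<str> field and the method names are not from the thread. Cloning costs an atomic increment instead of a plain one, and the type is Send without any explicit wrapping step.

```rust
use std::sync::Arc;
use std::thread;

/// Sketch of the wrapper with `Arc<str>` in place of `Rc<String>`.
#[derive(Clone, Debug, PartialEq)]
struct MyString {
    inner: Arc<str>,
}

impl MyString {
    fn new(s: &str) -> Self {
        Self { inner: Arc::from(s) }
    }

    fn as_str(&self) -> &str {
        &self.inner
    }
}

fn main() {
    let a = MyString::new("ABC");
    // Cloning bumps an atomic counter; the text is not copied.
    let b = a.clone();
    // `MyString` is `Send` automatically, so no conversion is needed here.
    let handle = thread::spawn(move || b.as_str().len());
    assert_eq!(handle.join().expect("thread panicked"), 3);
    assert_eq!(a.as_str(), "ABC");
}
```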
