How to send a value that is expensive to clone to multiple threads?

I am working on a project where I take a markdown string, convert it to HTML, and send it to all active websocket connections on a server. The string could be very large, so I want to avoid cloning it. Instead, I save the string in an Arc<RwLock<Option<String>>>, and each thread has a clone of that Arc. When I get a new String, I notify the workers through their channels that there is a new string, and they read it from the RwLock.

In pseudocode:

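fn send(&mut self, md: String) {
   // Replace the shared HTML; the write lock is released at the end of this statement.
   *self.html.write().unwrap() = Some(convert_to_html(md));

   // Tell every worker that a new value is available.
   for ws_worker in &self.ws_workers {
     ws_worker.send(()).unwrap();
   }
}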

// in the workers...

// recv() blocks until the next notification and returns Err once the sender is dropped.
while let Ok(()) = html_notifier.recv() {
  let html = self.html.read().unwrap();
  send_ws_message(&html);
}

However, I'm currently debugging an intermittent test failure where the reader threads are seeing old values.

bus.send("string1");

assert_eq!(reader.read_message(), "old string");

bus.send("string2");

assert_eq!(reader.read_message(), "new string");

The second assertion is failing with "old string" != "new string".

Does RwLock not guarantee that writes will be visible to all readers once the write lock has been dropped? I would usually just send the string itself through the channels, but I don't want to copy it once for each worker.

If you'd like to read the actual code, here's the implementation of send: https://github.com/euclio/aurelius/blob/overhaul/src/lib.rs#L161-L191

EDIT: Interestingly, I'm not able to reproduce the failure with --test-threads=1. Maybe I'm hitting a race condition inside TcpListener::bind in my tests?

Without some more details about how your test is implemented (e.g. what is reader? Are there other threads involved?) I can't answer your specific question, but I have an architectural thought.

It looks like you're dropping and reallocating the String each time. (Or more specifically, your renderer is allocating a String for you, and you're sticking it into the RwLock, dropping the old one.)

I'd suggest sending an Arc<String> through each channel instead, removing the RwLock. That'll remove any concerns about state synchronization, including missed updates if the clients are slow. The String will be dropped when all threads have gotten the next update, but every thread would be internally consistent without racing the contents of the RwLock vs. the channel.
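To make that suggestion concrete, here is a minimal sketch using std::sync::mpsc with one channel per worker. The `broadcast` function and the worker body are hypothetical stand-ins, not the aurelius code; the real workers would call something like send_ws_message instead of collecting what they receive.

```rust
use std::sync::mpsc::channel;
use std::sync::Arc;
use std::thread;

/// Sends every update to every worker and returns what each worker received.
/// Hypothetical helper for illustration only.
fn broadcast(updates: Vec<String>, worker_count: usize) -> Vec<Vec<Arc<String>>> {
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for _ in 0..worker_count {
        let (tx, rx) = channel::<Arc<String>>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // Each message carries the value it refers to, so there is no
            // shared slot that can change between the notification and the read.
            rx.iter().collect::<Vec<Arc<String>>>()
        }));
    }

    for update in updates {
        // The String is moved into the Arc once; it is never copied.
        let shared = Arc::new(update);
        for tx in &senders {
            // Cloning the Arc only bumps a reference count.
            tx.send(Arc::clone(&shared)).unwrap();
        }
    }

    // Dropping the senders closes the channels, which ends each worker's loop.
    drop(senders);
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let updates = vec!["<p>one</p>".to_string(), "<p>two</p>".to_string()];
    let seen = broadcast(updates, 3);
    for (id, received) in seen.iter().enumerate() {
        println!("worker {} received {:?}", id, received);
    }
}
```

Every worker gets both updates in order, and all workers hold pointers to the same allocation for a given update.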

You should send the actual string instead of mutating a global one. Keep in mind that "sending" is only an abstract concept. It doesn't actually copy the string, it makes the same string visible in another part of the program.

My guess is that you get nonsense, because send does not wait, so your program performs: write, write, write, read, read, read, instead of write+read, write+read, write+read.
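The interleaving can be reproduced deterministically on a single thread. This is only a sketch of the write, write, read, read ordering, assuming a reader that handles its notifications after both writes have happened; it is not a diagnosis of the actual test failure, and `observed_by_slow_reader` is a made-up name.

```rust
use std::sync::mpsc::channel;
use std::sync::{Arc, RwLock};

/// Returns what a reader sees if it only gets around to its notifications
/// after the writer has already published twice. Hypothetical helper.
fn observed_by_slow_reader() -> Vec<String> {
    let slot: Arc<RwLock<Option<String>>> = Arc::new(RwLock::new(None));
    let (notify, notified) = channel::<()>();

    // Writer side: two updates back to back. send() returns immediately;
    // it does not wait for anyone to read the slot.
    for html in ["old string", "new string"] {
        *slot.write().unwrap() = Some(html.to_string());
        notify.send(()).unwrap();
    }
    drop(notify); // lets the reader loop below terminate

    // Reader side: two notifications are queued, but the slot only holds
    // the latest value, so both reads return the same string.
    let mut seen = Vec::new();
    for _ in notified.iter() {
        let guard = slot.read().unwrap();
        seen.push((*guard).clone().unwrap_or_default());
    }
    seen
}

fn main() {
    // prints ["new string", "new string"]
    println!("{:?}", observed_by_slow_reader());
}
```

The RwLock does make each write visible to later reads. What goes wrong is that a notification no longer identifies which value it was sent for.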

Don't I have to clone since I'm sending to multiple channels, though?

The way your code is arranged, you will need to clone something, but you don't need to clone the String.

The threads you're sending to only inspect the string. This means they can share it. The easiest way to achieve that is to put the String inside a reference-counted container like an Arc<String>. You need to clone the Arc each time you want to send it to a thread, but this operation is cheap -- it just adjusts the reference count.

Cloning the string, on the other hand, involves allocating and copying memory -- potentially a lot of it, depending on the size of your inputs.
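A small sketch of the difference, using only the standard library. The `share` helper is made up for illustration.

```rust
use std::sync::Arc;

/// Wraps the string in an Arc and hands out `copies` handles to it.
/// Hypothetical helper for illustration only.
fn share(html: String, copies: usize) -> Vec<Arc<String>> {
    let shared = Arc::new(html);
    (0..copies).map(|_| Arc::clone(&shared)).collect()
}

fn main() {
    let handles = share("<h1>big document</h1>".repeat(1000), 4);

    // Every handle points at the same heap buffer: nothing was copied.
    let buffer = handles[0].as_ptr();
    assert!(handles.iter().all(|h| h.as_ptr() == buffer));
    println!("handles alive: {}", Arc::strong_count(&handles[0]));

    // Cloning the String itself allocates a new buffer and copies every byte.
    let deep_copy: String = (*handles[0]).clone();
    assert_ne!(deep_copy.as_ptr(), buffer);
}
```

Each handle can be moved to a different thread, and the buffer is freed when the last handle is dropped.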

Did my previous post about removing the RwLock make sense?