Just started learning Rust. How do I modify a vector from multiple threads?

Hello,

Complete Rust newbie here. Just started today.

I really like the idea that there can only be one mutable reference. But it’s not clear to me how to implement, for example, a simple TCP server that counts words coming in from hundreds of concurrent connections.

Let’s say I have a Vec<String> with the list of words. Each TCP request should add a new word to that list. How can I do that?

Thanks a lot.

The simplest is to wrap the Vec<String> in an Arc<Mutex<Vec<String>>> and then share clones of that Arc across the threads. This is a simple playground example of this approach. If you want to stick to the familiar lock/mutex approach, take a look also at the parking_lot crate, as it offers more efficient implementations of the various synchronization primitives than the std library.
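In case the playground link goes stale, here is a minimal sketch of that approach (the thread count and the words are invented for illustration):

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The shared word list, wrapped so it can be shared (Arc) and mutated (Mutex).
    let words = Arc::new(Mutex::new(Vec::<String>::new()));

    let mut handles = Vec::new();
    for i in 0..4 {
        // Each thread gets its own clone of the Arc (a cheap refcount bump).
        let words = Arc::clone(&words);
        handles.push(thread::spawn(move || {
            // lock() blocks until the mutex is free, then yields a guard.
            words.lock().unwrap().push(format!("word-{i}"));
        }));
    }
    for h in handles {
        h.join().unwrap();
    }

    println!("{:?}", words.lock().unwrap());
}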

Since you’re new, I hesitate to overwhelm you with additional information. But you could also implement a single-threaded TCP server that does something similar using tokio - then you could replace this with Rc<RefCell<Vec<String>>>, which is the analogous construct for single-threaded scenarios.
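For illustration (leaving tokio itself out of it), the single-threaded shared-ownership part looks roughly like this:

use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let words = Rc::new(RefCell::new(Vec::<String>::new()));
    let words2 = Rc::clone(&words); // a second owner of the same allocation

    // borrow_mut() enforces the aliasing rules at runtime rather than compile time
    words2.borrow_mut().push("hello".to_string());
    println!("{:?}", words.borrow());
}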

Finally, you can think about designs in Rust other than locks/mutexes around shared memory. A popular alternative is using channels and sending messages across them - there’s a single owner of some state, and the various actors get access to that state via messages.
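A rough sketch of that pattern with std’s mpsc channel, with the connection-handling threads simulated:

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // Pretend each of these threads handles one TCP connection.
    for i in 0..4 {
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(format!("word-{i}")).unwrap();
        });
    }
    drop(tx); // close the original sender so the receiving loop can finish

    // A single owner of the state: only this thread ever touches the Vec.
    let mut words = Vec::new();
    for word in rx {
        words.push(word);
    }
    println!("{:?}", words);
}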


Thanks a lot for the detailed response and the playground example, Vitaly.

One last question. If I forget to use locks and allow data races, will the Rust compiler detect this and complain? I tried to do that in your example, but my current Rust knowledge isn’t enough.

So that’s the beauty of Rust - unless you use unsafe code (or someone else’s buggy unsafe code), you won’t run into data races. You can, however, deadlock - that’s possible and considered memory safe, if unfortunate. One nice thing about Mutex is that it encapsulates the data inside its type - the only way to get access to the data is to lock() the mutex. This is different from pretty much every other language’s constructs/libraries, where the mutex and the data it protects are two (or more) separate fields somewhere, leaving room for forgetting to lock (or unlock!) properly.


That’s beautiful. Thanks a lot!

Ah, so while heaping praise, let me caution you about one thing that I see pretty often with Mutex usage - overextending the lock duration due to when/how Rust drops values in scope. So, be mindful of code like this:

let mut guard = mutex.lock().unwrap(); // guard must be mut to borrow mutably below
let protected_value = &mut *guard; // take a &mut borrow of the value
protected_value.call_a_mut_method();
// now we don't need the lock anymore
// .. more code continues here, possibly blocking or doing lengthy computation or putting the thread to sleep
// or whatever
// finally, the lexical scope comes to an end - only here is the lock released

To prevent this, you can put the code in an additional scope:

// some code here
{
    let mut guard = mutex.lock().unwrap();
    let protected_value = &mut *guard; // take a &mut borrow of the value
    protected_value.call_a_mut_method();
}  // this scope drops the guard, unlocking the mutex
// continue here

or manually call drop(guard) as soon as you’re done, e.g. (reusing the placeholder names from the sketch above):
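let mut guard = mutex.lock().unwrap();
let protected_value = &mut *guard;
protected_value.call_a_mut_method();
drop(guard); // explicitly releases the lock right here
// lengthy work can continue without holding the lock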


The important traits here (think interfaces/abstract base classes if you haven’t seen traits yet) are Send and Sync. A type is Send if you can transfer ownership of it to another thread, and Sync if you can share immutable references to it across threads (or: T is Sync if &T is Send). The compiler will never allow you to share mutable references (&mut T) - they are exclusive by definition.

Most things are Send and Sync. The only things that aren’t are those that lie about their immutability (i.e. they have methods that claim not to change the data but actually do - things like Rc, whose reference count is incremented when it is cloned).

Therefore, unless you tell it not to (by using something that is not Send or Sync), the compiler will automatically mark all your data structures as both Send and Sync. People writing new abstractions may manually declare that their types are Send or Sync even though their contents are not, but the compiler can’t check this, so you have to use the unsafe keyword to indicate to the compiler that it should trust you.
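To give a flavour of what that opt-in looks like, here is a sketch with a hypothetical wrapper type:

// Hypothetical: a wrapper around a raw pointer. Raw pointers are neither
// Send nor Sync, so the compiler won't derive those traits for this struct.
struct MyHandle(*mut u8);

// We assert that sending/sharing MyHandle across threads is sound. The
// compiler cannot verify this claim - hence the `unsafe` keyword.
unsafe impl Send for MyHandle {}
unsafe impl Sync for MyHandle {}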

See https://doc.rust-lang.org/beta/nomicon/send-and-sync.html


Also, check out rayon if you want to do something iterator-y with your vec - it will automatically parallelize the computation.
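For example, a minimal sketch (assuming rayon has been added as a dependency; the words are made up):

use rayon::prelude::*;

fn main() {
    let words = vec!["alpha".to_string(), "beta".to_string(), "gamma".to_string()];
    // par_iter() distributes the work across rayon's global thread pool
    let total_len: usize = words.par_iter().map(|w| w.len()).sum();
    println!("total characters: {total_len}");
}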


You may be interested in the Concurrency chapter of The Rust Programming Language book 🙂


I just realized that it would be easier to send the words to a channel and process them (append to the vector) in one thread. This way no locking is required.

Is this a better solution in terms of performance and style?

As I understand it, channels are implemented without any locks.

It will definitely be a style improvement, because you'll separate the "concerns" of connection handling (on a pool of threads, the senders) and actual counting (the receiving end).

Performance can only be judged with benchmarks. Especially for tiny workloads such as simple counting, the fixed overhead of a strategy can really matter for the end result.

I'm not familiar with the channel implementations, so I can't give you concrete advice there, but there must be some form of locking internally. However, there is a LOT you can do to avoid, amortise and reduce it with clever optimisations, so they'll definitely be faster than naive locking implementations on the entire data structure.

Thanks.

Regarding channel implementation:

https://doc.rust-lang.org/src/std/sync/mpsc/mod.rs.html#128-276

// The choice of implementation of all channels is to be built on lock-free data
// structures. The channels themselves are then consequently also lock-free data
// structures. As always with lock-free code, this is a very "here be dragons"
// territory, especially because I'm unaware of any academic papers that have
// gone into great length about channels of these flavors.

But yeah, this is premature optimization in this case.