Is there fine grained mutex in Rust?

Hi, I have 3 threads which might access a Vec simultaneously. But only 1 thread writes the vector and the other 2 only read it and I can guarantee that the read & write won't touch the same element in vector any time. Also the vector won't grow -- I will set it's capacity when creating it.
So the question is, under this circumstances, is there a way to not use Mutex between these 3 threads? Or is there any fine grained Mutex in Rust like ReadMutex/WriteMutex? I mean, for now I found that the only way to access objects between threads is via Arc<Mutex>, but as long as Mutex is involved, I'll have to lock it before getting my object.

2 Likes

There is std::sync::RwLock, which is a fine grained Read/Write mutex. Honestly, it does not have all the features one could be interested in. In that case, give a look the the parking_lot crate, which has its own RwLock. This, for instance, supports upgreadable locks, feature that can be extremely handy.

6 Likes

Thanks dodomorandi. std::sync::RwLock looks good to me. Let me try that first.
BTW, is there any method to totally avoid using Mutex/Lock? I mean, in C/C++, this is pretty simple -- I declare a global array or vector, then all threads can reference it freely. But I can't do that in Rust due to the borrow checker/lifetime.

You can consider putting your data inside a RwLock, rather than the entire Vec. If you know the fixed size at compile time, then you don't even need a Vec potentially.

If you wanted a global, you can use the lazy_static crate. Here's what this can look like, assuming a fixed size array:

#[macro_use]
extern crate lazy_static;
extern crate parking_lot;

use parking_lot::RwLock;

lazy_static! {
    static ref DATA: [RwLock<String>; 2] =
        [RwLock::new("a".to_string()), RwLock::new("b".to_string())];
}

I'm not sure what you meant by the C/C++ example - you can surely reference the global, but where would the synchronization come from?

If your writer/updater scenario is amenable to "snapshot" style of workflow (i.e. readers get the "current" view of an object, and writer replaces them occasionally), you can take a look at the https://crates.io/crates/arc-swap crate.

Finally, perhaps there's a design to be made that doesn't involve shared memory synchronization at all; maybe there's a way to, e.g., do message passing between these threads rather than sharing memory.

2 Likes

First question: what kind of writing operation are you doing? I mean, concurrently writing data to a fixed buffer is a thing (which can always lead to data races using C++ or unsafe Rust), concurrently using, for instance, Vec::push/std::vector::push_back is a good way to shoot in your feet :smile:

If you are in the second scenario, use a Mutex in C++. The fact that you are free to do so does not mean that you are not causing UB. In Rust, as soon as you don't use unsafe, you need to be Sync, therefore you won't cause any trouble to you data.

The first scenario is a bit more complex, because it is all about what you are doing. Take the following:

extern crate crossbeam; // 0.4.1
extern crate crossbeam_utils; // 0.5.0

fn main() {
    let mut v: Vec<_> = (0..30).collect();

    crossbeam_utils::thread::scope(|scope| {
        let v_len = v.len();
        const N_THREADS: usize = 4;
        let threads: Vec<_> = v
            .chunks_mut(v_len / N_THREADS)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter_mut().for_each(|v| *v = *v * 2 + 1);
                })
            }).collect();

        threads
            .into_iter()
            .for_each(|thread| thread.join().unwrap());
    });

    println!("{:?}", v);
}

Here I splitted a Vec in mutable chunks, and I separately work on them from different threads (I need to use crossbeam threads because they are scoped and this informs the compiler they cannot outlive v). However, if you want to work on the same data from different threads, you need at least atomic values (which are still a bit expensive to use if you use sequentially consistent memory ordering) if you are working with trivial data types. If you need more complex data, you need some kind of way to work on it without data races (here comes the Mutex).

@vitalyd already told you about the possibility of using static data, and a general advice is: avoid it if you can, it will ease your ilfe. Use an Arc and clone it to the different threads, it will be simply easier to do (obviously, when you can).

Always remember that if C++ let you do that, it does not mean you are doing it correctly. Rust saves you from shooting on your own feet, and it brings you exploring other patterns and synchronization primitives (see channels).

2 Likes

First is to performance profile. If you have acceptable performance than no reason to change to a more performant way. Could also consider spitting data between multiple vecs, or take other approaches such as multiple copies.

I'm writing this assuming that at different points in time the same content that is read is also written to in other thread.

Multi threaded programming is more complex than this. No general computer software/processor can give such timing guarantee.

You have likely just created broken (data race) code if not enforcing access rules.

The performant method is using atomics. Would describe it as opposite of fine grained rather than fine. The first step would be decide on ownership of the vector. Arc might be enough or you might want to go quicker and keep it in the base thread, in which case you need to take into account what happens if a panic occurs. You also likely to need to use UnsafeCell.

The atomics then come into enforcing access rules, essentially controlling switching between the two states of; multiples readers and exclusive read/writer. To be performant you use Ordering::Release and Ordering::Acquire. Writing the code is too much for generality on forum. Requires use of unsafe, more brittle and far more likely to have bugs since there is no static analysis.

Thanks guys so let me describe what I'm doing. Basically I have 3 threads, 1 thread append element to a vector then notify other 2 threads start working(via mpsc::channel), that's it. So basically the write thread always go before read threads, and a large vector makes sure there is no overflow issues -- i.e, when overflow happens, write thread starts writting element #1 in vector again, at that time, we can assume read threads have finished their works long time ago. Besides, I'm using vector here while not an array is because I found the elements must implement Copy trait if you want to put it in array.

So the only problem for me is, how to make these 3 threads reference this vector -- and according to some doc reading, I found Arc but I'm not happy with that I need to use lock because I think my scenario is lock free -- that's why I said in C/C++, I can use a global array or a singleton class to wrap the vector so that all threads can access it.

So perhaps a stupid question: why not submit the element (i.e. "work") directly to the mpsc::channel? Why is it added to a Vec? Are the reader threads just reading the element, or do they need to mutate anything? If the element is large and you don't want to copy/clone it across 2 separate channels, you can put it into an Arc (without a Mutex), and then send clones of the Arc over the 2 channels. This assumes the readers don't need to mutate the element.

To handle backpressure (i.e. one or both of the readers' channels are full, assuming you're using bounded channels), you can have the writer thread buffer the element (or pick some other backpressure handling approach) in a local Vec, and then resend it later.

I might be missing something from your scenario, but I can't yet tell that you definitely need a mutex (or any other synchronization mechanism) for your own data.

6 Likes

Oh yeah sending the data via channel sounds great! Yes the data is large that's why I don't want to send it to channel, what I don't know is I can use Arc to send reference of the data. Thanks vitalyd!

2 Likes

Yes, but, you will have undefined behavior and data-races left and right. C++ allows you to put the gun in your mouth and pull the trigger repeatedly. Many times you'll get lucky and there won't be a bullet in the chamber. But, eventually, ..... kerplooey!

3 Likes

@super119, I see you're asking for help with different sub-problems, in parallel in two topics. (The other one sort of highjacked even :wink:)

To prevent "XY problems", where we explain the correct solutions to the wrong problem, may I suggest you describe what you are trying to do more generally?
So far I've pieced together "processing multiple big data structs with one producer and N workers".

What kind of data? What do you want to do with it? Where does it need to go?
If you can give us a better understanding, we can help you come up with the best/easiest overall strategy.

1 Like

FWIW, this sort of example would be much easier with rayon. You could use par_chunks_mut if you really want that control, or just: v.par_iter_mut().for_each(|v| *v = *v * 2 + 1);

2 Likes

Definitely! For this simple cases using multithreading can be overkilling as well, because it takes more time to initialize them than performing the operations.

In this case my idea was to demonstrate a lockless way of handling multithreading in Rust. Your message is right, and must be clear to people reading: don't use code like this in production code :smile:

1 Like

If your situation could make use of a copy-on-write model then check out Arc::make_mut.

1 Like

So finally I can reply now. This morning I was told that I have reached the maximum reply number as a new member of the forum, and I have to wait 12 hours... anyway,

What kind of data? What do you want to do with it? Where does it need to go?

I'm working on a DNS "relay" program. It accepts DNS requests from clients, take a peek of the question in it, then dispatch the DNS packet to different DNS servers. Speaking of "data", it's a struct which I created, contains DNS raw packet, DNS question infos and some other supplementary data members.

So now I'm using crate "dns_parser" to parse the DNS packet, after that's done, I will send the DNS question to another thread to check which DNS server should this packet goes to(since the "checking" is time consuming) then sends it, finally, a third thread receives the DNS response from DNS servers, according to DNS request ID, it's able to get the client IP address(I will have a vector saving all pending DNS request infos) then sends the response back to client.

That's it. It's pretty simple logically, originally I was writing all of these in Python, now I'm trying to convert it to Rust, but you know, I hit lots of problems so still struggling on it. :smiley:

1 Like

Oh, DNS! Always such lovely complexity hiding under that deceptively simple protocol! :smile:

In that case you should definitely be aware of tRust-DNS:

http://trust-dns.org/

@bluejekyll keeps us updated on his progress here:


From your description, you can probably completely avoid using Arc, because once one thread is done with the data, it doesn't need it at all anymore. Just give the entire thing away (put the struct in the channel, not Arc)


I think we have different definitions of "large structs". I would expect a DNS request and a bit of metadata to be well under a kilobyte. I wouldn't be surprised if that copy-overhead is negligible.. and remember, If you pass it via channels, the compiler will probably optimise it to move just the Rust-Ownership, not the physical bits in RAM.

Similarly, I wouldn't be surprised if all the thread-communication overhead is slower than just doing a "dumb" worker threadpool, where each request is handled start-to-finish by a single thread, at least until you start doing the more outrageous optimizations.
Maybe asynchronous IO/futures are also worth considering.

As always with performance: measure measure measure! :smile:

2 Likes

I'm going to respond just to the DNS stuff. What you've described could probably be done through some work with the tools in the trust-dns-server crate, and trust-dns-client if you were interested in using these. I will say, those components aren't built specifically to deal with the situation you're describing, i.e. it might be a pain to link the server to a client as a relay easily right now. In addition, the interior functions are all built around asynio with tokio. It's not the simplest set of structures to necessarily understand, as there are a lot of queues being dealt with in different contexts. It might be a little complex to jump right into. As @juleskers mentions, it would probably be easiest to use threads and the synchronous client, and you can look at the server implementation for how to use some of the trust-dns-proto library features to accept in-bound connections.

I do have plans to incorporate the trust-dns-resolver into the server as a forwarding and resolving agent, which would be much closer to what you're doing, but I haven't had the opportunity yet to focus on that.

1 Like

Big thanks guys! I don't want to discuss DNS stuffs here since there are a lot of details in it. :slight_smile: for example, DNS request is able to carry multiple questions, and it's possible that these questions might need to go to different DNS servers according to my requirement... anyway, we can create a new thread for DNS discussions.

But one thing I think it's worth to try -- use threads and synchronous client. This will be pretty simple and easy to implement and in addition, I don't have to save DNS request infos in case we receive response later -- just get the request, decide the DNS server, pick up a worker thread and use synchronous trust-dns client to solve it! I think I'll try it.

Thanks again guys, appreciate it.

2 Likes