How can I share a struct between variables?

Hi All,

I have some components (A, B) as structs. Over time, we may add additional components (C, D, ...), so I want the Rust tool to be flexible about these changes.

I have a for_each function which processes all the S structs, as shown in the example [1]. Updating S is time consuming, so I want to use Rayon. My idea was to add mutable references to the components' S structs to a collection and modify them in a function, so that changes made through the collection are reflected in the components' internal structs. I am not going to access the S collection and the components at the same time, so there should be no data races. Also, S is not shared between components or structs.

I created an example in [1]. I looked into Rc and RefCell, which might support my use case, but they don't work with Rayon. Any suggestions on how to do this with reduced computation time?

[1] Rust Playground

The thread-safe equivalent of Rc<RefCell<T>> is Arc<Mutex<T>>, if that helps.
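For example, here's a minimal sketch of two handles to the same value (the S struct here is just a placeholder for your component data):

```rust
use std::sync::{Arc, Mutex};

// Placeholder for the real component data.
struct S {
    a: i32,
}

fn main() {
    // Two handles to the same S.
    let in_component = Arc::new(Mutex::new(S { a: 0 }));
    let in_collection = Arc::clone(&in_component);

    // A change made through one handle...
    in_collection.lock().unwrap().a += 1;

    // ...is visible through the other.
    assert_eq!(in_component.lock().unwrap().a, 1);
}
```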

Thanks. Will there be any performance issues when using Arc&lt;Mutex&gt;?

Does it matter if there's no other (obvious) option?

(Btw, Arc is atomic, so it incurs the cost of an atomic increment and decrement. That shouldn't be much in general, though.)

Hmm. Thanks. Is it possible to do a clone() of Arc&lt;Mutex&lt;T&gt;&gt; similar to T::clone()? In my code I will be doing both operations: Arc::clone() to store the same value in two different variables simultaneously, and T::clone() to create a copy of T and store it in a different variable.

I'm not sure what you are asking. Arc is an atomically reference-counted smart pointer. Hence it's cloneable; and if you dereference it, you can clone the underlying value if that is cloneable.

If p has type Arc<T>, then you can write:

  • Arc::clone(&p) to clone just the pointer (returns a new Arc pointing to the same T)
  • T::clone(&p) to clone the value (returns a new T)

You can also write these as:

  • p.clone() to clone the pointer
  • (*p).clone() to clone the value
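
A quick sketch of the difference (assuming the pointee implements Clone; S is just a placeholder type):

```rust
use std::sync::Arc;

#[derive(Clone)]
struct S {
    a: i32,
}

fn main() {
    let p: Arc<S> = Arc::new(S { a: 1 });

    // Clones only the pointer: a new Arc to the same S, no S is copied.
    let q: Arc<S> = Arc::clone(&p);

    // Clones the value: a new, independent S (requires S: Clone).
    let copy: S = (*p).clone();

    assert_eq!(Arc::strong_count(&p), 2);
    assert_eq!(copy.a, q.a);
}
```

With Arc&lt;Mutex&lt;S&gt;&gt; you'd typically lock first and clone the guarded value, e.g. p.lock().unwrap().clone().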

Thanks @mbrubeck. One quick question: I came across Pin. Do you think it would be useful in this scenario?

Background: each of my components' structs has a different structure. I have a function which updates an S struct, and I would like to call this function on every S inside a component. So I am collecting all the S structs from each component into a variable and processing that variable. This way I don't have to know what the structure of the component struct is, and I can blindly call the function on the S structs in the collection.

No, it does not sound like Pin would be useful here.

Thanks. I believe Vec&lt;Arc&lt;Mutex&lt;T&gt;&gt;&gt; is slower than Vec&lt;T&gt; when iterating with Rayon. Is that possible? The items in the Vec are mutually independent.

An example of the code is in the Rust Playground.

Yes, a Mutex is not free, but it is necessary if you want to mutate something you only have shared access to.
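
A rough sketch of the trade-off (not your real code; update stands in for the expensive function):

```rust
use rayon::prelude::*;
use std::sync::{Arc, Mutex};

struct S {
    a: i32,
}

// Stand-in for the real, expensive update.
fn update(s: &mut S) {
    s.a += 1;
}

fn main() {
    // Shared handles: every update has to take the lock first.
    let shared: Vec<Arc<Mutex<S>>> =
        (0..2000).map(|a| Arc::new(Mutex::new(S { a }))).collect();
    shared.par_iter().for_each(|s| update(&mut s.lock().unwrap()));

    // Exclusive ownership: Rayon hands out disjoint `&mut S`, no locking needed.
    let mut owned: Vec<S> = (0..2000).map(|a| S { a }).collect();
    owned.par_iter_mut().for_each(update);
}
```

If you can arrange exclusive access (a Vec&lt;S&gt;, or plain &amp;mut S references), the second form avoids the per-item locking entirely.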

Thanks @alice. What I see is that whether or not I add the line below, the performance is the same. I believe multi-threading is not happening. Am I right?

rayon::ThreadPoolBuilder::new().num_threads(1).build_global().unwrap();

If switching between 1 and n threads doesn't make a difference, there's a good chance that one of the following occurred:

  1. Your use of Mutex is effectively causing everything to happen sequentially, because a Mutex makes sure only one thread can access something at a time;
  2. that operation was never a bottleneck, so adding more parallelism doesn't help performance (see Amdahl's law);
  3. the operation is so fast that you don't notice the speed increase from parallelism; or
  4. Rayon decided your input is small enough that it's faster to do all the work on one thread instead of incurring the overhead of sending different chunks of the input to different threads and then working out how to merge the results at the end.

Looking at the code linked to on the playground, I'm guessing it's the 4th option. If you've only got a couple of inputs and the operation is fast anyway, adding more parallelism can actually slow things down, because you've got to coordinate the different threads.

Something to keep in mind is that multithreading isn't free. You'll mainly use it when there are a large number of operations/items, or each operation is slow and can be done independently of the others (imagine hashing the contents of a bunch of files).
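
One rough way to check whether Rayon is actually splitting the work is to time the same loop sequentially and in parallel (a sketch; expensive_update is a made-up stand-in for your real function):

```rust
use rayon::prelude::*;
use std::time::Instant;

// Stand-in for the real, slow per-item work.
fn expensive_update(x: &mut u64) {
    for _ in 0..1_000_000 {
        *x = x.wrapping_mul(6364136223846793005).wrapping_add(1);
    }
}

fn main() {
    let mut items: Vec<u64> = (0..2000).collect();

    let t = Instant::now();
    items.iter_mut().for_each(expensive_update);
    println!("sequential: {:?}", t.elapsed());

    let t = Instant::now();
    items.par_iter_mut().for_each(expensive_update);
    println!("parallel:   {:?}", t.elapsed());
}
```

If the two timings come out about the same for work like this, something is serialising it, for example a contended Mutex, or a pool that really does have only one thread.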


Thanks @Michael-F-Bryan. My input is around 2000 items in the Vec; I gave minimal code here with 2 items. It takes 15 minutes to process the entire Vec.

Not knowing Rayon, just looking at this:

Aren't you explicitly opting out of parallelism by running the code on a one-thread pool?

Update: sorry, I looked through the code; it seems I misinterpreted the question.

@Cerber-Ursi - I tried with and without num_threads to see if it changes my computation time.

I am using Mutex to share the same value between two variables. The reason is that I have a lot of different parent structs, and deep down each one has one or more S structs. In my code I would otherwise have to go into each parent struct to update the S structs. So I decided to give every parent a Vec containing pointers to its S structs, so that I don't have to drill down to update S.

In my code I do

let mut all = vec![];
all.append(&mut parent1.all);
all.append(&mut parent2.all);
....

Then I call par_iter_mut() on the all vector.

When I changed par_iter_mut() to par_iter(), it was faster :slight_smile:

It almost sounds like you're walking a tree (or many trees so... a forest?). Instead of explicitly using par_iter(), what about using the more general rayon::scope() to run a closure which will recursively walk the tree, using Scope::spawn() to do the next level of recursion (possibly) in parallel.

As long as updating an S can be done independently of neighbouring S's, you should be able to take mutable references to each field instead of needing any form of synchronisation (Mutex).
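
A rough sketch of that idea (the Node shape here is made up; the point is that each spawned task gets a disjoint &amp;mut into the tree, so no Mutex is needed):

```rust
use rayon::Scope;

struct S {
    a: i32,
}

// Hypothetical tree shape; your real parent structs will look different.
struct Node {
    value: S,
    children: Vec<Node>,
}

// Stand-in for the real update.
fn update(s: &mut S) {
    s.a += 1;
}

// Recursively walk the tree, spawning each subtree onto the same scope so
// Rayon can process them in parallel. Every task gets a disjoint `&mut Node`.
fn walk<'a>(node: &'a mut Node, scope: &Scope<'a>) {
    update(&mut node.value);
    for child in &mut node.children {
        scope.spawn(move |scope| walk(child, scope));
    }
}

fn main() {
    let mut root = Node {
        value: S { a: 0 },
        children: (0..10)
            .map(|a| Node { value: S { a }, children: Vec::new() })
            .collect(),
    };

    // The scope waits for all spawned tasks before returning.
    rayon::scope(|scope| walk(&mut root, scope));
}
```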

It feels like you're trying to fight your architecture here.... Is it possible to restructure your data in a way that is more amenable to parallelism?

Usually Arc&lt;Mutex&lt;T&gt;&gt; hurts parallelism when you frequently access the T, because a Mutex's whole purpose is to make sure only one thread can access the data at a time.

I'm guessing the update_a_var_in_s() from your example isn't the real code? 2000 items isn't that many; a computer can chew through hundreds of thousands of increment operations in a single second.

If the individual operations are slow, you'll probably get better performance gains by using smarter algorithms (e.g. O(n log n) instead of O(n^2)), not collecting into temporary Vec's, and doing it on a single thread so you can get mutable access without the cost of synchronisation.

In my day job I work with a lot of moderately sized data and a pretty tight latency budget (operations need to complete within a couple hundred milliseconds or the user gets frustrated) and have found that algorithmic improvements can give orders of magnitude better performance than doing the naive thing in parallel... Of course, depending on your task there may not be any smarter algorithms available for you (e.g. updating every item in an array will always require touching every item in the array).
