I'm getting started on a project that will require me to use a lot of random numbers, so I pulled up The Rust Rand Book and have started going through it. The examples mention using thread_rng() as a way to be more efficient than random(). Of course, I've tried to read up on these functions, but I'm confused by the use of the word "thread" and what it refers to. I keep seeing terms like thread_local and references to a type named ThreadRng that is named similarly to the function above. Could someone explain what these terms refer to? Thanks. Since I'm still basically a Rust novice, should I even bother with these functions/concepts?
"Thread" in the name here means that the returned PRNG is thread-local. That's usually what you want, because this means that it can avoid locking (which would be required by a truly global PRNG, as multi-threading is possible).
I don't think that random()
is less efficient than thread_rng()
, because it simply forwards to thread_rng().gen()
. Such trivial calls are almost always inlined and optimized out by modern compilers, so you usually don't need to bother about them.
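For reference, the forwarding looks roughly like this (a sketch written here as my_random; not the exact rand source):

use rand::distributions::{Distribution, Standard};
use rand::{thread_rng, Rng};

// Sketch: a stand-in for what rand::random() does internally.
pub fn my_random<T>() -> T
where
    Standard: Distribution<T>,
{
    thread_rng().gen()
}

fn main() {
    let x: f64 = my_random();
    println!("{x}");
}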
A random number generator uses an internal state to generate the next number. To be able to generate numbers from anywhere in your code, this state needs to be accessible from anywhere. In Rust, to share a value across threads the value must be Sync, and mechanisms like Mutex are required to ensure the value is accessed by only one thread at a time.
thread_rng(), using thread_local!, creates a new ThreadRng for each thread. Since each thread has its own ThreadRng, there is no need for a synchronization mechanism.
Storing the ThreadRng returned by thread_rng() is more efficient than calling random() in case you need to generate multiple values.
Here's a playground generating 100,000 values using both methods:
with thread_rng: 477.564µs
with random: 655.909µs
(To be accurate we would have to run a real benchmark, but this demonstrates that thread_rng is faster when you want a lot of values.)
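The comparison can be reproduced with something roughly like this (a sketch; the exact playground code may differ, and timings depend on your machine and optimization level):

use std::time::Instant;
use rand::{random, thread_rng, Rng};

fn main() {
    const N: usize = 100_000;

    // Reuse a single ThreadRng for all values.
    let start = Instant::now();
    let mut rng = thread_rng();
    let a: Vec<f64> = (0..N).map(|_| rng.gen()).collect();
    println!("with thread_rng: {:?}", start.elapsed());

    // Call random() each time; it looks up the thread-local generator on every call.
    let start = Instant::now();
    let b: Vec<f64> = (0..N).map(|_| random()).collect();
    println!("with random: {:?}", start.elapsed());

    assert_eq!(a.len(), b.len());
}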
Using threads means that your program executes several functions in parallel. For example:
use std::thread;
use std::time::Duration;

fn main() {
    let mut x = 0;
    let mut y = 0;
    thread::scope(|s| {
        println!("Starting thread 1");
        s.spawn(|| {
            println!("Thread 1 is running");
            for _ in 0..100 {
                x += 1;
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 1 is done");
        });
        println!("Starting thread 2");
        s.spawn(|| {
            println!("Thread 2 is running");
            for _ in 0..100 {
                // x += 1; // we can't mutably access `x` from two threads!
                y += 1;
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 2 is done");
        });
    });
    println!("{x}, {y}");
}
Output:
Starting thread 1
Starting thread 2
Thread 2 is running
Thread 1 is running
Thread 2 is done
Thread 1 is done
100, 100
The two loops counting to 100 are executed at the same time, either on multiple processor cores or by time-slicing / preemption.
Only one thread may mutably access the same variable at a time, unless you take care of proper synchronization, e.g. like this:
use std::thread;
use std::time::Duration;
use std::sync::{Arc, Mutex};

fn main() {
    let x = Arc::new(Mutex::new(0));
    thread::scope(|s| {
        println!("Starting thread 1");
        let x1 = x.clone();
        s.spawn(move || {
            println!("Thread 1 is running");
            for _ in 0..100 {
                let mut x_guard = x1.lock().unwrap();
                *x_guard += 1;
                // Release the lock before sleeping so the other thread can make progress.
                drop(x_guard);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 1 is done");
        });
        println!("Starting thread 2");
        let x2 = x.clone();
        s.spawn(move || {
            println!("Thread 2 is running");
            for _ in 0..100 {
                let mut x_guard = x2.lock().unwrap();
                *x_guard += 1;
                drop(x_guard);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 2 is done");
        });
    });
    let x = Arc::try_unwrap(x).unwrap().into_inner().unwrap();
    println!("{x}");
}
(Playground)
(edit: the Arc
isn't necessary, as the threads are scoped, see also simplified Playground)
Output:
Starting thread 1
Starting thread 2
Thread 2 is running
Thread 1 is running
Thread 1 is done
Thread 2 is done
200
In that last example, each thread locks a mutex to ensure that the two threads don't access the variable at the same time. This creates overhead.
To minimize overhead, rand::Rng::gen requires exclusive access (it works on &mut self), which means it does not perform any synchronization. So if you have two threads, they either need to synchronize access to a common Rng, or you need two Rngs, one in each thread. Synchronization generally involves overhead, so you try to avoid it where possible and reasonable.
The rand::thread_rng
function helps you to obtain an Rng
which you can use in one thread (or your whole program if you don't use threading). The returned ThreadRng
is tied to the thread, i.e. you cannot pass it to a different thread.
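For example, in a multi-threaded program each thread can simply obtain its own generator, and nothing needs to be locked. A minimal sketch using scoped threads (assuming rand 0.8):

use std::thread;
use rand::{thread_rng, Rng};

fn main() {
    thread::scope(|s| {
        for i in 0..2 {
            s.spawn(move || {
                // Each thread gets its own ThreadRng; no sharing, no locking.
                let mut rng = thread_rng();
                let x: f64 = rng.gen();
                println!("thread {i}: {x}");
            });
        }
    });
}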
In practice, you don't need to worry much about that. The general rule is: If you need several random values, obtain a ThreadRng
first by calling rand::thread_rng
. Then use that ThreadRng
to create multiple values, e.g. by using the Rng::gen
method repeatedly (on the same ThreadRng
). You can re-obtain a ThreadRng
any time (and you will have to do that if you use threading because you cannot pass the ThreadRng
to a different thread), but you can try to keep the number of calls to thread_rng
low. If you call thread_rng
more often, it's not suuuuch a big issue though. The overhead isn't that big.
Thanks for the answers. That helped a lot. I kind of figured we were talking about using multiple threads. That is way beyond where I'm at in my programming skills and really won't be needed for my project where random number generation will be occasional, not constant. There won't be a need for using parallel processing at all. So, I think I'll stick with random()
for now. :>)
Even if you don't use threads, it can still make sense to call thread_rng
to get the Rng
, and then use that Rng
to generate your numbers. It will be slightly faster, as @tguichaoua showed.
It's as easy as this:
use rand::{Rng, thread_rng}; // note we need the `Rng` trait

fn main() {
    let mut rng = thread_rng();
    let a: f64 = rng.gen();
    let b: f64 = rng.gen();
    println!("{a}, {b}");
}
One thing that didn't get answered for me is the meaning of the term thread_local
. Can someone clear that up for me? Thanks.
@jbe You put a lot of effort into your answer. Thanks!
A thread-local variable is similar to a global variable, i.e. you can access it from any function in your program without having to pass a reference as an argument to the function.
The difference is that a global variable is shared by every part of your program, whatever thread it runs on.
A thread-local variable, instead, has its own value for each thread. That way, you cannot have collisions when using the variable.
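For instance, here is a minimal sketch of a thread-local variable declared directly with the std thread_local! macro (just an illustration of the concept, not rand's actual internals):

use std::cell::Cell;
use std::thread;

thread_local! {
    // Every thread gets its own independent copy of COUNTER.
    static COUNTER: Cell<u32> = Cell::new(0);
}

fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    bump();
    bump();
    // The spawned thread starts from its own 0, unaffected by the main thread's counter.
    let in_other_thread = thread::spawn(bump).join().unwrap();
    println!("main: {}, other thread: {}", bump(), in_other_thread);
}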
thread_rng
uses a thread-local variable internally, so the returned Rng
(ThreadRng
) will use that thread-local variable internally to not require synchronization overhead. It's all hidden though, and as a user of the rand
API, you don't need to know about it.
You might still be fundamentally confused about something (but I'm not sure what). As I (and the very page you linked to) explained, random()
and thread_rng().gen()
are exactly the same.
Also, you can call both from however many threads you want to (including only 1). Whether you choose one or the other does not depend on whether you are using multiple threads.
Thanks, @H2CO3. I did get what you said about random()
and thread_rng().gen()
being exactly the same. So, I figured that I'd just use random()
to simplify things. At least for starters. You're also right that I'm still "fundamentally confused". No doubt about it. I get the feeling that things won't really start clearing up until I start using the language to build code. It's like learning math. We math teachers love to focus on teaching the concepts that underlie the skills we are teaching, but far too often we get it in the wrong order. Sometimes, even usually, it is best to teach the mechanical skills and then set our students to using those skills. After they've been at it for a while, the concepts either just pop into their heads independently of us teachers or simply become much, much easier to teach. So, while I'm going to keep working through my learning resources, I'm also going to jump in, with both feet, to my project and see how it all comes together.
Thanks, that helps. :>)
But it's worth noting that you should use thread_rng (once) when you need a batch of random values (e.g. 100,000). If you call random() 100,000 times, it means you have 99,999 extra (unnecessary) calls to thread_rng() internally, right?
Yes or no. It all depends on the optimizations. Since the implementation is trivial and short, I can easily imagine the compiler inlining random()
and then removing the extraneous calls.
Also, thread_rng() simply clones an Rc, which is literally a single, non-atomic increment, and is likely to be insignificant compared to all the math that goes into generating a random number in the first place. A quick benchmark confirms this.
It also accesses a lazily initialized thread-local, which has its own overhead depending on the implementation.
Compare with the other benchmark, which shows a greater (relative) difference.
That benchmark is at least weird. It puts the black_box
around the whole expression, which means that it's basically completely useless.
Why? It simulates consuming/using the whole collected Vec, I think?
But that's only the value of the whole expression, after all of the evaluation has already happened. The actual calls to the functions under scrutiny aren't protected in any way.
I thought the outer black_box
ensures that the functions in the map
closure get evaluated. I don't see a way for the compiler to optimize this by removing the function calls (because the compiler doesn't know how/if the results are used). That's why I don't see a problem with that benchmark.
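For concreteness, the two black_box placements being discussed look roughly like this (a sketch using std::hint::black_box rather than the benchmark's actual code; whether the outer placement really keeps the individual calls from being optimized away is exactly the question at hand):

use std::hint::black_box;
use rand::random;

fn main() {
    const N: usize = 100_000;

    // Outer placement: only the final collected Vec is made opaque to the optimizer.
    let outer: Vec<f64> = black_box((0..N).map(|_| random()).collect::<Vec<f64>>());

    // Inner placement: each individual result is made opaque before being collected.
    let inner: Vec<f64> = (0..N).map(|_| black_box(random::<f64>())).collect();

    println!("{} {}", outer.len(), inner.len());
}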