How does thread_rng() in the rand crate help me?

I'm getting started on a project that will require me to use a lot of random numbers, so I pulled up The Rust Rand Book and have started going through it. The examples mention using thread_rng() as a way to be more efficient than random(). Of course, I've tried to read up on these functions, but am confused by the use of the word "thread" and what it is referring to. I keep seeing terms like thread_local and references to a type named ThreadRng, similar in name to the function above. Could someone explain what is being referred to by these terms? Thanks. Since I'm still basically a Rust novice, should I even bother with these functions/concepts?

1 Like

"Thread" in the name here means that the returned PRNG is thread-local. That's usually what you want, because this means that it can avoid locking (which would be required by a truly global PRNG, as multi-threading is possible).

I don't think that random() is less efficient than thread_rng(), because it simply forwards to thread_rng().gen(). Such trivial calls are almost always inlined and optimized out by modern compilers, so you usually don't need to bother about them.

1 Like

A random number generator uses an internal state to generate the next number.
To be able to generate numbers from anywhere in your code, this state needs to be accessible from anywhere.
In Rust, to share a value across threads, the value must be Sync, and mechanisms like Mutex are required to ensure the value is accessed by only one thread at a time.
thread_rng(), using thread_local!, creates a new ThreadRng for each thread.
Since each thread has its own ThreadRng, there is no need for any synchronization mechanism.
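
The per-thread idea can be sketched with the standard library alone. This is a toy model, not how rand actually implements ThreadRng: a simple xorshift64 generator (all names here are made up for illustration) lives in a thread_local!, so every thread lazily gets its own copy and no locking is ever needed:

```rust
use std::cell::RefCell;
use std::thread;

// Toy PRNG with internal state; next() needs exclusive access (&mut self).
struct ToyRng { state: u64 }
impl ToyRng {
    fn new(seed: u64) -> Self { ToyRng { state: seed } }
    fn next(&mut self) -> u64 {
        // xorshift64: enough to illustrate stateful generation
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

thread_local! {
    // Each thread gets its own lazily created instance: no Mutex needed.
    static RNG: RefCell<ToyRng> = RefCell::new(ToyRng::new(0x9E3779B97F4A7C15));
}

fn toy_random() -> u64 {
    RNG.with(|rng| rng.borrow_mut().next())
}

fn main() {
    let a = toy_random();
    let b = thread::spawn(|| toy_random()).join().unwrap();
    // Both threads start from the same seed, so their first draws agree,
    // yet neither one blocked on a lock.
    assert_eq!(a, b);
    println!("{a} {b}");
}
```

(The real ThreadRng is seeded from the operating system per thread, so different threads produce different sequences; the identical seeds here are just to make the per-thread independence visible.)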


Storing the ThreadRng from thread_rng() is more efficient than random() in case you need to generate multiple values.

Here's a playground generating 100 000 values using both methods:

with thread_rng: 477.564µs
with random: 655.909µs

(To be accurate we would have to run a real benchmark, but this demonstrates that thread_rng is faster when you want a lot of values)

6 Likes

Using threads means that your program executes several functions in parallel. For example:

use std::thread;
use std::time::Duration;

fn main() {
    let mut x = 0;
    let mut y = 0;
    thread::scope(|s| {
        println!("Starting thread 1");
        s.spawn(|| {
            println!("Thread 1 is running");
            for _ in 0..100 {
                x += 1;
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 1 is done");
        });
        println!("Starting thread 2");
        s.spawn(|| {
            println!("Thread 2 is running");
            for _ in 0..100 {
                // x += 1; // we can't mutably access `x` from two threads!
                y += 1;
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 2 is done");
        });
    });
    println!("{x}, {y}");
}

(Playground)

Output:

Starting thread 1
Starting thread 2
Thread 2 is running
Thread 1 is running
Thread 2 is done
Thread 1 is done
100, 100

The two loops counting to 100 are executed at the same time, either on multiple processor cores or by time-slicing / preemption.

Only one thread may mutably access the same variable at the same time, unless you take care of proper synchronization, e.g. like this:

use std::thread;
use std::time::Duration;
use std::sync::{Arc, Mutex};

fn main() {
    let x = Arc::new(Mutex::new(0));
    thread::scope(|s| {
        println!("Starting thread 1");
        let x1 = x.clone();
        s.spawn(move || {
            println!("Thread 1 is running");
            for _ in 0..100 {
                let mut x_guard = x1.lock().unwrap();
                *x_guard += 1;
                drop(x_guard);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 1 is done");
        });
        println!("Starting thread 2");
        let x2 = x.clone();
        s.spawn(move || {
            println!("Thread 2 is running");
            for _ in 0..100 {
                let mut x_guard = x2.lock().unwrap();
                *x_guard += 1;
                drop(x_guard);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 2 is done");
        });
    });
    let x = Arc::try_unwrap(x).unwrap().into_inner().unwrap();
    println!("{x}");
}

(Playground)
(edit: the Arc isn't necessary, as the threads are scoped, see also simplified Playground)

Output:

Starting thread 1
Starting thread 2
Thread 2 is running
Thread 1 is running
Thread 1 is done
Thread 2 is done
200

In that last example, each thread locks a mutex to ensure that not both threads access the variable at the same time. This creates overhead.


To minimize overhead, rand::Rng::gen requires exclusive access (it works on &mut self), which means that it does not perform synchronization. That means that if you have two threads, you either need to synchronize accesses to a common Rng, or you need two Rngs, one in each thread. Synchronization generally involves overhead, so you try to avoid it where possible or reasonable.
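
Both options can be sketched with only the standard library (a toy LCG stands in for a real Rng here; the names are made up for illustration):

```rust
use std::sync::Mutex;
use std::thread;

// Toy stateful generator: next() needs &mut self, just like rand::Rng::gen.
struct ToyRng(u64);
impl ToyRng {
    fn next(&mut self) -> u64 {
        // linear congruential step, purely illustrative
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1);
        self.0
    }
}

fn main() {
    // Option 1: one shared generator, synchronized with a Mutex (has overhead).
    let shared = Mutex::new(ToyRng(42));
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                for _ in 0..1000 {
                    shared.lock().unwrap().next(); // locks on every single draw
                }
            });
        }
    });

    // Option 2: one generator per thread, no synchronization at all.
    thread::scope(|s| {
        for seed in [1u64, 2] {
            s.spawn(move || {
                let mut rng = ToyRng(seed); // owned by this thread only
                for _ in 0..1000 {
                    rng.next();
                }
            });
        }
    });
    println!("done");
}
```

thread_rng is essentially a convenient, pre-packaged version of option 2.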

The rand::thread_rng function helps you to obtain an Rng which you can use in one thread (or your whole program if you don't use threading). The returned ThreadRng is tied to the thread, i.e. you cannot pass it to a different thread.


In practice, you don't need to worry much about that. The general rule is: If you need several random values, obtain a ThreadRng first by calling rand::thread_rng. Then use that ThreadRng to create multiple values, e.g. by using the Rng::gen method repeatedly (on the same ThreadRng). You can re-obtain a ThreadRng any time (and you will have to do that if you use threading because you cannot pass the ThreadRng to a different thread), but you can try to keep the number of calls to thread_rng low. If you call thread_rng more often, it's not suuuuch a big issue though. The overhead isn't that big.

4 Likes

Thanks for the answers. That helped a lot. I kind of figured we were talking about using multiple threads. That is way beyond where I'm at in my programming skills and really won't be needed for my project where random number generation will be occasional, not constant. There won't be a need for using parallel processing at all. So, I think I'll stick with random() for now. :>)

Even if you don't use threads, it can still make sense to call thread_rng to get the Rng, and then use that Rng to generate your numbers. It will be slightly faster, as @tguichaoua showed.


It's as easy as this:

use rand::{Rng, thread_rng}; // note we need the `Rng` trait

fn main() {
    let mut rng = thread_rng();
    let a: f64 = rng.gen();
    let b: f64 = rng.gen();
    println!("{a}, {b}");
}

(Playground)

One thing that didn't get answered for me is the meaning of the term thread_local. Can someone clear that up for me? Thanks.

@jbe You put a lot of effort into your answer. Thanks! :grinning:

A thread local variable is similar to a global variable, i.e. you can access it from any function in your program without having to pass a reference as an argument to the function.

The difference between a global variable and a thread-local variable is that the global variable is shared by every part of your program, whichever thread it runs on.

A thread-local variable, instead, has its own value for each thread. That way, you cannot get collisions between threads when using the variable.
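
The contrast can be made concrete with std only (the names here are hypothetical, chosen just for the example):

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// A true global: one value shared by every thread (so it needs atomics or a Mutex).
static GLOBAL: AtomicU32 = AtomicU32::new(0);

thread_local! {
    // A thread-local: every thread sees its own independent copy.
    static LOCAL: Cell<u32> = Cell::new(0);
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                GLOBAL.fetch_add(1, Ordering::Relaxed);
                LOCAL.with(|l| l.set(l.get() + 1));
                LOCAL.with(|l| l.get()) // this thread's own counter
            })
        })
        .collect();
    for h in handles {
        // Every thread incremented its own LOCAL exactly once.
        assert_eq!(h.join().unwrap(), 1);
    }
    // All four increments landed on the single shared GLOBAL.
    assert_eq!(GLOBAL.load(Ordering::Relaxed), 4);
    println!("global = {}", GLOBAL.load(Ordering::Relaxed));
}
```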

thread_rng uses a thread-local variable internally, so the returned Rng (ThreadRng) will use that thread-local variable internally to not require synchronization overhead. It's all hidden though, and as a user of the rand API, you don't need to know about it.

1 Like

You might still be fundamentally confused about something (but I'm not sure what). As I (and the very page you linked to) explained, random() and thread_rng().gen() are exactly the same.

Also, you can call both from however many threads you want to (including only 1). Whether you choose one or the other does not depend on whether you are using multiple threads.

Thanks, @H2CO3. I did get what you said about random() and thread_rng().gen() being exactly the same. So, I figured that I'd just use random() to simplify things. At least for starters. You're also right that I'm still "fundamentally confused". No doubt about it. I get the feeling that things won't really start clearing up until I start using the language to start building code. It's like learning math. We math teachers love to focus on teaching the concepts that underly the skills we are teaching, but far too often we get it in the wrong order. Sometimes, even usually, it is best to teach the mechanical skills and then set our students to using those skills. After they've been at it for a while, the concepts either just pop into their heads independently of us teachers or simply become much, much easier to teach. So, while I'm going to keep working through my learning resources, I'm also going to jump in, with both feet, to my project and see how it all comes together.

Thanks, that helps. :>)

But it's worth noting that you should use thread_rng (once) when you need a batch of random values (e.g. 100000). If you call random() 100000 times, it means you have 99999 extra (unnecessary) calls of thread_rng() internally, right?

Yes or no. It all depends on the optimizations. Since the implementation is trivial and short, I can easily imagine the compiler inlining random() and then removing the extraneous calls.

Also, thread_rng() simply clones an Rc, which is literally a single, non-atomic increment — likely to be insignificant compared to all the math that goes into generating a random number in the first place. A quick benchmark confirms this.

It also accesses a lazily initialized thread-local, which has its own overhead depending on the implementation.

2 Likes

Compare with the other benchmark, which shows a greater (relative) difference.

That benchmark is weird, at the very least. It puts the black_box around the whole expression, which means that it's basically completely useless.

Why? It simulates consuming/using the whole collected Vec, I think?

But that's only the value of the whole expression, after all of the evaluation has already happened. The actual calls to the functions under scrutiny aren't protected in any way.

I thought the outer black_box ensures that the functions in the map closure get evaluated. I don't see a way for the compiler to optimize this by removing the function calls (because the compiler doesn't know how/if the results are used). That's why I don't see a problem with that benchmark.