How does thread_rng() in the rand crate help me?

I'm getting started on a project that will require me to use a lot of random numbers, so I pulled up The Rust Rand Book and have started going through it. The examples mention using thread_rng() as a way to be more efficient than random(). Of course, I've tried to read up on these functions, but am confused by the use of the word "thread" and what it is referring to. I keep seeing terms like thread_local and references to a type named ThreadRng, similar in name to the function above. Could someone explain what is being referred to by these terms? Thanks. Since I'm still basically a Rust novice, should I even bother with these functions/concepts?

1 Like

"Thread" in the name here means that the returned PRNG is thread-local. That's usually what you want, because this means that it can avoid locking (which would be required by a truly global PRNG, as multi-threading is possible).

I don't think that random() is less efficient than thread_rng(), because it simply forwards to thread_rng().gen(). Such trivial calls are almost always inlined and optimized out by modern compilers, so you usually don't need to bother about them.

1 Like

A random number generator uses an internal state to generate the next number.
To be able to generate numbers from anywhere in your code, this state needs to be accessible from anywhere.
In Rust, to share a value across threads, the value must be Sync, and mechanisms like Mutex are required to ensure the value is accessed by only one thread at a time.
thread_rng(), using thread_local!, creates a new ThreadRng for each thread.
Since each thread has its own ThreadRng, there is no need for any synchronization mechanism.
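
The per-thread idea can be sketched with the standard library alone. This is a toy model, not how rand actually implements ThreadRng: a simple xorshift64 generator (all names here are made up for illustration) lives in a thread_local!, so every thread lazily gets its own copy and no locking is ever needed:

```rust
use std::cell::RefCell;
use std::thread;

// Toy PRNG with internal state; next() needs exclusive access (&mut self).
struct ToyRng { state: u64 }
impl ToyRng {
    fn new(seed: u64) -> Self { ToyRng { state: seed } }
    fn next(&mut self) -> u64 {
        // xorshift64: enough to illustrate stateful generation
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

thread_local! {
    // Each thread gets its own lazily created instance: no Mutex needed.
    static RNG: RefCell<ToyRng> = RefCell::new(ToyRng::new(0x9E3779B97F4A7C15));
}

fn toy_random() -> u64 {
    RNG.with(|rng| rng.borrow_mut().next())
}

fn main() {
    let a = toy_random();
    let b = thread::spawn(|| toy_random()).join().unwrap();
    // Both threads start from the same seed, so their first draws agree,
    // yet neither one blocked on a lock.
    assert_eq!(a, b);
    println!("{a} {b}");
}
```

(The real ThreadRng is seeded from the operating system per thread, so different threads produce different sequences; the identical seeds here are just to make the per-thread independence visible.)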


Storing the ThreadRng from thread_rng() is more efficient than random() in case you need to generate multiple values.

Here's a playground generating 100 000 values using both methods:

with thread_rng: 477.564µs
with random: 655.909µs

(To be accurate we would have to run a real benchmark, but this demonstrates that thread_rng is faster when you want a lot of values)

6 Likes

Using threads means that your program executes several functions in parallel. For example:

use std::thread;
use std::time::Duration;

fn main() {
    let mut x = 0;
    let mut y = 0;
    thread::scope(|s| {
        println!("Starting thread 1");
        s.spawn(|| {
            println!("Thread 1 is running");
            for _ in 0..100 {
                x += 1;
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 1 is done");
        });
        println!("Starting thread 2");
        s.spawn(|| {
            println!("Thread 2 is running");
            for _ in 0..100 {
                // x += 1; // we can't mutably access `x` from two threads!
                y += 1;
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 2 is done");
        });
    });
    println!("{x}, {y}");
}

(Playground)

Output:

Starting thread 1
Starting thread 2
Thread 2 is running
Thread 1 is running
Thread 2 is done
Thread 1 is done
100, 100

The two loops counting to 100 are executed at the same time, either on multiple processor cores or by time-slicing / preemption.

Only one thread may mutably access the same variable at the same time, unless you take care of proper synchronization, e.g. like this:

use std::thread;
use std::time::Duration;
use std::sync::{Arc, Mutex};

fn main() {
    let x = Arc::new(Mutex::new(0));
    thread::scope(|s| {
        println!("Starting thread 1");
        let x1 = x.clone();
        s.spawn(move || {
            println!("Thread 1 is running");
            for _ in 0..100 {
                let mut x_guard = x1.lock().unwrap();
                *x_guard += 1;
                drop(x_guard);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 1 is done");
        });
        println!("Starting thread 2");
        let x2 = x.clone();
        s.spawn(move || {
            println!("Thread 2 is running");
            for _ in 0..100 {
                let mut x_guard = x2.lock().unwrap();
                *x_guard += 1;
                drop(x_guard);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Thread 2 is done");
        });
    });
    let x = Arc::try_unwrap(x).unwrap().into_inner().unwrap();
    println!("{x}");
}

(Playground)
(edit: the Arc isn't necessary, as the threads are scoped, see also simplified Playground)

Output:

Starting thread 1
Starting thread 2
Thread 2 is running
Thread 1 is running
Thread 1 is done
Thread 2 is done
200

In that last example, each thread locks a mutex to ensure that not both threads access the variable at the same time. This creates overhead.


To minimize overhead, rand::Rng::gen requires exclusive access (it works on &mut self), which means that it does not perform synchronization. That means that if you have two threads, you either need to synchronize accesses to a common Rng, or you need two Rngs, one in each thread. Synchronization generally involves overhead, so you try to avoid it where possible or reasonable.
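
Both options can be sketched with only the standard library (a toy LCG stands in for a real Rng here; the names are made up for illustration):

```rust
use std::sync::Mutex;
use std::thread;

// Toy stateful generator: next() needs &mut self, just like rand::Rng::gen.
struct ToyRng(u64);
impl ToyRng {
    fn next(&mut self) -> u64 {
        // linear congruential step, purely illustrative
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1);
        self.0
    }
}

fn main() {
    // Option 1: one shared generator, synchronized with a Mutex (has overhead).
    let shared = Mutex::new(ToyRng(42));
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                for _ in 0..1000 {
                    shared.lock().unwrap().next(); // locks on every single draw
                }
            });
        }
    });

    // Option 2: one generator per thread, no synchronization at all.
    thread::scope(|s| {
        for seed in [1u64, 2] {
            s.spawn(move || {
                let mut rng = ToyRng(seed); // owned by this thread only
                for _ in 0..1000 {
                    rng.next();
                }
            });
        }
    });
    println!("done");
}
```

thread_rng is essentially a convenient, pre-packaged version of option 2.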

The rand::thread_rng function helps you to obtain an Rng which you can use in one thread (or your whole program if you don't use threading). The returned ThreadRng is tied to the thread, i.e. you cannot pass it to a different thread.


In practice, you don't need to worry much about that. The general rule is: If you need several random values, obtain a ThreadRng first by calling rand::thread_rng. Then use that ThreadRng to create multiple values, e.g. by using the Rng::gen method repeatedly (on the same ThreadRng). You can re-obtain a ThreadRng any time (and you will have to do that if you use threading because you cannot pass the ThreadRng to a different thread), but you can try to keep the number of calls to thread_rng low. If you call thread_rng more often, it's not suuuuch a big issue though. The overhead isn't that big.

4 Likes

Thanks for the answers. That helped a lot. I kind of figured we were talking about using multiple threads. That is way beyond where I'm at in my programming skills and really won't be needed for my project where random number generation will be occasional, not constant. There won't be a need for using parallel processing at all. So, I think I'll stick with random() for now. :>)

Even if you don't use threads, it can still make sense to call thread_rng to get the Rng, and then use that Rng to generate your numbers. It will be slightly faster, as @tguichaoua showed.


It's as easy as this:

use rand::{Rng, thread_rng}; // note we need the `Rng` trait

fn main() {
    let mut rng = thread_rng();
    let a: f64 = rng.gen();
    let b: f64 = rng.gen();
    println!("{a}, {b}");
}

(Playground)

One thing that didn't get answered for me is the meaning of the term thread_local. Can someone clear that up for me? Thanks.

@jbe You put a lot of effort into your answer. Thanks! :grinning:

A thread local variable is similar to a global variable, i.e. you can access it from any function in your program without having to pass a reference as an argument to the function.

The difference between a global variable and a thread-local variable is that the global variable is shared by every part of your program, whichever thread it runs on.

A thread-local variable, instead, has its own value for each thread. That way, you cannot get collisions between threads when using the variable.
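
The contrast can be made concrete with std only (the names here are hypothetical, chosen just for the example):

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// A true global: one value shared by every thread (so it needs atomics or a Mutex).
static GLOBAL: AtomicU32 = AtomicU32::new(0);

thread_local! {
    // A thread-local: every thread sees its own independent copy.
    static LOCAL: Cell<u32> = Cell::new(0);
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                GLOBAL.fetch_add(1, Ordering::Relaxed);
                LOCAL.with(|l| l.set(l.get() + 1));
                LOCAL.with(|l| l.get()) // this thread's own counter
            })
        })
        .collect();
    for h in handles {
        // Every thread incremented its own LOCAL exactly once.
        assert_eq!(h.join().unwrap(), 1);
    }
    // All four increments landed on the single shared GLOBAL.
    assert_eq!(GLOBAL.load(Ordering::Relaxed), 4);
    println!("global = {}", GLOBAL.load(Ordering::Relaxed));
}
```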

thread_rng uses a thread-local variable internally, so the returned Rng (ThreadRng) will use that thread-local variable internally to not require synchronization overhead. It's all hidden though, and as a user of the rand API, you don't need to know about it.

1 Like

You might still be fundamentally confused about something (but I'm not sure what). As I (and the very page you linked to) explained, random() and thread_rng().gen() are exactly the same.

Also, you can call both from however many threads you want to (including only 1). Whether you choose one or the other does not depend on whether you are using multiple threads.

Thanks, @H2CO3. I did get what you said about random() and thread_rng().gen() being exactly the same. So, I figured that I'd just use random() to simplify things. At least for starters. You're also right that I'm still "fundamentally confused". No doubt about it. I get the feeling that things won't really start clearing up until I start using the language to start building code. It's like learning math. We math teachers love to focus on teaching the concepts that underly the skills we are teaching, but far too often we get it in the wrong order. Sometimes, even usually, it is best to teach the mechanical skills and then set our students to using those skills. After they've been at it for a while, the concepts either just pop into their heads independently of us teachers or simply become much, much easier to teach. So, while I'm going to keep working through my learning resources, I'm also going to jump in, with both feet, to my project and see how it all comes together.

Thanks, that helps. :>)

But it's worth noting that you should use thread_rng (once) when you need a batch of random values (e.g. 100000). If you call random() 100000 times, it means you have 99999 extra (unnecessary) calls of thread_rng() internally, right?

Yes or no. It all depends on the optimizations. Since the implementation is trivial and short, I can easily imagine the compiler inlining random() and then removing the extraneous calls.

Also, thread_rng() simply clones an Rc, which is literally a single, non-atomic increment — likely to be insignificant compared to all the math that goes into generating a random number in the first place. A quick benchmark confirms this.

It also accesses a lazily initialized thread-local, which has its own overhead depending on the implementation.

2 Likes

Compare with the other benchmark, which shows a greater (relative) difference.

That benchmark is weird, at the very least. It puts the black_box around the whole expression, which means that it's basically completely useless.

Why? It simulates consuming/using the whole collected Vec, I think?

But that's only the value of the whole expression, after all of the evaluation has already happened. The actual calls to the functions under scrutiny aren't protected in any way.

I thought the outer black_box ensures that the functions in the map closure get evaluated. I don't see a way for the compiler to optimize this by removing the function calls (because the compiler doesn't know how/if the results are used). That's why I don't see a problem with that benchmark.