Safe (and efficient) thread-local RNG

Hello everyone,

I am trying to implement a safe and efficient thread-local RNG. rand::thread_rng() doesn't work for me, because I need a deterministic, seedable RNG so that the results are reproducible. I am currently storing my RNG in a struct and passing it as &mut through function calls. My goal is to store the RNG in a thread-local static variable, so the function calls would be cleaner (no RNG references to pass around).
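For context, this is roughly what the current approach looks like (the function and the seed are made up for illustration):

use rand::{Rng, SeedableRng};
use rand_pcg::Pcg64;

// Hypothetical example of the current style: the RNG is created once
// from a fixed seed and threaded through every call as &mut.
fn roll_dice(rng: &mut Pcg64) -> u8 {
    rng.gen_range(1..=6)
}

fn main() {
    let mut rng = Pcg64::seed_from_u64(42); // fixed seed => reproducible
    let roll = roll_dice(&mut rng);
    println!("{roll}");
}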

I've looked at the implementation of rand::rngs::ThreadRng as well as issue #968, and the best solution so far seems to be something like:

use std::cell::RefCell;
use std::rc::Rc;
use rand::RngCore;
use rand_pcg::Pcg64 as MyRng;

thread_local!(
    static THREAD_RNG_KEY: Rc<RefCell<MyRng>> = { ... }
);

pub struct ThreadRng {
    rng: Rc<RefCell<MyRng>>,
}

pub fn thread_rng() -> ThreadRng {
    ThreadRng { rng: THREAD_RNG_KEY.with(|rng| rng.clone()) }
}

impl RngCore for ThreadRng {
    fn next_u32(&mut self) -> u32 {
        self.rng.borrow_mut().next_u32()
    }
    // (the remaining RngCore methods delegate the same way)
}
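The { ... } initializer is elided above; purely as an illustration (a fixed seed, so every thread starts from the same stream), it could be filled in like this:

use std::cell::RefCell;
use std::rc::Rc;
use rand::SeedableRng;
use rand_pcg::Pcg64 as MyRng;

thread_local!(
    // Illustrative only: a fixed seed on every thread.
    // A real program would more likely derive a per-thread seed.
    static THREAD_RNG_KEY: Rc<RefCell<MyRng>> =
        Rc::new(RefCell::new(MyRng::seed_from_u64(0)))
);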

For this, the problem mentioned in #968 is:

ThreadRng destructors must be run or memory is leaked.

Unless a thread panics, this shouldn't cause a problem, should it? (In every other case the destructors are guaranteed to run, I think.) (In my case, if a thread panics the whole program is terminated anyway, so that's not really a problem for me.)

That way I can safely store a ThreadRng in local variables and even in structs. (And maybe even use it in the destructors of other thread-local variables, although that is unlikely.) (Calling THREAD_RNG_KEY.with() for every number would be really painful.)

Are there any other safety issues related to this solution?

Safety is important, as neither I nor the later developers of the code have (or will have) advanced programming skills.

Final solution:

Use UnsafeCell instead of RefCell, so that unsafe code is only used in the implementation of RngCore. Something like:

use std::cell::UnsafeCell;
use std::rc::Rc;
use rand::RngCore;
use rand_pcg::Pcg64 as MyRng;

thread_local!(
    static THREAD_RNG_KEY: Rc<UnsafeCell<MyRng>> = { ... }
);

pub struct ThreadRng {
    rng: Rc<UnsafeCell<MyRng>>,
}

pub fn thread_rng() -> ThreadRng {
    ThreadRng { rng: THREAD_RNG_KEY.with(|rng| rng.clone()) }
}

impl RngCore for ThreadRng {
    fn next_u32(&mut self) -> u32 {
        unsafe { (*self.rng.get()).next_u32() }
    }
    fn next_u64(&mut self) -> u64 {
        unsafe { (*self.rng.get()).next_u64() }
    }
    fn fill_bytes(&mut self, slice: &mut [u8]) {
        unsafe { (*self.rng.get()).fill_bytes(slice) }
    }
    fn try_fill_bytes(&mut self, slice: &mut [u8]) -> Result<(), rand::Error> {
        unsafe { (*self.rng.get()).try_fill_bytes(slice) }
    }
}

(RefCell had about 50% overhead with Pcg64, according to my measurements.)
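With that in place, call sites no longer have to thread an RNG parameter through. A hypothetical call site (gen_range comes from the rand::Rng trait, which is blanket-implemented for every RngCore):

use rand::Rng;

// Hypothetical call site, using the thread_rng() defined above:
// no &mut rng parameter needs to be passed in.
fn roll_dice() -> u8 {
    thread_rng().gen_range(1..=6)
}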

What you are doing is perfectly fine. By using a thread-local like that, the resources of the random generator will stay around until the thread is shut down, but that shouldn't be a problem. Defining a struct like you have done here is also perfectly fine.

The only thing I'm not sure about is how well calling the RNG in the destructors of other thread-local statics will work. I would avoid that, but at worst it can cause your program to panic.

To be clear, no safety issues are possible since you are not using unsafe code.


@alice, thanks for your fast answer. Yeah, I guessed it would be fine, I just wasn't sure. (I haven't really worked with TLS so far.)

Is it still OK if I use UnsafeCell instead of RefCell, so that unsafe code is only used in the implementation of RngCore? Something like:

use std::cell::UnsafeCell;
use std::rc::Rc;
use rand::RngCore;
use rand_pcg::Pcg64 as MyRng;

thread_local!(
    static THREAD_RNG_KEY: Rc<UnsafeCell<MyRng>> = { ... }
);

pub struct ThreadRng {
    rng: Rc<UnsafeCell<MyRng>>,
}

pub fn thread_rng() -> ThreadRng {
    ThreadRng { rng: THREAD_RNG_KEY.with(|rng| rng.clone()) }
}

impl RngCore for ThreadRng {
    fn next_u32(&mut self) -> u32 {
        unsafe { (*self.rng.get()).next_u32() }
    }
    fn next_u64(&mut self) -> u64 {
        unsafe { (*self.rng.get()).next_u64() }
    }
    fn fill_bytes(&mut self, slice: &mut [u8]) {
        unsafe { (*self.rng.get()).fill_bytes(slice) }
    }
    fn try_fill_bytes(&mut self, slice: &mut [u8]) -> Result<(), rand::Error> {
        unsafe { (*self.rng.get()).try_fill_bytes(slice) }
    }
}

It seems that, used this way, thread_rng is as efficient as the native Pcg64 (after initialization with thread_rng()), while with RefCell there is about 50% overhead.
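A rough sketch of the kind of comparison loop involved (not the exact benchmark; the checksum is only there so the loop isn't optimized away):

use std::time::Instant;
use rand::{RngCore, SeedableRng};

fn time_rng(label: &str, rng: &mut impl RngCore) {
    let start = Instant::now();
    let mut acc = 0u64;
    // Sum the outputs so the compiler cannot remove the loop.
    for _ in 0..100_000_000u64 {
        acc = acc.wrapping_add(rng.next_u64());
    }
    println!("{label}: {:?} (checksum {acc})", start.elapsed());
}

fn main() {
    let mut direct = rand_pcg::Pcg64::seed_from_u64(0);
    time_rng("Pcg64 direct", &mut direct);
    // The ThreadRng / RefCell variants would be timed the same way.
}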


Yes, this use of UnsafeCell is correct. In a single-threaded context, it is typically correct to use UnsafeCell if the equivalent usage of RefCell would not panic.
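For example, the only way the RefCell version could panic here is a re-entrant borrow, i.e. something like this (contrived):

use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(0u32);
    let _a = cell.borrow_mut();
    // This second mutable borrow panics at runtime ("already borrowed").
    // The UnsafeCell version is sound only as long as this situation can
    // never arise, e.g. next_u32() never re-entering thread_rng() while
    // the RNG is already being used on the same thread.
    let _b = cell.borrow_mut(); // panics
}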

Note that your methods could take &self.


You mean use &self in RngCore? Like:

impl RngCore for ThreadRng {
    fn next_u32(&self) -> u32 {...}
}

That's not really possible, as the methods in RngCore have to take &mut self (because the RngCore trait is defined that way).
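For reference, the trait looks roughly like this in rand_core 0.6 (every method takes &mut self):

// Simplified from rand_core 0.6; Error here is rand::Error.
pub trait RngCore {
    fn next_u32(&mut self) -> u32;
    fn next_u64(&mut self) -> u64;
    fn fill_bytes(&mut self, dest: &mut [u8]);
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), Error>;
}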


Ah, I hadn't noticed that wasn't your own trait.

I would just pass the RNG around as &mut impl Rng. That way it is very clear in what order it is called, which is something you do need to look out for if the results should be repeatable.
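For example (hypothetical function):

use rand::Rng;

// Passing the generator explicitly makes the order of RNG calls visible
// at every call site.
fn roll_dice(rng: &mut impl Rng) -> u8 {
    rng.gen_range(1..=6)
}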


It's practically impossible to follow the RNG calls one by one anyway (e.g. there are places where whether a call happens depends on the previous number), so that's not important in my case. It's only important that I get the same results from the same code (and RNG seed).
Plus, if it is important to somebody to follow all the calls (for some strange reason), it might be worth implementing an RNG call counter, in my opinion.
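Such a counter could be a thin wrapper that delegates to the real generator, something like this sketch:

use rand::RngCore;

// Hypothetical wrapper that counts how many times the RNG has been asked
// for a value, so two runs can be compared call-by-call if ever needed.
pub struct CountingRng<R: RngCore> {
    inner: R,
    pub calls: u64,
}

impl<R: RngCore> RngCore for CountingRng<R> {
    fn next_u32(&mut self) -> u32 {
        self.calls += 1;
        self.inner.next_u32()
    }
    fn next_u64(&mut self) -> u64 {
        self.calls += 1;
        self.inner.next_u64()
    }
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        self.calls += 1;
        self.inner.fill_bytes(dest)
    }
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), rand::Error> {
        self.calls += 1;
        self.inner.try_fill_bytes(dest)
    }
}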

Why use an Rc and unsafe code at all?

thread_local!(
    static THREAD_RNG: MyRng = { ... }
);

pub struct ThreadRng;

pub fn thread_rng() -> ThreadRng {
    ThreadRng
}

impl RngCore for ThreadRng {
    fn next_u32(&mut self) -> u32 {
        THREAD_RNG.with(|rng| rng.next_u32())
    ...
}

Simpler code, no unsafe, no heap storage, no reference counting.

1.) At least a RefCell is needed, as the RNG must be mutated. This code won't compile as written, because with() only hands out a shared (immutable) reference to the thread-local value (see the sketch after this list).

2.) With RefCell it's about 50% slower compared to UnsafeCell with Pcg64.

3.) Using with() for every random number is about 75% slower (with Pcg64) compared to reference counting, and heap storage is not really a problem for me.
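A compiling variant of that idea (RefCell inside the thread-local, one with() call per number) would look roughly like this; this is the kind of setup the points above are talking about:

use std::cell::RefCell;
use rand::{RngCore, SeedableRng};
use rand_pcg::Pcg64 as MyRng;

thread_local!(
    // Illustrative fixed seed only.
    static THREAD_RNG: RefCell<MyRng> = RefCell::new(MyRng::seed_from_u64(0))
);

pub struct ThreadRng;

impl RngCore for ThreadRng {
    fn next_u32(&mut self) -> u32 {
        // One with() call plus one RefCell borrow per generated number.
        THREAD_RNG.with(|rng| rng.borrow_mut().next_u32())
    }
    fn next_u64(&mut self) -> u64 {
        THREAD_RNG.with(|rng| rng.borrow_mut().next_u64())
    }
    fn fill_bytes(&mut self, dest: &mut [u8]) {
        THREAD_RNG.with(|rng| rng.borrow_mut().fill_bytes(dest))
    }
    fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), rand::Error> {
        THREAD_RNG.with(|rng| rng.borrow_mut().try_fill_bytes(dest))
    }
}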

That's really discouraging. Do you have any idea why with is so slow? I thought the point of thread local storage was to be fast.

It's because with has to perform an extra check in case this is the first use and the variable has not yet been initialized.


There is a #[thread_local] attribute, which I think is more efficient than the thread_local! macro, as it "translates directly to the thread_local attribute in LLVM", but it is still unstable. (And I think it will stay unstable for a long time.)
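On nightly, using it would look roughly like this (unstable feature, so the details may change; the Cell counter is just an illustrative payload):

// Nightly-only sketch; requires the unstable `thread_local` feature.
#![feature(thread_local)]

use std::cell::Cell;

#[thread_local]
static COUNTER: Cell<u64> = Cell::new(0);

fn bump() -> u64 {
    // Each thread sees and mutates its own copy of COUNTER.
    COUNTER.set(COUNTER.get() + 1);
    COUNTER.get()
}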