Fastest way to bulk generate random numbers within a range?

I'm trying to generate a list of random ints/floats, within a certain range, as fast as possible, e.g. `generate_x_ints(amount: u32, range: [u32; 2]) -> Vec<u32>` (it doesn't necessarily have to be a `Vec`). It generates a list of length `amount`, filled with numbers inclusively between the min/max of the given range.

My requirements are that it should be uniformly random, and very fast. It doesn't need to be 'secure'. I don't want to involve C or assembly, and it needs to run on Linux/Windows/Mac. I'm totally okay with using unsafe for this, given it's a very isolated/disconnected part of my application, and it will mostly just be used from Node.js. I'm using it for a game, and in particular I care about these quantities: 10, 1,000, 1,000,000, 10,000,000 and 100,000,000. I'm happy to use more RAM for speed improvements.

It's very easy to iterate from 0..amount and push a random number to the list, but I'm looking for better-optimised approaches for my use case. Because I know exactly how many numbers I want and exactly what range I want, and I'm not generating them sporadically on demand, I'm sure it can be optimised for that.
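For reference, the naive version I'm starting from looks roughly like this (a sketch using rand 0.8's gen_range):

use rand::Rng;

// Naive baseline: one gen_range call per element.
fn generate_x_ints(amount: u32, range: [u32; 2]) -> Vec<u32> {
    let mut rng = rand::thread_rng();
    (0..amount)
        .map(|_| rng.gen_range(range[0]..=range[1]))
        .collect()
}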

What I want help with, generally, is suggestions for how to best optimise for my use case for maximum speed. Possibilities I've explored are: SIMD; "fill"; splitting one random number into multiple random numbers (e.g. if I need 100 numbers between 1 and 5, generating a whole u32 per number is maybe wasteful); and finally threading (which I'd rather apply, if necessary, after everything else has been optimised).

A more specific question I have is: how can I use the fill method in rand (or an alternative) to generate a bunch of random numbers at once, but within a given range, without screwing up the uniformity?

Here is my benchmarking code: GitHub - gc/rust-rng-benchmarking

The fastest computation is always no computation, i.e. prepare data in advance and read from a lookup table at runtime.

So one possibility is to generate (say) 200 million random u32s on startup and pick a slice of the desired length starting from a random index every time your generate_x_ints function is called.

Pros:

  • Only need to generate one random number (the start index) per function call.
  • Different random slices can share the same underlying data, which could save a substantial amount of memory if you have several large arrays.
  • Uniformity of the underlying distribution should be preserved, assuming the initial data is also uniformly distributed.

Cons:

  • Permanently takes up ~800MB of RAM.
  • Not secure, since you're potentially reusing the same data.
  • Need to think about how the range of generated values can be maintained. If the range size is a power of 2 (e.g. 0 to 15 inclusive), you could lop off the most significant or least significant bits of each number; otherwise it's a bit trickier, but certainly possible (see the sketch after this list).
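A minimal sketch of how that could look, assuming a power-of-two range size so that masking the low bits preserves uniformity (all names here are illustrative):

use rand::Rng;

// Pool of uniform random u32s, generated once at startup.
struct RandomPool {
    data: Vec<u32>,
}

impl RandomPool {
    fn new(size: usize) -> Self {
        let mut rng = rand::thread_rng();
        RandomPool {
            data: (0..size).map(|_| rng.gen()).collect(),
        }
    }

    // Return `amount` values in min..=min+mask, where mask = 2^k - 1.
    // The low k bits of a uniform u32 are themselves uniform.
    // Assumes amount <= self.data.len().
    fn slice(&self, amount: usize, min: u32, mask: u32) -> Vec<u32> {
        let start = rand::thread_rng().gen_range(0..=self.data.len() - amount);
        self.data[start..start + amount]
            .iter()
            .map(|&x| min + (x & mask))
            .collect()
    }
}

For example, pool.slice(1_000_000, 0, 15) yields a million values in 0..=15 for the cost of one gen_range call plus a sequential copy.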

That seems like a good approach, particularly if the range maps cleanly onto a whole number of bits (1-5 does not), e.g.:
1-2: 1 bit
1-4: 2 bits
1-8: 3 bits

If it's a 64-bit architecture, generate u64s. Be picky about the PRNG, since they vary widely in terms of speed and randomness; mt19937_64 might be a good choice.
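To make the bit-slicing concrete (my own sketch, only valid for power-of-two range sizes): each 4-bit group of a uniform u64 is itself uniform, so one PRNG call yields sixteen values in 0..=15:

use rand::Rng;

// Chop one uniform u64 into sixteen 4-bit values, each uniform on 0..=15.
// One PRNG call per 16 outputs.
fn sixteen_nibbles(rng: &mut impl Rng) -> [u32; 16] {
    let word: u64 = rng.gen();
    let mut out = [0u32; 16];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = ((word >> (i * 4)) & 0xF) as u32;
    }
    out
}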

Not necessarily. Accessing external memory on modern processors can consume hundreds of times more cycles than working with data in cache or registers. Accessing a huge array at random indices is going to ensure close to the worst possible cache hit rate: slow memory accesses essentially all the time. I would not be surprised if calculating the next number with a simple PRNG turned out to be quicker than going out to RAM to fetch one.

As always, whatever one does needs careful benchmarking.


That's true. I was assuming that the generated numbers would be stored in memory anyway for later use, but if that isn't the case, then a generator whose state can be stored very compactly would be a good choice. I've previously used quad-rand in my projects and found it quite good, but haven't benchmarked its performance (speed wasn't a big concern for me).
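For a sense of what "compact state" buys you (a generic illustration, not quad-rand's actual implementation): a xorshift64 generator keeps its whole state in one u64, so stepping it never touches memory:

// Marsaglia's xorshift64: the entire generator state is a single
// non-zero u64, so the next number is computed entirely in registers.
struct XorShift64 {
    state: u64, // must be seeded non-zero
}

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}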

Honestly, the fastest method might actually be something like:

use rand; // 0.8.5

use rand::distributions::{Distribution, Uniform};

fn gen_x_ints(
    count: u32,
    distribution: impl Distribution<u32>,
    rng: &mut (impl rand::Rng + ?Sized),
) -> Vec<u32> {
    // Preallocate, then pull exactly `count` samples from the
    // distribution's iterator.
    let mut results = Vec::with_capacity(count as usize);
    results.extend(distribution.sample_iter(rng).take(count as usize));
    results
}
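Used like this, for example (Uniform::new_inclusive is rand 0.8's constructor for inclusive ranges):

let mut rng = rand::thread_rng();
let dist = Uniform::new_inclusive(1u32, 5);
let numbers = gen_x_ints(1_000_000, dist, &mut rng);

As I understand it, the win over calling gen_range in a loop is that the Uniform distribution precomputes its rejection parameters once, rather than per sample.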

For what in a game?

If you just need "fast and not obviously correlated", for something like particle effects, you can generate quasi-random multi-dimensional sequences trivially quickly using The Unreasonable Effectiveness of Quasirandom Sequences - Extreme Learning.
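A minimal 1-D sketch of the additive-recurrence sequence from that article (the golden-ratio constant is from the article; the mapping onto a u32 range is my own assumption):

// R1 low-discrepancy sequence: x_n = frac(seed + n / phi).
// Evenly spread and extremely cheap, but NOT random: successive
// values are strongly structured, which is often fine for effects.
fn quasirandom_ints(seed: f64, count: usize, min: u32, max: u32) -> Vec<u32> {
    const INV_PHI: f64 = 0.618_033_988_749_894_9; // 1/phi
    let span = (max - min + 1) as f64;
    (1..=count)
        .map(|n| {
            let x = (seed + n as f64 * INV_PHI).fract();
            min + (x * span) as u32
        })
        .collect()
}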

(Generally I find that the more numbers you need, the less they need to be high-quality random, and the fewer you need, the more you might want deck-style randomness anyway, because the law of large numbers won't apply.)
