[Solved] Efficient rust random shuffle of?

  1. I have a Vec. I want to randomly shuffle it in place. Google suggests: iterator - How do I create a Vec from a range and shuffle it? - Stack Overflow
extern crate rand;

use rand::{thread_rng, Rng};

fn main() {
    let mut vec: Vec<u32> = (0..10).collect();
    let slice: &mut [u32] = &mut vec;

    thread_rng().shuffle(slice);
}
  1. My question is: I'm doing this on a Vec with 300_000_000 entries. Is the above the best way to do it, or is there a more efficient way?

It doesn't look too good, I can't even allocate 300 MB of data on the playground to test the timing... :disappointed_relieved: (Playground)

2 Likes

Rand uses Fisher-Yates shuffle, and that's probably the most efficient way of doing it.

Note that such proper shuffling needs to pick elements from random places in the slice, and that is by definition unpredictable, so it'll cause tons of cache misses.

If you don't need true randomness, then randomizing in chunks closer to cache sizes might give slightly faster results.

Rand also has a few pseudorandom algorithms to choose from, but here you're probably more bottlenecked on memory than CPU.

2 Likes

I see. Unless elems are the size of a cache line, this pretty much guarantees it'll be slow.

Making things cache-line-sized won't make it any faster.

If your input is a sequence of integers that you can describe as a function f(x), then you could randomize the function without moving anything in memory, e.g. f(x ^ y).

1 Like

I am not sure if we are measuring "slowdown" in the same way. I am measuring "slowdown" in terms of:

time_for_shuffle / time_for_memcopy

So I agree with you that if I am shuffling a bunch of u32, then I am wasting a lot of memory bandwidth as we read a large chunk of memory, but then only extract a single u32 from it.

However, by the above measurement, it seems that as sizeof(Elem) goes to infinity, the slowdown factor should appraoch 1.

If they're much larger than u32, then you should shuffle a vec of 0 .. 300_000_000u32 first, then use them as usize to index the larger elements.

(all assuming, as before, that you need a 'perfect' shuffle and some chunked approximation isn't good enough)

If you can impl Hash for the objects nicely enough that collisions are infeasible or rare enough not to matter, you could

  • sort the Vec by the hash value (which you can compute rather than store)
  • just store them in a HashMap rather than a Vec on initial input, and read them out in hash order (if they don't need to be in a Vec for other reasons)

Rust's HashMap uses a strong, salted hash by default, so you'll get a different "shuffle" every time.

This is sort of what I was thinking of.

I think you can use coprime generators of the group of integers modulo n. You generate the numbers statefully based on a random seed, then n iterative applications of the generator until you've filled a vector with the values.

https://math.stackexchange.com/questions/1154286/proof-of-a-generator-for-coprime-integers

At least, this is a very literal interpretation of the example code. Where you're shuffling a range.

1 Like

oh! huge if true!

1 Like

This website will give you primes greater than your table size. Enter the table size in the Number field and hit "search". The first four primes after 3e8 are 300000007, 300000031, 300000047, 300000089.

Any prime larger than your table size can serve as the group generator. Simply iterate and discard all values larger than the table size. Of course the values aren't truly random; they're completely predictable once the generating prime and starting value are known. But they may be adequate for your purpose.

1 Like

Also, depending on how you're consuming this vector, if you return at iterator over that generator instead of the vector, you might never need to allocate it at all.

1 Like

If it isn't apparent, you probably should use rand() to choose a starting value (modulo the table size) for each shuffle. If you want sequences that do not recur from one run to the next once they have generated the same value, then select the generator prime for each run incrementally from the list.