Just how random is rand?

Are statistical randomness test attributes of various rand crate functions published somewhere?

I wrote the following little ditty in rust 1.27 and piped output into Mathematica's AutocorrelationTest[Flattened[data], 256, "TestStatisticTable"] and got disappointing "data is not random" LjungBox and BoxPierce statistics p0.04 across ten thousand or so flattened random_selection_gene(12,16) arrays. My mind exploded a little when I wondered if genetic algorithm seed sequences were excluded by invisible discrimination in the rand function. Happily, nothing obvious after a few generations of recombinations.

extern crate rand;
use rand::prelude::*;

    fn random_selection_gene(codon_length:u32, inclusive_codon_max_value:u32) -> Vec<u32> {
     let mut gene = Vec::new();
        for i in 0..codon_length {
            gene.push( rand::thread_rng().gen_range(0,inclusive_codon_max_value+1) );
        }
        gene
    }

You're using the ThreadRng, the properties of which are clearly documented: rand::rngs::ThreadRng - Rust . It's supposed to give you cryptographically-secure random data.

1 Like

Thanks!
I'm learning rust, and dumped the results into Mathematica to check my work. In no way am I any crypto expert: I don't know if the implementation or protocol needed help. I'm not thrilled with the speed and per LjungBox pseudorandom noise has underlying distinguishing features, but maybe they all do. The shape of my thought is that I just learned that an infinity of random typing rust monkeys will never ever write works of Shakespeare because they don't know what they would do without a job.

If you didn't, compile in release mode.

There are multiple structural things you can do that should improve performance:

  1. Don't repeatedly acquire the random number generator
  2. Don't repeatedly build the uniform range
  3. Don't repeatedly reallocate the Vec by pushing individually; allocate it all at once.

Additionally, try out rustfmt to use the accepted idiomatic Rust style.

extern crate itertools; // 0.7.8
extern crate rand; // 0.5.3

use rand::{distributions::Uniform, prelude::*};

fn random_selection_gene(codon_length: u32, inclusive_codon_max_value: u32) -> Vec<u32> {
    let mut rng = rand::thread_rng();
    let uni = Uniform::new_inclusive(0, inclusive_codon_max_value);

    itertools::repeat_call(|| uni.sample(&mut rng))
        .take(codon_length as usize)
        .collect()
}

fn main() {}

I'm using itertools::repeat_call because iter::repeat_with isn't stable.

2 Likes

You can make optimizations, like those suggested by shepmaster, but ThreadRng and Uniform are safe to use, especially for your uses.

ThreadRng is uses a cryptographically secure PRNG, and because of that certainly doesn't have statistical troubles (otherwise it would not be secure). And range sampling is unbiased.

1 Like

Rng::sample_iter can be an alternative (made for this).

Untested:

extern crate rand; // 0.5.3

use rand::{distributions::Uniform, prelude::*};

fn random_selection_gene(codon_length: u32, inclusive_codon_max_value: u32) -> Vec<u32> {
    let uni = Uniform::new_inclusive(0, inclusive_codon_max_value);

    thread_rng().sample_iter(&uni).take(codon_length as usize).collect()
}

fn main() {}
1 Like

Thanks! I'll give itertools a go.... and I'm back!

Your suggestion with Itertools gives a 2.75~3.5x speedup over a million random_selection_gene operations, very nice!

replacing the memory easy "cargo run" command with the finger shattering "cargo build --release" "cargo rustc --release -- -C target=native" "time ./target/release/gaplay" commands gave an inspiring 22-27x speed increase with itertools.

I'm trying to port a program I wrote to rust to learn rust. Maybe that is a good way to learn rust, but it is not a good way to grade rust. I retract my "not thrilled with speed"" comment- not because I'm now thrilled - rust is meeting expectations and should be easy to multi thread.. maybe.. I hope.. I'm not there yet.
Cheers,

2 Likes

You could also just use cargo run --release, which isn't quite so hard to remember. You can configure some properties of the release build using your Cargo.toml file, but I can't see any way to set codegen options.

1 Like

The first build --release is unnecessary.

These go in .cargo/config, docs are here.

1 Like

Back to "How random is random": By default rng seems to under generate sequences that contain embedded binary 86's. There's kind of a anti-86 texture in the pseudo random noise stream. I pushed 1.2 million random binaries in groups of 40 then stream tallied embedded 86's (01010110) in the 40 digits and counted 1/9th as many in stream 86's compared to the other 255 combos. Repeated twice looking for texture, same result.
I'm going to state that one could, in principle, use one pseudo random number generator to sell lottery tickets and a different generator to pick winners, both seemingly cryptographically qualified, and yet still have an invisible 9x or greater house advantage.

In no way does that mean there is any problem with rust's pseudo random generator: I'm just noting an interesting quirk in pseudo random number generation with no obvious computer science applications outside of one entertainment industry.

Ok, I've spent too much time drilling rng, I still need to learn how to dance in rust.

2 Likes

If this is true then it's a serious bug in rand and will be fixed as soon as possible. Please file an issue with steps to reproduce it. In my own tests I don't see this behavior at all. For example, I ran this program several times and it always produced a number close to 3900, as expected:

extern crate rand;
use rand::RngCore;

fn main() {
    let mut v = vec![0; 1_000_000];
    rand::thread_rng().fill_bytes(&mut v);
    println!("{}", v.iter().filter(|x| **x == 86).count());
}
7 Likes

@dustand please post the code you used.

I think I replicated your algorithm based on your description:

extern crate rand;
use rand::RngCore;

fn main() {
    let mut v = vec![0; 1_200_000/8];
    let mut counts = [0u64; 256];
    rand::thread_rng().fill_bytes(&mut v);
    for chunk in v.chunks(5) {
        let s = format!("{:08b}{:08b}{:08b}{:08b}{:08b}", chunk[0], chunk[1], chunk[2], chunk[3], chunk[4]);
        for window in s.as_bytes().windows(8) {
            let i = u8::from_str_radix(std::str::from_utf8(window).unwrap(), 2).unwrap();
            counts[i as usize] += 1;
        }
    }
    let mut counts = counts.iter().cloned().enumerate().collect::<Vec<_>>();
    counts.sort_by_key(|&(_i, n)| n);
    for (i, n) in counts {
        println!("{:3} Ă—{}", i, n);
    }
}

My outputs from this code don't seem to have any particular bias.

1 Like

I screwed it up- mia culpa. I just piped > my rust output to a text file and parsed the text file with embarrassingly non-rust code.
rust side:

fn main() {
let mut longy: Vec = Vec::new();
for j in 0..1500000 {
for i in 1u32..40 + 1
{
longy.push(i);
let mut x: bool = random();
if x { print!("1") } else { print!("8") }
}
print!(",");}
println("0");

non-rust mathematica notebook to tally:
a=Import[filepath, "csv"];a=a[[;;-2]];
b=ToString[#]&/@ a;;looker=Table[StringReplace[StringJoin[ToString[#]&/@ IntegerDigits[num,2,8]],"0"-> "8"],{num,0,255}];e=Table[b=ToString[#]&/@ a;c=Tally[StringMatchQ[#,StringJoin["",ToString[string],""]] &/@ b];out=c[[1,2]]/(0.+Length[b]),{string,looker}];

The anti-86 texture I saw in my output probably has more to do with a sequence like 11111111 being self similar and 50% more likely to pop up when the 86 (01010110) reading frame shifts one bit, 25% more likely after two bit shifts...

Somehow, yesterday that wasn't apparent to me, being that I was rather distracted by the shiny new idea (to me) of invisible pseudo random sequence bias.

5 Likes

I think it's okay :sparkling_heart: it's better to report a false positive than let a true positive simmer there :slight_smile:

3 Likes

I'm positive there is a market for invisible pseudo random sequence bias. Like the market for people taking ungraceful falls, that almost assures it does exist.