[Solved] Index Out of Bounds


#1

Hi,
I implemented the stochastic universal sampling algorithm. One variant works with f32 and one with f64 and for some reason, I can provoke an index out of bounds panicks with the f32 variant, which is almost identical with the f64 variant.

pub fn random_choice_f32<'a, T>(samples: &'a [T], weights: &[f32], n: usize) -> Vec<&'a T> {
        if weights.len() == 0 || n == 0 {
            return Vec::new();
        }

        let sum: f32 = weights.iter().fold(0.0, |acc, &i| acc + i);
        let spoke_gap: f32 = sum / n as f32;

        // next_f32() ∈ [0.0, 1.0)
        let spin = thread_rng().next_f32() * spoke_gap;

        let mut i: usize = 0;
        let mut accumulated_weights = weights[0];
        let mut choices: Vec<&T> = Vec::with_capacity(n);
        let mut current_spoke: f32 = spin;

        println!("sum: {}, gap: {}", sum, spoke_gap);

        for j in 0..n {
            while accumulated_weights < current_spoke {
                println!("accumulated_weights: {}, current_spoke: {}, spoke number: {}, i: {}", accumulated_weights, current_spoke, j, i);
                i += 1;
                accumulated_weights += weights[i];
            }
            choices.push(&samples[i]);
            current_spoke += spoke_gap;
        }

        choices
    }

The f64 variant is absolute identical, except it uses f64.
Following test provokes an index out of bounds panick:

#[test]
    fn test_random_choice_f32() {
        let capacity: usize = 500;
        let mut samples: Vec<usize> = Vec::with_capacity(capacity);
        let mut weights: Vec<f32> = Vec::with_capacity(capacity);

        for i in 0..capacity {
            samples.push(i);
            weights.push(i as f32);
        }

        let number_choices = 10000;
        let choices = RandomChoice::random_choice_f32(&samples, &weights, number_choices);

        assert!(choices.len() == number_choices);

        let mut weight_counter = BTreeMap::new();

        for choice in choices {
            let counter = weight_counter.entry(choice).or_insert(0);
            *counter += 1;
        }

        let mut last_value: usize = 0;

        for (_, value) in &weight_counter {
            assert!((last_value as i32 - (*value) as i32).abs() <= 2);
            last_value = *value;
        }
    }

The f64 test never panicked and the pseudo code you find in wiki, looks almost identical.
Anyone any idea? You can find the code on github.

Thanks in advance!


#2

I’m assuming that the reduced precision of f32 is causing some index you are calculating to round up to one-past-the-array-end, and the f64 version would have the same issue but it’s triggered 2 ** (52 - 24) times less often.


#3

Actually I am just comparing floats with the less operator and incrementing the index counter with i += 1 which has the type usize.
If you uncomment the println!(), you’ll see the output with cargo test -- --nocapture.


#4

I noticed you didn’t say which line the crash was on. Do you know about the RUST_BACKTRACE environment variable?


#5

Sorry.

thread 'tests::test_random_choice_f32' panicked at 'index out of bounds: the len is 500 but the index is 500', src/lib.rs:102 stack backtrace: 1: 0x55725ede4dcf - std::sys::backtrace::tracing::imp::write::h29f5fdb9fc0a7395 2: 0x55725ede968b - std::panicking::default_hook::_{{closure}}::h2cc84f0378700526 3: 0x55725ede80ba - std::panicking::default_hook::hbbe7fa36a995aca0 4: 0x55725ede870c - std::panicking::rust_panic_with_hook::h105c3d42fcd2fb5e 5: 0x55725ede8561 - std::panicking::begin_panic::hbf62ea4a5ff3f9de 6: 0x55725ede848a - std::panicking::begin_panic_fmt::h20f5943904e5791d 7: 0x55725ede83fe - rust_begin_unwind 8: 0x55725ee22c4f - core::panicking::panic_fmt::h19323e466869c656 9: 0x55725ee22bf2 - core::panicking::panic_bounds_check::ha883fe1527ce6884 10: 0x55725ed88eb1 - random_choice::RandomChoice::random_choice_f32::ha72d9c2e24dd965d at /home/stefano/Dokumente/Programming/rust/Rust_Random_Choice/src/lib.rs:102 11: 0x55725ed9620a - lib::tests::test_random_choice_f32::hb8db11b896a5658c at /home/stefano/Dokumente/Programming/rust/Rust_Random_Choice/tests/lib.rs:52 12: 0x55725eda1326 - _<F as alloc..boxed..FnBox<A>>::call_box::h6c46b1f51a5f97cd 13: 0x55725edb597d - test::run_test::run_test_inner::_{{closure}}::_{{closure}}::h264d6ff770d7074c 14: 0x55725ed98e54 - std::panicking::try::call::h8d6a00e6c0ac8b9e 15: 0x55725edf0e9b - __rust_try 16: 0x55725edf0d7e - __rust_maybe_catch_panic 17: 0x55725edb370b - std::thread::Builder::spawn::_{{closure}}::h9e5d89388555e457 18: 0x55725eda10de - _<F as alloc..boxed..FnBox<A>>::call_box::h173484225f9723d2 19: 0x55725ede6a14 - std::sys::thread::Thread::new::thread_start::h8f3bd45211e9f5ea 20: 0x7f829f07a5c9 - start_thread 21: 0x7f829eb9ceac - __clone 22: 0x0 - <unknown> test tests::test_random_choice_f32 ... FAILED

            while accumulated_weights < current_spoke {
                i += 1;
                accumulated_weights += weights[i]; // <--- HERE
            }

#6

What if current_spoke is greater than sum? The division which produces spoke_up will round up half the time.


#7

Yes, that’s the problem, but if I have following condition:

while current_spoke <= sum {
    //...
    while accumulated_weights < current_spoke {
         i += 1;
         accumulated_weights += weights[i];
    }
    //...
}

Then the loop may exits one iteration too early. That is, when I demand n samples, I get n-1.
This only occurs with f32.


#8

I think the order should be this:

accumulated_weights += weights[i];
i += 1;

As written, you’re not using weights[0] in the loop.


#9

That won’t fix the crash in all cases though, because current_spoke == sum might be true. You could change it to while current_spoke < sum, which might work, but would still have the n-1 problem.

What I suggest instead is to treat samples which are just-past-the-end as if they were just-before-the-end, by modifying the inner loop condition:

while i < weights.len() && accumulated_weights < current_spoke {
    println!("accumulated_weights: {}, current_spoke: {}, spoke number: {}, i: {}", accumulated_weights, current_spoke, j, i);
    accumulated_weights += weights[i];
    i += 1;
}

#10

Side note: if your population has more than a few million individuals, doing this with all f32 variables will have enough roundoff error to substantially influence the result. If your population is that large, it’d probably be better to make all the local variables f64 and do weights[i] as f64 in all three places.


#11

Thanks. I had an old version where I used the modulo operator, so if the current spoke is greater than the sum, it gets smaller with the modulo operator, but the pseudo code of wikipedia has no such check and in my opinion the accumulated spoke can’t get bigger than sum, because of this:

let spoke_gap: f32 = sum / n as f32;

// next_f32() ∈ [0.0, 1.0)
let spin = thread_rng().next_f32() * spoke_gap;

So spin is smaller than spoke_gap and that’s why current_spoke should always be smaller than the sum.
If the last spoke has the position sum - spoke_gap and you add spin, it is still less then the sum, because spin is less than spoke_gap, right?
So, there must be some implementation error.


#12

The pseudo code on Wikipedia assumes infinitely accurate numbers. Your numbers are f32, which round off at 24 bits after the binary point. They can round in either direction, and if the random number is very close to one, the rounded result of current_spoke at the end of the loop could very well be greater than spin.

https://is.gd/bQArNn

fn main() {
    println!("{:.20}", 1./9. + 1./9. + 1./9. + 1./9. + 1./9. + 1./9. + 1./9. + 1./9. + 1./9.);
}

#13

Well, ok, then I take your fix with:

while i < weights.len() && accumulated_weights < current_spoke

Thank you, very much! :slight_smile: