Population sim, with exponentially bad performance

Hello, i am incredibly new to rust and have begun rewriting my population sim from python to rust. My problem is that my Vector: PEOPLE is exponentially becoming slower, the more people are added to the vec. My Goal is to have such code efficiency that i will be able to simulate over 500million people. Should i use arrays instead or is this just awful code structure? Help will be much appreciated :slight_smile:

use std::str;
use std::time::Instant;
use rand::Rng;
use std::mem::size_of_val;
// use plotters;

// Person data struct
#[derive(Debug)]
pub struct Person {
    id: u64,
    name: &'static str,
    gender: u8,
    age: u32,
    love_vec: Vec<i64>,
}

fn main() {
    static mut POPULATION: u64 = 0;
    static mut PEOPLE: Vec<Person> = Vec::new();

    pub fn create_person() -> Person {
        unsafe { POPULATION += 1 };
        let temp_person: Person = Person {
            id: unsafe { POPULATION },
            name: "John",
            gender: 0,
            age: 0,
            love_vec: vec![-1, 100],
        };

        return temp_person;
    }

    pub fn update_sim(mut steps: i32) -> i32 {
        let people_temp = unsafe { &mut PEOPLE };

        for id in 0..unsafe { PEOPLE.len() as usize } {
            // Ages all people by 1 month
            // println!("{:?}", people_temp);
            people_temp[id].age += 1;

            if people_temp[id].love_vec[0] == -1 {
                // Creates a random number to chose a lover for person
                let lover = rand::thread_rng().gen_range(0..=(unsafe { PEOPLE.len() } - 1)) as i64;

                // If the person is not the lover and if the person does not have a lover one is given
                if lover != id as i64 && people_temp[id].love_vec[0] == -1 {
                    people_temp[id].love_vec[0] = lover;
                    steps += 1;
                }
                steps += 1;
            }


            if people_temp[id].love_vec[1] as i32 != -1 {
                let baby_chance = rand::thread_rng().gen_range(0..100) as u32;
                if baby_chance < 2 {
                    // Creates a baby!!!
                    let people_temp = unsafe { &mut PEOPLE };
                    let john: Person = create_person();
                    people_temp.push(john);
                    steps += 1;
                }
            }

            if people_temp[id].age > 12 * 30 {
                unsafe { PEOPLE.remove(id); }
            }
            steps += 1;
        }
        steps
    }

    pub fn print_people() {
        println!("\n**~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~**");
        for id in 0..unsafe { PEOPLE.len() } {
            println!("------------------------------------------");
            unsafe {
                println!("[ID: {:?}]\n\
                  Name: {:?}\n\
                  Age: {:?}\n\
                  Gender: {:?}\n\
                  Lover: {:?}", PEOPLE[id].id, PEOPLE[id].name, PEOPLE[id].age,
                         PEOPLE[id].gender, PEOPLE[id].love_vec)
            }
        }
    }

    let start = Instant::now();

    let people_temp = unsafe { &mut PEOPLE };

    let john: Person = create_person();
    people_temp.push(john);

    let john2: Person = create_person();
    people_temp.push(john2);

    print_people();
    let mut steps = 0;

    for _ in 0..12 * 100 {
        steps = update_sim(steps);
    }

    let duration = start.elapsed();

    println!("\nPeople: {:?} | Steps: {}", people_temp.len(), steps);
    println!("The memory size of POPULATION is {}", size_of_val(unsafe { &*PEOPLE }));

    // Time took to complete code
    println!("Time taken to calculate: {:?}", duration);
    // print_people();
}

Are you using release profile? If no, consider building with cargo build -r.

I’d recommend to avoid mutable statics.

2 Likes

Yes using release helps, but after there are even more Person structs in the people vec then the same problem with bad performance occurs. How should i write the code without the mut statics?

Taking exponential time to run this sim should be expected, as you are modeling a population who's growth should at resemble logistic growth, which does exhibit exponential behavior for the first half of the curve.

That being said, your code is liable to do anything because it exhibits undefined behavior, mutable references aren't allowed to alias, and your use of mutable statics causes that to happen repeatedly. A rewrite to avoid that might look like this:

use rand::Rng;
use std::str;
use std::time::Instant;
// use plotters;

// Person data struct
#[derive(Debug)]
pub struct Person {
    id: u64,
    name: &'static str,
    gender: u8,
    age: u32,
    love_vec: Vec<i64>,
}
struct Sim {
    population: u64,
    people: Vec<Person>,
}

impl Sim {
    pub fn create_person(&mut self) -> Person {
        self.population += 1;
        let temp_person: Person = Person {
            id: self.population,
            name: "John",
            gender: 0,
            age: 0,
            love_vec: vec![-1, 100],
        };

        temp_person
    }
    pub fn update_sim(&mut self, mut steps: i32) -> i32 {
        for id in 0..self.people.len() {
            // Ages all people by 1 month
            // println!("{:?}", people_temp);
            self.people[id].age += 1;

            if self.people[id].love_vec[0] == -1 {
                // Creates a random number to chose a lover for person
                let lover = rand::thread_rng().gen_range(0..=self.people.len()) as i64;

                // If the person is not the lover and if the person does not have a lover one is given
                if lover != id as i64 && self.people[id].love_vec[0] == -1 {
                    self.people[id].love_vec[0] = lover;
                    steps += 1;
                }
                steps += 1;
            }

            if self.people[id].love_vec[1] as i32 != -1 {
                let baby_chance = rand::thread_rng().gen_range(0..100) as u32;
                if baby_chance < 2 {
                    // Creates a baby!!!
                    let john: Person = self.create_person();
                    self.people.push(john);
                    steps += 1;
                }
            }

            if self.people[id].age > 12 * 30 {
                self.people.remove(id);
            }
            steps += 1;
        }
        steps
    }
    pub fn print_people(&self) {
        println!("\n**~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~**");
        for id in 0..self.people.len() {
            println!("------------------------------------------");
            println!(
                "[ID: {:?}]\n\
                  Name: {:?}\n\
                  Age: {:?}\n\
                  Gender: {:?}\n\
                  Lover: {:?}",
                self.people[id].id,
                self.people[id].name,
                self.people[id].age,
                self.people[id].gender,
                self.people[id].love_vec
            )
        }
    }
}
fn main() {
    let mut sim = Sim {
        people: vec![],
        population: 0,
    };

    let start = Instant::now();

    let john: Person = sim.create_person();

    let john2: Person = sim.create_person();
    sim.people.push(john);
    sim.people.push(john2);
    sim.print_people();
    let mut steps = 0;

    for _ in 0..12 * 10 {
        steps = sim.update_sim(steps);
    }

    let duration = start.elapsed();

    println!("\nPeople: {:?} | Steps: {}", sim.people.len(), steps);
    println!(
        "The memory size of POPULATION is {}",
        sim.people.len() * std::mem::size_of::<Person>()
    );

    // Time took to complete code
    println!("Time taken to calculate: {:?}", duration);
    // print_people();
}

playground
The thing I do here is use a struct to contain the global state, rather than actually use global state. This way I can write things entirely with safe code, and the borrow checker can ensure sound code.

Also, the memory calculation currently doesn't take into account any memory allocated by the strings, just so you're aware.

4 Likes

Thanks a lot, i will take this into account when doing more. Still thinking about if arrays will be faster, as you can do changes on all structs at the same time instead of iterating through

I think, it can’t be exactly the same if you’re using release :face_with_monocle:

Well, you think like a pythonist. Try a little bit more functional approach. I don’t really understand your code logic so I cannot help you in this case. All I can tell you is:

  1. Avoid mutability until you need something like Vec or HashMap or use interior mutability (core::cell - Rust);
  2. Avoid statics until you need some kind of low-level or multithreaded code;
  3. Don’t write
struct Foo {…}

static mut FOOs: Vec<Foo> = Vec::new();

fn create_foo() -> Foo {…}
fn update_bar(_: Bar) -> Bar {…}
fn print_baz() {…}

but

struct Foo {…}

impl Foo {
    pub fn new() -> Self {…}
}

struct Foos {
    foos: Vec<Foo>,
    pub population: usize
}

impl Foos {
    pub fn new() -> Self {
        Self {
            foos: Vec::new(),
            population: 0
        }
    }

    pub fn inc(&mut self) {
        self.population += 1;
        self.foos.push(Foo::new());
    }

    pub fn update(&mut self, steps: i32) -> i32 {…}

    pub fn print(&self) {…}
}

UPD: Looks like I’m late

Thanks yeah, its not the same with release but the exact problem occurs like in debug but with larger population numbers

Arrays aren't usable in this context, unlike NumPy arrays rust arrays are fixed size and on the stack. Rust's Vec<T> is roughly equivalent to a one-dimensional NumPy array.

2 Likes

Ah that's sad. So no there is no way to increase the vec performance?

Because with millions of structs, performance will be absolutely awful

In this context, it is highly unlikely there will be a better data structure than a Vec.

What might help, however, is rewriting things in terms of Iterators, doing so tends to result in faster and more idiomatic code. Algorithmic improvements (like not removing elements from the middle of a vec in the hot loop) will also likely help. Actually, come to think of it, your code, including the version I wrote, likely doesn't behave properly because of the concurrent modification during iteration.

All that being said, the model you're trying to run is fundamentally slow and no language feature or algorithm is going to change that. To handle the case with millions of input people you will either need an obscene amount of computing power and memory, or you're going to need to solve the relevant differential equation and model the expected value of the population directly.

5 Likes

Thanks man

Removing people from the Vec is going to be disastrously slow as it gets very large. Especially since the people getting removed are going to tend to be at the beginning of the Vec. You might be able to make it considerably faster by just flagging people as "removed" and skipping them with a single check in the loop. There are obviously other problems with that as the sim keeps running though.

5 Likes

Yep that's true, thanks

This is an interesting problem. The most likely culprit is the the O(n**2) performance of the remove calls as you will each update need to move n/2 items on average for n*k removed items, for a death rate k. Since you're preserving order, the oldest items are always first, so you're even hitting the worst case of removing the first item every time.

The trivial fix there is to just switch remove to swap_remove, which swaps the last item into the removed position instead of moving everything above down one, so if you don't care about preserving order this should be fine!

Another option is to switch to using a map, which requires simultaneous mutation and iteration, which Rust will (rightly) stop you from doing (annoyingly, it looks like there's no explicit entry iteration API, so you could remove the current entry?). You could build up a Vec of items to add and remove them apply them after: this means you do double the work, but that's still O(n), so shouldn't be too bad.

But one other interesting option is it might actually be faster to write the update as copying from the old to a new vec, skipping dead people and adding new births. This is both O(n) and means you have much simpler code.

Regardless, if you stick with a Vec you will continue to have the problem that removals (of any method) invalidate IDs used as indices: you're not using them yet, but you might want to soon the way this looks with the "lover" stuff.

You also need to make sure you don't skip the next item: eg if you are iterating 0..10, and on 3 you remove 3, the next item is still at index 3. Likewise the range you are using is only computed at the start of the loop, so if you try to fetch index 9 without adding any more you will panic with an out of bounds error, or if you add two more, then only the first at index 9 will be seen, and not the second.

7 Likes

VecDeque might be another interesting option if you want to remove the oldest people and add new ones.

1 Like

I think another possibly useful crate might also be generational-arena.