First Rust project - how do I deal with memory leaks?

I'm a complete beginner in Rust. In order to learn the language I'm working on a project to convert an a-life simulation I wrote in C to Rust.

In this program, a bunch of entities move around a sparse 2-dimensional array eating smaller entities and try to avoid being eaten by larger ones. If they do well they spawn new entities. If they don't, they die.

In the Rust version, I've implemented the sparse array as a vector of structs wrapped in a Option. The empty cells all point to a single None value, the rest move around the vector consuming any lower level entities they encounter. In Rust terms, all these entities have the same scope and lifetime, but in terms of the program they don't. Only the successful ones persist.

The problem is, I can't figure out how to delete "dead" entities. I can overwrite them by assigning a None to the same index, but what happens the original entity? I assume that it creates a memory leak. The vector element now points to a different location, but the memory allocated to the now dead entity stills holds that data. We just don't have the pointer anymore. As the program runs, it is expected that there will be lots of dead entities that need to be cleared off, so I need a reliable way to free the memory they used.

I've created a gist that demonstrates a simplified version of my project. Is there a way to free the dead entities before reallocating the vector element, or is my approach to this problem flawed?

1 Like

When you write vec![empty_entity; 10] you allocate memory for all 10 entities, note that for Option<Entity> it does not matter if you have None or Some(Entity), memory usage will be the same. If now for every new entity you will push it to the end of the Vec, then you memory usage will be inefficient, as emptied slots will not be reused. (btw, I don't think it's a memory leak per se, as you still have access to the data and it will be dropped eventually, so it's just suboptimal code) To solve this, you'll need to reuse empty slots in the vector, so:

// instead of this
entity_vec.push(Some(new_entity));
// you'll need to write:
let pushed = false;
for slot in entity_vec.iter_mut() {
    if slot.is_none() {
        *slot = Some(new_entity);
    }
}
if !pushed { entity_vec.push(Some(new_entity)); }

As you can see pushing will have non-constant cost, you can optimize it by storing index of empty nodes which will be updated at entity deletion and addition. Code can look like this:

struct World {
    buf: Vec<Option<Entity>>,
    empty_index: Vec<usize>,
}

impl World {
    fn add_entity(&mut self, entity: Entity) {
        match self.empty_index.pop() {
            Some(idx) => {
                assert!(self.buf[idx].is_none());
                self.buf[idx] = Some(entity);
            },
            None => self.buf.push(Some(entity)),
        }
    }
    
    fn del_entity(&mut self, idx: usize) {
        assert!(self.buf[idx].is_some());
        self.buf[idx] = None;
        self.empty_index.push(idx);
    }
}
1 Like

A vector has a pointer that points to a contiguous block of memory where the data is stored. If you overwrite one of the vector's members, then no memory leak is made because unless you were storing a vector of pointers that were pointing to heap allocated memory, the data is simply overwritten. When the vector reaches the end of it's scope, then the drop method is called and the data is freed.

1 Like

Thanks for the replies.

@Noah11012 I think I have this conceptually wrong. I thought that the vector was just an array of pointers, but the entities themselves are random locations in the heap. So in these two lines...

let empty_entity: &Option<Entity> = &None;
let mut entity_vec: Vec<&Option<Entity>> = vec![empty_entity; 10];

...the first sets up a entity struct on the heap, while the second sets up ten pointers to that one entity.

But what actually happens here? Does the entity_vec now have ten copies of the empty_vec? If so, is it actually cloning it?

The current prototype uses the vector method swap() to simulate the entities moving about the vector. I had understood this to be an efficient swapping of pointers, but is it actually swapping elements? I want this project to be able to scale up to thousands of entities or more which should be doable if it's just passing pointers around, but less likely if it is moving/cloning structs. I think now that my design is wrong.

@newpavlov The vector set here represents the universe that the entities live in and will never grow. Its size is set once at runtime (eventually according to user-defined parameters) and will always be the same size. Also the rules for adding new entities are a little more complicated. Conceptually, the spawning entity chucks out an egg. If that egg lands on an empty slot a new entity is created, otherwise the egg is lost.

However, you code has given me some ideas. I think I can adapt your code to my purposes. I just need to read up more on the impl keyword. Thanks!

It's a common beginner strategy to think "rust references are like C pointers", but as you are discovering, this becomes difficult quickly, due to the lifetimes involved.

One of the usual approaches in such scenarios is to abandon references completely, and use offset-indices into some arena-like structure instead.

So you'd have one densely packed vec that holds all currently living entities, so has all the memory allocations (basically an "object pool")
Then a second vec for your "world"; this is the sparse one. This contains only usizes, or Option<usize>, where each usize is the index into the dense vec.

By splitting the allocations and the position into two different because, you guarantee you never have to move the structure.

vec::swap is indeed a great way to move items around with minimal copying.

2 Likes

Rust automatically manages memory. Although leaks are possible in theory, in practice you never need to worry about freeing memory. When you assign or replace something, Rust will free the previous value automatically.

2 Likes

Moving is as fast with small structures as it is with pointers. Wrapping in Box is how you would change to a heap allocation. Using a pointer could slow down code as it needs dereferencing. Optimising the structure for small size could be a better option than trying other methods, but you only do this after getting code that works.

1 Like

I wonder if using a Vec is the best solution. If you just have a set of Entitys, you may use std::collections::HashSet<Entity>, which would spare you the pain of dealing with memory menagement. If you want to refer to your Entitys by an Identifier, then maybe you could use a std::collections::HashMap<Identifier, Entity>. I don't know if this approach would perform better than yours, but it looks to me that the implementation would be more straightforward.

The HashSet and HashMap might incur too much hashing overhead (at least compared to a usize index) if he ever gets to the "thousands" of entities he (briefly) mentioned; especially with the default cryptographically secure, DoS resistant hash method (using other hashers might be worth a shot though)

Another common data structure for sparse datasets is good old Vec<(Position, Data)> (or his SoA cousin (Vec<Position>, Vec<Data>)).

Thanks for all the great suggestions guys. I really appreciate all the help.

@juleskers You're right that I had references and C pointers equated which is one way I was going wrong. I had actually considered an approach like the one you suggest, but I couldn't figure out how to map the World vector back to the entity pool. I was thinking it should be an array of pointers, but that's thinking in C again. An array of Option<usize> seems like an elegant and simple solution.

On the hash map suggestions, I'm going to have to look more into all that. I've a long way to go here...

Though the usual "premature optimisation" warnings apply, you might also want to experiment with an array of "just" usize, and use 0 as your personal "empty" value (i.e. put a None as first item in your entity vec, and keep it there forever).

The memory layout of Option<T> (and effectively all other Rust enums too) has two possibilities, it is either

  • sizeof(tag) + maybe-padding + sizeof(T) (i.e. a "tagged union" in C parlance)
  • or, if the compiler can guarantee that T is non-null:
    sizeof(T)

In the latter case the compiler 'condenses' the tag and the T together, and uses the null-pointer as the tag for None. This is safe in Rust, because the type system enforces that you can only get at the T after checking for None/nullpointer (via if let or match)

Since usize has 0 as a legal value, the optimisation doesn't apply; So an Option<usize> would always need an extra tag of at least one byte. I'm not sure about the alignments involved, but I wouldn't be surprised if Option<usize> is double the size of a plain usize.
If your world becomes huge, that doubling of every cell can easily become an issue.

Further reading:

2 Likes