Possible memory leak?

Does this code have any memory leaks? With a large input string, I see that after 10 minutes, when the partition is no longer writeable, some of the memory is reclaimed but not all.
For example, if my input string is 1MB and I write 100k entries in 10 minutes, consuming 100GB of memory, then after those 10 minutes I see that around 80GB is reclaimed, but the remaining 20GB is still held by the process.

use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::rc::Rc;
use std::sync::Arc;

pub struct Partition {
    pub index: HashMap<Rc<String>, u64>,
    pub inverted: HashMap<u64, Rc<String>>,
    pub writeable: Arc<AtomicBool>,
}

impl Partition {
    pub fn new() -> Self {
        let partition = Self {
            index: HashMap::default(),
            inverted: HashMap::default(),
            writeable: Arc::new(AtomicBool::new(true)),
        };
        let writeable = partition.writeable.clone();
        thread::spawn(move || {
            // std's Duration is the same type tokio re-exports, so no tokio dependency is needed here.
            thread::sleep(std::time::Duration::from_secs(10 * 60));
            writeable.store(false, Ordering::SeqCst)
        });
        partition
    }
}

pub fn get_partition(partition: Partition) -> Partition {
    let writeable = partition.writeable.load(Ordering::SeqCst);
    if !writeable {
        println!("inserted {:?} entries", partition.index.len());
        return Partition::new();
    } else {
        return partition;
    }
}

fn main() {
    let mut partition = Partition::new();
    let mut i = 0;
    loop {
        let mut term: String = String::from("some large input string");
        term.push_str(&i.to_string());
        let rc_term = Rc::new(term);
        partition = get_partition(partition);
        partition.index.insert(rc_term.clone(), i as u64);
        partition.inverted.insert(i as u64, rc_term.clone());
        i += 1;
    }
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=cd66964df08a266400a823ad0a5e5bbd

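This code can't technically leak: it doesn't call any deliberately leaking functions (such as Box::leak or mem::forget), and its Rc/Arc types aren't recursive, so they have no way of creating reference cycles. When get_partition returns a new Partition, the old one is dropped, and both of its maps release their strings.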
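If you want to convince yourself of this, here is a minimal sketch that mirrors the two maps from the question and watches one string through a Weak handle. The function name string_survives_drop is made up for illustration; the rest is plain std.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Builds the two maps from the question, keeps only a weak handle to one
/// string, drops both maps, and reports whether the string is still alive.
/// (Hypothetical helper, not part of the original code.)
fn string_survives_drop() -> bool {
    let mut index: HashMap<Rc<String>, u64> = HashMap::new();
    let mut inverted: HashMap<u64, Rc<String>> = HashMap::new();

    let term = Rc::new(String::from("some large input string"));
    // A Weak handle observes the allocation without keeping it alive.
    let probe: Weak<String> = Rc::downgrade(&term);

    index.insert(Rc::clone(&term), 0);
    inverted.insert(0, term); // the last local strong handle moves into the map

    // One strong reference per map.
    assert_eq!(probe.strong_count(), 2);

    drop(index);
    drop(inverted);

    // upgrade() returns None once every strong reference is gone.
    probe.upgrade().is_some()
}

fn main() {
    // prints "string still alive after drop: false"
    println!("string still alive after drop: {}", string_survives_drop());
}
```

Since there is no cycle, dropping the maps takes the strong count to zero and the string is freed, which is why the retained memory has to be explained some other way.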

What you may be seeing is memory fragmentation: the allocator takes memory from the system in pages (somewhere between 4KB and 2MB each) and can only return a page when it is completely empty. If a page contains any live allocation at all, even a single string, it keeps belonging to your process. Allocators like jemalloc have some mitigations for this, but in general you should always expect a bit of overhead.

To check, you can use a heap profiler. For example, on macOS, Xcode's Instruments has an Allocations profiling template. It works with Rust as long as the program uses the system default allocator.
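If a profiler isn't handy, a rough in-process alternative is to wrap the system allocator and count live bytes yourself. The sketch below is only an illustration: Counting, LIVE_BYTES, live_bytes and measure are names invented here, and the counter reflects what the program has requested, not what the OS reports as resident. If the live-byte count falls back to its baseline after a partition is dropped while the process's memory use stays high, that points at the allocator holding on to pages and not at a leak.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that tracks live bytes.
struct Counting;

/// Bytes currently allocated and not yet freed by the program.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // realloc is left at its default, which goes through alloc and dealloc
    // above, so resized buffers stay counted correctly.
}

#[global_allocator]
static GLOBAL: Counting = Counting;

fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

/// Allocates `entries` strings of `len` bytes, drops them, and returns the
/// live-byte count (before, at the peak, after the drop).
fn measure(entries: usize, len: usize) -> (usize, usize, usize) {
    let before = live_bytes();
    let data: Vec<String> = (0..entries).map(|_| "x".repeat(len)).collect();
    let peak = live_bytes();
    drop(data);
    let after = live_bytes();
    (before, peak, after)
}

fn main() {
    let (before, peak, after) = measure(1_000, 1_024);
    println!("before={before} peak={peak} after={after}");
}
```

In the original program you could print live_bytes() next to the "inserted ... entries" message and compare it with what top or Activity Monitor shows for the process.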

BTW: consider using Rc<str> instead of Rc<String>. It's a fat pointer, but avoids a double indirection.
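A minimal sketch of that change, assuming the same map layout as in the question (the build function is made up here to keep the example small):

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical helper: fills both maps with `n` entries using Rc<str> keys.
fn build(n: u64) -> (HashMap<Rc<str>, u64>, HashMap<u64, Rc<str>>) {
    let mut index = HashMap::new();
    let mut inverted = HashMap::new();
    for i in 0..n {
        let mut term = String::from("some large input string");
        term.push_str(&i.to_string());
        // Converting a String into Rc<str> copies the bytes once into a single
        // allocation that holds the reference counts and the string data.
        let rc_term: Rc<str> = Rc::from(term);
        index.insert(Rc::clone(&rc_term), i);
        inverted.insert(i, rc_term);
    }
    (index, inverted)
}

fn main() {
    let (index, inverted) = build(3);
    // Rc<str> implements Borrow<str>, so lookups accept a plain &str.
    let id = index["some large input string1"];
    // prints "1 -> some large input string1"
    println!("{} -> {}", id, inverted[&id]);
}
```

With Rc<String>, reading the text follows the Rc pointer to the String header and then the String's pointer to the bytes; with Rc<str> the bytes sit directly behind the Rc pointer, at the cost of the pointer itself also carrying the length.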
