Memory leak from passing a Vec of PathBufs over a channel?

While working on a multi-threaded Rust process that passes work from one thread to another over a channel, I've noticed a memory leak whose initial allocation, at least, appears to come from building a Vec of PathBufs.

Here's a stripped down example:

The first thread collects the Vec of PathBufs from the values of a HashMap that is protected by a RwLock (Arc<RwLock<HashMap<WatchDescriptor, PathBuf>>>):

  pub fn handle_inotify(&mut self,
                        paths_tx: mpsc::Sender<Vec<PathBuf>>) {
    ...
    loop {
      ...
      { // Block to ensure rwlock guard is released
        let paths_hashmap = self.paths_rwlock.read().unwrap();
        // This variable never seems to be freed between loop iterations
        let paths: Vec<PathBuf> = paths_hashmap.values().cloned().collect::<Vec<PathBuf>>();
        ...
        paths_tx.send(paths).unwrap();
      }
    ...
    }
    ...
  }

When using bytehound to trace the source of the memory leak, it points out the line above that clones and collects the PathBufs in a Vec.

This is the thread that receives the above Vec:

pub fn start_stats_thread(paths_rx: mpsc::Receiver<Vec<PathBuf>>)
  -> (mpsc::Receiver<Stats>, thread::JoinHandle<()>)
{
  // Stats channel whose receiver is returned to the caller
  // (its creation was elided from the original snippet).
  let (stats_tx, receiver) = mpsc::channel();
  let handle = thread::spawn(move || {
    // The intent with this variable is to overwrite it repeatedly with
    // each update received over the channel.
    let mut current_paths = Vec::new();
    loop {
      match paths_rx.try_recv() {
        Ok(updated_paths) => {
          // This is another place where the leak might be getting
          // perpetuated, but I can't see how...
          current_paths = updated_paths;
        }
        // If the channel disconnected, end the thread
        Err(mpsc::TryRecvError::Disconnected) => break,
        // If anything else (like the channel being Empty because no
        // message is pending), do nothing and leave current_paths as
        // it is, as there's been no update
        _ => {}
      }

      for cgroup_path in &current_paths {
        ...
      }
    }
  });

  (receiver, handle)
}

Any ideas on understanding how I'm managing to leak this memory, or how to eliminate the leak?

Thanks for any leads.

The Vec allocated on that line eventually ends up stored in the variable current_paths. Then, if the program exits without explicitly waiting for the stats thread, that vector will never be freed by your code; its memory is only reclaimed when the OS tears down the process's entire address space at termination. That counts as "a leak" by the definition leak-detection tools use.

Thus, it is not actually a problem because the “leaked” data is data that is expected to be used up until the end of the program. If you want a clean report, join() the stats thread, but this will not actually improve your program's performance in any way.


All of the threads are join()ed by the main thread when the program exits; the intent is for the threads to run continuously, overwriting current_paths with each update received as updated_paths, thus avoiding a memory leak altogether.

Perhaps just assigning updated_paths to current_paths on each iteration through the loop isn't appropriate? Would doing a current_paths.clear(); be more appropriate in this case? Or something else to ensure the Vec is freed on each iteration through the loop?

It should be. Assigning to a variable drops the old value. This should not in itself result in a leak.

(It doesn't explain why you're seeing a leak if everything else that has been said is accurate, but) there's nothing in the example that implies to me you need to, or benefit from, declaring current_paths outside of the loop. Declaring it inside the loop would drop the value at the bottom of each iteration, rather than on reassignment or at function exit.

loop {
    let current_paths = match paths_rx.try_recv() {
        Ok(paths) => paths,
        Err(mpsc::TryRecvError::Disconnected) => break,
        _ => Vec::new(),
    };

    for cgroup_path in &current_paths { /* ... */ }
}

Ah, yes - the reason I keep current_paths outside the loop is that there could be many loop iterations where the paths_rx.try_recv() doesn't return anything, as the previous thread in the pipeline didn't have an update to the vector of paths - so we need to keep using the same current_paths value over and over until an update arrives.


Ah ok, I did misinterpret that as a one-shot consumption.

Thanks, everyone, for the helpful input - the ultimate issue turns out to be program logic, not a memory leak at all.

I had to put an additional inner loop around the try_recv(): the "memory leak" actually turned out to be many intermediate copies of the Vec queuing up in the channel. Since I only care about the most recent update to current_paths to come over the channel, I now drain the channel whenever updates have queued up, breaking out of that inner loop once the channel is empty and keeping only the very latest update in current_paths - the intermediate updates aren't interesting.

I could have also put a limit on the number of queued items in the channel, which would have alerted me to the fact that too much was queueing up and not being handled in a timely manner.

I think whether or not such cases (I haven't looked at your particular one) can be considered "memory leaks" is at least arguable, in my opinion.

