Returning a list of references to members of a nested data structure

I have struct "M" which has member "children" which is a HashMap of more Ms. Ms full of other Ms make up the entire data structure. While traversing the data structure looking for specific values, I would like to collect a list of all the nodes that were passed through and return those to the caller. Ideally, these would be references to the existing Ms rather than me making copies of the data structure or portions of the data structure. The caller could then use more of M's methods on those returned items. Some of M's methods mutate, so that is a consideration.

I have code.
The Playground: Rust Playground
The Gist: Code shared from the Rust Playground · GitHub

use std::collections::HashMap;
// use std::rc::Rc;
// use std::cell::{RefCell, RefMut};

fn main() {
    // This code simulates an insert() function.
    let mut m = M { children: HashMap::new() };
    let mut mb = M { children: HashMap::new() };
    let mut mc: M = M { children: HashMap::new() };
    let mut md: M = M { children: HashMap::new() };
    let me: M = M { children: HashMap::new() };
    md.children.insert("weld".to_string(), me);
    mc.children.insert("nut".to_string(), md);
    mb.children.insert("washer".to_string(), mc);
    m.children.insert("bolt".to_string(), mb);
    ////////

    // Prove to ourselves that the normal find() function works.
    let a_success = m.find(vec!["bolt", "washer", "nut"]);
    match a_success {
        Ok(child) => println!("Successful success: {:?}", child),
        Err(msg) => println!("Failed success: {}", msg)
    }
    let another_success = m.find(vec!["bolt", "washer", "nut", "weld"]);
    match another_success {
        Ok(child) => println!("Successful success: {:?}", child),
        Err(msg) => println!("Failed success: {}", msg)
    }
    let a_failure = m.find(vec!["nut", "washer", "bolt"]);
    match a_failure {
        Ok(child) => println!("Failed failure: {:?}", child),
        Err(msg) => println!("Successful failure: {}", msg)
    }
    let another_failure = m.find(vec!["weld"]);
    match another_failure {
        Ok(child) => println!("Failed failure: {:?}", child),
        Err(msg) => println!("Successful failure: {}", msg)
    }
    
    // Desired usage pseudocode
    // let mut m = M { children: HashMap::new() }
    // m.insert(vec!["bolt", "washer"]);
    // let blah = m.find_with_path(vec!["bolt", "washer"]);
    // let new_m = blah.last(); // Washer M
    // new_m.insert(vec!["nut", "weld"]);
    // new_m.find(vec!["nut", "weld"]);
    // m.find(vec!["bolt", "washer", "nut", "weld"]);
    // Yay
    // let another_m = blah[0]; // Bolt M
    // another_m.find(vec!["washer"])
    // Yay
    // Profit
}

#[derive(Debug)]
struct M {
    children: HashMap<String, M>
}

impl M {
    fn find(&mut self, key: Vec<&str>) -> Result<&mut M, String> {
        let mut current = Some(self);
        
        for key_part in key[..].to_vec() {
            match current.and_then(|item| item.children.get_mut(key_part)) {
                Some(child) => current = Some(child),
                None => current = None
            }
        }
        
        match current {
            Some(child) => Ok(child),
            None => Err(format!("Unable to locate key [{}]", key.join(", ")))
        }
    }
    
    fn find_with_path(&mut self, key: Vec<&str>) /* -> Some list of Ms */ {
        let mut current = Some(self);
        // let path: Vec<something> = Vec::new()
        // ^^ Some structure to hold references to
        // each M we successfully find, to be
        // returned to caller
        
        for key_part in key[..].to_vec() {
            match current.and_then(|item| item.children.get_mut(key_part)) {
                Some(child) => {
                    // Similar to other find function, but every time we
                    // successfully find another child M, we push it
                    // onto a list of some kind that can be returned to
                    // the caller. We then move onto the child and the
                    // next piece of the key, as in the other find
                    // function.
                    //
                    // Magic goes here. Somehow store reference in a
                    // list or other structure that will allow me to
                    // return a group of Ms or references to Ms to
                    // the caller. The caller must be able to get
                    // his hands on mutable versions of the Ms so
                    // as to call additional methods on them.
                    //
                    // I've tried various approaches. For a minute,
                    // I thought Rc/RefCell might save me, but I
                    // couldn't figure out how to store
                    // Rc<RefCell<&M>>s from current or child,
                    // because those belong to this function and/or
                    // this match arm.
                    current = Some(child)
                },
                None => current = None
            }
        }
    }
}

There are comments in the code giving additional details about what I'm trying to achieve. I have left off the insert() method. I have also written another find() which uses recursion rather than the loop shown in this example.

Thanks!

// Magic goes here. Somehow store reference in a
// list or other structure that will allow me to
// return a group of Ms or references to Ms to
// the caller. The caller must be able to get
// his hands on mutable versions of the Ms so
// as to call additional methods on them.

Well, that's a nontrivial goal. You say you want to create a list of references for Ms that are contained within each other but also need mutable access: Since the outermost Ms contain/own the ones further inside, mutable access to the outermost ones entails mutable access to the contained ones; TL;DR this way you'd get mutable access to the inner Ms from multiple references, i.e. shared mutable access, so interior mutability is a must.

How would one go about introducing interior mutability? You mentioned you tried Rc<RefCell<&M>>, but you probably didn't go far enough to make it actually work. The problem is that if you want interior mutability here, you need to bake it into the data structure itself. E.g. something like

struct M {
    children: RefCell<HashMap<String, Rc<M>>>
}

impl M {
    fn find(self: &Rc<M>, key: Vec<&str>) -> Result<Rc<M>>, String> {
        // ...
    }
    
    fn find_with_path(self: &Rc<M>, key: Vec<&str>) -> Vec<Rc<M>> {
        // ..
    }

    fn insert(self: &Rc<M>, key: Vec<&str>) {
        // ...
    }
}

This is a starting point of how to implement that, based on your original playground: Rust Playground

Or alternatively, you can make the Rc part of the M and interpret M as a reference-like type / handle, so e.g. something like

struct M {
    children: Rc<RefCell<HashMap<String, M>>>
}

impl M {
    fn find(&self, key: Vec<&str>) -> Result<M>, String> {
        // ...
    }
    
    fn find_with_path(&self, key: Vec<&str>) -> Vec<M> {
        // ...
    }

    fn insert(&self, key: Vec<&str>) {
        // ...
    }
}
4 Likes

As another alternative without any Rc/RefCell, you can "temporarily" change the data structure in order to "rip out" the path of Ms you'd like to present, and restore it upon destruction. However, this would lock (i.e. mutably borrow) the original m for as long as the return value of find_with_path exists, and it would also block access to each subsequent element of the extracted path through its parent. (So you cannot retrace the same path as passed to find_with_path even a single step.) Your use case / example code doesn't seem to fit either of these conditions, but then again it seems to be a toy example, and it isn't clear what you actually need that find_with_path-style function for in practice. The API would look roughly like

struct M {
    children: Option<HashMap<String, M>>, // only temporarily `None` when becoming part of path
    // unwrapped implicitly where necessary with useful error message on failure
}
struct Path {
    original: &'a mut M,
    contents: Vec<M>,
}
impl Drop for Path { /* restore the original */ }
impl ops::Deref for Path { type Target = Vec<M>; /* ... */ }
impl ops::DerefMut for Path { /* ... */ }

impl M {
    fn find(&mut self, key: Vec<&str>) -> Result<&mut M>, String> {
        // ...
    }
    
    fn find_with_path(&mut self, key: Vec<&str>) -> Path<'_> {
        // ...
    }

    fn insert(&mut self, key: Vec<&str>) {
        // ...
    }
}

Another approach again is to instead of returning a Vec of references/pointers, just provide &mut-access to the visited Ms while the path is traversed via a callback. This, too, allowed less flexible access, but perhaps it's enough. It would allow full access to each node in the callback, including to the yet-to-be-visited children along the same path. But your "Desired usage pseudocode" has more complicated access than this, too, so it might not be a reasonable option.

struct M {
    children: HashMap<String, M>,
}

impl M {
    fn find(&mut self, key: Vec<&str>) -> Result<&mut M>, String> {
        // ...
    }
    
    fn find_with_path(&mut self, key: Vec<&str>, callback: F)
    where
        F: FnMut(&mut M), // or perhaps with some extra argument indicating how in
        // we already are, e.g. whether or not it's the last element, etc
    {
        // ...
    }

    fn insert(&mut self, key: Vec<&str>) {
        // ...
    }
}

Whooo, don't I know it. I've been fiddling with this for days.

You really dug in, here. I appreciate that. Thanks for the thorough answers and the descriptions of your reasoning. I'm still looking over everything. I'm not sure yet if there's enough for me to successfully implement find_with_path() based on your helpful replies, but I'm going to take a shot at it.

I tried to do that, but I couldn't get it right. It's been a couple of days, and I don't remember what problem I was having with that approach, but looking at your example, I have a question: Is the idea here to clone the nodes you want to keep track of? That seems to be implied by the function signatures, and that's something I've been hoping to avoid. Not only because the nodes being returned could branch into tens of thousands of children, but also because my ideal scenario looks like this:

let blah = m.find(vec!["some", "path"]);
m.insert(vec!["some", "path", "here"]); // "Here" gets appended to the existing "some" -> "path" path.
blah.last().find(vec!["here"]); // Yay

I definitely don't want to lock up any of the nodes if I can avoid it. The use case is finding nodes-in-common between find() operations, identifying shared prefixes, determining whether a given path contains a specific child, saying, "I want to do an insert/find/whatever operation starting at the xth node returned by find_with_path()," and things along those lines.

Thanks again for the thoughtful replies. I appreciate the time you took on this.

I'll assume you're referring to the version where M stores an Rc<RefCell<...>>. I believe steffahn's idea here is that creating a new M comes at no cost this way: cloning an Rc only creates a new reference to the inner value, performing a shallow copy rather than a deep copy. Looking at your ideal scenario, it's probably the version that would work best. (But note that it comes with the caveat that none of an M's children can be dropped until all copies have been deleted. Also, one have to be careful not to borrow from the RefCell recursively, which will crash the program. Instead, one must clone the inner Rcs out of it before returning or running a callback.)

1 Like

I was referring to the second code block, where the function signatures look like

and

in which the Ms are Ms, and not &Ms. Presumably, this means I am clone()ing the sub-graphs I want to return.

That's the part I was trying to correct: when you clone an M that contains an Rc, it does not create a deep copy of its subgraph. Instead, it creates a new reference to the same subgraph. Effectively, cloning an Rc<T> is equivalent to copying a &T; the only difference is in the ownership semantics. So to create an M from an &M, one just needs to write M { children: Rc::clone(other.children) } (or equivalent), which does not produce a new version of the HashMap.

Oh, I understand. I was still thinking of the M that has a HashMap in children, rather than an Rc<RefCell<HashMap>>.

Thanks to both of you, and especially to steffahn who put so much effort into his replies. Making find_with_path() work was trivial after steffahn showed me how to properly apply Rc and RefCell. Thumbs-up all around. Now I'm off to integrate this into my "real" code and write some tests. I'm considering now whether I want to try grafting this onto the recursive version of find() that I've also written, or just drop that function, altogether.