Advice for designing a struct that can iterate over its fields

There are a number of posts on this forum about iterating over the fields of a struct. I'm looking for advice on what the best thing to do in my situation is.

My project uses the procfs crate. In particular, each process has an associated Vec<MemoryMap> (see MemoryMap in procfs::process):

pub struct MemoryMap {
    pub perms: MMPermissions,
    pub pathname: MMapPath,
    pub pss: u64
} // simplified for clarity

pub enum MMapPath {
    Path(PathBuf),
    Heap,
    Stack,
    TStack(u32),
    Vdso,
    Vvar,
    Vsyscall,
    Rollup,
    Anonymous,
    Vsys(i32),
    Other(String),
}

Each MemoryMap represents an entry inside the /proc/<pid>/smaps file for that process. Now, I want to aggregate all of these maps into one single struct that stores a process's memory usage by type of memory. For instance, Stack and Heap are different categories. Additionally, each unique (Path(path), MMPermissions) gets its own category. If in the Vec<MemoryMap> we encounter two maps with the same (Path(path), MMPermissions), we combine them into one map whose pss field is the sum of the two. Overall, in my first design, each process has a struct that looks like this:

pub struct MemoryExt {
    pub stack_pss: u64,
    pub heap_pss: u64,
    pub thread_stack_pss: u64,
    pub file_map: HashMap<(PathBuf, MMPermissions), u64>,
    pub anon_map_pss: u64,
    pub vdso_pss: u64,
    pub vvar_pss: u64,
    pub vsyscall_pss: u64,
    pub vsys_pss: u64,
    pub other_map: HashMap<String, u64>,
}

This works fine. Adding two of these is a meaningful operation for me, e.g., adding the memory usage of two child processes. Implementing it is a bit cumbersome, but it works:

use std::ops::Add;

impl Add<&MemoryExt> for MemoryExt {
    type Output = MemoryExt;

    fn add(self, rhs: &MemoryExt) -> MemoryExt {
        MemoryExt {
            stack_pss: self.stack_pss + rhs.stack_pss,
            heap_pss: self.heap_pss + rhs.heap_pss,
            thread_stack_pss: self.thread_stack_pss + rhs.thread_stack_pss,
            file_map: add_maps(self.file_map, &rhs.file_map), // this is a function that works correctly
            anon_map_pss: self.anon_map_pss + rhs.anon_map_pss,
            vdso_pss: self.vdso_pss + rhs.vdso_pss,
            vvar_pss: self.vvar_pss + rhs.vvar_pss,
            vsyscall_pss: self.vsyscall_pss + rhs.vsyscall_pss,
            vsys_pss: self.vsys_pss + rhs.vsys_pss,
            other_map: add_maps(self.other_map, &rhs.other_map),
        }
    }
}
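
The post does not show add_maps itself. A minimal sketch of what such a helper could look like, assuming it merges by key and sums the values (the signature is inferred from the call sites above, not taken from the original code):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Merges `rhs` into `lhs`, summing the values of keys present in both.
/// Hypothetical stand-in for the `add_maps` helper mentioned in the post.
fn add_maps<K>(mut lhs: HashMap<K, u64>, rhs: &HashMap<K, u64>) -> HashMap<K, u64>
where
    K: Eq + Hash + Clone,
{
    for (key, value) in rhs {
        // The key is cloned only because `rhs` is borrowed; the value is `Copy`.
        *lhs.entry(key.clone()).or_insert(0) += *value;
    }
    lhs
}
```

Taking the left-hand map by value lets it be reused as the result, which matches the Add<&MemoryExt> signature above.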

One thing I am doing with this data is plotting the stack, then the heap, etc. for each field in the struct in a predefined order. (For each HashMap field, I can either plot the sum of its entries or each entry individually). This struct works fine for that. However, the next thing I want to do is sort all of the fields from greatest to least memory consumption. This is where I may need to rethink the design. Here is my first attempt:

pub struct MemoryExt(HashMap<MemCategory, u64>);

pub enum MemCategory {
    File(PathBuf, MMPermissions),
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
    Other(String)
}

This has the following advantages (I think):

  • constant time access when I know which category I'm looking for
  • more concise implementation of Add
  • iterable for free

But the following disadvantage:

  • When I want to iterate through all of the File keys, for example, to aggregate the usage of all memory-mapped files, I will have to iterate through all of the other keys, too. (Probably negligible performance cost in practice, but still bugs me a little bit.)
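
To make the trade-offs of this design concrete, here is a rough sketch of Add, a greatest-to-least sort, and the file-only aggregation. Perms is a hypothetical stand-in for procfs's MMPermissions so the sketch compiles without the crate, and sorted_desc and file_total are illustrative names, not part of the original code:

```rust
use std::collections::HashMap;
use std::ops::Add;
use std::path::PathBuf;

/// Hypothetical stand-in for `procfs::process::MMPermissions`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Perms(pub u8);

// Keys of a HashMap need `Eq` and `Hash`; `Clone` is needed by `add` below.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum MemCategory {
    File(PathBuf, Perms),
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
    Other(String),
}

#[derive(Debug, Default)]
pub struct MemoryExt(pub HashMap<MemCategory, u64>);

impl Add<&MemoryExt> for MemoryExt {
    type Output = MemoryExt;

    fn add(mut self, rhs: &MemoryExt) -> MemoryExt {
        // One loop covers every category, fixed or dynamic.
        for (category, pss) in &rhs.0 {
            *self.0.entry(category.clone()).or_insert(0) += *pss;
        }
        self
    }
}

impl MemoryExt {
    /// Returns all categories ordered from greatest to least PSS.
    pub fn sorted_desc(&self) -> Vec<(&MemCategory, u64)> {
        let mut entries: Vec<_> = self.0.iter().map(|(c, pss)| (c, *pss)).collect();
        entries.sort_by(|a, b| b.1.cmp(&a.1));
        entries
    }

    /// Sums only the file-backed entries; note that this still visits every key.
    pub fn file_total(&self) -> u64 {
        self.0
            .iter()
            .filter(|(c, _)| matches!(c, MemCategory::File(..)))
            .map(|(_, pss)| *pss)
            .sum()
    }
}
```

The filter in file_total is the disadvantage from the list above in code form: the map has no way to reach the File keys without walking the rest.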

Another option is to make an Iter for my original struct that will visit each field in order, generating (MemCategory, u64) tuples. This seems like a good way to do it, but would also add more code.

A third option is in between the first two where I have something like this:

pub struct MemoryExt {
    pub const_map: HashMap<MemCategory, u64>,
    pub file_map: HashMap<(PathBuf, MMPermissions), u64>,
    pub other_map: HashMap<String, u64>,
}

pub enum MemCategory {
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
}

I think this would give me the usage characteristics I want while still cutting down on the implementation of Add. But it feels wrong to use a HashMap with an enum as the key, when that's pretty much equivalent to a struct up to the ability to iterate.

In summary: I'm stuck and would like to know what the community thinks is the best way to do this.

You could have a const_map: [u64; NumberOfMemCategoryVariants] (or Box<[_; _]> or Box<[_]>) which can be indexed by MemCategoryVariants, create an iterator of (MemCategory, u64), and so on. It'd be a lot of boilerplate on your own but there are a selection of crates that will do most or all of it for you. (strum is one but there are others, and I'm not sure which would be best for this particular case offhand.)
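
A hand-rolled sketch of that array-backed idea, without any helper crate. ConstMap, NUM_CATEGORIES and ALL are names made up for illustration, and the variant list has to be kept in sync by hand (this is the boilerplate a derive crate such as strum could generate):

```rust
use std::ops::{Add, Index, IndexMut};

const NUM_CATEGORIES: usize = 8;

/// The fixed categories from the "third option". The implicit
/// discriminants (0, 1, 2, ...) double as array indices.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemCategory {
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
}

impl MemCategory {
    /// Every variant, in declaration order.
    pub const ALL: [MemCategory; NUM_CATEGORIES] = [
        MemCategory::Heap,
        MemCategory::Stack,
        MemCategory::TStack,
        MemCategory::Vdso,
        MemCategory::Vvar,
        MemCategory::Vsyscall,
        MemCategory::Anonymous,
        MemCategory::Vsys,
    ];
}

/// One PSS slot per fixed category.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ConstMap([u64; NUM_CATEGORIES]);

impl Index<MemCategory> for ConstMap {
    type Output = u64;
    fn index(&self, category: MemCategory) -> &u64 {
        &self.0[category as usize]
    }
}

impl IndexMut<MemCategory> for ConstMap {
    fn index_mut(&mut self, category: MemCategory) -> &mut u64 {
        &mut self.0[category as usize]
    }
}

impl Add<&ConstMap> for ConstMap {
    type Output = ConstMap;
    fn add(mut self, rhs: &ConstMap) -> ConstMap {
        // Element-wise sum; no per-field code is needed.
        for (total, extra) in self.0.iter_mut().zip(rhs.0.iter()) {
            *total += *extra;
        }
        self
    }
}

impl ConstMap {
    /// Yields `(category, pss)` pairs in declaration order.
    pub fn iter(&self) -> impl Iterator<Item = (MemCategory, u64)> {
        // Both arrays are `Copy`, so the iterator owns its data and borrows nothing.
        IntoIterator::into_iter(MemCategory::ALL).zip(self.0)
    }
}
```

Indexing with a variant, as in const_map[MemCategory::Heap], cannot go out of bounds as long as ALL and NUM_CATEGORIES match the enum.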

I'm new to Rust, but I would suggest a "hybrid struct + iterator adapter" approach. It gives you:

  • Type safety and clarity (explicit fields instead of a catch-all map)
  • Fast field access (you know where things are)
  • Easy addition via Add impl (explicit and readable)
  • Iteration/sorting flexibility (using your custom iterator)

impl MemoryExt {
    pub fn iter(&self) -> impl Iterator<Item = (MemCategory, u64)> {
        let mut items = vec![
            (MemCategory::Stack, self.stack_pss),
            (MemCategory::Heap, self.heap_pss),
            (MemCategory::TStack, self.thread_stack_pss),
            // REST OF THE FIELDS
        ];

        items.extend(self.file_map.iter().map(|((path, perms), val)| {
            (MemCategory::File(path.clone(), perms.clone()), *val)
        }));

        items.extend(self.other_map.iter().map(|(s, val)| {
            (MemCategory::Other(s.clone()), *val)
        }));

        items.into_iter()
    }
}

// The syntax etc. could be wrong above, but I hope you get my point :)
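
With an iterator of (category, value) pairs like that, the greatest-to-least ordering from the original question can be one small helper. A minimal sketch, where sorted_by_usage is a hypothetical name and the function is generic so it does not depend on how MemCategory ends up being defined:

```rust
use std::cmp::Reverse;

/// Collects `(category, pss)` pairs and orders them from greatest to least PSS.
/// Hypothetical helper; works with any category type `C`.
fn sorted_by_usage<C>(items: impl Iterator<Item = (C, u64)>) -> Vec<(C, u64)> {
    let mut sorted: Vec<(C, u64)> = items.collect();
    // `sort_by_key` is stable, so entries with equal PSS keep their iteration order.
    sorted.sort_by_key(|(_, pss)| Reverse(*pss));
    sorted
}
```

It would be called as sorted_by_usage(memory_ext.iter()), assuming an iter method along the lines of the one above.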

Question: I'm wondering whether it is important to split MemCategory into:

  1. A simple MemKind enum (for heap, stack, etc.)
  2. The rest (dynamic keys like file-backed mappings or arbitrary names)

Would the suggestion above follow that?

My suggestion was based on their "third option" where their file_map and other_map have already been split off from the rest. So it does follow that.
