There are a number of posts on this forum about iterating over the fields of a struct. I'm looking for advice on what the best thing to do in my situation is.
My project uses the procfs crate. In particular, for each process, there is a Vec<MemoryMap> associated with it (MemoryMap in procfs::process - Rust):
pub struct MemoryMap {
pub perms: MMPermissions,
pub pathname: MMapPath,
pub pss: u64
} // simplified for clarity
pub enum MMapPath {
Path(PathBuf),
Heap,
Stack,
TStack(u32),
Vdso,
Vvar,
Vsyscall,
Rollup,
Anonymous,
Vsys(i32),
Other(String),
}
Each MemoryMap represents an entry inside the /proc/<pid>/smaps file for that process. Now, I want to aggregate all of these maps into one single struct that stores a process's memory usage by type of memory. For instance, Stack and Heap are different categories. Additionally, each unique (Path(path), MMPermissions) gets its own category. If in the Vec<MemoryMap> we encounter two maps with the same (Path(path), MMPermissions), we combine them into one map whose pss field is the sum of the two. Overall, in my first design, each process has a struct that looks like this:
pub struct MemoryExt {
pub stack_pss: u64,
pub heap_pss: u64,
pub thread_stack_pss: u64,
pub file_map: HashMap<(PathBuf, MMPermissions), u64>,
pub anon_map_pss: u64,
pub vdso_pss: u64,
pub vvar_pss: u64,
pub vsyscall_pss: u64,
pub vsys_pss: u64,
pub other_map: HashMap<String, u64>,
}
This works fine. Adding two of these is a meaningful operation for me, e.g., adding the memory usage of two child processes. Implementing it is a bit cumbersome, but it works:
use std::ops::Add;
impl Add<&MemoryExt> for MemoryExt {
type Output = MemoryExt;
fn add(self, rhs: &MemoryExt) -> MemoryExt {
MemoryExt {
stack_pss: self.stack_pss + rhs.stack_pss,
heap_pss: self.heap_pss + rhs.heap_pss,
thread_stack_pss: self.thread_stack_pss + rhs.thread_stack_pss,
file_map: add_maps(self.file_map, &rhs.file_map), // this is a function that works correctly
anon_map_pss: self.anon_map_pss + rhs.anon_map_pss,
vdso_pss: self.vdso_pss + rhs.vdso_pss,
vvar_pss: self.vvar_pss + rhs.vvar_pss,
vsyscall_pss: self.vsyscall_pss + rhs.vsyscall_pss,
vsys_pss: self.vsys_pss + rhs.vsys_pss,
other_map: add_maps(self.other_map, &rhs.other_map),
}
}
}
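A helper with the shape of add_maps might look like the sketch below. The generic signature, the trait bounds, and the choice to clone each key from the right-hand map are assumptions made for illustration; the actual function is not shown in this post.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Merges `rhs` into `lhs`, summing the values of keys present in both maps.
/// Hypothetical reconstruction: every key of `rhs` is cloned once, because
/// the entry API takes the key by value.
pub fn add_maps<K>(mut lhs: HashMap<K, u64>, rhs: &HashMap<K, u64>) -> HashMap<K, u64>
where
    K: Eq + Hash + Clone,
{
    for (key, pss) in rhs {
        // Start missing keys at 0, then add the right-hand value.
        *lhs.entry(key.clone()).or_insert(0) += *pss;
    }
    lhs
}
```

Taking the left-hand map by value and the right-hand map by reference matches how add_maps is called in the Add implementation above.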
One thing I am doing with this data is plotting the stack, then the heap, etc. for each field in the struct in a predefined order. (For each HashMap field, I can either plot the sum of its entries or each entry individually.) This struct works fine for that. However, the next thing I want to do is sort all of the fields from greatest to least memory consumption. This is where I may need to rethink the design. Here is my first attempt:
pub struct MemoryExt(HashMap<MemCategory, u64>);
pub enum MemCategory {
File(PathBuf, MMPermissions),
Heap,
Stack,
TStack,
Vdso,
Vvar,
Vsyscall,
Anonymous,
Vsys,
Other(String)
}
This has the following advantages (I think):
- constant time access when I know which category I'm looking for
- more concise implementation of Add
- iterable for free
But the following disadvantage:
- When I want to iterate through all of the File keys, for example, to aggregate the usage of all memory-mapped files, I will have to iterate through all of the other keys, too. (Probably negligible performance cost in practice, but still bugs me a little bit.)
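To make the trade-offs of the single-map design concrete, here is a minimal sketch of Add, a greatest-to-least sort, and the File aggregation. Perms is a stand-in for MMPermissions so the snippet compiles without procfs, and the derives and the method names sorted_desc and file_total are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::ops::Add;
use std::path::PathBuf;

// Stand-in for procfs's MMPermissions (assumption, keeps the sketch self-contained).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Perms(pub u8);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum MemCategory {
    File(PathBuf, Perms),
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
    Other(String),
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct MemoryExt(pub HashMap<MemCategory, u64>);

impl Add<&MemoryExt> for MemoryExt {
    type Output = MemoryExt;

    // One loop covers every category, including File and Other.
    fn add(mut self, rhs: &MemoryExt) -> MemoryExt {
        for (category, pss) in &rhs.0 {
            *self.0.entry(category.clone()).or_insert(0) += *pss;
        }
        self
    }
}

impl MemoryExt {
    /// Hypothetical helper: all categories ordered from greatest to least PSS.
    pub fn sorted_desc(&self) -> Vec<(&MemCategory, u64)> {
        let mut entries: Vec<_> = self.0.iter().map(|(c, pss)| (c, *pss)).collect();
        entries.sort_by(|a, b| b.1.cmp(&a.1));
        entries
    }

    /// Hypothetical helper: total PSS of file-backed maps.
    /// This visits every key, which is the disadvantage noted above.
    pub fn file_total(&self) -> u64 {
        self.0
            .iter()
            .filter(|(c, _)| matches!(c, MemCategory::File(..)))
            .map(|(_, pss)| *pss)
            .sum()
    }
}
```

With this shape, `a + &b` followed by `sorted_desc()` gives the ordering I am after, at the cost of the full scan in file_total.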
Another option is to make an Iter for my original struct that will visit each field in order, generating (MemCategory, u64) tuples. This seems like a good way to do it, but would also add more code.
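As a rough idea of how much code this option needs, the sketch below chains the fixed fields and the two maps into one iterator instead of writing a dedicated Iter type. Perms again stands in for MMPermissions, and the method name iter and the derives are assumptions. Entries inside each HashMap come out in unspecified order; only the position of each field group follows the struct.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Stand-in for procfs's MMPermissions (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Perms(pub u8);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum MemCategory {
    File(PathBuf, Perms),
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
    Other(String),
}

#[derive(Debug, Default)]
pub struct MemoryExt {
    pub stack_pss: u64,
    pub heap_pss: u64,
    pub thread_stack_pss: u64,
    pub file_map: HashMap<(PathBuf, Perms), u64>,
    pub anon_map_pss: u64,
    pub vdso_pss: u64,
    pub vvar_pss: u64,
    pub vsyscall_pss: u64,
    pub vsys_pss: u64,
    pub other_map: HashMap<String, u64>,
}

impl MemoryExt {
    /// Yields (category, pss) pairs, following the field order of the struct.
    pub fn iter(&self) -> impl Iterator<Item = (MemCategory, u64)> + '_ {
        // Scalar fields declared before file_map.
        let head = vec![
            (MemCategory::Stack, self.stack_pss),
            (MemCategory::Heap, self.heap_pss),
            (MemCategory::TStack, self.thread_stack_pss),
        ];
        // Scalar fields declared between file_map and other_map.
        let tail = vec![
            (MemCategory::Anonymous, self.anon_map_pss),
            (MemCategory::Vdso, self.vdso_pss),
            (MemCategory::Vvar, self.vvar_pss),
            (MemCategory::Vsyscall, self.vsyscall_pss),
            (MemCategory::Vsys, self.vsys_pss),
        ];
        let files = self
            .file_map
            .iter()
            .map(|((path, perms), pss)| (MemCategory::File(path.clone(), *perms), *pss));
        let others = self
            .other_map
            .iter()
            .map(|(name, pss)| (MemCategory::Other(name.clone()), *pss));
        head.into_iter().chain(files).chain(tail).chain(others)
    }
}
```

Sorting then becomes a matter of collecting `iter()` into a Vec and calling `sort_by` on the u64 component. The clones of paths and names could be avoided with a borrowed variant of MemCategory, which is left out here to keep the sketch short.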
A third option is in between the first two where I have something like this:
pub struct MemoryExt {
pub const_map: HashMap<MemCategory, u64>,
pub file_map: HashMap<(PathBuf, MMPermissions), u64>,
pub other_map: HashMap<String, u64>,
}
pub enum MemCategory {
Heap,
Stack,
TStack,
Vdso,
Vvar,
Vsyscall,
Anonymous,
Vsys,
}
I think this would give me the usage characteristics I want while still cutting down on the implementation of Add. But it feels wrong to use a HashMap with an enum as the key, when that's pretty much equivalent to a struct up to the ability to iterate.
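For comparison with the first design, this is roughly what Add shrinks to with three maps. It is a sketch only: Perms stands in for MMPermissions, add_maps is a guess at the shape of my existing helper, and file_total is a hypothetical name.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Add;
use std::path::PathBuf;

// Stand-in for procfs's MMPermissions (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Perms(pub u8);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MemCategory {
    Heap,
    Stack,
    TStack,
    Vdso,
    Vvar,
    Vsyscall,
    Anonymous,
    Vsys,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct MemoryExt {
    pub const_map: HashMap<MemCategory, u64>,
    pub file_map: HashMap<(PathBuf, Perms), u64>,
    pub other_map: HashMap<String, u64>,
}

// Assumed shape of the merge helper: sums values key by key.
fn add_maps<K: Eq + Hash + Clone>(
    mut lhs: HashMap<K, u64>,
    rhs: &HashMap<K, u64>,
) -> HashMap<K, u64> {
    for (key, pss) in rhs {
        *lhs.entry(key.clone()).or_insert(0) += *pss;
    }
    lhs
}

impl Add<&MemoryExt> for MemoryExt {
    type Output = MemoryExt;

    // Three lines instead of one per category.
    fn add(self, rhs: &MemoryExt) -> MemoryExt {
        MemoryExt {
            const_map: add_maps(self.const_map, &rhs.const_map),
            file_map: add_maps(self.file_map, &rhs.file_map),
            other_map: add_maps(self.other_map, &rhs.other_map),
        }
    }
}

impl MemoryExt {
    /// Total PSS of file-backed maps, without touching the other categories.
    pub fn file_total(&self) -> u64 {
        self.file_map.values().sum()
    }
}
```

Here file_total only walks file_map, so the scan over unrelated keys from the single-map design goes away, while const_map remains the part that feels like a struct in disguise.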
In summary: I'm stuck and would like to know what the community thinks is the best way to do this.