Efficiently joining child->parent vectors or iterators by key

Suppose I have obtained relational data from a SQL database (or other source) and have two vectors of the form:

parent: Vec<P> = [a, b, c, ...];
children: Vec<C> = [a1, a2, a3, ..., b1, b2, b3, ..., c1, 2, 3, ...];

where {a1, a2, ...} belong to a, etc.; and with P having element id and C having element parent_id.

I would like to output a nested data structure of the form Vec<Pc> where Pc has a vector field children (for example this might be serialized to json something like:)

[
    {
        "id": 1,
        "children": [ { ... }, ... ]
    },
...
]

What is the most idiomatic and parsimonious way to join the children with parents, assuming children is already sorted by parent_id ?

One possibility is:

for p in parents.iter() {
    parents_with_children
        .push(Pc::new(p, children.iter().filter(|c| c.parent_id == p.id));
}

But the disadvantage is that the entire list of children needs to be repeatedly scanned p.len() times.

Is there another way to do this , ideally using some standard iterator function(s), that scans the children list only once at most?

(I acknowledge that the SQL layer itself could do the joining but please take it as a requirement for purposes of this question that this is not possible or desireable due to other constraints)

If you don't need to do this streaming, the simplest way is probably by building a map instead of an iterator:

let mut joined: BTreeMap<usize, (P, Vec<C>)> = parent.into_iter().map(|p| (p.id, (p, vec![]))).collect();
for c in children {
    joined[c.parent_id].1.push(c);
}

Alternatively, it looks like itertools::GroupingMap has useful tools for this sort of thing, but I've never used it.

1 Like

This would be a good use case for group_by, but that's not stable yet. You would need extra logic to include parents not present amongst the children (if desired). Playground.

You can find the subslices yourself, though it is somewhat verbose.

You can peek and consume, though the repeating pushing and unwrapping isn't the best. Hopefully optimization would see through them.

1 Like

@quinedot Thanks for the link to group_by -- I was looking for something just like this, as I have been a frequent user of D's excellent range functions that include chunkBy, group, etc. (std.algorithm.iteration - D Programming Language).

I am surprised to see the Rust group_by with Vector as it seems a little bit like an outlier, as most of the functional-style collection manipulation functions belong to Iterator. Nevertheless, this looks like the right solution.

@2e71828 Thanks for the pointer to itertools!

If I understand correctly, the point is that group_by by definition cannot be lazy, it must process the whole collection to yield anything. Iterator adapters, on the other hand, are designed to read exactly what they need for the next element of output, without collecting to intermediate structures.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.