Grouping and Totalling over a vec of struct

Let's say I have this playground code:

#[derive(Clone, Debug)]
struct Delivery<'a> {
    customer: &'a str,
    location: &'a str,
    item: &'a str,
    date: &'a str,
    quantity: u64
}

fn main() {
    let mut delivery: Vec<Delivery> = Vec::new();
    delivery.push(Delivery { customer: "Hasan", location: "JO",
                            item: "xyz", date: "1/1/2019", quantity: 20});
    delivery.push(Delivery { customer: "Hasan", location: "SA",
                            item: "xyz", date: "1/1/2019", quantity: 30});
    
    let hasan_deliveries: Vec<Delivery> = delivery
        .into_iter()
        .filter(|d| d.customer == "Hasan")
        .collect();

    println!("Total deliveries to: {}, are: {}",
             hasan_deliveries[0].customer,
             hasan_deliveries[0].quantity);
}

The output will be:

Total deliveries to: Hasan, are: 20

I want to be able to get:

Total deliveries to: Hasan, are: 50

I also want to avoid collecting hasan_deliveries into a Vec: the result should be a single Delivery in which some kind of grouping has been done.

I was able to return the total only, as:

    let sum: u64 = delivery.into_iter()
                           .filter(|d| d.customer == "Hasan")
                           .fold(0, |sum, d| sum + d.quantity);
    println!("Total deliveries are: {}", sum);

This returned:

Total deliveries are: 50
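As an aside, the total on its own does not need a mutable accumulator: mapping each delivery to its quantity and calling sum does the same job as the fold. A minimal sketch, reusing the question's Delivery struct; sample and total_for are hypothetical helper names introduced only for this illustration.

```rust
// Same fields as the question's struct; unused fields are kept for fidelity.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct Delivery<'a> {
    customer: &'a str,
    location: &'a str,
    item: &'a str,
    date: &'a str,
    quantity: u64,
}

// Hypothetical helper: the two deliveries from the question.
fn sample() -> Vec<Delivery<'static>> {
    vec![
        Delivery { customer: "Hasan", location: "JO", item: "xyz", date: "1/1/2019", quantity: 20 },
        Delivery { customer: "Hasan", location: "SA", item: "xyz", date: "1/1/2019", quantity: 30 },
    ]
}

// Hypothetical helper: sum the quantities delivered to `customer`.
fn total_for(deliveries: &[Delivery], customer: &str) -> u64 {
    deliveries
        .iter()
        .filter(|d| d.customer == customer)
        .map(|d| d.quantity)
        .sum()
}

fn main() {
    let deliveries = sample();
    // Prints "Total deliveries are: 50"
    println!("Total deliveries are: {}", total_for(&deliveries, "Hasan"));
}
```

Iterating with iter() instead of into_iter() also leaves the original Vec usable afterwards.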

But I need a mix of both: the result should be a struct that is properly grouped and summed, like an SQL GROUP BY.

In functional programming, fold (sometimes called reduce) is a natural fit for collapsing a collection of items into a smaller collection or a single item.

let init = Delivery {
    customer: "Hasan",
    location: "JO",
    item: "xyz",
    date: "1/1/2019",
    quantity: 0,
};

let hasan_deliveries = delivery.into_iter().fold(init, |mut acc, nxt| {
    if nxt.customer == "Hasan" {
        acc.quantity += nxt.quantity;
    }
    acc
});

println!(
    "Total deliveries to: {}, are: {}",
    hasan_deliveries.customer, hasan_deliveries.quantity
);

Note that the above code has a problem: while it aggregates the quantity, the location information (whether "JO" or "SA") is lost. Since the request was for a single Delivery object, I simply hard-coded one of the locations ("JO") in the initial value. You can change the code to group by location as well as by customer, but then it will return a collection instead of a single instance.
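To illustrate that last point, here is a minimal sketch that folds one customer's deliveries into one Delivery per location. It assumes a BTreeMap keyed by location as the "collection" (a HashMap would work too, just without a stable output order); per_location and sample are hypothetical names, and the third sample row is invented so that one location actually has two deliveries to merge.

```rust
use std::collections::BTreeMap;

#[allow(dead_code)]
#[derive(Clone, Debug)]
struct Delivery<'a> {
    customer: &'a str,
    location: &'a str,
    item: &'a str,
    date: &'a str,
    quantity: u64,
}

// Hypothetical sample data: the question's two rows plus an invented third row.
fn sample() -> Vec<Delivery<'static>> {
    vec![
        Delivery { customer: "Hasan", location: "JO", item: "xyz", date: "1/1/2019", quantity: 20 },
        Delivery { customer: "Hasan", location: "SA", item: "xyz", date: "1/1/2019", quantity: 30 },
        Delivery { customer: "Hasan", location: "JO", item: "xyz", date: "2/1/2019", quantity: 5 },
    ]
}

// Fold one customer's deliveries into one Delivery per location, summing
// quantities. item/date are copied from the first delivery seen for each
// location, an arbitrary choice similar to the hard-coded "JO" above.
fn per_location<'a>(deliveries: &[Delivery<'a>], customer: &str) -> BTreeMap<&'a str, Delivery<'a>> {
    deliveries
        .iter()
        .filter(|d| d.customer == customer)
        .fold(BTreeMap::<&'a str, Delivery<'a>>::new(), |mut acc, d| {
            acc.entry(d.location)
                // Location already seen: add this delivery's quantity.
                .and_modify(|e| e.quantity += d.quantity)
                // First delivery for this location: start from a copy of it.
                .or_insert_with(|| d.clone());
            acc
        })
}

fn main() {
    let deliveries = sample();
    // BTreeMap iterates in key order, so JO is printed before SA.
    for (location, d) in per_location(&deliveries, "Hasan") {
        println!("{} at {}: {}", d.customer, location, d.quantity);
    }
}
```

With this sample data it prints "Hasan at JO: 25" followed by "Hasan at SA: 30".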


Thanks, slot.
What if I want to do it for everything, without filtering, i.e. for each customer and each location?

How about using group_by from the itertools crate?

(Note: I have taken the liberty of altering your original code to provide a minimal working example. You can always adapt it.)

use itertools::Itertools;

#[derive(Debug)]
struct Delivery {
    customer: String,
    location: String,
    quantity: u64,
}

pub fn main() {
    let deliveries = vec![
        Delivery {
            customer: String::from("H"),
            location: String::from("JO"),
            quantity: 10,
        },
        Delivery {
            customer: String::from("H"),
            location: String::from("SA"),
            quantity: 35,
        },
        Delivery {
            customer: String::from("H"),
            location: String::from("SA"),
            quantity: 10,
        },
        Delivery {
            customer: String::from("Q"),
            location: String::from("JO"),
            quantity: 5,
        },
    ];
    
    let result: Vec<((String, String), Delivery)> = deliveries
        .iter()
        // We are grouping by del.customer, del.location
        .group_by(|del| (del.customer.clone(), del.location.clone()))
        .into_iter()
        .map(|(group, records)| {
            let init = Delivery {
                customer: group.0.clone(),
                location: group.1.clone(),
                quantity: 0,
            };
            let agg_qty = records.into_iter().fold(init, |mut acc, nxt| {
                acc.quantity += nxt.quantity;
                acc
            });
            (group, agg_qty)
        })
        .collect();
        
    println!("{:?}", result);
}

The above implementation involves multiple passes (loops) over the data. If needed, you can use a HashMap<(customer, location), Delivery> to implement the grouping and aggregating logic yourself in a single pass.


Eh, no. itertools' group_by is different from SQL GROUP BY. For example, consider AA BA BB AB, grouped by the first letter. itertools creates 3 groups, {AA}, {BA, BB} and {AB}, whereas SQL GROUP BY gives 2 groups, {AA, AB} and {BA, BB}.
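The difference can be reproduced without itertools. The sketch below hand-writes both behaviours with std only: group_runs mimics run-based grouping (consecutive equal keys only), group_sql puts every item with the same key into one group. Both function names are hypothetical, and the first-letter key assumes ASCII strings.

```rust
use std::collections::BTreeMap;

// Group *consecutive* items that share a key (their first letter), i.e. "runs".
fn group_runs<'a>(items: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<Vec<&'a str>> = Vec::new();
    for &item in items {
        let same_key = match groups.last() {
            Some(run) => run[0][..1] == item[..1],
            None => false,
        };
        if same_key {
            // Same first letter as the previous item: extend the current run.
            groups.last_mut().unwrap().push(item);
        } else {
            // Key changed (or first item): start a new group.
            groups.push(vec![item]);
        }
    }
    groups
}

// SQL-style grouping: every item with the same key lands in the same group,
// regardless of where it appears in the input.
fn group_sql<'a>(items: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &item in items {
        groups.entry(&item[..1]).or_default().push(item);
    }
    groups
}

fn main() {
    let items = ["AA", "BA", "BB", "AB"];
    println!("{:?}", group_runs(&items)); // [["AA"], ["BA", "BB"], ["AB"]]
    println!("{:?}", group_sql(&items)); // {"A": ["AA", "AB"], "B": ["BA", "BB"]}
}
```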

That is correct. Although I did not mention it in my reply, it is documented behaviour. Specifically this part:

Consecutive elements that map to the same key (“runs”), are assigned to the same group.

Therefore, for group_by to behave like SQL GROUP BY, the input must be sorted by the group key (or at least all elements with equal keys must be adjacent).
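A minimal std-only sketch of the "sort first, then group consecutive runs" fix, applied to the Delivery shape from the itertools example. The run merging is hand-written rather than using itertools, and group_sorted and delivery are hypothetical names. The sample reorders the earlier data so that the two ("H", "SA") rows are not adjacent in the input.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Delivery {
    customer: String,
    location: String,
    quantity: u64,
}

// Hypothetical constructor to keep the sample data short.
fn delivery(customer: &str, location: &str, quantity: u64) -> Delivery {
    Delivery { customer: customer.to_string(), location: location.to_string(), quantity }
}

// Sort by the group key (customer, location) so equal keys become adjacent,
// then merge each run of equal keys into one Delivery with the summed quantity.
fn group_sorted(mut deliveries: Vec<Delivery>) -> Vec<Delivery> {
    deliveries.sort_by(|a, b| (&a.customer, &a.location).cmp(&(&b.customer, &b.location)));
    let mut out: Vec<Delivery> = Vec::new();
    for d in deliveries {
        let same_group = out
            .last()
            .map_or(false, |last| last.customer == d.customer && last.location == d.location);
        if same_group {
            // `out` is non-empty whenever same_group is true, so unwrap is safe.
            out.last_mut().unwrap().quantity += d.quantity;
        } else {
            out.push(d);
        }
    }
    out
}

fn main() {
    let data = vec![
        delivery("H", "JO", 10),
        delivery("H", "SA", 35),
        delivery("Q", "JO", 5),
        delivery("H", "SA", 10),
    ];
    for d in group_sorted(data) {
        println!("{} {} {}", d.customer, d.location, d.quantity);
    }
}
```

This prints "H JO 10", "H SA 45" and "Q JO 5". Sorting costs O(n log n), which is the price for getting SQL-style groups out of a run-based grouping.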

Thanks, I'll check it once I get home.

How?

Similar to the itertools approach, we use the pair (name, location) as the key of the hash map. For each record in the dataset, if the map already contains that key, we retrieve the record associated with it and add to its quantity; if not, we first insert a new record with quantity 0 for that key.

use std::collections::HashMap;

#[derive(Debug)]
struct Foo {
    name: String,
    location: String,
    quantity: i32,
}

pub fn main() {
    let data = vec![
        Foo {
            name: String::from("P"),
            location: String::from("A"),
            quantity: 15,
        },
        Foo {
            name: String::from("P"),
            location: String::from("A"),
            quantity: 35,
        },
        Foo {
            name: String::from("Q"),
            location: String::from("A"),
            quantity: 5,
        },
        Foo {
            name: String::from("Q"),
            location: String::from("B"),
            quantity: 10,
        },
        Foo {
            name: String::from("R"),
            location: String::from("C"),
            quantity: 20,
        },
        Foo {
            name: String::from("R"),
            location: String::from("A"),
            quantity: 7,
        },
        Foo {
            name: String::from("Q"),
            location: String::from("B"),
            quantity: 12,
        },
    ];

    // Each key of this map is one (name, location) group.
    let mut map: HashMap<(String, String), Foo> = HashMap::new();
    for d in data {
        // or_insert_with only builds the zero-quantity record when the key is new.
        let record = map
            .entry((d.name.clone(), d.location.clone()))
            .or_insert_with(|| Foo {
                name: d.name.clone(),
                location: d.location.clone(),
                quantity: 0,
            });
        record.quantity += d.quantity;
    }

    println!("Grouped by location and name: {:?}", map);
}

Thanks a lot, appreciated.

For reference: itertools' into_group_map (or something along those lines) is the closer equivalent of what you are calling SQL GROUP BY.
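itertools' into_group_map collects an iterator of (key, value) pairs into a HashMap<K, Vec<V>>, so summing each Vec afterwards gives a GROUP BY with SUM. Below is a std-only sketch of the same idea; group_map and totals are hypothetical names, not the itertools API, and BTreeMap is used instead of HashMap only so the output order is stable. The rows are the (name, location, quantity) values from the HashMap example.

```rust
use std::collections::BTreeMap;

// Collect (key, value) pairs into a map from each key to all of its values.
fn group_map<K: Ord, V>(pairs: impl IntoIterator<Item = (K, V)>) -> BTreeMap<K, Vec<V>> {
    let mut map: BTreeMap<K, Vec<V>> = BTreeMap::new();
    for (k, v) in pairs {
        map.entry(k).or_default().push(v);
    }
    map
}

// GROUP BY (name, location) with SUM(quantity), built on group_map.
fn totals(rows: &[(&str, &str, u64)]) -> BTreeMap<(String, String), u64> {
    group_map(rows.iter().map(|&(name, loc, qty)| ((name.to_string(), loc.to_string()), qty)))
        .into_iter()
        .map(|(key, qtys)| (key, qtys.iter().sum::<u64>()))
        .collect()
}

fn main() {
    let rows: &[(&str, &str, u64)] = &[
        ("P", "A", 15), ("P", "A", 35), ("Q", "A", 5), ("Q", "B", 10),
        ("R", "C", 20), ("R", "A", 7), ("Q", "B", 12),
    ];
    for ((name, location), total) in totals(rows) {
        println!("{} {} {}", name, location, total);
    }
}
```

This prints one line per group in key order: P A 50, Q A 5, Q B 22, R A 7, R C 20. Unlike the single-pass HashMap version, it keeps every individual value until the end, which is only worth it if you need more than the sum per group.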