Itertools group_by confusion

dingari · December 10, 2019, 11:58pm

Hi everyone, I've decided to stop banging my head against the wall on this one.

I'm trying to use the group_by method from itertools. I thought I understood how it works, but I must be doing something wrong...

I'm iterating over some data on the form of ((x: i32, y: i32), z: i32), and I'd like to use group_by on the z element. This is where it all falls apart. Sometimes, the output is correctly grouped, but sometimes elements with the same key value end up in different groups. If you take a look at the output below you'll understand. Try running the snippet a few times in the Playground. I'm guessing you'll get various results.

So my question is: What am I doing wrong?

Edit: Reading through the docs it seems like the magic words are consecutive elements.

use std::collections::HashSet;
use itertools::Itertools;

fn main() {
    let s = vec![((0, 1), 1), ((0, 2), 2), ((1, 2), 2), ((3, 1), 1)].into_iter().collect::<HashSet<_>>();
    
    println!("{:?}\n", s);
    
    s.iter()
        .group_by(|&x| x.1)
        .into_iter()
        .map(|(k, r)| (k, r.collect::<Vec<_>>()))
        .for_each(|x| println!("Key: {:?} | Group: {:?}", x.0, x.1));
}

(Playground)

Output:

{((3, 1), 1), ((0, 2), 2), ((0, 1), 1), ((1, 2), 2)}

Key: 1 | Group: [((3, 1), 1)]
Key: 2 | Group: [((0, 2), 2)]
Key: 1 | Group: [((0, 1), 1)]
Key: 2 | Group: [((1, 2), 2)]

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.31s
     Running `target/debug/playground`

uberjay · December 11, 2019, 12:31am

I got bit by this once a while back, probably due to my familiarity with ruby's Enumerable method of the same name.

I suspect the reason for this is that haskell's groupBy groups consecutive elements as well.

Also, restructuring the input data as a HashMap of vectors is also a pretty heavyweight operation that can't be done lazily. (so it can't offer a simple iterator adapter)

itertools does have an option for you, however: into_group_map.

EDIT: playground example.

uberjay · December 11, 2019, 12:48am

Related: RFC for adding group_by to rust std. I sort of regret the bike-shedding in there -- I think it'd be a great addition to the standard library, despite its name colliding with ruby's!

H2CO3 · December 11, 2019, 6:00am

I'd say there's a good reason for that in itself, other than "because Haskell". A general take-it-all group_by would need to allocate and exhaust the whole iterator eagerly, in order to put the elements in a map of some sort so as to group all of them. This:

wastes memory if you already have pre-sorted/pre-grouped data;
wastes time if you only want to consume part of the iterator;
imposes an additional Ord or Hash constraint on the key type.

In contrast, if it works on consecutive elements, it can still operate lazily, with O(1) memory and only a K: PartialEq bound, and it's trivial to opt-in to the globally-grouping and allocating behavior by collecting the iterator into a vector then sorting it.

system · March 10, 2020, 6:00am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Getting data back out of itertools group_by method help	3	663	January 12, 2023
Using an Iterator to group structured data	3	839	July 7, 2019
Group by keys of several hashmaps help	2	515	November 26, 2022
Group_by challenge	7	1696	April 29, 2020
Is there an itertools fn for this?	16	568	January 18, 2023

Itertools group_by confusion

Related Topics