How to group elements in vector, and then split the groups into further groups

I have the following problem and I want to solve it in idiomatic Rust.

Given a set of lines, which are directed upwards. Action must be taken whenever an old line ends or a new line appears. I call this event.

I have created enum EventType { Begins(Rc<L>), Ends<Rc<L>> } and now have a vector of EventTypes, sorted in the order that I need the events to be processed by my algorithm.

Here is the question. Given this sorted vector of EventType, how to

  1. group it by collecting the EventTypes that are Equal as seen by my overload function
  2. for each of the previous groups, split it into two groups: the ones that have EventType equal to Begins, and the ones that have EventType Ends.

Simple example of the problem

I'm curious to know why cutting down on typing is the goal here?

Personally I would rather aim for code that is as obvious as possible to future readers. Which includes myself when I read the thing a year later and have forgotten all the tricks I used to minimize character/line count.

1 Like

Well, maybe I feel like a monkey that has just got powerful tools to play with (so many iterators in the standard library and itertools!), probably took it too far :smile:

1 Like

I would suggest showing some actual code or better yet linking to a working (if unoptimized) example on Rust Playground. I'm having a hard time following what your program is doing by your description

2 Likes

If you just want to sort things by some predicate, you can use slice::sort_by

1 Like

I have updated the post, maybe that helps a bit now to understand the problem?

You can use existing implementations of Ord/PartialOrd

starts.sort_by(|x, y|  x.line.start.y.cmp(&y.line.start.y));
// or if `Ord` isn't implemented
starts.sort_by(|x, y|  x.line.start.y.partial_cmp(&y.line.start.y).unwrap());

and you can use match to extract things from enums

match x {
    Event::Begin(value) => lhs = value.start.y,
    Event::End(value) => rhs = value.start.y,
}
3 Likes

In Rust syntax this would be

events.sort_by(|x, y| match (x, y) {
    (Event::Begins(lhs), Event::Ends(rhs))
    | (Event::Ends(rhs), Event::Begins(lhs)) => {
        lhs.start.y.cmp(rhs.end.y)
    }
    _ => panic!("double begins or double ends, what do?"),
});

I've included a panic, as your implementation didn't specify what to do if comparing two begins or two ends... I don't really understand what you want to do in that case, either?

It's still pretty confusing, though, what you want to do.

Could you give some example input datasets and what the output should be for each one? I think that could help us understand what you intend with this code.

Wow, I did nested match-expressions before, now using the construct that u pointed out!

I edited the post once again. Hope the question itself is now clear.

Sorry, I think it's slightly helpful but not exactly what I meant. Could you write out an exact input case - say, 4 elements that that input Vec<EventType> would be, and then the exact list of output EventTypes with inner data that you would expect? Like the exact data, if I were to have called eprintln!("{:#?}", x); on the input and outputs.

It'd just be helpful for thinking about the algorithm - an example case to demonstrate how it should ideally behave.

I see you've also put some compiling code in the original post now, though! I really appreciate that. Does that code produce the output you want now, or is it still wrong? If it's working, I would have some suggestions going off of @RustyYato's post for how to improve it and reduce the code. But I don't want to do that without knowing if it is behaving as you want it to.

Sure, here is my example. It is simple, and it compiles, and represents the problem that I am trying to solve. As you can see, I hardcoded (see TODO comment) the place that I need to implement. And also the code now is quite big, while it should be fine, I still would appreciate advice on reducing its size.

Regarding the example in the first post, I guess we can forget about it :slight_smile: , I will edit it when the problem is solved. I have abstracted away all the details in the recent example.

And yes, the example in the original post is what I currently have and what works as expected except that I need to finish the fn preprocess(...)

Awesome, thanks!

If it seems reasonable to implement PartialEq matching your PartialOrd, then I think what you want is itertools::group_by, from the itertools crate. It's an extra dependency, but I think it's worth it for how simple it makes this problem. In your latest example, if you add a PartialEq impl to Fruit which just calls your order function and checks if its equal, then your TODO line can be implemented as


let fruit_sorted_by_price_grouped: Vec<Vec<Fruit>> = fruit
    .into_iter()
    .group_by(|f| f)
    .into_iter()
    .map(|(f, items)| items.collect())
    .collect();

If you don't want to use that, or it isn't reasonable to implement PartialEq, we can duplicate its behavior pretty easily. You'll want to pull the comparator function out so we can use it down below as well. If you do that, then we can implement this with:

let fruit_sorted_by_price_grouped: Vec<Vec<Fruit>> = {
    let mut groups = Vec::new();
    let mut this_group = Vec::new();
    for f in fruit {
        if this_group.is_empty() || fruit_comparator(&this_group[0], &f) == Ordering::Equal {
            this_group.push(f);
        } else {
            groups.push(this_group);
            this_group = vec![f];
        }
    }
    groups
};

Here's a playground link adding this method into your new example: Rust Playground

Let me know if that captures the logic you're looking for, or if you have any questions about parts of it!

1 Like

Wow, it makes much more sense to me now. So the group_by() uses equliaty relation, which delegates to the ordering function. Cool! The result of group_by() can be turned into an iterator itself by calling into_iter() so we can iterate over the (key, group) tuples, extract the group with map() and move it to the vector.
Correct?

I would go with the first option, but the latter is interesting in that it uses the closure - I would not write this myself.

However, I get a lifetime error which I cannot explain. Here is my final code, this is the program so far and I would say that it does what I want (at least it should), but there is a lifetime error which I cannot explain. What is wrong with preprocess()? For the sake of complete clarity and to see the compiler errors, I pasted the whole program now. But the problem should be somewhere in preprocess().

code

My understanding of lifetime error messages is pretty shaky...

1 Like

Yep! Looks right to me.

Fair! If you have an equality relation, I'd probably use that rather than the closure in the second half too. It just uses the closure since that's what you had in your example implementing the ordering relation.

I think my example was incorrect here, too. It's due to using a reference as the key to group_by - I think it might need to be copied or cloned rather than just given back.

The function to group_by is usually for extracting some sort of "key" to sort elements by. But it has to be copied - it can't actually borrow from the elements, since group_by will move the elements out into the iterator results, and it still needs the key to use it later & compare with later elements...

If your events can be cheaply cloned, then I recommend replacing |x| x with something like |x| x.clone() or types::EventType::clone. If not, then I recommend going with the manual non-itertools version.

I agree that using the closure in the second version is icky. But it should be easy to change it to use an Ord implementation, or anything else, rather than the closure.

1 Like

As you said, after implementing Clone for my EventType, it works. Wow, learned so much from this thread, thanks everyone for participation.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.