Returning an iterator - cannot express the return type

I have a function that takes a map from string to count, and returns an iterator on those values in decreasing order of count. The function is relatively simple (advice on how to improve it is, of course, welcome :slightly_smiling_face:) but I don't know how to annotate the return type.

fn sorted(words: HashMap<String, i32>) {
    let vec: Vec<(i32, String)> = words.into_iter().map(|(s,n)| (n,s)).collect();
    let mut heap: BinaryHeap<(i32, String)> = vec.into();
    std::iter::from_fn(|| heap.pop())
}

The compiler isn't much help here, it says "note: expected unit type () found struct FromFn<[closure@src\main.rs:30:24: 30:37]>". But of course, I can't express that type statically in my code. I assume I should probably use a generic somehow, to let Rust infer the type, but I don't know how to do that. Can anyone help?

You should use the feature described under Returning Types that Implement Traits in chapter 10.2 of the Rust book. It looks like this:

fn sorted(words: HashMap<String, i32>) -> impl Iterator<Item = (i32, String)> {
    let vec: Vec<(i32, String)> = words.into_iter().map(|(s,n)| (n,s)).collect();
    let mut heap: BinaryHeap<(i32, String)> = vec.into();
    std::iter::from_fn(|| heap.pop())
}

As a side-note, it would be faster to just sort the vector unless you only actually consume a few of the items and throw away the iterator before reading it to the end.

2 Likes

Ah, thanks! I thought there would probably be something simple, but I didn't know what terms to use to search for it.

Yes, I realise that, thanks. In my caller, I'm doing .take(n) where n is a user-specified limit on how many to report. The data is word counts from a number of books' worth of data, so I'm assuming it's reasonably big.

In reality, though, this is a toy example. I found it in a post that was describing how you'd code a somewhat real-world problem in modern C++, and I wanted to see how it looked in Rust, if nothing else as an exercise to learn from something a little more complex than a tutorial example.

Note that you can achieve the same in this way:

fn sorted_truncated(words: HashMap<String, i32>, len: usize) -> Vec<(i32, String)> {
    if len == 0 {
        return Vec::new();
    }

    let mut vec: Vec<(i32, String)> = words.into_iter().map(|(s,n)| (n,s)).collect();
    
    vec.select_nth_unstable(len - 1);
    vec.truncate(len);
    vec.sort_unstable();
    
    vec
}
3 Likes

Just a minor note that the closure needs to be move as the heap is created locally.

-    std::iter::from_fn(|| heap.pop())
+    std::iter::from_fn(move || heap.pop())

Playground.

2 Likes

Oh wow, that's much better - I wasn't aware of select_nth_unstable. And thete's even a _by_key version that means I can skip that awkward swap of the elements of the map.

I wondered about that. I'm not 100% clear on when move is needed yet, and the version without didn't give any errors. So for my education what is actually going to go wrong with the version without move? It's working in my tests...

Edit: Oh wait, no it's not :slightly_frowning_face: Maybe I didn't test again after fixing the return type (before I needed the return, I was doing the .take and the printing of the results within the function, so nothing outlived the heap variable...) Thanks for the fix!

If you remove the move in my Playground link, you can see the error that results - the function owns the heap, the closure captures a reference to it, and then the closure (containing the reference) is returned. But then the heap is dropped at the end of the function, hence the borrow check error.

With move, the closure captures and owns the entire heap instead. You return the closure, including the owned heap, and so it doesn't drop until later.

Without the move, the compiler generates the following struct for the closure:

struct MyClosure<'a> {
    heap: &'a mut BinaryHeap<(i32, String)>,
}
impl<'a> MyClosure<'a> {
    fn call(&mut self) -> Option<(i32, String)> {
        self.heap.pop()
    }
}

If you add a move, then it takes ownership instead:

struct MyClosure {
    heap: BinaryHeap<(i32, String)>,
}
impl MyClosure {
    fn call(&mut self) -> Option<(i32, String)> {
        self.heap.pop()
    }
}

The version with the reference doesn't compile because the BinaryHeap is destroyed when you return from the function, which invalidates the reference.

4 Likes

If I want to use _by_key, I need vec.select_nth_unstable_by_key(len - 1, |(_,n)| *n). That took me a while to understand:

  1. I got an error saying that just using n wouldn't work, because the return value needed to outlive the argument. OK, I understood that, although I was a little confused because n is a number, so who cares[1]?
  2. So my first thought is that I need to take a copy. But .copy() doesn't work, oh wait there's .clone(). And that works, and I might have left it at this point. But digging around in the docs I see that copy is the bit-by-bit one, so why don't numbers implement copy? It's at this point that I notice (by looking at the tooltip in VS Code) that n is actually &i32, and then I remember that there's *n to deal with this. And that works, so I'm good.
  3. And I even understand the reason that the closure takes an &(String, i32) is because there's no need to copy here, just pass a reference. And then there's something about parts of a reference are themselves references (which I vaguely recall has a name, but can't remember much more than that) so that's why I have an &i32.

So all of that is more or less good. But that * bothers me a bit. I don't see a lot of * operators in other people's Rust code that I'm reading, so is that really good practice? Is there a better way that I could have done this? What I have works, and isn't notably verbose, so I don't need anything better, but as I said, this is about learning, so I'm interested in how I should think about this sort of thing, and what is idiomatic. Would I have been better not destructuring the argument, and using |e| e.1?

Also, how should I have thought about what was happening with the key function to simplify that convoluted train of thought I described above?

Thanks for any help, and for what's been given so far. I'm finding this very illuminating :slightly_smiling_face:


  1. It's actually a reference, I get that point later... ↩ī¸Ž

1 Like

That's a bunch of questions. Here are a bunch of answers.

  1. There's no such thing as .copy(). The way you invoke the bitwise copy of the Copy trait is exactly what you're doing by writing *n.
  2. The .clone() method does the same thing as *n for types that are Copy.
  3. The reason that n is a reference is because when you write |(_,n)|, this is a pattern. You are matching on the type &(String, i32) with a tuple pattern. This has the effect of giving you a reference to each field of the tuple, which is why you get n being an &i32.
  4. Not being able to return a reference from the *_by_key methods is a limitation of the way generic closures work. If you run into a situation where using Copy doesn't work, then you can use *_by instead which gives you two elements to compare. In your case that would be select_nth_unstable_by(len-1, |((_,n1), (_,n2))| n1.cmp(n2))
  5. The reason you rarely explicitly write *n is that many operators automatically dereference for you. For example. when calling a method with the dot operator (such as in n.clone()), then the compiler automatically inserts the appropriate number of * or & operations on n. Using it explicitly in situations like this one is perfectly fine, and you don't need to do |e| e.1 instead.
4 Likes

You might be interested in Itertools::k_smallest, then -- it only needs O(k) memory instead of O(n) memory of the select_nth_unstable+sort approach, though it's O(n 𝘭𝘰𝘨 k) runtime complexity instead of O(n + k 𝘭𝘰𝘨 k).

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.