Question about Iterators, Closures, Borrowing, and Lifetime

I'm working through the Rust book and I'm on the chapter about closures. I'm refactoring the minigrep code with what I've learned, and I ran into a compilation issue that I don't understand.

// Fails with "error[E0373]: closure may outlive the current function, but it borrows `query`, which is owned by the current function"
pub fn search_iter<'a>(
    query: &str,
    contents: &'a str
) -> impl Iterator<Item = &'a str> {
    contents
        .lines()
        .filter(|line| line.contains(&query))
        //      ^^^^^ add a move before this to fix
}

However, this builds fine:

pub fn search_case_insensitive_iter<'a>(
    query: &str,
    contents: &'a str
) -> impl Iterator<Item = &'a str> {
    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&query.to_lowercase()))
}

I understand that query.to_lowercase() will return a string, but I don't understand how that changes things.

My understanding is as follows:

  1. "query" is passed in as a &str, and is therefore a reference (string slice reference).
  2. By default, closures capture by reference. Adding move will cause the closure to take ownership of the captured item.
  3. Iterators are lazy, so the filter isn't even executed at the time a caller invokes either of these functions, but while iterating over data afterward.

If query is a string reference, why does adding "move" do anything? The closure outlives the "search" function, but it also outlives it in the function with to_lowercase.

Why does the to_lowercase returning a temporary string change this? I would assume that it's based on the fact that the string would be owned by the closure since it was created within its scope, but why then is the reference to query valid?

This also works. I don't have a complete explanation, but apparently the issue is type inference rather than the borrow checker.
If you look at contains, it has a generic bound on the nightly-only trait Pattern, which has impls for both &'b str and &'c &'b str. My guess is that if the compiler picked the second one, it would lead to this error. That would also explain why move fixes it.
By specifying the generic explicitly, I force the compiler to use the fact that references are Copy and make a copy of it.

edit:
This doesn't compile:

pub fn search_case_insensitive_iter<'a>(
    query: &str,
    contents: &'a str
) -> impl Iterator<Item = &'a str> {
    let lowercase: String = query.to_lowercase();
    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&lowercase))
}

So to me it seems that the second example may recompute to_lowercase every time the closure is called, but I'm not sure about that. I would probably compute the String beforehand and use move, just to prevent that.
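
A minimal sketch of that idea: lowercase the query once, then move the resulting String into the closure so that it is not recomputed for every line. The local name `lowercase` and the call to `as_str()` are illustrative choices, not something taken from the posts above.

```rust
pub fn search_case_insensitive_iter<'a>(
    query: &str,
    contents: &'a str,
) -> impl Iterator<Item = &'a str> {
    // Computed once, before the iterator is built.
    let lowercase = query.to_lowercase();
    contents
        .lines()
        // `move` transfers ownership of `lowercase` into the closure,
        // so the closure no longer refers to anything local to this function.
        // `as_str()` yields a `&str`, which implements `Pattern`.
        .filter(move |line| line.to_lowercase().contains(lowercase.as_str()))
}
```

Because the closure owns its String, the returned iterator borrows only from `contents`.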

The issue is the closure, by default, captures a reference to the string reference. The underlying data (the raw str) still exists, but the reference to the data (&str) only lives for as long as the function, so the reference to that reference (&&str) will stop being valid when the function returns. Adding move makes it capture the string reference (&str) instead.
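
A sketch of the `move` fix described here. One assumption to flag: the signature gives `query` the same lifetime `'a` as `contents`, so that the example does not depend on which lifetimes a given edition lets `impl Trait` capture. The signature in the question uses an elided lifetime for `query` instead.

```rust
pub fn search_iter<'a>(
    query: &'a str,
    contents: &'a str,
) -> impl Iterator<Item = &'a str> {
    contents
        .lines()
        // With `move`, the closure stores a copy of the `&str` itself
        // rather than a `&&str` pointing at the local variable `query`,
        // so nothing in the closure refers to this function's stack frame.
        .filter(move |line| line.contains(query))
}
```

The underlying string data is not copied or moved; only the reference is.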

For search_case_insensitive_iter, adding move actually does nothing at all, because query is already an owned capture. That's because the closure can see the receiver of to_lowercase is the owned version instead of a reference (&str instead of &&str).

This may be confusing since &str is a Copy type, so going from &&str to &str is free, but the closure capturing rules don't distinguish between Copy and !Copy types.

So to rephrase, you're saying that the move serves to establish a reference to the original string slice, instead of a double reference which will go out of scope at the end of the function? If so, that makes sense, I think!

Could you elaborate on what you mean by "the receiver"? When you say the "owned version", are you referring to a reference to the String returned by to_lowercase (a string slice through the addition of the ampersand)?

Additionally, how does that square with @increasing's playground link, where the following line satisfies the compiler:

        .filter(|line| line.contains::<&str>(query))

Does specifying the type have an impact on ownership?

Capture analysis prefers to capture by shared reference, then exclusive reference, then moves. The failure cases capture query by reference. There have been two types of succeeding cases presented: Those that force query to move, and those that capture &*query by reference.

Here's some non-move code samples, what they call, and whether or not they fail.

// The OP -- would call `contains::<&&str>`, fails
.filter(|line| line.contains(&query))

// Getting rid of an unnecessary `&` would call `contains::<&str>`, fails
.filter(|line| line.contains(query))
// Fails with *mismatched types* (illustrating a lack of auto-ref)
// i.e. demonstrating that the above is not trying to call `contains::<&&str>`
.filter(|line| line.contains::<&&str>(query))

// Turbofishing `contains::<&&str>` -- fails
.filter(|line| line.contains::<&&str>(&query))

// Turbofishing `contains::<&str>`, but relying on deref coercion -- fails
.filter(|line| line.contains::<&str>(&query))

// Turbofishing `contains::<&str>` -- *succeeds*
.filter(|line| line.contains::<&str>(query))

// Inserting manual reborrow: calls `contains::<&str>` -- *succeeds*
.filter(|line| line.contains(&*query))

// Manual reborrow and a temp reference: calls `contains::<&&str>` -- *succeeds*
.filter(|line| line.contains(&&*query))

About the type mismatch example: The closure captures a field = &query and then calls line.contains(*field). It's still enough to capture a shared reference, like closure analysis prefers to do, because query is Copy.

For the other examples, what's the difference internally? The failing ones are capturing a reference to query, whereas the succeeding ones are capturing a reference to *query. That's RFC 2229 in action. You can see this in the playground using some unstable attributes.

// Capturing &query
note: Min Capture query[] -> Immutable

// Capturing &*query
note: Min Capture query[Deref] -> Immutable

If you try the compiling versions on Edition 2018, they don't use RFC 2229 rules, and they become errors. Capturing *query by reference isn't an option on that edition.


Likewise, search_case_insensitive_iter fails on edition 2018 without move.

On edition 2021+, it captures *query by shared reference, not by value / move. (Add move to see a difference.)

With query.to_lowercase(), the capture analysis can tell that it is enough to capture *query, whereas with .contains(query), it cannot. And I agree the reason is that contains takes a generic. This is probably similar to how a generic can make the compiler not do a reborrow of a &mut _.


So to clarify:

  • search_case_insensitive_iter does capture by shared reference
  • The OP error may remain when the receiver of contains is &str, not just &&str

They do; note the difference in the first pass.

Specifying the type has an impact on the capture analysis, which could sometimes mean borrowing some different place (as above) or reborrowing instead of moving.

That's quite a lot of information and I admit most of it went over my head.

I've never heard of reborrowing, but I think I get the gist of it from your reply. If I understand what you're saying, by dereferencing and then re-referencing, I "renew" the borrow, essentially creating a new reference to query in the closure that isn't tied to the scope of the outer function.
For whatever reason, calling query.to_lowercase() causes a capture of the dereferenced query by reference, which is sort of like an implicit reborrow? However, with .contains(query) it's taking a generic, which is not triggering a reborrow.

In this case, would it be more idiomatic to add a "move" to search_iter or to reborrow? If I add the "move", would it not be better to make the case_insensitive version more explicit like:

pub fn search_case_insensitive_iter<'a>(
    query: &str,
    contents: &'a str
) -> impl Iterator<Item = &'a str> {
    let query = query.to_lowercase();
    contents
        .lines()
        .filter(move |line| line.to_lowercase().contains(query))
}

The “receiver” of a method is the self parameter, or its type. So, for example, in the case of str::to_lowercase(), we have:

impl str {
    pub fn to_lowercase(&self) -> String

&self is a shorthand; let's replace it with the explicitly typed version:

pub fn to_lowercase(self: &Self) -> String

Let's get rid of the Self type alias too; to do this, you take whatever type is in the impl header (in this case, str) and replace Self with it exactly:

pub fn to_lowercase(self: &str) -> String

So, the receiver of str::to_lowercase has type &str. This is the value that will be actually passed to the method; when you use the . method call operator, it will automatically reference or dereference the value on the left to make it fit.
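
As a small illustration of that automatic referencing and dereferencing, here is a sketch in which the function name `lowercase_three_ways` is hypothetical. All three calls end up passing a `&str` receiver to the same method.

```rust
pub fn lowercase_three_ways(query: &str) -> (String, String, String) {
    // Method-call syntax: `query` is already a `&str`, so it is passed as-is.
    let via_method = query.to_lowercase();

    // Path syntax: the receiver is written out as an ordinary argument.
    let via_path = str::to_lowercase(query);

    // Starting from a `&&str`, the `.` operator dereferences once
    // to reach the `&str` that `to_lowercase` expects.
    let double_ref: &&str = &query;
    let via_double_ref = double_ref.to_lowercase();

    (via_method, via_path, via_double_ref)
}
```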

Moving is sufficient for the OP. But reborrowing is what allows things like this to work:

fn example(vec: &mut Vec<String>) -> &str {
    // `&mut _` are not `Copy`, but somehow...
    vec.push("Hi".to_owned());
    // ...we didn't give it away to `push`...
    println!("{vec:?}");
    // ...and can even return a borrow based on `vec`...
    &*vec[0]
}
// ...even though it goes out of scope at the end of the function.

In brief, you can borrow some place through a reference without borrowing the reference itself, and (in the case of going through &mut _) it is a sort of "sub-borrow" -- you can't use vec until push returns, and the caller of example can't use the Vec until they stop using the return value. But the original vec reference itself can go away.

Right; reborrows happen automatically when you call functions with &_ or &mut _ annotations, but sometimes generics inhibit automatic reborrowing and a move happens instead. And something similar is going on with the closure capture analysis: it's deciding to capture &query -- a reference to the reference itself -- instead of a reference to its referent (&*query).
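
A minimal sketch of the difference for `&mut _`, with hypothetical helper names (`push_one`, `pass_through`, `reborrow_demo`). The non-generic call reborrows implicitly, while the generic one needs an explicit `&mut *r` if `r` is to remain usable afterwards.

```rust
fn push_one(v: &mut Vec<i32>) {
    v.push(1);
}

// Generic parameter: the argument is taken by value, whatever its type is.
fn pass_through<T>(value: T) -> T {
    value
}

pub fn reborrow_demo() -> Vec<i32> {
    let mut data: Vec<i32> = Vec::new();
    let r = &mut data;

    // The parameter type is spelled `&mut Vec<i32>`, so the compiler
    // passes an implicit reborrow (`&mut *r`) and `r` stays usable.
    push_one(r);
    push_one(r);

    // With a generic parameter, writing `pass_through(r)` would move `r`.
    // An explicit reborrow hands over a temporary sub-borrow instead.
    let r2 = pass_through(&mut *r);
    r2.push(2);

    // `r2` is no longer used, so `r` is available again.
    r.push(3);

    data
}
```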

As for which is more idiomatic: probably move, if only because that's what the compiler suggests. I don't think it matters much.

String: Pattern doesn't hold, so your rewritten version doesn't compile as written; contains would need &query there. I don't think the case-insensitive version needs updating. I view that version as working the way we want it to, and the one that requires move or &*query as needing a workaround because it doesn't.
