Type inference of generic parameters

I'm currently working on a simple implementation of grep to get to know the language. However, when introducing some generics, the compiler throws an error about it not being able to infer the type of a generic parameter. Here's a minimum working example without any of the real functionality:

use ignore::WalkBuilder;
use std::path::{Path, PathBuf};
use std::io;
use std::fs::File;
use clap::Parser;

mod parser {
    use super::*;

    #[derive(Parser)]
    #[command(name = "grep", version, about, long_about = None)]
    pub struct Grep {
        /// Path/paths to search for expression
        pub path: PathBuf,
    
        /// Search recursively in current directory
        #[arg(short, long, default_value_t = false)]
        pub recursive: bool,
    }
}

fn main() -> io::Result<()> {
    let args = parser::Grep::parse();
    find_matches(args.recursive, &args.path, &args, find_match_in_file)?;
    Ok(())
}

fn find_matches<F, P, S>(recursive: bool, path: P, args: &parser::Grep, cb: F) -> io::Result<()>
where P: AsRef<Path>,
      F: Fn(S, &parser::Grep) -> io::Result<()>,
      S: AsRef<Path>,
{
    let walker = if recursive {
        WalkBuilder::new(path).max_depth(None).build()
    } else {
        WalkBuilder::new(path).max_depth(Some(1)).build()
    };

    for entry in walker {
        let entry = match entry {
            Ok(entry) => entry,
            Err(_error) => {
                eprintln!("Skipping path");
                continue;
            }
        };
        let p = entry.path();

        if p.is_file() {
            println!("Scanning file: {}", p.to_str().unwrap_or("Unknown file"));
            match cb(p, args) {
                Ok(()) => { /* Do nothing */ },
                Err(error) => eprintln!("Couldn't search file: {:?}", error),
            }
        }
    }
    Ok(())
}

fn find_match_in_file<S>(path: S, _args: &parser::Grep) -> io::Result<()>
where S: AsRef<Path>,
{
    let _file = File::open(path)?;
    Ok(())
}

I get the following error when compiling:

error[E0283]: type annotations needed
  --> src/bin/test-clap.rs:24:53
   |
24 |     find_matches(args.recursive, &args.path, &args, find_match_in_file)?;
   |                                                     ^^^^^^^^^^^^^^^^^^ cannot infer type of the type parameter `S` declared on the function `find_match_in_file`
   |
   = note: multiple `impl`s satisfying `_: AsRef<Path>` found in the following crates: `clap_builder`, `std`:
           - impl AsRef<Path> for OsString;
           - impl AsRef<Path> for Path;
           - impl AsRef<Path> for Str;
           - impl AsRef<Path> for clap::builder::OsStr;
           - impl AsRef<Path> for std::ffi::OsStr;
           - impl AsRef<Path> for std::path::PathBuf;
           - impl AsRef<Path> for std::string::String;
           - impl AsRef<Path> for str;
note: required by a bound in `find_matches`
  --> src/bin/test-clap.rs:31:10
   |
28 | fn find_matches<F, P, S>(recursive: bool, path: P, args: &parser::Grep, cb: F) -> io::Result<()>
   |    ------------ required by a bound in this function
...
31 |       S: AsRef<Path>,
   |          ^^^^^^^^^^^ required by this bound in `find_matches`
help: consider specifying the generic argument
   |
24 |     find_matches(args.recursive, &args.path, &args, find_match_in_file::<S>)?;
   |                                                                       +++++                                                                                        +++++  

According to the docs for the ignore crate, and it's structs, iterating over ignore::Walk should yield ignore::DirEntry, and calling .path() on those should return &Path. So why can't rust infer that the type parameter S is &Path?

Let's hold off on the ambiguity; there's a different problem here.

This says, "the caller can choose any S: AsRef<Path>". Then you...

        let p = entry.path();

        if p.is_file() {
            println!("Scanning file: {}", p.to_str().unwrap_or("Unknown file"));
            match cb(p, args) {

...try to pass it a &Path. But the callback doesn't accept &Paths, it only accepts some S of the caller's choosing.

So you should just require that F: Fn(&Path, &parser::Grep) -> io::Result<()> instead.

fn find_matches<F, P>(recursive: bool, path: P, args: &parser::Grep, cb: F) -> io::Result<()>
where P: AsRef<Path>,
      F: Fn(&Path, &parser::Grep) -> io::Result<()>,

However, that will lead to a new error:

42 |     find_matches(args.recursive, &args.path, &args, find_match_in_file)?;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `Fn` is not general enough
   |
   = note: `for<'a> fn(&'2 Path, &'a Grep) -> Result<(), std::io::Error> {find_match_in_file::<&'2 Path>}` must implement `Fn<(&'1 Path, &Grep)>`, for any lifetime `'1`...
   = note: ...but it actually implements `Fn<(&'2 Path, &Grep)>`, for some specific lifetime `'2`

The problem is related: like find_matches before the above change, find_match_in_file is designed so that the caller chooses some S: AsRef<Path>. But the type you need to be able to pass to this callback is a &'local_borrow Path -- a type which is not nameable (choose-able) outside of the function body of find_matches. There's nothing you could fill in to the turbofish of find_match_in_file::<_> that would work.

There are a few ways forward from here:

  1. Use a closure in the call to find_matches.

    find_matches(
        args.recursive,
        &args.path, 
        &args, 
        |p, g| find_match_in_file(p, g)
    )?;
    
  2. Make find_match_in_file less generic.

    fn find_match_in_file(path: &Path, _args: &parser::Grep) -> io::Result<()>
    
  3. Make find_match_in_file take a reference of any lifetime, but still be generic.

    fn find_match_in_file<S>(path: &S, _args: &parser::Grep) -> io::Result<()>
    where S: AsRef<Path> + ?Sized,
    

(Incidentally, this last form is what function generic over AsRef<Path> should always be.[1] It's more general, as this post illustrates, and it requires less indirection in the "best case" -- when passing in a &Path.[2] And the only thing you can do when AsRef<_> is the only bound is call as_ref, which takes a reference.)


  1. When you just need to use the path and discard it anyway. ↩︎

  2. std used to use this form pre-1.0, but tragically changed to the form we're all more familiar with for purely aesthetic reasons (!) ↩︎

The answer to this was sort of buried in my response: generics parameters on a function are chosen by the caller, not by looking at the function body. The function body has to work for any valid choice of the generic parameter (where valid means "meets the bounds" basically).

The original ambiguity error was basically because there's nothing at the call site to drive inference to choose any particular S: AsRef<Path>. Even if function bodies could drive inference at the call site,[1] it wouldn't make the overall code work -- after all, a caller could just call find_matches<_, _, PathBuf>(..) explicitly. I.e. what I emphasized before still has to hold: the function body has to work for any valid choice of parameter.


  1. which would be an abstraction violation and probably cause a lot of inference-related breakage whenever someone updated a function body ↩︎

Yup, and Rust is not kidding here. It really means any type, even types that don't exist yet and may be written in the future.

It's not a matter of the compiler checking what could work, but the language wanting to give a guarantee that the functions are universal and only rely on the abstract trait bounds they defined.

Okay, thanks for the response! So, the main take away is that rust won't dig through function bodies to infer the concrete type of a generic type, but rather that has to be provided at the call site, either by providing a concrete type as an argument or by using turbofish-like syntax?

A follow up question on that is how the closure solution to the problem then works? You are still not specifying a concrete type here, since the parameters to the closure are not type annotated? Or what is special about closures in this regard?

That seems correct, but the main point is, the generic parameters are ... well ... parameters. Like the "normal" value parameters, they come from outside (from caller) and the function must be able to handle any valid "value" (which is a type in this case) of them.

Now your find_matches function cannot handle e.g. S = String, F = fn(String, &Grep) -> io::Result<()> (which is valid, it does satisfy your where bounds) so it cannot compile. The caller can[1] always call e.g. find_matches::<fn(...) -> ..., Path, String>(...) and the function has no power to avoid that.

Just like your function must be able to handle any value of e.g. recursive parameter and it cannot assume it is true and somehow force the caller to never pass false.

There are no[2] output generics in current Rust (and if there were, they would probably use different syntax).

Type inference is completely orthogonal to all of this.


  1. It's not important here that type inference does not look into function body. It is that the caller can always opt out of inference and provide generics explicitly. ↩︎

  2. except perhaps for impl Trait in return position ↩︎

It's not so much "the compiler chose not to dig through function bodies" so much as "digging through function bodies doesn't make sense here". The function body does not have a say in what the generic parameter ends up being. The caller decides that.

You were passing a specific type, a &Path, to the closure within the function body. That's a "choice" the function body made. The caller has no say in that, so it didn't make sense for the closure argument to be represented by a generic.

The closure was a solution to a follow-up problem (which was not an inference problem), and not the original (inference) problem. By that point I had removed the S parameter from find_matches.

It's still instructive, so I'll work through it. But it's not a problem about inference, so it's somewhat off-topic.


Anyway, let's start from here. The problem is that F: Fn(&Path, &parser::Grep) -> io::Result<()> means that there must be a single type F which implements Fn(&'lt Path, ...) for all lifetimes 'lt. But the notional implementation of Fn by find_match_in_file<S> cannot meet that bound. The notional implementation looks something like this:

// (Simplified for the sake of presentation, lmk if you want playground code)
impl<'grep, S: AsRef<Path>> Fn(S, &'grep parser::Grep) for find_match_in_file<S> {
    type Output = io::Result<()>;
    fn call(&self, path: S, _args: &'grep parser::Grep) -> Self::Output {
        ...
    }
}

You're trying to pass a &Path, and the bounds that are satisfied by the function are

find_match_in_file<&'a Path>: for<'x> Fn(&'a Path, &'x parser::Grep) -> io::Result<()>
// same thing:
find_match_in_file<&'a Path>: Fn(&'a Path, &parser::Grep) -> io::Result<()>

find_match_in_file<&'b Path>: Fn(&'b Path, &parser::Grep) -> io::Result<()>
find_match_in_file<&'c Path>: Fn(&'c Path, &parser::Grep) -> io::Result<()>
find_match_in_file<&'d Path>: Fn(&'d Path, &parser::Grep) -> io::Result<()>
// ...
find_match_in_file<&'static Path>: Fn(&'static Path, &parser::Grep) -> io::Result<()>

Like Vec<T> and Vec<U> are different types (unless T = U), the function items find_match_in_file<&'a Path> and find_match_in_file<&'b Path> are different types (unless 'a = 'b). So there is no single find_match_in_file<_> type which satisfies the bound on the callback -- no single type which meets the bound on the closure parameter F.

In an impl, if a generic lifetime shows up in the parameters and in the implementing type, then the impl is not creating an implementation that meets a F: Fn(&'_ ...)[1] bound. The lifetime has to show up in the parameters only, and not in the implementing type.

Or in terms of turbofish: it's impossible to write out a find_match_in_file::<X>(...) that can take a &'path Path with any lifetime, it can only take the lifetime in X.


Okay, so that's the problem. Now let's look at the solutions. You asked about the closure solution specifically:

    find_matches(args.recursive, &args.path, &args, |p, g| {
        find_match_in_file(p, g)
    })?;

Here, the closure meets the bound by having an implementation like this:

impl<'path, 'grep> Fn(&'path Path, &'grep parser::Grep) for Closure {
    type Output = io::Result<()>;
    fn call(&self, p: &'path Path, g: &'grep parser::Grep) -> Self::Output {
        // Here within the function body, the lifetimes have been "reified"
        // (to make up a word on the spot) to some concrete lifetimes,
        // similar to how a generic parameter `T` would represent a single
        // type *within* the function body.  (`T` is chosen by the caller.)
        //
        // So there's no requirement for `find_match_in_file` to meet the
        // higher-ranked `F: Fn(&Path, ...)` bound in this context.
        find_match_in_file/* ::<&'path Path> */(p, g)
    }
}

(You could also do this with another function that calls find_match_in_file, it's not closure-specific magic or anything.) Hopefully that explains the fix: the closure handles every possible input lifetime by calling a "different function type" file_match_in_file::<&'path Path> for every lifetime.


And though you didn't ask, the other two fixes work by changing how find_match_in_file<_>'s notional Fn implementations are defined.

// fn find_match_in_file(path: &Path, _args: &parser::Grep) -> io::Result<()>
//
// The function item type is no longer parameterized by a type.
impl<'path, 'grep> Fn(&'path Path, &'grep parser::Grep) for find_match_in_file {
    type Output = io::Result<()>;
    fn call(&self, path: &'path Path, _args: &'grep parser::Grep) -> Self::Output {
        ...
    }
}
// fn find_match_in_file<S>(path: &S, _args: &parser::Grep) -> io::Result<()>
// where S: AsRef<Path> + ?Sized,
//
// Now it's paramertized by a type again, but the lifetime of the `&Path`
// is not part of that type parameter.
impl<
    'path, 
    'grep, 
    S: AsRef<Path> + ?Sized,
> Fn(&'path S, &'grep parser::Grep) for find_match_in_file<S> {
    type Output = io::Result<()>;
    fn call(&self, path: &'path S, _args: &'grep parser::Grep) -> Self::Output {
        ...
    }
}

In both cases, the 'path lifetimes shows up in the parameters, but not in the implementing type.

In terms of turbofish for the generic version, you end up using find_matches_in_file::<Path>, which does not have the 'path lifetime in the <..>.


Finally, why did we need to handle every lifetime in the first place? That is, why doesn't this work.

//              vvvvv
fn find_matches<'path, F, P>(recursive: bool, path: P, args: &parser::Grep, cb: F) -> io::Result<()>
where
    P: AsRef<Path>,
    F: Fn(&'path Path, &parser::Grep) -> io::Result<()>,
    // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We're only requiring F to work with a single 'path lifetime now, which is compatible with the original find_matches_in_file siguature. But now we get a new error:

error[E0597]: `path_source` does not live long enough
  --> src/main.rs:33:15
   |
27 | fn find_matches<'path, F, P>(recursive: bool, path: P, args: &parser::Grep, cb: F) -> io::Result<()>
   |                 ----- lifetime `'path` defined here
...
32 |     let path_source = PathBuf::new();
   |         ----------- binding `path_source` declared here
33 |     let p = &*path_source;
   |               ^^^^^^^^^^^ borrowed value does not live long enough
...
37 |         match cb(p, args) {
   |               ----------- argument requires that `path_source` is borrowed for `'path`
...
43 | }
   | - `path_source` dropped here while still borrowed

And this final part is a bit more related to the conversation about generic parameters. The caller chooses the 'path lifetime too, and again, the function body has to work with every possible choice that the caller can make -- the function body has no say in how long the lifetime is.

So the caller chould have chosen 'static for example.

In fact, it goes a bit further than that -- it's impossible for the caller to "name" or choose a lifetime that's shorter than the call to find_matches. Within the function body, generic lifetime parameters always represent some borrow duration longer than the function body. So it is never possible to borrow a local variable for as long as a lifetime parameter -- because all locals are moved or dropped by the time we exit the function body.

And we fix this problem by requiring the closure to work with all lifetimes -- including those shorter than the function body.

F: Fn(&Path, &parser::Grep) -> io::Result<()>
// AKA
F: for<'path,'grep> Fn(&'path Path, &'grep parser::Grep) -> io::Result<()>

These for<..>, "for all lifetime" bounds are called higher-ranked trait bounds (HRTBs).


  1. aka F: for<'x> Fn(&'x ...) ↩︎

Ok, thanks again, you predicted correctly that I was wondering about the other two solutions/cases as well! I didn't know that & was actually syntactic sugar for HRTBs, good to know! So, if I understand it correctly, a generic type parameter T can not coerce into for<'a> &'a<some-concrete-type>, but it can coerce into &'some_specific_lifetime ? And by specifying find_matches_in_file as

find_matches_in_file<S: AsRef<Path>>(path: &S, args: &parser::Grep)

it is now possible for S to coerce into &Path for any lifetime?

Right, a generic type parameter represents a single type, and can't expand to something which varies based on 'a underneath for<'a>.

Another scenario this comes up in is when you want a closure that returns a borrow from its inputs:

fn works<F: Fn(&str) -> &str>(_: F) {}
fn fails<R, F: Fn(&str) -> R>(_: F) {}

fn main() {
    works(str::trim);
    fails(str::trim); // error[E0308]: mismatched types
                      // one type is more general than the other
}

Because R can't become &'a str in for<'a> Fn(&str) -> R.

Not quite -- S can coerce into Path, which does not contain a lifetime. (You also need ?Sized to relax the implicit Sized bound, because Path is not Sized.)

I.e. S becomes Path in for<'a, 'b> Fn(&'a S, &'b parser::Grep) -> io::Result<()>.

Another way to think of it is that S is coming from somewhere outside the for<..>. Outside of the for<'a, 'b>, the lifetimes 'a and 'b don't exist, so the definition of S cannot rely on them.

Yes, you are right, I meant to say that &S coerces to &&Path. Also good to know Sized is an implied bound (for all generic types?). But other than that, I think I understand why it works the way it works now, so thanks! I'll mark the first reply as the answer.

I believe the only exceptions are the implementing type of traits (Self) in the trait definition, and type aliases (which don't enforce trait bounds generally).

trait Example {
    // By which I mean, `Self` here does not have an implict `Sized` bound.
    fn example(self: &Self);
}

// But this `T` introduced by `impl<..>` does have an implicit `Sized` bound.
impl<T> Example for T {}

// No implicit `Sized` bound.
type Indirect<T> = Box<T>;

More examples of implicit Sized bounds.

// Implicit `Sized` bound.
fn example<T>() {}

trait Other {
    // implicit `Sized` bound on `Assoc`.
    type Assoc;
    // implicit `Sized` bound on `T` and on `Gat<'_, T>`.
    type Gat<'t, T>;
}

// implicit `Sized` bound (`Box<impl Trait + Sized>`)
fn rpit() -> Box<impl Trait> { ... }

// implicit `Sized` bound (`&(impl Trait + Sized)`)
fn apit(_: &impl Trait) {}

// implicit `Sized` bound
struct S<T>(Box<T>);