How can I write a generic matcher?

I'm writing a small program which convert a few file formats, following a uniform configuration.

Configuration has simple expression to perform logical and/or of match & capture values using regex (or matching date or whatever).

  • I want to define trait Matcher wrapping regex::Regex, as some input have non-regex matching logic.
  • I want to let trait Matcher have a method check which gets their input and returns a reference containing with the same lifetime as the input (e.g. substring of input).

Now I have an issue related to GATs, I cannot implement my requirements without using generic_associated_types.

As the tool is my personal tool, I can be tolerate with building with nightly for a short period, but if I need to wait to see generic_associated_types comes to stable, I should try to find another way to achieve this...

How can I do the following codes without GATs?

#![feature(generic_associated_types)]

use std::convert::From;

trait Matcher: for<'a> From<&'a str> {
    type Input<'a>: Copy;
    fn check<'a>(&self, input: Self::Input<'a>) -> Option<&'a str>;
}

struct FooMatcher {}

#[derive(Clone, Copy)]
struct FooInput<'a>(&'a str);

impl<'a> From<&'a str> for FooMatcher {
    fn from(_: &'a str) -> FooMatcher {
        FooMatcher {}
    }
}

impl Matcher for FooMatcher {
    type Input<'a> = FooInput<'a>;
    fn check<'a>(&self, input: Self::Input<'a>) -> Option<&'a str> {
        if input.0 == "foo" {
            Some(&input.0)
        } else {
            None
        }
    }
}

#[derive(Clone, Copy)]
struct BarInput<'a>(&'a str);

struct BarMatcher {
    pattern: String,
}

impl<'m> From<&'m str> for BarMatcher {
    fn from(pattern: &'m str) -> Self {
        BarMatcher {
            pattern: pattern.to_string(),
        }
    }
}

impl Matcher for BarMatcher {
    type Input<'a> = BarInput<'a>;
    fn check<'a>(&self, input: Self::Input<'a>) -> Option<&'a str> {
        if input.0.contains(self.pattern.as_str()) {
            Some(&input.0)
        } else {
            None
        }
    }
}

struct OrMatcher<M: Matcher>(Vec<M>);

impl<'m, M, T> From<T> for OrMatcher<M>
where
    M: Matcher,
    T: AsRef<[&'m str]>,
{
    fn from(config: T) -> OrMatcher<M> {
        let mut ret = Vec::new();
        for cfg in config.as_ref() {
            ret.push((*cfg).into());
        }
        OrMatcher(ret)
    }
}

impl<M: Matcher> OrMatcher<M> {
    fn check<'a>(&self, input: M::Input<'a>) -> Option<&'a str> {
        self.0.iter().find_map(|m| m.check(input))
    }
}

fn main() {
    let m1 = FooMatcher {};
    println!("{:?}", m1.check(FooInput("foo")));
    println!("{:?}", m1.check(FooInput("bar")));
    let m2 = BarMatcher {
        pattern: "specified".to_string(),
    };
    println!("{:?}", m2.check(BarInput("foo")));
    println!("{:?}", m2.check(BarInput("==specified==")));
    let m3: OrMatcher<FooMatcher> = (&["cond1", "cond2"]).into();
    println!("{:?}", m3.check(FooInput("foo")));
    println!("{:?}", m3.check(FooInput("bar")));
    let m4: OrMatcher<BarMatcher> = (&["bar", "baz"]).into();
    println!("{:?}", m4.check(BarInput("foo")));
    println!("{:?}", m4.check(BarInput("bar")));
    println!("{:?}", m4.check(BarInput("baz")));
}

(Playground)

Output:

Some("foo")
None
None
Some("==specified==")
Some("foo")
None
None
Some("bar")
Some("baz")

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 2.56s
     Running `target/debug/playground`

This pattern can indeed be polyfilled without GATs:

trait MatcherInput<'a> { type T : Copy; }

/// Trait alias:
trait MatcherInputGat : for<'a> MatcherInput<'a> {}
impl<T : ?Sized> MatcherInputGat for T
where
    Self : for<'a> MatcherInput<'a>,
{}

trait Matcher : MatcherInputGat + FromStr {
    fn check<'a>(&self, input: <Self as MatcherInput<'a>>::T)
      -> Option<&'a str>
    ;
}

and implementors would have to explicitly spell out:

impl<'a> MatcherInput<'a> for Foo {
    type T = &'a u8;
}
impl Matcher for Foo { /* the rest */ }

rather than in one go

4 Likes

Here it is filled out, following the formula here (same approach mentioned above).

1 Like

Rust issue for the failing alias.

// Errors if you use this alias
// type Input<'a, This> = <This as HigherMatcher<'a>>::Input;

interesting.. but that’s a soundness issue, actually!!

use std::marker::PhantomData;

trait HigherMatcher<'a, _Outlives = &'a Self> {
    type Input: Copy;
}

type Input<'a, This> = <This as HigherMatcher<'a>>::Input;

impl HigherMatcher<'_> for &'_ () {
    type Input = ();
}
fn foo<'a>(y: &'a str) -> &'static str {
    let x: PhantomData<Input<'static, &'a ()>> = PhantomData;
    bar::<'a>(y, x)
}
fn bar<'a>(y: &'a str, x: PhantomData<Input<'static, &'a ()>>) -> &'static str
where
    'a: 'a,
{
    baz::<'a>(y, x)
}
fn baz<'a>(y: &'a str, x: PhantomData<<&'a () as HigherMatcher<'static>>::Input>) -> &'static str
where
    'a: 'a,
{
    qux(y)
}
fn qux(y: &'static str) -> &'static str {
    y
}

fn main() {
    let x = String::from("Hello World!");
    let y = foo(&x);
    drop(x);
    println!("{}", y);
}

You should update the issue (and/or #47511).

I was going to open a new issue. It’s a soundness issue on stable; seems like a different category of issue than the ones already open that you mentioned (although I only skimmed them so far).

1 Like

Turns out, you don’t even need the type synonym. Or a type argument on the trait. This works:

trait Trait {
    type Type;
}

impl<T> Trait for T {
    type Type = ();
}

fn extend_lifetime<'a>(s: &'a str) -> &'static str {
    use std::marker::PhantomData;
    fn helper<'a>(
        s: &'a str,
        _: PhantomData<<&'static &'a () as Trait>::Type>,
    ) -> &'static str {
        s
    }
    helper(s, PhantomData)
}

fn main() {
    let x = String::from("Hello World!");
    let y = extend_lifetime(&x);
    drop(x);
    println!("{}", y);
}

On a related note, this code fails with an error that’s probably not supposed to be emitted on an fn item

trait Trait {
    type Type;
}

impl<T> Trait for T {
    type Type = ();
}

type Type<T> = <T as Trait>::Type;

fn f<'a>(_: Type<&'a ()>) -> &'a str {
    ""
}

Opened

https://github.com/rust-lang/rust/issues/91068

1 Like

:grimacing:

So, the projection should be rejected but is not; is assumed to be well-formed; this implies 'a: 'static; lifetime gets extended. Do I have it right?

Good point, I know that error from playing with fn pointers.

(I'm sure you know, but fn pointers can't support 'a: 'b type relationships.)

I think what happens here is that the compiler thinks that fn f<'a>(_: Type<&'a ()>) -> &'a str has 'a in an argument type, so the lifetime is late-bound, but then the function’s type gets simplified into fn f<'a>(_: ()) -> &'a str and the lifetime no longer appears in an argument-type. You can make it compile by preventing late-bound lifetime with an additional where-clause like where 'a: 'a.

(Don’t ask me what exactly late-bound lifetimes are and what they’re good for; I have no idea, I only know how to prevent them with where-clauses whenever they become a problem.)

2 Likes

I'm hoping to find time to research this area, as APIT parameters may become late-bound type parameters. (And I also always wonder if I'm really comparing apples to apples when I have to add <'a: 'a> so that I can call f::<'a>().) In particular I want to suss out the relationship between higher-rankedness and late bound.

1 Like

I think the issue has to do with how closures / fn items get call sugar, so that there would technically be an ambiguity when turbofishing them:

f::<…>()
// is it:
(f::<…>).call(()) // early-bound parameters / type-bound / bound on instantiation (generic type)
// or
(f).call::<…>(()) // late-bound parameters / trait/impl-bound / bound on method call (generic method)

Now, since a "generic method" could be seen as a circular definition of this thing, let's imagine there not being generic methods, but still having generic traits:

(f).call::<…>(())
// can be seen as:
Call::<…>::call(f, ())

and if we are allowed to feed arbitrary … parameters to Call:

#![deny(elided_lifetimes_in_paths)]

let f: F = …;

if true {
    <F as Call<'a>>::call(f, ());
} else {
    <F as Call<'b>>::call(f, ());
}

then it means that the type of f —which doesn't need to be generic!— implements a higher-order Call. So, if we call F = typeof(f), we'd have:

F : for<'any> Call<'any>

On the other hand, for an type-bound lifetime parameter —modulo variance, but let's assume unrelated 'a and 'b lifetime parameters–, it would be impossible to write:

#![deny(elided_lifetimes_in_paths)]

let f: F<'_> = …;

// Error, `'_` can't be `'a` and `'b` at the same time.
if true {
    F::<'a>::call(f, ());
} else {
    F::<'b>::call(f, ());
}

So, a type-bound (lifetime) parameter can't be higher-order.

Thus, it is necessary for a (lifetime) parameter to be trait/impl-bound to be higher-order (and in practice, it is sufficient and thus equivalent).

So, we have this type-bound vs. trait/impl-bound distinction, and now, if we go back to closures / function items / ()-sugared-callables, since there is no syntactic way to distinguish these two kinds of parameters, I guess that Rust arbitrarily chose type-bound parameters, dubbed early-bound (since they're to be provided before we can even start talking of trait implementations for that type) as the ones that got the privilege of being turbofishable, and the higher-order parameters / method signatures got to be:

  • non-turbofishable;

  • dubbed late-bound, as an oposition to the early bound, since they're bound / chosen when called (at each call), which necessarily happens after the instantiation.

  • But if we think about this methaphorical <F as Call<'late>::call expression, it's yet again, another ZST function item for whom 'late is actually early-bound! So we should be able to feed it.

    #![feature(fn_traits)]
    
    fn assert_static (_: impl Copy + 'static)
    {}
    
    fn foo<'late> (_: &'late ())
    {}
    
    fn main<'lt> ()
    {
        let f = foo;
        assert_static(f); // OK: `typeof(f)` does not feature lifetime parameters
        let g = <_ as FnOnce<(&'lt (), )>>::call_once;
        if false {
            g(f, (&(), )); // Infer that `_` is `typeof(f)`
            loop {}
        }
        assert_static(g); // Error, `g` is an item with an early-bound lifetime.
    }
    

The final two things would be that:

  • the moment a lifetime parameter is involved in some kind of bound, even a trivially true one such as 'lt : 'lt, Rust decides to make that parameter "early-bound", i.e., "type-bound", i.e., part of the generic lifetime parameters of the type (of the ZST function item) rather than that of its impl of FnOnce.

  • for<'any> fn(&'any ()) and the like are thus, conceptually, the representation of a unified type that represents a higher-order (callable) interface. When you think about it, besides implementation optimizations (slim ptr w/ 1-indirection pointer vs. wide ptr w/ 2-indirections), such a type is analogous to:

    &'static (dyn for<'any> FnOnce(&'any ()))
    

    &'static (dyn Sync + RefUnwindSafe + for<'any> FnOnce(&'any ())) to be more exact

2 Likes

Thanks all! The polyfill works well for me.

I still have one minor question. I still don't get about check signature in the impl.

impl Matcher for BarMatcher {
    // OK
    fn check<'a>(&self, input: <Self as MatcherInput<'a>>::T) -> Option<&'a str> {
    }
    // FAIL
    fn check<'a>(&self, input: BarInput<'a>) -> Option<&'a str> {
    }
}
error[E0195]: lifetime parameters or bounds on method `check` do not match the trait declaration
  --> tests/test_matcher.rs:63:13
   |
15 |     fn check<'a>(&self, input: <Self as MatcherInput<'a>>::T) -> Option<&'a str>;
   |             ---- lifetimes in impl do not match this method in trait
...
63 |     fn check<'a>(&self, input: BarInput<'a>) -> Option<&'a str> {
   |             ^^^^ lifetimes do not match method in trait

For more information about this error, try `rustc --explain E0195`.

I'd imagine it's coming from for <'a> in the alias MatcherInputGat, and it's not significant issue, but still curious to know the reason.

I'm curious, since I followed a much more horrible formula in petgraph long ago, for how many Rust versions has this been possible, for how long has petgraph been missing out on this? :slightly_smiling_face:

(Welp, seems to compile from Rust 1.3 with explicit lifetime names :open_mouth: )

So in light of the bug I've found, I think the more proper approach is to use the type synonym and add an explicit Self: 'a bound

type Input<'a, This> = <This as HigherMatcher<'a>>::Input;

trait Matcher: for<'a> From<&'a str> + for<'a> HigherMatcher<'a> {
    fn check<'a>(&self, input: Input<'a, Self>) -> Option<&'a str> where Self: 'a;
}

Or -- actually -- you might not want the Self: 'a bound at all, so just remove the _Outlives = &'a Self argument that enforces it?

trait HigherMatcher<'a> {
    type Input: Copy;
}

type Input<'a, This> = <This as HigherMatcher<'a>>::Input;

trait Matcher: for<'a> From<&'a str> + for<'a> HigherMatcher<'a> {
    fn check<'a>(&self, input: Input<'a, Self>) -> Option<&'a str>
    where
        'a: 'a; // avoid late-bound lifetime and corresponding error
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.