Is `&'a Foo<'a>` a footgun?

Well, that's why the majority of the explanation is how it works in languages like C#, with a concrete example to work through?

Where's the "what's the problem" step?

I could make it longer and give more code examples, or draw an ASCII art diagram of the relations involved, but it's just the same information stated more verbosely. Which I'm happy to do if it's helpful, but it's already long enough to be a bit intimidating, so it's a balancing act (and I'm on my phone right now, so it took long enough already)

Yup. That's the “a monad is a monoid in the category of endofunctors” step. You are bringing in lots of notions from an entirely different language with entirely different rules and a different (albeit similar) type system.

Worse: you are invoking things that don't exist in Rust (and are also not household items).

When you say this:

By casually bringing yet another layer of complexity into the process, you arrive at the conclusion that this may help someone understand covariance and contravariance.

Precisely. By bringing a whole pile of new information “stated more verbosely” you are not making things easier.

What you are describing is the next step: after you understand what Rust lifetime markups are, how they work and, yes, what covariance and contravariance mean for their use – then you may want to learn why Rust rolls all of that up into types, how that decision makes them similar to C# types, and how that makes the terms covariance and contravariance appropriate for different things that C# and Rust have.

That could be a nice thing to know… but believe me: teaching how multiplication and division work with negative numbers via the notions of abelian groups and fields doesn't work. Really.

You have to go in the opposite direction: first introduce addition and subtraction, then multiplication and division, then note how there are certain similarities… and only then can you introduce more complicated math objects.

Teaching Rust's covariance and contravariance through their C# siblings… just don't go there, it won't work.

Err... I really don't want to make this into an argument about how to teach stuff... that's severely off-topic, but suffice it to say I feel you're completely misunderstanding the concept of the explanation. Which is concerning, given that it's an explanation.

1 Like

Thanks for writing this up, I thought it was very good!

In case you're interested in feedback and possible improvements: The only part that didn't feel obvious was the transition from regular type params to Rust lifetime params. It leaves out what subtyping means for lifetime params: is the subtype the shorter lifetime or the longer, and why? And a short example with lifetimes and explanation is probably needed.

4 Likes

That's fair, that's about when I was getting a bit tired and was about to get ready to meet someone so I'm not too surprised :sweat_smile:

Hopefully it's at least implicit from general Rust usage that you can pass the longer lifetime in where a shorter one is expected, so the subtyping relationship is that the longer lifetime is the subtype. But generalizing from there – a (shared) reference with a lifetime parameter is an output usage as a field and therefore covariant, but an input usage as a parameter type and therefore contravariant, etc. – is a bit much to expect someone to figure out without any hints.

The general rule of thumb of following which direction the value goes relative to the generic type will get you there eventually, at least, and the compiler will tell you which it is if you get it wrong, so hopefully the important part of understanding variance in general got through.
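To make that pairing concrete, here's a minimal sketch (the types are invented for illustration; they're not from the original explanation):

struct Holder<'a>(&'a str);     // shared reference as a field: output usage, covariant in 'a
struct Caller<'a>(fn(&'a str)); // shared reference as a parameter: input usage, contravariant in 'a

// Covariant: the longer lifetime is the subtype, so it can shrink.
fn shrink<'long: 'short, 'short>(h: Holder<'long>) -> Holder<'short> {
    h
}

// Contravariant: the direction flips.
fn grow<'long: 'short, 'short>(c: Caller<'short>) -> Caller<'long> {
    c
}

Both functions compile as written; swap the directions and the compiler rejects them.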

1 Like

Here are some random thoughts from the last few comments. But they are almost surely more confusing to learners than elucidating,[1] so maybe no-one should read them :slightly_smiling_face:.


Currently, self-referential structs are particularly problematic when it comes to relying on compiler suggestions.

Consider this playground: the error is basically saying, I should make the elided lifetime outlive 'a. Okay, fine:

 impl<'a> Snek<'a> {
-    fn set(&mut self, range: Range<usize>) {
+    fn set<'this>(&'this mut self, range: Range<usize>)
+    where
+        'this: 'a,
+    {

First of all, the compiler didn't let me know that this is effectively[2]

 impl<'a> Snek<'a> {
    fn set(&'a mut self, range: Range<usize>)

which is sugar for

 impl<'a> Snek<'a> {
    //           vvvvvvvvvvvvvvvv anti-pattern
    fn set(self: &'a mut Snek<'a>, range: Range<usize>)

and second of all, the compiler doesn't warn that &'a mut Thing<'a> is an anti-pattern which you never want.

So when you apply the "natural" fix, you end up with code that is practically always unworkable: the self-referential struct is borrowed forever. It's sort of amazing that the compiler is smart enough to allow the sound creation of a self-referential struct in the first place, but because the result is unworkable,[3] the diagnostics should really be guiding you away from that instead of towards it.

And this does bite people: they don't realize that self-referential types (using references) aren't supported by the type system in a usable way, so they try to implement them, and the compiler "helpfully" guides them into borrowing forever to achieve the self-referential value. (Then it turns around and gives you a bunch of errors due to the severe restrictions.)
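For anyone who hasn't run into it, here's a minimal sketch of how that plays out (the field names are my own; they're not necessarily the ones from the playground):

use std::ops::Range;

struct Snek<'a> {
    body: Vec<u8>,
    head: &'a [u8],
}

impl<'a> Snek<'a> {
    // The "natural" fix: self: &'a mut Snek<'a>, i.e. the anti-pattern.
    fn set(&'a mut self, range: Range<usize>) {
        self.head = &self.body[range];
    }
}

fn main() {
    let mut snek = Snek { body: vec![0, 1, 2, 3], head: &[] };
    // The compiler accepts creating the self-reference...
    snek.set(0..2);
    // ...but `snek` is now exclusively borrowed for the rest of its
    // lifetime, so practically any further use is rejected:
    // snek.set(2..4);              // error[E0499]: cannot borrow `snek` as mutable more than once
    // println!("{:?}", snek.head); // error[E0502]: cannot borrow `snek.head` while `snek` is mutably borrowed
}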

IMO the status quo is "fixable"[4] by making the diagnostics smarter, by having some warn-by-default lints around borrowed-forever type signatures, etc.


All subtyping in Rust is lifetime-related (so far), but type parameters also have variance.

It's not just lifetime parameters. For example, when you run into surprising invariance for the first time, it's probably due to &'a mut T. That type is covariant in 'a but invariant in T. So it's actually the variance of the type parameter that surprised you.
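To see why that invariance is load-bearing, here's a short sketch of my own (not from the thread); the commented-out call is exactly what covariance in T would permit:

// Writing through a `&mut &'a str` lets a shorter-lived reference in.
fn overwrite<'a>(r: &mut &'a str, s: &'a str) {
    *r = s;
}

fn main() {
    let mut long_lived: &'static str = "static";
    {
        let local = String::from("short-lived");
        // If `&mut T` were covariant in `T`, `&mut &'static str` could
        // shrink to `&mut &'_ str` here, and `long_lived` would dangle:
        // overwrite(&mut long_lived, &local); // error[E0597]: `local` does not live long enough
    }
    println!("{long_lived}");
}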

Higher-ranked types are subtypes of non-higher-ranked types; for example:

type Super = fn(&'static str);
type Sub = for<'a> fn(&'a str); // AKA fn(&str)

// This compiles
fn coerce(f: Sub) -> Super {
    f
}

For now you can only be higher-ranked over lifetimes. But we may get for<T: Some Bound> ... types eventually. If so, we'll probably have subtyping that isn't lifetime-related.

Sometimes people think they can rule out subtyping-related issues by adding a 'static bound, but Super and Sub in the example above both meet a 'static bound, so that's not actually true. Just because you can't name a temporary lifetime doesn't mean that there's no possibility of subtyping, even today.
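A small sketch of that point (the helper functions are mine; the aliases are the ones from above):

type Super = fn(&'static str);
type Sub = for<'a> fn(&'a str);

fn takes_str(_: &str) {}
fn requires_static<T: 'static>(_: T) {}

fn main() {
    let sub: Sub = takes_str;
    let sup: Super = sub; // a subtyping coercion between two types that both satisfy 'static
    requires_static(sub);
    requires_static(sup);
}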


Rust has bivariance (which you will probably never notice) but it probably doesn't mean what you think.

From the Ehsan blog:

there's a fourth case such as most of primitive types that are bi-variant, meaning they're both co-variant and contra-variant.

Rust doesn't support being both co- and contra-variant.[5] Rust does have something it calls bivariant, but it means "the parameter can change to anything and the result is still considered a subtype". For example, if a type parameter of a struct is bivariant,[6] you can coerce that parameter from one type to a completely unrelated type.

No, you're not likely to ever run into it in real code. But it does exist in the language.


  1. more so from top to bottom ↩︎

  2. why? because 'a: 'this was already implied by self: &'this mut Snek<'a>, and 'a: 'this plus 'this: 'a means that 'this == 'a ↩︎

  3. once you create the self-referential struct it's directly unusable: can't be moved, borrowed again, have a non-trivial destructor, etc ↩︎

  4. could be significantly improved ↩︎

  5. See the end of this comment. ↩︎

  6. which is possible when its only use is as an associated type bound ↩︎

You can coerce from subtypes to supertypes. So a &'long T is a subtype of &'short T.

Lifetimes aren't types, so it's all a bit of an abuse of terminology when we talk about their "subtypes". You actually have a choice between putting 'static at the top or the bottom of the hierarchy. So we could have said that the 'a in &'a T is contravariant -- and in fact, we used to! -- in which case shorter lifetimes are "subs" of longer lifetimes and 'static is at the top (and there is no bottom).

But (today) the terminology that has been settled on is to call it covariant. With that covariant terminology, the longer lifetime must be a "sub" of the shorter lifetime, and 'static is the bottom of the hierarchy (and there is no top).

Which is confusing to many. (But I guess having "contra" so prevalent was also confusing.)
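A tiny illustration of the settled convention (my own example):

// 'static sits at the bottom: a &'static str coerces to &'a str for any 'a.
fn demote<'a>(s: &'static str) -> &'a str {
    s
}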

4 Likes

:person_facepalming:

I find myself confused regularly with the sub- vs super terminology.

Thankfully in Rust syntax it’s always really simple.

You have something like

Foo : Bar

then it means that you can convert/coerce Foo into Bar. It’s always left-to-right conversion.

Of course subtyping isn't directly expressible in Rust syntax, and lifetimes aren't types of values anyway… and the : is used with traits, too; but even so, the directionality of the syntax is nicely consistent. I don't really need to think in the words at all: “subtype/supertype”, “outlives”, “implements (trait)”, “supertrait”, …

[Looking into further usage of :… I suppose x: T meaning x has type T is the only thing that doesn't quite fit.]

Examples: If I want to make

fn convert<'a, 'b>(s: &'a str) -> &'b str {
    s
}

compile successfully, I need &'a str : &'b str, which boils down to 'a : 'b

fn convert<'a, 'b>(s: &'a str) -> &'b str
where
    'a: 'b
{
    s
}

If I want to make

fn convert<T>(s: Box<T>) -> Box<dyn Trait + 'static> {
    s
}

work, I need T: Trait and T: 'static.

“Variance” just refers to the rules the compiler can use to determine (some of) these coercions. But you don't need to understand these abstract and general rules in order to understand concrete cases. Like you don't need to understand a Turing machine in order to understand, for concrete cases, whether or not something is a valid description of an “algorithm” / a computation. Like you don't need to understand monad laws, or any abstract theory of monads, in order to understand what singleton, map, and flat_map do for concrete collection(-like) types.

Ok, let's just determine a few coercions. I have A : B, and Box<A>, can I create Box<B>?

Yes, I can. This can be seen just by thinking about conversion functions. If I have f: fn(A) -> B and b: Box<A>, I can do Box::new(f(*b)) to get a Box<B>. [In a very real sense, by writing an expression like Box::new(f(*b)), we effectively just formalized a proof that Box is covariant.[1] Yes, we just did a mathematical proof; and since it involved writing down a machine-readable term, it was (in a sense) more rigorous than much of ordinary mathematics.[2]]
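Here's that proof-term as a compilable (hypothetical) helper, for the record:

// Given a conversion function f: fn(A) -> B, we can turn a Box<A> into a Box<B>.
fn map_box<A, B>(b: Box<A>, f: fn(A) -> B) -> Box<B> {
    Box::new(f(*b))
}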


But Box::new(f(*b)) – that’s expensive! Right, and that's why “subtyping” is not only about conversion/coercion in the general sense, but concretely about certain coercions that don't do anything at run-time. (Simple re-interpretation of bit patterns, if you will.) The coercion of Box<A> to Box<B> due to subtyping can be done without any run-time cost, or re-allocation. This helps sharpen our intuition about what kinds of coercions we can allow; we can ignore matters of ownership, and only need to think about matters of data-flow. Like “input”, “output”, “inout”.
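In contrast to map_box above, the subtyping coercion really is free; a sketch (again my own example):

// No allocation, no per-element work: the same Vec, with the
// lifetime in the element type simply re-interpreted as shorter.
fn shorten<'a>(v: Vec<&'static str>) -> Vec<&'a str> {
    v
}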


Thankfully, I don't actually want to focus on a mental model to accurately determine “does (or could) this subtyping-coercion work” for all cases, my main point was to convey how I try to avoid mixing up the direction.

Nonetheless, if you would like, here would be a quick walk-through on a handful of the most important cases of the “variance” rules for certain types.
  • for owned things, data structures, etc… it's generally like Box; e.g. for Vec, too: With A : B, you get Vec<A> : Vec<B> (this is like .into_iter().map(f).collect())
    • these considerations do really constitute general rules. The ‘rule’ for Vec is “if A : B then Vec<A> : Vec<B>”
  • for &A, we can use A : B to turn &A into &B. This is perhaps the clearest case where we notice that this “simple re-interpretation of bit patterns” restriction really matters. Unlike the Box case, we don't have sufficient ownership to do this at all with just a conversion function between A and B.
  • if you have A : B and B : C, that gives A : C. Again, to illustrate: if we think of functions f: fn(A) -> B and g: fn(B) -> C we can convert A to C using those two functions like g(f(a)).
  • Now the more tricky parts: the one they call “contravariance”. If you have A : B and get a function fn(B) -> C, you can write a function fn(A) -> C like … didn't I just mention this in the previous point? Yes, kind of, but this time I want to talk about actual fn(…) -> … types, not just use them as an analogy for implicit coercion. The point is that we make this into the following rule:
    if A : B, then fn(B) -> C : fn(A) -> C. If we use a type alias here, let's say type Xyz<T> = fn(T) -> String, then this looks like “if A : B then Xyz<B> : Xyz<A>”. The “contra” just indicates that “stuff switches sides in our rules”. The A and B switched sides :slight_smile:
    contravariance generally only comes up with such cases of “input” parameters (see the sketch after this list)
  • Similarly, you can hopefully see that for the return type, things aren't switching sides: if A : B, then fn(C) -> A : fn(C) -> B – this is the “output” type parameter of the function
  • Combining both the previous rules, we can switch out input and output type at once: if InA: InB and OutA: OutB, that allows us to turn fn(InB) -> OutA into fn(InA) -> OutB
  • If we look at fn(A) -> A [same input and output type] and wonder, can we coerce it anyhow? Turning it into fn(B) -> _ requires B : A. Turning it into fn(_) -> C requires A : C. So yes with B : A and A : C, we get fn(A) -> A : fn(B) -> C.
    • but what if we want to get out of this some function where input and output types are still the same? In other words, B and C should be the same? We would need B : A as before, but also A : B (previously A : C). Two types that are each subtypes of the other. In Rust's variance rule-set, we speak of “invariance” if a rule requires both A : B and B : A, and (for simplicity), in these cases we don't allow A and B to be different types at all.[3]
    • we express this in words by saying for type FnSameInOut<T> = fn(T) -> T, that FnSameInOut<T> is invariant in T. Invariance generally appears in contexts with an “inout” sort of flow of data/information.
  • finally, &mut T. For this, it helps to consider functions again. An important consequence of Rust's model of exclusive access for mutation is that fn(Foo) -> Foo and fn(&mut Foo) are essentially the same thing[4]
    • if fn(T) -> T has invariance, then fn(&mut T) must somehow have it, too;
    • this is where we actually do start needing to look at variance as a whole rule-system again, unfortunately. If we want to fit all of the previous cases into a “simple” system where every lifetime and type parameter is simply described as “covariant”, “contravariant”, or “invariant”, then we simply must make the T in &mut T invariant; otherwise, our system would deduce coercions for fn(&mut T) that wouldn't actually be sound
  • as a bonus, Cell<T> is also interesting. This is very similar to the argument just above. The type &Cell<T> of references to a cell works very much like a &mut T reference; so in the simple framework of variance that we have, having already defined &U as “covariant” in the parameter U, we have no choice but to make Cell<T> invariant in T to avoid unsound coercions
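As promised above, here's a compilable sketch of the contravariance and invariance rules, using the aliases named in the list (lifetimes stand in for the A, B, C):

type Xyz<T> = fn(T) -> String;
type FnSameInOut<T> = fn(T) -> T;

// Contravariance: A : B flips to Xyz<B> : Xyz<A>.
// Here &'static str : &'a str, so Xyz<&'a str> : Xyz<&'static str>.
fn contra<'a>(f: Xyz<&'a str>) -> Xyz<&'static str> {
    f
}

// Invariance: the parameter can't budge in either direction.
// fn invariant<'a>(f: FnSameInOut<&'a str>) -> FnSameInOut<&'static str> {
//     f // error: lifetime may not live long enough
// }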

This is all getting close to the fully abstract concepts now. One meta-step further up is only needed by the compiler or language specification, because they need to talk not only about variance rules for concrete types[5] (where the “rules” always look something like “if this parameter type A is a subtype of this other type B, then this combined type Foo<A> is a subtype of Foo<B>”), but also about the “meta-rules” for deriving the variance rules of concrete types[6].

The simplest of such meta-rules in action would be, e.g., the effect that if you define struct MyStruct<T>(Vec<T>), then your struct MyStruct<T> is covariant in T because the type of its field, Vec<T>, is covariant in T. The full story needs to handle all kinds of usages of type parameters T, which can appear in multiple places, in deeply nested places, even interacting with traits and associated type arguments…
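For example, the derived covariance is directly usable (a sketch of my own):

struct MyStruct<T>(Vec<T>);

// MyStruct<T> picked up Vec<T>'s covariance in T, so a lifetime
// inside T can be shortened through the wrapper, too.
fn via_wrapper<'long: 'short, 'short>(m: MyStruct<&'long str>) -> MyStruct<&'short str> {
    m
}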


  1. This is of course ignoring that subtyping isn't quite the same as having a fn(A) -> B. ↩︎

  2. Mathematics generally only lays down arguments in human-readable form. Then other humans have to think really hard through everything step-by-step to make sure they're convinced it's actually correct. This is why it can take a long time between when someone publishes a proof they claim solves a hard mathematical problem, and when it's actually reviewed and accepted. ↩︎

  3. It needn't be this way. In fact, for notions of “safe transmutation”, it would perhaps be desirable to explicitly support types that allow 2-way coercion through the subtyping and variance system. But there are also backwards-compatibility issues, as some usages of invariant type parameters currently rely on the fact that the type doesn't change at all. ↩︎

  4. they're really almost equivalent interfaces, the only difference is significant but very small: the question of ownership when panicking is different ↩︎

  5. or rather, concrete type constructors? ↩︎

  6. or rather, concrete type constructors? ↩︎

2 Likes

Dang it, I also got input and output around the wrong way in that paragraph! The actual examples are correct, but then I messed up describing the thing I just provided examples of. Ugh, I'll need to take another pass when I get time.

1 Like

Thanks for this explanation. It does not leave me with an Aha! moment, regrettably, but I do appreciate the effort. I will continue to meditate more on this topic.

I will say this much though: I get the idea of lexically scoped lifetimes, and how ownership of data structures tends to bubble up towards the outermost scopes (at least that is how my code tends to evolve over time, naturally, in response to lifetime-related errors). However, I've run into many situations where a parent scope needs to refer to data managed in child scopes. When this happens, the errors I run into remind me of the issue in Haskell where one monad conflicts with another and you're forced to use something like a "monad transformer" to work around the problem. Absolutely black magic. I still don't understand that either. But I suspect that any Aha! moments I get here will contribute towards the other and/or vice versa. :slight_smile:

OK, fine, I used the wrong term here. It is not the idea of a lifetime which confuses me. I think I'm not alone, however, in using the phrase "lifetime" when "meaning of this lifetime annotation" is more correct. While technically wrong, it is a convenient shorthand that I've seen many others use. I don't think I should be criticized for using this same shorthand.

This might come as a surprise to you, but I actually get the idea of lifetime annotations too. See above.

I will die on the hill that says otherwise. There are lots of things in mathematics that I am forced to take for granted, axiomatically, because I simply don't understand how they are derived.

I am skeptical of the phrase "It doesn't matter." If it didn't matter, there'd be no need for a theorem prover or custom syntax to support said prover, and the compiler would happily accept any code we throw at it. Maybe I am not interpreting what you're trying to say correctly?

Sorry; but, no.

I get the idea that exclusive borrows are exclusive, that at most one can exist (for a given variable) at a given time during a program's execution, etc. But that says nothing about "invariance" to me. Invariance, to me, relates to an invariant, meaning something unchanging: in effect, an assertion that must be true at all times. But that can apply to any number of attributes of a program, not just to the lifetime of a value in a program.

Again, I must disagree. You're completely glossing over the point at which covariance becomes contravariance and saying "but still, no mystery." That IS the whole mystery. To me, at least.

However, I don't want to thread-jack. If/when I run into a circumstance where this actively impedes progress in a Rust project for me, I'll open a new topic. I only replied because there was a reply indicating that "just following the compiler's advice" was a sign of being a sub-standard programmer, which I objected to.

I do appreciate your taking the time to try and explain things. I just wish it had the clarity for me that you intended it to have.

Wow.

If you're going to be this condescending, I think it's best that you no longer reply to any of my posts here, in this thread or in any future thread going forward.

Extremely insulting.