Rust contravariance, Regarding the contravariance of `fn(&'a i32) -> ()`

@Tom47 @nerditation @khimru @Cerber-Ursi @zirconium-n

Thank you all, I think I understand now. Whether it's covariance, contravariance, or invariance, their essence lies in ensuring type safety. Simply put, it's about guaranteeing the validity of all fields (including lifetime parameters in Rust).

Once I grasped the essence, understanding variance naturally followed.

2 Likes

I guess you found an answer, but since you asked me to comment elsewhere, I wrote something up anyway. It ended up way too long again :sweat_smile:. Feel free to ignore it.

This is indeed a general property of variance, no matter the language, which is probably why it's hard to find it written down somewhere. Here's a Wikipedia article that discusses it. As others have explained, one way to reason about this is that you can always write a shim function to achieve the same effect as the contravariant relationship provides.

trait Example<'a> {
    fn example(self, s: &'a str);
}

fn example<'a, Ex: Example<'a>>(ex: Ex, s: &'static str) {
    // coerce the `&'static str` to `&'a str` and call `Ex::example`
    ex.example(s);
}

Variance is indeed a necessary part of soundness in any subtyping language. The variance of each type position in a generic type constructor can be determined mechanically,[1] as explored in this paper.[2] A briefer summary for Rust is possible, which I will provide below. But you may find the paper interesting for reading about variance generally.[3]

Note in particular that variance is most easily phrased as a property of a generic parameter in a specific type constructor. For example, the variance of T in fn(T) is different from the variance of T in fn(Mutex<T>).[4]

You usually don't have to declare variance in Rust as it is inferred for non-primitive type constructors.[5] If it can't be inferred, the definition triggers a compiler error, which is why generic parameters cannot be "unused".[6] Because of the mechanical rules and the inference, one normally doesn't have to think about getting variance right for soundness. But a source of variance-related unsoundness in Rust is when types which use unsafe do not properly constrain the variance inference by use of PhantomData.[7]

Here's the summary for determining variance in non-primitive Rust types

The reference gives a somewhat ad-hoc introduction to variance. It supplies the variance for some primitive type constructors and gives some non-nested examples, but doesn't talk about nested type constructors.

Transform is the operator defined in the Altidor, Huang, and Smaragdakis paper, which can be used to determine the variance of positions in nested type constructors. For example the T in a fn(Vec<T>).

⊗ (transform) :green_circle: Covar :yellow_circle: Contra :red_circle: Invar
:green_circle: Covar :green_circle: Covar :yellow_circle: Contra :red_circle: Invar
:yellow_circle: Contra :yellow_circle: Contra :green_circle: Covar :red_circle: Invar
:red_circle: Invar :red_circle: Invar :red_circle: Invar :red_circle: Invar

Covariance preserves variance, contravariance reverses variance,[8] and invariance dominates. For example:

Type constructor Variance Explanation
Vec<A> A is :green_circle: Covar Follows from defintion of Vec<_>
fn(B) B is :yellow_circle: Contra Primitive type constructor
fn(Vec<C>) C is :yellow_circle: Contra :yellow_circle: Contra ⊗ :green_circle: Covar = :yellow_circle: Contra
fn(fn(D)) D is :green_circle: Covar :yellow_circle: Contra ⊗ :yellow_circle: Contra = :green_circle: Covar

The reference also gives some examples for "other struct, enum, and union types". In these contexts, the author chooses the number of generic type parameters, and type equality is considered a well-formedness constraint. Another operator is required to enforce this constraint: the greatest-lower-bound, or GLB. The GLB is used when the same generic parameter shows up in two more more places in the type of some field.

∧ (greatest-lower-bound) :green_circle: Covar :yellow_circle: Contra :red_circle: Invar
:green_circle: Covar :green_circle: Covar :red_circle: Invar :red_circle: Invar
:yellow_circle: Contra :red_circle: Invar :yellow_circle: Contra :red_circle: Invar
:red_circle: Invar :red_circle: Invar :red_circle: Invar :red_circle: Invar

Equality preserves variance and non-equality results in invariance.

Here are some examples.

Type constructor Variance in Newtype<_> Explanation
Newtype<T>(fn(T) -> T) T is :red_circle: Invar :yellow_circle: Contra ∧ :green_circle: Covar = :red_circle: Invar
Newtype<U>(Vec<U>, U) U is :green_circle: Covar :green_circle: Covar∧ :green_circle: Covar = :green_circle: Covar

Finally, to explain why "outside of an struct, enum, or union, the variance for parameters is checked at each location separately".

For the specific type examples, it is as if those types (function pointers, tuples) declared a distinct type parameter for every top-level type position. Which you can also do for your own types,[9] if you want that flexibility.

And in other contexts, variance isn't calculated because the generics will be instantiated with concrete types instead. For example, arg isn't invariant in T here:

trait Ex<T, U> {
    fn ex(arg: fn(T) -> T) -> fn(U) -> T;
}

impl<'a> Ex<&'a str, &'static str> for () {
    fn ex(arg: fn(&'a str) -> &'a str) -> fn(&'static str) -> &'a str {
        arg
    }
}

And, some other notes that most people will never need to consider.

Bivariance in Rust

Rust does have bivariance, but not in a way that you will probably encounter directly. First, we should say what bivariance is. The Wikipedia article says it's being both covariant and contravariant, but the paper I cited says that it's being able to convert between any types at all. Bivariance in Rust is the latter.

Before we said that if the variance of a type parameter cannot be inferred, the definition triggers a compiler error. However, what really happens is that parameters inferred to be bivariant are usually rejected. A parameter can be inferred as bivariant if it's unused:

// error[E0392]: type parameter `U` is never used
struct Ty<U>();

And it can also be inferred as bivariant if it is only used in a recursive manner:

// error: type parameter `U` is only used recursively
struct Ty<U>(*const Ty<U>);

There are more details in the cited paper.

(It can be useful to know why recursive definitions are rejected. But the next part is more Rust Trivia Night material than useful.)

However, there is an exception that is not rejected: bivariant parameters which are also constrained are allowed. Parameters can be constrained by being part of an associated type equality bound. So the bound allows this definition to compile:

trait Trait {
    type Assoc;
}

struct Ty<T, U>(T, *const Ty<T, U>) where T: Trait<Assoc = U>;

And in fact, we can just not use U in any fields at all.

struct Foo<T: Trait<Assoc = U>, U>(T);

Because U is bivariant, when we coerce T to SuperOfT, we can also coerce U to <SuperOfT as Trait>::Assoc... even if the types are completely unrelated.

type Sub = for<'a> fn(&'a ());
type Super = fn(&'static ());

impl Trait for Sub {
    type Assoc = String;
}

impl Trait for Super {
    type Assoc = [usize; 3];
}

fn use_covariance_of_t(x: Foo<Sub, String>) -> Foo<Super, [usize; 3]> {
    x
}

The example is taken from here.

Finally, I think I phrased most things in terms of types, and not lifetimes specifically. I'll just note that the terminology when it comes to lifetimes is somewhat arbitrary.

Currently, almost everyone says that 'static is a "subtype" of 'a[10], and as &'static str is a subtype of &'a str, 'x is therefore covariant in &'x str. However, lifetimes aren't actually types. We could have just as easily said that 'a is a "subtype" of 'static and thus 'x is contravariant in &'x str -- and indeed, that was the original terminology Rust used, over a decade ago.

Which is more intuitive is a matter of opinion, but again, pretty much everyone today uses the "'x is covariant in &'x str" terminology.


  1. there are some nuances around bivariance, but you can laregly ignore bivariance in Rust ↩︎

  2. Taming the Wildcards: Combining Definition- and Use-Site Variance; Altidor, Huang, and Smaragdakis ↩︎

  3. I recommend sections 2.1 and 3.1 if you're curious. ↩︎

  4. We humans tend to take shortcuts when talking about variance and leave out the wider context, which can lead to misleading or counterfactual statements. For example, it's not the case that lifetimes in function arguments are always contravariant, but that's the sort of statement someone might casually say. ↩︎

  5. e.g. your own struct Example<T> ↩︎

  6. In actuality it can always be inferred, but bivariance is disallowed for unconstrained parameters. There's more details in the collapsed bivariance section below. ↩︎

  7. PhantomData is used to drive inference as we don't have a way to directly declare variance at all. ↩︎

  8. the reverse of invariance is invariance ↩︎

  9. e.g. Newtype<T, U>(fn(T) -> U) ↩︎

  10. for any other lifetime 'a ↩︎

4 Likes

Thank you for your response; it was very helpful. I’d like to ask: since lifetimes are essentially not types, how should we understand the essence of lifetimes?

(not a contribution)

42.

Lifetimes are part of Rust's type system, and also have variance as several examples mentioned in this thread showed.

Lifetimes are types. The trouble with lifetimes is that they are, essentially, uncountable.

Consider the following types[1]: i64, *const i64, *const *const i64, …

If you only generate these in your program (by calling function that just hands out random sequences of bits), compare them and print them, as addresses (using {:?}) – then, after compilation, you couldn't even distinguish them! It doesn't matter if you have “pointer to i64” or “pointer to pointer to i64” – there are no difference as long as you don't try to dereference them!

In some sense, in that sequence of types, “once pointer” is not distinguishable from “ten times pointer” (as long as we don't derefence them).

But the difference between lifetime and “dereference rank” of pointer is precisely that dereference operation: for some pointers it can be applied once, for some pointers it can be applied twice and for interger it can not be applied at all!

Lifetimes are different. Because they are, essentially, just the part of theorem prover, something that just verifies correcness of you program… it's not possible to look on the type and say “hey, this type have lifetime of one second” or “that variable exist for lifetime of two seconds”… the only concrete lifetime that exist is 'static.

That means that everything that touches lifetime (except 'static) is, in reality, “generic over lifetime”. This may sound a bit crazy, but that's also necessary!

Consider something like this:

fn foo() {
  let x: i32 = 42;
  let r: &i32 = &x;
  println!("{r}");
}

You can call this function two times – and would get two different references… especially if you would do that from two different threads. And these two references would have two different types (because lifetime is part of reference type). And then you would need two functions to print them. And that means that our two copies of foo, that run at different threads, are distinct… but hey, that's precisely what actually happens: they are distinct (if only because they have different frames on stack), they only share machine code that implements them.

And that's how lifetimes become “strange”: on one hand Rust declares them “not effecting the generated machine code”, on the other hand, they really-really-really want to affect the generated machine code (look on bugs related to the specialization one day, if you want gory details)… thus I, for one, imagine lifetimes as “kinda-sorta-real” types – but the ones that compiler doesn't want to distinguish in the generated machine code.

All the constructs that may help us distinguih and separate them (similar to the “dereference” for pointers) are simply outlawed (in a stable Rust, at least).

One may imagine different language, where there are no such “law”… and it may even turn out to be viable and usable language… but that wouldn't be Rust (as we know it, at least).


  1. One the 64bit CPU for simplicity. ↩︎

Let me say some things specifically on this “not types” idea:

  • They are definitely not types in the sense that they don’t define the properties of a set of values, but they are a lot like types otherwise, and when you put a lifetime in a type some of the properties of the lifetime carry over to properties of the type.
  • Types, lifetimes, and constants are all entities that can be manipulated by generic code at compile time, for which Rust doesn’t have a single unifying name (other than their syntactic usage as “generic parameters”).

In particular, if you’ve looked at complex generic Rust code, you may have seen it use “dummy” or “marker” types for which no values are ever created at run time, but which are still used to help constrain which trait implementations are used, or impose other constraints on the static structure of the program. This is using a type as not-a-type — saying “we don’t actually care that this is a type, just that traits are predicates for it”. We could imagine a language (or a future version of Rust) which had a general name for these compile-time-entitities of which types, lifetimes, and generic constants were all, uh, “subcategories”.

So, it is true that lifetimes are not types, but they are manipulated in the ways types are.

(Also, while I am not agreeing with @khimru’s “lifetimes are types” just above, I do agree that it’s important to fully understanding lifetimes to understand that you never meet a concrete lifetime other than ’static. Every other lifetime name is a generic parameter which, abstractly, is instantiated to a new concrete lifetime every time you run the code in question — but in practice is erased.)

3 Likes

I don't have a simple answer, as lifetimes fulfill a few different roles.

In terms of the type system, lifetimes are a kind of generic parameter for type constructors. Other generic parameter kinds include types and const values. Effects can also be modeled as a kind of generic parameter.

As a part of the type system, lifetime parameters can be used for all sorts of type-level things. I'll come back to that later.


The lifetime parameter on a reference can be thought of as the duration of a borrow. Borrow checking is a large topic on its own, and one could fill chapters exploring exactly what a lifetime is in that context. From a type system perspective, the soundness of being able to "approximate" borrows -- coerce a &'static str to &'non_static str, say -- aligns with the soundness of subtyping and variance. For example it's not sound to coerce a &mut Vec<&'static str> to a &mut Vec<&'non_static str>, so T in &mut T must be invariant. This is analogous to the IList interface example in the Wikipedia article.

Outside the context of this thread, if you asked a typical Rustacean what the essense of a lifetime is, they'd probably say it's about borrowing. If I was forced to give one concise answer, that's the answer I would give.


Lifetimes can also be used in bounds, such as where 'a: 'b. This is also called an "outlives" relationship. When thinking of lifetimes as durations, you would say that 'a is longer than 'b. When thinking more in depth, it turns out that you can have two lifetimes 'x and 'y such that neither 'x: 'y nor 'y: 'x. So you may also think of 'a: 'b as meaning "'b is a subset of 'a". If this is your primary memory model, it's intuitive to say "'x is contravariant in &'x str" so that lifetime subsets and subtypes "line up". And that used to be the terminology.

In addition to the arguments made in that thread, there's another point of view where lifetimes are a set of loans. Under that mental model, subsets and subtypes line up with our current terminology: 'x is covariant in &'x str.

Either mental model is compatible with the subtyping of references, and the variance in the language, which keep everything sound.

Outlives requirements can also be applied to types or generic type parameters (T: 'b), in which case it is a type-level constraint which can be "destructured" into a set of 'a: 'b bounds (for lifetimes 'a found within T's resolved type).


Traits can also be parameterized by lifetimes. The trait can then use the lifetime in its items however it wants, such as in types or in outlives bounds.[1] This usually corresponds to having the capability to work with a lifetime parameterized type in some way.

trait Parse<'a> {
    type Error;
    fn parse(&'a str) -> Result<Self, Self::Error>;
}

Implementors of traits can also be type-erased by coercion. With the trait above, for example, one could have a dyn Parse<'a, Error = String>. Now the lifetime is in a type again,[2] but it represents more of a capability to work with the lifetime.

And it's useful to have types which can work with all lifetimes, so we have higher-ranked types:

dyn for<'any> Parse<'any, Error = String>

This is a type capable of working with any lifetime.

It's also useful to be able to require the capability of working with all lifetimes in a generic context, so we also have higher-ranked trait bounds (HRTBs):

fn foo<P>(p: P) where P: for<'any> Parse<'any> { /* ... */ }

So in addition to representing durations of borrows, lifetimes can also be used to represent capabilities (usually capabilities of being able to work with borrows of a certain duration).


But because they participate in the type system, you can use lifetimes in ways not directly related to borrowing in order to do type level "gymnastics" in the language. This is along the lines of what @kpreid was talking about with the use of "marker" types. Sometimes this can exercise emergent properties of the language which are pretty far away from the concept of borrows that we started with.

For example, you can create branded types using "phantom lifetimes" that don't correspond to borrows at all; they are a type-level mechanism. Or you can emulate a GAT with a non-generic associated type,[3] or by using a lifetime parameterized supertrait.[4] That article goes on to demonstrate a way to emulate conditional higher-ranked trait bounds using some emergent properties of implicit lifetime bounds (which I haven't even talked about here). That was also used in the design of scoped threads.

These examples are using the properties of lifetimes as generic parameters (and how they combine with other parts of the language like trait solving). It's easier to understand them from a type system POV than a "lifetimes are the duration of borrows" POV.

(They're also practical, but Rust's type system is turing complete, so you can do all sorts of impractical things with it as well, if you try hard enough.)


That's mainly what I had to say; this final piece is a "me too" regarding the last couple comments.

Another way to see that there are "infinite" lifetimes, or that you never meet a concrete non-'static lifetime, is to consider this function.

fn example(vec: &mut Vec<u64>, current: u64) {
    if current == 1 {
    } else if current % 2 == 0 {
        example(vec, current / 2)
    } else {
        example(vec, current * 3 + 1)
    }
    vec.push(current);
}

We don't give away vec in the recursive calls, so we must be reborrowing *vec for some shorter duration than the input lifetime each time. But we do this an unknown number of times (depth).

There's no way to compile this with "concrete lifetimes".[5]


  1. and it can even not use it in its items at all, and just make the lifetime available to implementors, as some linked examples later demonstrate ↩︎

  2. dyn Parse<'a, Error = String> is its own concrete type ↩︎

  3. nothing needs to actually implement the GivesItem trait ↩︎

  4. here the lifetime parameter on the trait isn't being used for a capability per se, but is available for the implementor to parameterize the associated type in practice ↩︎

  5. One can also craft an example that isn't even decidable. Assuming infinite stack space anyway, I suppose. :wink: ↩︎

3 Likes