I find myself confused regularly with the sub- vs super terminology.
Thankfully in Rust syntax it’s always really simple.
If you have something like `Foo : Bar`, it means that you can convert/coerce `Foo` into `Bar`. It's always a left-to-right conversion.
Of course subtyping isn't directly expressible in Rust syntax, and lifetimes aren't types of values anyway… and also the `:` is used with traits, too; but even so, the directionality of the syntax is nicely consistent. I don't really need to think in these words at all: "subtype, supertype", "outlives", "implements (trait)", "supertrait", …
[Looking into further usage of `:`… I suppose `x: T` meaning `x` has type `T` is the only thing that doesn't quite fit.]
Examples: If I want to make

```rust
fn convert<'a, 'b>(s: &'a str) -> &'b str {
    s
}
```

compile successfully, I need `&'a str : &'b str`, which boils down to `'a : 'b`:

```rust
fn convert<'a, 'b>(s: &'a str) -> &'b str
where
    'a: 'b,
{
    s
}
```
If I want to make

```rust
fn convert<T>(s: Box<T>) -> Box<dyn Trait + 'static> {
    s
}
```

work, I need `T: Trait` and `T: 'static`.
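For the record, here is a version of that example that compiles, with both bounds spelled out (the `describe` method and the `i32` impl are just illustrative scaffolding, not part of the original snippet):

```rust
trait Trait {
    fn describe(&self) -> &'static str;
}

impl Trait for i32 {
    fn describe(&self) -> &'static str {
        "an i32"
    }
}

// With `T: Trait + 'static`, the unsizing coercion in the body is accepted.
fn convert<T: Trait + 'static>(s: Box<T>) -> Box<dyn Trait + 'static> {
    s
}
```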
“Variance” just refers to the rules the compiler can use to determine (some of) these coercions. But you don't need to understand these abstract and general rules in order to understand concrete cases. Like you don't need to understand a Turing machine in order to understand for concrete cases whether or not something is a valid description of an “algorithm” / a computation. Like you don't need to understand monad laws, or any abstract theory of monads, in order to understand what `singleton`, `map`, and `flat_map` do for concrete collection(-like) types.
Ok, let's just determine a few coercions. I have `A : B`, and `Box<A>`; can I create `Box<B>`?
Yes, I can. This works even when just thinking about conversion functions. If I have `f: fn(A) -> B` and `b: Box<A>`, I can do `Box::new(f(*b))` to get `Box<B>`. [In a very real sense, by writing an expression like `Box::new(f(*b))`, we effectively just formalized a proof that `Box` is covariant. Yes, we just did a mathematical proof; and since it involved writing down some machine-readable term, it was (in a sense) more rigorous than much of normal mathematics.]
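That one-liner can be packaged up as a small (hypothetical) helper, to make the "proof" reusable:

```rust
// Any conversion function between the element types lifts to a conversion
// between the boxed types -- at the cost of a fresh allocation.
fn box_map<A, B>(f: fn(A) -> B, b: Box<A>) -> Box<B> {
    Box::new(f(*b))
}
```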
But `Box::new(f(*b))` – that’s expensive! Right, and that's why “subtyping” is not only about conversion/coercion in the general sense, but concretely about certain coercions that don't do anything at run-time. (Simple re-interpretation of bit patterns, if you will.) The coercion of `Box<A>` to `Box<B>` due to subtyping works without any run-time cost, or re-allocation. This helps sharpen our intuition on what kinds of coercions we can allow: we can ignore matters of ownership, and only need to think about matters of data-flow. Like “input”, “output”, “inout”.
Thankfully, I don't actually want to focus on a mental model to accurately determine “does (or could) this subtyping-coercion work” for all cases, my main point was to convey how I try to avoid mixing up the direction.
Nonetheless, if you would like, here is a quick walk-through of a handful of the most important cases of the “variance” rules for certain types.
- for owned things, data structures, etc… it's generally like `Box`; e.g. for `Vec`, too: with `A : B`, you get `Vec<A> : Vec<B>` (this is like `.into_iter().map(f).collect()`)
- these considerations do really constitute general rules. The ‘rule’ for `Vec` is “if `A : B` then `Vec<A> : Vec<B>`”
- for `&A`, we can use `A : B` to turn `&A` into `&B`. This is perhaps the clearest case where we notice that this “simple re-interpretation of bit patterns” restriction really matters. Unlike the `Box` case, we don't have sufficient ownership to do this at all with just a conversion function between `A` and `B`.
- if you have `A : B` and `B : C`, that gives `A : C`. Again, to illustrate: if we think of functions `f: fn(A) -> B` and `g: fn(B) -> C`, we can convert `A` to `C` using those two functions, like `g(f(a))`.
- Now the more tricky parts: the one they call “contravariance”. If you have `A : B` and get a function `fn(B) -> C`, you can write a function `fn(A) -> C` like… didn't I just mention this in the previous point? Yes, kind of, but this time I want to talk about actual `fn(…) -> …` types, not just use them as an analogy for implicit coercion. The point is that we make this into the following rule: if `A : B`, then `fn(B) -> C : fn(A) -> C`. If we use a type alias here, let's say `type Xyz<T> = fn(T) -> String`, then this looks like “if `A : B` then `Xyz<B> : Xyz<A>`”. The “contra” just indicates that “stuff switches sides in our rules”: the `A` and `B` switched sides. Contravariance generally only comes up with such cases of “input” parameters
- Similarly, you can hopefully see that for the return type, things aren't switching sides; if `A : B`, then `fn(C) -> A : fn(C) -> B`. This is the “output” type parameter of the function
- Combining both the previous rules, we can switch out input and output type at once: if `InA: InB` and `OutA: OutB`, that allows us to turn `fn(InB) -> OutA` into `fn(InA) -> OutB`
- If we look at `fn(A) -> A` [same input and output type] and wonder, can we coerce it anyhow? Turning it into `fn(B) -> _` requires `B : A`. Turning it into `fn(_) -> C` requires `A : C`. So yes, with `B : A` and `A : C`, we get `fn(A) -> A : fn(B) -> C`.
- but what if we want to get out of this some function where input and output types are still the same? In other words, `B` and `C` should be the same? We would need `B : A` as before, but also `A : B` (previously `A : C`). Two types that are both subtypes of each other. In Rust's variance rule-set, we speak of “invariance” if a rule requires `A : B` and `B : A` in both directions, and (for simplicity), in these cases we don't allow `A` and `B` to be different types at all.
- we express this in words by saying for `type FnSameInOut<T> = fn(T) -> T`, that `FnSameInOut<T>` is invariant in `T`. Invariance generally appears in contexts with an “inout” sort of flow of data/information.
- finally, `&mut T`. For this, it helps to consider functions again. An important consequence of Rust's model of exclusive access for mutation is that `fn(Foo) -> Foo` and `fn(&mut Foo)` are essentially the same thing
- if `fn(T) -> T` has invariance, then `fn(&mut T)` must somehow have it, too;
- this is where we actually do start needing to look at variance as a whole rule-system again, unfortunately. If we fit all of the previous cases in a “simple” system where every lifetime and type parameter is simply described as “covariant”, “contravariant”, or “invariant”, then we simply must make the `T` in `&mut T` invariant; otherwise, our system would deduce coercions for `fn(&mut T)` that wouldn't actually be sound
- as a bonus, `Cell<T>` is also interesting. This is very similar to the argument just above. The type `&Cell<T>` of references to a cell works very much like a `&mut T` reference; so in the simple framework of variance that we have, having already defined `&U` as “covariant” in the parameter `U`, we have no choice but to make `Cell<T>` invariant in `T` to avoid unsound coercions
This is all getting close to the fully abstract concepts now. One meta-step even further is only needed by the compiler or language specification; because they need to talk not only about variance rules for concrete types (where the “rules” always look something like “if this parameter type `A` is a subtype of this other type `B`, then this combined type `Foo<A>` is a subtype of `Foo<B>`”), they also need to talk about the “meta-rules” for deriving the variance rules of concrete types.
The simplest of such meta-rules in action would be e.g. the effect that if you define `struct MyStruct<T>(Vec<T>)`, then your `MyStruct<T>` is covariant in `T` because the type of its field, `Vec<T>`, was covariant in `T`. The full story needs to be able to handle all kinds of usage of type parameters `T`, which can appear in multiple places, in deeply nested places, even interacting with traits and associated types…