On the meaning of type erasure in Rust

Words change meaning with time. E.g. byte, originally, had entirely different meaning and 8 bit bytes weren't all that common! That's what Stretch memo had:

60 is a multiple of 1, 2, 3, 4, 5, and 6. Hence bytes of length from 1 to 6 bits can be packed efficiently into a 60-bit word without having to split a byte between one word and the next. If longer bytes were needed, 60 bits would, of course, no longer be ideal. With present applications 1, 4, and 6 bits are the really important cases.

Yet it would be very strange today to assert that bytes longer than 6 bits are not interesting and we shouldn't assume that byte is an octet except if explicitly noted.

Yes, it may be inappropriate that one term was conflated with another but it's too later to fix that.

Similarly with type erasure and Java: Java 5 was something that popularized the term and even if someone may have used it for something else before arrival of Java today we would expect something that is at least consistent with the use of term in Java world.

Java may not have been first, but it is certainly the most influential user of the term.

    trait Tr       {}
    impl  Tr for A {}; struct A;

    fn ttt<T: Tr>(_:      T ) {}     // 1
    fn aaa       (_: impl Tr) {}
    fn bbb       (_: &dyn Tr) {}

    let t = &ttt::<A>;               // 2
  //let a = &aaa::<A>; // Error because aaa is not *quite* the same as ttt
    let b = &bbb;

Is there any type erasure in the preceding code on the line that defines

  1. ttt
  2. t

?

OK, take it from here:

    fn take_aaa(_: fn(A)) {}
    take_aaa(ttt::<A>);

No and no. ttt defines a "function generator" (not a function), t instantiates it by specifying the type of its generic parameter.

ttt::<A> and ttt are, in general, different things. When we call ttt::<A>(something), this A can often be inferred, and we write the call as simply ttt(something). But A is still here - just like, when we write let i = 0i32, we don't "type erase" anything, i is still i32.

So you are saying that bbb is type-erased because it is not associated with any 'internal' type, while ttt::<A> is not type-erased because it is associated to that A?

Edit: Put another way, if Tr had any methods, then the whole aaa/ttt family would have type information encoded within it (so it's not type-erased) through the hard-wired pointers to the specific method implementations, while bbb has no type-specific information in it at all (so it's type-erased), it is injected via the argument at the time it is called.

bbb is type-erased because it does the same thing, no matter what type was coerced to a &dyn Tr - it calls the method from its vtable. ttt::<A> is not type-erased, since, well, there's no type that might have been erased - all arguments are typed explicitly.

Here is an explanation of the meaning of type erasure as used in Rust, in which the use of the word erasure makes sense to me:

  • The code that the compiler generates for &[1]dyn Trait parameters, contains no information specific to any concrete type represented by the dyn Trait.

    Thus any knowledge of those concrete types is erased and does not make it into the object code.

    Type-specific information is passed in the argument at run time.

  • The code that the compiler generates for impl Trait[2], refers to information that depends on specific concrete types represented by the impl Trait, such as the size of the type and the location of its methods.

    Thus some knowledge about those concrete types is not erased and ends up in the object code.

    Type-specific information is hard-coded in the object code at compile time.

Do you see any problems with this interpretation?


  1. or any other indirection mechanism ↩︎

  2. or generic parameters ↩︎

4 Likes

Hmm, according to quinedot:

Every dyn Trait value is the result of type erasing some other existing value.

What information does type-erasing a value remove?

FWIW, my take on this question: Type erasure is what happens at sites where a value is being passed into polymorphic code — code that can accept values of different types chosen at run time.

As I'm defining it, ttt and aaa do not use type erasure because the type cannot be selected at run time; every call has a specific T involved. (For example, you cannot obtain a ttt function pointer without picking a T for it.) bbb requires type erasure to occur because the static type information is discarded, and some run-time information derived from the type (the vtable) is added to fit the input of bbb. But type erasure does not necessarily have to involve adding any run-time information. For example:

  1. As previously mentioned, Java generics discard the type information entirely.

  2. Rust lifetimes are erased! Every time you use a type with a lifetime parameter, the lifetime is discarded when the program is compiled, and has zero influence on the execution of the program. fn foo<'a>(&'a Bar) doesn't and can't know what 'a is, not even whether it is 'static. foo's Rust definition and foo's machine code are polymorphic over the lifetime 'a.

  3. fn four_bytes_equal(a: &[u8; 4], b: &[u8; 4]) -> bool {
        a == b
    }
    let x = 1;
    let y = 2;
    assert!(!four_bytes_equal(bytemuck::cast_ref(&x), bytemuck::cast_ref(&y)));
    

    This is a type-erased equality comparison. Zero run-time information was added to the values in order to pass them in erased form; they only had to meet the criterion of being four bytes long (and free of padding).

Type erasure, as I’m defining it,

  • discards some compile-time information — a type, which can be used in type checking — and
  • creates some (or no) run-time information — a value, which cannot be used in type checking but is available for the polymorphic code to read at run time.

Imagine you have a &i32. The compiler knows it points to a i32, i.e. it knows the type of the pointed type. By converting it to a dyn Display the compiler no longer knows it points to a i32, that is you erased its knowledge about the concrete type of the pointed value.

I agree with OP that the term "type erasure" is used very differently in Rust than it is used in Java (where it may have originated?).

Java has abstract interfaces and dynamic dispatch too, but it's not what the term "type erasure" is used to describe in Java. There, "type erasure" is a strange hack on top of the type system created to make collections generic in a backward-compatible way.

1 Like

@kpreid Do you see any incompatibility between your definition and the one I gave here ?

My problem with this is that values are[1] run-time entities. In your description nothing is removed at run-time; nothing is removed from the value itself. Yes, something is erased from the information that the compiler has at compile-time (before any of the values exist) about any and all values of this type, and from information that might otherwise have been stored in the object code. However, crucially, no information is removed from the value itself.

The i32 value occupies exactly 32 bits. You cannot remove a single one of those without screwing up the value!


  1. sigh, mostly ↩︎

No.

Looks like we've hit another ambiguity. [1] When you say "value is a run-time entity", you speak of some sequence of bytes existing at, well, runtime. When we say "type-erasing some value", we speak of some entity inside the abstract machine - that is, something that doesn't yet have any byte representation, but has some semantic properties instead, and it's these semantic properties that are erased on conversion.


  1. As we always do in the natural languages, since they are fundamentally imprecise. ↩︎

3 Likes

I wouldn't call “use of OOP as it's described in basically every book” as “a strange hack”, though.

In Java “type erasure” works by passing around objects of “most-base type” into “type-erased” function with follow-up upcast.

You can do that in Rust, too, just would need to use explicit conversion into dyn Trait and then use of core::any::Any to downcast on the extraction from collection.

That's not how how Rust collections work, but one can, most definitely, create Java-like collections with Box<dyn Trait> or maybe Rc<dyn Trait> and type erasure there would work in exactly the same way as in Java.

In what language type erasure removes some information from value?

What language loses bits from their integer representation and turns zero into one (or the other way around) when it does type erasure?

That would be strange and illogical. As was already noted type erasure tend to add some information to the value (to ensure that polymorphic code that have no idea about actual type of the value that it processes may process it).

That's not a requirement (lifetimes are stripped entirely without leaving any information about their existence), but, in general, type erasure keeps the value intact and then adds some runtime markups which are needed to “return the value” from it's typeerased state.

When you put something in an HashSet you expect to get that value back intact whether said HashSet uses type erasure (like in Java) or monomorphisation (like in Rust)!

In languages like C++ or Rust you even, usually, have a choice between monomorphisation and type erasure, these are two fundamentally different ways of doing type-generic programming.

Rust is a bit of an outlier here, most languages that try to use generics (as opposition to templates) then use some kid of type erasure and polymorphic code (be it Extended Pascal, Ada, Java (much discussed here), or Swift) while Rust combines problem of generics with templates (in particular generics in Rust couldn't be used with dynamic linking which is, usually, the #1 motivation for type erasure: you couldn't pass information about type to your generic code when said type doesn't even exist when your code is compiled which, naturally, leads to “pointers + vtable” design and thus type erasure).

Heck, even good old C qsort that exists for half-century or so may be treated as [primitive and manual, yet still valid] kind of type erasure.

Rust (and many other language) are just elevating this technique to the language level, that's all.

This is incorrect. This would work the same as in Java before generics and type erasure were introduced to Java.

In Java before generics, you could only have ArrayList, which is essentially equivalent to ArrayList<Object> and similar to Vec<Box<dyn Any>>.

In Java with generics and type erasure you can now have types like ArrayList<Integer> which provide more type safety than ArrayList<Object>.

Point is, they provide more type-safety at compile-time, but are fully equivalent at run-time. And, yes, I guess that's why Vec<Box<dyn Any>> is "Java-like" and not "the same as Java" - it's not exactly the same.

This only supports my point that type erasure in Java is something different from dyn. Non-generic ArrayList prior to when Java introduced type erasure is also analogous to Vec<Box<dyn Any>> at runtime.

What have changed after they have been added? Simple code produces exactly the same bytecode whether you do these conversions manually or allow compiler to do them for you.

Can you show me an example where “new way” of doing things is not just a syntax sugar that you can manually convert to “old way” but something actually different?

Yes, but these are mechanically trasformed to “old way” during compilation. Only debug info is different, bytecode is the same.

Yes. And you are just doing the exact same type erasure in “old Java”, only manually, without help from Java 5 syntax sugar.

With generics in Java you get better type checking, getting compile-errors where previously you wouldn't get type errors. But I'm not going to get into this because it's sidetracking the discussion.

They couldn't be exactly the same since type erasure in Java often is built on top of subtyping and then subtype upcasting/downcasting. Rust doesn't support that.

Traits are closer to the interfaces than to subclasses.