An alternative implementation for `Any`?

I came up with an alternative implementation of the Any type.

Instead of using a 128-bit hash as a TypeId, it uses a function pointer in the vtable for the same purpose. This means that you skip one indirection and also only have to use usize-width bits for the comparison, rather than 128 (a bit win on 32-bit or lower).

I'm sure I'm not the first person to consider this (IIRC, C++ puts its RTTI key directly in the vtable to avoid the indirection too). But what do people think?

Right now there are two ugly parts to it:

  1. For correctness it relies on the compiler generating exactly one version of the type_ptr function for each type, so there is a one-to-one correspondence between vtable entries for type_ptr and the type. I'm pretty confident this should be the case even across translation units, but I've not found a guarantee that it is.
  2. I couldn't find a stable way of getting a function pointer from a vtable, so had to manually hack around what I understand to be the current vtable structure. This isn't a fundamental problem (since the compiler knows how to get that function pointer by name) I just couldn't find a nicer way of doing it.

AFAIK no, different codegen units can generate duplicated vtables. This is also mentioned as one of the caveats of std::ptr::eq

4 Likes

Yeah, I knew about the duplicated vtables (that should be discussed in the blog post) otherwise I'd suggest using the vtable pointer itself as the TypeId-alternative.

This assumes the function pointers within each vtable is unique, which is much more likely to be true. I'd hope that the linker would ensure there is only one copy of every function.

I'm not sure why this would be true. You can just have a generic type that's instantiated, along with its trait implementations, in different codegen units, thus duplicating its functions. If the linker could solve this problem why can't it also solve the problem for vtable duplication?

Remember that you also need to consider dynamic libraries (at least the dylib crate type) on different platforms. Overall this feels like the same problem as generic statics.

Also, you should probably post this on internals.rust-lang.org for more targeted feedback

3 Likes

It's not a compile time constant. It can only be a link time constant.

This prevents specialization workaround pattern which relies on DCE and TypeId being constant.

1 Like

Given the circular nature of your definitions, and that the &self argument is completely unused, I don't see anything that would prevent TypePointer::of::<T> being unified to the same function for all T, which then means that you no longer have a type-unique id.

1 Like

type_ptr doesn't use self, but it does use T. TypePointer::of::<T> also uses T. Therefore I don't think the compiler can conclude that TypePointer::of is the same for every T.

That is interesting, thanks!

Compiler wouldn't even try. Rather linker would look on the code of functions and merger them. That's standard procedure novadays.

1 Like

No, this is a circular argument. type_ptr uses T only for TypePointer::of::<T>(), but TypePointer::of::<T>() uses T only for getting the address of the T::type_ptr, so we're back at the initial claim.

Viewed from another point of view: if you inline TypePointer::of::<T>() inside type_ptr it just becomes TypePointer(<Self as PointerAny>::type_ptr as _), where <Self as PointerAny>::type_ptr is the function we're implementing. Hence type_ptr only uses itself, and nothing from T and thus every instance is the same.

1 Like

I swear I'm not being obtuse, but I just don't see that this follows. In your example type_ptr uses Self (which is T), therefore it does use T.

Moreover, I can't find a way of writing this such that the compiler does that "optimisation". For example this and this clearly show two different functions being generated. Obviously this is empirical and not a guarantee, but if you're right then this should be a pretty easy place for that optimisation to be applied.

It uses <T as PointerAny>::type_ptr, but that's not enough to show it uses T, for that you also need to prove that <T as PointerAny>::type_ptr's implementation actually depends on T. But this is what you wanted to prove in the first place, and you can't assume it's true in order to prove it [1].

Doing circular reasoning is not the easiest/fastest thing, so that might be why it doesn't optimize that case. For example here are two examples where the compiler succeed and fails at a similar optimization https://rust.godbolt.org/z/jzTP1qr3E

You didn't properly share it, that's just godbolt's homepage.


  1. This would work when dealing with coinduction, but this is not the case. ↩︎

That's because currently it doesn't exist. There are polymorphization workgroup, but it stalled AFAIK and people are just using C++-derived approach with ICF option for linker.

Start from this blog post and you'll see how it was enabled for rustc.

I guess when I said “that's standard procedure novadays” I gave the wrong impression: it's not yet enabled by default everywhere, but it's quite common to use it.

Here's how to observe it on Godbolt. Note that rustc-generated code still includes two functions, but they are merged by linker and thus addresses are the same when program is run.

1 Like

I don't doubt the optimisation exists as a concept. What is not obvious to me is whether it would ever apply to the code I posted.

In any case, since actually calling type_ptr is not really necessary nor desirable, we could replace that function with literally anything. In the blog post I even suggest using the address of Any::type_id (which obviously could never be optimised as identical). The general approach would still work. The remaining issue being whether there is a way to guarantee uniqueness of a function pointer used in a vtable (and getting the pointer from the vtable, but that's more of an implementation detail).

Sorry, I edited the link after you opened in apparently.

In any case, this potential issue could be avoided by using the address of Any::type_id rather than the current type_ptr (since that function is never really meant to be called). That would eliminate false-positive type equality.

For false negatives, I think you're right that in the face of dynamic linking that can never be addressed with this method. Within a compilation unit though (or potentially statically linked) I think this is interesting though).

I was hopeful that the linker would eliminate duplicates across (statically linked) compilation units because they are functions that should have the same symbol per-type. Surely the linker deduplicates symbols? As to why vtables aren't given symbols that the linker could deduplicate similarly, I think that's an interesting question which I'm not qualified to answer. I'll continue to think about it though.