std::any::TypeId <-> u64?

It doesn't work if you don't have a fixed set of types, and even if you do, it breaks (i.e. type IDs change) if you need to update that set.

I'm okay with this set of limitations. I'm not creating new Rust types at Runtime. :slight_smile:

That said, if anyone has a way to create Rust types at runtime, I'd love to hear about it [though we should post another topic].

It's not about creating new types at runtime; that's not possible AFAICT. This sort of constraint matters when your code exposes a public API so you can't guarantee that it will only ever be used with the exact set of types you envisioned.

In theory, I agree with you. In practice, there is one function which does this initialization, and all wasm executors call it, so it is guaranteed to be the same.

In my mind, the gap between 'nothing' and '99.99% working' is the big improvement.

Also, hashing in this context makes me uncomfortable. I understand the birthday paradox, I understand it is more likely the Sun blows up and eats planet Earth than for a collision to happen, but nevertheless, I'm not a fan of hashing in this context.

Then you'd be better off sidestepping the whole TypeId mechanism and reimplementing it from scratch, because it is a hash.

What hash function? With what inputs? How can the docs claim it to be globally unique if it's the hash of something ?

See the issue linked above :wink:

(Note that I don't know if the description of what hashing algorithm is used is still up to date, but since the issue is still open, there's still a collision problem.)

1 Like

It's currently SipHash.

It has inputs such as the field names/types of a struct, and the identity of the enclosing crate.

Probability.

1 Like

@steffahn : Thanks for bringing this to my attention.

@H2CO3 : Thanks for pointing this out. At first, I (incorrectly) assumed typeId was some compiler incremented counter. Yeah, I'm not happy with this. Back to the redesign board. I might end up going with (file!, line!) which should be safe until someone tries declaring two structs on a single line.

Alright, so the difference between this (referring to @H2CO3 's sha256 solution) and the sort solution is:

  1. this works if the set of types is dynamic

  2. for sort solution, the pr of collision if k^2 * pr(siphash collision), whereas this is k^2 * [ pr(siphash collision) + pr(sha256 collision) ]

?

There's also column!, if that helps.

4 Likes

Why is sha256 useful here at all?

Let a: Vec<u8> = bytes written in type_id.hash(...);

Why do we want to use key sha256(bytes) instead of key bytes ?

I seems like using bytes would be strictly superior: (less work, also avoids case when sha256 might generate collision, however unlikely).

It would be superior if TypeId were backed by a proper data structure that describes the type.

However, if the original bytes came from an integer that's succeptible to hash collisions (i.e. like it does now) it doesn't matter how cryptographically secure SHA-256 is because sha256(collision) is still going to give you a collision.

1 Like

There's one feasible way to get a u64 from a TypeId, though it's still subject to breaking on implementation changes:

pub fn typeid_to_u64(x: TypeId) -> u64 {
    struct JustOneU64(Option<u64>);
    impl Hasher for JustOneU64 {
        fn finish(&self) -> u64 { todo!() }
        fn write(&mut self, _: &[u8]) { todo!() }
        fn write_u64(&mut self, val: u64) {
            if self.0.is_none() {
                self.0 = Some(val);
            }
        }
    }
    
    let mut h = JustOneU64(None);
    x.hash(&mut h);
    h.0.unwrap()
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=66223f94d1f25d5ea23da3df94d1c252

There's no way to go the other way, and fundamentally never will be.

You're assuming that TypeId's hash implementation is based on a single u64 (i.e. #[derive(Hash)] struct TypeId(u64)) so this isn't really any different from a transmute().

Honestly, I'd prefer if people just transmuted TypeId to u64 because it means readers will immediately see that you're doing something dodgy. The // Safety: comment will also acknowledge the assumptions regarding layout and lack of any stability guarantees.

Because that's exactly the point of a hash function: it compresses an arbitrary amount of data into a fixed-size digest. If you stored all the bytes written, then it could be arbitrarily long.

The point of using a cryptographic hash here is that it doesn't significantly increase the possibility of a collision if the original data is already a hash.

Hashing is usually constant-space (it certainly is in the case of the SHA-2 family), while extracting a dynamic amount of bytes into a Vec as you proposed requires a heap allocation, which is also extra work, and often a comparable amount to the act of hashing itself.

It sounds like you are optimizing for a NOT-YET-REALITy world where TypeID is changed from u64 to something longer than a Sha256.

That is highly offensive.

By the way, if you had paid attention to the issue linked above twice, you could have inferred that this is exactly what is being planned.

What is?

If there is a logical flaw in my argument, you should point out the flaw.

If there is none, it is weak to try to change the conversation to a matter of tone instead of admitting the idea makes assumptions that currently are not true.