std::any::TypeId <-> u64?

Question: is there a way to convert between std::any::TypeId and u64?

Objection: This is very unsafe. You should not do this. The value might change across compiler versions, targets, etc.

Response: I have a single *.wasm file. It runs on both index.html and in webworkers (the exact identical *.wasm file).

I want to take a TypeId on index.html/main.wasm and convert it to a u64,

use postMessage to send it to webworker/main.wasm,

then have webworker/main.wasm do u64 -> TypeId.

There is no sanctioned/correct way. TypeId is not supposed to be sent between processes or address spaces. And the type is definitely not going to remain layout-compatible with u64 forever, because issue 10389 needs to be addressed. (See also.)

It’s fantastically unsafe, because you need to know that both sides are actually running the same binary. I don’t know everything that goes into the TypeId hash, and therefore what might make it change; even recompiling identical source code might do it.

That said, transmute will do the job:

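use std::any::TypeId;
// transmute requires equal sizes, so this compiles only while TypeId is exactly 8 bytes.
unsafe fn typeid_to_u64(id: TypeId) -> u64 { std::mem::transmute(id) }
unsafe fn u64_to_typeid(id: u64) -> TypeId { std::mem::transmute(id) }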

I guess one question I have is: is TypeId determined 100% at compile time, or is it partially determined at runtime?

I can guarantee that both index.html and the webworker are running the identical *.wasm file, but the runtime environments might differ slightly, since one is instantiated in a webworker and the other in index.html.

It's determined at compile time.

What are you intending to use it for?

Implementing something like typetag, for serializing / deserializing dyn Trait objects, only to be sent between index.html/main.wasm and webworker/main.wasm.

So, in particular, the serializer is going to package the object as (u64 = typeid, data = Vec<u8>), then the deserializer is going to look at the u64 to know which deserializer to call.
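To make the (tag, data) scheme concrete, here is a minimal sketch of the receiving side. Everything in it is hypothetical and only for illustration: the `Message` trait, the `Ping` and `Text` types, the decoder functions, and the tag values 1 and 2. How the tag is derived (TypeId or otherwise) is deliberately left out.

```rust
use std::collections::HashMap;

/// Wire format from the post: a numeric tag plus the serialized payload.
struct Envelope {
    tag: u64,
    data: Vec<u8>,
}

/// Hypothetical trait standing in for the `dyn Trait` being shipped.
trait Message {
    fn describe(&self) -> String;
}

struct Ping(u8);
struct Text(String);

impl Message for Ping {
    fn describe(&self) -> String {
        format!("Ping({})", self.0)
    }
}

impl Message for Text {
    fn describe(&self) -> String {
        format!("Text({})", self.0)
    }
}

/// A decoder turns raw bytes back into a boxed trait object, or fails.
type Decoder = fn(&[u8]) -> Option<Box<dyn Message>>;

fn decode_ping(data: &[u8]) -> Option<Box<dyn Message>> {
    let byte = *data.first()?;
    Some(Box::new(Ping(byte)))
}

fn decode_text(data: &[u8]) -> Option<Box<dyn Message>> {
    let s = String::from_utf8(data.to_vec()).ok()?;
    Some(Box::new(Text(s)))
}

/// Lookup table from tag to decoder. The tags 1 and 2 are placeholders.
fn decoders() -> HashMap<u64, Decoder> {
    let mut map: HashMap<u64, Decoder> = HashMap::new();
    map.insert(1, decode_ping);
    map.insert(2, decode_text);
    map
}

/// Look at the tag to pick the decoder, then hand it the payload.
fn decode(env: &Envelope, table: &HashMap<u64, Decoder>) -> Option<Box<dyn Message>> {
    let decoder = table.get(&env.tag)?;
    decoder(&env.data)
}
```

An unknown tag yields `None` rather than a panic, so a mismatch between the two sides shows up as a recoverable error.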

You're going to have to manually create a list of types to look up the TypeIds anyway, aren't you? If so, you could just as easily use a string or a number you chose.

With https://crates.io/crates/inventory we don't have to define all the types at once, we can define them locally whenever. If we manually assign a string / number, there is a (tiny) chance of collision. With TypeId, it's (afaik) guaranteed to be unique.

It is an explicit goal to make TypeId not the same size as u64 and break the unsound uses in the wild (see @cole-miller's links). So I suggest you not go the u64 route. In fact, the fix involves adding more information behind a pointer, so serializing the bytes of TypeId won't work either.

If you have a fixed set of types, you could sort their TypeIds and use the index for serialization.
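A rough sketch of the sorted-index idea, assuming a fixed set of three placeholder types (`Ping`, `Text`, `Blob` are made up for the example). It relies only on `TypeId` implementing `Ord`, and never looks at the bytes of the `TypeId` itself.

```rust
use std::any::TypeId;
use std::convert::TryFrom;

// Placeholder types; substitute the real fixed set.
struct Ping;
struct Text;
struct Blob;

/// Build the table once. Two instances of the same binary sort the same
/// TypeIds the same way, so they agree on every index.
fn type_table() -> Vec<TypeId> {
    let mut ids = vec![
        TypeId::of::<Ping>(),
        TypeId::of::<Text>(),
        TypeId::of::<Blob>(),
    ];
    ids.sort();
    ids
}

/// Sender side: TypeId -> position in the sorted table.
fn type_to_index(table: &[TypeId], id: TypeId) -> Option<u64> {
    table.binary_search(&id).ok().map(|i| i as u64)
}

/// Receiver side: position in the sorted table -> TypeId.
fn index_to_type(table: &[TypeId], index: u64) -> Option<TypeId> {
    let i = usize::try_from(index).ok()?;
    table.get(i).copied()
}
```

The index is only meaningful between instances of the same build, which matches the "same executable" constraint in this thread.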

Whatever you settle on, if you're relying on TypeId, make sure everything uses the same compiler version and invocation flags. Probably this means using the same executable (which you said you were doing, so good).

I think just detecting a duplicate and panicking at launch would be a reasonable way to handle that, rather than trying to do wacky unsafe shenanigans with TypeId
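Something like the following would do the duplicate check, as a sketch. The `Registration` struct and `build_registry` function are invented for the example; how the registrations get collected (a manual list, inventory, etc.) is left open.

```rust
use std::collections::HashMap;

/// Hypothetical registration record: a hand-picked tag plus the type's
/// name, kept only so the panic message is useful.
struct Registration {
    tag: u64,
    type_name: &'static str,
}

/// Build the tag table at startup and panic on the first duplicate tag.
fn build_registry(regs: &[Registration]) -> HashMap<u64, &'static str> {
    let mut seen = HashMap::new();
    for reg in regs {
        // `insert` returns the previous value if the key was already present.
        if let Some(previous) = seen.insert(reg.tag, reg.type_name) {
            panic!(
                "tag {} is used by both `{}` and `{}`",
                reg.tag, previous, reg.type_name
            );
        }
    }
    seen
}
```

Calling this once at launch on both sides turns a silent collision into an immediate, descriptive failure.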

It does look like Ord is defined for TypeId (see the any.rs source).

This is clever. I can't believe I never thought of it.

I don't think the inventory crate works on wasm32-unknown-unknown because WebAssembly doesn't let you do the same __attribute__((constructor)) trickery where you can automagically set one or more functions to be executed on startup. Or at least, Rust has no way to tell LLVM to do the right thing.

https://github.com/mmastrac/rust-ctor/issues/14

You could just hash the TypeId using a fixed hashing algorithm, and perhaps store a global map of TypeId <-> u64, updated dynamically.

  1. I tried compiling it: inventory 0.3.1 on rustc 1.62.1

  2. I have not tried running it yet.

  3. Thanks for the notification.

  4. This will definitely be inconvenient, but I think I can get around inventory not running on wasm32.

Instead of doing unreliable trickery with TypeId and hoping for the best, and given you have complete control over both sides, could you use something like the typename crate to get a stable identifier (the type's fully qualified name) for your type tag?

The crate you linked claims to be deprecated and links to std::any::type_name.

Quoting that page:

For example, amongst the strings that type_name::<Option<String>>() might return are "Option<String>" and "std::option::Option<std::string::String>".

The returned string must not be considered to be a unique identifier of a type as multiple types may map to the same type name. Similarly, there is no guarantee that all parts of a type will appear in the returned string: for example, lifetime specifiers are currently not included. In addition, the output may change between versions of the compiler.

The "not unique identifier" part makes this not a good choice.
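For reference, a tiny sketch of what using type_name as a tag would look like. The modules `a` and `b` and the `tag_of` helper are made up for the example. Current compilers happen to print full paths here, but per the docs quoted above that is not something to rely on.

```rust
use std::any::type_name;

// Two distinct types that share a short name.
mod a {
    pub struct Config;
}
mod b {
    pub struct Config;
}

/// Hypothetical helper: use the compiler-provided name as a type tag.
/// The exact string is unspecified and may change between compiler versions.
fn tag_of<T>() -> &'static str {
    type_name::<T>()
}

/// Print the tags instead of assuming their format.
fn print_tags() {
    println!("{}", tag_of::<a::Config>());
    println!("{}", tag_of::<b::Config>());
}
```

If the compiler ever chose to print just "Config" for both, the two tags would collide, which is the problem with treating the string as a unique identifier.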

That's exactly why you were directed to the crate instead of the standard std::any::type_name(), whose output is not stable.

@H2CO3 : Thank you for pointing out my error. :)

@Michael-F-Bryan : Sorry for misinterpreting your suggestion. I saw the "deprecation notice" and ignored the rest of the page. :)

In the meantime, I implemented the "hash the type ID with a cryptographically strong hash" idea in this Playground. Here's the gist of it:

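// Imports assumed from the names used below (the excerpt omits them):
// the once_cell and sha2 crates plus std.
use once_cell::sync::Lazy;
use sha2::{Digest, Sha256};
use std::any::TypeId;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

static TYPE_MAP: Lazy<Mutex<HashMap<TypeId, u64>>> = Lazy::new(Default::default);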

#[derive(Clone, Default, Debug)]
struct Sha256Hasher {
    sha: Sha256,
}

impl Hasher for Sha256Hasher {
    fn write(&mut self, bytes: &[u8]) {
        self.sha.update(bytes);
    }
    
    fn finish(&self) -> u64 {
        let mut raw: [u8; 8] = <_>::default();
        for chunk in self.sha.clone().finalize().chunks(8) {
            for (dst, &src) in raw.iter_mut().zip(chunk) {
                *dst ^= src;
            }
        }
        u64::from_le_bytes(raw)
    }
}

fn hash_type_id(type_id: TypeId) -> u64 {
    let mut state = Sha256Hasher::default();
    type_id.hash(&mut state);
    state.finish()
}

fn get_type_id_raw(type_id: TypeId) -> u64 {
    TYPE_MAP
        .lock()
        .expect("poisoned mutex: TYPE_MAP")
        .entry(type_id)
        .or_insert_with(|| hash_type_id(type_id))
        .clone()
}

Sorry, I missed something important. What is wrong with this solution that we are considering other solutions?