How to optimize type lookups in generic functions without a static?

It might have been asked a million times. I am just bad at finding the solution.

The basic problem: I have 3000+ struct types that share a common trait. I want to implement JSON serialization logic for the trait that dispatches to the concrete types. All is well if I settle for linear complexity, iterating over the types one by one. I struggle to figure out how to use a map that can give logarithmic or constant-time performance.

Here is the basic code I have now:

    use serde::Serialize; // brings `serialize` into scope for the `Cat::serialize(...)` calls below
    use std::any::TypeId;

    pub trait Polymorphic: std::fmt::Debug {
        fn obj_type_id(&self) -> TypeId;
        fn to_polymorphic_ref<'a>(&'a self) -> &'a dyn Polymorphic;
        fn to_polymorphic_mut<'a>(&'a mut self) -> &'a mut dyn Polymorphic;
    }

    impl serde::Serialize for dyn Polymorphic {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: serde::Serializer,
        {
            serialize_polymorphic(self, serializer)
        }
    }

    // Linear dispatch: each concrete type is tried in turn.
    // `cast_ref` is the project's own traitcast helper (not shown here).
    fn serialize_polymorphic<S>(p: &dyn Polymorphic, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        if let Some(cat) = cast_ref::<Cat>(p) {
            return Cat::serialize(cat, serializer);
        }
        if let Some(dog) = cast_ref::<Dog>(p) {
            return Dog::serialize(dog, serializer);
        }
        if let Some(animal) = cast_ref::<Animal>(p) {
            return Animal::serialize(animal, serializer);
        }
        if let Some(data_object) = cast_ref::<DataObject>(p) {
            return DataObject::serialize(data_object, serializer);
        }
        Err(serde::ser::Error::custom("Unknown type"))
    }

The goal is to match the type in constant time rather than linear time.

I want to replace the series of 'if let' statements with a map lookup keyed by the TypeId of the objects. I already have this working for deserialization.
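When the output format is fixed, the map lookup itself is straightforward, because nothing stored in the map has to mention a generic Serializer type. Below is a minimal std-only sketch, assuming the values can be viewed as &dyn Any (for example through a hypothetical as_any-style accessor; Polymorphic as posted does not have one) and using String as a stand-in for JSON output. The types and the hand-written JSON strings (no escaping) are illustrative only.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::OnceLock;

// Illustrative types; the real code has 3000+ of them.
struct Cat { name: String }
struct Dog { age: u32 }

// Non-generic signature: the output type is fixed, so this fn pointer type
// does not depend on any Serializer and can live in an ordinary static.
type ToJsonFn = fn(&dyn Any) -> String;

fn cat_to_json(v: &dyn Any) -> String {
    let cat = v.downcast_ref::<Cat>().expect("stored under Cat's TypeId");
    format!("{{\"type\":\"Cat\",\"name\":\"{}\"}}", cat.name) // no escaping: toy only
}

fn dog_to_json(v: &dyn Any) -> String {
    let dog = v.downcast_ref::<Dog>().expect("stored under Dog's TypeId");
    format!("{{\"type\":\"Dog\",\"age\":{}}}", dog.age)
}

// Built once on first use; afterwards each lookup is a single hash probe.
fn registry() -> &'static HashMap<TypeId, ToJsonFn> {
    static REGISTRY: OnceLock<HashMap<TypeId, ToJsonFn>> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        let mut m: HashMap<TypeId, ToJsonFn> = HashMap::new();
        m.insert(TypeId::of::<Cat>(), cat_to_json);
        m.insert(TypeId::of::<Dog>(), dog_to_json);
        m
    })
}

fn to_json(v: &dyn Any) -> Option<String> {
    // On a `&dyn Any`, `type_id()` reports the concrete type behind the reference.
    registry().get(&v.type_id()).map(|f| f(v))
}

fn main() {
    let cat = Cat { name: "Tom".to_string() };
    println!("{:?}", to_json(&cat));
}
```

The catch, which the rest of the thread is about, is that serde::Serialize::serialize is generic over S, so its fn pointer type cannot be written in a non-generic static like this one.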

Functionally, typetag and/or erased_serde work. However, with 3000+ struct types and around 400 trait types organizing them, executable size and compile time explode. I think typetag generates a lot of code for each of the 400 trait types, which is not necessary in my case. My type system is a classic Java-like library with a single root type and very deep inheritance relationships. (I tried a tree of enums first, and that works great, except that writing code against enums nested 10 levels deep is inhumanly complex. That is why I switched to traits with some traitcast voodoo.) So in my case, a single serialize and deserialize implementation shared by all traits works great.

I have more or less finished the deserialization part. The good thing there is that the visitor in serde's deserialization is not as generic.

Serialize, subjectively the more trivial trait, is proving to be a bit of a problem. Some things I have considered so far:

a. Use a static HashMap<TypeId, SerializationFn>. This does not work: even when a static is declared inside a generic function, there is only a single instance of it shared by all instantiations, so it cannot mention the function's generic Serializer type. This is explained in the Rust Reference.
b. Use a two-level mapping keyed first by the TypeId of the Serializer that serde::Serialize::serialize is generic over, e.g. HashMap<SerializerTypeId, HashMap<SerializedTypeId, SerializationFn>>. This does not work unless I enter unsafe waters, because I cannot declare the map's SerializationFn type in terms of the generic serializer type.
c. Not tried yet, but maybe I need to rip out the type-erasure logic of erased_serde. I cannot really get my head around it, and that is what is stopping me. A pointer to a good read on this would help.
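Options (a) and (b) can be illustrated with a std-only toy. Sink below is a hypothetical stand-in for serde::Serializer (a consumer with an associated output type), and Describe a stand-in for serde::Serialize. Because a static inside a generic function is shared by all instantiations, the sketch keeps one non-generic static, folds the sink type into the key, stores each entry as Box<dyn Any>, and downcasts it back to the monomorphic fn pointer. This avoids unsafe, but it also exposes the two limits of approach (b): every (sink, value) pair must be registered before use, and TypeId::of::<S>() requires S: 'static, which serde serializers such as &mut serde_json::Serializer<W> (a borrowed reference) do not satisfy.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for serde::Serializer. The 'static bounds are what make
// `TypeId::of::<S>()` possible; real serde serializers are not 'static.
trait Sink: 'static {
    type Out: 'static;
    fn write_str(self, s: &str) -> Self::Out;
}

struct Upper; // produces an upper-cased String
impl Sink for Upper {
    type Out = String;
    fn write_str(self, s: &str) -> String { s.to_uppercase() }
}

struct Len; // produces the byte length
impl Sink for Len {
    type Out = usize;
    fn write_str(self, s: &str) -> usize { s.len() }
}

// Hypothetical stand-in for serde::Serialize: generic over the sink.
trait Describe {
    fn describe<S: Sink>(&self, sink: S) -> S::Out;
}

struct Cat { name: String }
impl Describe for Cat {
    fn describe<S: Sink>(&self, sink: S) -> S::Out {
        sink.write_str(&format!("cat {}", self.name))
    }
}

struct Dog;
impl Describe for Dog {
    fn describe<S: Sink>(&self, sink: S) -> S::Out { sink.write_str("dog") }
}

// ONE static for the whole program, keyed by (sink type, value type).
type Registry = Mutex<HashMap<(TypeId, TypeId), Box<dyn Any + Send + Sync>>>;
fn registry() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

// Monomorphic shim for one (T, S) pair; its fn pointer is what gets stored.
fn shim<T: Describe + Any, S: Sink>(v: &dyn Any, sink: S) -> S::Out {
    v.downcast_ref::<T>().expect("key matched T").describe(sink)
}

fn register<T: Describe + Any, S: Sink>() {
    let f: fn(&dyn Any, S) -> S::Out = shim::<T, S>;
    let key = (TypeId::of::<S>(), TypeId::of::<T>());
    registry().lock().unwrap().insert(key, Box::new(f));
}

fn dispatch<S: Sink>(value: &dyn Any, sink: S) -> Option<S::Out> {
    let map = registry().lock().unwrap();
    let entry = map.get(&(TypeId::of::<S>(), value.type_id()))?;
    let f = *entry.downcast_ref::<fn(&dyn Any, S) -> S::Out>()?;
    // Release the lock before calling, so a nested dispatch cannot deadlock.
    drop(map);
    Some(f(value, sink))
}

fn main() {
    register::<Cat, Upper>();
    register::<Dog, Upper>();
    register::<Cat, Len>(); // Dog is deliberately not registered for Len
    let cat = Cat { name: "Tom".to_string() };
    println!("{:?}", dispatch(&cat, Upper)); // Some("CAT TOM")
    println!("{:?}", dispatch(&Dog, Len)); // None
}
```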

For what it's worth, GPT-4o and Gemini did not provide a working solution. They are very bad with Rust: whenever they fix one problem in a puzzle this complex, they add four or five new ones to think through.

Any advice or help will be appreciated.

Since you are already using dyn Polymorphic, why not use dynamic dispatch for the serialization too? Unless I misunderstood your requirement, you are only concerned about JSON, right? If so, although serde::Serialize is not dyn compatible, you can create a custom trait for JSON serialization that is dyn compatible. You can also add a blanket implementation of it for types that already implement serde::Serialize.
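A minimal std-only sketch of that suggestion. Encode is a hypothetical stand-in for serde::Serialize: its generic method is what makes it not dyn compatible. The shadow trait ToJson has a fixed output type, so it is dyn compatible, and the blanket impl gives it to every Encode type. With the real crates, the blanket impl would presumably be impl<T: serde::Serialize> ToJson for T calling serde_json::to_string(self); the stand-in only keeps the example dependency-free.

```rust
use std::fmt::{self, Write};

// Hypothetical stand-in for serde::Serialize: the generic method means
// `dyn Encode` is not allowed.
trait Encode {
    fn encode<W: Write>(&self, out: &mut W) -> fmt::Result;
}

// Dyn-compatible "shadow" trait with a fixed output type.
trait ToJson {
    fn to_json(&self) -> String;
}

// Blanket impl: anything that can Encode is automatically ToJson.
impl<T: Encode> ToJson for T {
    fn to_json(&self) -> String {
        let mut s = String::new();
        self.encode(&mut s).expect("writing to a String does not fail");
        s
    }
}

struct Cat { name: String }
impl Encode for Cat {
    fn encode<W: Write>(&self, out: &mut W) -> fmt::Result {
        // Debug quoting matches JSON only for simple strings; toy output.
        write!(out, "{{\"name\":{:?}}}", self.name)
    }
}

struct Dog { age: u32 }
impl Encode for Dog {
    fn encode<W: Write>(&self, out: &mut W) -> fmt::Result {
        write!(out, "{{\"age\":{}}}", self.age)
    }
}

fn main() {
    let pets: Vec<Box<dyn ToJson>> = vec![
        Box::new(Cat { name: "Tom".to_string() }),
        Box::new(Dog { age: 3 }),
    ];
    for p in &pets {
        println!("{}", p.to_json());
    }
}
```

On its own this only covers top-level values; it does not make dyn SomeTrait usable as a field in a #[derive(serde::Serialize)] struct.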


First off thank you! It is not a bad idea.

> you are only concerned about json, right?

Yes. In a sense I have already committed to JSON only, because I use serde_json::value::RawValue to parse without fully copying the underlying JSON bytes while searching for the embedded discriminator.

> you can create a custom trait for json serialization that is dyn compatible. you can also add a blanket implementation for types that already implements serde::Serialize

The issue is that I need serde::Serialize implemented for the trait objects so that I can use #[derive(serde::Serialize, serde::Deserialize)] on the struct types, because fields in the structs refer to the traits. For testing, my Cat test struct has pub friend: Option<Box<dyn AnimalTrait>>, so to use #[derive(serde::Serialize, serde::Deserialize)] on Cat, I need dyn AnimalTrait to implement the serde traits.

I imagine a single real implementation on the root Polymorphic type, with all the other traits delegating to it, like so:

    impl serde::Serialize for dyn AnimalTrait {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: serde::Serializer,
        {
            serialize_polymorphic(self.to_polymorphic_ref(), serializer)
        }
    }
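The delegation idea can be checked with a std-only sketch. ToText is a hypothetical stand-in for serde::Serialize, and Polymorphic is trimmed to its upcasting method. The single real implementation sits on dyn Polymorphic, and dyn AnimalTrait upcasts through to_polymorphic_ref and delegates. The impls take a lifetime parameter ('s), the form this thread eventually settles on; the reference returned by to_polymorphic_ref is not 'static, so the delegating call needs it.

```rust
use std::fmt::Debug;

// Root trait, trimmed to the upcasting method (obj_type_id etc. omitted).
trait Polymorphic: Debug {
    fn to_polymorphic_ref(&self) -> &dyn Polymorphic;
}

// One of the many intermediate traits.
trait AnimalTrait: Polymorphic {}

// Hypothetical stand-in for serde::Serialize, so the sketch runs without crates.
trait ToText {
    fn to_text(&self) -> String;
}

// The single real implementation lives on the root trait object...
impl<'s> ToText for dyn Polymorphic + 's {
    fn to_text(&self) -> String {
        format!("{:?}", self) // the Debug supertrait stands in for a real format
    }
}

// ...and every other trait object upcasts and delegates to it.
impl<'s> ToText for dyn AnimalTrait + 's {
    fn to_text(&self) -> String {
        self.to_polymorphic_ref().to_text()
    }
}

#[derive(Debug)]
struct Cat {
    name: String,
}

impl Polymorphic for Cat {
    fn to_polymorphic_ref(&self) -> &dyn Polymorphic {
        self
    }
}

impl AnimalTrait for Cat {}

fn main() {
    let cat = Cat { name: "Tom".to_string() };
    let animal: &dyn AnimalTrait = &cat;
    println!("{}", animal.to_text());
}
```

Each of the ~400 traits then needs only a two-line delegating impl, which a small macro could generate.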

I am going to look into a hybrid approach over the next few hours, i.e. use erased_serde to handle Polymorphic and have all other trait implementations delegate to it. I suppose this will not be as efficient as I would like; Rust and its libraries do not seem to be optimized for extra-large, Java-like class hierarchies. But it may just work, cut a minute off compile time, and shed a few tens of MB of unnecessary binary code that typetag produces.

That explanation is about how to make an erasable "shadow" trait for the serde::Serialize trait, which is the easy part; the really hard part is making the Serializer trait erasable. The wrapper type erased_serde::ser::erase::Serializer<S> is essentially a state machine, very similar to how you might imagine async functions being desugared.

I don't know of a good description of the exact technique used by erased_serde, but here's a blog post about how to craft a dyn-compatible wrapper type for traits with async methods:

https://smallcultfollowing.com/babysteps/blog/2021/10/07/dyn-async-traits-part-4/

Although the example AsyncIter trait in the post is very different from serde, it can still give you some good ideas, and you can read the full series if you like. Fun fact: part 6 of the series happens to mention erased_serde briefly. Note that the series was written before async_fn_in_trait was stabilized.
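To make the state-machine idea above concrete, here is a drastically reduced std-only model of the erasure. Sink and Emit are hypothetical stand-ins for serde::Serializer and serde::Serialize with a single method each, and Erase and Unerase are illustrative names; erased_serde has to handle every Serializer method and its compound helpers, so this shows only the shape of the trick, not its actual code. Erase<S> holds the real sink until a value consumes it and then holds the result, which is how a generic S::Ok comes back out through a non-generic &mut dyn interface.

```rust
use std::mem;

// Hypothetical stand-in for serde::Serializer: consumed by value,
// output type chosen by each implementation.
trait Sink {
    type Ok;
    fn emit_str(self, s: &str) -> Self::Ok;
}

// Hypothetical stand-in for serde::Serialize: generic, so not dyn compatible.
trait Emit {
    fn emit<S: Sink>(&self, sink: S) -> S::Ok;
}

// Dyn-compatible shadows: no generic methods, no associated output type.
trait ErasedSink {
    fn erased_emit_str(&mut self, s: &str);
}
trait ErasedEmit {
    fn erased_emit(&self, sink: &mut dyn ErasedSink);
}

// The state machine: holds the real sink until it is consumed, then its result.
enum Erase<S: Sink> {
    Ready(S),
    Complete(S::Ok),
    Used,
}

impl<S: Sink> ErasedSink for Erase<S> {
    fn erased_emit_str(&mut self, s: &str) {
        match mem::replace(self, Erase::Used) {
            Erase::Ready(sink) => *self = Erase::Complete(sink.emit_str(s)),
            _ => panic!("sink consumed twice"),
        }
    }
}

// Adapter that lets an erased sink be passed where a generic Sink is expected.
struct Unerase<'a>(&'a mut dyn ErasedSink);
impl Sink for Unerase<'_> {
    type Ok = ();
    fn emit_str(self, s: &str) {
        let Unerase(inner) = self;
        inner.erased_emit_str(s);
    }
}

// Every Emit type becomes ErasedEmit through this blanket impl.
impl<T: Emit> ErasedEmit for T {
    fn erased_emit(&self, sink: &mut dyn ErasedSink) {
        self.emit(Unerase(sink));
    }
}

// Back to the generic world: start in Ready, end in Complete, take the result.
fn emit_dyn<S: Sink>(value: &dyn ErasedEmit, sink: S) -> S::Ok {
    let mut state = Erase::Ready(sink);
    value.erased_emit(&mut state);
    match state {
        Erase::Complete(ok) => ok,
        _ => panic!("value never used its sink"),
    }
}

struct Upper; // sink producing an upper-cased String
impl Sink for Upper {
    type Ok = String;
    fn emit_str(self, s: &str) -> String { s.to_uppercase() }
}

struct Len; // sink producing a byte length
impl Sink for Len {
    type Ok = usize;
    fn emit_str(self, s: &str) -> usize { s.len() }
}

struct Cat { name: String }
impl Emit for Cat {
    fn emit<S: Sink>(&self, sink: S) -> S::Ok { sink.emit_str(&self.name) }
}

fn main() {
    let pets: Vec<Box<dyn ErasedEmit>> = vec![Box::new(Cat { name: "Tom".to_string() })];
    println!("{}", emit_dyn(&*pets[0], Upper)); // TOM
    println!("{}", emit_dyn(&*pets[0], Len)); // 3
}
```

The design relies on the fact that Emit::emit can only produce an S::Ok by calling the sink, so after erased_emit returns, the state must be Complete.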


I see, so you are using trait objects as fields of other structs. Then a custom dyn-compatible SerializeToJson is not enough, but I believe you can still use erased_serde for the dynamic dispatch, because erased_serde interoperates with serde in both directions:

  • When your type implements or derives serde::Serialize, it automatically implements erased_serde::Serialize. In fact, erased_serde::Serialize is sealed, so this blanket implementation is the only one possible.

  • The trait object dyn erased_serde::Serialize also implements serde::Serialize, so structs that use it as a field can derive serde::Serialize.

    • serde::Serialize can also be implemented for trait objects of traits that have erased_serde::Serialize as a supertrait; see serialize_trait_object.

So the Polymorphic trait in your example can simply use erased_serde::Serialize as a supertrait, and everything should just work:

    use std::any::TypeId;

    pub trait Polymorphic: std::fmt::Debug + erased_serde::Serialize {
        fn obj_type_id(&self) -> TypeId;
        fn to_polymorphic_ref<'a>(&'a self) -> &'a dyn Polymorphic;
        fn to_polymorphic_mut<'a>(&'a mut self) -> &'a mut dyn Polymorphic;
    }

    // A macro (note the `!`): implements serde::Serialize for the dyn Polymorphic trait object.
    erased_serde::serialize_trait_object!(Polymorphic);

Thank you! It worked. I am not seeing much improvement in build times or executable size, though. So either (1) typetag works well even with many traits, or at least my optimization does not help, or (2) I need to analyze this further.

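For future readers: I needed one last tweak. When implementing serde::Serialize for a trait object, a lifetime parameter keeps the compiler happy:

    impl serde::Serialize for dyn AnimalTrait {

changes to

    impl<'s> serde::Serialize for dyn AnimalTrait + 's {

Without this, working with &dyn AnimalTrait may produce strange lifetime errors. The reason is that an impl for plain dyn AnimalTrait only covers dyn AnimalTrait + 'static, whereas a &dyn AnimalTrait function parameter means &'a (dyn AnimalTrait + 'a).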
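A minimal std-only reproduction of that tweak. Describe is a hypothetical local trait standing in for serde::Serialize, and Cat<'n> borrows its name so that it cannot be 'static. With the generic 's, describe_one accepts any borrowed trait object; with a plain impl Describe for dyn AnimalTrait, the call inside describe_one would be rejected by the compiler.

```rust
trait AnimalTrait {
    fn name(&self) -> &str;
}

// Hypothetical stand-in for serde::Serialize.
trait Describe {
    fn describe(&self) -> String;
}

// `impl Describe for dyn AnimalTrait` would mean `dyn AnimalTrait + 'static`,
// and `describe_one` below would then fail to compile.
impl<'s> Describe for dyn AnimalTrait + 's {
    fn describe(&self) -> String {
        format!("animal {}", self.name())
    }
}

// The elided parameter type is `&'a (dyn AnimalTrait + 'a)`.
fn describe_one(a: &dyn AnimalTrait) -> String {
    a.describe()
}

// Borrows its name, so a `Cat<'n>` is not 'static.
struct Cat<'n> {
    name: &'n str,
}

impl AnimalTrait for Cat<'_> {
    fn name(&self) -> &str {
        self.name
    }
}

fn main() {
    let name = String::from("Tom");
    let cat = Cat { name: &name };
    println!("{}", describe_one(&cat)); // animal Tom
}
```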