Serde - optimizing untagged enums... with threads?

Untagged enums are slow. How to deserialize untagged enums fast?

So we're thinking: how can we optimize untagged enums? One "obvious" way seems to be to use reflection. Specifically, Deserializer gives us a pretty good look into the deserialization process - precisely what we need to emulate reflection.

So all you gotta do is spawn a thread for each of your variants and then drive the main Deserializer based on what the threads do, yeah? And this should be much faster?
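
Something like this rough sketch, maybe - though it cheats by buffering the input into a serde_json::Value first instead of driving one live Deserializer (which is the actual hard part), and the Untagged/Pair types are just made-up placeholders:

use serde::Deserialize;
use serde_json::Value;

// Hypothetical variant payload, just for illustration.
#[derive(Debug, Deserialize)]
struct Pair {
    a: i32,
    b: i32,
}

// Hypothetical untagged-style enum, also just for illustration.
#[derive(Debug)]
enum Untagged {
    Num(i64),
    Pair(Pair),
    Text(String),
}

fn parallel_untagged(input: &str) -> Option<Untagged> {
    // Cheat: buffer the input into a Value once so every thread can read it.
    let value: Value = serde_json::from_str(input).ok()?;

    std::thread::scope(|s| {
        // One thread per variant, each trying its own deserialization.
        let num = s.spawn(|| i64::deserialize(&value).ok().map(Untagged::Num));
        let pair = s.spawn(|| Pair::deserialize(&value).ok().map(Untagged::Pair));
        let text = s.spawn(|| String::deserialize(&value).ok().map(Untagged::Text));

        // Take the first variant that succeeded, in declaration order.
        num.join().unwrap()
            .or_else(|| pair.join().unwrap())
            .or_else(|| text.join().unwrap())
    })
}

fn main() {
    println!("{:?}", parallel_untagged(r#"{"a": 1, "b": 2}"#));
}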

I am very skeptical. The overhead of starting threads and communicating between them is much, much bigger than what I would imagine the overhead of untagged enums to be.
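
For a rough sense of scale (numbers vary wildly by platform, and this only measures spawning, not the channel traffic you'd also need), something like this gives the bare spawn/join cost per thread:

use std::time::Instant;

fn main() {
    // Back-of-the-envelope: how long does spawning and joining an empty thread take?
    let iterations = 1_000u32;
    let start = Instant::now();
    for _ in 0..iterations {
        std::thread::spawn(|| {}).join().unwrap();
    }
    println!("spawn + join: {:?} per thread", start.elapsed() / iterations);
}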


The overhead of untagged enums grows with deep nesting, but yes, that's a good point. Maybe if there were a way to suspend a call stack and resume/start a different one on the same thread...
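
That's more or less what hand-polled futures/generators give you. Not a serde answer, but as a toy illustration of suspending and resuming two call stacks on one thread (YieldOnce is made up here purely to force a pause):

use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// Made-up helper: a future that suspends once before completing.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: std::pin::Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

fn main() {
    let mut a = pin!(async {
        println!("candidate A: started");
        YieldOnce(false).await;
        println!("candidate A: resumed");
    });
    let mut b = pin!(async {
        println!("candidate B: started");
        YieldOnce(false).await;
        println!("candidate B: resumed");
    });

    // Drive both suspended "call stacks" by hand on one thread, alternating.
    // (Waker::noop() needs a reasonably recent Rust, 1.85+.)
    let mut cx = Context::from_waker(Waker::noop());
    let _ = a.as_mut().poll(&mut cx);
    let _ = b.as_mut().poll(&mut cx);
    let _ = a.as_mut().poll(&mut cx);
    let _ = b.as_mut().poll(&mut cx);
}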

Here's an extreme example:

#![recursion_limit="256"]

use serde::Deserialize;
use serde::de::IgnoredAny;
use serde_json::from_str;

// Ten untagged variants that all wrap the same T: serde has to buffer the
// input and try the variants in order at every level of nesting.
#[derive(Deserialize)]
#[serde(untagged)]
enum VerySlow<T> {
    _V0(T),
    _V1(T),
    _V2(T),
    _V3(T),
    _V4(T),
    _V5(T),
    _V6(T),
    _V7(T),
    _V8(T),
    _V9(T),
}

// Baseline: a transparent newtype that forwards straight to T.
#[derive(Deserialize)]
#[serde(transparent)]
struct Forward<T>(T);

fn main() {
    // X2 = 100 nested levels of the untagged enum around IgnoredAny.
    type X<T> = VerySlow<T>;
    type X1<T> = X<X<X<X<X<X<X<X<X<X<T>>>>>>>>>>;
    type X2<T> = X1<X1<X1<X1<X1<X1<X1<X1<X1<X1<T>>>>>>>>>>;
    //type X3<T> = X2<X2<T>>;
    let time = std::time::Instant::now();
    from_str::<X2<IgnoredAny>>("[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]").unwrap();
    let time = std::time::Instant::now() - time;
    println!("time: {:?}", time);

    // Y2 = 100 nested levels of the transparent newtype, same input.
    type Y<T> = Forward<T>;
    type Y1<T> = Y<Y<Y<Y<Y<Y<Y<Y<Y<Y<T>>>>>>>>>>;
    type Y2<T> = Y1<Y1<Y1<Y1<Y1<Y1<Y1<Y1<Y1<Y1<T>>>>>>>>>>;
    //type Y3<T> = Y2<Y2<T>>;
    let time = std::time::Instant::now();
    from_str::<Y2<IgnoredAny>>("[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]").unwrap();
    let time = std::time::Instant::now() - time;
    println!("time: {:?}", time);
}

Playground link

Running it gives results like the following (and this is the success case):

time: 1.991118ms
time: 49.395µs

A more practical example would involve an arbitrary input limit - say, 2 MB - and a reasonable use of untagged enums, not one as extreme as this. The idea being that the attacker has to craft a 2 MB input that grinds serde to a halt. Note also that each level of nesting adds linearly to the RAM usage, so a 2 MB input would use, say, 1 GB for one level of untagged enum, 2 GB for two levels, 3 GB for three levels, 4 GB for four levels, and so on. (This is because serde needs to buffer the input into its internal serde_value-like representation at each level of untagged enum.)
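
To make the buffering concrete, here's roughly the shape of what the derive does for a single untagged level, hand-written with serde_json::Value standing in for serde's private buffer type (so not literally what the macro generates, and OneLevel is a made-up example enum):

use serde::{Deserialize, Deserializer};
use serde_json::Value;

// Made-up example: one level of an untagged enum, written out by hand.
enum OneLevel {
    Num(i64),
    Text(String),
}

impl<'de> Deserialize<'de> for OneLevel {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // The input can only be consumed once, so buffer the whole value first.
        // (The real derive buffers into a private Content type; Value stands in here.)
        let buffered = Value::deserialize(deserializer)?;
        // Then replay the buffer against each variant in order until one matches.
        // Nest another untagged enum inside a variant and every level repeats this
        // buffer-and-replay step - that's where the extra time and memory go.
        if let Ok(n) = i64::deserialize(&buffered) {
            return Ok(OneLevel::Num(n));
        }
        if let Ok(s) = String::deserialize(&buffered) {
            return Ok(OneLevel::Text(s));
        }
        Err(<D::Error as serde::de::Error>::custom(
            "data did not match any variant of OneLevel",
        ))
    }
}

fn main() {
    match serde_json::from_str::<OneLevel>("\"hello\"").unwrap() {
        OneLevel::Num(n) => println!("number: {n}"),
        OneLevel::Text(s) => println!("text: {s}"),
    }
}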

It does seem impractical to try to solve this problem with threads, yes. Although we do have real code that would have made use of untagged enums if we hadn't noticed this small potential issue with them... and it has about 5 levels of nesting.
