Blog post: Making Slow Rust Code Fast

That's not what I consider designing to be fast. That's what I consider baseline for any program written in Rust.

The entire data model is biased towards serde_json.

It's not because of static dispatch. It's because of bloat required by the data model.


Maybe you have too high a standard for general-purpose serialization libraries, because Serde is the only thing I know of that is fully statically dispatched, type- and format-agnostic, and ready for casual use, across all the widely used languages. By casual use I mean you can use it without much effort, such as writing state machines by hand. The usual approach in this domain was to use reflection, to expose raw maps and arrays, or to expose streaming parser internals like the RapidJSON I mentioned above. Serde achieves the pros of both by utilizing Rust's macro and trait system, at the cost of increased compile time. Please reply if you know another implementation with similar or better properties, as I'm very interested in this topic.

The model doesn't fit 1:1 to the JSON data model. Getting a &str is usually impractical in JSON, as the string may contain escape sequences. And you can see here how inefficiently serde_json handles &[u8]. But treating it as a sequence of numbers is the only way to handle byte sequences consistently in JSON.

Can you elaborate on that? Generally, static dispatch generates O(N * M) code from the combinations of N and M, while dynamic dispatch generates O(N + M).
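To make the code-size argument concrete, here is a minimal sketch (the trait, types, and function names are invented for the example, not from Serde): with static dispatch, each of N value types serialized into M formats gets its own monomorphized copy, so N * M instantiations; with dynamic dispatch, each function is compiled once, so N + M.

```rust
trait Format {
    fn write_u32(&mut self, v: u32);
}

struct Json(String);
impl Format for Json {
    fn write_u32(&mut self, v: u32) {
        self.0.push_str(&v.to_string());
    }
}

struct Binary(Vec<u8>);
impl Format for Binary {
    fn write_u32(&mut self, v: u32) {
        self.0.extend_from_slice(&v.to_le_bytes());
    }
}

// Static dispatch: one copy of this function is monomorphized per
// concrete `F` it is called with, so N types x M formats => N * M copies.
fn serialize_pair_static<F: Format>(f: &mut F, x: u32, y: u32) {
    f.write_u32(x);
    f.write_u32(y);
}

// Dynamic dispatch: compiled exactly once; the format is picked at
// runtime through the vtable, so code size grows as N + M.
fn serialize_pair_dyn(f: &mut dyn Format, x: u32, y: u32) {
    f.write_u32(x);
    f.write_u32(y);
}

fn main() {
    let mut j = Json(String::new());
    serialize_pair_static(&mut j, 1, 2);
    assert_eq!(j.0, "12");

    let mut b = Binary(Vec::new());
    serialize_pair_dyn(&mut b, 1, 2);
    assert_eq!(b.0, vec![1, 0, 0, 0, 2, 0, 0, 0]);
}
```

The trade-off is the usual one: the N * M monomorphized copies can each be inlined and optimized per format, while the N + M trait-object version pays a vtable call per write.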


Alas, it is not! A general pattern in all of programming is to build up complex types from a limited set of primitives using type-level operations. This is not specific to JSON or serde_json; almost all non-format-agnostic serialization libraries end up eventually defining their own Value type.

I think you're confusing the internal data model of Serde with the Value enum of serde_json, which is not what we're talking about.

Just because you haven't seen something better than Serde doesn't mean that Serde doesn't have room for improvement.

Which is a good idea that can be implemented with much less overhead than Serde. Instead of making a strawman argument defending an idea I wasn't criticizing, you could look at my first post to see which parts of Serde I'm actually criticizing. The high compilation time is not because of this idea; it's purely because Serde contains a lot of the bloat characterized in my first post.

Now I made an implementation with similar properties just so I have something to point at.

When serializing, non-Serde is twice as fast.

When deserializing, non-Serde is 5.4 times as fast.

You can try it yourself:

git clone
cd serde_benchmark
cargo bench

Someone could start bike-shedding about how non-Serde doesn't behave exactly like Serde, none of which would have any significant impact on performance if accommodated. Note that the serializer and deserializer implementations of the example struct, although hand-written, are identical to what a proc-macro designed for non-Serde would generate.


Oh, even better now. I compared serialization to deserialization (as opposed to comparing Serde to non-Serde) and found it odd that serialization was slower. Then I put in a benchmark with Vec::with_capacity. BSON for Serde doesn't have an option for this, so I only did it for non-Serde.

Non-Serde is now serializing not twice as fast as Serde, but 6.4 times as fast.
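For readers wondering what the with_capacity variant changes, here is a minimal sketch (not the benchmark's actual code; the per-element size is known here because the example only writes i64s): the output buffer is preallocated, so serialization never reallocates mid-write.

```rust
// Sketch: preallocate the output buffer using a size computed (or
// guessed) up front, instead of letting Vec::new() grow and reallocate
// repeatedly as bytes are appended.
fn serialize_ints(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 8);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    let out = serialize_ints(&[1, 2, 3]);
    assert_eq!(out.len(), 24);
    // The capacity was reserved up front, so no reallocation happened.
    assert!(out.capacity() >= 24);
}
```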

I'm not confusing them. They are related. The serde data model absolutely does rely on a small number of core primitives and a small number of type-level operators (e.g. sequences and maps). This is not specific to JSON.

The Serde data model doesn't allow simply parsing a field name, but instead requires bloating deserializers with an obscene amount of code defining a bunch of structures and trait implementations for each struct, only to correctly handle the pathological case of a field name containing an escape sequence where there shouldn't be one. Don't even try to tell me this isn't JSON-specific. It's not like I don't know what I'm talking about. If you want to make a counter-argument, you could at least try to understand my argument first and look into the details.


How so?

Please be more specific. In what situation and how does this occur?

Are there not other formats that require escaping?

That's exactly what I am trying to do, but ad hominem insults don't help with that.


So I wrote a new serde impl from scratch! Here's the code.

It's pretty much as incomplete as your benchmark example. It only contains the serializer because it's simpler. I have nearly zero prior knowledge of the BSON format, so I mostly copy-pasted your code. Performance seems on par; I don't trust differences at this scale, as my MacBook Pro is running dozens of Chromium instances concurrently.

As an aside, I found that the benchmark groups serialization and deserialization were swapped in your repository. All three compare the performance of to_vec() despite being named deserialize.


Have a look at the newer commit in my repository. It fixes the swapped group labels and adds the with_capacity benchmark that is way faster (see my previous post presenting the performance difference). You could do the same and have one benchmark with and one without with_capacity for your serializer implementation.

Criterion should be ensuring that the results are statistically significant even if you're running other tasks on the same computer.


From an API perspective, it's really important that the MongoDB driver supports serde for maximum interoperability with the greater Rust ecosystem, even if there would be a performance penalty for doing so over supporting a custom serialization framework. And as other users have pointed out, serde is used successfully with many other serialization formats besides JSON, including BSON.

This is absolutely true, and actually my current project is to add support for this to the driver, so hopefully that should be released in the near future!

Regarding the performance of serde in general, the cases where the data model doesn't support a custom type for a given serialization format can incur overhead. For example, BSON and TOML both have a datetime type, but there is no equivalent type in the serde data model. As a result, they have to be represented as something like the equivalent of { <special magic key>: "<datetime as string>" }, which the BSON/TOML serializer can interpret and serialize accordingly. I think once specialization is stabilized, this drawback will mostly go away though, since serializers will be able to provide custom implementations for specific types. For types that are supported by the data model, there seems to be very little if any overhead, however.
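The magic-key workaround described above can be sketched roughly like this. Everything here is illustrative: the key name and function names are made up, and real crates (bson, toml) each pick their own private key; this is the shape of the mechanism, not anyone's actual implementation.

```rust
// The Serde data model has no datetime type, so a format crate can
// round-trip datetimes as a single-entry map under a private magic key.
const MAGIC_KEY: &str = "$__datetime";

// What the type's Serialize impl conceptually emits: one map entry
// whose key is the magic key and whose value is the datetime as text.
fn emit_datetime(dt: &str) -> Vec<(String, String)> {
    vec![(MAGIC_KEY.to_string(), dt.to_string())]
}

// What the format's serializer does on seeing a map: if it is exactly
// the magic single-entry map, write a native datetime instead of a map.
fn serialize_map(entries: &[(String, String)]) -> String {
    match entries {
        [(k, v)] if k.as_str() == MAGIC_KEY => format!("<native datetime {v}>"),
        _ => format!("<ordinary map with {} entries>", entries.len()),
    }
}

fn main() {
    let map = emit_datetime("2021-10-26T12:00:00Z");
    assert_eq!(serialize_map(&map), "<native datetime 2021-10-26T12:00:00Z>");
}
```

This is also why specialization would help: the serializer could recognize the concrete datetime type directly instead of sniffing for a magic key at runtime.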

Also, thank you all for highlighting some bottlenecks in the bson::to_vec implementation! After looking at a flamegraph of bson::to_vec, I noticed that a lot of time was being spent serializing the keys of a BSON array. In bson, this is implemented using u64's Display implementation, which involves a heap allocation and other expensive work. Once I updated that to use the loop from @Hyeonu's sample, I saw similar serialization times for all three. I've filed RUST-1062 on our issue tracker to ensure this improvement gets upstreamed into bson itself. Thanks again for helping discover this bottleneck!
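Something in the spirit of that loop looks like the following (a sketch, not the exact code from the thread): the array index is formatted into a stack buffer instead of going through to_string(), which allocates a String on the heap for every key.

```rust
// Format a BSON array index as decimal ASCII bytes without allocating.
// u64::MAX has 20 decimal digits, so a 20-byte stack buffer suffices.
fn write_index_key(mut n: u64, out: &mut Vec<u8>) {
    let mut buf = [0u8; 20];
    let mut pos = buf.len();
    loop {
        pos -= 1;
        buf[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    out.extend_from_slice(&buf[pos..]);
}

fn main() {
    let mut out = Vec::new();
    write_index_key(0, &mut out);
    write_index_key(1234, &mut out);
    assert_eq!(out, b"01234");
}
```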

Post array key optimization serializing a struct with a String, an int64, and an array of int64:

Edit: running it with the exact example from @Frederik's benchmark shows that bson's implementation is still a bit slower, but it's a lot closer now:


It's very understandable that you want interoperability with Serde. I'd like to point out that you don't have to choose one or the other. It may be possible to make the API take a data-model-agnostic trait, allowing the consumer of the API to make their own choice of whether to use Serde or something else. Different choices could even be made for different parts of a program, allowing performance-critical features to be implemented one way while other features remain compatible with Serde. I'd much rather see this, for the diversity and evolution of the greater Rust ecosystem, than see the ecosystem tied to the stale data model of Serde.

It's nice to see you took a hint from the different implementations in the benchmark. I suspect there may be various inaccuracies in the benchmark, as all three of us have gotten very different results. It may be better if you share your forks of the benchmark. And don't forget the deserialization benchmark; it's in deserialization that you'll find a lot more questionable things about Serde than in serialization.

Yeah, I think the driver API will end up just exposing the raw BSON bytes and letting users do what they want with it. We assume that they'll largely use serde to deserialize it further, but there's nothing stopping them from writing custom deserialization logic if they wish to.

The benchmark I was using can be found on this branch of my fork of bson: GitHub - patrickfreed/bson-rust at serde-perf. For serialization, I think the results that @Hyeonu and I had were pretty similar after I applied the array index/key optimization.

For deserialization, the difference between bson::from_slice and your implementation is definitely more distinct, though I think that has more to do with the implementation in bson rather than any limitation of serde, given that the msgpack serde library (rmp_serde) seems to be able to achieve similar performance. We've had a ticket open for trying to further optimize the deserializer to match rmp_serde's performance, but it hasn't been prioritized yet because, in practice, the extra ~200ns spent in deserialization is dwarfed by the time spent doing other things like network I/O (in the case of the driver), which is on the order of milliseconds. That being said, there is definitely room for improvement here, so we hope to get to looking into it eventually.


This. I have the feeling that the bson crate was originally somewhat of a quick hack, only needed so that the MongoDB driver has something to work with. I have submitted pull requests myself in the past, improving obviously sub-optimal aspects of the library, mostly the removal of spurious allocations.

Exactly. I have yet to see a real-world, non-microbenchmark case where JSON/BSON/whatever (de)serialization is a bottleneck. I have to work with gigabyte-sized JSON files pretty regularly (thanks to the bioinformatics industry for not yet having grokked the concept of relational databases), but the initial couple of seconds spent on parsing them don't matter even there, because the rest of the processing of the extracted data takes way longer.


Well, it was a bug, but

Due to using sscanf, GTA Online was parsing a 10MiB JSON string at startup in O(n²) time. Fixing the parser to be properly O(n) cut load times by 70%, so it was clearly a bottleneck there.

That said, I agree that nearly any deserialization that is not asymptotically horrible will typically easily complete in less time than it took to do the IO to load the data to deserialize. And if it doesn't... what you serialize is typically the problem before how you serialize (after low-hanging optimizations). If you need to go even further beyond, taking a different approach (e.g. memmapping) will almost certainly serve you better than deserialization.


It is true that serde deserializes keys into a C-style enum. But this is really not as bad as you make it out to be; a C-style enum is just an integer to represent which field it is. This mapping still has to happen, because the field could be coming from a string (when serialized into a {string:value} map) or from an integer (in a non self describing format). It's literally encapsulation 101 to split field name recognition from parsing the field itself. (And hey, you can use Cow to deserialize key names if you don't want a custom enum! You'll just have to live with errors in the wrong place, because you didn't parse (don't validate) it in the correct spot.) Also,

the pathological case of a field name containing an escape sequence where there shouldn't be one

So I guess you don't care about being conformant to any general purpose textual format? They all allow escapes in keys, because they want to allow arbitrary keys, which requires escapes.
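The key-to-enum recognition pattern mentioned a couple of paragraphs up can be sketched by hand like this (a simplified illustration of the shape of the generated code; the struct fields and names are invented): the field name is recognized into a C-style enum first, whether it arrives as a string or as an integer, and only then is the value parsed.

```rust
// A C-style enum is just an integer tag identifying which field was seen.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Field {
    Id,
    Name,
    Unknown,
}

// The key may come from a string (self-describing formats with
// {string: value} maps)...
fn field_from_str(name: &str) -> Field {
    match name {
        "id" => Field::Id,
        "name" => Field::Name,
        _ => Field::Unknown, // ignored or an error, depending on attributes
    }
}

// ...or from an integer index (non-self-describing formats). Either
// way, the same mapping to the tag has to happen.
fn field_from_index(i: u64) -> Field {
    match i {
        0 => Field::Id,
        1 => Field::Name,
        _ => Field::Unknown,
    }
}

fn main() {
    assert_eq!(field_from_str("name"), Field::Name);
    assert_eq!(field_from_index(0), Field::Id);
    assert_eq!(field_from_str("color"), Field::Unknown);
}
```

Splitting recognition from value parsing is exactly the encapsulation point above: the value-parsing code is written once against the tag, regardless of how the key was encoded.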

If you want to complain about serde being tailored for JSON, complain about miniserde, which actually is. And miniserde is pretty much better at being a JSON serialization framework than serde is, because it's specialized to only work for JSON.

Serde is complicated because it has to be in order to support general purpose serialization format agnostic serialization. It's a hard problem. That's not to say it's perfect — nothing is — but it's likely that any simple obvious "problem" is actually required for an important use case you didn't think about. (The exception is not being able to handle special types not in the serde data model, but I don't think that's realistically possible while remaining format agnostic.)

Also, I just want to point out that the serde data model really isn't anything special for a general purpose serialization target, and is if anything specialized for Rust, and definitely not for JSON. JSON can't represent most of the things that the serde data model can (enum variants, value-value maps) and the data has to be massaged from one format to the other.

JSON values can be an array, boolean, null, string, number, or object (string-value map). Serde values can be bool, i8, i16, i32, i64, i128, u8, u16, u32, u64, u128, f32, f64, char, string, bytes, option, unit, unit struct, unit variant, newtype struct, newtype variant, seq, tuple, tuple struct, tuple variant, map (value-value), struct (key-value), or struct variant. I fail to see how that's optimized specifically for JSON. (Because it isn't.)

That said, #[serde] attributes do often describe what they do in terms of the corresponding JSON rather than the effect on the serde data model representation. This is primarily due to JSON being a familiar target to talk and care about. But that isn't because they only serve JSON; they work perfectly for any self describing format, and many work for any format.


The raw BSON serializer / deserializer used in bson::to_vec and bson::from_slice were written recently actually (by yours truly), so I think it's more that we haven't had the chance to micro-optimize it yet rather than it being hacky.

More generally speaking, bson was originally a community maintained library, but it's since been transferred to MongoDB the company and these days is being actively maintained by my colleagues and me. So while there is some technical debt in it that needs addressing, it is a fully supported library intended to be production-ready. If you have any ideas for improvements though, we'd love to hear about them on our GitHub issue tracker or Jira project!


Online services serving many concurrent users/requests do spend a lot of time deserializing user requests and serializing responses, since data is mostly cached in memory. That is why JSON is not considered a good choice for communication between internal services. I have seen this in many production services.


What I was criticizing was the obscene amount of code generated, not that there is an enum somewhere in it. The enum itself, as you say, is very innocent.

That's precisely what shouldn't happen. A non-self-describing format shouldn't be implemented with an inefficiency that is required only for self-describing formats. Simply deserialize each value in order directly into the struct to be returned. No need for a loop matching field IDs to deserialize values into Options that must be unwrapped at the end.
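The contrast being drawn can be sketched like this (everything here is a made-up minimal example over a fixed-order little-endian byte stream, not anyone's real deserializer): when field order is fixed by the format, values can be read straight into the struct; the Option-based shape mimics what a self-describing deserializer does.

```rust
struct Point {
    x: u32,
    y: u32,
}

// Hypothetical minimal reader that consumes 4 bytes from the input.
fn read_u32(input: &mut &[u8]) -> u32 {
    let (head, tail) = input.split_at(4);
    *input = tail;
    u32::from_le_bytes(head.try_into().unwrap())
}

// In-order deserialization: no field-name matching, no Option juggling.
fn deserialize_in_order(input: &mut &[u8]) -> Point {
    Point {
        x: read_u32(input),
        y: read_u32(input),
    }
}

// The Option-based shape used for self-describing formats, simplified:
// fields land in Options and are unwrapped at the end, because the
// format could in principle present them in any order.
fn deserialize_with_options(input: &mut &[u8]) -> Point {
    let mut x: Option<u32> = None;
    let mut y: Option<u32> = None;
    for field in 0..2u8 {
        match field {
            0 => x = Some(read_u32(input)),
            1 => y = Some(read_u32(input)),
            _ => unreachable!(),
        }
    }
    Point {
        x: x.expect("missing x"),
        y: y.expect("missing y"),
    }
}

fn main() {
    let bytes = [1, 0, 0, 0, 2, 0, 0, 0];
    let p = deserialize_in_order(&mut &bytes[..]);
    assert_eq!((p.x, p.y), (1, 2));
    let q = deserialize_with_options(&mut &bytes[..]);
    assert_eq!((q.x, q.y), (1, 2));
}
```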

Which again is because of a pathological case in JSON. BSON, which doesn't allow non-canonical keys, shouldn't need Cow. As can be seen in the code I wrote for the benchmark, field names are matched directly in the input without even deserializing them as string slices first.
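Matching field names directly against the raw bytes might look roughly like this (a sketch of the idea, not the benchmark's code; the field names and tags are invented). It relies on BSON field names being NUL-terminated and, per the argument above, canonical, so no escape handling or UTF-8 decoding is needed before comparing.

```rust
// Recognize a field name at the front of the input by byte comparison,
// returning a made-up field tag and the remaining input on success.
fn match_field(input: &[u8]) -> Option<(u8, &[u8])> {
    if let Some(rest) = input.strip_prefix(b"name\0") {
        Some((0, rest))
    } else if let Some(rest) = input.strip_prefix(b"age\0") {
        Some((1, rest))
    } else {
        None
    }
}

fn main() {
    let (tag, rest) = match_field(b"age\0\x2a").unwrap();
    assert_eq!(tag, 1);
    assert_eq!(rest, b"\x2a");
    assert!(match_field(b"color\0").is_none());
}
```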

I didn't mean we shouldn't care. I clearly said BSON isn't a text format and doesn't need this additional complexity that was designed for JSON. Besides, serde_derive isn't conformant with all the bad parts of JSON anyway, as it returns an error when encountering duplicate keys in a struct, whereas other parsers would take the last value of each field, happily ignoring errors in anything except the last value. If Serde decides to return an error on duplicate fields, it could just as well return an error on non-canonical field names, thereby taking plenty of complexity out of deserializers.

Obviously I wouldn't complain about Miniserde being tailored for JSON, since its sole purpose is to work with JSON. Seems like a superior approach.

That's the point I was making. And this use case is BSON, and only BSON, which doesn't require all the complexity of other formats.

I wasn't thinking of the superficial view of the set of primitive types you're listing. I was thinking of the semantics of structs and errors.