Blog post: Making Slow Rust Code Fast

Alas, it is not! A general pattern in all of programming is to build up complex types from a limited set of primitives using type-level operations. This is not specific to JSON or serde_json; almost all format-specific serialization libraries eventually end up defining their own Value type.

I think you're confusing the internal data model of Serde with the Value enum of serde_json, which is not what we're talking about.

Just because you haven't seen something better than Serde doesn't mean that Serde doesn't have room for improvement.

Which is a good idea that can be implemented with much less overhead than Serde. Instead of making a strawman argument defending an idea I wasn't criticizing, you could look at my first post to see which parts of Serde I'm actually criticizing. The high compilation time is not because of this idea; it's purely because Serde contains a lot of the bloat characterized in my first post.

I've now made an implementation with similar properties, just so I have something to point at.


When serializing, non-Serde is twice as fast.

When deserializing, non-Serde is 5.4 times as fast.

You can try it yourself:

git clone https://gitlab.com/Veverak/serde_benchmark.git
cd serde_benchmark
cargo bench

Someone could start bike-shedding about how non-Serde doesn't behave exactly like Serde, none of which would have any significant impact on performance if accommodated. Note that the serializer and deserializer implementations for the example struct, although hand-written, are identical to what a proc-macro designed for non-Serde would generate.

1 Like

Oh, even better now. I compared serialization to deserialization (as opposed to comparing Serde to non-Serde) and found it odd that serialization is slower. Then I put in a benchmark with Vec::with_capacity. BSON for Serde doesn't have an option for this, so I only did it for non-Serde.


Non-Serde is now serializing not twice as fast as Serde, but 6.4 times as fast.

I'm not confusing them. They are related. The serde data model absolutely does rely on a small number of core primitives and a small number of type-level operators (e.g. sequences and maps). This is not specific to JSON.

How so?

Please be more specific. In what situation and how does this occur?

Are there not other formats that require escaping?

That's exactly what I am trying to do, but ad hominem insults don't help with that.

1 Like

So I wrote a new serde impl from scratch! Here's the code.

It's pretty much as incomplete as your benchmark example. It only contains a serializer, because that's simpler. I have nearly zero prior knowledge of the BSON format, so I mostly copy-pasted your code. Performance seems on par; I don't trust differences at this scale, as my MacBook Pro is running a dozen Chromium instances concurrently.

As an aside, I found that the benchmark groups serialization and deserialization were swapped in your repository. All three compare the performance of to_vec() despite being named deserialize.

8 Likes

Have a look at the newer commit in my repository. It fixes the swapped group labels and adds the with_capacity benchmark, which is way faster (see my previous post presenting the performance difference). You could do the same and have one benchmark with and one without with_capacity for your serializer implementation.
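For reference, pairing the two variants in a single Criterion group could look roughly like this (a sketch: the i64 payload and the serialize helper are stand-ins, not the actual code from the repository):

use criterion::{criterion_group, criterion_main, Criterion};

// Stand-in for a real serializer: writes each i64 into the given buffer.
fn serialize(doc: &[i64], mut buf: Vec<u8>) -> Vec<u8> {
    for v in doc {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}

fn bench_serialize(c: &mut Criterion) {
    let doc: Vec<i64> = (0..64).collect();
    let mut group = c.benchmark_group("serialization");
    group.bench_function("without_capacity", |b| {
        b.iter(|| serialize(&doc, Vec::new()))
    });
    group.bench_function("with_capacity", |b| {
        b.iter(|| serialize(&doc, Vec::with_capacity(64 * 8)))
    });
    group.finish();
}

criterion_group!(benches, bench_serialize);
criterion_main!(benches);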

Criterion should be ensuring that the results are statistically significant even if you're running other tasks on the same computer.

1 Like

From an API perspective, it's really important that the MongoDB driver supports serde for maximum interoperability with the greater Rust ecosystem, even if there is a performance penalty for doing so compared to supporting a custom serialization framework. And as other users have pointed out, serde is used successfully with many other serialization formats besides JSON, including BSON.

This is absolutely true, and actually my current project is to add support for this to the driver, so hopefully that should be released in the near future!


Regarding the performance of serde in general, the cases where the data model doesn't support a custom type for a given serialization format can incur overhead. For example, BSON and TOML both have a datetime type, but there is no equivalent type in the serde data model. As a result, they have to be represented as something like the equivalent of { <special magic key>: "<datetime as string>" }, which the BSON/TOML serializer can interpret and serialize accordingly. I think once specialization is stabilized, this drawback will mostly go away though, since serializers will be able to provide custom implementations for specific types. For types that are supported by the data model, there seems to be very little if any overhead, however.
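To make that concrete, here is roughly what the workaround looks like as a hand-written Serialize impl (a sketch: the "$__datetime" magic key is made up for illustration, not the key bson or toml actually use):

use serde::ser::{Serialize, SerializeMap, Serializer};

// A datetime carried as a string, e.g. an RFC 3339 timestamp.
struct Datetime(String);

impl Serialize for Datetime {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // The serde data model has no datetime type, so smuggle it through
        // as a single-entry map with a magic key that the format's
        // serializer recognizes and re-encodes as a native datetime.
        let mut map = serializer.serialize_map(Some(1))?;
        map.serialize_entry("$__datetime", &self.0)?;
        map.end()
    }
}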

Also, thank you all for highlighting some bottlenecks in the bson::to_vec implementation! After looking at a flamegraph of bson::to_vec, I noticed that a lot of time was being spent serializing the keys of a BSON array. In bson, this is implemented using u64's Display implementation, which involves a heap allocation and other expensive work. Once I updated that to use the loop from @Hyeonu's sample, I saw similar serialization times for all three. I've filed RUST-1062 on our issue tracker to ensure this improvement gets upstreamed into bson itself. Thanks again for helping discover this bottleneck!
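The optimization boils down to formatting the index into a stack buffer instead of allocating a String per element; a sketch of the idea (not the exact code from the sample or from bson):

// Write a decimal array index (BSON encodes "0", "1", ... as the keys of
// array elements) without a per-element heap allocation.
fn write_index_key(out: &mut Vec<u8>, mut n: u64) {
    let mut buf = [0u8; 20]; // u64::MAX has 20 decimal digits
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    out.extend_from_slice(&buf[i..]);
    out.push(0); // BSON keys are NUL-terminated cstrings
}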

Post array-key optimization, serializing a struct with a String, an int64, and an array of int64:

Edit: running it with the exact example from @Frederik's benchmark shows that bson's implementation is still a bit slower, but it's a lot closer now:

7 Likes

It's very understandable that you want interoperability with Serde. I'd like to point out that you don't have to choose one or the other. It may be possible to make the API take a data-model-agnostic trait, allowing the consumer of the API to choose whether to use Serde or something else. Different choices could even be made for different parts of a program, allowing performance-critical features to be implemented one way while other features remain compatible with Serde. I'd rather see this, for the diversity and evolution of the greater Rust ecosystem, than tying the ecosystem to the stale data model of Serde.
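As a rough sketch of what such an escape hatch could look like (all names here are hypothetical, not an actual bson or driver API):

// A hypothetical driver-side trait: "produce your BSON bytes however you like".
trait WriteBson {
    fn write_bson(&self, out: &mut Vec<u8>);
}

// A wrapper routing any serde::Serialize type through the existing Serde path.
struct ViaSerde<T>(T);

impl<T: serde::Serialize> WriteBson for ViaSerde<T> {
    fn write_bson(&self, out: &mut Vec<u8>) {
        out.extend(bson::to_vec(&self.0).expect("serialization failed"));
    }
}

// A performance-critical type can instead implement the trait by hand,
// or via some other framework entirely.
struct Point {
    x: i64,
    y: i64,
}

impl WriteBson for Point {
    fn write_bson(&self, _out: &mut Vec<u8>) {
        // ... emit raw BSON for { "x": ..., "y": ... } directly ...
    }
}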

It's nice to see you took a hint from the different implementations in the benchmark. I suspect there may be various inaccuracies in the benchmark, as all three of us have gotten very different results. It may be better if you share your forks of the benchmark. And don't forget the deserialization benchmark; it's in deserialization that you'll find a lot more questionable things about Serde than in serialization.

Yeah, I think the driver API will end up just exposing the raw BSON bytes and letting users do what they want with it. We assume that they'll largely use serde to deserialize it further, but there's nothing stopping them from writing custom deserialization logic if they wish to.

The benchmark I was using can be found on this branch of my fork of bson: GitHub - patrickfreed/bson-rust at serde-perf. For serialization, I think the results that @Hyeonu and I had were pretty similar after I applied the array index/key optimization.

For deserialization, the difference between bson::from_slice and your implementation is definitely more distinct, though I think that has more to do with the implementation in bson rather than any limitation of serde, given that the msgpack serde library (rmp_serde) seems to be able to achieve similar performance. We've had a ticket open for trying to further optimize the deserializer to match rmp_serde's performance, but it hasn't been prioritized yet because, in practice, the extra ~200ns spent in deserialization is dwarfed by the time spent doing other things like network I/O (in the case of the driver), which is on the order of milliseconds. That being said, there is definitely room for improvement here, so we hope to get to looking into it eventually.

1 Like

This. I have the feeling that the bson crate was originally somewhat of a quick hack, only needed so that the MongoDB driver has something to work with. I have submitted pull requests myself in the past, improving obviously sub-optimal aspects of the library, mostly the removal of spurious allocations.

Exactly. I have yet to see a real-world, non-microbenchmark case where JSON/BSON/whatever (de)serialization is a bottleneck. I have to work with gigabyte-sized JSON files pretty regularly (thanks to the bioinformatics industry for not yet having grokked the concept of relational databases), but the initial couple of seconds spent on parsing them don't matter even there, because the rest of the processing of the extracted data takes way longer.

3 Likes

Well, it was a bug, but

Due to using sscanf, GTA Online was parsing a 10MiB JSON string at startup in O(n²) time. Fixing the parser to be properly O(n) cut load times by 70%, so it was clearly a bottleneck there.

That said, I agree that nearly any deserialization that is not asymptotically horrible will typically complete in less time than it took to do the I/O to load the data being deserialized. And if it doesn't... what you serialize is typically the problem before how you serialize (after low-hanging optimizations). If you need to go even further beyond, taking a different approach (e.g. memory-mapping) will almost certainly serve you better than deserialization.
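For completeness, memory-mapping is only a few lines with e.g. the memmap2 crate (a minimal sketch; the safety requirement that no other process mutates the file while it's mapped is glossed over):

use std::fs::File;
use memmap2::Mmap;

fn load(path: &str) -> std::io::Result<Mmap> {
    let file = File::open(path)?;
    // Safety: the underlying file must not be modified while mapped.
    let map = unsafe { Mmap::map(&file)? };
    // Pages are faulted in lazily by the OS; nothing is parsed or copied
    // up front, which is the whole point versus eager deserialization.
    Ok(map)
}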

4 Likes

It is true that serde deserializes keys into a C-style enum. But this is really not as bad as you make it out to be; a C-style enum is just an integer representing which field it is. This mapping still has to happen, because the field could be coming from a string (when serialized into a {string: value} map) or from an integer (in a non-self-describing format). It's literally encapsulation 101 to split field name recognition from parsing the field itself. (And hey, you can use Cow to deserialize key names if you don't want a custom enum! You'll just have to live with errors in the wrong place, because you didn't parse (don't validate) it in the correct spot.)
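Here is a minimal sketch of that pattern (hand-written; the real #[derive(Deserialize)] output differs in detail):

enum Field {
    Id,
    Name,
}

impl<'de> serde::Deserialize<'de> for Field {
    fn deserialize<D: serde::Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
        struct FieldVisitor;
        impl<'de> serde::de::Visitor<'de> for FieldVisitor {
            type Value = Field;
            fn expecting(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
                f.write_str("`id` or `name`")
            }
            // Self-describing formats hand the key over as a string...
            fn visit_str<E: serde::de::Error>(self, v: &str) -> Result<Field, E> {
                match v {
                    "id" => Ok(Field::Id),
                    "name" => Ok(Field::Name),
                    _ => Err(E::unknown_field(v, &["id", "name"])),
                }
            }
            // ...while non-self-describing formats can hand over an index.
            fn visit_u64<E: serde::de::Error>(self, v: u64) -> Result<Field, E> {
                match v {
                    0 => Ok(Field::Id),
                    1 => Ok(Field::Name),
                    _ => Err(E::invalid_length(v as usize, &self)),
                }
            }
        }
        d.deserialize_identifier(FieldVisitor)
    }
}

Also,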

the pathological case of a field name containing an escape sequence where there shouldn't be one

So I guess you don't care about being conformant to any general purpose textual format? They all allow escapes in keys, because they want to allow arbitrary keys, which requires escapes.

If you want to complain about serde being tailored for JSON, complain about miniserde, which actually is. And miniserde is pretty much better at being a JSON serialization framework than serde is, because it's specialized to only work for JSON.

Serde is complicated because it has to be in order to support general purpose serialization format agnostic serialization. It's a hard problem. That's not to say it's perfect — nothing is — but it's likely that any simple obvious "problem" is actually required for an important use case you didn't think about. (The exception is not being able to handle special types not in the serde data model, but I don't think that's realistically possible while remaining format agnostic.)

Also, I just want to point out that the serde data model really isn't anything special for a general purpose serialization target, and is, if anything, specialized for Rust, and definitely not for JSON. JSON can't represent most of the things that the serde data model can (enum variants, value-value maps), and the data has to be massaged from one format to the other.

JSON values can be an array, boolean, null, string, number, or object (string-value map). Serde values can be bool, i8, i16, i32, i64, i128, u8, u16, u32, u64, u128, f32, f64, char, string, bytes, option, unit, unit struct, unit variant, newtype struct, newtype variant, seq, tuple, tuple struct, tuple variant, map (value-value), struct (key-value), or struct variant. I fail to see how that's optimized specifically for JSON. (Because it isn't.)
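One concrete illustration: a value-value map is representable in the serde data model but not in JSON, so serde_json has to reject it (a small sketch):

use std::collections::HashMap;

fn main() {
    // Keys are tuples, not strings: fine in the serde data model.
    let mut scores: HashMap<(u32, u32), f64> = HashMap::new();
    scores.insert((1, 2), 0.5);

    // JSON object keys must be strings, so serde_json errors out here.
    assert!(serde_json::to_string(&scores).is_err());
}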

That said, #[serde] attributes do often describe what they do in terms of the corresponding JSON rather than the effect on the serde data model representation. This is primarily due to JSON being a familiar target to talk and care about. But that isn't because they only serve JSON; they work perfectly for any self describing format, and many work for any format.

8 Likes

The raw BSON serializer / deserializer used in bson::to_vec and bson::from_slice were written recently actually (by yours truly), so I think it's more that we haven't had the chance to micro-optimize it yet rather than it being hacky.

More generally speaking, bson was originally a community maintained library, but it's since been transferred to MongoDB the company and these days is being actively maintained by my colleagues and me. So while there is some technical debt in it that needs addressing, it is a fully supported library intended to be production-ready. If you have any ideas for improvements though, we'd love to hear about them on our GitHub issue tracker or Jira project!

3 Likes

Online services serving many concurrent users/requests do spend a lot of time deserializing user requests and serializing responses, since data is mostly cached in memory. That is why JSON is not considered good for communication between internal services. I have seen this in many production services.

3 Likes

I like that you added JSON and Message Pack for comparison. Maybe also add CBOR?

You can't make such a conclusion when comparing apples to oranges. You have to look into the details.

rmp_serde::to_vec achieves high performance by not serializing field names. You have to use rmp_serde::to_vec_named for a fair comparison.
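The difference is easy to see (a small sketch, assuming the rmp_serde crate's to_vec and to_vec_named):

use serde::Serialize;

#[derive(Serialize)]
struct Point {
    x: i64,
    y: i64,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // to_vec encodes structs as MessagePack arrays: no field names at all.
    let compact = rmp_serde::to_vec(&p).unwrap();
    // to_vec_named encodes them as maps, with field names, which is the
    // closer analogue of what a BSON serializer is forced to do.
    let named = rmp_serde::to_vec_named(&p).unwrap();
    assert!(named.len() > compact.len());
}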

Also to consider (after the important detail above) is that Message Pack is not BSON.

  • Arrays are prefixed with the number of items, so arrays can be allocated upfront with the correct capacity when deserializing.
  • Byte lengths of arrays and objects are not encoded, so no need to backtrack when serializing.
  • Keys of array elements are not encoded, significantly reducing the number of bytes to serialize and deserialize.

For further performance improvements in deserializing arrays, they could be allocated in a bump arena. This reduces the number of allocations (and deallocations) for any format, but is particularly helpful in making BSON competitive with Message Pack, because arrays could have a preallocated capacity for a pretty large number of items, and any capacity that is left over after the end of one array can be immediately used for the next array.
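With the bumpalo crate (with its collections feature enabled), the idea could look something like this (a sketch of the approach, not something the bson crate supports today):

use bumpalo::collections::Vec as BumpVec;
use bumpalo::Bump;

fn main() {
    // One arena for the whole deserialization pass; every array allocated
    // in it is freed at once when the Bump is dropped.
    let arena = Bump::new();

    // Preallocate generously; bump allocation makes over-reserving cheap.
    let mut items: BumpVec<i64> = BumpVec::with_capacity_in(1024, &arena);
    items.extend((0..10).map(|i| i * 2)); // stand-in for decoded elements
    assert_eq!(items.len(), 10);
}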

There's async for that. So what matters depends on what you're measuring.

Thank you for the careful read; this mistake is on me. The new numbers (EDIT: for the older, different benchmark) are less impressive, but there is still a ~2.8x difference even for a simple struct { i: i64 } (and potentially i: ObjectId, which was the main reason for the benchmark, as it was an order of magnitude slower).

Bioinformatics is on the heavy side of postprocessing. On my websites and lighter number-crunching projects, BSON was really visible, and the resulting performance was worse than (C++,) Python and even one old PHP webpage I rewrote in Rust. It's significantly better now, but the numbers in this thread suggest there is still room for several double-digit improvements.
