Serde and serde_json 1.0.0 released


The primary focus of this release is on polish: making Serde easier to learn and more enjoyable to use. That said, this wouldn't be Serde without some amazing new features to tell you about as well.

In celebration of this milestone, I would like to highlight some of the work that has contributed toward making Serde the powerful, efficient, adaptable, convenient beast that exists today. This is the unfinished history of serialization in Rust.

  • March 2011 — Patrick Walton adds support for serializing Rust object metadata into a format called EBML (reader and writer).
  • November 2011 — Elly Jones adds a JSON library called std::json.
  • February 2012 — Niko Matsakis creates a utility to generate serialization code for data structures, eliminating the grunt work of writing it by hand. The first serializer and deserializer traits appear. The same traits still exist practically unchanged in rustc_serialize today.
  • May 2014 — Erick Tryzelaar begins Serde, exploratory work to address some severe, foundational limitations in the standard library's approach to deserialization.
  • December 2014 — Alex Crichton begins the deprecation of libserialize. It is clear that the design has a number of drawbacks, and there aren't resources to address them in time for Rust 1.0. The code is moved out of the rustc codebase into the rustc_serialize crate.
  • March 2015 — Erick Tryzelaar releases Serde 0.2.0, proving decisively that the limitations in libserialize can be resolved by a better approach.
  • May 2015 — Rust 1.0 is released. :cake:
  • April 2017 — Serde is impressively fast, battle-tested in production use cases, and ready for a solid 1.0 release.

Thank you to more than 100 people who have contributed improvements, opened issues, provided help to others, or expressed their frustrations with Serde in a constructive way. We appreciate your continued support.

Zero-copy deserialization

Serde 1.0 adds the ability to deserialize into types that borrow from the input data. For example, this JSON document can now be deserialized efficiently and safely into the following Rust struct.

{
    "fingerprint": "0xF9BA143B95FF6D82",
    "location": "Menlo Park, CA"
}

#[derive(Deserialize)]
struct User<'a> {
    fingerprint: &'a str,
    location: &'a str,
}

The value for each field will be a string slice that refers to the correct substring of the input data. No memory is allocated for the strings and no data is copied. The same thing also works for &[u8] slices. Other types can be borrowed explicitly by using the #[serde(borrow)] attribute.

This uniquely Rust-y feature would be impossible or recklessly unsafe in languages other than Rust. The semantics of Rust guarantee that the input data outlives the period during which the output struct is in scope, meaning it is impossible to have dangling pointer errors as a result of losing the input data while the output struct still refers to it.
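
To make the borrowing relationship concrete, here is a minimal hand-rolled sketch that does not use Serde at all. The helpers `field_value`, `parse_user` and `borrows_from` are hypothetical names invented for this illustration, and the "parser" only copes with flat objects whose string values contain no escapes. It is meant to show one property only: the fields of `User` are slices pointing into the input buffer.

```rust
// Illustration only: this is NOT how Serde or serde_json work internally,
// and it is not a real JSON parser.

#[derive(Debug, PartialEq)]
struct User<'a> {
    fingerprint: &'a str,
    location: &'a str,
}

/// Returns the part of `input` between the quotes of the string value that
/// follows `"key":`. The only allocation is the temporary search pattern;
/// the returned slice borrows from `input`.
fn field_value<'a>(input: &'a str, key: &str) -> Option<&'a str> {
    let pattern = format!("\"{}\"", key);
    let after_key = &input[input.find(&pattern)? + pattern.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let body = after_colon.strip_prefix('"')?;
    let end = body.find('"')?;
    Some(&body[..end])
}

/// Builds a `User` whose fields borrow from `input` for the lifetime `'a`.
fn parse_user<'a>(input: &'a str) -> Option<User<'a>> {
    Some(User {
        fingerprint: field_value(input, "fingerprint")?,
        location: field_value(input, "location")?,
    })
}

/// True if `part` points into the memory occupied by `whole`.
fn borrows_from(whole: &str, part: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= start + whole.len()
}
```

Given `let user = parse_user(&input)`, the borrow checker rejects any attempt to drop or mutate `input` while `user` is still in use, which is the guarantee described above.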

Zero-copy deserialization is discussed in more detail on this page of the website.

Remote derive

One recurrent struggle with Serde is needing to use types from somebody else's crate where they do not provide Serialize and Deserialize impls.

Rust's "orphan rule" allows writing a trait impl only if either your crate defines the trait or defines one of the types the impl is for. That means if your code uses a type defined outside of your crate without a Serde impl, you resort to newtype wrappers or other obnoxious workarounds for serializing it.
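
For readers who have not run into the orphan rule yet, here is a small sketch of the newtype workaround using only the standard library. As an analogy (my assumption, not anything from Serde), `std::fmt::Display` stands in for a foreign trait like `Serialize`, and `std::time::Duration` stands in for the foreign type. `SecondsDisplay` and `describe` are hypothetical names.

```rust
use std::fmt;
use std::time::Duration;

// Rejected by the orphan rule (error E0117), because neither `Display` nor
// `Duration` is defined in this crate:
//
//     impl fmt::Display for Duration { /* ... */ }

// Workaround: a local newtype wrapper. `SecondsDisplay` is defined in this
// crate, so this crate may implement the foreign trait for it.
struct SecondsDisplay(Duration);

impl fmt::Display for SecondsDisplay {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Whole seconds, then nanoseconds zero-padded to nine digits.
        write!(f, "{}.{:09}s", self.0.as_secs(), self.0.subsec_nanos())
    }
}

// The cost of the workaround: every use site has to wrap the value first.
fn describe(elapsed: Duration) -> String {
    SecondsDisplay(elapsed).to_string()
}
```

The wrapping at every use site is the "obnoxious" part: the wrapper is a different type from the one the rest of your code wants to work with.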

Serde 1.0 now supports deriving impls for structs and enums defined in other people's crates.

// Serde calls this the definition of the remote type. It is just a copy of the
// remote type. The `remote` attribute gives the path to the actual type.
#[derive(Serialize, Deserialize)]
#[serde(remote = "nifty::Duration")]
struct DurationDef {
    secs: i64,
    nanos: i32,
}

// Now the remote type can be used almost like it had its own Serialize and
// Deserialize impls all along. The `with` attribute gives the path to the
// definition for the remote type. Note that the real type of the field is the
// remote type, not the definition type.
#[derive(Serialize, Deserialize)]
struct Process {
    command_line: String,

    #[serde(with = "DurationDef")]
    wall_time: nifty::Duration,
}

Remote derive is covered on this page of the website, which also includes an example of handling a remote struct with private fields.

Documentation

Lots of new content on the Serde website. The following new pages in particular are a valuable read even for experienced Serde users.


Is Serde done?

As of this release, Serde is stable but not done. We say with confidence that Serde is the best serialization solution that we can design for the Rust language as it exists today and for the broad range of use cases that we target.

The Serde issue tracker has a long list of planned enhancements for you to look forward to and maybe help us implement. For the first time, none of them involve further breaking changes.


Breaking changes

  • Zero-copy deserialization involves tracking a 'de lifetime.

    The Deserialize and Deserializer traits as well as some other deserialization-related traits now have a 'de lifetime parameter. These lifetimes enable Serde to perform efficient and safe zero-copy deserialization across a variety of data formats. The website has a page on understanding deserializer lifetimes.

  • Method Deserializer::deserialize has been renamed to deserialize_any.

    The old name did nothing to clarify the purpose of this method in contrast to any of the other Deserializer trait methods. Also, the existence of a method named deserialize on two different Serde traits sometimes required them to be explicitly disambiguated using UFCS.

  • Deserializer-provided accessor traits have been renamed.

    The SeqVisitor, MapVisitor, EnumVisitor, and VariantVisitor traits have been renamed to SeqAccess, MapAccess, EnumAccess, and VariantAccess. The old names conflicted with the Visitor trait which plays a totally different role, and this was a source of confusion for new and even experienced Serde users. What's more, these accessor traits were never really playing the role of a visitor in any common sense. They are provided by the Deserializer to a Visitor to give it the ability to access elements of a data structure.

  • Some questionable implicit type conversions are gone.

    For example, a JSON null will no longer successfully deserialize if the output type is a string. Such conversions violated the objective that if Serde successfully deserializes some input into a particular data structure, you can be confident that the input matches what your data structure says it should look like.

  • ValueDeserializer has been renamed to IntoDeserializer.

    The new name aligns with traits that play a similar role in other parts of the Rust ecosystem, such as IntoIterator and IntoFuture.

  • The concept of a fixed-size sequence is gone in favor of tuple.

    Previous versions of Serde represented fixed-size arrays like [u64; 3] and tuples like (u64, u64, u64) as different types in the data model. From the perspective of a Serde data format the distinction between these types never matters, so fixed-size sequences have been removed from the data model.

  • Method deserialize_struct_field has been renamed to deserialize_identifier.

    This method has existed since Serde 0.7 but has always been used for both struct fields and enum discriminants. The new name reflects that this method is used to identify what to deserialize next—whether that is to decide which field of a struct we are looking at, or which variant of an enum.

  • Variant indices during deserialization have changed from usize to u32.

    Previous versions of Serde treated the index of a variant in an enum as u32 while serializing and as usize while deserializing. This inconsistency has been resolved by standardizing on u32.

  • The serde_test Token API has been redesigned.

    The serde_test crate is great for unit testing implementations of Serialize and Deserialize without being tied to any particular data format. The API had stagnated over the past few releases so this redesign brings it in line with modern naming conventions and Serde concepts.

  • Built-in visitors are private.

    The Visitor implementations used for deserializing standard library types have been made private. These are an implementation detail and were not designed to be used by other crates.

  • Byte-related wrapper types have moved to the serde_bytes crate.

    The Bytes and ByteBuf wrappers are a stopgap solution until specialization enables us to handle &[u8] and Vec<u8> in an optimized way by default.

  • Size hint methods have changed from (lower, upper) bounds to Option<usize>.

    The size hint on the standard library Iterator trait returns lower and upper bounds in order to optimize handling of complicated iterator chains. This use case never arises during Serde deserialization; the deserializer either knows the exact length of a sequence or map or has no idea.

  • Rc and Arc impls require opting in to a feature.

    Serialization of the reference-counted types Rc<T> and Arc<T> was a source of confusion. A single Rc could be serialized multiple times as part of a data structure, then deserialize to multiple copies of itself. If this is the behavior you want, opt in to these impls by enabling the "rc" feature of your Serde dependency.
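
The duplication described in the last item can be sketched with the standard library alone. The `Pair` type and the `to_values`, `from_values` and `shared_pair` functions below are hypothetical stand-ins for a data structure and a value-based format. This is not Serde's implementation, just the same effect in miniature.

```rust
use std::rc::Rc;

// Hypothetical data structure that may hold the same Rc twice.
struct Pair {
    left: Rc<String>,
    right: Rc<String>,
}

// Stand-in for serialization: a value-based format records only the contents
// behind each pointer, not the fact that the two pointers were shared.
fn to_values(pair: &Pair) -> Vec<String> {
    vec![(*pair.left).clone(), (*pair.right).clone()]
}

// Stand-in for deserialization: each recorded value gets a fresh allocation.
fn from_values(mut values: Vec<String>) -> Option<Pair> {
    if values.len() != 2 {
        return None;
    }
    let right = values.pop()?;
    let left = values.pop()?;
    Some(Pair {
        left: Rc::new(left),
        right: Rc::new(right),
    })
}

// Builds a Pair whose two fields point at one shared allocation.
fn shared_pair(text: &str) -> Pair {
    let shared = Rc::new(text.to_string());
    Pair {
        left: Rc::clone(&shared),
        right: shared,
    }
}
```

Before the round trip, `Rc::ptr_eq(&pair.left, &pair.right)` is true. Afterwards the two fields hold equal strings in separate allocations, so the sharing is silently lost.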


Congrats! Looks awesome, particularly zero-copy deserialization. I can definitely see how avoiding those copies could produce huge performance gains.

Would be very interesting to see performance tests of this compared to other popular libraries in other languages. If you can beat them, would be great advertising for Rust as well as Serde.


Well done @dtolnay, @oli_obk and all the serde contributors!

Now begins the fun of upgrading everything :sweat_smile:


Still interested in those updated benchmarks!

Also a thought: what about single-allocation serialization? By which I mean two passes through the data: in the first pass you figure out the exact length of the output JSON, then you allocate that much memory, and in the second pass you actually write the strings to create the JSON. Not sure if it's worth it or if it would even be faster, but it seems like allocating memory might be the time-consuming part of serialization. The deserialization trick is quite nice.

I finally updated our nativejson benchmark suite to Serde 1.0. This is the same suite that is tested across dozens of C and C++ libraries in this repo.

Here is serde_json compared to the fastest C or C++ JSON library. Lower numbers are better.

                                DOM                STRUCT
======= serde_json ======= parse|stringify === parse|stringify ===
data/canada.json          10.2ms    11.1ms     4.2ms     6.9ms
data/citm_catalog.json     5.3ms     1.3ms     2.1ms     0.8ms
data/twitter.json          2.5ms     0.6ms     1.3ms     0.6ms

==== rapidjson-clang ===== parse|stringify ===
data/canada.json           5.7ms    10.5ms
data/citm_catalog.json     2.5ms     1.7ms
data/twitter.json          1.8ms     1.2ms

===== rapidjson-gcc ====== parse|stringify ===
data/canada.json           4.7ms     8.6ms
data/citm_catalog.json     1.7ms     1.0ms
data/twitter.json          1.3ms     0.7ms

Note that the STRUCT columns are missing for rapidjson because, while it supports SAX style parsing, the approach is such that it is totally unreasonable to use it for real use cases. To prove it, compare Serde's 3 structs and an enum here vs RapidJSON's 400 line error-prone handwritten state machine here.

The benchmarks show that our DOM (which means serde_json::Value) is not as good at parsing. It is a factor of 1.5-2 slower than rapidjson-clang, and a factor of 2-3 slower than rapidjson-gcc. On the printing side it is much closer and even twice as fast as rapidjson-clang on one of the benchmark files and slightly faster than rapidjson-gcc. So as always, any decisions you make should be based on benchmarks of your own representative data.
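
If you want to check the slowdown factors yourself, the arithmetic is just a per-file division of the DOM parse times in the table above. A quick sketch (the constant and function names are my own):

```rust
// DOM parse times in milliseconds, copied from the table above, in the order
// canada.json, citm_catalog.json, twitter.json.
const SERDE_JSON: [f64; 3] = [10.2, 5.3, 2.5];
const RAPIDJSON_CLANG: [f64; 3] = [5.7, 2.5, 1.8];
const RAPIDJSON_GCC: [f64; 3] = [4.7, 1.7, 1.3];

// How many times slower `ours` is than `theirs`, file by file.
fn slowdown(ours: &[f64; 3], theirs: &[f64; 3]) -> [f64; 3] {
    [
        ours[0] / theirs[0],
        ours[1] / theirs[1],
        ours[2] / theirs[2],
    ]
}
```

`slowdown(&SERDE_JSON, &RAPIDJSON_CLANG)` comes out at roughly 1.8, 2.1 and 1.4, and `slowdown(&SERDE_JSON, &RAPIDJSON_GCC)` at roughly 2.2, 3.1 and 1.9.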

So that's good but not great. Keep in mind that RapidJSON is the fastest C or C++ library on these specific benchmark files. Most successful C and C++ JSON libraries are slower by a factor of 10.

But a more realistic comparison is what people actually use, and in Serde that means #[derive(Serialize, Deserialize)]. Comparing that column, you can see that Serde serialization is substantially faster across all three files, whether compiling RapidJSON with Clang or GCC. Serde serialization is more than twice as fast in some use cases. Serde deserialization is noticeably faster than rapidjson-clang and on par with rapidjson-gcc. That's pretty remarkable.

Note that these numbers are just using basic Serde 1.0 and serde_json 1.0 without the new zero-copy feature.


I sense a follow-up post in the works :slight_smile: (at least I hope there is one!)


This post is more than one year old. I'm curious to know what improvements have been made on both sides!
