Multiple incompatible Value implementations


#1

Hi all,

I am working on some projects that rely on serde, and I noticed something odd: there are many implementations of the same data structures scattered across different projects.

In particular, Value is implemented for serde-json, serde-yaml and toml-rs, and they have a nearly identical representation for Object (Table in toml, Mapping in yaml) and Array (same in toml, Sequence in yaml), and for I64 (or Integer), F64 (or Float), Null, String and Bool.

There are minor differences. For example, toml also implements a Datetime, and json and yaml support path indexing. But all of them have very similar Serialize and Deserialize implementations.

I was wondering if it would make sense to unify them in a single external crate that covers all the cases and can be used by the specific format parsers.

I started a project with this idea in mind, but it’s more a playground for learning Rust than a usable project, and it uses interior mutability so that the Value can be passed around as non-mutable, which limits performance.
In my case I just needed the Value/Object/Array, I didn’t need JSON and I used chrono for Datetime, so I couldn’t really use serde-json, but I “stole” the ser-de logic from it.

Maybe @alexcrichton and @dtolnay (and others) are a better fit than me to start something like this, or at least to give their opinion on why it should or shouldn’t be done.

Thanks for reading. :slight_smile:


#2

The data structures are different but not incompatible… Let me explain.

These data structures are designed to be able to represent any syntactically valid data in their corresponding format. A serde_json::Value can represent any valid JSON document. A serde_yaml::Value can represent any valid YAML document. A serde_value::Value can represent any data that comes out of a Serde deserializer.

These are different because the different formats have different features and limitations. Some notable differences:

  • TOML has a first-class datetime type that the other formats do not.
  • JSON maps always have string keys while YAML map keys can be numbers, lists, maps etc.
  • Serde differentiates between u8, u16, u32 etc while JSON, YAML, TOML do not.

Because of these differences, the different Value types are not interchangeable. I cannot use a serde_json::Value to represent an arbitrary YAML document because the YAML document may contain non-string map keys.

But still they are all compatible with each other! From Serde’s perspective, all of these Value types are just types that implement the Serialize and Deserialize traits. In that sense there is no difference between serde_json::Value and Vec<u64> and #[derive(Serialize, Deserialize)] struct MyStruct. They are just types that implement Serialize and Deserialize. This is an important point so let me know if I have not explained it successfully.

So since they are just types that implement Serialize and Deserialize, we can take data in any format and deserialize it to a Value of any format. For example we can do:

let v: serde_json::Value = serde_yaml::from_str(...)?;

Of course if your YAML input contains a non-string map key then the deserialization will fail, but it shows that these Value types are maximally compatible across all formats.
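To make the failure case concrete, here is a minimal sketch using only the standard library. JsonLike, YamlLike and yaml_to_json are hypothetical, simplified stand-ins written for this post, not the real serde_json or serde_yaml types, and the real crates go through Serde's Deserialize machinery rather than a hand-written conversion like this one.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins; NOT the real serde_json / serde_yaml types.
#[derive(Debug, Clone, PartialEq)]
enum JsonLike {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<JsonLike>),
    // JSON object keys are always strings.
    Object(BTreeMap<String, JsonLike>),
}

#[derive(Debug, Clone, PartialEq)]
enum YamlLike {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Sequence(Vec<YamlLike>),
    // YAML mapping keys may be any value, so keep (key, value) pairs.
    Mapping(Vec<(YamlLike, YamlLike)>),
}

// Convert YAML-shaped data into JSON-shaped data.
// Fails as soon as a mapping key is not a string.
fn yaml_to_json(v: &YamlLike) -> Result<JsonLike, String> {
    Ok(match v {
        YamlLike::Null => JsonLike::Null,
        YamlLike::Bool(b) => JsonLike::Bool(*b),
        YamlLike::Number(n) => JsonLike::Number(*n),
        YamlLike::String(s) => JsonLike::String(s.clone()),
        YamlLike::Sequence(items) => JsonLike::Array(
            items
                .iter()
                .map(yaml_to_json)
                .collect::<Result<Vec<_>, _>>()?,
        ),
        YamlLike::Mapping(pairs) => {
            let mut object = BTreeMap::new();
            for (key, value) in pairs {
                match key {
                    YamlLike::String(k) => {
                        object.insert(k.clone(), yaml_to_json(value)?);
                    }
                    other => return Err(format!("non-string map key: {:?}", other)),
                }
            }
            JsonLike::Object(object)
        }
    })
}
```

A mapping with only string keys converts cleanly, while a mapping keyed by a number or a list returns Err. That is the same shape of failure described above for non-string map keys.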

So basically - pick whichever one can represent all the data that you care about. If you only care about JSON-like data (so string keys in maps, no explicit datetime type, etc) then use serde_json::Value everywhere. If you want maximum generality across all Serde data formats, use serde_value::Value.

I was wondering if it would make sense to unify them in a single external crate that covers all the cases and can be used by the specific format parsers.

(a) Given that all of these Value types are compatible with all the formats, I don’t think “covering all the cases” in one type is particularly valuable. People will use whichever type covers all the cases they care about, and a type that covers more cases than they care about is unwieldy.

(b) The specific format parsers don’t use these Value types anyway. Deserializing a JSON document into a data structure does not involve serde_json::Value at all. Data is deserialized directly from JSON bytes into the resulting data structure without ever going through serde_json::Value.

(c) The formats would not want to use a standard Value type even if there were such a thing. If I need to deserialize an unknown JSON blob and do basic manipulations on it, I would not want to use an “all the cases” type like serde_value::Value because I know that JSON does not differentiate between various bit-width numbers, so dealing with those would complicate my code unnecessarily.
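A rough sketch of what "complicate my code unnecessarily" can look like in practice. WideValue and NarrowValue are hypothetical enums invented for this illustration; the real Value types have more variants and different names.

```rust
use std::convert::TryFrom;

// Hypothetical "all the cases" type: one variant per integer width.
#[derive(Debug, Clone, PartialEq)]
enum WideValue {
    U8(u8),
    U16(u16),
    U32(u32),
    U64(u64),
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
    String(String),
}

// Hypothetical JSON-flavoured type: a single integer variant.
#[derive(Debug, Clone, PartialEq)]
enum NarrowValue {
    I64(i64),
    String(String),
}

// Reading an integer from the wide type needs one arm per width,
// plus a range check for u64 values that do not fit in i64.
fn wide_as_i64(v: &WideValue) -> Option<i64> {
    match v {
        WideValue::U8(n) => Some(i64::from(*n)),
        WideValue::U16(n) => Some(i64::from(*n)),
        WideValue::U32(n) => Some(i64::from(*n)),
        WideValue::U64(n) => i64::try_from(*n).ok(),
        WideValue::I8(n) => Some(i64::from(*n)),
        WideValue::I16(n) => Some(i64::from(*n)),
        WideValue::I32(n) => Some(i64::from(*n)),
        WideValue::I64(n) => Some(*n),
        WideValue::String(_) => None,
    }
}

// The same question against the narrow type is a two-arm match.
fn narrow_as_i64(v: &NarrowValue) -> Option<i64> {
    match v {
        NarrowValue::I64(n) => Some(*n),
        NarrowValue::String(_) => None,
    }
}
```

Every function that inspects numbers pays this cost with the wide type, even when the input format can never produce the extra variants.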


#3

Yes, I understand what you mean. Thanks for your answer.

My case is maybe a little peculiar. I am writing a static website generator. It reads YAML configuration files created by the user and can render via either handlebars or tera.

YAML naturally uses yaml::Value, while handlebars uses json::Value and tera’s maintainer is thinking about switching from json::Value to toml::Value for the Datetime support.

serde_value is too broad. A good compromise, IMO, would be Null, Bool, I64, F64, String, Datetime, Object and Array, with Value-based indexes for the Object type.
The missing features could be emulated. For example JSON-encoding an Object-typed key could convert the key into a JSON-string, and a Datetime can be converted to/from ISO8601, possibly with a flag passed to the parser.
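Roughly what I have in mind, as a sketch only. The Value enum, to_json and quote below are hypothetical names for this post, not an existing crate. I am assuming here that Datetime is stored as an ISO 8601 string (to avoid pulling in a date library) and that Object is a list of pairs, since f64 keys have no total order or hash.

```rust
// Hypothetical compromise type; not an existing crate.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    I64(i64),
    F64(f64),
    String(String),
    // Assumption: kept as an ISO 8601 string for this sketch.
    Datetime(String),
    Array(Vec<Value>),
    // Value-based keys, stored as pairs.
    Object(Vec<(Value, Value)>),
}

// Quote a string as a JSON string literal with basic escaping.
fn quote(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

// Render as JSON text, emulating the features JSON lacks:
// datetimes become strings, non-string keys become JSON-encoded strings.
fn to_json(v: &Value) -> String {
    match v {
        Value::Null => "null".to_string(),
        Value::Bool(b) => b.to_string(),
        Value::I64(n) => n.to_string(),
        Value::F64(x) if x.is_finite() => x.to_string(),
        // NaN and infinities have no JSON form; emit null in this sketch.
        Value::F64(_) => "null".to_string(),
        Value::String(s) | Value::Datetime(s) => quote(s),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json).collect();
            format!("[{}]", parts.join(","))
        }
        Value::Object(pairs) => {
            let parts: Vec<String> = pairs
                .iter()
                .map(|(k, val)| {
                    let key = match k {
                        Value::String(s) | Value::Datetime(s) => quote(s),
                        // Any other key is JSON-encoded, then quoted.
                        other => quote(&to_json(other)),
                    };
                    format!("{}:{}", key, to_json(val))
                })
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}
```

The encoding is lossy in one direction: a reader of the JSON cannot tell an Array key from a String key that happens to contain the same text, which is where a parser flag would have to come in.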

But, as I said, it can happen (like in my case) that you don’t need to encode it, but just pass it around and use it, and you don’t have a predefined schema, so you can’t use a normal struct.

I know the (in)famous xkcd issue about standards, but maybe I will give it a try. :slight_smile:


#4

My case is maybe a little peculiar.


I know the (in)famous xkcd issue about standards, but maybe I will give it a try.


It seems hasty to target this as a standard when it is designed to address a peculiar use case. Maybe focus on solving your use case first, then generalize if you see the same need arise in other projects.