Serde - obtain the schema of a type

Suppose I have some type that’s Serialize. Is there a way to get its "schema" without feeding a value of that type through the serializer?

For example, let’s say we have:

use serde::Serialize;

#[derive(Serialize)]
struct Foo {
    #[serde(rename = "int")]
    i: i32,
    #[serde(rename = "float")]
    f: f64,
    #[serde(rename = "my_string")]
    s: String,
}

Is there a way to obtain a sequence of (essentially) [("int", i32), ("float", f64), ("my_string", String)]? In other words, I want the serialized field names and their data types, available at runtime but without having a value of Foo.

The context is a custom serde serializer where the backing protocol requires recording a schema of the values, even if no values are ever written (e.g. it would leave a file with just the schema present but no “rows”). A good analogy might be a CSV serializer: you want to generate the header row before seeing any values.
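To make the CSV analogy concrete, here is a minimal sketch of the kind of API being asked for. The `Schema` trait and `header_row` function are hypothetical names made up for illustration; they are not part of serde, and the impl is written by hand where a derive macro would presumably generate it.

```rust
/// Hypothetical trait (not part of serde): describes a type's fields
/// without needing a value of that type.
trait Schema {
    /// (serialized field name, Rust type name) pairs, in declaration order.
    fn fields() -> Vec<(&'static str, &'static str)>;
}

#[allow(dead_code)]
struct Foo {
    i: i32,
    f: f64,
    s: String,
}

// Written by hand here; a derive macro could generate this impl and
// honor #[serde(rename = "...")] when choosing the names.
impl Schema for Foo {
    fn fields() -> Vec<(&'static str, &'static str)> {
        vec![("int", "i32"), ("float", "f64"), ("my_string", "String")]
    }
}

/// Builds a CSV-style header row from the type alone.
fn header_row<T: Schema>() -> String {
    T::fields()
        .iter()
        .map(|(name, _)| *name)
        .collect::<Vec<_>>()
        .join(",")
}
```

With this in place, `header_row::<Foo>()` can be called before any Foo exists, which is exactly the "schema with no rows" case.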

My hunch is this isn’t possible with stock serde and likely requires some sort of custom proc macro, but perhaps I’m wrong. And if it does require a proc macro, what’s the best way to piggyback on/integrate with serde’s attributes, such as rename? An additional nice property of a proc macro would be the ability to fail compilation when a type being serialized contains a data type that the underlying protocol doesn't support (say floats aren’t allowed, just as an example).
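One way the compile-time rejection could work is with an ordinary trait bound rather than logic inside the macro itself. The sketch below assumes a hypothetical marker trait `Supported` and helper `field`; `row_schema` stands in for what a derive macro might expand to, and none of these names come from serde.

```rust
/// Hypothetical marker trait: implemented only for types the backend accepts.
trait Supported {
    const TYPE_NAME: &'static str;
}

impl Supported for i32 {
    const TYPE_NAME: &'static str = "i32";
}
impl Supported for String {
    const TYPE_NAME: &'static str = "String";
}
// Deliberately no impl for f64: in this example the backend rejects floats.

/// Returns one schema entry; only compiles when `T: Supported`.
fn field<T: Supported>(name: &'static str) -> (&'static str, &'static str) {
    (name, T::TYPE_NAME)
}

#[allow(dead_code)]
struct Row {
    id: i32,
    label: String,
}

/// Roughly what a derive macro might generate for `Row`.
fn row_schema() -> Vec<(&'static str, &'static str)> {
    // Adding `field::<f64>("float")` to this list would fail to compile,
    // because the trait bound `f64: Supported` is not satisfied.
    vec![field::<i32>("id"), field::<String>("label")]
}
```

The macro only has to emit one `field::<T>(...)` call per struct field; the compiler then reports any unsupported type at the point of use.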

Thanks

While I don't think this is possible, the ability to do schema generation would be a big help for documenting tools.

I don't think this is possible with serde at the moment, although I imagine the #[derive(Serialize)] proc macro probably builds up some sort of schema when generating code for the Serialize and Deserialize impls.

This issue on GitHub seems to be exactly what you're looking for, although it sounds like the outcome was to wait and see whether the underlying protocol libraries (serde_json, toml, etc.) can generate some sort of schema.

Thanks - somehow I missed that in my (admittedly quick) search.

It seems that my reflection crate may suit your needs.

The current version 0.1.0 generates a hierarchical schema (composed of field names, the names of those fields' own fields, and so on), which is essentially a forest data structure. The following demonstrates the output of <Foo as Reflection>::names(), without needing a value of Foo.

#[derive(Reflection)]
struct Foo {
   #[serde(rename = "int")]
   i: i32,
   #[serde(rename = "float")]
   f: f64,
   #[serde(rename = "my_string")]
   s: String,
}
assert_eq!( Foo::names().to_string(), "( int float my_string )" );

Note that Foo::names() returns trees::Forest<reflection::Name> which is a hierarchical data structure that can be iterated over, rather than &'static str or String. See trees doc for more.

The upcoming version 0.1.1 of the reflection crate may add the primitive/std type names to names(), in which case the result could look something like:

assert_eq!( Foo::names().to_string(), "( int( <i32> ) float( <f64> ) my_string( <String> ) )" );
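For readers who want to see how a forest of names can render to that parenthesized form, here is a std-only sketch. `Node`, `Forest` and `foo_names` are hypothetical stand-ins written for this illustration; they are not the actual types or API of the trees or reflection crates.

```rust
use std::fmt;

/// Hypothetical stand-in for a schema node; not the `trees` crate's type.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

impl Node {
    fn leaf(name: &'static str) -> Node {
        Node { name, children: Vec::new() }
    }

    fn with(name: &'static str, children: Vec<Node>) -> Node {
        Node { name, children }
    }
}

/// Writes sibling nodes as "( a b c )".
fn write_forest(f: &mut fmt::Formatter<'_>, nodes: &[Node]) -> fmt::Result {
    write!(f, "(")?;
    for node in nodes {
        write!(f, " {}", node)?;
    }
    write!(f, " )")
}

impl fmt::Display for Node {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)?;
        // A node with children is followed directly by its child forest.
        if !self.children.is_empty() {
            write_forest(f, &self.children)?;
        }
        Ok(())
    }
}

/// A list of sibling trees.
struct Forest(Vec<Node>);

impl fmt::Display for Forest {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write_forest(f, &self.0)
    }
}

/// Hand-built forest for Foo: each field name has its type as a child.
fn foo_names() -> Forest {
    Forest(vec![
        Node::with("int", vec![Node::leaf("<i32>")]),
        Node::with("float", vec![Node::leaf("<f64>")]),
        Node::with("my_string", vec![Node::leaf("<String>")]),
    ])
}
```

Calling `foo_names().to_string()` on this hand-built forest yields the same string as in the assertion above.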

Trees and reflection author here, and any advice and suggestions will be greatly appreciated. :grinning:

That looks pretty cool! :+1:

Any known issues with it that you’d care to highlight? :slight_smile: Does it work on stable?

I’m actually leaning towards doing something at compile time, primarily to disallow use of certain data types that aren’t supported by the backend format. So maybe I’ll use your crate as an example of how to integrate a proc macro with serde. Thanks!

Yes, it works on stable Rust.

Version 0.1.0 cannot deal with circular references (which form recursive data structures). This issue will be addressed in version 0.1.1. That said, I don't think it's a big problem in practice, because data meant for serialization is usually non-recursive.

I'm pleased that the source code is helpful for your project. In fact, I learned a lot from @dtolnay's syn project example: heapsize.

PS: If it's acceptable to filter out fields whose data types the backend doesn't support at runtime rather than at compile time, I guess reflection plus a tree walk is enough.
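As a rough illustration of the runtime route, a depth-first walk over a hierarchical schema can report every field whose type the backend rejects. The `Field` struct and both functions below are hypothetical and do not use the reflection or trees APIs; the schema is built by hand.

```rust
/// Hypothetical schema node: a field name, its type name, and nested fields.
struct Field {
    name: &'static str,
    type_name: &'static str,
    children: Vec<Field>,
}

/// Depth-first walk that records the dotted path of every banned field.
fn collect(fields: &[Field], prefix: &str, banned: &[&str], out: &mut Vec<String>) {
    for field in fields {
        let path = if prefix.is_empty() {
            field.name.to_string()
        } else {
            format!("{}.{}", prefix, field.name)
        };
        if banned.contains(&field.type_name) {
            out.push(path.clone());
        }
        collect(&field.children, &path, banned, out);
    }
}

/// Returns the paths of all fields whose type name appears in `banned`.
fn find_unsupported(fields: &[Field], banned: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    collect(fields, "", banned, &mut out);
    out
}

/// Hand-built example: an integer plus a nested struct holding two floats.
fn sample_schema() -> Vec<Field> {
    vec![
        Field { name: "int", type_name: "i32", children: Vec::new() },
        Field {
            name: "point",
            type_name: "Point",
            children: vec![
                Field { name: "x", type_name: "f64", children: Vec::new() },
                Field { name: "y", type_name: "f64", children: Vec::new() },
            ],
        },
    ]
}
```

A serializer could run such a check once at startup and refuse to write anything if the returned list is non-empty.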

Agreed.

Yeah, it’s certainly doable (and easier to implement) but the UX is much worse since we know at compile time that something will fail at runtime.

Yeah, I agree that compile-time processing should be preferred. :smile:

The reflection crate includes a test case that generates output similar to what the issue author asked for.

I have worked on a library that extracts Serde formats from standard implementations of the Serialize/Deserialize traits: https://crates.io/crates/serde-reflection

It has been used in a large codebase to drive code generation of Rust-compatible binary serializers in other languages (e.g. Java, C++, Python, C#): https://crates.io/crates/serde-generate