Implementing a custom derive macro to work in conjunction with Serde Serialize

In the rust_xlsxwriter crate I've implemented Serde serialization to Excel worksheets. This works by first setting up a struct's fields as headers within the worksheet and then writing each serialized struct instance below the headers. The scheme is explained here: How serialization works in rust_xlsxwriter.

The headers are obtained by Serde serialization or deserialization so that the headers reflect any Serde "skip" or "rename" container or field attributes.

#[derive(Serialize)]
#[serde(rename_all = "PascalCase")]
struct MyStruct {
    foo: bool,

    #[serde(skip_serializing)]
    bar: u8,

    #[serde(rename = "New name")]
    baz: f64,
}

I'd like to be able to create a derive macro (a trait plus helper attributes) to add some functionality specific to rust_xlsxwriter. Something like this:

#[derive(Serialize, MyTrait)] // New trait
#[serde(rename_all = "PascalCase")]
struct MyStruct {
    foo: bool,

    #[serde(skip_serializing)]
    bar: u8,

    #[serde(rename = "New name")]
    #[my_crate(num_format = "#.00")] // New
    baz: f64,
}

I can't use #[serde(serialize_with = "")] since I need to pass and store the data and format separately rather than reformat the data. Also serialize_with doesn't support arguments.

I have implemented a proc macro to pass and store the user-specified format, but there is an issue: my derived trait impl will see the last field in the struct as "baz" whereas a Serde serializer will see it as "New name". This breaks the linkage between the headers for the serialization and the serialized data itself.

I can extend the proc macro to read the Serde attributes and mimic the renames and skips but it is likely to be less complete and more fragile than the Serde proc macro that enables Serialize for the struct.
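To make the trade-off concrete, here is a minimal sketch of what mimicking the Serde renames and skips amounts to once the attributes have been parsed. `FieldMeta`, `to_pascal_case`, `header_names` and `my_struct_headers` are hypothetical names for illustration, not part of serde or rust_xlsxwriter, and only `rename`, `skip_serializing` and `rename_all = "PascalCase"` are handled, which is the kind of incompleteness described above.

```rust
/// Hypothetical summary of one struct field after attribute parsing.
struct FieldMeta {
    ident: &'static str,
    rename: Option<&'static str>, // from #[serde(rename = "...")]
    skip: bool,                   // from #[serde(skip_serializing)]
}

/// Convert a snake_case identifier to PascalCase: drop each underscore
/// and uppercase the character that follows it (and the first character).
fn to_pascal_case(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    let mut capitalize = true;
    for ch in ident.chars() {
        if ch == '_' {
            capitalize = true;
        } else if capitalize {
            out.extend(ch.to_uppercase());
            capitalize = false;
        } else {
            out.push(ch);
        }
    }
    out
}

/// Header names in field order: skipped fields are dropped, an explicit
/// rename wins, otherwise the container-level case rule (if any) applies.
fn header_names(fields: &[FieldMeta], pascal_case: bool) -> Vec<String> {
    fields
        .iter()
        .filter(|f| !f.skip)
        .map(|f| match f.rename {
            Some(name) => name.to_string(),
            None if pascal_case => to_pascal_case(f.ident),
            None => f.ident.to_string(),
        })
        .collect()
}

/// The fields of `MyStruct` from the example above, written out by hand.
fn my_struct_headers() -> Vec<String> {
    let fields = [
        FieldMeta { ident: "foo", rename: None, skip: false },
        FieldMeta { ident: "bar", rename: None, skip: true },
        FieldMeta { ident: "baz", rename: Some("New name"), skip: false },
    ];
    header_names(&fields, true)
}
```

For `MyStruct` this yields the headers "Foo" and "New name". Every further Serde attribute (other `rename_all` rules, `skip_serializing_if`, `flatten`, and so on) would need its own branch here.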

Any suggestions on how I might intermingle the two proc macros to make something more robust, i.e., to reuse the Serde macro(s) as much as possible? I thought of perhaps running the proc macro not on the struct but on the Serde output for impl Serialize for the struct, but I don't know if that is reasonable or practicable.

Does this help?

#[derive(Serialize)]
#[serde(rename_all = "PascalCase")]
struct MyStruct {
    #[serde(rename = "New name")]
    #[serde(serialize_with = "state")]
    baz: f64,
}

#[my_attr(num_format = "#.00")] // used by callers
struct state;
// expand 👇 // used by serialize_with
fn state<S: ::serde::Serializer>(_: &f64, serializer: S) -> Result<S::Ok, S::Error> {
    #[derive(Default)]
    struct Format {num_format: &'static str, others: ()}
    let _state = Format { num_format: "#.00", ..Default::default() };
    todo!()
}

Thanks for the suggestion. That won't work for the required use case because there would be a number of different #[my_crate(something = something_else)] attributes. Also, I think I would still have the issue of passing the data and metadata around in the serializer.

For context, the original feature request is here, along with my analysis.

Reusing macros from serde means the states should be accessed in serialize_with. That's how it works today...

#[rust_xlsxwriter(
   set_min_width=8, set_max_width=60,
   derive(Deserialize, Serialize) // write derive here to expand this later
)]
struct Produce {
    #[serde(rename = "Value A")]
    #[rust_xlsxwriter(set_num_format="#,##0.00")]
    cost_a: f64,
}
// expansion by rust_xlsxwriter macro
#[derive(Deserialize, Serialize)]
struct Produce {
    #[serde(rename = "Value A")]
    #[serde(serialize_with = "Self::__serialize_cost_a")] // detect derive(Serialize) to add this
    cost_a: f64,
}

impl Produce {
    // detected by derive(Serialize)
    #[doc(hidden)]
    fn __serialize_cost_a<S: Serializer>(val: &f64, serializer: S) -> Result<S::Ok, S::Error> {
        // Downside: have to access the global state per call
        let formatter = &<Self as ExcelSerialize>::formatter()[1]; 

        todo!()
    }
}

// detected by derive(Serialize)
impl ExcelSerialize for Produce {
    const LEN: usize = 2;

    // This will be used in serialization impl via `<Self as ExcelSerialize>::formatter()`.
    fn formatter() -> &'static [Format /* or SerializeFieldOptions? */] {
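        // global states via OnceLock (alternatives: std::thread_local)
        // Note: `Self` can't be used in an item nested inside a method, so name the type.
        static CELL: OnceLock<[Format; <Produce as ExcelSerialize>::LEN]> = OnceLock::new();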
        CELL.get_or_init(|| { [
            // formatter for header
            Format::new().set_min_width(8).set_max_width(60),
            // formatter for field cost_a
            Format::new().set_num_format("#,##0.00"),
        ]})
    }
}

Thank you. That is a genuinely clever idea to convert the #[rust_xlsxwriter()] attributes into #[serde(serialize_with)] attributes.

However (as far as I can see), serialize_with is still not the right solution for the reasons I listed in the GitHub Comment above: I need to serialize the data unmodified but store a format "with" it in Excel.

I'll try to explain myself a bit more.

The current serialization scheme in rust_xlsxwriter is as follows:

  • The user sets up some "headers" that correspond to the fields in the struct that they wish to serialize, at some location in a worksheet.
  • The headers can have metadata like a header format (colors, fonts, alignment) and also information about the formatting of the values that will be written to the cells below the header.
  • As a side note, Excel stores data unmodified by a format and stores the format separately.
  • The headers can be set up via serialization (i.e., the crate uses the Serde derived serializer to figure out the Struct name and the names of the fields), Example 1.
  • Or deserialization (i.e., the crate uses the Serde derived deserializer to figure out the Struct name and the names of the fields), Example 2.
  • Once the headers are set up they serve as lookup values so that when data is serialized the serializer can say this is struct "Student", field "Age" and value "25" and the serializer can hand off that information and have it written to the right place.
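The lookup step in the last bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: `Column`, `HeaderMap`, `add_header` and `next_cell` are hypothetical names and do not reflect the actual rust_xlsxwriter internals. The key point is that the map is keyed by the struct and field names as the serializer reports them, and that the format is stored alongside the location rather than applied to the value.

```rust
use std::collections::HashMap;

/// Hypothetical record of where a field's values go and how they are formatted.
struct Column {
    col: u16,
    next_row: u32,
    num_format: Option<String>,
}

/// Hypothetical lookup keyed by (struct name, field name as the serializer sees it).
#[derive(Default)]
struct HeaderMap {
    columns: HashMap<(String, String), Column>,
}

impl HeaderMap {
    /// Register a header at (row, col); values are written in the rows below it.
    fn add_header(
        &mut self,
        struct_name: &str,
        field_name: &str,
        row: u32,
        col: u16,
        num_format: Option<&str>,
    ) {
        self.columns.insert(
            (struct_name.to_string(), field_name.to_string()),
            Column {
                col,
                next_row: row + 1,
                num_format: num_format.map(|s| s.to_string()),
            },
        );
    }

    /// Return the (row, col, format) for the next value of a field and advance
    /// the row, or `None` if no header was registered under that name.
    fn next_cell(
        &mut self,
        struct_name: &str,
        field_name: &str,
    ) -> Option<(u32, u16, Option<String>)> {
        let key = (struct_name.to_string(), field_name.to_string());
        let column = self.columns.get_mut(&key)?;
        let cell = (column.next_row, column.col, column.num_format.clone());
        column.next_row += 1;
        Some(cell)
    }
}
```

With a header registered for ("Student", "Age"), successive calls to `next_cell("Student", "Age")` return successive rows in the same column, while a name the serializer never reports (for example the un-renamed Rust identifier) finds nothing. That is why the header names must match what Serde emits.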

The scheme looks like this:

That is all working now but if the user wants to pass metadata like a number format, or cell format (colors, borders, etc.) then they have to set up that linkage somewhat manually:

Snippet from Example 3:

    let custom_headers = 
        [CustomSerializeField::new("Price").set_value_format(&currency_format)];

    let header_options = SerializeFieldOptions::new()
        .set_header_format(&header_format)
        .set_custom_headers(&custom_headers);

    // Set the serialization location and custom headers.
    worksheet.deserialize_headers_with_options::<Produce>(1, 1, &header_options)?;

However, some of the initial users would like to specify this metadata using additional field attributes like this:

  // Add new trait ExcelSerialize for rust_xlsxwriter attributes.
  #[derive(Deserialize, Serialize, ExcelSerialize)] 
  struct Produce {
      #[serde(skip)]
      fruit: &'static str,
      #[serde(rename = "Value")]
      #[rust_xlsxwriter(set_num_format="#,##0.00")]
      cost: f64,
      #[serde(rename = "Date DMY")]
      #[rust_xlsxwriter(set_num_format="dd/mm/yyyy")]
      dmy: Option<NaiveDate>,
      #[serde(rename = "Date MDY",)]
      #[rust_xlsxwriter(set_num_format="mm/dd/yyyy")]
      mdy: NaiveDate,
      #[serde(rename = "Long Description")]
      #[rust_xlsxwriter(column_width=20)]
      long_description: String
  }

Which takes me back to the original problem. I can create a proc macro that builds an impl and a function to get the metadata in SerializeFieldOptions form (initial work here), but using the above example it would pick up the field "fruit", which would be skipped by Serde, and the field "long_description" instead of "Long Description".

Based on your linked WIP issue, will this be sufficient?

#[rust_xlsxwriter(
   set_min_width=8, set_max_width=60,
   derive(Deserialize, Serialize) // write derive here to expand this later
)]
struct Produce {
    #[serde(skip)] // also parsed by rust_xlsxwriter macro
    fruit: &'static str,

    #[serde(rename = "Long Description")] // also parsed by rust_xlsxwriter macro
    #[rust_xlsxwriter(column_width=20)]
    long_description: String,
}

Expansion by the rust_xlsxwriter macro 👇

#[derive(Deserialize, Serialize)]
struct Produce {
    #[serde(skip)] 
    fruit: &'static str,

    #[serde(rename = "Long Description")]
    long_description: String,
}

impl ExcelSerialize for Produce {
    fn to_rust_xlsxwriter() -> rust_xlsxwriter::SerializeFieldOptions {
        let custom_headers = [
            // skip fruit
            rust_xlsxwriter::CustomSerializeField::new("long_description")
                .rename("Long Description") // #[serde(rename = "Long Description")]
                .set_column_width(20), // #[rust_xlsxwriter(column_width=20)]
        ];
        rust_xlsxwriter::SerializeFieldOptions::new()
            .set_struct_name("Produce")
            .set_custom_headers(&custom_headers)
    }
}

FYI, if this is sufficient then it doesn't need to be an attribute macro because it doesn't modify the struct definition; it only generates additional code. Hence, it can just be a derive macro with some helper attributes.

Thanks. I didn't realize it 😓 So the interface for users is:

#[derive(Deserialize, Serialize, ExcelSerialize)]
#[rust_xlsxwriter(set_min_width=8, set_max_width=60)]
struct Produce {
    #[serde(skip)] // also parsed by ExcelSerialize macro
    fruit: &'static str,

    #[serde(rename = "Long Description")] // also parsed by ExcelSerialize macro
    #[rust_xlsxwriter(column_width=20)]
    long_description: String,
}

That is what I'm currently doing. The WIP code is here (note that it is very WIP and there isn't currently any error handling and there is still a lot of functionality missing).

If anyone wants to try out the code and functionality you can do it like this:

git clone https://github.com/jmcnamara/rust_xlsxwriter.git
cd rust_xlsxwriter/
git checkout derive

cargo run --example app_serialize_proc_macro
open serialize.xlsx  # Or your system equivalent of open

cargo expand --example app_serialize_proc_macro

The issue I will have is that I will also need to parse the Serde attributes to account for the various "skip" and "rename" variations. Hence this thread/question. I can (and probably will) do that, but it will introduce duplication and fragility.

#[rust_xlsxwriter(
   set_min_width=8, set_max_width=60,
   derive(Deserialize, Serialize) // write derive here to expand this later
)]

I will check but I don't think that end users will like this as an interface. It should preferably be:

#[derive(Deserialize, Serialize, ExcelSerialize)]

Also, this:

    #[serde(rename = "Long Description")]
    long_description: String,
}

Would need to be parsed/expanded to this:

impl ExcelSerialize for Produce {
    fn to_rust_xlsxwriter() -> rust_xlsxwriter::SerializeFieldOptions {
        let custom_headers = [
            rust_xlsxwriter::CustomSerializeField::new("Long Description"),
            // ...
        ];

        // ...

I.e., the custom field needs to be set up with the name the Serde serializer sees since it is used as a lookup as well as a header.

I've updated the WIP code to make a working example that is a bit more realistic.

Here is an example of serializing a struct with serde and rust_xlsxwriter traits and attributes:

use rust_xlsxwriter::{ExcelSerialize, Workbook, XlsxError};
use rust_xlsxwriter_derive::ExcelSerialize;
use serde::Serialize;

fn main() -> Result<(), XlsxError> {
    let mut workbook = Workbook::new();

    // Add a worksheet to the workbook.
    let worksheet = workbook.add_worksheet();

    #[derive(ExcelSerialize, Serialize)]
    #[allow(dead_code)]
    struct Produce {
        #[rust_xlsxwriter(rename = "Item")]
        fruit: &'static str,

        #[rust_xlsxwriter(rename = "Price")]
        #[rust_xlsxwriter(num_format = "$0.00")]
        cost: f64,

        #[serde(skip)]
        in_stock: bool,
    }

    // Create some data instances.
    let items = [
        Produce {
            fruit: "Peach",
            cost: 1.05,
            in_stock: true,
        },
        Produce {
            fruit: "Plum",
            cost: 0.15,
            in_stock: false,
        },
        Produce {
            fruit: "Pear",
            cost: 0.75,
            in_stock: true,
        },
    ];

    worksheet.set_serialize_headers::<Produce>(0, 0)?;
    worksheet.serialize(&items)?;

    // Save the file.
    workbook.save("serialize.xlsx")?;

    Ok(())
}

Note the mix of rust_xlsxwriter and serde attributes. This runs against the code on the rust_xlsxwriter derive branch.

This produces the following output:

screenshot

I resolved this by extending the proc macro to handle the serde attributes I needed as well as the crate-specific attributes.

You can see the results here.

Thanks to @vague for the help.
