Please help with generic types, Serde and formatting

#1

Let’s consider the following custom type:

struct SliceFmt<'a, T>
where
    T: Serialize,
{
    pub slice: &'a [T],
    pub fmt: &'a str,
}

I need to implement the Serialize trait for SliceFmt with the following behavior:

  • When fmt = "s", slice should be serialized with no special formatting. This of course works out of the box for all T: Serialize.
  • When fmt = "fN", slice should be serialized in fixed-point format with N digits after the decimal point. This requires T to be either a float or an integer (otherwise serialize should return an error).
  • When fmt = "eN" or fmt = "EN", same as fN but in lowercase or uppercase exponential notation, respectively.

Is this feasible at all or am I taking the wrong path? Should I start from a Format trait instead, and implement it for all primitives that I want to allow to be referenced in SliceFmt?
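
A minimal, std-only sketch of the Format-trait route mentioned in the question, leaving Serde aside. The names AsF64, NumFmt, parse_fmt and format_num are hypothetical and exist only for this illustration. With a trait bound like this, an unsupported element type is rejected at compile time rather than by a runtime error, and the "s" case would still go through the ordinary Serialize path.

```rust
/// Hypothetical trait: numeric types that can be widened to `f64` for printing.
trait AsF64 {
    fn as_f64(&self) -> f64;
}

// Implement the trait for the primitive types we want to allow.
macro_rules! impl_as_f64 {
    ($($ty:ty),*) => {
        $(impl AsF64 for $ty {
            fn as_f64(&self) -> f64 { *self as f64 }
        })*
    };
}
impl_as_f64!(i8, i16, i32, i64, u8, u16, u32, u64, f32, f64);

/// Parsed form of the "fN" / "eN" / "EN" strings from the question.
#[derive(Debug, PartialEq, Clone, Copy)]
enum NumFmt {
    Fixed(usize),
    LowerExp(usize),
    UpperExp(usize),
}

/// Parse a spec such as "f2" or "E3"; returns None for anything else.
fn parse_fmt(spec: &str) -> Option<NumFmt> {
    let mut chars = spec.chars();
    let kind = chars.next()?;
    let n: usize = chars.as_str().parse().ok()?;
    match kind {
        'f' => Some(NumFmt::Fixed(n)),
        'e' => Some(NumFmt::LowerExp(n)),
        'E' => Some(NumFmt::UpperExp(n)),
        _ => None,
    }
}

/// Format one value according to the parsed spec.
fn format_num<T: AsF64>(value: &T, fmt: NumFmt) -> String {
    let x = value.as_f64();
    match fmt {
        NumFmt::Fixed(n) => format!("{:.*}", n, x),
        NumFmt::LowerExp(n) => format!("{:.*e}", n, x),
        NumFmt::UpperExp(n) => format!("{:.*E}", n, x),
    }
}
```

For instance, format_num(&3i32, parse_fmt("f2").unwrap()) yields "3.00".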

#2

Feels a bit gross, but you could maybe do this with Any and downcast_ref or the like.

Keep in mind that Serialize doesn’t get to decide what floats/integers look like; the format does, and you have no way to control that (insofar as I’m aware). You’d probably have to serialize as a string.
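
For what it's worth, a rough std-only sketch of the Any/downcast_ref idea might look like the following. The helper names to_f64 and format_fixed are made up for this example, and the list of accepted types is an assumption; anything outside the list simply yields None.

```rust
use std::any::Any;

/// Try to view `value` as one of a fixed list of numeric types and widen it to `f64`.
/// Returns None when the concrete type is not in the list.
fn to_f64(value: &dyn Any) -> Option<f64> {
    macro_rules! try_types {
        ($($ty:ty),*) => {
            $(
                if let Some(v) = value.downcast_ref::<$ty>() {
                    return Some(*v as f64);
                }
            )*
        };
    }
    try_types!(i8, i16, i32, i64, u8, u16, u32, u64, f32, f64);
    None
}

/// Format any `'static` value with a fixed number of decimals, if it is numeric.
fn format_fixed<T: Any>(value: &T, decimals: usize) -> Option<String> {
    to_f64(value).map(|x| format!("{:.*}", decimals, x))
}
```

The type check happens at run time for every element, which is part of why this feels gross; the result is still a string, as noted above.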

#3

This is an interesting problem.

Perhaps another option might be to take a closure and let it dictate the formatted output (as @DanielKeep mentioned, you probably need to serialize as a String):

struct SliceFmt<'a, T, F: Fn(&T) -> String> {
    pub slice: &'a [T],
    f: F,
}
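
To make the closure idea a little more concrete, here is a small std-only sketch. The new and rendered methods are hypothetical additions for illustration; a Serialize impl could emit each rendered string as one sequence element.

```rust
struct SliceFmt<'a, T, F: Fn(&T) -> String> {
    pub slice: &'a [T],
    f: F,
}

impl<'a, T, F: Fn(&T) -> String> SliceFmt<'a, T, F> {
    /// Pair a slice with the closure that decides how each element is printed.
    fn new(slice: &'a [T], f: F) -> Self {
        Self { slice, f }
    }

    /// Apply the closure to every element, in order.
    fn rendered(&self) -> Vec<String> {
        self.slice.iter().map(|x| (self.f)(x)).collect()
    }
}
```

A caller would write something like SliceFmt::new(&data, |x: &f64| format!("{:.2}", x)), so the element type is checked by the compiler and no runtime type inspection is needed.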

#4

I think you should start with an impl of Display and/or Into::<String> and see where that gets you in terms of ergonomics, before looking at Serialize and the more detailed mechanics of serde. As noted, if the format you’re serialising to will produce a string already, you might be ~done. Otherwise, I think you’re implementing/customising a specific serde output format, rather than Serialize generally.
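
As a starting point, a Display-first version could be as small as the sketch below. The Fixed wrapper is a hypothetical type invented for this example (a value plus the number of decimals to print). If it proves ergonomic, Serde's Serializer::collect_str accepts any Display value, so hooking it into Serialize later should be straightforward.

```rust
use std::fmt;

/// Hypothetical wrapper: a number and the number of decimals to print.
struct Fixed(f64, usize);

impl fmt::Display for Fixed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `.*` takes the precision from the preceding argument.
        write!(f, "{:.*}", self.1, self.0)
    }
}

/// Into::<String> comes for free once From is implemented.
impl From<Fixed> for String {
    fn from(v: Fixed) -> String {
        v.to_string()
    }
}
```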

#5

Ok, this is what I came up with:

use serde::ser::{self, Serialize, SerializeSeq, Serializer};
use std::{
    any::{Any, TypeId},
    io,
};

#[derive(Debug, PartialEq, Clone, Copy)]
enum FormatVar {
    Default,
    Float(usize),
    LowerExp(usize),
    UpperExp(usize),
}

impl Default for FormatVar {
    fn default() -> Self {
        FormatVar::Default
    }
}

#[derive(Debug, PartialEq, Clone, Copy, Default)]
struct DataVar<'a, T>
where
    T: Any + Serialize,
{
    name: &'a str,
    data: &'a [T],
    format: FormatVar,
}

impl<'a, T> DataVar<'a, T>
where
    T: Any + Serialize,
{
    pub fn new(name: &'a str, data: &'a [T], format: FormatVar) -> Self {
        Self { name, data, format }
    }
}

impl<'a, T> Serialize for DataVar<'a, T>
where
    T: Any + Serialize,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        macro_rules! serialize_fmt {
            ($fmt:expr, $n:expr, $($ty:ident),*) => {
                {
                    let mut seq = serializer.serialize_seq(Some(self.data.len()))?;
                    $(
                        if TypeId::of::<T>() == TypeId::of::<$ty>() {
                            for e in self.data {
                                seq.serialize_element(&format_args!(
                                    $fmt,
                                    $n,
                                    *(e as &dyn Any).downcast_ref::<$ty>().unwrap() as f64
                                ))?;
                            }
                        } else
                    )*
                    {
                        return Err(ser::Error::custom(format!("format \"{:?}\" not supported for variable \"{}\" data type", self.format, self.name)))
                    }
                    seq.end()
                }
            };
        }
        match self.format {
            FormatVar::Default => self.data.serialize(serializer),
            FormatVar::Float(n) => serialize_fmt!(
                "{:.*}", n, i8, i16, i32, i64, i128, u8, u16, u32, u64, u128, f32, f64
            ),
            FormatVar::LowerExp(n) => serialize_fmt!(
                "{:.*e}", n, i8, i16, i32, i64, i128, u8, u16, u32, u64, u128, f32, f64
            ),
            FormatVar::UpperExp(n) => serialize_fmt!(
                "{:.*E}", n, i8, i16, i32, i64, i128, u8, u16, u32, u64, u128, f32, f64
            ),
        }
    }
}

It’s quite cumbersome but it works. Do you think it could be improved? A simpler option (instead of using Any and downcast_ref) would be to serialize each sequence element first and then call parse::<f64> on the result, but I ruled it out because I guess it would be slower. What do you think?

One last issue I need to fix is that formatted numbers are serialized as strings. This is a problem with formats such as JSON, which retain type information (but not with CSV).

Since serializing formatted numbers could be of general use, I suggested adding this capability in Serde’s repository.

#6

Take a look at the example of writing a custom format serializer in the serde docs, specifically at the serialize_newtype_variant method. I’m not sure exactly how, but you want to override just this one. I think your approach above is too broad, and generates strings on the “wrong side” of the serde data model. Fixing that probably means writing a Serializer for each output format you need.

#7

I’m afraid I can’t figure out exactly what you mean, can you please give me some more hints?

#8

Sorry, I don’t have any more specific direct solutions, but I can elaborate on what I was referring to.

From the Serde data model documentation (trimmed/formatted for emphasis):

  • The Serialize implementation for the data structure is responsible for mapping the data structure into the Serde data model
  • The Serializer implementation for the data format is responsible for mapping the Serde data model into the intended output representation.

I think you’ve implemented Serialize for your data structure, and mapped your FormatVar enum to a formatted string in the serde data model. JSON is then (correctly) emitting that with quotes; CSV doesn’t (but there are options to always quote strings that would, if enabled).

Your enum needs to cross the serde data model boundary (between Serialize and Serializer) retaining the type information that indicates which format is desired.

I think you probably need to implement a Serializer for your intended output format (including formatting). Probably several, if you have several output formats.

Once it gets there, you need the serialize_newtype_variant method to do the formatting you want for your enum, i.e., emit an appropriately formatted number, rather than the externally-tagged enum format shown in the example, duplicated here:

fn serialize_newtype_variant<T>(
    self,
    _name: &'static str,
    _variant_index: u32,
    variant: &'static str,
    value: &T,
) -> Result<()>
where
    T: ?Sized + Serialize,
{
    self.output += "{";
    variant.serialize(&mut *self)?;
    self.output += ":";
    value.serialize(&mut *self)?;
    self.output += "}";
    Ok(())
}

Unfortunately, that’s about where my suggestion ends; I don’t know how to achieve this, at least not without duplicating and customising the serde_json code and any other formats you need.
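
To illustrate the boundary argument without touching Serde itself, here is a toy, std-only model. Value, Format, JsonLike and CsvLike are hypothetical stand-ins invented for this sketch, not Serde's actual types or traits, and string escaping is deliberately ignored. It only shows why a pre-formatted string gets quoted by a JSON-like format, while a number that still carries its precision can be emitted unquoted by the format.

```rust
/// Toy stand-in for a value in a data model; NOT Serde's real data model.
enum Value {
    /// Already formatted on the data-structure side: the format only sees text.
    Str(String),
    /// A number that still carries the requested precision across the boundary.
    Num { value: f64, decimals: usize },
}

/// Toy stand-in for an output format.
trait Format {
    fn emit(&self, v: &Value) -> String;
}

struct JsonLike;
struct CsvLike;

impl Format for JsonLike {
    fn emit(&self, v: &Value) -> String {
        match v {
            // Strings are quoted (escaping omitted in this sketch).
            Value::Str(s) => format!("\"{}\"", s),
            // The format itself decides how the number is printed.
            Value::Num { value, decimals } => format!("{:.*}", *decimals, value),
        }
    }
}

impl Format for CsvLike {
    fn emit(&self, v: &Value) -> String {
        match v {
            // Unquoted by default, so the difference is invisible here.
            Value::Str(s) => s.clone(),
            Value::Num { value, decimals } => format!("{:.*}", *decimals, value),
        }
    }
}
```

In this toy setup, Value::Str(format!("{:.2}", 1.5)) comes out of JsonLike with quotes, whereas Value::Num { value: 1.5, decimals: 2 } comes out as a bare number; both look identical in CsvLike.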