Serde deserializing a generic enum

Hi,

I have designed a convenient generic enum that I want to deserialize from a config file. I am still learning Rust and Serde, so this is both a help request and a code review request.

My program is a process scheduler that watches processes based on some given criteria (name, running/not_running, resource usage ...) and executes some actions when the criteria match.

A sysinfo::Process can be matched by name, cmdline, or exe_path.

I am using serde to deserialize the matching = ... entry of each profile straight into the appropriate matcher.

Here is an example toml config file:

[[profiles]]
# this matches PatternIn<String>
matching = { cmdline = "foo_seen" }

[[profiles.commands]]
condition = {seen = "5s"}
exec = ["echo", "seen"]
exec_end = ["echo", "end"]

[[profiles]]
# I want to implement the following 
matching = { cmdline = "foo_seen", regex = true}

I designed the following types to be able to match on String, Regex or other future types.


pub trait MatchBy<Criteria>
where
    Criteria: Display,
{
    fn match_by(&self, matcher: Criteria) -> bool;
}

#[derive(Deserialize, Clone, Debug)]
#[serde(rename_all = "snake_case")]
pub enum PatternIn<P> {
    ExePath(P),
    Cmdline(P),
    Name(P),
}

// This is the main matcher enum that will hold all types of matchers. It is used in the Config type with Deserialize derive trait.
#[derive(Deserialize, Debug, Clone)]
#[serde(untagged)]
pub enum ProcessMatcher {
    StringPattern(PatternIn<String>),
    //RegexPattern(PatternIn<Regex>)  // needing help with this one
    //CPUUsage
    //RamUsage
    //Network
}

// the struct where matching is serde parsed into
#[derive(Debug, Deserialize, Clone)]
pub struct Profile {

    /// pattern of process name to match against
    // pub matching: ProfileMatching,
    pub matching: ProcessMatcher,

...
}


trait MatchProcByPattern<P> {
    fn matches_exe(&self, pattern: P) -> bool;
    fn matches_cmdline(&self, pattern: P) -> bool;
    fn matches_name(&self, pattern: P) -> bool;
}

impl<P> MatchBy<PatternIn<P>> for sysinfo::Process
where
    sysinfo::Process: MatchProcByPattern<P>,
    P: Display
{
    fn match_by(&self, matcher: PatternIn<P>) -> bool {
        match matcher {
            PatternIn::ExePath(pat) => self.matches_exe(pat),
            PatternIn::Cmdline(pat) => self.matches_cmdline(pat),
            PatternIn::Name(pat) => self.matches_name(pat),
        }
    }
}


impl MatchBy<ProcessMatcher> for sysinfo::Process {
    fn match_by(&self, matcher: ProcessMatcher) -> bool {
        match matcher {
            ProcessMatcher::StringPattern(pat) => self.match_by(pat),
        }
    }
}

I implemented the necessary code to match PatternIn<String>.


impl MatchProcByPattern<String> for sysinfo::Process {
    fn matches_exe(&self, pattern: String) -> bool {
        let finder = memmem::Finder::new(&pattern);
        self.exe()
            .and_then(|exe_name| finder.find(exe_name.as_os_str().as_bytes()))
            .is_some()
    }

    fn matches_cmdline(&self, pattern: String) -> bool {
        let finder = memmem::Finder::new(&pattern);
        finder.find(self.cmd().join(" ").as_bytes()).is_some()
    }

    fn matches_name(&self, pattern: String) -> bool {
        self.name().contains(&pattern)
    }
}

impl MatchProcByPattern<Regex> for sysinfo::Process {
    ... 
}
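
To show how these pieces fit together in isolation, here is a minimal sketch that swaps sysinfo::Process for a hypothetical FakeProcess struct. The struct, its fields, and the sample_process helper are made up for illustration, and the Display bound on MatchBy is dropped only to keep the sketch short:

```rust
// Hypothetical stand-in for sysinfo::Process so the sketch has no dependencies.
pub struct FakeProcess {
    pub name: String,
    pub cmdline: Vec<String>,
    pub exe_path: String,
}

#[derive(Clone, Debug)]
pub enum PatternIn<P> {
    ExePath(P),
    Cmdline(P),
    Name(P),
}

// Same shape as MatchBy above, minus the Display bound.
pub trait MatchBy<Criteria> {
    fn match_by(&self, matcher: Criteria) -> bool;
}

pub trait MatchProcByPattern<P> {
    fn matches_exe(&self, pattern: P) -> bool;
    fn matches_cmdline(&self, pattern: P) -> bool;
    fn matches_name(&self, pattern: P) -> bool;
}

// Plain substring matching for String patterns.
impl MatchProcByPattern<String> for FakeProcess {
    fn matches_exe(&self, pattern: String) -> bool {
        self.exe_path.contains(&pattern)
    }
    fn matches_cmdline(&self, pattern: String) -> bool {
        self.cmdline.join(" ").contains(&pattern)
    }
    fn matches_name(&self, pattern: String) -> bool {
        self.name.contains(&pattern)
    }
}

// Blanket dispatch: every pattern type P that the process knows how to
// match gets MatchBy<PatternIn<P>> without further code.
impl<P> MatchBy<PatternIn<P>> for FakeProcess
where
    FakeProcess: MatchProcByPattern<P>,
{
    fn match_by(&self, matcher: PatternIn<P>) -> bool {
        match matcher {
            PatternIn::ExePath(pat) => self.matches_exe(pat),
            PatternIn::Cmdline(pat) => self.matches_cmdline(pat),
            PatternIn::Name(pat) => self.matches_name(pat),
        }
    }
}

// Hypothetical helper that builds a sample process for experimenting.
pub fn sample_process() -> FakeProcess {
    FakeProcess {
        name: "foo".to_string(),
        cmdline: vec!["foo".to_string(), "--tag".to_string(), "foo_seen".to_string()],
        exe_path: "/usr/bin/foo".to_string(),
    }
}
```

With this in place, sample_process().match_by(PatternIn::Cmdline("foo_seen".to_string())) goes through the blanket impl and ends up in matches_cmdline. Supporting Regex would only need a second MatchProcByPattern impl.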

For the regex part, however, the config syntax is decoupled from the enum variant. I want the user to set the regex = true flag alongside the PatternIn variant, which after deserialization should result in a PatternIn<Regex>. My solution so far is the following:


#[derive(Deserialize, Debug, Clone)]
#[serde(untagged)]
pub enum ProcessMatcher {
    StringPattern(PatternIn<String>),
    #[serde(deserialize_with = "deserialize_regex_pattern")]
    RegexPattern(PatternIn<Regex>)  // needing help with this one
}

fn deserialize_regex_pattern<'de, D>(deserializer: D) -> Result<PatternIn<Regex>, D::Error>
where
    D: Deserializer<'de>,
{
    #[derive(Deserialize, Debug)]
    struct Pattern {
        #[serde(flatten)]
        pattern: PatternIn<String>,
        regex: bool
    }

    let pd: Pattern = Pattern::deserialize(deserializer)?;
    if !pd.regex {
        Err(de::Error::custom("not a regex pattern"))
    } else {
        match pd.pattern {
            PatternIn::Cmdline(pat) => {
                match pat.parse::<Regex>() {
                    Ok(regex) => Ok(PatternIn::Cmdline(regex)),
                    Err(err) => Err(de::Error::custom(err))
                }
            },
            PatternIn::Name(pat) => {
                match pat.parse::<Regex>() {
                    Ok(regex) => Ok(PatternIn::Name(regex)),
                    Err(err) => Err(de::Error::custom(err))
                }
            },
            ...
        }
    }
}

As you can see, the deserialize_regex_pattern function needs a match arm for every variant of PatternIn, resulting in a lot of repeated code.

Questions:

  1. Is there a way to avoid the code duplication?
  2. Is there another way to achieve this without having to define an extra enum to handle the regex boolean?

There might be a cleaner way of doing this in general, but you could consider rewriting your deserialize_regex_pattern function into something like this:


fn deserialize_regex_pattern<'de, D>(deserializer: D) -> Result<PatternIn<Regex>, D::Error>
where
    D: Deserializer<'de>,
{
    // First deserialize into a `toml::Table`.
    let mut table = toml::Table::deserialize(deserializer)?;

    // Then remove the `regex` boolean value from the table if there is any.
    let regex = table
        .remove("regex")
        .and_then(|regex| regex.as_bool())
        .unwrap_or_default();

    if !regex {
        Err(de::Error::custom("not a regex pattern"))
    } else {
        table
            // Since `Regex` does not implement the `Deserialize` trait, we first need to convert the `table` into a `PatternIn<String>`.
            .try_into::<PatternIn<String>>()
            // Map the TOML error into a `regex::Error` so that both steps of the chain share one error type.
            .map_err(|err| {
                regex::Error::Syntax(format!(
                    "could not convert pattern into a `PatternIn<String>`: {}",
                    err
                ))
            })
            // Then, we can map the `PatternIn<String>` into a `PatternIn<Regex>` and return it.
            .and_then(|pattern| match pattern {
                PatternIn::ExePath(pat) => pat.parse::<Regex>().map(PatternIn::ExePath),
                PatternIn::Cmdline(pat) => pat.parse::<Regex>().map(PatternIn::Cmdline),
                PatternIn::Name(pat) => pat.parse::<Regex>().map(PatternIn::Name),
            })
            .map_err(|err| {
                de::Error::custom(format!("could not convert pattern into a regex: {}", err))
            })
    }
}

As you can see, there is no need to deserialize into a Pattern struct just for the sake of 'catching' the regex boolean. When you deserialize into a toml::Table, you can first check whether the regex boolean is actually there. If it is (and its value is true), you can then convert the remaining toml::Table into a PatternIn<String>. This in-between step is necessary because Regex does not implement the Deserialize trait.

Finally, if that succeeded, you can convert the PatternIn<String> into a PatternIn<Regex> through the match statement. Most of the duplicated code can be removed this way, but since PatternIn<String> and PatternIn<Regex> are two distinct types, you will still need to explicitly map PatternIn<String>::Variant to PatternIn<Regex>::Variant for each PatternIn variant.


Brilliant! Thanks. This looks much better indeed.

One question: you used toml::Table because I am already using toml. I suppose that if I were reading JSON or YAML you would use the equivalent generic data container for that format. What if I want to support both TOML and YAML? Do I need to try parsing into the generic value type of each format?

In conclusion, I was hoping there was a way to avoid writing a new match arm for every variant of PatternIn, but it seems unavoidable.

I was wondering if there is some library or technique that could achieve something similar to the following:

match pattern{
    PatternIn::{Kind}(pat) => MappedKind(pat.parse::<Regex>())
}

Yes, you are right: you can use a generic data container like config::Value to do this. The config crate supports the main configuration file formats, so you could implement it like this:

fn deserialize_regex_pattern<'de, D>(deserializer: D) -> Result<PatternIn<Regex>, D::Error>
where
    D: Deserializer<'de>,
{
    // First deserialize into a `config::Value`. This type 'understands' all main configuration formats, see: https://docs.rs/config/latest/config/
    let mut table = config::Value::deserialize(deserializer)?
        // Using the `into_table` method, the value will be converted into a `HashMap<String, config::Value>`.
        .into_table()
        .map_err(|err| {
            de::Error::custom(format!("could not convert value into a table: {}", err))
        })?;

    // Then remove the `regex` boolean value from the table if there is any.
    let regex = table
        .remove("regex")
        .and_then(|regex| regex.clone().into_bool().ok())
        .unwrap_or_default();

    if !regex {
        Err(de::Error::custom("not a regex pattern"))
    } else {
        // Now that the `regex` flag is removed from the table, we can wrap it in a `config::Value` object again so that we can
        // easily deserialize it into `PatternIn<String>`.
        config::Value::new(None, ValueKind::Table(table))
            // Since `Regex` does not implement the `Deserialize` trait, we first need to convert the `table` into a `PatternIn<String>`.
            .try_deserialize::<PatternIn<String>>()
            // Map the config error into a `regex::Error` so that both steps of the chain share one error type.
            .map_err(|err| {
                regex::Error::Syntax(format!(
                    "could not convert pattern into a `PatternIn<String>`: {}",
                    err
                ))
            })
            // Then, we can map the `PatternIn<String>` into a `PatternIn<Regex>` and return it.
            .and_then(PatternIn::<Regex>::try_from)
            .map_err(|err| {
                de::Error::custom(format!("could not convert pattern into a regex: {}", err))
            })
    }
}

impl TryFrom<PatternIn<String>> for PatternIn<Regex> {
    type Error = regex::Error;

    fn try_from(value: PatternIn<String>) -> Result<Self, Self::Error> {
        match value {
            PatternIn::ExePath(pat) => pat.parse::<Regex>().map(PatternIn::ExePath),
            PatternIn::Cmdline(pat) => pat.parse::<Regex>().map(PatternIn::Cmdline),
            PatternIn::Name(pat) => pat.parse::<Regex>().map(PatternIn::Name),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_process_matcher_works_for_toml() {
        // [dependencies]
        // toml = "*"

        let profile: Profile = toml::from_str(
            r#"
            matching = { cmdline = "foo_seen", regex = true }
        "#,
        )
        .unwrap();

        assert!(matches!(
            profile.matching,
            ProcessMatcher::RegexPattern(PatternIn::<Regex>::Cmdline(_))
        ));
    }

    #[test]
    fn test_process_matcher_works_for_json() {
        // [dependencies]
        // serde_json = "*"

        let profile: Profile = serde_json::from_str(
            r#"{

            "matching": {
                "cmdline": "foo_seen",
                "regex": true
            }
        }"#,
        )
        .unwrap();

        assert!(matches!(
            profile.matching,
            ProcessMatcher::RegexPattern(PatternIn::<Regex>::Cmdline(_))
        ));
    }

    #[test]
    fn test_process_matcher_works_for_yaml() {
        // [dependencies]
        // serde_yaml = "*"

        let profile: Profile = serde_yaml::from_str(
            r#"
            matching:
              cmdline: foo_seen
              regex: true
        "#,
        )
        .unwrap();

        assert!(matches!(
            profile.matching,
            ProcessMatcher::RegexPattern(PatternIn::<Regex>::Cmdline(_))
        ));
    }
}

Now, as you can see, I've also moved the match statement into a TryFrom impl block. If you want to abstract away the match statement as much as possible, then to my knowledge the only way of doing that is through macros.

For example, you could use the following declarative macro:

macro_rules! impl_try_from_enum {
    ([ $( $variant:ident ),* ]) => {
        impl TryFrom<PatternIn<String>> for PatternIn<Regex> {
            type Error = regex::Error;

            fn try_from(value: PatternIn<String>) -> Result<Self, Self::Error> {
                match value {
                    $(
                        PatternIn::<String>::$variant(val) => {
                            val.parse().map(PatternIn::<Regex>::$variant)
                        },
                    )*
                }
            }
        }
    };
}

impl_try_from_enum!([ExePath, Cmdline, Name]);

This generates exactly the same TryFrom impl for your PatternIn<P> type as the hand-written block from earlier. You just need to make sure that you list all the PatternIn variants in the array here: impl_try_from_enum!([ExePath, Cmdline, Name]);. Because the generated match has no wildcard arm, forgetting a variant is a compile error rather than a silent bug. Obviously this also means that all variants need to have the same VariantName(P) structure.

If you would like to take it even further, you could implement a procedural macro; however, I would argue that is not worth it for this use case.
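
A plain generic helper might also get you most of the way without any macro. The sketch below is only an illustration under a few assumptions: try_map and parse_number are hypothetical names, and parsing into a u32 stands in for pat.parse::<Regex>() so that the example has no dependencies.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum PatternIn<P> {
    ExePath(P),
    Cmdline(P),
    Name(P),
}

impl<P> PatternIn<P> {
    /// Apply a fallible conversion to the payload while keeping the variant.
    /// `try_map` is a hypothetical helper name, not part of serde or regex.
    pub fn try_map<Q, E, F>(self, convert: F) -> Result<PatternIn<Q>, E>
    where
        F: FnOnce(P) -> Result<Q, E>,
    {
        // The variants are listed once here, independently of the target type Q.
        Ok(match self {
            PatternIn::ExePath(p) => PatternIn::ExePath(convert(p)?),
            PatternIn::Cmdline(p) => PatternIn::Cmdline(convert(p)?),
            PatternIn::Name(p) => PatternIn::Name(convert(p)?),
        })
    }
}

// Stand-in for `pat.parse::<Regex>()`: parsing into a u32 keeps the sketch dependency-free.
pub fn parse_number(raw: String) -> Result<u32, std::num::ParseIntError> {
    raw.parse::<u32>()
}
```

With the regex crate available, the call would presumably look like pattern.try_map(|pat| pat.parse::<Regex>()). The match still has one arm per variant, but it is written once and reused for every payload type.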

Pardon me if I've missed an important subtlety here, but I haven't been convinced that a custom deserializer is called for in this scenario. This case looks like "parse a value already expressible in the data file, then postprocess it". In my experience serde(from) and serde(try_from) handle this case with a minimum of ceremony.

Here's the code I would write when faced with a scenario like this. First, define the "raw" data structures that directly express what the incoming data should look like.

#[derive(Deserialize, Clone, Debug)]
#[serde(rename_all = "snake_case")]
pub enum PatternInRaw {
    ExePath(String),
    Cmdline(String),
    Name(String),
}

#[derive(Deserialize, Debug)]
pub struct ProcessMatcherRaw {
    #[serde(flatten)]
    pattern: PatternInRaw,
    // Default to `false` so that `regex` can be omitted for plain string patterns.
    #[serde(default)]
    regex: bool,
}
// If the CPUUsage etc. variants require different shapes, change this to an enum.

Then define the desired output data structures. These have serde(from) or serde(try_from) pointing at the raw data structures.

// No Deserialize; won't be converted directly.
#[derive(Clone, Debug)]
pub enum PatternIn<P> {
    ExePath(P),
    Cmdline(P),
    Name(P),
}

#[derive(Deserialize, Debug, Clone)]
#[serde(try_from = "ProcessMatcherRaw")]
pub enum ProcessMatcher {
    StringPattern(PatternIn<String>),
    RegexPattern(PatternIn<Regex>),
}

Finally, implement the appropriate From and TryFrom traits. It's all straightforward Rust code from here.

fn convert_pattern<P, E, F>(raw: PatternInRaw, convert: F) -> Result<PatternIn<P>, E>
    where F: FnOnce(String) -> Result<P, E>
{
    Ok(match raw {
        PatternInRaw::ExePath(s) => PatternIn::ExePath(convert(s)?),
        PatternInRaw::Cmdline(s) => PatternIn::Cmdline(convert(s)?),
        PatternInRaw::Name(s) => PatternIn::Name(convert(s)?),
    })
}

// `de::Error` is a trait, not a concrete type, so `regex::Error` is used as the error type here.
// `serde(try_from)` only requires the conversion error to implement `Display`.
fn parse_regex(raw: String) -> Result<Regex, regex::Error> {
    raw.parse::<Regex>()
}

impl std::convert::TryFrom<ProcessMatcherRaw> for ProcessMatcher {
    type Error = regex::Error;

    fn try_from(raw: ProcessMatcherRaw) -> std::result::Result<Self, Self::Error> {
        if raw.regex {
            let pattern = convert_pattern(raw.pattern, parse_regex)?;
            Ok(ProcessMatcher::RegexPattern(pattern))
        } else {
            // Plain strings cannot fail to convert; the turbofish pins the otherwise unconstrained error type.
            let pattern = convert_pattern(raw.pattern, Ok::<String, regex::Error>)?;
            Ok(ProcessMatcher::StringPattern(pattern))
        }
    }
}

My experience with Serde, which admittedly is not extensive, is that the custom Serializer and Deserializer traits are most applicable when the data itself is nonstandard. If you do have an actual JSON/TOML file coming in, the From/Into features can usually handle it.

The only exception I've found so far is certain string or struct scenarios where a naive From/Into will infinitely recurse.


Thank you very much for the detailed explanation. This, plus @MellowMelon's answer, cleared up a lot of details for me.

Thanks. I very much like how simple and straightforward this method is. I can now understand the purpose of from and try_from in serde in a concrete way.
