[Serde-rs] How to deserialize YAML to HashMap<HashSet<String>, Option<bool>>?

Hello!

I have a YAML:

duplicate_patterns:
  - pattern:
      - "film01_Taylor_2017_(4K)"
      - "film01_Taylor_2017_(1080P)"
    is_duplicate: true
  - pattern:
      - "film02_Taylor_2019_(4K)"
      - "film02_Taylor_2019_(720P)"
    is_duplicate: true

and struct

use serde::{Deserialize, Serialize};
use std::collections::{HashMap, HashSet};

#[derive(Deserialize, Serialize)]
pub struct DuplicatePatterns {
    duplicate_patterns: HashMap<HashSet<String>, Option<bool>>,
}

Goal:

dup_pattern = { 
    key = HashSet {
                "film01_Taylor_2017_(4K)",
                "film01_Taylor_2017_(1080P)",
          }
    value = true/false/None
}

How do I convert the YAML into my Rust struct?

Please help me. Thanks.

P.S. The only method I have so far is to define a second, vector-based struct, deserialize the YAML into that, and then convert it into my hash-based struct. It is not elegant at all.

You can't use a HashSet as the key in a HashMap: HashSet doesn't implement Hash, since it doesn't have a fixed iteration order. You could use a BTreeSet instead, which is ordered and does implement Hash. I'm also not sure what you want the resulting value to be.
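To illustrate the suggestion above, here is a minimal sketch of a HashMap keyed by BTreeSet<String>, using only the standard library. The helper name `key` is made up for this example, and the sample data is taken from the YAML in the question.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: collects names into a BTreeSet.
/// BTreeSet keeps its elements sorted, so the input order does not matter.
fn key(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // HashMap<HashSet<String>, _> would not compile for insert/get,
    // because HashSet does not implement Hash. BTreeSet does.
    let mut patterns: HashMap<BTreeSet<String>, Option<bool>> = HashMap::new();
    patterns.insert(
        key(&["film01_Taylor_2017_(4K)", "film01_Taylor_2017_(1080P)"]),
        Some(true),
    );

    // The same names in a different order produce an equal key.
    let probe = key(&["film01_Taylor_2017_(1080P)", "film01_Taylor_2017_(4K)"]);
    assert_eq!(patterns.get(&probe), Some(&Some(true)));
}
```

Because two BTreeSets with the same elements compare equal and hash identically, the order in which the names arrive has no effect on the lookup.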

Your YAML file looks nothing like the DuplicatePatterns struct you shared.

Yes, I know. The YAML cannot be changed because it is generated by another system. A practical approach is to pre-convert it before I deserialize it.

I just wanted to show the format of the YAML and the goal of my Rust code.

Fortunately, I have implemented Hash for the struct DuplicatePattern. Like frozenset in Python, I would like to make this struct immutable so that it can be hashed, but for now I just sort the elements and then hash each one in order.

use std::hash::{Hash, Hasher};

// I know it is ugly.
impl Hash for DuplicatePattern {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Sort first so the hash does not depend on the set's iteration order.
        let mut sorted_elements = self.pattern.iter().collect::<Vec<&String>>();
        sorted_elements.sort();
        for element in sorted_elements {
            element.hash(state);
        }
    }
}
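A HashMap key also needs Eq, and equal keys must produce equal hashes. The sketch below checks that property for the sorted-hash approach. The struct definition is an assumption (the thread never shows it; a single `pattern: HashSet<String>` field is inferred from the impl), and `hash_of` and `pattern_of` are made-up helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Assumed definition: the thread only shows the Hash impl, not the struct.
#[derive(PartialEq, Eq)]
struct DuplicatePattern {
    pattern: HashSet<String>,
}

impl Hash for DuplicatePattern {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Sorting makes the hash independent of HashSet's iteration order.
        let mut sorted: Vec<&String> = self.pattern.iter().collect();
        sorted.sort();
        for element in sorted {
            element.hash(state);
        }
    }
}

/// Hypothetical helper: hashes any value with the standard DefaultHasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: builds a pattern from a list of names.
fn pattern_of(names: &[&str]) -> DuplicatePattern {
    DuplicatePattern {
        pattern: names.iter().map(|s| s.to_string()).collect(),
    }
}

fn main() {
    let a = pattern_of(&["film01_Taylor_2017_(4K)", "film01_Taylor_2017_(1080P)"]);
    let b = pattern_of(&["film01_Taylor_2017_(1080P)", "film01_Taylor_2017_(4K)"]);
    // Equal sets must hash equally, or HashMap lookups would miss.
    assert!(a == b);
    assert_eq!(hash_of(&a), hash_of(&b));
}
```

HashSet's own equality ignores element order, so the derived Eq and the hand-written Hash agree with each other here.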

Another point. For a given duplicate pattern, I just want to determine

  • whether it is contained in the pattern-db,
  • if it is, whether the boolean value it is related to is true or false.

I believe using a HashMap would make this lookup fast.

The reason I chose a HashSet as the key: the entries for each film appear in arbitrary order, so collecting them into a set is helpful. I will do some research on BTreeSet, and I hope it helps.
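The two checks listed above (is the pattern in the database, and if so, what is its flag) map naturally onto a nested Option. This is a rough sketch that assumes the BTreeSet-keyed map suggested earlier in the thread; `PatternDb`, `key`, and `lookup` are names invented for the example, and the `film03_unreviewed` entry is made-up data.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical alias for the pattern database.
type PatternDb = HashMap<BTreeSet<String>, Option<bool>>;

/// Hypothetical helper: builds an order-independent key from names.
fn key(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|s| s.to_string()).collect()
}

/// Outer Option: is the pattern in the database at all?
/// Inner Option: the stored flag, which may itself be absent.
fn lookup(db: &PatternDb, names: &[&str]) -> Option<Option<bool>> {
    db.get(&key(names)).copied()
}

fn main() {
    let mut db = PatternDb::new();
    db.insert(
        key(&["film01_Taylor_2017_(4K)", "film01_Taylor_2017_(1080P)"]),
        Some(true),
    );
    db.insert(key(&["film03_unreviewed"]), None); // made-up entry without a flag

    match lookup(&db, &["film01_Taylor_2017_(1080P)", "film01_Taylor_2017_(4K)"]) {
        None => println!("not in the pattern db"),
        Some(None) => println!("known pattern, no flag set"),
        Some(Some(flag)) => println!("known pattern, is_duplicate = {}", flag),
    }
}
```

Each lookup hashes the whole set of names, so it costs time proportional to the size of the pattern rather than the size of the database.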

Can you write out the value (or debug print it) that you want to get when you deserialize the YAML given above? I can't provide any help without that.

Something like this maybe?

Firstly, thank you for pointing me to std::collections::BTreeSet. I replaced HashSet with BTreeSet and everything works fine, and it is even faster than before. Great!

All I can provide here is the format of the YAML. For now I have chosen a not-so-smart method: create an intermediate struct that is easier to deserialize into, and then convert it into the target struct. The good news is that the number of YAML entries is not large.

Thank you for your high-quality code!

I did it with the same idea as yours. You create an intermediate struct named Entry and then convert it into another struct. So did I. 🙂
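For readers following along, the intermediate-struct approach described here might look roughly like the sketch below. It is not the code from the reply above, which is not preserved in this thread. Only `Entry` and `DuplicatePatterns` are names from the thread; `RawPatterns` and the `entry` helper are invented. To stay runnable with the standard library alone, the raw value is built by hand, and the comments mark where serde would come in.

```rust
use std::collections::{BTreeSet, HashMap};

// Mirrors one list item of the YAML. With serde, this and RawPatterns would
// carry #[derive(Deserialize)] so serde_yaml::from_str can fill them directly.
struct Entry {
    pattern: BTreeSet<String>,
    is_duplicate: Option<bool>,
}

// Mirrors the top level of the YAML (hypothetical name).
struct RawPatterns {
    duplicate_patterns: Vec<Entry>,
}

// Target type. With serde, #[derive(Deserialize)] plus
// #[serde(from = "RawPatterns")] here would apply the From impl automatically.
#[derive(Debug, PartialEq)]
struct DuplicatePatterns {
    duplicate_patterns: HashMap<BTreeSet<String>, Option<bool>>,
}

impl From<RawPatterns> for DuplicatePatterns {
    fn from(raw: RawPatterns) -> Self {
        // Each entry becomes one (key, value) pair of the map.
        let duplicate_patterns = raw
            .duplicate_patterns
            .into_iter()
            .map(|e| (e.pattern, e.is_duplicate))
            .collect();
        DuplicatePatterns { duplicate_patterns }
    }
}

/// Hypothetical helper standing in for what the deserializer would produce.
fn entry(names: &[&str], is_duplicate: Option<bool>) -> Entry {
    Entry {
        pattern: names.iter().map(|s| s.to_string()).collect(),
        is_duplicate,
    }
}

fn main() {
    // Hand-built equivalent of the YAML from the question.
    let raw = RawPatterns {
        duplicate_patterns: vec![
            entry(&["film01_Taylor_2017_(4K)", "film01_Taylor_2017_(1080P)"], Some(true)),
            entry(&["film02_Taylor_2019_(4K)", "film02_Taylor_2019_(720P)"], Some(true)),
        ],
    };
    let db = DuplicatePatterns::from(raw);
    assert_eq!(db.duplicate_patterns.len(), 2);
}
```

One thing to watch with this conversion: if two YAML entries contain the same set of names, the later one silently overwrites the earlier one when collected into the map.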
