Approach for deserializing and hashing f32 with limited range of values?

I'm parsing some JSON from user input into a struct that I'd like to be hashable.

One of the input fields is a threshold, which can be a value from 0.0 to 1.0 inclusive, encoded as a float in JSON.

Any suggestions on what approach to take to deserialize this into a hashable struct? My first idea was to store the underlying bits of the float as a u32 instead, e.g.:

use std::hash::Hash;
use serde::{Deserialize, de::Deserializer};

#[derive(Deserialize, Hash)]
struct Input {
    #[serde(deserialize_with = "deserialize_f32_as_u32")]
    threshold: u32
}

pub fn deserialize_f32_as_u32<'de, D>(deserializer: D) -> Result<u32, D::Error>
where
    D: Deserializer<'de>,
{
    let value: f32 = Deserialize::deserialize(deserializer)?;
    Ok(value.to_bits())
}

impl Input {
    fn threshold(&self) -> f32 { f32::from_bits(self.threshold) }
}

Any issues with this approach, or any better suggestions? Perhaps there's also a way of ensuring that the input is constrained between 0.0 and 1.0, via some type or crate?

Do you actually need the real numbers from 0.0 through 1.0?

Were I in your shoes, I would store the value as the integer numerator of a fraction and pick a fixed denominator that provides the precision I need.

If two values compare equal, they must hash equal. Otherwise HashMap and similar collections will not work correctly with those values, which defeats the purpose of the Hash trait.

My go-to example is -0.0 == 0.0: the two values compare equal but have different bitwise representations. I think both of them are in your domain, since -0.0 >= 0.0 is also true (greater or equal, because it's equal).
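A minimal sketch of that mismatch, using only the standard library. The helper name `equal_but_different_bits` is hypothetical and exists only for illustration:

```rust
/// Returns true when two floats compare equal but would hash differently
/// if their raw bits (from `to_bits`) were used as the hash input.
fn equal_but_different_bits(a: f32, b: f32) -> bool {
    a == b && a.to_bits() != b.to_bits()
}

/// The raw bit patterns of the two zeros: they differ only in the sign bit.
fn zero_bit_patterns() -> (u32, u32) {
    (0.0_f32.to_bits(), (-0.0_f32).to_bits())
}
```

Calling `equal_but_different_bits(0.0, -0.0)` returns true, which is exactly the case where hashing `to_bits()` would disagree with float equality.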

If we can't use that simple example, then other examples would have to be found where the representation is not unique.

You need some kind of normalization before converting to u32. Maybe quantization to a grid, or any other solution where it's easier to understand equality and how it corresponds to the hash.
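One possible normalization, sketched under the assumption that the only values of interest lie in 0.0..=1.0: reject anything outside that range (including NaN) and fold -0.0 onto +0.0 before taking the bits. The function name `canonical_bits` is hypothetical. Within this range, the two zeros should be the only pair of equal values with different bit patterns.

```rust
/// Returns a canonical bit pattern for a threshold in 0.0..=1.0,
/// or `None` if the value is NaN or out of range.
fn canonical_bits(value: f32) -> Option<u32> {
    // NaN fails this check too, because every comparison with NaN is false.
    if !(0.0..=1.0).contains(&value) {
        return None;
    }
    // -0.0 passes the range check (it equals 0.0), so map it to +0.0.
    let normalized = if value == 0.0 { 0.0_f32 } else { value };
    Some(normalized.to_bits())
}
```

With this in place, `canonical_bits(-0.0)` and `canonical_bits(0.0)` produce the same value, so equal inputs yield equal hash inputs.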

Definitely not - anything to 2 or 3 decimal places would probably be fine. It's for some approximate processing, so an exact floating point number isn't necessary, and your approach sounds good.

Good point, probably best that I normalise/quantise or implement something as above instead.

At the moment my struct doesn't implement Eq, only Hash (the hash is used to generate a unique ID for input that's used to look up results in a cache), but you're right in that I shouldn't risk causing problems down the line should I ever add Eq/PartialEq to the struct.

Armed with a denominator and a plan it's time for a new datatype! You even have a name for it: Threshold
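A rough sketch of what such a type could look like, assuming a denominator of 1000 (three decimal places). The `DENOMINATOR` constant, the `as_f32` method, and the `String` error type are illustrative choices, not something settled in this thread. The code uses only the standard library; with serde, the container attribute `#[serde(try_from = "f32")]` should be able to route deserialization through the `TryFrom` impl, which also rejects out-of-range input.

```rust
use std::convert::TryFrom;

/// Fixed-point threshold: stores the numerator of `n / DENOMINATOR`.
/// With serde, `#[derive(Deserialize)]` plus `#[serde(try_from = "f32")]`
/// could be added here (assumption: not compiled in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Threshold(u16);

impl Threshold {
    /// Assumed precision: three decimal places.
    pub const DENOMINATOR: u16 = 1000;

    /// Converts back to a float for use in calculations.
    pub fn as_f32(self) -> f32 {
        f32::from(self.0) / f32::from(Self::DENOMINATOR)
    }
}

impl TryFrom<f32> for Threshold {
    type Error = String;

    fn try_from(value: f32) -> Result<Self, Self::Error> {
        // NaN also fails this check, since comparisons with NaN are false.
        if !(0.0..=1.0).contains(&value) {
            return Err(format!("threshold {} is outside 0.0..=1.0", value));
        }
        // Round to the nearest step; the result lies in 0..=1000.
        let numerator = (value * f32::from(Self::DENOMINATOR)).round() as u16;
        Ok(Threshold(numerator))
    }
}
```

Because the stored value is an integer, the derived `Eq` and `Hash` agree by construction: 0.0 and -0.0 both become `Threshold(0)`, and inputs that differ only beyond the third decimal place collapse to the same value.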

In case you want full support for float ordering, you can use the float-ord crate to wrap a float in a totally ordered type, or use my num-order crate for float comparison and hashing.

There is a way to constrain the input. If you handle deserialization yourself, you can return an error for out-of-bounds values. An example...

impl<'de> Deserialize<'de> for ProcessID {
    fn deserialize<D>(deserializer: D) -> Result<ProcessID, D::Error>
    where
        D: Deserializer<'de>,
    {
        let pid = deserializer.deserialize_i64(OsPidVisitor)?;
        Ok(ProcessID { is_real: true, pid })
    }
}

Instead of always returning Ok(_), you'd return an appropriate Err(_) when the value is out of bounds, for example one built with serde::de::Error::custom.

You'll also have to create a visitor. That's the OsPidVisitor. If you need an example, just say so. I think I used an example for serde as the starting point.

(Crud. I see that's basically what you posted earlier. Sorry about that.)
