Serde - Calculate one field based on deserialized others

Take the following example:

use serde::Deserialize;

#[derive(Deserialize)]
struct Rectangle {
    length: f32,
    width: f32,
    // Should be calculated as `length * width` rather than deserialized.
    area: f32,
}

I wish to deserialize length and width, while calculating area from those two fields during deserialization. What approaches do I have for doing this?

One solution I've come across is to deserialize into an intermediate struct, which is then converted into the full struct with the additional area field. Another approach is to implement Deserialize myself, which I'm not against, but I'm not familiar with it, and it looks overly complicated and is generally discouraged unless necessary. Is there a simpler approach?

First, make sure you actually need the area field. Normally, you'd make that a method instead of a field.
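
For instance, a minimal sketch of that alternative (assuming serde with the derive feature enabled):

use serde::Deserialize;

#[derive(Deserialize)]
struct Rectangle {
    length: f32,
    width: f32,
}

impl Rectangle {
    // Computed on demand; nothing redundant is stored or deserialized.
    fn area(&self) -> f32 {
        self.length * self.width
    }
}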

If you do want the stored field, you can use the #[serde(from)] container attribute to go through the intermediate struct:

use serde::Deserialize;

#[derive(Deserialize)]
#[serde(from = "RectangleSerde")]
pub struct Rectangle {
    pub length: f32,
    pub width: f32,
    pub area: f32,
}

#[derive(Deserialize)]
struct RectangleSerde {
    length: f32,
    width: f32,
}

impl From<RectangleSerde> for Rectangle {
    fn from(value: RectangleSerde) -> Rectangle {
        let RectangleSerde { length, width } = value;
        Rectangle {
            length,
            width,
            area: length * width,
        }
    }
}
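
For completeness, a usage sketch building on the definitions above (assuming serde_json as the format; any serde-compatible format behaves the same):

fn main() {
    let json = r#"{ "length": 3.0, "width": 4.0 }"#;
    let rect: Rectangle = serde_json::from_str(json).unwrap();
    // `From<RectangleSerde>` ran during deserialization, so `area` is filled in.
    assert_eq!(rect.area, 12.0);
}

If the conversion can fail, there is also #[serde(try_from = "...")], which goes through a TryFrom impl instead.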

None. You shouldn't do this. You should not have redundant data in your data models; this should be a method instead.


For my particular case, while it isn't identical to Rectangle above, it is quite similar, and the value will be read in various bevy systems multiple times per frame. Therefore I feel it is better to recalculate this field only on rare mutations, even though the calculation is only a few multiplications, divisions, and modulo operations per call. I know this sounds like premature optimization, but it seems needless to recalculate something potentially thousands of times a second. Would storing this field still be discouraged in this case?
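
Concretely, what I had in mind is recomputing only through mutation, something like this sketch (set_length is just a placeholder name, and the fields would need to be private for the invariant to actually hold):

struct Rectangle {
    length: f32,
    width: f32,
    area: f32, // cached; must stay in sync with `length * width`
}

impl Rectangle {
    // Placeholder setter: the only place `length` changes, so the cache
    // is refreshed on (rare) mutation rather than on every read.
    fn set_length(&mut self, length: f32) {
        self.length = length;
        self.area = self.length * self.width;
    }
}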

Yes, totally.

A few arithmetic operations will literally take nanoseconds. If you are using them for graphics, then the dozen or two milliseconds you have per frame are more than enough, since the cost of these operations is multiple orders of magnitude less than the available time. It is likely that deserialization alone will dwarf these computations.

It's not worth adding such complexity and sacrificing guaranteed data integrity for saving on these operations.

assuming we're not doing this for (hundreds of) thousands of elements across several ECS systems (bevy being mentioned above), keeping the CPU utilization nice and warm like many AAA games do nowadays...
edit: no argument on the redundancy of the data, of course; just saying that "sometimes..."

A small nuance:
The arithmetic operations themselves are indeed expected to take nanoseconds apiece.

However, if it's a large amount of data (>> CPU cache capacity), the loads and stores will absolutely eclipse that, possibly going from nanoseconds to microseconds depending on the situation.

All that said, OP really should measure first. Without measurement it's just firing a gun blindly in the dark, more or less: dangerous at the best of times.
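
As a first stab at measuring (just std::time::Instant, with std::hint::black_box to keep the compiler from deleting the loop; a proper harness like criterion is the better tool for real numbers):

use std::hint::black_box;
use std::time::Instant;

fn main() {
    // One million rectangles' worth of side lengths.
    let rects: Vec<(f32, f32)> = (0..1_000_000)
        .map(|i| (i as f32, (i + 1) as f32))
        .collect();

    let start = Instant::now();
    let total: f32 = rects.iter().map(|&(l, w)| black_box(l * w)).sum();
    println!("1M multiplications: {:?} (sum = {total})", start.elapsed());
}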


My expectation is that if you have enough rectangles for the multiplications to take a non-negligible amount of time, then you also have enough of them that memory bandwidth is the bottleneck. And if that's actually the case, then the 50% increase in memory usage from storing an additional f32 is likely to be a much bigger factor than the multiplications.

But this is just my expectation, you should always measure to see what actually makes a difference.
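
To put a number on that 50% (Slim and Fat are throwaway names; repr(C) just pins the layout for the demonstration):

#![allow(dead_code)]
use std::mem::size_of;

#[repr(C)]
struct Slim { length: f32, width: f32 }

#[repr(C)]
struct Fat { length: f32, width: f32, area: f32 }

fn main() {
    assert_eq!(size_of::<Slim>(), 8);  // 2 × 4 bytes
    assert_eq!(size_of::<Fat>(), 12);  // 3 × 4 bytes: 50% more traffic per element
}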

