Hash function performance

I'm computing a hash of a graphics mesh object, to be used as a lookup key for combining duplicate objects. The hashing, with SHA-256 (SHA3-256 from the sha3 crate, specifically), is a major bottleneck. It's taking longer to hash the mesh to get a hash key than to build the mesh up from the input data.

So here's the code:

pub type KeyHash = sha3::Sha3_256;

...

pub fn keytype_from_mesh_coords(mesh_coords: &MeshCoords) -> KeyType {
    let mut hasher = KeyHash::new();
    for v in &mesh_coords.vertex_positions {
        hasher.update(&v[0].to_le_bytes());
        hasher.update(&v[1].to_le_bytes());
        hasher.update(&v[2].to_le_bytes());
    }
    for v in &mesh_coords.vertex_normals {
        hasher.update(&v[0].to_le_bytes());
        hasher.update(&v[1].to_le_bytes());
        hasher.update(&v[2].to_le_bytes());
    }
    for ix in &mesh_coords.index_data {
        hasher.update(&ix.to_le_bytes());
    }

    for ix in &mesh_coords.uv_data_array {
        hasher.update(&ix[0].to_le_bytes());
        hasher.update(&ix[1].to_le_bytes());
    }
    hasher.finalize_fixed()
}

/// 3-element vector, 32-bit values.
#[derive(Copy, Clone, PartialEq, Default, Serialize, Deserialize)]
pub struct LLVector3(pub [f32; 3]);

///  vector3 to 12 bytes (32 bit floats)
impl LLVector3 {
    /// Field access
    pub fn get_x(&self) -> f32 {
        self.0[0]
    } // because the built-in notation is awful
    pub fn get_y(&self) -> f32 {
        self.0[1]
    }
    pub fn get_z(&self) -> f32 {
        self.0[2]
    }

    ...

    /// Serialize to bytes
    pub fn to_le_bytes(&self) -> [u8; 12] {
        let x = self.get_x().to_le_bytes();
        let y = self.get_y().to_le_bytes();
        let z = self.get_z().to_le_bytes();
        [
            x[0], x[1], x[2], x[3], y[0], y[1], y[2], y[3], z[0], z[1], z[2], z[3],
        ] // as 12 bytes
    }
}

All that's going on here is that there are Vecs of 2- and 3-element arrays of f32 values, and I want a hash of the entire Vec.

First, converting three floats to 12 bytes takes too much fooling around. It would be nice if .as_le_bytes() worked on fixed-size arrays, but it doesn't.
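For reference, the per-float shuffling can be collapsed into one safe, std-only helper using flat_map over to_le_bytes (the helper name here is hypothetical, not from the original code; on little-endian targets the optimizer usually reduces this to a straight copy):

```rust
// Hypothetical helper: flatten a slice of [f32; 3] into little-endian bytes
// using only safe std code -- no manual 12-byte index shuffle.
fn vec3s_to_le_bytes(vs: &[[f32; 3]]) -> Vec<u8> {
    vs.iter()
        .flat_map(|v| v.iter().flat_map(|f| f.to_le_bytes()))
        .collect()
}

fn main() {
    let data = [[1.0f32, 2.0, 3.0], [4.0, 5.0, 6.0]];
    let bytes = vec3s_to_le_bytes(&data);
    assert_eq!(bytes.len(), 24); // 2 vectors * 3 floats * 4 bytes each
    assert_eq!(&bytes[0..4], &1.0f32.to_le_bytes());
}
```

This still allocates a Vec per call; feeding the hasher directly, or casting the whole slice at once, avoids that.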

Second, SHA-256 may be too slow for this application.

Suggestions? Safe code only, please.

Here's a Tracy profile of the program showing how long this takes.

What are you trying to achieve with the hashing here? I can't imagine any scenario where mixing graphics primitives and cryptographic hash functions is a good idea. Would using DefaultHasher get the job done?

edit: Also, you're building with --release right?

The idea is that the stuff I'm storing is big enough that you don't want the hash lookup comparing the data with Eq or PartialEq during lookups. With a long cryptographic hash, you can use the hash alone, rather than the data itself, as the key. DefaultHasher isn't bad - it's SipHash - but it's 64 bits. As a rule of thumb, you want at least 128 bits before you can assume uniqueness from random values. There's a 128-bit variant of SipHash. That would be fine. Is that available in Rust?
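The 128-bit rule of thumb comes from the birthday bound: among n random b-bit hashes, the probability of any collision is roughly n² / 2^(b+1). A quick sketch of that arithmetic (not part of the thread, just checking the numbers):

```rust
// Birthday-bound estimate: probability of any collision among n random
// b-bit hash values, valid while the result is small.
fn collision_probability(n: f64, bits: i32) -> f64 {
    n * n / 2f64.powi(bits + 1)
}

fn main() {
    let n = 1.0e6; // say, a million meshes
    println!("64-bit:  {:e}", collision_probability(n, 64));  // ~2.7e-8
    println!("128-bit: {:e}", collision_probability(n, 128)); // ~1.5e-27
}
```

At 64 bits the risk is already non-negligible for large datasets; at 128 bits it is vanishingly small, which is why the hash alone can stand in for the data.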

And can a hasher be applied to Vec<[f32;3]> without tearing apart each float into bytes?

Yes, I'm building with --release.
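On the question of applying a hasher to Vec<[f32;3]> directly: a minimal std-only sketch is below. DefaultHasher (64-bit SipHash) stands in for the 128-bit variant, which std does not ship; a crate-provided 128-bit SipHash would be driven the same way through write calls.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Sketch: feed a slice of [f32; 3] to a std Hasher by writing each float's
// bit pattern. Note that NaN payloads and -0.0 vs 0.0 hash differently,
// which is usually what you want for deduplication keys.
fn hash_vec3s(vs: &[[f32; 3]]) -> u64 {
    let mut h = DefaultHasher::new();
    for v in vs {
        for f in v {
            h.write_u32(f.to_bits());
        }
    }
    h.finish()
}

fn main() {
    let a = vec![[1.0f32, 2.0, 3.0]];
    let b = vec![[1.0f32, 2.0, 3.0]];
    assert_eq!(hash_vec3s(&a), hash_vec3s(&b)); // deterministic for equal input
}
```

So a hasher can consume the Vec without an intermediate byte buffer at all, one write per float.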

I searched for "Siphash128 Rust" in a search engine and this came up as the first result.

Is it faster if you concatenate the slices and then call hasher.update only once? I'm not sure exactly what this function does, because sha3 uses macros, but maybe it splits the data into blocks at each call.

You can also use a zerocopy serializer to serialize MeshCoords faster.

Yeah, SHA-256 won't work for that; it's much too slow. There are plenty of hash functions of varying output sizes available on crates.io. For example, the siphasher crate supports 128-bit SipHash, with a few variants of it.

A [f32; 3] consists of 12 bytes, and a 128-bit hash contains 16 bytes. This means you don't need an expensive hash function, since you can simply transmute the three floats into a u128. It would be a perfect hash function (Wikipedia: Perfect hash function).

Edit: I may have misread your question. Are you actually feeding many more floats into the hash?
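The single-vector idea can even be done without transmute, since safe bit-level packing suffices (a sketch with a hypothetical helper name; it only applies when the key is one [f32; 3], not a whole Vec of them):

```rust
// Safe packing of one [f32; 3] (96 bits of data) into a u128 (128 bits):
// the "perfect hash" idea, with no unsafe code. Distinct bit patterns
// always produce distinct keys, so there are no collisions at all.
fn pack_vec3(v: [f32; 3]) -> u128 {
    ((v[0].to_bits() as u128) << 64)
        | ((v[1].to_bits() as u128) << 32)
        | (v[2].to_bits() as u128)
}

fn main() {
    let a = pack_vec3([1.0, 2.0, 3.0]);
    let b = pack_vec3([1.0, 2.0, 3.0]);
    assert_eq!(a, b);
    assert_ne!(a, pack_vec3([3.0, 2.0, 1.0])); // injective: no collisions
}
```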

That looks like the algorithm of choice. It even offers

write(&mut self, msg: &[u8])

for adding data. Now, is there some elegant way to turn a Vec<[f32;3]> into a &[u8]? It's basically a no-op, of course. It's too bad you can't just apply .as_ne_bytes() to arrays. Is there something I can write which the compiler will optimize down to not doing any processing?

bytemuck::cast_slice(), probably.


Ref: Rust Playground

OK, did all that, and it makes sense. But there's a gotcha. For some reason, "Eq" and "Hash" are not derived for Hash128, the output of the hasher. So hashes can't be compared or used as keys in a HashMap.

This is strange. "Hash128" is just two u64 values:

/// A 128-bit (2x64) hash output
#[derive(Debug, Clone, Copy, Default)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Hash128 {
    pub h1: u64,
    pub h2: u64,
}

Why no derived "Eq" and "Hash"?

However, Hash128 can be converted to u128, so we can make this work.
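A sketch of that workaround, using a local mirror of the Hash128 struct quoted above (whether the crate provides this exact conversion as a From/Into impl is an assumption; check its docs):

```rust
use std::collections::HashMap;

// Mirrors the Hash128 struct quoted above (local copy for this sketch).
#[derive(Debug, Clone, Copy, Default)]
pub struct Hash128 {
    pub h1: u64,
    pub h2: u64,
}

// Pack the two u64 halves into one u128, which *is* Eq + Hash and
// therefore usable as a HashMap key.
fn to_u128(h: Hash128) -> u128 {
    ((h.h1 as u128) << 64) | (h.h2 as u128)
}

fn main() {
    let h = Hash128 { h1: 0xDEAD_BEEF, h2: 0xCAFE };
    let mut map: HashMap<u128, &str> = HashMap::new();
    map.insert(to_u128(h), "mesh A");
    assert!(map.contains_key(&to_u128(h)));
}
```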
