How to implement its own hash function?


#1

Hi all,

As part of a project for an AI for a strategy game (blobwar), I would like to implement the hashing of Zobrist in order to store the different configurations of the game in a HashMap.
After many research in the Rust Documentation, I found that I had to define the functions write and finish in the trait Hasher. The issue is that I don’t get how to define them. With the default Hasher, where does write store the data ? Will I have to code my hashing function in the method finish, and in that case, how do I access to the data ?

Thanks in advance for your answers.


#2

If you have a type called Zobrist and you want to hash it, then you need to implement the Hash trait for it. You can also #[derive(Hash)] it if the default hash is appropriate.

The Hasher trait is if you want to define a custom hash combiner/algorithm, which it doesn’t sound like you’re after here.


#3

I may have not been clear enough.
Actually I have a type configuration and I have to implement the Hash trait for it.
But the default hash it not appropriate here, I would like to calculate the hash value for this type Configuration with Zobrist algorithm. That is why I wonder how to use Hasher trait to do this.


#4

Oh, so I was completely off :slight_smile:

I’ve never heard of this hash algorithm. But yes, you’d implement Hasher for it. You’ll receive a stream of data from the callers via the various write methods and then you’re expected to produce the hash value from finish().

Since I’m not familiar with the hashing algorithm I don’t know where/how you should store the incoming data. What does the algorithm look like in pseudocode (or some other language that you know)?


#5

Just yesterday I had to hack with hashing for a benchmark Rust Vs C++.
Implement its own algorithm require you to make a struct with the Hasher trait and another struct with the BuildHasher trait (binded to the precedent struct).
You can find all the information in the std::hash::Hasher and in the std::hash::BuildHasher traits.

If you need more specific help, just ask!
Just in case, here you can find an implementation of the C++ hashing algorithm made by me: https://github.com/eisterman/rust-vs-cpp-bench/blob/master/huffman_encoding/main.rs


#6

If you want to know more about this hashing algorithm, you may check this link from wikipedia https://en.wikipedia.org/wiki/Zobrist_hashing .

I read this documentation many times, but I didn’t understand how to define finish() and write().
Thanks for you code, I understand that the hasher (which is a CppHasher here?) contains the hash value calculated by the write functions and finish only returns this value, is that right ?
And, in order to write some data from a struct, I just have to define write_struct function ?


#7

Hash and Hasher are based around computing a hash of a stream of bytes (or of typed integer values of other sizes); if your data has more structure than that, I don’t think you can use them very well for this.

A workaround would be to: compute the Zobrist hash directly in your Hash implementation (manual impl of that trait required), and the Hasher would either re-hash that Zobrist hash result, or as an additional workaround, a special case Hasher (manual impl required) could just take the value as computed.


#8

Let us forget Zobrist for now, I think I may me able to change my struct in order to use stream of bytes later.
I still don’t get how to define hash(), write() and finish().
Considering a simple case : I want to hash some positives integers, and the hash value would just be their square. How could I implement this ?


#9

Write is the function for the String / Stream of Bytes hash
Finish do some final operations on the internal state of the Hasher object and return the final Hash.
Write_type is the function for the primitive type hash.

The hash of every other struct is a composition of the hash of his fundamental composition.

The BuildHasher build the Hasher with the initial internal state.

For the Zobrist you need ALL this feature, it’s not simple…

For the question about the simple case example, it’s impossible to hash only some fundamental type. The hash function in the Hasher trait need to be well defined for all the fundamental type plus streams of byte. So, if i have understand correctly, here you can find a snippet of my old CppHasher adapted for returning the square sum of all the hashed number of byte. For the signed int i’ve added a .abs().


I hope to have understand correctly the simple example you asked :sweat_smile:


#10

I meant their square sum yes.
Thanks for your explanations and your example, I think I have a better understanding now but there is still something which confuses me.
Considering that these integers are stored in a vector, which is the only attribute of a struct Vec_of_int, how would you define the trait Hash, and in particular the hash function ?
I know one could use #Derive but that is not the answer I am expecting, as I will have to define it myself for my struct which is more complex.


#11

The Derive Hash will do the same thing I will explicitly do :disappointed_relieved:
Take all the attribute of the struct, and one per one, do hashing:

use std::hash::{Hash,Hasher};

struct ciao {
    a: usize,
    b: String, 
    c: Vec<i32>, 
} 

impl Hash for ciao {
    fn hash<S: Hasher>(&self, state: &mut S) {
        self.a.hash(state);
        self.b.hash(state);
        self.c.hash(state);
   } 
}

The Hash trait means "this struct can be hashed by a generic Hasher".
The real hashing algorithm is in the struct implementing the Hasher trait.

This is why generally #[derive(Hash)] is the good choice, even with a personal and exotic Hasher


#12

Ok thanks.
And I have several attributes, how do I use them in the function write() ? What does bytes:&[u8] represent ? For example if my struct contains strings I dont’ see how it would work.


#13

I finally added in my struct an attribute value which is the hash value that I compute myself, and in my method hash(), I only hash this integer, maybe it could work.


#14

Hasher has not been studied for this type of application… Hasher implement an hashing algorithm, independently from the struct to hash!

But if it works, good! :slight_smile:

The “”“right”"" approach was to build a generical Hasher who can hash every type implementing hash, but sometimes for some exotic hasher is impossible to stay “on the rail”.

If it’s possible, I will be happy to see the final code ^-^


#15

Aaaarg, I wish I saw this thread sooner. I am familiar with Zobrist hashing, and have this to say.

I cannot see any reason why somebody would want to implement Zobrist hashing through the Hasher API. At all. There is no point, because if you have any legitimate reason to be using Zobrist hashing, then you would never want to be placing these items into a HashSet or HashMap.

Let me explain for the uninitiated, because the Wikipedia page kind of gives the wrong idea. Wikipedia shows this example:

constant indices
    white_pawn := 1
    white_rook := 2
    # etc.
    black_king := 12
   
function init_zobrist():
    # fill a table of random numbers/bitstrings
    table := a 2-d array of size 64×12
    for i from 1 to 64:  # loop over the board, represented as a linear array
        for j from 1 to 12:      # loop over the pieces
            table[i][j] = random_bitstring()
   
function hash(board):
    h := 0
    for i from 1 to 64:      # loop over the board positions
        if board[i] != empty:
            j := the piece at board[i], as listed in the constant indices, above
            h := h XOR table[i][j]
    return h

What’s misleading about this snippet is that the function hash(board) is not something you should be frequently calling, if ever. Every time you call hash(board) it completely defeats the purpose of the Zobrist hash.

The following might be a more accurate description:

constant indices
    # same as before

function zobrist_init():
    # same as before

# This is the "hash" function from before, but with a more accurate name.
#
# Most legitimate use cases of Zobrist hashing do not even need this at all,
# as you can often get away by declaring the initial state of the board
# (at the beginning of whatever algorithm uses Zobrist) to have a hash of 0.
function zobrist_compute_from_scratch(board):
    h := 0
    for pos from 1 to 64:
        h := zobrist_update(h, pos, board[pos])
    return h

# Called to update the hash in response to "toggling" (inserting or removing)
# the existence of a given piece at a given position.
function zobrist_update(hash, position, piece):
    if piece != empty:
        hash := hash XOR table[position][piece]
    return hash

# Helper function that updates both the board and hash
function board_set_piece(board, hash, position, new_piece):
    old_piece := board[position]
    board[position] := new_piece

    hash := zobrist_update(hash, position, old_piece) # remove old piece
    hash := zobrist_update(hash, position, new_piece) # insert new piece
    return (board, hash)

The whole entire point of a Zobrist hash is that it can be computed incrementally as changes are made to an object. It provides a super speedy heuristic for answering questions like “is our algorithm stuck in a loop?” by allowing you to compare a board’s current state to previous states without O(size) time per comparison or even O(size) setup per state visited.


I used zobrist hashes in a simulation on a large atomic network as part of an effort to make sure that not a single part of my code performed O(network_size) work per step of the simulation. I never stored copies of old states; I only stored the “deltas”, i.e. which atoms were changed, even sometimes applying them in reverse if I needed to inspect a previous state. These same deltas were used to update a zobrist hash which was used to detect whenever the simulation got stuck in a cycle.


Storing items in a HashSet or HashMap using Zobrist hashes (if you even could do it!) would completely defeat the purpose because HashSet and HashMap still need to check that two items are equal to protect against hash collisions. This O(n) equality test will invalidate any performance gains you would have gotten from having incrementally computed hashes.


#16

One last thing, since so far I’ve only told you what you shouldn’t do. Here’s what you should do:


Feel free to use the Zobrist hashes themselves as keys in a (regular) HashSet or HashMap. I know, it seems silly to think that the hashmap will be computing hashes of your hashes, but you have to consider that these two types of hashes serve different goals.[1] Depending on the application, it’s possible that a BTreeSet/BTreeMap will be faster. If you’re really concerned about this, you should benchmark.

For me, I didn’t really have anything to gain by trying to optimize my use of Zobrist hashes, because the major costs were elsewhere in the application. I just used a VecDeque as a circle buffer and did equality searches (provided now as the contains method). Later, when I decided to extend the history length, I put the keys into something that I described at the time as “probably overkill,” which used both a VecDeque and a HashMap as a counter. Meh.


[1] Though they do have similar properties… Perhaps you could get away with…
erm… on second thought, never mind. You can be sure that whatever I was about to say was a terrible idea that nobody should ever even consider. :angel:


#17

Thanks a lot for your advices and your precisions on Zobrist algorithm !!!
I knew I had to update hash values but it is more clear now. I finally store the zobrist hash in each configuration and use them as keys for the HashMap, which is in deed easier.
It seems to work now !
Regarding how complex it is to use Hasher in order to implement its own hash function, I think using hash values as keys may be a solution most of the time.


#18

Indistinctly mumbles something about hash collisions.