Efficiently storing a Go board

zeroexcuses · April 9, 2020, 6:28am

Consider:

pub enum Corner {
  Empty, Black, White,
}

and a board as Vec of length 19*19=361.

Question: Is there a way to tell Rust to use only 2 bits per enum Corner, where Vec ends up being 742 bits rounded up to nearest alignment ?

// Aside: I realize we can store this using only ceiling(361 * log_2(3)) bits, but I'm willing to waste some storage in exchange for easier implementation.

alice · April 9, 2020, 6:37am

361 bytes is not very much. Have you considered using a byte per cell? It will make implementation much simpler.

To use two bits per cell, you would have to use a byte array directly and use bit manipulation.

daboross · April 9, 2020, 6:41am

There isn't a way to store an enum in less than a byte, as rust generally requires you to be able to take references to all values, and you can only reference byte-aligned bits of data.

With that said, you should definitely be able to implement this manually! I recommend using a BitVec from the bitvec crate, and then manually creating set/get functions which translate between your enum and the 2-bit code. BitVec should be significantly nicer to work with than a Vec<u8> (or [u8; 91])

Michael-F-Bryan · April 9, 2020, 12:01pm

I think the easiest way would be to code it manually. You could store the entire board in a [u8; ceil(361 * log_2(3))] array then write getters/setters (or even overload std::ops::Index!) to access each corner.

Rust doesn't really give you syntactic sugar for accessing things at a per-bit level (e.g. C's bit fields), and (as far as I'm aware) it's not possible for the compiler to "pack" 4 Corner values into a single byte... Although I'm sure teaching LLVM/rustc to use "niches" for optimisation like that would be a fun project.

trentj · April 9, 2020, 1:53pm

This does not really answer your question, but std::vector<bool> in C++ is specialized to do this kind of space-efficient bit packing, and it is widely considered to have been a bad idea. There are several reasons but they basically boil down to:

You can't take a reference (bool&) to a single element because it is represented by a partial byte.
You can't safely write to different elements of the same container concurrently because they might be stored in the same byte.

This makes vector<bool> really gross to deal with in certain generic and multi-threaded code. I'm glad Rust has decided to rule out possibilities like this in favor of more consistent and intuitive behavior for Vec.

zeroexcuses · April 9, 2020, 1:54pm

@alice: I'm implementing a go playing bot. I don't know if this will end up being memory or CPU limited, but a factor of 4 for storing each 'node' is (1) nontrivial and (2) may slow down 4x the "ko / no-previous-state check"

zeroexcuses · April 9, 2020, 1:56pm

@Michael-F-Bryan : I believe the down side with ceil(361 * log_2(3)) is that toggling a single cell between empty/black/white ends up being much slower (since it might touch multiple bytes)

zeroexcuses · April 9, 2020, 2:17pm

@daboross: How does the bitvec crate compare to std::collections::BitVec - Rust ?

RustyYato · April 9, 2020, 2:24pm

std no longer has a bitvec, you must use the bitvec crate.

Michael-F-Bryan · April 9, 2020, 2:25pm

I guess that will all depend on your access patterns and whether it's possible to apply operations in bulk, but I don't think it would be much slower. If it takes 2 bits to express a tile and the smallest possible load is 8 bits, updating a single tile should only require loading a byte (because 4 tiles fit evenly into a byte), applying a bitmask, then storing it again.

I'm not familiar with the rules of Go, but if changing one tile may also update the state of an adjacent tile (either left-right or up-down or diagonally) you may want to consider using a different way of locating tiles in the array. For example, instead of a typical row-major or column-major ordering, you could use a scheme like the Hilbert Curve which will locate nearby tiles closer to each other in memory. That way the area you're operating on is more likely to be in L1 cache... Of course whether this will helps with your memory bottleneck all depends on the board size and your computer's cache.

I have a feeling they're the same project. The link you provide shows bitvec was an unstable feature in Rust 1.2.0 and I'm pretty sure it's been moved out of the standard library to the bitvec crate on crates.io since then.

trentj · April 9, 2020, 3:22pm

I think you're optimizing for the wrong thing. The ko check is probably one of the cheapest things you do regardless of storage size: hash the board state and look it up in a table, done. It might even be vectorized. Compare that to, say, checking whether a color group is captured, which you will probably do many many times, and could be significantly faster with indexed byte arrays rather than bit-twiddling.

That said, there's another option you might consider: have two BitVecs, one for black stones and one for white stones. Total size = 736 bits, and I'm guessing it will make certain tasks a bit easier, like finding connected groups. The downside is it becomes possible to represent impossible board states by having a black stone and a white stone in the same space.

(Disclaimer: I've never written a Go bot.)

jcdickinson · April 9, 2020, 5:39pm

I haven't turned it into a crate (and probably won't), but here's a vector of non-byte-lengthed integers. It does exactly what you're asking:

github.com

jcdickinson/racemus/blob/master/racemus-binary/src/var_vec.rs

#[derive(Debug, Copy, Clone)]
struct VarVecLayout {
    bits_per_entry: usize,
    capacity: usize,
    value_mask: u64,
}

impl VarVecLayout {
    #[inline]
    fn new(capacity: usize, bits_per_entry: u8) -> Self {
        assert!(bits_per_entry <= 64);
        let bits_per_entry = bits_per_entry as usize;

        VarVecLayout {
            bits_per_entry,
            capacity,
            value_mask: std::u64::MAX >> (64 - bits_per_entry),
        }
    }

This file has been truncated. show original

VarVec2::with_capacity(19*19, ceil_log2(3))

josh · April 9, 2020, 6:18pm

You might consider storing this as two separate bit vectors, one for each side. If you need the complete bit vector, you can always or the two together. That representation makes it easier to quickly look at formations of one color or the other. That would likely prove simpler than trying to store each cell as a 2-bit bitfield.

Edit: I just realized that @trentj made a similar suggestion.

system · July 8, 2020, 6:18pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Enum Size Optimization help	3	345	January 31, 2024
Better type systems for more reliable bit packing and twiddling	4	1376	January 12, 2023
Fixed size layout container from Vec<Enum<T>> help	5	441	June 29, 2021
Memory layout for enums and boxes help	7	254	December 5, 2023
How To storage a binary data	3	3263	January 12, 2023

Efficiently storing a Go board

Related Topics