Page storage for Rustdb

I wrote a new blog, Page storage for Rustdb – Rusty thoughts copy included below. All comments welcome! Does this make sense to you?!

Any general purpose database will typically store information in pages, like pages in a book. When a page becomes full, it will generally need to be split somehow (typically divided into two smaller halves) before new data can be added.

Most database implementations I have looked at use fixed size pages, however this has the drawback that after a page is split, it may be only half full, meaning that 50% of the storage is unused. While it may be possible to “repack” pages to reduce wasted storage, Rustdb has an efficient implementation of variable size pages, in fact two different implementations of the PageStorage trait.

The first option is “CompactFile“. This (roughly speaking) splits pages into 1-16 kilobyte chunks. However more recently I have developed another option, “BlockPageStorage” which first divides the file into a small number of variable size files ( each built out of fixed size blocks ), and then allocates pages from these files, with each file being used to store pages of a particular size. So this is implemented as three structs: BlockStg which managed fixed size blocks, DividedStg which implements multiple files out of fixed size blocks and finally BlockPageStorage which uses the files to manage variable size pages. Since blocks are fixed size, when there is a free block, the last block in the file can be relocated to fill the gap. Similarly, when there is a free page, the last page in the subfile can be relocated to fill the gap. Thus the amount of unused file storage is minimal.

The choice of block size and page sizes is not fixed and can be configured. There is some advantage in choosing a block size which can be divided exactly by a range of integers – this means that when a page is read, the read does not cross a block boundary, so there is a single read to the underlying file system (disregarding “small” reads which are required to establish the page location, which can be buffered). For example, a block size of 27,720 ( = 5 x 7 x 8 x 9 x 11 ) can be divided evenly by integers in the range 6 to 12, producing pages of size 2310, 2520, 2772, 3080, 3465, 3960 and 4620 bytes.

4 Likes