Did some looking around for a persistent array that can be used in IPLD's data-model. Found im package and its Vector
type which is based on the RRB-Tree.
And I couldn't find insert() and delete() that return the modified tree keeping the older version persistent. Also, I think due to the RRB-Tree limitation the mutable version of insert() and remove() are costly operation. Please correct me if I am wrong here.
ppar is persistent array implementation using a modified variant of rope-data-structure. Fundamentally, it can be viewed as a binary-tree of array-blocks, where each leaf-node is a block of contiguous item of type T
, while intermediate nodes only hold references to the child nodes, left and right. To be more precise, intermediate nodes in the tree are organised similar to rope structure, as a tuple of (weight, left, right) where weight is the sum of all items present in the leaf-nodes under the left-branch.
Both unit-testing and fuzzy-testing (using the im::Vector as the reference) is passing for this algorithm. Here are some performance characteristics of ppar::Vector
compare to im::Vector
.
ppar::Vector performance characterization
-----------------------------------------
append-load(1000000 items) : 9.094956ms
random-load(1000000 items) : 6.128µs
get(1000000 ops) : 127ns
insert(1000000 ops) : 8.936µs
set(1000000 ops) : 6.093µs
delete-insert(1000000 ops) : 11.742µs
overhead : "33.87%"
overhead after 90% delete : "33.91%"
- Takes around 9ms to convert a Vec of len 1-million items to
ppar::Vector
. - Random insert operation take around 8us.
- Random get operation take around 100ns.
- Random set operation take around 8us. Note that these set operation update an existing offset.
- Random delete operation take around 4us. Note that in the above perf data, after a delete an insert is performed at the same offset.
- For 8-byte u64 type, overhead is around 33%.
- I believe the worst-case overhead can go upto 100%, that is 2x, the actual data, when compared to
std::vec::Vec<T>
.
On same machine same set of operation on a im::Vector<T>
gives following numbers:
append-load(1000000 items) : 37.651219ms
get(1000000 ops) : 123ns
update(1000000 ops) : 10.344µs
insert(1000000 ops) : 735.307µs
delete-insert(1000000 ops) : 1.426266ms
While append-load
operation is bulk-operation, other ops are individual metric.
Ref:
crate: https://crates.io/crates/ppar
repo: https://github.com/iprs-dev/ppar
fuzzy-testing: https://github.com/iprs-dev/ppar/blob/master/src/bin/check.rs
perf-testing: https://github.com/iprs-dev/ppar/blob/master/src/bin/perf.rs