Did some looking around for a persistent array that can be used in IPLD's data-model. Found im package and its Vector type which is based on the RRB-Tree.
And I couldn't find insert() and delete() that return the modified tree keeping the older version persistent. Also, I think due to the RRB-Tree limitation the mutable version of insert() and remove() are costly operation. Please correct me if I am wrong here.
ppar is persistent array implementation using a modified variant of rope-data-structure. Fundamentally, it can be viewed as a binary-tree of array-blocks, where each leaf-node is a block of contiguous item of type T, while intermediate nodes only hold references to the child nodes, left and right. To be more precise, intermediate nodes in the tree are organised similar to rope structure, as a tuple of (weight, left, right) where weight is the sum of all items present in the leaf-nodes under the left-branch.
Both unit-testing and fuzzy-testing (using the im::Vector as the reference) is passing for this algorithm. Here are some performance characteristics of ppar::Vector compare to im::Vector.
ppar::Vector performance characterization
-----------------------------------------
append-load(1000000 items) : 9.094956ms
random-load(1000000 items) : 6.128µs
get(1000000 ops) : 127ns
insert(1000000 ops) : 8.936µs
set(1000000 ops) : 6.093µs
delete-insert(1000000 ops) : 11.742µs
overhead : "33.87%"
overhead after 90% delete : "33.91%"
- Takes around 9ms to convert a Vec of len 1-million items to
ppar::Vector. - Random insert operation take around 8us.
- Random get operation take around 100ns.
- Random set operation take around 8us. Note that these set operation update an existing offset.
- Random delete operation take around 4us. Note that in the above perf data, after a delete an insert is performed at the same offset.
- For 8-byte u64 type, overhead is around 33%.
- I believe the worst-case overhead can go upto 100%, that is 2x, the actual data, when compared to
std::vec::Vec<T>.
On same machine same set of operation on a im::Vector<T> gives following numbers:
append-load(1000000 items) : 37.651219ms
get(1000000 ops) : 123ns
update(1000000 ops) : 10.344µs
insert(1000000 ops) : 735.307µs
delete-insert(1000000 ops) : 1.426266ms
While append-load operation is bulk-operation, other ops are individual metric.
Ref:
crate: https://crates.io/crates/ppar
repo: https://github.com/iprs-dev/ppar
fuzzy-testing: https://github.com/iprs-dev/ppar/blob/master/src/bin/check.rs
perf-testing: https://github.com/iprs-dev/ppar/blob/master/src/bin/perf.rs

