For 2-3 weeks now,I have been trying to implement a high performance concurrent ordered map called Masstree in Rust (the original is written in C++ and has some practical use too as much as I am aware). I need some advice for the following problems. I have been lurking for a month on this forum so I know that there are some very experienced regulars here :'D
My first problem is that, I am not sure about what the standard crates/techniques I should use/know about if I am working on a very complex concurrent data structure (like a trie of B+trees in this case)? I used miri with the strict provenance flag to catch bad memory access patterns, leaks and potential UB's. I have stress tests and benches. I tried loom and shuttle (have basic tests working, but struggle to model complex scenarios). What else could I be using to test the soundness and stability of my implementation? I tried using perf and cargo-flamegraph to find the bottlenecks, but I can't get them to work properly.
I currently have some very rare transient test failures in concurrent stress tests and for write ops I am outperforming other data structures in under 8 threads, but going above that leads to some very complex and subtle issues (like leaf split storms, excessive CAS retry contentions etc). For example, at 16 threads, fastest write is 40ms but slowest is 5 seconds (120x variance). I am trying to fix them by adding logs and checking where the logical flow is going wrong. But this is becoming too hard and unmaintainable.
I will appreciate any suggestions on a more sustainable design/development workflow. I want to implement it seriously and I sort of feel optimistic about it becoming a crate that others might find useful, especially after some unusually impressive benchmark results (I need some advice here too to make them more fair and rigorous, and to ensure that I am not misinterpreting things here). Here is the repo , if someone is interested but needs to take a look at the code to suggest what the proper tools may be in this case.
If you couldn't get cargo-flamegraph to run, I'm sure you can get help here if you post the errors.
An approach that has helped me is to add validation code that checks all the internal invariants you can think of (and panics with context info when it detects a problem), and enable it during long running stress tests. The validation will slow it down and reduce concurrency, and that might even prevent the problems from happening, but if it does find a problem you'll have a good clue about what's wrong.
I have these, A LOT :'D ... but hanks for bringing it up again, it made me think about adding some tree-wide validations, which might be the thing that I actually need here!
Loom is a testing tool for concurrent Rust code. It runs a test many times, permuting the possible concurrent executions of that test under the C11 memory model. It uses state reduction techniques to avoid combinatorial explosion.
They mentioned using loom and shuttle (and not being able to model complicated scenarios).
I've used loom though not shuttle, and while I had to keep thread counts really low in order for the number of possibilities in my loom tests to not explode so much that it'd basically never finish the tests, the fact that every way the threads can interact (through loom's mock types, in a particular test) is tested means that loom is still very useful to me.
Seems like OP already knows much more than me on this topic, though, so I don't think I'm able to offer any advice.
I tried them, but they seem to be too much work at this point compared to just feature gated logging and debug counters for sanity checks and validation. But I may need to look into it more seriously after I have got the whole structure implemented and working.