Why did serde-json
and toml
both decide to use BTreeMap
rather than HashMap
for their object/table types? What makes HashMap
better for general-purpose use, and BTreeMap
better for this specific case?
I checked just now and BTreeMap is significantly faster than HashMap for many of the benchmarks in our json-benchmark
. (In the table lower is better.)
BTreeMap | HashMap | |
---|---|---|
parse canada.json
|
10.9ms | 11.9ms |
stringify canada.json
|
11.3ms | 11.6ms |
parse citm_catalog.json
|
5.5ms | 11.2ms |
stringify citm_catalog.json
|
1.4ms | 3.3ms |
parse twitter.json
|
2.5ms | 2.9ms |
stringify twitter.json
|
0.6ms | 0.9ms |
Notice though that the serde_json public API does not expose BTreeMap. Everything is done through an opaque serde_json::Map
which is backed by either a BTreeMap or a LinkedHashMap depending on cfgs. We are free to back it with some other type of map in a future minor version if we find something better.
But generally BTreeMap is a nice safe default for serde_json::Value
. It is nice to have map entries serialized in a predictable (sorted) order. Generally people who have more specialized requirements will use derive(Serialize, Deserialize)
instead of serde_json::Value
, or deserialize directly into a map type of their choice (possibly HashMap with a fast hasher) without going through serde_json::Value
.
What hash function are you using in HashMap? Default Rust one, or faster like FNV?
Second thing to notice here is that most benchmarks have only small objects (few keys) and large arrays. So difference between HashMap and BTreeMap might not manifest in right way.
I just tried to build a json library where you can decide everything. It does not even depend on "std".
Maybe not optimized enough for now.