HashMap vs. BTreeMap


#1

Why did serde-json and toml both decide to use BTreeMap rather than HashMap for their object/table types? What makes HashMap better for general-purpose use, and BTreeMap better for this specific case?


#2

I checked just now and BTreeMap is significantly faster than HashMap for many of the benchmarks in our json-benchmark. (In the table lower is better.)

  BTreeMap HashMap
parse canada.json 10.9ms 11.9ms
stringify canada.json 11.3ms 11.6ms
parse citm_catalog.json 5.5ms 11.2ms
stringify citm_catalog.json 1.4ms 3.3ms
parse twitter.json 2.5ms 2.9ms
stringify twitter.json 0.6ms 0.9ms

Notice though that the serde_json public API does not expose BTreeMap. Everything is done through an opaque serde_json::Map which is backed by either a BTreeMap or a LinkedHashMap depending on cfgs. We are free to back it with some other type of map in a future minor version if we find something better.

But generally BTreeMap is a nice safe default for serde_json::Value. It is nice to have map entries serialized in a predictable (sorted) order. Generally people who have more specialized requirements will use derive(Serialize, Deserialize) instead of serde_json::Value, or deserialize directly into a map type of their choice (possibly HashMap with a fast hasher) without going through serde_json::Value.


#3

What hash function are you using in HashMap? Default Rust one, or faster like FNV?


#4

Second thing to notice here is that most benchmarks have only small objects (few keys) and large arrays. So difference between HashMap and BTreeMap might not manifest in right way.