HashMap vs. BTreeMap


Why did serde-json and toml both decide to use BTreeMap rather than HashMap for their object/table types? What makes HashMap better for general-purpose use, and BTreeMap better for this specific case?


I checked just now and BTreeMap is significantly faster than HashMap for many of the benchmarks in our json-benchmark. (In the table lower is better.)

  BTreeMap HashMap
parse canada.json 10.9ms 11.9ms
stringify canada.json 11.3ms 11.6ms
parse citm_catalog.json 5.5ms 11.2ms
stringify citm_catalog.json 1.4ms 3.3ms
parse twitter.json 2.5ms 2.9ms
stringify twitter.json 0.6ms 0.9ms

Notice though that the serde_json public API does not expose BTreeMap. Everything is done through an opaque serde_json::Map which is backed by either a BTreeMap or a LinkedHashMap depending on cfgs. We are free to back it with some other type of map in a future minor version if we find something better.

But generally BTreeMap is a nice safe default for serde_json::Value. It is nice to have map entries serialized in a predictable (sorted) order. Generally people who have more specialized requirements will use derive(Serialize, Deserialize) instead of serde_json::Value, or deserialize directly into a map type of their choice (possibly HashMap with a fast hasher) without going through serde_json::Value.


What hash function are you using in HashMap? Default Rust one, or faster like FNV?


Second thing to notice here is that most benchmarks have only small objects (few keys) and large arrays. So difference between HashMap and BTreeMap might not manifest in right way.