[Solved] Bincode: 135MB HashMap in 4 seconds, normal?

  1. I have a struct that looks like:
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Serialize, Deserialize)]
pub struct WordU32Freq1Entry {
    pub idx: u32,
    pub cnt: u32,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct WordU32Freq1 {
    pub storage: HashMap<String, WordU32Freq1Entry>,
}
  2. I have an object of type WordU32Freq1 that is 135MB. I am running "cargo test" via
        cargo test --release b_stem_to_wf1::test_full_stage2 -- --nocapture

  3. This takes:
real    0m3.952s
user    0m3.244s
sys     0m0.670s

  4. Is the above normal? I'm reading from an SSD, so I'm expecting this to be IO-bound and done in < 0.1 s.

  5. This code is already being run in --release.

  6. Is this normal for bincode, or am I doing something stupid?

I did some more benchmarking, and got:

reading contents: 80ms
decoding: 2.87 s

The 2.87 s bincode decode seems very slow for data that is already in memory.

It's not normal, unless you're reading straight from disk and your disk is some old HDD or maybe an SD card on a Raspberry Pi.

In case you're using File, remember it's slow for many small reads unless you wrap it in BufReader (or read the whole file into memory first).
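For instance, a rough sketch (using bincode 1.x's reader-based deserialize_from; the file name is just the one from this thread):

    use std::fs::File;
    use std::io::BufReader;

    // Buffered reads: bincode's many small reads hit an in-memory
    // buffer instead of issuing a syscall each time.
    let file = File::open("wu32f.bin").unwrap();
    let x: WordU32Freq1 = bincode::deserialize_from(BufReader::new(file)).unwrap();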


Hi @kornel, I don't know if you saw the previous message.

80ms read time for 125MB actually seems perfectly reasonable. It's the 2.87 s decoding that I think is unreasonable. The code looks like:

    let n0 = Instant::now();
    let content = FileUtil::read_file_as_u8( &format!("{}/wu32f.bin", in_dir) );

    let n1 = Instant::now();
    let x: WordU32Freq1 = deserialize( &content ).unwrap();

    let n2 = Instant::now();

    println!("Reading took: {:?}", n1.duration_since(n0) );
    println!("Decoding took: {:?}", n2.duration_since(n1) );

One more run:

Reading took: 80.793881ms
Decoding took: 2.901920444s

If I run:

    let n0 = Instant::now();
    let content = FileUtil::read_file_as_u8( &format!("{}/wu32f.bin", in_dir) );

    let n1 = Instant::now();
    let x: WordU32Freq1 = deserialize( &content ).unwrap();

    let n2 = Instant::now();

    println!("Reading took: {:?}", n1.duration_since(n0) );
    println!("Decoding took: {:?}", n2.duration_since(n1) );

    let n0 = Instant::now();
    let content = FileUtil::read_file_as_u8( &format!("{}/AA.bin", in_dir) );
    let n1 = Instant::now();
    let aa: Vec<Vec<String>> = deserialize( &content ).unwrap();
    let n2 = Instant::now();

    println!("Reading took: {:?}", n1.duration_since(n0) );
    println!("Decoding took: {:?}", n2.duration_since(n1) );

I get:

Reading took: 79.107565ms
Decoding took: 2.905976255s
Reading took: 79.896077ms
Decoding took: 1.3838087s

Both files are ~150MB. It seems decoding a bincode Vec<Vec<String>> is twice as fast as decoding a HashMap?

Both 150MB chunks take ~80ms to read, which I'm okay with. The deserialize time, though, I'm still unhappy with.

Try running it through a profiler (you can enable debug info in release builds, or optimization in debug builds - see Cargo profiles). Rust works with the same profilers as C.
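For example, to get symbols in a release build, something like this in Cargo.toml should do (a sketch):

    [profile.release]
    # Keep debug info so the profiler can map samples to function names.
    debug = true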

I'm using msgpack, which is sort of similar, and it's nearly instant for 500MB files.


Do you have a minimal example I can copy/paste?
I'm looking at https://github.com/3Hren/msgpack-rust and a bit confused by

rmp, rmp-serialize, rmps, rmpv

I think https://github.com/3Hren/msgpack-rust/blob/master/rmp-serde/tests/decode_derive.rs is closest to what I want, but it's not clear to me how the 4 libs interact.

Also, why is there rmp-serialize, which uses rustc-serialize, which appears to be deprecated in favor of serde?

I don't use 4 libs, only the one for Serde that serializes and deserializes Serde's structs. I can't give links - I'm on mobile currently.
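Rough sketch from memory (untested): with rmp-serde the round trip is along the lines of

    use serde::{Serialize, Deserialize};

    #[derive(Serialize, Deserialize, Debug, PartialEq)]
    struct Point { x: i32, y: i32 }

    // Round-trip through MessagePack bytes.
    let p = Point { x: 1, y: 2 };
    let bytes = rmp_serde::to_vec(&p).unwrap();
    let back: Point = rmp_serde::from_slice(&bytes).unwrap();
    assert_eq!(p, back);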


I think I got msgpack working. I get:

Reading took: 3.840637ms
Decoding took: 139.259831ms
Reading took: 38.557177ms
Decoding took: 1.846417399s


wu32f.bin is now only 4.5MB, which explains why it reads faster; AA.bin is down to 70MB (from 150MB).

The decoding time is still rather high, though.

EDIT: the improvement in the first benchmark is due to the file being smaller (~50-100x smaller). I was running it on a smaller part of the dataset.

The part that is still concerning is the decoding time (after all the data is in memory), even with msgpack, for the Vec<Vec<String>>.

At this point, I'm hypothesizing that it may just be the overhead of creating lots of short strings. (We're doing one string per English word.)


    use rmp_serde::Deserializer;
    use serde::Deserialize;

    let n0 = Instant::now();
    let content = FileUtil::read_file_as_u8( &format!("{}/wu32f.bin", in_dir) );

    let n1 = Instant::now();
    let x: WordU32Freq1 = Deserialize::deserialize( &mut Deserializer::new(&content[..]) ).unwrap();

    let n2 = Instant::now();

    println!("HashMap IO took: {:?}", n1.duration_since(n0) );
    println!("HashMap Decoding took: {:?}", n2.duration_since(n1) );

    let n0 = Instant::now();
    let content = FileUtil::read_file_as_u8( &format!("{}/AA.bin", in_dir) );
    let n1 = Instant::now();
    let aa: Vec<Vec<String>> = Deserialize::deserialize( &mut Deserializer::new(&content[..]) ).unwrap();
    let n2 = Instant::now();

    println!("VecVec IO took: {:?}", n1.duration_since(n0) );
    println!("VecVec Decoding took: {:?}", n2.duration_since(n1) );

    println!("HashMap: {}", x.storage.len());

    let mut cnt = 0;
    for i in aa {
        cnt += i.len();
    }

    println!("VecVec String Count: {}", cnt);

This gives timings of:

HashMap IO took: 61.161098ms
HashMap Decoding took: 3.222107978s
VecVec IO took: 48.210029ms
VecVec Decoding took: 1.592420886s
HashMap: 6067165
VecVec String Count: 12921909

The HashMap is ~100MB, while the VecVec is 70MB.

At this point, I'm going to just attribute the cost to the high number of entries in the HashMap / VecVec.
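(A quick way to sanity-check the short-string hypothesis in isolation would be something like this sketch, which I haven't actually timed:)

    use std::time::Instant;

    // Allocate ~6M short Strings with no HashMap involved, to isolate
    // the cost of string allocation alone.
    let start = Instant::now();
    let words: Vec<String> = (0..6_000_000).map(|i| i.to_string()).collect();
    println!("allocated {} strings in {:?}", words.len(), start.elapsed());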

I would guess that practically all of the time is in HashMap insertion. Practically none of it should be due to bincode, MessagePack, or string allocations, which take almost no time on this scale.

I would recommend benchmarking in a way that allows you to compare deserialization time against the time for purely inserting the same data (already in memory) into an equivalent hash map. Example:

use serde::{Serialize, Deserialize};
use std::collections::HashMap as Map;
use std::time::Instant;

#[derive(Debug, Serialize, Deserialize)]
struct WordU32Freq1Entry {
    idx: u32,
    cnt: u32,
}

#[derive(Debug, Serialize, Deserialize)]
struct WordU32Freq1 {
    storage: Map<String, WordU32Freq1Entry>,
}

fn main() {
    let mut freq = WordU32Freq1 {
        storage: Map::default(),
    };
    for i in 0..6_000_000 {
        let entry = WordU32Freq1Entry { idx: i, cnt: i };
        freq.storage.insert(i.to_string(), entry);
    }

    let bytes = bincode::serialize(&freq).unwrap();
    assert_eq!(bytes.len(), 136_888_898);

    let start_deserialization = Instant::now();
    let freq: WordU32Freq1 = bincode::deserialize(&bytes).unwrap();
    println!("deserialization: {:?}", start_deserialization.elapsed());

    let entries: Vec<_> = freq.storage.into_iter().collect();

    let start_insertions = Instant::now();
    let mut storage: Map<String, WordU32Freq1Entry> = Map::default();
    for (key, value) in entries {
        storage.insert(key, value);
    }
    println!("insertions only: {:?}", start_insertions.elapsed());
}

You should find that those times are almost the same, which means insertion is taking almost all the time.

I would expect around 2-3x faster on this workload (depending on your exact distribution of key sizes) if you don't need resistance to hash collision attacks and can switch to fnv or some other fast hasher.
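For example, something like this sketch (assuming the fnv crate as a dependency; Serde's HashMap impls are generic over the hasher, so the deserialization calls stay the same):

    use fnv::FnvHashMap; // alias for HashMap<K, V, FnvBuildHasher>
    use serde::{Serialize, Deserialize};

    #[derive(Debug, Serialize, Deserialize)]
    pub struct WordU32Freq1Entry {
        pub idx: u32,
        pub cnt: u32,
    }

    #[derive(Debug, Serialize, Deserialize)]
    pub struct WordU32Freq1 {
        // Same struct as before, just with a faster, non-DoS-resistant hasher.
        pub storage: FnvHashMap<String, WordU32Freq1Entry>,
    }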


Good suggestion. I wrote code to rebuild the HashMap and VecVec from scratch (no refs / sharing). I get:

running 1 test
HashMap IO took: 64.607594ms
HashMap Decoding took: 3.263006263s
VecVec IO took: 42.217772ms
VecVec Decoding took: 1.676483586s
HashMap: 6067165
VecVec String Count: 12921909
HashMap Rebuild took: 2.938261858s
VecVec Rebuild took: 1.235551057s

So yeah, looks like a significant chunk of the time is spent building the HashMap / VecVec.