Serde + cbor very slow when used for saving and loading game maps

In the game I'm working on, I have a set of images representing the foreground and background, as well as entity placements around the map. I allow any image format supported by the Image crate. This is a convenient format for map-making, but I have a separate utility which precompiles the map into its canonical structure and serializes it to a file:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct Graphics {
    pub z0: Vec<Color>,
}

#[derive(Serialize, Deserialize)]
pub struct Map {
    pub width: usize,
    pub height: usize,
    pub graphics: Graphics,
    pub player_spawn_points: [Position; 8],
    pub weapon_spawn_points: Vec<(Position, Weapon)>,
}

Currently I'm just testing with only the foreground included and it takes 15 seconds on a release build to load and deserialize this structure for a 3,264px x 2,448px image. Loading the original image as an uncompressed bitmap only takes 3 seconds.

I'm sure there are more clever ways to represent the map graphics, but even so the time taken here is very surprising. Am I choosing the wrong tool for reading in and writing out a large amount of binary data?

perf output:

-  100.00%        map_compiler
   -   46.79%        map_compiler
          15.76%        [.] serde_cbor::de::Deserializer<R>::parse_value
          12.87%        [.] <serde_cbor::read::IoRead<R> as serde_cbor::read::Read>::read_into
          12.31%        [.] serde_cbor::read::IoRead<R>::next_inner
           5.84%        [.] serde_cbor::de::Deserializer<R>::parse_array
           0.01%        [.] <serde_cbor::read::IoRead<R> as serde_cbor::read::Read>::read_to_buffer
           0.00%        [.] core::str::from_utf8
   -   41.50%        [unknown]
          41.08%        [k] 0xffffffffa9400163
           0.42%        [k] 0xffffffffa9400b07
   -   11.68%
          11.68%        [.] __libc_read
   +    0.02%
   +    0.01%

It does seem that about half the time is spent on syscalls according to time as well. I stripped out everything from the map_compiler binary except the actual deserialization call:

fn main() {
    let b: Map = serde_cbor::from_reader(std::fs::File::open("maps/zorf").unwrap()).unwrap();
}

target/release/map_compiler  8.60s user 8.32s system 99% cpu 16.938 total

I've swapped in bincode and it's much, much faster.

Profile the (de)serializer. I'm guessing it's not (de)serializing a contiguous flat byte buffer, instead it's trying to store and parse a huge vec of numbers, each number on its own. I don't know what Color is, but if it's a primitive, then you could look into serde_bytes to tell the (de)serializer you have a flat memory buffer.
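To make the "flat memory buffer" idea concrete, here is a minimal stdlib-only sketch. The `Color` layout (four `u8` channels) and both helper functions are assumptions for illustration, since the post does not show how `Color` is defined. Once the pixels live in a plain `Vec<u8>`, that field could be annotated with `#[serde(with = "serde_bytes")]` so the format can treat it as a single byte string rather than an array of separate numbers.

```rust
/// Hypothetical pixel type; the original post does not show how `Color` is defined.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}

/// Flatten pixels into one contiguous byte buffer (4 bytes per pixel).
pub fn to_flat_bytes(pixels: &[Color]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(pixels.len() * 4);
    for p in pixels {
        bytes.extend_from_slice(&[p.r, p.g, p.b, p.a]);
    }
    bytes
}

/// Rebuild pixels from a flat buffer.
/// Returns `None` if the length is not a multiple of 4.
pub fn from_flat_bytes(bytes: &[u8]) -> Option<Vec<Color>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| Color { r: c[0], g: c[1], b: c[2], a: c[3] })
            .collect(),
    )
}
```

With this layout the serializer sees one long byte string instead of millions of individually tagged integers, which is the difference the reply is pointing at.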

It looks like you’re not using buffered input, the deserialiser may be doing a bunch of small reads to parse each element. Try wrapping the file in a BufReader and see if that affects the performance.
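A rough way to see why buffering matters, assuming the deserializer pulls its input in very small pieces: the sketch below counts how often the underlying reader is hit with and without a `BufReader`. `CountingReader`, `read_byte_by_byte`, and `count_reads` are made-up names for this illustration, and the byte-at-a-time loop is only a stand-in for a parser, not serde_cbor's actual read pattern.

```rust
use std::io::{self, BufReader, Read};

/// Wraps a reader and counts how many times `read` is called on it.
/// Each call stands in for one `read` syscall on a real `File`.
pub struct CountingReader<R> {
    pub inner: R,
    pub calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Consume `reader` one byte at a time, the way a parser might pull
/// small items, and return the number of bytes read.
pub fn read_byte_by_byte<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok(total)
}

/// Returns (calls without buffering, calls with `BufReader`)
/// for `len` bytes of in-memory input.
pub fn count_reads(len: usize) -> io::Result<(usize, usize)> {
    let data = vec![0u8; len];

    let mut unbuffered = CountingReader { inner: data.as_slice(), calls: 0 };
    read_byte_by_byte(&mut unbuffered)?;

    let mut counted = CountingReader { inner: data.as_slice(), calls: 0 };
    read_byte_by_byte(BufReader::new(&mut counted))?;

    Ok((unbuffered.calls, counted.calls))
}
```

Applied to the code in the question, the change would be to pass `std::io::BufReader::new(std::fs::File::open("maps/zorf").unwrap())` to `serde_cbor::from_reader` instead of the bare `File`.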

I'm fairly certain your program is busy allocating memory for your Vec<Color>, because the Vec is created through deserialize, not deserialize_in_place. The latter lets you create the Vec with Vec::with_capacity yourself and then pass it into the method through a mutable reference. If you know the exact length, you won't have to reallocate even once, which speeds up deserialization tremendously and minimizes peak memory usage during deserialization. You may end up writing custom Serialize and Deserialize implementations for Graphics that also encode the length somehow. When I started customizing the Deserialize implementation for my own data structures, I used cargo expand to show the generated code, which I then copied and pasted into the file before removing #[derive(Deserialize)]. You'll have to install the command first, though.

P.S.: I'm still in the middle of rewriting my own Deserialize implementation, because I only found out about deserialize_in_place yesterday; otherwise I would've shown you a working code example.
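This is not a deserialize_in_place implementation, only a small stdlib sketch of the effect described above: a Vec that starts empty has to reallocate as it grows, while one created with `Vec::with_capacity` does not. `count_reallocations` is a hypothetical helper for this illustration, and whether serde_cbor actually under-allocates in the original program is not established in this thread.

```rust
/// Push `n` bytes into `vec` and count how often its capacity changes,
/// i.e. how often the backing buffer had to be reallocated.
pub fn count_reallocations(mut vec: Vec<u8>, n: usize) -> usize {
    let mut reallocations = 0;
    let mut last_capacity = vec.capacity();
    for i in 0..n {
        vec.push(i as u8);
        if vec.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = vec.capacity();
        }
    }
    reallocations
}
```

Calling it with `Vec::with_capacity(n)` should report zero reallocations for `n` pushes, while `Vec::new()` reports at least one; the exact number depends on the growth strategy of the standard library.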


Regarding the size of the Vec<Color>, one advantage of bincode is that it always knows the size of vectors beforehand.
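Knowing the size beforehand can be pictured as a length prefix in front of the raw data. The following is a minimal sketch of that idea using only the standard library; it is not bincode's implementation, and the function names and the 8-byte little-endian prefix are assumptions chosen for the example.

```rust
use std::io::{self, Read, Write};

/// Write `bytes` as an 8-byte little-endian length followed by the raw data.
pub fn write_len_prefixed<W: Write>(mut out: W, bytes: &[u8]) -> io::Result<()> {
    out.write_all(&(bytes.len() as u64).to_le_bytes())?;
    out.write_all(bytes)
}

/// Read the length first, allocate exactly once, then fill the buffer.
/// Note: this trusts the stored length, so it is only suitable for
/// files you produced yourself.
pub fn read_len_prefixed<R: Read>(mut input: R) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 8];
    input.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes) as usize;

    let mut bytes = vec![0u8; len];
    input.read_exact(&mut bytes)?;
    Ok(bytes)
}
```

Because the reader learns the length up front, it needs a single allocation and a single bulk read for the payload, instead of growing the Vec element by element.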

These are all great responses for understanding why there's such a difference. The profiling of the CBOR deserialization showed that it's spending a lot of time on sign-extended copy operations. I'm not sure why sign extension is necessary for these bytes.

Also, I hadn't considered cargo-expand.


Your suggestion made the loading at least 40 times faster over several runs. Combined with the speedup from switching to bincode, this is extremely fast.