Why does a Rust binary take so much space?

About half of the binary size is the statically linked std, which LTO does not strip of dead code. You can work around this by following the instructions here: GitHub - johnthagen/min-sized-rust: 🦀 How to minimize Rust binary size 📦

Before and after results:

$ cargo build --release && strip target/release/morr && ls -l target/release/morr
    Finished release [optimized] target(s) in 0.10s
-rwxr-xr-x  1 jay  staff  286628 Apr 17 00:59 target/release/morr*

$ xargo build --target x86_64-apple-darwin --release && strip target/x86_64-apple-darwin/release/morr && ls -l target/x86_64-apple-darwin/release/morr
    Finished release [optimized] target(s) in 0.04s
-rwxr-xr-x  1 jay  staff  162464 Apr 17 01:00 target/x86_64-apple-darwin/release/morr*

Now that it's down to 159 KiB, let's see what cargo-bloat has to say:

$ xargo bloat --release --target x86_64-apple-darwin -n 30 -w
    Finished release [optimized] target(s) in 0.06s
    Analyzing target/x86_64-apple-darwin/release/morr

 File  .text     Size                Crate Name
 5.3%  10.2%  12.2KiB                  std std::sys::unix::process::process_inner::<impl std::sys::unix::process::process_common::Command>::spawn
 3.2%   6.1%   7.3KiB                 morr morr::main
 2.8%   5.3%   6.4KiB            crossterm <crossterm::event::source::unix::UnixInternalEventSource as crossterm::event::source::EventSource>::try_read
 2.7%   5.1%   6.2KiB                  std std::sync::once::Once::call_once::{{closure}}
 1.7%   3.2%   3.9KiB            crossterm alloc::sync::Arc<T>::drop_slow
 1.7%   3.2%   3.9KiB            [Unknown] __mh_execute_header
 1.5%   2.8%   3.4KiB            crossterm crossterm::terminal::sys::unix::tput_value
 1.2%   2.4%   2.9KiB                 morr morr::draw
 1.1%   2.2%   2.6KiB                  mio mio::poll::Poll::poll
 1.1%   2.1%   2.5KiB                  std alloc::collections::btree::map::BTreeMap<K,V>::insert
 1.0%   1.8%   2.2KiB                  std ___rust_probestack
 0.7%   1.3%   1.5KiB          signal_hook alloc::collections::btree::map::BTreeMap<K,V>::insert
 0.7%   1.3%   1.5KiB                  std core::ptr::drop_in_place
 0.5%   1.0%   1.3KiB                 morr morr::line_reader::LineReader::read
 0.5%   1.0%   1.2KiB signal_hook_registry signal_hook_registry::GlobalData::load
 0.5%   0.9%   1.1KiB                  std <core::str::lossy::Utf8LossyChunksIter as core::iter::traits::iterator::Iterator>::next
 0.5%   0.9%   1.0KiB                  mio hashbrown::raw::RawTable<T>::reserve_rehash
 0.5%   0.9%   1.0KiB                  std std::sync::once::Once::call_inner
 0.5%   0.9%   1.0KiB                 morr morr::line_reader::LineReader::read_forw
 0.4%   0.8%   1.0KiB          signal_hook hashbrown::raw::RawTable<T>::reserve_rehash
 0.4%   0.8%     926B                  std <std::io::Write::write_fmt::Adaptor<T> as core::fmt::Write>::write_str
 0.4%   0.7%     922B            [Unknown] _main
 0.4%   0.7%     892B signal_hook_registry signal_hook_registry::handler
 0.4%   0.7%     879B     parking_lot_core parking_lot_core::word_lock::WordLock::lock_slow
 0.3%   0.6%     765B     parking_lot_core parking_lot_core::parking_lot::HashTable::new
 0.3%   0.6%     714B                  std <std::ffi::os_str::OsString as core::fmt::Debug>::fmt
 0.3%   0.6%     696B                  std core::str::slice_error_fail
 0.3%   0.5%     671B                  std std::panicking::rust_panic_with_hook
 0.3%   0.5%     664B                  std std::sys::unix::process::process_common::Stdio::to_child_stdio
 0.3%   0.5%     621B                  std std::sys::unix::fs::File::open_c
23.6%  45.3%  54.4KiB                      And 695 smaller methods. Use -n N to show more.
52.2% 100.0% 120.2KiB                      .text section size, the file size is 230.4KiB

There is still a lot of crossterm and std code in the bin. Let's see which crates actually take up the most space:

$ xargo bloat --release --target x86_64-apple-darwin -n 0 --message-format json | jq '[ .functions | group_by(.crate)[] | { crate: ([ .[].crate ] | unique)[], size: (map(.size) | add) } ] | sort_by(.size) | reverse'
    Finished release [optimized] target(s) in 0.04s
    Analyzing target/x86_64-apple-darwin/release/morr

[
  {
    "crate": "std",
    "size": 72384
  },
  {
    "crate": "crossterm",
    "size": 17664
  },
  {
    "crate": "morr",
    "size": 13375
  },
  {
    "crate": "mio",
    "size": 6478
  },
  {
    "crate": "signal_hook_registry",
    "size": 6112
  },
  {
    "crate": null,
    "size": 5089
  },
  {
    "crate": "signal_hook",
    "size": 3432
  },
  {
    "crate": "parking_lot_core",
    "size": 3172
  },
  {
    "crate": "std?",
    "size": 902
  },
  {
    "crate": "log",
    "size": 214
  },
  {
    "crate": "morr?",
    "size": 181
  },
  {
    "crate": "memchr",
    "size": 124
  },
  {
    "crate": "arc_swap",
    "size": 56
  },
  {
    "crate": "parking_lot",
    "size": 44
  },
  {
    "crate": "memmap",
    "size": 13
  }
]

Side note: Wow! jq is insanely powerful! I need to learn how to use it efficiently...
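If the jq filter is hard to read, here is roughly the same aggregation as a small Rust sketch: group the per-function sizes by crate, sum them, and sort largest first. The names `Row`, `crate_totals` and `kib` are made up for illustration, and it takes already-parsed (crate, size) pairs rather than reading cargo-bloat's JSON.

```rust
use std::collections::HashMap;

/// One per-function entry: (crate name, size in bytes).
/// `None` stands for functions that are not attributed to any crate
/// (the `"crate": null` group in the JSON output).
type Row<'a> = (Option<&'a str>, u64);

/// Sum the function sizes per crate and return the totals, largest first.
fn crate_totals<'a>(rows: &[Row<'a>]) -> Vec<(Option<&'a str>, u64)> {
    let mut totals: HashMap<Option<&'a str>, u64> = HashMap::new();
    for &(krate, size) in rows {
        *totals.entry(krate).or_insert(0) += size;
    }
    let mut sorted: Vec<_> = totals.into_iter().collect();
    // Largest total first; ties are broken by name so the order is stable.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    sorted
}

/// Convert a byte count to KiB.
fn kib(bytes: u64) -> f64 {
    bytes as f64 / 1024.0
}
```

Feeding it the real function list would give the same totals as the jq output, and `kib(72384)` is where a figure like "about 71 KiB" for std comes from.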

So there you go, a complete breakdown of which crates contribute to bin size! std accounts for 71 KiB, and crossterm 17 KiB. To dig in further, you will need to look at the call graph (e.g. find which crates are using so much of std).

For example, let's answer the following question: why is std::process::Command being used (the biggest function)? I used Hopper to disassemble the binary, but objdump or any other disassembler would work too. It turns out that crossterm::terminal::sys::unix::tput_value() is responsible. It is called by terminal::size(), which you call here: morr/main.rs at 3d7abdad201c9644084ce072dc9bcf4b10438fca · minnimumm/morr · GitHub
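To make it concrete why a function like that drags in std's process-spawning code, here is a minimal sketch of what a tput-based lookup could look like. This is not crossterm's actual source. `parse_tput_output` is a hypothetical helper, and this `tput_value` only borrows the name of the function mentioned above.

```rust
use std::process::Command;

/// Parse the text printed by `tput cols` or `tput lines` into a positive number.
fn parse_tput_output(stdout: &[u8]) -> Option<u16> {
    let text = std::str::from_utf8(stdout).ok()?;
    match text.trim().parse::<u16>() {
        Ok(n) if n > 0 => Some(n),
        _ => None,
    }
}

/// Ask the external `tput` program for a capability such as "cols".
/// Running a child process here is what pulls `Command::spawn`
/// (and everything it depends on) into the final binary.
fn tput_value(capability: &str) -> Option<u16> {
    let output = Command::new("tput").arg(capability).output().ok()?;
    parse_tput_output(&output.stdout)
}
```

Even a single call site like this is enough for the linker to keep the whole spawn machinery.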

In other words, you can easily shave off 12.2 KiB (or more) just by not calling that function! Or better yet, if you replaced crossterm with something like crosscurses, you could get the terminal size without spawning a subprocess at all! This is by no means a silver bullet, of course: ncurses has its own flavor of bloat.
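For reference, one way to get the terminal size without spawning anything is to ask the terminal driver directly with ioctl(TIOCGWINSZ). The sketch below declares the C function by hand so it needs no extra crates. `Winsize` and `terminal_size` are names I made up, and the request codes are assumptions that only cover Linux on common architectures and macOS.

```rust
use std::os::raw::{c_int, c_ulong, c_ushort};

/// Mirrors C's `struct winsize` from <sys/ioctl.h>.
#[repr(C)]
#[derive(Default)]
#[allow(dead_code)] // the pixel fields are filled in by the kernel but unused here
struct Winsize {
    ws_row: c_ushort,
    ws_col: c_ushort,
    ws_xpixel: c_ushort,
    ws_ypixel: c_ushort,
}

// "Get window size" request code. The value is platform specific;
// these two are assumptions for Linux (x86/ARM) and macOS only.
#[cfg(target_os = "linux")]
const TIOCGWINSZ: c_ulong = 0x5413;
#[cfg(target_os = "macos")]
const TIOCGWINSZ: c_ulong = 0x4008_7468;

extern "C" {
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Returns (columns, rows) for the terminal behind `fd`,
/// or `None` if `fd` is not a terminal or reports a zero size.
fn terminal_size(fd: c_int) -> Option<(u16, u16)> {
    let mut ws = Winsize::default();
    // SAFETY: `ws` is a valid, writable `struct winsize` for the whole call.
    let rc = unsafe { ioctl(fd, TIOCGWINSZ, &mut ws as *mut Winsize) };
    if rc == 0 && ws.ws_col > 0 && ws.ws_row > 0 {
        Some((ws.ws_col, ws.ws_row))
    } else {
        None
    }
}
```

Calling `terminal_size(1)` queries stdout. In a real project the libc crate's definitions would be a safer choice than hand-written constants.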
