Why does a Rust binary take so much space?

I've already applied the common suggestions:

[profile.release]
opt-level = 'z'
lto = true
panic = 'abort'

And still, my 400-line Rust application takes 334 KB after stripping, with musl libc statically linked.

I've read that a statically linked musl binary can be as small as 10 KB.

cargo bloat -n 30 --release --target x86_64-unknown-linux-musl
    Finished release [optimized] target(s) in 0.02s
    Analyzing target/x86_64-unknown-linux-musl/release/morr

 File  .text     Size       Crate Name
 0.8%   5.3%  13.1KiB         std std::process::Command::output
 0.5%   3.4%   8.4KiB         std _ZN9libunwind10CFI_ParserINS_17LocalAddressSpaceEE17parseInstructionsERS1_mmRKNS2_8CIE_InfoEmRPNS2_20PrologInfoStackEntryEiPNS2_10PrologInfoE
 0.5%   3.3%   8.2KiB        morr morr::main
 0.5%   2.8%   7.0KiB         std std::sync::once::Once::call_once::{{closure}}
 0.4%   2.7%   6.7KiB   crossterm <crossterm::event::source::unix::UnixInternalEventSource as crossterm::event::source::EventSource>::try_read
 0.4%   2.3%   5.6KiB   [Unknown] elf_add
 0.3%   2.1%   5.3KiB   [Unknown] dwarf_lookup_pc
 0.3%   1.7%   4.2KiB         std rustc_demangle::v0::Printer::print_type
 0.3%   1.6%   4.0KiB         std alloc::collections::btree::map::BTreeMap<K,V>::insert
 0.3%   1.6%   4.0KiB         std <backtrace::symbolize::SymbolName as core::fmt::Display>::fmt
 0.3%   1.6%   4.0KiB   [Unknown] elf_zlib_inflate_and_verify
 0.2%   1.5%   3.8KiB         std _ZN9libunwind17DwarfInstructionsINS_17LocalAddressSpaceENS_16Registers_x86_64EE13stepWithDwarfERS1_mmRS2_
 0.2%   1.4%   3.4KiB         std rustc_demangle::try_demangle
 0.2%   1.3%   3.1KiB        morr morr::draw
 0.2%   1.2%   2.9KiB   [Unknown] fmt_fp
 0.2%   1.1%   2.8KiB         std _ZN9libunwind17DwarfInstructionsINS_17LocalAddressSpaceENS_16Registers_x86_64EE18evaluateExpressionEmRS1_RKS2_m
 0.2%   1.0%   2.5KiB   crossterm alloc::sync::Arc<T>::drop_slow
 0.2%   1.0%   2.5KiB         std std::sys_common::backtrace::output_filename
 0.2%   1.0%   2.5KiB         std __rdos_backtrace_dwarf_add
 0.2%   1.0%   2.5KiB   [Unknown] printf_core
 0.1%   0.9%   2.3KiB         mio mio::poll::Poll::poll
 0.1%   0.9%   2.2KiB         std _ZN9libunwind14EHHeaderParserINS_17LocalAddressSpaceEE7findFDEERS1_mmjPNS_10CFI_ParserIS1_E8FDE_InfoEPNS5_8CIE_InfoE
 0.1%   0.9%   2.2KiB         std core::str::pattern::StrSearcher::new
 0.1%   0.9%   2.2KiB         std rustc_demangle::v0::Printer::print_path
 0.1%   0.9%   2.1KiB   [Unknown] read_function_entry
 0.1%   0.8%   2.0KiB   crossterm crossterm::terminal::sys::unix::tput_value
 0.1%   0.8%   1.9KiB         std _ZN9libunwind10CFI_ParserINS_17LocalAddressSpaceEE8parseCIEERS1_mPNS2_8CIE_InfoE
 0.1%   0.7%   1.7KiB         std std::sync::once::Once::call_inner
 0.1%   0.6%   1.6KiB signal_hook alloc::collections::btree::map::BTreeMap<K,V>::insert
 0.1%   0.6%   1.6KiB         std std::panicking::rust_panic_with_hook
 8.3%  51.6% 128.0KiB             And 895 smaller methods. Use -n N to show more.
16.1% 100.0% 248.0KiB             .text section size, the file size is 1.5MiB

It's an implementation of the less utility in Rust: source code

I want to know the purpose of each byte in the binary, and cargo bloat doesn't answer that question.

Not that this answers your question, but if you are really concerned about file size for executables you could try out upx.

Just download it and run:

upx --brute myprogram

It can reach insane compression ratios, reducing the binary size by a factor of 5 depending on the program, but it can also increase startup time depending on the system you run it on. So obviously you want to start by making sure the uncompressed program is small.

crossterm is 4600 lines of code plus up to 20,000 lines in its dependencies.

You can add -n to cargo-bloat to display more methods.

It would be cool to hook that data up to a treemap:

Uh, yeah, that would be awesome. :sunglasses:

I've tried -n 100, and it hasn't clarified a single thing for me.

You can see that most of the space is taken by a giant .text section:
16.1% 100.0% 248.0KiB .text section size, the file size is 1.5MiB.

Why is it there?

I believe .text sections are sections which contain code (as opposed to static data or binary metadata). cargo-bloat only analyzes the size of compiled code, so it only analyzes the .text section.

It is fairly surprising to see that the code makes up only 16% of the binary, but I guess function names, static strings, and other statically included data are fairly large?
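To see what every byte is for, not just .text, you can read the ELF section header table yourself. Below is a minimal std-only sketch, assuming a 64-bit little-endian ELF such as the x86_64-unknown-linux-musl build above; the name `elf64_section_sizes` is made up for illustration, and it skips the validation (and extended section numbering) that tools like readelf -S or objdump -h handle. Note that the sizes won't add up to the file size: .bss reports a size but occupies no bytes in the file, and the ELF headers and padding belong to no section.

```rust
use std::convert::TryInto;

/// Copy N bytes starting at `off`, or None if that range is out of bounds.
fn rd<const N: usize>(b: &[u8], off: usize) -> Option<[u8; N]> {
    b.get(off..off.checked_add(N)?)?.try_into().ok()
}
fn u16_at(b: &[u8], off: usize) -> Option<u16> { rd(b, off).map(u16::from_le_bytes) }
fn u32_at(b: &[u8], off: usize) -> Option<u32> { rd(b, off).map(u32::from_le_bytes) }
fn u64_at(b: &[u8], off: usize) -> Option<u64> { rd(b, off).map(u64::from_le_bytes) }

/// The i-th 64-byte ELF64 section header, as its own slice.
fn section_header(b: &[u8], shoff: usize, shentsize: usize, i: usize) -> Option<&[u8]> {
    let start = shoff.checked_add(i.checked_mul(shentsize)?)?;
    b.get(start..start.checked_add(0x40)?)
}

/// (section name, sh_size in bytes) for every section of a 64-bit
/// little-endian ELF file. Returns None on anything it can't parse.
/// Sketch only: ignores extended numbering (e_shnum == 0 with many sections).
pub fn elf64_section_sizes(b: &[u8]) -> Option<Vec<(String, u64)>> {
    // e_ident: magic, EI_CLASS = 2 (64-bit), EI_DATA = 1 (little-endian).
    if b.get(0..4)? != b"\x7fELF" || *b.get(4)? != 2 || *b.get(5)? != 1 {
        return None;
    }
    let shoff = u64_at(b, 0x28)? as usize;     // e_shoff
    let shentsize = u16_at(b, 0x3A)? as usize; // e_shentsize
    let shnum = u16_at(b, 0x3C)? as usize;     // e_shnum
    let shstrndx = u16_at(b, 0x3E)? as usize;  // e_shstrndx

    // sh_offset (at 0x18) of the string table that holds section names.
    let strtab_off = u64_at(section_header(b, shoff, shentsize, shstrndx)?, 0x18)? as usize;

    let mut out = Vec::with_capacity(shnum);
    for i in 0..shnum {
        let sh = section_header(b, shoff, shentsize, i)?;
        let name_off = u32_at(sh, 0x00)? as usize; // sh_name
        let size = u64_at(sh, 0x20)?;              // sh_size
        // Names are NUL-terminated strings inside the string table.
        let start = strtab_off.checked_add(name_off)?;
        let name = b.get(start..)?;
        let end = name.iter().position(|&c| c == 0)?;
        out.push((String::from_utf8_lossy(&name[..end]).into_owned(), size));
    }
    Some(out)
}
```

Feeding it `std::fs::read("target/x86_64-unknown-linux-musl/release/morr")` and sorting the result by size gives a per-section breakdown similar to objdump -h, including the non-code sections cargo-bloat ignores.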


It's worth noting that while rustc does care about binary size, it is not explicitly optimized for it. Many of the design choices, such as the way generics work, are built on the assumption that storage space is not a bottleneck.

It isn't impossible to get a small binary in Rust, but it's not going to be anywhere near as easy as getting a tiny binary in C. To achieve something similar, you'll probably want to choose your dependencies carefully and write your code with size in mind.

If you really do need tiny binaries, it's possible. Rust has been used for embedded devices, and it can be coaxed into producing tinier binaries, especially if you leave out the Rust standard library and depend only on core. GitHub - HugoPlatzer/tiny-rust-binaries is one useful exploration of that, and James Munns might be a good read for writing general-purpose tiny binaries as well.

Still, I want to stress, if you don't need this - if you're running on a regular server, and space isn't at a premium - getting a tiny binary probably won't be worth the extra pain.

Since I've specified panic = 'abort' and I don't use static strings myself, why would they take up space? Function names? Which names exactly? And why haven't they been optimized away? I don't want to guess; I want to know what's going on, be in control, and know that I've done a reasonably good job of minimizing the binary. And for that, I need to know what improving the situation would actually cost.

Do you also manually run strip on it? Rust puts some debug information even in release mode.

Even with panic = 'abort', panic!("still prints this") prints its message before aborting.

You should be able to use the size command to see where the memory is allocated (text, data, bss).
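The text/data/bss split that size prints is computed per section from its flags. Here is a sketch of the rule as I understand GNU size's default (Berkeley) format to apply it, using the ELF flag values from the System V ABI; the `Section` struct and `berkeley_size` function are hypothetical names. One consequence worth knowing: read-only data such as .rodata is counted as "text", so size's text figure can be larger than cargo-bloat's .text number.

```rust
// ELF constants from the System V ABI.
const SHF_WRITE: u64 = 0x1;
const SHF_ALLOC: u64 = 0x2;
const SHF_EXECINSTR: u64 = 0x4;
const SHT_NOBITS: u32 = 8;

/// The three section-header fields that matter for `size` (hypothetical type).
pub struct Section {
    pub flags: u64, // sh_flags
    pub kind: u32,  // sh_type
    pub size: u64,  // sh_size
}

/// (text, data, bss) totals:
/// - sections not loaded at runtime (debug info, symbol tables) are skipped;
/// - executable or read-only sections count as text;
/// - writable sections with file contents count as data;
/// - writable NOBITS sections (.bss) count as bss.
pub fn berkeley_size(sections: &[Section]) -> (u64, u64, u64) {
    let (mut text, mut data, mut bss) = (0, 0, 0);
    for s in sections {
        if s.flags & SHF_ALLOC == 0 {
            continue;
        }
        if s.flags & SHF_EXECINSTR != 0 || s.flags & SHF_WRITE == 0 {
            text += s.size;
        } else if s.kind != SHT_NOBITS {
            data += s.size;
        } else {
            bss += s.size;
        }
    }
    (text, data, bss)
}
```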

On macOS, you can use gobjdump -x path/to/main to dump everything included in the binary. (You might need to download it from somewhere.) Linux systems have dumpelf, which does something similar. Not sure about Windows...

For me, this gives (for a 251 KB stripped executable):


target/release/morr:     file format mach-o-x86-64
target/release/morr
architecture: i386:x86-64, flags 0x00000002:
EXEC_P
start address 0x00000001000036c0
 MACH-O header:
   magic:      0xfeedfacf
   cputype:    0x1000007 (X86_64)
   cpusubtype: 0x80000003 (LIB64) (X86_ALL)
   filetype:   0x2
   ncmds:      0xa
   sizeocmds:  0x6a0
   flags:      0xa00085
   version:    2

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00034510  0000000100001440  0000000100001440  00001440  2**4
                  CONTENTS, ALLOC, LOAD, CODE
  1 __TEXT.__stubs 0000029a  0000000100035950  0000000100035950  00035950  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 __TEXT.__stub_helper 00000466  0000000100035bec  0000000100035bec  00035bec  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .const        00003280  0000000100036060  0000000100036060  00036060  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .cstring      00000614  00000001000392e0  00000001000392e0  000392e0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 __TEXT.__unwind_info 00000700  00000001000398f4  00000001000398f4  000398f4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 __DATA_CONST.__got 00000010  000000010003a000  000000010003a000  0003a000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  7 __DATA_CONST.__const 000021a0  000000010003a010  000000010003a010  0003a010  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  8 __DATA.__la_symbol_ptr 00000378  000000010003d000  000000010003d000  0003d000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  9 __DATA.__thread_vars 000000d8  000000010003d378  000000010003d378  0003d378  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 10 .data         00000320  000000010003d450  000000010003d450  0003d450  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 11 __DATA.__thread_data 00000170  000000010003d770  000000010003d770  0003d770  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 12 __DATA.__thread_bss 00000090  000000010003d8e0  000000010003d8e0  0003d8e0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 13 .bss          00000310  000000010003d980  000000010003d980  00000000  2**6
                  ALLOC
SYMBOL TABLE:
no symbols

0x35950 = ~214 KiB, so it looks like most of this is in that first .text section.
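A quick check of that arithmetic against the section table: 0x35950 is the file offset where __TEXT.__stubs begins, which equals the .text file offset (0x1440) plus the .text size (0x34510). So the ~214 KiB figure includes the header bytes in front of .text; .text itself is 0x34510 = 214,288 bytes, about 209 KiB. Either way, code is the bulk of the file. The constants below are copied from the .text row of the listing; the helper names are just for illustration.

```rust
/// Bytes to KiB (1 KiB = 1024 bytes).
pub fn kib(bytes: u64) -> f64 {
    bytes as f64 / 1024.0
}

// Copied from the `.text` row of the gobjdump -x output.
pub const TEXT_FILE_OFF: u64 = 0x1440;
pub const TEXT_SIZE: u64 = 0x34510;

/// File offset just past the end of .text, where __TEXT.__stubs starts.
pub fn text_end() -> u64 {
    TEXT_FILE_OFF + TEXT_SIZE
}
```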

With the unstripped binary, dumping debug info using gobjdump -d -g, I'm seeing that most of this is code from the std library: allocation, formatting, slices, etc.

A quick count gives some rough proportions:

rg std dump.txt | wc -l ➜ 3333
rg core dump.txt | wc -l ➜ 2943
rg crossterm dump.txt | wc -l ➜ 483
rg morr dump.txt | wc -l ➜ 376
rg memchr dump.txt | wc -l ➜ 96
rg memmap dump.txt | wc -l ➜ 3
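These counts are matching lines in the disassembly dump, not bytes, which is why they only give rough proportions; a line that mentions both std and core counts toward both. For anyone scripting this in Rust, a minimal equivalent of `rg PATTERN dump.txt | wc -l` for a literal (non-regex) pattern could look like this; `count_lines_containing` is a hypothetical helper.

```rust
/// Count lines containing `needle` as a plain substring.
/// Roughly what `rg needle file | wc -l` reports for a literal pattern.
pub fn count_lines_containing(text: &str, needle: &str) -> usize {
    text.lines().filter(|line| line.contains(needle)).count()
}
```

Usage would be along the lines of `count_lines_containing(&std::fs::read_to_string("dump.txt")?, "crossterm")`.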

About half of the bin size is the statically linked std, which does not get the dead code treatment from LTO. You can work around this by following the instructions here: GitHub - johnthagen/min-sized-rust: 🦀 How to minimize Rust binary size 📦

Before and after results:

$ cargo build --release && strip target/release/morr && ls -l target/release/morr
    Finished release [optimized] target(s) in 0.10s
-rwxr-xr-x  1 jay  staff  286628 Apr 17 00:59 target/release/morr*

$ xargo build --target x86_64-apple-darwin --release && strip target/x86_64-apple-darwin/release/morr && ls -l target/x86_64-apple-darwin/release/morr
    Finished release [optimized] target(s) in 0.04s
-rwxr-xr-x  1 jay  staff  162464 Apr 17 01:00 target/x86_64-apple-darwin/release/morr*

Now that it's down to 159 KiB, let's see what cargo-bloat has to say:

$ xargo bloat --release --target x86_64-apple-darwin -n 30 -w
    Finished release [optimized] target(s) in 0.06s
    Analyzing target/x86_64-apple-darwin/release/morr

 File  .text     Size                Crate Name
 5.3%  10.2%  12.2KiB                  std std::sys::unix::process::process_inner::<impl std::sys::unix::process::process_common::Command>::spawn
 3.2%   6.1%   7.3KiB                 morr morr::main
 2.8%   5.3%   6.4KiB            crossterm <crossterm::event::source::unix::UnixInternalEventSource as crossterm::event::source::EventSource>::try_read
 2.7%   5.1%   6.2KiB                  std std::sync::once::Once::call_once::{{closure}}
 1.7%   3.2%   3.9KiB            crossterm alloc::sync::Arc<T>::drop_slow
 1.7%   3.2%   3.9KiB            [Unknown] __mh_execute_header
 1.5%   2.8%   3.4KiB            crossterm crossterm::terminal::sys::unix::tput_value
 1.2%   2.4%   2.9KiB                 morr morr::draw
 1.1%   2.2%   2.6KiB                  mio mio::poll::Poll::poll
 1.1%   2.1%   2.5KiB                  std alloc::collections::btree::map::BTreeMap<K,V>::insert
 1.0%   1.8%   2.2KiB                  std ___rust_probestack
 0.7%   1.3%   1.5KiB          signal_hook alloc::collections::btree::map::BTreeMap<K,V>::insert
 0.7%   1.3%   1.5KiB                  std core::ptr::drop_in_place
 0.5%   1.0%   1.3KiB                 morr morr::line_reader::LineReader::read
 0.5%   1.0%   1.2KiB signal_hook_registry signal_hook_registry::GlobalData::load
 0.5%   0.9%   1.1KiB                  std <core::str::lossy::Utf8LossyChunksIter as core::iter::traits::iterator::Iterator>::next
 0.5%   0.9%   1.0KiB                  mio hashbrown::raw::RawTable<T>::reserve_rehash
 0.5%   0.9%   1.0KiB                  std std::sync::once::Once::call_inner
 0.5%   0.9%   1.0KiB                 morr morr::line_reader::LineReader::read_forw
 0.4%   0.8%   1.0KiB          signal_hook hashbrown::raw::RawTable<T>::reserve_rehash
 0.4%   0.8%     926B                  std <std::io::Write::write_fmt::Adaptor<T> as core::fmt::Write>::write_str
 0.4%   0.7%     922B            [Unknown] _main
 0.4%   0.7%     892B signal_hook_registry signal_hook_registry::handler
 0.4%   0.7%     879B     parking_lot_core parking_lot_core::word_lock::WordLock::lock_slow
 0.3%   0.6%     765B     parking_lot_core parking_lot_core::parking_lot::HashTable::new
 0.3%   0.6%     714B                  std <std::ffi::os_str::OsString as core::fmt::Debug>::fmt
 0.3%   0.6%     696B                  std core::str::slice_error_fail
 0.3%   0.5%     671B                  std std::panicking::rust_panic_with_hook
 0.3%   0.5%     664B                  std std::sys::unix::process::process_common::Stdio::to_child_stdio
 0.3%   0.5%     621B                  std std::sys::unix::fs::File::open_c
23.6%  45.3%  54.4KiB                      And 695 smaller methods. Use -n N to show more.
52.2% 100.0% 120.2KiB                      .text section size, the file size is 230.4KiB

There is still a lot of crossterm and std code in the bin. Let's see which crates actually take up the most space:

$ xargo bloat --release --target x86_64-apple-darwin -n 0 --message-format json | jq '[ .functions | group_by(.crate)[] | { crate: ([ .[].crate ] | unique)[], size: (map(.size) | add) } ] | sort_by(.size) | reverse'
    Finished release [optimized] target(s) in 0.04s
    Analyzing target/x86_64-apple-darwin/release/morr

[
  {
    "crate": "std",
    "size": 72384
  },
  {
    "crate": "crossterm",
    "size": 17664
  },
  {
    "crate": "morr",
    "size": 13375
  },
  {
    "crate": "mio",
    "size": 6478
  },
  {
    "crate": "signal_hook_registry",
    "size": 6112
  },
  {
    "crate": null,
    "size": 5089
  },
  {
    "crate": "signal_hook",
    "size": 3432
  },
  {
    "crate": "parking_lot_core",
    "size": 3172
  },
  {
    "crate": "std?",
    "size": 902
  },
  {
    "crate": "log",
    "size": 214
  },
  {
    "crate": "morr?",
    "size": 181
  },
  {
    "crate": "memchr",
    "size": 124
  },
  {
    "crate": "arc_swap",
    "size": 56
  },
  {
    "crate": "parking_lot",
    "size": 44
  },
  {
    "crate": "memmap",
    "size": 13
  }
]

Side note: Wow! jq is insanely powerful! I need to learn how to use it efficiently...

So there you go, a complete breakdown of which crates contribute to bin size! std accounts for 71 KiB, and crossterm 17 KiB. To dig in further, you will need to look at the call graph (e.g. find which crates are using so much of std).
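The jq pipeline is a group-by-and-sum. If you'd rather stay in Rust, here is a sketch of the same aggregation over already-parsed entries. Parsing the JSON itself would need a crate such as serde_json, which is left out here, and `Func`/`size_by_crate` are hypothetical names that model only the crate and size fields the jq filter reads. Unlike the jq version, ties are broken by crate name so the order is deterministic.

```rust
use std::collections::HashMap;

/// One entry of the "functions" array, reduced to the two fields jq uses.
/// `crate_name` is None where cargo-bloat reports `"crate": null`.
pub struct Func {
    pub crate_name: Option<String>,
    pub size: u64,
}

/// Sum sizes per crate, largest first (like `group_by | add | sort_by | reverse`).
pub fn size_by_crate(funcs: &[Func]) -> Vec<(Option<String>, u64)> {
    let mut totals: HashMap<Option<String>, u64> = HashMap::new();
    for f in funcs {
        *totals.entry(f.crate_name.clone()).or_insert(0) += f.size;
    }
    let mut v: Vec<_> = totals.into_iter().collect();
    // Descending by size; ties sorted by name for a stable result.
    v.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    v
}
```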

For example, let's answer the following question: why is std::process::Command being used (the biggest function)? I used Hopper to disassemble the bin, but objdump or whatever could work, too. Turns out, crossterm::terminal::sys::unix::tput_value() is responsible, which is called by terminal::size(), which you call here: morr/main.rs at 3d7abdad201c9644084ce072dc9bcf4b10438fca · minnimumm/morr · GitHub

In other words, you can easily shave off 12.2 KiB (or more) just by not calling that function! Or better yet, if you replaced crossterm with something like crosscurses, you could get the terminal size without a sub-shell call. This is by no means a silver bullet, of course; ncurses has its own flavor of bloat.
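For reference, getting the terminal size without spawning a subprocess usually comes down to the TIOCGWINSZ ioctl on the terminal's file descriptor. A minimal crate-free sketch, assuming Linux or macOS: the request numbers are platform-specific (0x5413 on Linux x86/x86_64/ARM, 0x40087468 on macOS; other platforms differ), and `unsafe extern` needs Rust 1.82 or newer. In a real project, the libc crate provides `ioctl`, `winsize`, and `TIOCGWINSZ`, so you don't have to hard-code them.

```rust
use std::os::raw::{c_int, c_ulong};

/// Mirrors C's `struct winsize`, which TIOCGWINSZ fills in.
#[repr(C)]
#[derive(Default)]
#[allow(dead_code)] // pixel fields exist only to match the C layout
struct Winsize {
    ws_row: u16,
    ws_col: u16,
    ws_xpixel: u16,
    ws_ypixel: u16,
}

// ASSUMPTION: request numbers for Linux (x86/x86_64/ARM) and macOS only.
#[cfg(target_os = "linux")]
const TIOCGWINSZ: c_ulong = 0x5413;
#[cfg(target_os = "macos")]
const TIOCGWINSZ: c_ulong = 0x4008_7468;

unsafe extern "C" {
    // glibc/macOS prototype: int ioctl(int fd, unsigned long request, ...);
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// (columns, rows) of the terminal on `fd`, or None if `fd` isn't a terminal.
pub fn terminal_size(fd: c_int) -> Option<(u16, u16)> {
    let mut ws = Winsize::default();
    // SAFETY: TIOCGWINSZ writes exactly one `struct winsize` through the pointer.
    let rc = unsafe { ioctl(fd, TIOCGWINSZ, &mut ws as *mut Winsize) };
    if rc == 0 && ws.ws_col > 0 {
        Some((ws.ws_col, ws.ws_row))
    } else {
        None
    }
}
```

Calling `terminal_size(1)` asks about stdout; it returns None when output is piped to a file.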

Yes I run strip. 1.5 MB unstripped ~300k stripped.

Finally a helpful answer! Thanks a lot for your breakdown; I'll try to apply the suggestions. I definitely want to stay as Rust-native as possible and don't want to use anything based on curses, so maybe I'll create a pull request for crossterm.

Wow, that's an awesome breakdown! Thanks for that. That could be useful to know.

No problem. I have investigated these kinds of binary bloat problems before. Surprisingly, clap is the biggest offender I've seen so far... No joke. Check out any of the examples in the clap repo:

[
  {
    "crate": "clap",
    "size": 100637
  },
  {
    "crate": null,
    "size": 46160
  },
  {
    "crate": "std",
    "size": 27984
  },
  {
    "crate": "termcolor",
    "size": 6539
  },
  {
    "crate": "clap?",
    "size": 5928
  },
  {
    "crate": "_20_subcommands",
    "size": 5518
  }
]

98 KiB just for argument parsing... :woman_facepalming:

Part of the skill in making these kinds of optimizations is finding viable alternatives to straight up bloaty crates.
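When all you need is a couple of flags, a hand-rolled parser over std::env::args costs almost nothing in binary size. A minimal sketch for a hypothetical pager with one flag and one file argument (these are not morr's actual options). It deliberately has no --help, no combined short flags, and no suggestions for typos, which is the kind of functionality a full-featured parser like clap pays for.

```rust
/// Hypothetical options for a small pager; not morr's real CLI.
#[derive(Debug, Default, PartialEq)]
pub struct Opts {
    pub line_numbers: bool,
    pub file: Option<String>,
}

/// Parse arguments (program name already skipped).
pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Opts, String> {
    let mut opts = Opts::default();
    for arg in args {
        if arg == "-N" || arg == "--line-numbers" {
            opts.line_numbers = true;
        } else if arg.starts_with('-') && arg != "-" {
            return Err(format!("unknown option: {}", arg));
        } else if opts.file.is_none() {
            // First positional argument: the file to page ("-" meaning stdin).
            opts.file = Some(arg);
        } else {
            return Err(format!("unexpected extra argument: {}", arg));
        }
    }
    Ok(opts)
}
```

In `main` you would call `parse_args(std::env::args().skip(1))`.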

What program did you use to generate this treemap?

In fairness to clap, they did look at this problem and back in Jan 2018 achieved a 57% decrease in size. And there's this quote:

I would however kind of like to keep things in perspective. While I agree binary sizes should be as small as possible, clap does a lot especially when compared to minimal arg parsers like getopts, pirate, or argparse where you end up implementing all those features manually and those implementations won't get counted against smaller arg parsing libs. So comparing binary sizes of arg parsers isn't super fair because it's not a true apples to apples comparison, even though it sounds like it should be.

I've never done a detailed comparison myself, and of course if you care about your binary size it's completely reasonable to trade a "batteries included" crate for a more minimalist/low-level crate, but we probably shouldn't dismiss the former as "bloaty" unless we can point to some source of binary size that really is unnecessary for the functionality they provide. And they're still thinking about opt-in features to trade functionality for size.

That's a screenshot from webpack-bundle-analyzer - npm

In Rust, this layout can be computed with Treemap - Rust implementation // Lib.rs
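If you'd rather not pull in a crate, a basic layout is short. This sketch uses a simple recursive binary split: divide the items into two groups of roughly equal total size, cut the rectangle along its longer side in that ratio, and recurse. It is not necessarily the algorithm the treemap crate implements, and `Rect`/`binary_treemap` are hypothetical names. Each item gets a rectangle whose area is proportional to its size, returned in input order.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

/// One rectangle per entry of `sizes` (non-negative), in input order.
pub fn binary_treemap(sizes: &[f64], r: Rect) -> Vec<Rect> {
    let mut out = vec![r; sizes.len()];
    split(sizes, r, &mut out);
    out
}

fn split(sizes: &[f64], r: Rect, out: &mut [Rect]) {
    if sizes.len() <= 1 {
        if let Some(slot) = out.first_mut() {
            *slot = r;
        }
        return;
    }
    let total: f64 = sizes.iter().sum();
    // Choose the cut index whose left-hand total is closest to half.
    let (mut acc, mut cut, mut best) = (0.0, 1, f64::INFINITY);
    for k in 1..sizes.len() {
        acc += sizes[k - 1];
        let diff = (acc - total / 2.0).abs();
        if diff < best {
            best = diff;
            cut = k;
        }
    }
    let left: f64 = sizes[..cut].iter().sum();
    let frac = if total > 0.0 { left / total } else { 0.5 };
    // Cut across the longer side to keep rectangles closer to square.
    let (a, b) = if r.w >= r.h {
        let w1 = r.w * frac;
        (Rect { w: w1, ..r }, Rect { x: r.x + w1, w: r.w - w1, ..r })
    } else {
        let h1 = r.h * frac;
        (Rect { h: h1, ..r }, Rect { y: r.y + h1, h: r.h - h1, ..r })
    };
    split(&sizes[..cut], a, &mut out[..cut]);
    split(&sizes[cut..], b, &mut out[cut..]);
}
```

Fed with the per-crate sizes from the jq output, this gives the rectangles to draw; rendering them is a separate step.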

Oh, cool that there is a crate for that already. Then you could use basic drawing tools with a crate like image and the font-kit library for rendering text on top of the squares in the tree map, and then build that into a cargo bloat subcommand.