Why does a Rust binary take so much space?

I've tried -n 100 and it hasn't clarified a single thing for me.

You can see that most of the space is one giant .text section:

16.1% 100.0% 248.0KiB .text section size, the file size is 1.5MiB

Why is it there?

I believe .text sections are sections which contain code (as opposed to static data or binary metadata). cargo-bloat only analyzes the size of compiled code, so it only analyzes the .text section.

It is fairly surprising to see that the code makes up only 16% of the binary, but I guess function names, static strings and other statically included data are fairly large?
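If you want to see what the other ~84% actually is, one generic way (nothing cargo-bloat specific; assuming GNU binutils on Linux or the macOS size tool, with mybin as a placeholder name) is to list the section sizes directly:

$ size -A target/release/mybin      # Linux: every section with its size
$ readelf -S target/release/mybin   # Linux: full section headers (symbol tables, debug info, ...)
$ size -m target/release/mybin      # macOS: per-segment / per-section breakdown

In an unstripped release binary, symbol tables and debug info usually account for a large share of the difference.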


It's worth noting that while rustc does care about binary size, it is not explicitly optimized for it. Many of the design choices, such as the way generics work, are built on the assumption that storage space is not a bottleneck.

It isn't impossible to get a small binary in Rust. But it's not going to be anywhere near as easy as in C, and to achieve something similar you'll probably want to choose your dependencies carefully and write your code with that in mind.

If you really do need tiny binaries, it's possible. Rust has been used for embedded devices, and it can be coaxed into producing tinier binaries, especially if you leave out the Rust standard library and only depend on core. https://github.com/HugoPlatzer/tiny-rust-binaries is one useful exploration of that, and https://jamesmunns.com/blog/tinyrocket/ might be a good read for writing general-purpose tiny binaries as well.

Still, I want to stress, if you don't need this - if you're running on a regular server, and space isn't at a premium - getting a tiny binary probably won't be worth the extra pain.
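If you do go down that road, the usual first steps are just release-profile settings; a sketch of a size-focused profile (each line trades something - compile time, unwinding, backtraces - for size):

[profile.release]
opt-level = "z"     # optimize for size rather than speed
lto = true          # cross-crate dead-code elimination and inlining
codegen-units = 1   # better optimization, slower compiles
panic = "abort"     # drop the unwinding machinery
strip = true        # strip symbols/debug info (needs a recent Cargo; otherwise run strip yourself)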

3 Likes

Since I've specified panic = 'abort' and I don't use static strings myself, why would they take up space? Function names?... Which names exactly? And why haven't they been optimized away?... I don't want to guess, I want to know what's going on, be in control, and know that I've done a reasonably good job of minimizing the binary. And to know that, I need to know what improving the situation would actually cost.

Do you also manually run strip on it? Rust puts some debug information in the binary even in release mode.

1 Like

Even with panic = 'abort', panic!("still prints this") still prints its message before aborting, so that string (and the formatting code behind it) stays in the binary.

1 Like

You should be able to use the size command to see where the memory is allocated (text, data, bss.)

On macOS, you can use gobjdump -x path/to/main to dump everything included in the binary. (You might need to install GNU binutils to get it.) Linux systems have dumpelf, which does something similar. Not sure about Windows...

For me, this gives (a 251Kb stripped executable):


target/release/morr:     file format mach-o-x86-64
target/release/morr
architecture: i386:x86-64, flags 0x00000002:
EXEC_P
start address 0x00000001000036c0
 MACH-O header:
   magic:      0xfeedfacf
   cputype:    0x1000007 (X86_64)
   cpusubtype: 0x80000003 (LIB64) (X86_ALL)
   filetype:   0x2
   ncmds:      0xa
   sizeocmds:  0x6a0
   flags:      0xa00085
   version:    2

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00034510  0000000100001440  0000000100001440  00001440  2**4
                  CONTENTS, ALLOC, LOAD, CODE
  1 __TEXT.__stubs 0000029a  0000000100035950  0000000100035950  00035950  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 __TEXT.__stub_helper 00000466  0000000100035bec  0000000100035bec  00035bec  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .const        00003280  0000000100036060  0000000100036060  00036060  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .cstring      00000614  00000001000392e0  00000001000392e0  000392e0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 __TEXT.__unwind_info 00000700  00000001000398f4  00000001000398f4  000398f4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 __DATA_CONST.__got 00000010  000000010003a000  000000010003a000  0003a000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  7 __DATA_CONST.__const 000021a0  000000010003a010  000000010003a010  0003a010  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  8 __DATA.__la_symbol_ptr 00000378  000000010003d000  000000010003d000  0003d000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  9 __DATA.__thread_vars 000000d8  000000010003d378  000000010003d378  0003d378  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 10 .data         00000320  000000010003d450  000000010003d450  0003d450  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 11 __DATA.__thread_data 00000170  000000010003d770  000000010003d770  0003d770  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 12 __DATA.__thread_bss 00000090  000000010003d8e0  000000010003d8e0  0003d8e0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 13 .bss          00000310  000000010003d980  000000010003d980  00000000  2**6
                  ALLOC
SYMBOL TABLE:
no symbols

0x35950 = ~214Kb, so it looks like most of this is in that first .text section.

With the unstripped binary, dumping debug info using gobjdump -d -g, I'm seeing that most of this is code from the std library: allocating, formatting, slices, etc.

A quick count gives some rough proportions:

rg std dump.txt | wc -l ➜ 3333
rg core dump.txt | wc -l ➜ 2943
rg crossterm dump.txt | wc -l ➜ 483
rg morr dump.txt | wc -l ➜ 376
rg memchr dump.txt | wc -l ➜ 96
rg memmap dump.txt | wc -l ➜ 3

2 Likes

About half of the bin size is the statically linked std, which does not get the dead-code treatment from LTO. You can work around this by following the instructions here: https://github.com/johnthagen/min-sized-rust#optimize-libstd-with-xargo
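If you'd rather not install xargo, nightly cargo's build-std can do the same thing; a sketch, with flags taken from that min-sized-rust guide (adjust the target triple to yours):

$ cargo +nightly build --release \
      -Z build-std=std,panic_abort \
      -Z build-std-features=panic_immediate_abort \
      --target x86_64-apple-darwin

The panic_immediate_abort feature also drops the panic-message formatting code mentioned above.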

Before and after results:

$ cargo build --release && strip target/release/morr && ls -l target/release/morr
    Finished release [optimized] target(s) in 0.10s
-rwxr-xr-x  1 jay  staff  286628 Apr 17 00:59 target/release/morr*

$ xargo build --target x86_64-apple-darwin --release && strip target/x86_64-apple-darwin/release/morr && ls -l target/x86_64-apple-darwin/release/morr
    Finished release [optimized] target(s) in 0.04s
-rwxr-xr-x  1 jay  staff  162464 Apr 17 01:00 target/x86_64-apple-darwin/release/morr*

Now that it's down to 159 KiB, let's see what cargo-bloat has to say:

$ xargo bloat --release --target x86_64-apple-darwin -n 30 -w
    Finished release [optimized] target(s) in 0.06s
    Analyzing target/x86_64-apple-darwin/release/morr

 File  .text     Size                Crate Name
 5.3%  10.2%  12.2KiB                  std std::sys::unix::process::process_inner::<impl std::sys::unix::process::process_common::Command>::spawn
 3.2%   6.1%   7.3KiB                 morr morr::main
 2.8%   5.3%   6.4KiB            crossterm <crossterm::event::source::unix::UnixInternalEventSource as crossterm::event::source::EventSource>::try_read
 2.7%   5.1%   6.2KiB                  std std::sync::once::Once::call_once::{{closure}}
 1.7%   3.2%   3.9KiB            crossterm alloc::sync::Arc<T>::drop_slow
 1.7%   3.2%   3.9KiB            [Unknown] __mh_execute_header
 1.5%   2.8%   3.4KiB            crossterm crossterm::terminal::sys::unix::tput_value
 1.2%   2.4%   2.9KiB                 morr morr::draw
 1.1%   2.2%   2.6KiB                  mio mio::poll::Poll::poll
 1.1%   2.1%   2.5KiB                  std alloc::collections::btree::map::BTreeMap<K,V>::insert
 1.0%   1.8%   2.2KiB                  std ___rust_probestack
 0.7%   1.3%   1.5KiB          signal_hook alloc::collections::btree::map::BTreeMap<K,V>::insert
 0.7%   1.3%   1.5KiB                  std core::ptr::drop_in_place
 0.5%   1.0%   1.3KiB                 morr morr::line_reader::LineReader::read
 0.5%   1.0%   1.2KiB signal_hook_registry signal_hook_registry::GlobalData::load
 0.5%   0.9%   1.1KiB                  std <core::str::lossy::Utf8LossyChunksIter as core::iter::traits::iterator::Iterator>::next
 0.5%   0.9%   1.0KiB                  mio hashbrown::raw::RawTable<T>::reserve_rehash
 0.5%   0.9%   1.0KiB                  std std::sync::once::Once::call_inner
 0.5%   0.9%   1.0KiB                 morr morr::line_reader::LineReader::read_forw
 0.4%   0.8%   1.0KiB          signal_hook hashbrown::raw::RawTable<T>::reserve_rehash
 0.4%   0.8%     926B                  std <std::io::Write::write_fmt::Adaptor<T> as core::fmt::Write>::write_str
 0.4%   0.7%     922B            [Unknown] _main
 0.4%   0.7%     892B signal_hook_registry signal_hook_registry::handler
 0.4%   0.7%     879B     parking_lot_core parking_lot_core::word_lock::WordLock::lock_slow
 0.3%   0.6%     765B     parking_lot_core parking_lot_core::parking_lot::HashTable::new
 0.3%   0.6%     714B                  std <std::ffi::os_str::OsString as core::fmt::Debug>::fmt
 0.3%   0.6%     696B                  std core::str::slice_error_fail
 0.3%   0.5%     671B                  std std::panicking::rust_panic_with_hook
 0.3%   0.5%     664B                  std std::sys::unix::process::process_common::Stdio::to_child_stdio
 0.3%   0.5%     621B                  std std::sys::unix::fs::File::open_c
23.6%  45.3%  54.4KiB                      And 695 smaller methods. Use -n N to show more.
52.2% 100.0% 120.2KiB                      .text section size, the file size is 230.4KiB

There is still a lot of crossterm and std code in the bin. Let's see which crates actually take up the most space:

$ xargo bloat --release --target x86_64-apple-darwin -n 0 --message-format json | jq '[ .functions | group_by(.crate)[] | { crate: ([ .[].crate ] | unique)[], size: (map(.size) | add) } ] | sort_by(.size) | reverse'
    Finished release [optimized] target(s) in 0.04s
    Analyzing target/x86_64-apple-darwin/release/morr

[
  {
    "crate": "std",
    "size": 72384
  },
  {
    "crate": "crossterm",
    "size": 17664
  },
  {
    "crate": "morr",
    "size": 13375
  },
  {
    "crate": "mio",
    "size": 6478
  },
  {
    "crate": "signal_hook_registry",
    "size": 6112
  },
  {
    "crate": null,
    "size": 5089
  },
  {
    "crate": "signal_hook",
    "size": 3432
  },
  {
    "crate": "parking_lot_core",
    "size": 3172
  },
  {
    "crate": "std?",
    "size": 902
  },
  {
    "crate": "log",
    "size": 214
  },
  {
    "crate": "morr?",
    "size": 181
  },
  {
    "crate": "memchr",
    "size": 124
  },
  {
    "crate": "arc_swap",
    "size": 56
  },
  {
    "crate": "parking_lot",
    "size": 44
  },
  {
    "crate": "memmap",
    "size": 13
  }
]

Side note: Wow! jq is insanely powerful! I need to learn how to use it efficiently...

So there you go, a complete breakdown of which crates contribute to bin size! std accounts for 71 KiB, and crossterm 17 KiB. To dig in further, you will need to look at the call graph (e.g. find which crates are using so much of std).

For example, let's answer the following question: why is std::process::Command being used (the biggest function)? I used Hopper to disassemble the bin, but objdump or whatever could work, too. Turns out, crossterm::terminal::sys::unix::tput_value() is responsible, which is called by terminal::size(), which you call here: https://github.com/minnimumm/morr/blob/3d7abdad201c9644084ce072dc9bcf4b10438fca/src/main.rs#L161

In other words, you can easily shave off 12.2 KiB (or more) just by not calling that function! Or better yet, if you replaced crossterm with something like crosscurses you can actually get the terminal size without a sub-shell call! This is by no means a silver bullet, of course, ncurses has its own flavor of bloat.
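For the curious, the subprocess-free way to get the terminal size on Unix is a single ioctl. A rough sketch (assumes the libc crate and a glibc/macOS target - some platforms need a cast on the request constant - and this is just the general technique, not crossterm's actual code):

// Ask the terminal for its window size via TIOCGWINSZ instead of spawning `tput`.
fn term_size() -> Option<(u16, u16)> {
    let mut ws = libc::winsize { ws_row: 0, ws_col: 0, ws_xpixel: 0, ws_ypixel: 0 };
    // ioctl returns 0 on success and fills `ws` with the columns/rows of stdout's terminal.
    let ok = unsafe { libc::ioctl(libc::STDOUT_FILENO, libc::TIOCGWINSZ, &mut ws) } == 0;
    if ok { Some((ws.ws_col, ws.ws_row)) } else { None }
}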

14 Likes

Yes, I run strip. 1.5 MB unstripped, ~300 KB stripped.

Finally a helpful answer! Thanks a lot for your breakdown, I will try to apply the suggestions. I definitely want to stay as Rust-native as possible and don't want to use anything based on curses, so maybe I'll create a pull request to crossterm.

Wow, that's an awesome breakdown! Thanks for that. That could be useful to know.

No problem. I have investigated these kinds of binary bloat problems before. Surprisingly, clap is the biggest offender I've seen so far... No joke. Check out any of the examples in the clap repo:

[
  {
    "crate": "clap",
    "size": 100637
  },
  {
    "crate": null,
    "size": 46160
  },
  {
    "crate": "std",
    "size": 27984
  },
  {
    "crate": "termcolor",
    "size": 6539
  },
  {
    "crate": "clap?",
    "size": 5928
  },
  {
    "crate": "_20_subcommands",
    "size": 5518
  }
]

98 KiB just for argument parsing... :woman_facepalming:

Part of the skill in making these kinds of optimizations is finding viable alternatives to straight up bloaty crates.

1 Like

What program did you use to generate this treemap?

In fairness to clap, they did look at this problem and back in Jan 2018 achieved a 57% decrease in size. And there's this quote:

I would however kind of like to keep things in perspective. While I agree binary sizes should be as small as possible, clap does a lot especially when compared to minimal arg parsers like getopts, pirate, or argparse where you end up implementing all those features manually and those implementations won't get counted against smaller arg parsing libs. So comparing binary sizes of arg parsers isn't super fair because it's not a true apples to apples comparison, even though it sounds like it should be.

I've never done a detailed comparison myself, and of course if you care about your binary size it's completely reasonable to trade a "batteries included" crate for a more minimalist/low-level crate, but we probably shouldn't dismiss the former as "bloaty" unless we can point to some source of binary size that really is unnecessary for the functionality they provide. And they're still thinking about opt-in features to trade functionality for size.

4 Likes

That's a screenshot from https://www.npmjs.com/package/webpack-bundle-analyzer

In Rust, this layout can be computed with https://lib.rs/treemap

1 Like

Oh, cool that there is a crate for that already. Then you could use basic drawing tools with a crate like image and the font-kit library for rendering text on top of the squares in the tree map, and then build that into a cargo bloat subcommand.

It may be easier to print SVG.

1 Like

Good point. That would probably be more useful anyway as you could still render the SVG if you wanted a bitmap image.
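To make that concrete, here's a toy sketch - not a real squarified treemap, just one row of rectangles with widths proportional to crate size, using only std, with sizes hard-coded from the cargo-bloat JSON above:

fn crates_to_svg(crates: &[(&str, u64)]) -> String {
    let total: u64 = crates.iter().map(|&(_, size)| size).sum();
    let (width, height) = (800.0_f64, 120.0_f64);
    let mut svg = format!(r#"<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">"#);
    let mut x = 0.0;
    for &(name, size) in crates {
        // Each crate gets a horizontal slice proportional to its share of the total.
        let w = width * size as f64 / total as f64;
        svg.push_str(&format!(
            r#"<rect x="{x:.1}" y="0" width="{w:.1}" height="{height}" fill="steelblue" stroke="white"/>"#
        ));
        svg.push_str(&format!(r#"<text x="{:.1}" y="20" font-size="10">{name}</text>"#, x + 3.0));
        x += w;
    }
    svg.push_str("</svg>");
    svg
}

fn main() {
    // Sizes in bytes, taken from the per-crate breakdown earlier in the thread.
    let svg = crates_to_svg(&[("std", 72384), ("crossterm", 17664), ("morr", 13375), ("mio", 6478)]);
    std::fs::write("bloat.svg", svg).unwrap();
}

A proper implementation would use the treemap crate's layout instead of this single-row slicing.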

Dead code elimination would likely help improve the bin sizes. I suspect a lot of existing decision-making code would have to be moved to build-time decisions (features provide a good start; there is a lot more that can be done here, like internal use of build-time macros).
