I'm trying to use Rust for a bare-metal x86 program that must fit in a 14KiB space. (It's part of a bootloader.) So far, I've been successful, and my program is about 10KB, but core::panicking is giving me trouble. rustc really wants to output a bunch of libcore formatting code (about 6KB) into my binary, and when it does, my program exceeds its budget.
(FWIW, I'm compiling with -C lto -C opt-level=2.)
Many Rust language and library features call a panic function, which eventually calls Formatter::pad or Formatter::pad_integral. The format argument is always "{}", so width and precision are None, and pad exits early on a fast path. The compiler isn't smart enough, though, so it outputs the entirety of pad and pad_integral. pad needs to truncate and/or pad UTF-8 strings, and ultimately pulls in functions for UTF-8 encoding and decoding.
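To make the "width and precision are None" point concrete, here is a minimal sketch. `Probe` is a hypothetical type invented for illustration, not something in libcore. Its `Display` implementation reports whether the formatter was handed a width or precision, and it writes its output with `write_str` so that it never goes through `pad` itself.

```rust
use core::fmt;

/// Hypothetical probe type (not part of libcore): reports which
/// formatting options the Formatter was given.
pub struct Probe;

impl fmt::Display for Probe {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // With a bare "{}", both of these are None.
        let width = if f.width().is_some() { "some" } else { "none" };
        let precision = if f.precision().is_some() { "some" } else { "none" };
        // write_str ignores width/precision, so the output is not padded.
        f.write_str("width=")?;
        f.write_str(width)?;
        f.write_str(" precision=")?;
        f.write_str(precision)
    }
}
```

Formatting `Probe` with `"{}"` reports no width and no precision, while something like `"{:8.3}"` reports both.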
I've found a few ways to suppress Formatter::pad[_integral], but they're some combination of ugly, unmaintainable, or incomplete. For example:
Use LTO with a panic_fmt function that ignores its core::fmt::Arguments parameter. Try to get rustc to delete the format_args!(...) code. I had been doing this two months ago, but then the parameter type changed from &core::fmt::Arguments to core::fmt::Arguments, and the trick stopped working. I've found a few ways to get it working again, e.g.:
Declare the panic_fmt lang_item with no parameters. I can also crash the compiler with such a function, though, so I don't expect it to continue working.
Refactor panicking such that each of core::panicking::{panic, panic_bounds_check, panic_fmt} calls the panic_fmt lang_item directly. Declare the lang_item with #[inline(always)].
Avoid calling Formatter::pad[_integral] in libcore. e.g. use Arguments::new_v1(&[expr], &[]) in core::panicking::panic(). Or, wrap the message in a new type, PanicMsg, whose Display implementation calls Formatter::write_str instead of Formatter::pad.
Move the panic and panic_bounds_check lang_items out of libcore. This completely solves the problem for those two panic entry points, but what about panicking::panic_fmt? Maybe it could be an inline function that just calls the panic_fmt lang_item? I could only see this working if the two had exactly the same signature. Currently, the ABIs are different and one of the functions bundles the file/line info into a tuple.
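The PanicMsg idea from the list above could look roughly like the following. This is only a sketch of the wrapper type. How it would be wired into core::panicking is not shown, and the exact shape of the type is my assumption.

```rust
use core::fmt;

/// Sketch of the proposed wrapper: a message type whose Display
/// implementation bypasses Formatter::pad entirely.
pub struct PanicMsg<'a>(pub &'a str);

impl<'a> fmt::Display for PanicMsg<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // <str as Display>::fmt calls f.pad(self), which has to handle
        // width, precision, and fill. write_str just forwards the bytes.
        f.write_str(self.0)
    }
}
```

The tradeoff is that width and precision are silently ignored: formatting `PanicMsg("boom")` with `"{:>8}"` yields `boom` with no padding, whereas a plain `&str` would be right-aligned to 8 columns. For panic messages that are always formatted with `"{}"`, that should not matter.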
Alternatively, maybe we can make Formatter::pad[_integral] smaller? For example, I'd guess we could eliminate the encode_utf8 calls:
Change core::fmt::rt::v1::FormatSpec::fill from a char to a [u8; 4] (There's a Formatter::fill() function, though, and also, this changes the ABI. Are ABI changes still allowed?)
The call in Formatter::pad_integral seems unnecessary.
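As a rough illustration of the `[u8; 4]` idea: the fill character could be encoded once, up front, and later turned back into a `&str` without calling `encode_utf8` on every padding operation. The function names below are hypothetical and this is not how libcore stores the fill today. The length is recovered from the UTF-8 leading byte.

```rust
/// Encode the fill character once (hypothetical helper).
pub fn encode_fill(fill: char) -> [u8; 4] {
    let mut buf = [0u8; 4];
    let _ = fill.encode_utf8(&mut buf);
    buf
}

/// View the pre-encoded fill as a &str. The sequence length is
/// derived from the leading byte of the UTF-8 encoding.
pub fn fill_as_str(buf: &[u8; 4]) -> &str {
    let len = match buf[0] {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    };
    // A size-sensitive implementation would probably want an unchecked
    // conversion here, since from_utf8 pulls in validation code.
    core::str::from_utf8(&buf[..len]).unwrap_or(" ")
}
```

Whether this actually shrinks the generated code would have to be measured. It only moves the encoding step to where the format spec is built.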
Adding some kind of -Os/-Oz flag to rustc would help, and it might be good enough for my particular use case. I hacked one in at one point, but it has since bit-rotted.
For tiny programs, I think there'd ideally be a way to suppress not just the format strings but also the filenames.
Questions:
Do any of the libcore changes above seem reasonable to anyone else?
Is there a better way to avoid this overhead?
Is a minimal-overhead libcore something the project cares about?
A compiler option to suppress line number information in panic messages has been discussed. It might convert all compiler-generated panics to zero- or one-argument lang item calls.
-C lto makes a huge difference. I actually must build with either -C lto or the --gc-sections linker option. With neither option, I believe I'd include the entire libcore content with my binary, which is far too big. I can't easily confirm that right now, because it doesn't link -- libcore needs floating-point routines like pow, which are unresolved references.
-s just strips the symbol table, which I already omit from my binary. (I use rustc to produce a staticlib, then gold to link, then objcopy to extract just the text/data sections.)
Each higher opt-level produces slightly larger code. (For these numbers, I am pulling in the panic formatting code. I'm also passing these rustc arguments: --target i686-unknown-linux-gnu -C target-cpu=i386 -C relocation-model=static -C llvm-args=-rotation-max-header-size=0)
-C lto -C opt-level=1 generates a 14548 byte binary.
-C lto -C opt-level=2 generates a 15181 byte binary.
-C lto -C opt-level=3 generates a 15213 byte binary.
I added an experimental -C size-opt-level=[012] flag to my local compiler. Using that flag (and assuming I rebuild libcore with the flag), I see these numbers:
I massively reduced code size by replacing &str with &&str throughout my code. &str is a tuple of pointer and length, whereas &&str is a word-sized reference to said tuple. Building with panic formatting hacked out and with -C lto -C opt-level=2, the change reduced the size from 9289 to 7673 bytes (a 17.4% reduction). I'm not sure what to make of this -- I assumed &str was idiomatic; does the compiler/ABI need optimizing?
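The size difference between the two reference types is easy to check. A small sketch (the function name is made up for illustration):

```rust
use core::mem::size_of;

/// Returns (size of &str, size of &&str) in bytes.
pub fn str_ref_sizes() -> (usize, usize) {
    // &str is a fat pointer: data pointer plus length.
    // &&str is a thin pointer to that (pointer, length) pair.
    (size_of::<&str>(), size_of::<&&str>())
}
```

On any target, the first value is two machine words and the second is one, so every call site that passes a `&str` has to materialize twice as much data as one that passes a `&&str`.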
&str is definitely idiomatic. It's just a tradeoff. Having the best performance often means taking more space and Rust is usually optimized in that direction.
If it helps take up less space, then passing &&str to the panic functions could make sense, as it's the slow path.
foo receives the address of a &str value. To call bar, foo makes a new &str value on the stack, then passes the address of that copy to bar. Why not pass the original address along?
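The original code fragment is not reproduced in this text, so the following is only a hypothetical stand-in with the same shape: a function `foo` that takes a `&str` and forwards it unchanged to `bar`. The bodies are my assumption. The `#[inline(never)]` attributes are there so the call survives optimization and can be inspected in the generated assembly.

```rust
/// Hypothetical callee: does something trivial with the string.
#[inline(never)]
pub fn bar(s: &str) -> usize {
    s.len()
}

/// Hypothetical caller: forwards its &str argument unchanged.
/// The question is whether the compiler copies the (pointer, length)
/// pair to a fresh stack slot before the call or passes it straight on.
#[inline(never)]
pub fn foo(s: &str) -> usize {
    bar(s)
}
```

Compiling something like this with `--emit asm` is one way to see how the argument is actually passed on a given compiler version and target.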
I suspect passing &Arguments instead of Arguments to the panic functions would also take less space. It actually used to be &Arguments, but it changed when format_args!(f, args) changed to f(format_args!(args)). I'm wondering why the type changed.
For my use case, the panics in libcore are at least as important as the compiler-generated ones. I'd also prefer to keep the file/line info and discard the format -- presumably the 1-argument lang item call keeps the format? Discarding all the arguments would be acceptable, but suboptimal.
I feel like there should be some way to optimize an option-less format argument to use simpler (and faster) versions of Formatter::{pad, pad_integral}. Maybe it requires changing the formatter traits. I'll experiment with it.
I was able to get the binary small enough. I think the panic system still creates too much overhead. (I'm accustomed to C and C++, which have "zero overhead.") The code is online on GitHub at rprichard/pcboot (a BIOS-based boot menu and loader). The code isn't booted by GRUB; in theory, it would do something like GRUB. It's mostly unwritten, though, and I'm not actively working on it.
As this thread was resurrected, it should be mentioned that between when @ryanprichard opened this thread and now, the ABI was modified with respect to "fat pointers" (which &str is). The code fragment above with the strings now just does a tail-call in foo to bar. Just in case others are concerned about it.