Fixed overhead -- Rust bootloader and core::panicking

I'm trying to use Rust for a bare-metal x86 program that must fit in a 14KiB space. (It's part of a bootloader.) So far, I've been successful, and my program is about 10KB, but core::panicking is giving me trouble. rustc really wants to output a bunch of libcore formatting code (about 6KB) into my binary, and when it does, my program exceeds its budget.

(FWIW, I'm compiling with -C lto -C opt-level=2.)

Many Rust language and library features call a panic function, which eventually calls Formatter::pad or Formatter::pad_integral. The format argument is always "{}", so width and precision are None, and pad exits early on a fast path. The compiler isn't smart enough, though, so it outputs the entirety of pad and pad_integral. pad needs to truncate and/or pad UTF-8 strings, and ultimately pulls in functions for UTF-8 encoding and decoding.
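To make the fast path versus the full path concrete, here is a small sketch. It assumes an ordinary hosted build with std, purely for illustration, and the helper names are hypothetical. Only the last three helpers actually need the padding and truncation logic (and therefore the UTF-8 handling); a plain "{}" never does.

```rust
// Illustration only: built against std, not the bare-metal target.
// All function names here are hypothetical helpers for this sketch.

/// Plain "{}": width and precision are None, so padding is never needed.
pub fn plain(s: &str) -> String {
    format!("{}", s)
}

/// Width 5, right-aligned: the formatter must count chars and emit fill.
pub fn right_aligned(s: &str) -> String {
    format!("{:>5}", s)
}

/// Same, with a custom fill character.
pub fn star_filled(s: &str) -> String {
    format!("{:*>5}", s)
}

/// Precision 1: the formatter must truncate on a char boundary,
/// which is why UTF-8 decoding gets pulled in.
pub fn truncated(s: &str) -> String {
    format!("{:.1}", s)
}
```

For example, right_aligned("ab") yields "   ab", and truncated applied to "\u{e9}b" keeps the whole two-byte character rather than cutting it in half.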

I've found a few ways to suppress Formatter::pad[_integral], but they're some combination of ugly, unmaintainable, or incomplete. For example:

  • Use LTO with a panic_fmt function that ignores its core::fmt::Arguments parameter. Try to get rustc to delete the format_args!(...) code. I had been doing this two months ago, but then the parameter type changed from &core::fmt::Arguments to core::fmt::Arguments, and the trick stopped working. I've found a few ways to get it working again, e.g.:

    • Declare the panic_fmt lang_item with no parameters. I can also crash the compiler with such a function, though, so I don't expect it to continue working.

    • Refactor panicking such that each of core::panicking::{panic, panic_bounds_check, panic_fmt} calls the panic_fmt lang_item directly. Declare the lang_item with #[inline(always)].

  • Avoid calling Formatter::pad[_integral] in libcore. e.g. use Arguments::new_v1(&[expr], &[]) in core::panicking::panic(). Or, wrap the message in a new type, PanicMsg, whose Display implementation calls Formatter::write_str instead of Formatter::pad.

  • Move the panic and panic_bounds_check lang_items out of libcore. This completely solves the problem for those two panic entry points, but what about panicking::panic_fmt? Maybe it could be an inline function that just calls the panic_fmt lang_item? I could only see this working if the two had exactly the same signature. Currently, the ABIs are different and one of the functions bundles the file/line info into a tuple.
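The PanicMsg wrapper idea from the list above might look roughly like the following. This is a minimal sketch, not libcore code: PanicMsg is the hypothetical type named in the post, the render helpers are made up for the example, and it is built against std so that it can be run on a host.

```rust
use std::fmt;

/// Hypothetical newtype for panic messages. Its Display impl writes the
/// string directly, so it never goes through Formatter::pad.
pub struct PanicMsg<'a>(pub &'a str);

impl<'a> fmt::Display for PanicMsg<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // write_str ignores width, precision, and fill entirely.
        f.write_str(self.0)
    }
}

/// Hypothetical helper: format a message with a plain "{}".
pub fn render(msg: &str) -> String {
    format!("{}", PanicMsg(msg))
}

/// Hypothetical helper: even with a width, nothing gets padded,
/// because the Display impl never consults the format spec.
pub fn render_with_width(msg: &str) -> String {
    format!("{:>8}", PanicMsg(msg))
}
```

The trade-off is that such a type silently ignores any width or precision a caller supplies, which seems acceptable for panic messages that are always formatted with "{}".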

Alternatively, maybe we can make Formatter::pad[_integral] smaller? For example, I'd guess we could eliminate the encode_utf8 calls:

  • Change core::fmt::rt::v1::FormatSpec::fill from a char to a [u8; 4] (There's a Formatter::fill() function, though, and also, this changes the ABI. Are ABI changes still allowed?)
  • The call in Formatter::pad_integral seems unnecessary.
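As a rough sketch of the [u8; 4] idea: the fill character is encoded once, up front, and the padding loop then only copies bytes. FillBytes, encode_fill, and write_fill are hypothetical names for this example and do not exist in libcore; the output buffer is a Vec<u8> only so the sketch runs on a host.

```rust
/// Hypothetical pre-encoded fill: the UTF-8 bytes of the fill char
/// plus how many of the four bytes are in use.
pub struct FillBytes {
    pub bytes: [u8; 4],
    pub len: usize,
}

/// Encode the fill once (e.g. when the format spec is built), so the
/// padding code itself never has to call encode_utf8.
pub fn encode_fill(fill: char) -> FillBytes {
    let mut bytes = [0u8; 4];
    let len = fill.encode_utf8(&mut bytes).len();
    FillBytes { bytes, len }
}

/// Emit `count` copies of the fill by copying raw bytes.
pub fn write_fill(out: &mut Vec<u8>, fill: &FillBytes, count: usize) {
    for _ in 0..count {
        out.extend_from_slice(&fill.bytes[..fill.len]);
    }
}
```

Whether this saves space in practice depends on where the encoding ends up happening; the sketch only shows that the padding loop itself no longer needs it.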

Adding some kind of -Os/-Oz flag to rustc would help, and it might be good enough for my particular use case. I hacked one in at one point, but it has since bit-rotted.

For tiny programs, I think there'd ideally be a way to suppress not just the format strings but also the filenames.

Questions:

  • Do any of the libcore changes above seem reasonable to anyone else?
  • Is there a better way to avoid this overhead?
  • Is a minimal-overhead libcore something the project cares about?

A compiler option to suppress line number information in panic messages has been discussed. It might convert all compiler-generated panics to zero- or one-argument lang item calls.

Does -C lto -C opt-level=3 -C link-args=-s make a difference?

-C lto makes a huge difference. I actually must build with either -C lto or the --gc-sections linker option. With neither option, I believe I'd include the entire libcore content with my binary, which is far too big. I can't easily confirm that right now, because it doesn't link -- libcore needs floating-point routines like pow, which are unresolved references.

-s just strips the symbol table, which I already omit from my binary. (I use rustc to produce a staticlib, then gold to link, then objcopy to extract just the text/data sections.)

Each higher opt-level produces slightly larger code. (For these numbers, I am pulling in the panic formatting code. I'm also passing these rustc arguments: --target i686-unknown-linux-gnu -C target-cpu=i386 -C relocation-model=static -C llvm-args=-rotation-max-header-size=0)

-C lto -C opt-level=1 generates a 14548 byte binary.
-C lto -C opt-level=2 generates a 15181 byte binary.
-C lto -C opt-level=3 generates a 15213 byte binary.

I added an experimental -C size-opt-level=[012] flag to my local compiler. Using that flag (and assuming I rebuild libcore with the flag), I see these numbers:

-C lto -C size-opt-level=1 (aka -Os) generates a 14449 byte binary.
-C lto -C size-opt-level=2 (aka -Oz) generates a 13569 byte binary.

(The last one is just under my 14336 limit.)
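For reference, the headroom against the 14 KiB budget can be worked out from the sizes quoted above. The constant and function names below are just for this sketch.

```rust
/// The 14 KiB budget from the post, in bytes (14 * 1024 = 14336).
pub const BUDGET_BYTES: i64 = 14 * 1024;

/// Bytes left over for a given binary size; negative means over budget.
pub fn headroom(binary_size: i64) -> i64 {
    BUDGET_BYTES - binary_size
}
```

With the sizes above, only the size-opt-level=2 build fits, with 767 bytes to spare; the size-opt-level=1 build is 113 bytes over, and the opt-level=1, 2, and 3 builds are 212, 845, and 877 bytes over respectively.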

I massively reduced code size by replacing &str with &&str throughout my code. &str is a tuple of pointer and length, whereas &&str is a word-sized reference to said tuple. Building with panic formatting hacked out and with -C lto -C opt-level=2, the change reduced the size from 9289 to 7673 bytes (a 17.4% reduction). I'm not sure what to make of this -- I assumed &str was idiomatic; does the compiler/ABI need optimizing?
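The size difference between the two reference types is easy to check. The sketch below uses std::mem::size_of; the helper function names are hypothetical and exist only for this example.

```rust
use std::mem::size_of;

/// Size of &str in machine words: a pointer plus a length.
pub fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

/// Size of &&str in machine words: a single thin pointer.
pub fn str_ref_ref_words() -> usize {
    size_of::<&&str>() / size_of::<usize>()
}

/// Takes the two-word fat pointer.
#[inline(never)]
pub fn len_fat(s: &str) -> usize {
    s.len()
}

/// Takes one word; the callee pays an extra load to reach the
/// pointer/length pair.
#[inline(never)]
pub fn len_thin(s: &&str) -> usize {
    s.len()
}
```

A call such as len_thin(&"boot") passes one word instead of two, at the cost of an extra indirection inside the callee.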

&str is definitely idiomatic. It's just a tradeoff. Having the best performance often means taking more space and Rust is usually optimized in that direction.

If it helps take up less space, then passing &&str to the panic functions could make sense, as it's the slow path.

When I looked at the assembly, it seemed like the compiler was doing an excessive amount of copying. Maybe it's faster, though. I don't know.

e.g. Suppose we have:

#[inline(never)] pub fn bar(x: &str) { println!("{}", x) }
#[inline(never)] pub fn foo(x: &str) { bar(x) }

foo receives the address of a &str value. To call bar, foo makes a new &str value on the stack, then passes the address of that copy to bar. Why not pass the original address along?

I suspect passing &Arguments instead of Arguments to the panic functions would also take less space. It actually used to be &Arguments, but it changed when format_args!(f, args) changed to f(format_args!(args)). I'm wondering why the type changed.
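The two calling conventions being compared look roughly like this. It is a sketch with hypothetical function names (sink, forward_by_ref, forward_by_value), built against std for illustration; the real panicking functions have different signatures, and the size of Arguments depends on the compiler version.

```rust
use std::fmt;

/// Final consumer; stands in for the panic_fmt lang item.
#[inline(never)]
pub fn sink(args: &fmt::Arguments) -> String {
    format!("{}", args)
}

/// By reference: each forwarding layer passes along one pointer.
#[inline(never)]
pub fn forward_by_ref(args: &fmt::Arguments) -> String {
    sink(args)
}

/// By value: each forwarding layer receives its own copy of the
/// multi-word Arguments struct before passing it on.
#[inline(never)]
pub fn forward_by_value(args: fmt::Arguments) -> String {
    sink(&args)
}
```

Both produce the same output; the difference is only in how much data each intermediate call has to move.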

Maybe the copying helps with memory locality?

@brson Do you have a link for this discussion?

For my use case, the panics in libcore are at least as important as the compiler-generated ones. I'd also prefer to keep the file/line info and discard the format -- presumably the 1-argument lang item call keeps the format? Discarding all the arguments would be acceptable, but suboptimal.

I feel like there should be some way to optimize an option-less format argument to use simpler (and faster) versions of Formatter::{pad, pad_integral}. Maybe it requires changing the formatter traits. I'll experiment with it.

No, I think they were offline.

That does sound suboptimal. Can you file an issue?

Sure. I'll get to it within a day or so.

I filed rust-lang/rust issue #22891, "Seemingly inefficient code generated to forward a parameter to a function." I also located pull request #20136 by eddyb, "Remove the need for a callback to be passed to format_args!", which removed the callback from format_args! and changed &Arguments to Arguments. That change is causing some of my overhead, because large Arguments values are now copied around through the panicking functions.

Hi all

Did it work? Can you boot into the rust binary from GRUB?

Regards Tuxle

I was able to get the binary small enough, but I think the panic system still creates too much overhead. (I'm accustomed to C and C++, which have "zero overhead.") The code is online at rprichard/pcboot on GitHub (a BIOS-based boot menu and loader). The code isn't booted by GRUB; in theory, it would do something like what GRUB does. It's mostly unwritten, though, and I'm not actively working on it.

As this thread was resurrected, it should be mentioned that between when @ryanprichard opened this thread and now, the ABI was modified with respect to "fat pointers" (which &str is). The code fragment above with the strings now just does a tail-call in foo to bar. Just in case others are concerned about it.

thank you :smile: