Fixed overhead -- Rust bootloader and core::panicking


#1

I’m trying to use Rust for a bare-metal x86 program that must fit in a 14KiB space. (It’s part of a bootloader.) So far, I’ve been successful, and my program is about 10KB, but core::panicking is giving me trouble. rustc really wants to output a bunch of libcore formatting code (about 6KB) into my binary, and when it does, my program exceeds its budget.

(FWIW, I’m compiling with -C lto -C opt-level=2.)

Many Rust language and library features call a panic function, which eventually calls Formatter::pad or Formatter::pad_integral. The format argument is always "{}", so width and precision are None, and pad exits early on a fast path. The compiler can’t prove that at compile time, though, so it emits the entirety of pad and pad_integral. pad needs to truncate and/or pad UTF-8 strings, and ultimately pulls in functions for UTF-8 encoding and decoding.
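To make the fast-path condition concrete, here is a minimal sketch written against std rather than a bare-metal libcore build. Probe and probe are hypothetical names, not anything in libcore, and the branch only mirrors the early exit described above; it is not the real body of Formatter::pad.

```rust
use std::cell::Cell;
use std::fmt;

/// Hypothetical probe type: records whether the formatter carried neither
/// a width nor a precision, which is the case where padding is a no-op.
struct Probe<'a> {
    msg: &'a str,
    fast_path: Cell<bool>,
}

impl fmt::Display for Probe<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let plain = f.width().is_none() && f.precision().is_none();
        self.fast_path.set(plain);
        if plain {
            // "{}" lands here: nothing to truncate or pad.
            f.write_str(self.msg)
        } else {
            // Any width or precision needs the full padding logic.
            f.pad(self.msg)
        }
    }
}

/// Hypothetical constructor for the probe.
fn probe(msg: &str) -> Probe<'_> {
    Probe { msg, fast_path: Cell::new(false) }
}
```

Formatting probe("oops") with "{}" sets fast_path to true, while "{:>6}" sets it to false and goes through Formatter::pad. The decision is made at run time, which is consistent with both branches ending up in the binary.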

I’ve found a few ways to suppress Formatter::pad[_integral], but they’re some combination of ugly, unmaintainable, or incomplete. For example:

  • Use LTO with a panic_fmt function that ignores its core::fmt::Arguments parameter. Try to get rustc to delete the format_args!(...) code. I had been doing this two months ago, but then the parameter type changed from &core::fmt::Arguments to core::fmt::Arguments, and the trick stopped working. I’ve found a few ways to get it working again, e.g.:

    • Declare the panic_fmt lang_item with no parameters. I can also crash the compiler with such a function, though, so I don’t expect it to continue working.

    • Refactor panicking such that each of core::panicking::{panic, panic_bounds_check, panic_fmt} calls the panic_fmt lang_item directly. Declare the lang_item with #[inline(always)].

  • Avoid calling Formatter::pad[_integral] in libcore. For example, use Arguments::new_v1(&[expr], &[]) in core::panicking::panic(). Or, wrap the message in a new type, PanicMsg, whose Display implementation calls Formatter::write_str instead of Formatter::pad.

  • Move the panic and panic_bounds_check lang_items out of libcore. This completely solves the problem for those two panic entry points, but what about panicking::panic_fmt? Maybe it could be an inline function that just calls the panic_fmt lang_item? I could only see this working if the two had exactly the same signature. Currently, the ABIs are different and one of the functions bundles the file/line info into a tuple.
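The PanicMsg wrapper mentioned in the list above could look roughly like this. It is a sketch against std, not the actual libcore change; only the type name comes from the post, and the field layout is an assumption.

```rust
use std::fmt;

/// Sketch of the `PanicMsg` idea: a newtype around the panic message
/// whose `Display` impl never reaches `Formatter::pad`.
struct PanicMsg<'a>(&'a str);

impl fmt::Display for PanicMsg<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `write_str` forwards the bytes directly and ignores any
        // width, precision, or fill carried by the formatter.
        f.write_str(self.0)
    }
}
```

The tradeoff is visible with a padded format string: format!("{:>5}", "x") yields "    x", but format!("{:>5}", PanicMsg("x")) yields just "x". For panic messages, which are always formatted with "{}", that difference should not matter.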

Alternatively, maybe we can make Formatter::pad[_integral] smaller? For example, I’d guess we could eliminate the encode_utf8 calls:

  • Change core::fmt::rt::v1::FormatSpec::fill from a char to a [u8; 4] (There’s a Formatter::fill() function, though, and also, this changes the ABI. Are ABI changes still allowed?)
  • The call in Formatter::pad_integral seems unnecessary.
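As a rough illustration of the [u8; 4] idea, the fill could be stored already encoded so that the padding loop only copies bytes. EncodedFill and push_fill are hypothetical names, and the extra len field is an assumption of this sketch (the proposal above only mentions the byte array). Here new() still encodes at run time; in the real proposal the bytes would presumably be produced when the format string is compiled.

```rust
/// Hypothetical pre-encoded fill: the UTF-8 bytes of the fill character
/// plus how many of the four bytes are in use.
#[derive(Clone, Copy)]
struct EncodedFill {
    bytes: [u8; 4],
    len: u8,
}

impl EncodedFill {
    /// Encodes `c` once, up front.
    fn new(c: char) -> Self {
        let mut bytes = [0u8; 4];
        let len = c.encode_utf8(&mut bytes).len() as u8;
        EncodedFill { bytes, len }
    }

    /// The valid UTF-8 bytes of the fill character.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len as usize]
    }
}

/// Appends `count` copies of the fill to `out` by copying bytes only;
/// no encoding happens inside the loop.
fn push_fill(out: &mut Vec<u8>, fill: EncodedFill, count: usize) {
    for _ in 0..count {
        out.extend_from_slice(fill.as_bytes());
    }
}
```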

Adding some kind of -Os/-Oz flag to rustc would help, and it might be good enough for my particular use case. I hacked one in at one point, but it has since bit-rotted.

For tiny programs, I think there’d ideally be a way to suppress not just the format strings but also the filenames.

Questions:

  • Do any of the libcore changes above seem reasonable to anyone else?
  • Is there a better way to avoid this overhead?
  • Is a minimal-overhead libcore something the project cares about?

#2

A compiler option to suppress line number information in panic messages has been discussed. It might convert all compiler-generated panics to zero- or one-argument lang item calls.


#3

Does -C lto -C opt-level=3 -C link-args=-s make a difference?


#4

-C lto makes a huge difference. I actually must build with either -C lto or the --gc-sections linker option. With neither option, I believe I’d include the entire libcore content with my binary, which is far too big. I can’t easily confirm that right now, because it doesn’t link – libcore needs floating-point routines like pow, which are unresolved references.

-s just strips the symbol table, which I already omit from my binary. (I use rustc to produce a staticlib, then gold to link, then objcopy to extract just the text/data sections.)

Each higher opt-level produces slightly larger code. (For these numbers, I am pulling in the panic formatting code. I’m also passing these rustc arguments: --target i686-unknown-linux-gnu -C target-cpu=i386 -C relocation-model=static -C llvm-args=-rotation-max-header-size=0)

-C lto -C opt-level=1 generates a 14548 byte binary.
-C lto -C opt-level=2 generates a 15181 byte binary.
-C lto -C opt-level=3 generates a 15213 byte binary.

I added an experimental -C size-opt-level=[012] flag to my local compiler. Using that flag (and assuming I rebuild libcore with the flag), I see these numbers:

-C lto -C size-opt-level=1 (aka -Os) generates a 14449 byte binary.
-C lto -C size-opt-level=2 (aka -Oz) generates a 13569 byte binary.

(The last one is just under my 14336 limit.)
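For reference, the budget arithmetic behind those numbers can be checked with a few lines of Rust. BUDGET and headroom are names made up for this sketch; the sizes are the ones quoted above.

```rust
/// The 14 KiB budget from the first post, in bytes.
const BUDGET: i64 = 14 * 1024;

/// Bytes of headroom left for a binary of `size` bytes
/// (a negative result means the binary is over budget).
fn headroom(size: i64) -> i64 {
    BUDGET - size
}

/// The builds reported above, as (flags, size in bytes).
const BUILDS: [(&str, i64); 5] = [
    ("-C lto -C opt-level=1", 14548),
    ("-C lto -C opt-level=2", 15181),
    ("-C lto -C opt-level=3", 15213),
    ("-C lto -C size-opt-level=1", 14449),
    ("-C lto -C size-opt-level=2", 13569),
];
```

Mapping headroom over BUILDS gives -212, -845, -877, -113 and 767, so only the size-opt-level=2 build fits, with 767 bytes to spare.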

I massively reduced code size by replacing &str with &&str throughout my code. &str is a tuple of pointer and length, whereas &&str is a word-sized reference to said tuple. Building with panic formatting hacked out and with -C lto -C opt-level=2, the change reduced the size from 9289 to 7673 bytes (a 17.4% reduction). I’m not sure what to make of this – I assumed &str was idiomatic; does the compiler/ABI need optimizing?
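The size difference between the two reference types, and the quoted percentage, can be checked with a short sketch. The function names are hypothetical; std::mem::size_of is the only library call used.

```rust
use std::mem::size_of;

/// Size of `&str` in machine words: a pointer plus a length.
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

/// Size of `&&str` in machine words: a single thin pointer.
fn str_ref_ref_words() -> usize {
    size_of::<&&str>() / size_of::<usize>()
}

/// Percentage reduction when going from `before` bytes to `after` bytes.
fn percent_reduction(before: u32, after: u32) -> f64 {
    f64::from(before - after) * 100.0 / f64::from(before)
}
```

str_ref_words() is 2 and str_ref_ref_words() is 1, so on a 32-bit target such as i686 that is 8 bytes versus 4 bytes per argument. percent_reduction(9289, 7673) comes out at roughly 17.4, matching the figure above.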


#5

&str is definitely idiomatic. It’s just a tradeoff. Having the best performance often means taking more space and Rust is usually optimized in that direction.

If it helps take up less space, then passing &&str to the panic functions could make sense, as it’s the slow path.


#6

When I looked at the assembly, it seemed like the compiler was doing an excessive amount of copying. Maybe it’s faster, though. I don’t know.

For example, suppose we have:

#[inline(never)] pub fn bar(x: &str) { println!("{}", x) }
#[inline(never)] pub fn foo(x: &str) { bar(x) }

foo receives the address of a &str value. To call bar, foo makes a new &str value on the stack, then passes the address of that copy to bar. Why not pass the original address along?

I suspect passing &Arguments instead of Arguments to the panic functions would also take less space. It actually used to be &Arguments, but it changed when format_args!(f, args) changed to f(format_args!(args)). I’m wondering why the type changed.


#7

Maybe the copying helps with memory locality?


#8

@brson Do you have a link for this discussion?

For my use case, the panics in libcore are at least as important as the compiler-generated ones. I’d also prefer to keep the file/line info and discard the format – presumably the 1-argument lang item call keeps the format? Discarding all the arguments would be acceptable, but suboptimal.

I feel like there should be some way to optimize an option-less format argument to use simpler (and faster) versions of Formatter::{pad, pad_integral}. Maybe it requires changing the formatter traits. I’ll experiment with it.


#9

No, I think they were offline.

That does sound suboptimal. Can you file an issue?


#10

Sure. I’ll get to it within a day or so.


#11

I filed https://github.com/rust-lang/rust/issues/22891. I also located https://github.com/rust-lang/rust/pull/20136, which removed the callback from format_args! and changed &Arguments to Arguments, which is causing some of my overhead (by copying large Arguments values around through the panicking functions).


#12

Hi all

Did it work? Can you boot into the rust binary from GRUB?

Regards Tuxle


#13

I was able to get the binary small enough. I think the panic system still creates too much overhead. (I’m accustomed to C and C++, which have “zero overhead.”) The code is online at https://github.com/rprichard/pcboot. The code isn’t booted by GRUB; in theory, it would do something like GRUB. It’s mostly unwritten, though, and I’m not actively working on it.


#14

As this thread was resurrected, it should be mentioned that between when @ryanprichard opened this thread and now, the ABI was modified with respect to “fat pointers” (which &str is). The code fragment above with the strings now just does a tail-call in foo to bar. Just in case others are concerned about it.


#15

thank you :smile: