How do I get a `staticlib` that holds only the "magic symbols"?

I have a project that ultimately uses a build system other than cargo, and I would like to include cargo-built artifacts like any other dependency. While researching best practices, I found this GitHub thread and this pre-RFC, among other sources. The general pattern used by other native build systems (such as Bazel and Buck) is to link in the rlib artifacts from cargo, since each rlib includes only the object files for its own crate. This leaves external dependency management and final linking up to the non-Rust ("native") build system.

One of the pieces I still need to create is a library containing the set of global Rust shims (symbols like `__rust_alloc`, `__rust_start_panic`, etc.). The pre-RFC recommends building a "no-op" crate to get these symbols:

> Note (non-normative): At the time of writing, the `-C emit-std-bundle=yes` flag can simply be a no-op, as the Rust compiler can successfully create such staticlibs already by compiling an empty crate. The purpose of the flag is to ensure that this behavior is preserved in the future in an opt-in fashion.

I tried this, and sure enough, the symbols I'm looking for are in there. However, there are also a ton of other symbols that I do not want (e.g., mangled Rust symbols from `std` and `core`, pthread APIs, `memcpy`, etc.).

What would a crate look like whose staticlib artifact contains just the global Rust shims?

Of course, make sure you're using `--release` so that optimizations (e.g. inlining) are applied. In theory, after optimization everything left in the staticlib should be necessary (i.e. reachable from the symbols you do care about).
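The "reachable from the symbols you care about" idea can be illustrated with a small graph walk. This is a minimal sketch over a made-up, simplified reference graph (the edges from `__rust_alloc` to `__rdl_alloc` to `malloc` are assumptions for illustration, and `reachable` is a hypothetical helper). Linkers do something similar when garbage-collecting unreferenced sections, but this is not a description of what rustc actually does when it builds a staticlib.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every symbol reachable from `roots` by following `refs`
/// (an adjacency list: symbol -> symbols it references).
fn reachable<'a>(roots: &[&'a str], refs: &HashMap<&'a str, Vec<&'a str>>) -> HashSet<&'a str> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<&'a str> = roots.iter().copied().collect();
    while let Some(sym) = queue.pop_front() {
        // `insert` returns false if we already visited this symbol.
        if !seen.insert(sym) {
            continue;
        }
        // Enqueue everything this symbol refers to.
        if let Some(targets) = refs.get(sym) {
            queue.extend(targets.iter().copied());
        }
    }
    seen
}

fn main() {
    // Hypothetical, simplified reference graph for illustration only.
    let mut refs: HashMap<&str, Vec<&str>> = HashMap::new();
    refs.insert("__rust_alloc", vec!["__rdl_alloc"]);
    refs.insert("__rdl_alloc", vec!["malloc"]);
    refs.insert("unused_helper", vec!["memcpy"]);

    let keep = reachable(&["__rust_alloc"], &refs);
    let mut sorted: Vec<_> = keep.into_iter().collect();
    sorted.sort();
    println!("{:?}", sorted);
}
```

Running it prints `["__rdl_alloc", "__rust_alloc", "malloc"]`; `unused_helper` and `memcpy` are dropped because nothing reachable from the root refers to them.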

I followed the pre-RFC steps to produce this staticlib:

```
$ cargo new --lib stdrust
$ cd stdrust
$ echo "[lib]" >>Cargo.toml
$ echo "crate-type = ['staticlib']" >>Cargo.toml
$ RUSTFLAGS="-C emit-std-bundle=yes" cargo build --release
$ ls -l target/release/libstdrust.a
-rw-------  1 pcwalton  staff  17031504 Jan 19 19:37 target/release/libstdrust.a
```

...with the exception of the `emit-std-bundle` argument, which would be a no-op at this point anyway. I can confirm that I passed the `--release` flag to cargo.

In my `target/release` folder, `libstdrust.a` is 14.3MB and contains hundreds of additional symbols I would not expect to be there (e.g., mangled `std` and `core` functions). Here's a screenshot of what I see inside one of the object files I extracted from the library:

Clearly I'm either missing something from the build steps, or what I thought needed to be globally accessible to all Rust code is not accurate. Any help here would be much appreciated.

Looking at the pre-RFC example more closely, its release build is 16.2 MiB (17031504 bytes in the `ls` output above). Presumably it also includes a lot more than I would have expected.

What I'm interested in are specifically the "magic symbols", of which I understand there to be very few.

I did a quick grep of the current sources, and here's a sampling of the kinds of symbols I am talking about:

```
__rust_abort
__rust_alloc
__rust_alloc_error_handler
__rust_alloc_error_handler_should_panic
__rust_alloc_zeroed
__rust_anonymous_pipe1__
__rust_begin_short_backtrace
__rust_c_alloc
__rust_c_dealloc
__rust_dealloc
__rust_drop_panic
__rust_end_short_backtrace
__rust_foreign_exception
__rust_maybe_catch_panic
__rust_no_alloc_shim_is_unstable
__rust_panic_cleanup
__rust_print_err
__rust_realloc
__rust_rwlock_rdlock
__rust_rwlock_unlock
__rust_rwlock_wrlock
__rust_start_panic
__rust_try
__rustc_panic_cleaner
__rustc_panic_cleanup
```

My assumption is that this is (nearly) the total set of global Rust "magic symbols". These would never be found in an rlib, and they amount to very little code compared to the rest of a Rust artifact. Everything else is either name-mangled or otherwise not shared between Rust components. Is that an accurate statement?
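One way to check this against a real symbol dump is to bucket each name as a `__rust*` shim, a mangled Rust symbol, or something else. A minimal heuristic sketch, assuming legacy Rust mangling (`_ZN...` ending in a `17h` + 16-hex-digit hash + `E`), v0 mangling (names starting with `_R`), and the extra leading underscore that Mach-O (macOS) adds to every symbol. `classify` and `SymbolKind` are hypothetical names, not part of any crate.

```rust
/// Rough categories for symbol names found in a Rust staticlib.
#[derive(Debug, PartialEq, Eq)]
enum SymbolKind {
    /// Unmangled `__rust*` / `__rustc*` shims such as `__rust_alloc`.
    Magic,
    /// Legacy (Itanium-like) Rust mangling: `_ZN...17h<16 hex digits>E`.
    LegacyRust,
    /// v0 Rust mangling: names start with `_R`.
    V0Rust,
    /// Anything else, e.g. C symbols such as `memcpy` or `pthread_mutex_lock`.
    Other,
}

/// Legacy Rust names end in a hash segment: "17h" + 16 hex digits + "E".
fn has_legacy_hash(name: &str) -> bool {
    let b = name.as_bytes();
    if b.len() < 20 {
        return false;
    }
    let tail = &b[b.len() - 20..];
    tail.starts_with(b"17h")
        && tail[3..19].iter().all(|c| c.is_ascii_hexdigit())
        && tail[19] == b'E'
}

/// Classifies `raw` as printed by a tool like `nm`. On Mach-O every symbol
/// carries one extra leading underscore, so pass `macho = true` there.
fn classify(raw: &str, macho: bool) -> SymbolKind {
    let name = if macho { raw.strip_prefix('_').unwrap_or(raw) } else { raw };
    if name.starts_with("__rust") {
        SymbolKind::Magic
    } else if name.starts_with("_ZN") && has_legacy_hash(name) {
        SymbolKind::LegacyRust
    } else if name.starts_with("_R") {
        SymbolKind::V0Rust
    } else {
        SymbolKind::Other
    }
}

fn main() {
    // Names as `nm` would print them on macOS (note the extra underscore).
    let samples = [
        "___rust_alloc",
        "__ZN4core3fmt5write17h0123456789abcdefE",
        "_memcpy",
        "_pthread_mutex_lock",
    ];
    for s in samples {
        println!("{s:<45} {:?}", classify(s, true));
    }
}
```

The hash check matters because C++ symbols also start with `_ZN`; without it, C++ code in the archive would be miscounted as Rust.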

The mangled symbols are almost certainly not shared with other compiled objects. There are some cases where generic monomorphizations are shared between codegen units, but I think it's always a case of using upstream's instantiation, and usually the generics are just duplicated in every codegen unit that utilizes the generic functionality.

You'll probably get a somewhat smaller staticlib with `panic=abort`, but the main contributor to code size is probably the formatting and IO machinery. These are necessarily used by the shims emitted for the leaf crate.

Out of mild curiosity, does that show the full list of symbols or just the exported symbols? I thought we had made rustc better at not exporting every single symbol from staticlib bundles. Maybe that was only for targets other than macOS.

You could also try running the output through `strip` to see if that changes anything.
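Whether a dump shows every symbol or only exported ones can be read off `nm`'s type column (both GNU and macOS `nm` also accept `-g` to list only external symbols). A minimal sketch of parsing that output, assuming the common `address type name` layout and the `nm` convention that an uppercase type letter usually marks a global (external) symbol, a lowercase one a local symbol, and `U` an undefined reference. `parse_nm` and `NmSymbol` are hypothetical helpers.

```rust
/// One symbol parsed from a line of `nm` output.
#[derive(Debug, PartialEq, Eq)]
struct NmSymbol<'a> {
    name: &'a str,
    kind: char,
}

impl NmSymbol<'_> {
    /// `U` means the object references the symbol but does not define it.
    fn is_undefined(&self) -> bool {
        self.kind == 'U'
    }
    /// By `nm` convention, an uppercase type letter usually means global.
    fn is_global(&self) -> bool {
        self.kind.is_ascii_uppercase()
    }
}

/// Parses lines like `0000000000000010 T _foo` and `                 U _bar`.
/// Archive member headers (e.g. `lib.a(x.o):`) and blank lines are skipped.
fn parse_nm(output: &str) -> Vec<NmSymbol<'_>> {
    output
        .lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split_whitespace().collect();
            let (kind, name) = match fields.as_slice() {
                [_addr, kind, name] => (*kind, *name),
                [kind, name] => (*kind, *name), // undefined: no address column
                _ => return None,
            };
            // The type column must be exactly one character.
            let mut chars = kind.chars();
            match (chars.next(), chars.next()) {
                (Some(c), None) => Some(NmSymbol { name, kind: c }),
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let sample = "\
libstdrust.a(stdrust.o):
0000000000000000 T ___rust_alloc
0000000000000040 t _local_helper
                 U _malloc
";
    for sym in parse_nm(sample) {
        let scope = if sym.is_undefined() {
            "undefined"
        } else if sym.is_global() {
            "global"
        } else {
            "local"
        };
        println!("{:<10} {}", scope, sym.name);
    }
}
```

Counting the `global` entries in a real dump would show how many symbols the staticlib actually exports.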

Nope, we still export all symbols from a staticlib. Unfortunately, there isn't really a way to avoid that other than partial linking.
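Partial linking here would mean combining the archive's object files into a single relocatable object (`ld -r`) and then demoting every global symbol except an allow-list. As far as I know, ld64 on macOS accepts such a list via `-exported_symbols_list <file>` and GNU `objcopy` via `--keep-global-symbols=<file>`, both taking one name per line; verify the exact invocation against your toolchain's documentation. A minimal sketch of generating that list, where `keep_list` is a hypothetical helper and keeping only `__rust*` names is an assumption based on the symbol list earlier in the thread:

```rust
use std::collections::BTreeSet;

/// Builds the contents of a symbol allow-list for a partial-link step.
/// `names` are C-level names such as `__rust_alloc`; on Mach-O the object
/// files store them with one extra leading `_`, so `macho` adds it.
/// Assumption: only `__rust*` shims should stay global.
fn keep_list(names: &[&str], macho: bool) -> String {
    // BTreeSet removes duplicates and gives a stable, sorted order.
    let unique: BTreeSet<&str> = names
        .iter()
        .copied()
        .filter(|n| n.starts_with("__rust"))
        .collect();
    let mut out = String::new();
    for name in unique {
        if macho {
            out.push('_');
        }
        out.push_str(name);
        out.push('\n');
    }
    out
}

fn main() {
    let wanted = ["__rust_alloc", "__rust_dealloc", "memcpy", "__rust_alloc"];
    // Prints "___rust_alloc" and "___rust_dealloc", one per line.
    print!("{}", keep_list(&wanted, true));
}
```

This only makes sense in a setup like the Bazel/Buck one described at the top of the thread, where the `std` and `core` code that the hidden symbols belong to is also linked separately from its rlibs.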
