Undefined symbol: __rust_probestack when compiling executable from LLVM IR

I would like to run Souper on my Rust binary to see how much faster it would make it. So I:

  1. Created a new binary crate with cargo +1.55.0 new (using 1.55.0 because that's the last Rust version that emits LLVM 12 IR, the only form of IR Souper supports)
  2. Added the following to Cargo.toml to make sure the generated LLVM IR does not need to be linked with any other Rust code:
codegen-units = 1
lto = true
opt-level = 3
  3. Compiled the Rust into LLVM IR with cargo +1.55.0 rustc --release -- --emit=llvm-ir
  4. At this point I would run Souper on the LLVM IR, but I'm keeping it simple for now.
  5. Compiled the LLVM IR with llc target/release/deps/{crate name}-{lots of hex digits}.ll -o object.o -filetype=obj.
  6. Attempted to link it with ld.lld -L/usr/lib -L/usr/lib/gcc/x86_64-unknown-linux-gnu/10.2 -lc -ldl -lm -lpthread -lgcc_s object.o

And then I ran into this error (GNU ld also produces the same error):

ld.lld: error: undefined symbol: __rust_probestack
>>> referenced by string.rs:583 (library/alloc/src/string.rs:583)
>>>               object.o:(std::backtrace_rs::symbolize::gimli::elf::Object::section::hccba3ee4d6fd747d)

It appears __rust_probestack is a function defined in the compiler-builtins crate. That crate is included in the standard library, so I don't see why it would be getting dropped as I build the binary. My theory is that --emit=llvm-ir doesn't support global_asm!, and since __rust_probestack is the only function defined using that macro, it is the one that ends up missing.

How can I fix or work around this error?

--emit=llvm-ir will only emit LLVM IR for the current crate and not for its dependencies. Try -Clto when compiling the executable. This should include all Rust dependencies in the local LLVM IR. (If you use cargo, you will need to enable LTO in the profile section of Cargo.toml to ensure all dependencies are LTO capable.) By the way, you may want to emit LLVM bitcode instead. It is smaller and more stable across LLVM versions: textual IR may not work with other LLVM versions, while bitcode should be loadable by all newer versions.

I already enabled lto = true in my Cargo.toml in step 2 - that should be enough, right?

Oh, right. No clue what the issue is then.

Okay, I think I finally solved it! I ran strace on rustc to find out what command it was using to produce the linked executable, then copied over all the relevant parts to my own command (also I switched to using LLVM bitcode, thanks bjorn3):

clang target/release/deps/{crate name}-{lots of hex digits}.bc \
    -Wl,--as-needed \
    -Wl,-Bstatic ~/.local/share/rustup/toolchains/1.55.0-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-dd7db1bec6909f24.rlib \
    -Wl,-Bdynamic -lgcc_s -lutil -lrt -lpthread -lm -ldl -lc \
    -Wl,--eh-frame-hdr -Wl,-znoexecstack -Wl,--gc-sections -Wl,-zrelro -Wl,-znow \
    -Wl,-O3 -nodefaultlibs -no-pie

Does seem like the missing piece was linking against that compiler_builtins rlib. The only hitch was that I had to replace the -pie option with -no-pie, because Clang threw this error:

/bin/x86_64-unknown-linux-gnu-ld: /tmp/{crate name}-{lots of hex digits}.o: relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIE
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)

Not sure what caused it, but at least it all seems to work now. And the nice thing about using clang now is that to enable Souper I simply have to replace clang with sclang and everything else works the same.
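
For anyone repeating this: the hash in the libcompiler_builtins file name presumably differs between toolchain builds, so rather than copying it out of strace output it can be looked up from the toolchain sysroot. Below is a minimal sketch, not something from this thread. The function name find_compiler_builtins is made up for illustration, and it assumes the rlib sits under lib/rustlib/<target>/lib inside the sysroot, as in the path shown in the clang command.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the path of the
# compiler_builtins rlib inside a given toolchain sysroot.
#   $1 = sysroot directory, e.g. the output of `rustc +1.55.0 --print sysroot`
#   $2 = target triple, e.g. x86_64-unknown-linux-gnu
find_compiler_builtins() {
    libdir="$1/lib/rustlib/$2/lib"
    for f in "$libdir"/libcompiler_builtins-*.rlib; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no compiler_builtins rlib found in $libdir" >&2
    return 1
}

# Usage (assumes rustup with the 1.55.0 toolchain installed):
#   rlib=$(find_compiler_builtins "$(rustc +1.55.0 --print sysroot)" x86_64-unknown-linux-gnu)
#   clang ... -Wl,-Bstatic "$rlib" -Wl,-Bdynamic ...
```

The function only prints the first match and returns non-zero when nothing is found, so a wrapper script can stop before invoking clang with an empty path.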

If you're curious, the benchmark results of Souper vs non-Souper were... not that amazing. I ran an HTTP benchmark with rewrk twice and Souper showed a speed improvement of about 1-3%, but this could just be noise. I also benchmarked calling sort and sort_unstable on 1024 random u32s and generating some random u32s and u64s from rand::thread_rng(), but there was no performance difference at all. Bit of a shame but at least I know how to run Souper now :)

I believe Souper has been used to find new peephole optimizations to add to LLVM, so that may explain why it didn't improve much.
