Static lib and LLVM intrinsics

I'm trying to run some Rust code on QEMU's RISC-V virt machine. I started by creating a #![no_std] Rust library with crate-type = ["staticlib"] and a basic assembly file that sets up a stack and jumps to an entry point in the Rust library. So far, so good. However, when I attempt to link the two, I receive errors like

undefined reference to `memset'

I examined the rust artifact for occurrences of the name memset, and sure enough:

$ nm target/riscv64gc-unknown-linux-gnu/debug/librust_obj.a | rg memset
                 U memset  <- seems like the important one
0000000000000000 T _ZN17compiler_builtins3mem40__llvm_memset_element_unordered_atomic_117h88840ce51f6dc317E
0000000000000000 T
...more
0000000000000000 T __llvm_memset_element_unordered_atomic_4
...more

So it looks like LLVM is generating calls to an undefined symbol memset (if I'm interpreting the U correctly). On a whim, I created stub functions with all of the names mentioned in the linker errors, and linking proceeds without a hitch! The resulting executable runs as expected in qemu as well.
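Rather than empty stubs, the missing symbols can be given real implementations. Below is a minimal sketch of memset, assuming the C prototype void *memset(void *s, int c, size_t n), spelled in Rust as *mut u8, i32 and usize. The name my_memset is hypothetical and deliberately not exported. In the actual no_std crate you would name it memset and mark it #[no_mangle] so the linker resolves the undefined "U memset" to it. One caveat: LLVM can recognize a byte-store loop like this one as a memset idiom and turn it back into a call to memset, which would recurse forever once the function itself is memset. compiler_builtins, for example, guards against this by building with #![no_builtins].

```rust
/// Hypothetical memset replacement (not exported, so it cannot clash with libc here).
/// C prototype: void *memset(void *s, int c, size_t n).
///
/// # Safety
/// `s` must be valid for writes of `n` bytes.
pub unsafe extern "C" fn my_memset(s: *mut u8, c: i32, n: usize) -> *mut u8 {
    // C converts `c` to unsigned char before storing it.
    let byte = c as u8;
    for i in 0..n {
        // Store one byte at a time; a real implementation would store words.
        unsafe { *s.add(i) = byte };
    }
    // memset returns its first argument.
    s
}
```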

This got me reading about compiler intrinsics, but I still haven't made full sense of this situation. In particular:

  • Why does LLVM generate calls to these functions/symbols in the first place? Is there some advantage to leaving these up to the user/lib to define?
  • Why does rustc include these calls in the compiled lib....a, even when I haven't used any core functions that rely on them (and am building with --profile release)?
  • What's the best way to proceed?

I can see that my stub functions are being called if I include certain code. For instance, if I compare two slices, then my stub memcmp is called. Should I just write a version of memcmp (and the others) and link it?
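If you do write your own, a byte-wise memcmp is short. This is a minimal sketch assuming the C prototype int memcmp(const void *a, const void *b, size_t n). The name my_memcmp is hypothetical. In the real no_std crate it would be named memcmp and marked #[no_mangle] so that slice comparisons link against it.

```rust
/// Hypothetical byte-wise memcmp with the C calling convention.
///
/// # Safety
/// `a` and `b` must both be valid for reads of `n` bytes.
pub unsafe extern "C" fn my_memcmp(a: *const u8, b: *const u8, n: usize) -> i32 {
    for i in 0..n {
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            // C semantics: the sign of the difference between the first
            // mismatching bytes, compared as unsigned char.
            return x as i32 - y as i32;
        }
    }
    // All n bytes are equal.
    0
}
```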

Thank you!

It's very common for LLVM to emit a memcpy() when moving large structs around instead of writing the instructions itself, and it'll use memset() for initializing things like arrays.

Those two operations are pretty common, and almost certainly used by something from core that your code uses, so that's probably where the references are coming in.
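To see where such calls come from, here is a small sketch of code that commonly lowers to them. Whether LLVM actually emits a memset or memcpy call, rather than inline stores or vector instructions, depends on the size, the target and the optimization level. The comments therefore describe likely rather than guaranteed codegen. The type Big and both functions are hypothetical. Compiling with --emit asm and searching the output for memset and memcpy is one way to check.

```rust
/// Hypothetical 4 KiB struct, large enough that LLVM is likely to
/// initialize and copy it with calls rather than inline moves.
#[derive(Clone, Copy)]
pub struct Big {
    pub data: [u64; 512],
}

/// Zero-initializing a large array is a common source of memset calls.
pub fn make_big() -> Big {
    Big { data: [0; 512] }
}

/// Copying a large value out from behind a reference is a common
/// source of memcpy calls.
pub fn copy_big(src: &Big) -> Big {
    *src
}
```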

In general, Rust and LLVM assume the existence of a functioning C environment/toolchain.

Is there some advantage to leaving these functions to a library? Yes, several:

  1. It's much easier and shorter for LLVM to emit a memcpy() than to emit a long-winded and highly repetitive set of literal copy-in-loop instructions.
  2. Since copying memory around is one of the most frequent operations (e.g. every move in Rust can potentially invoke it; after all, we have to heat our dwellings somehow!), calling memcpy() and memset() results in a substantial reduction of code size, at both the bitcode and the machine-code level. This is basically the inverse of inlining.
  3. Smaller code size and complexity in turn helps decrease compilation/optimization time and decreases instruction cache pressure.
  4. The memcpy() function provided by the C standard library can be significantly more optimized than a naïve byte-wise copy. It usually takes at least platform word alignment into account (so it can copy 4-8 bytes per iteration instead of 1), and sometimes it does even more clever things.
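Point 4 can be illustrated with a simplified word-at-a-time copy. This is a sketch, not how any particular libc implements memcpy. The name word_copy is hypothetical. It takes the fast path only when source and destination share the same offset within a machine word, and it assumes the regions do not overlap, which is the memcpy contract.

```rust
/// Hypothetical word-at-a-time copy illustrating why a tuned memcpy
/// beats a byte loop.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of `n` bytes,
/// and the two regions must not overlap (the same contract as C's memcpy).
pub unsafe fn word_copy(dst: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    const W: usize = core::mem::size_of::<usize>();
    let mut i = 0;
    unsafe {
        // Fast path: possible only if both pointers have the same offset within a word.
        if (dst as usize) % W == (src as usize) % W {
            // Copy single bytes until dst (and therefore src) is word-aligned.
            while i < n && ((dst as usize) + i) % W != 0 {
                *dst.add(i) = *src.add(i);
                i += 1;
            }
            // Copy one whole word (4 or 8 bytes) per iteration.
            while i + W <= n {
                let word = (src.add(i) as *const usize).read();
                (dst.add(i) as *mut usize).write(word);
                i += W;
            }
        }
        // Copy the remaining tail bytes (or everything, if the alignments differ).
        while i < n {
            *dst.add(i) = *src.add(i);
            i += 1;
        }
    }
    dst
}
```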

No, you should link against your platform's libc.

Thank you both! That makes a lot of sense.
