Static lib and LLVM intrinsics

wjlewis · November 18, 2022, 11:40am

I'm trying to run some Rust code on QEMU's RISC-V virt machine. I started by creating a #![no_std] Rust library with crate-type = ["staticlib"], and a basic assembly file that sets up a stack and jumps to an entry point in the Rust library. So far, so good. However, when I attempt to link the two, I receive errors like

undefined reference to `memset'

I examined the Rust artifact for occurrences of the name memset, and sure enough:

$ nm target/riscv64gc-unknown-linux-gnu/debug/librust_obj.a | rg memset
                 U memset  <- seems like the important one
0000000000000000 T _ZN17compiler_builtins3mem40__llvm_memset_element_unordered_atomic_117h88840ce51f6dc317E
0000000000000000 T
...more
0000000000000000 T __llvm_memset_element_unordered_atomic_4
...more
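To pull just the undefined symbols out of a listing like this, both GNU nm and llvm-nm accept `-u` (`--undefined-only`). If you would rather post-process saved output, here is a small sketch. It assumes the default BSD-style format shown above (optional address, one-letter type, name), and `undefined_symbols` is a hypothetical helper name.

```rust
/// Hypothetical helper: returns the names of symbols whose type is `U`
/// (undefined, i.e. expected to be provided by something else at link time).
/// Assumes BSD-style `nm` output: "[address] type name".
pub fn undefined_symbols(nm_output: &str) -> Vec<&str> {
    nm_output
        .lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split_whitespace().collect();
            match fields.as_slice() {
                // Undefined symbols have no address column: "U name".
                ["U", name, ..] => Some(*name),
                // Some tools print a placeholder address: "addr U name".
                [_addr, "U", name, ..] => Some(*name),
                // Defined symbols (T, D, ...), archive member headers, etc.
                _ => None,
            }
        })
        .collect()
}
```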

So it looks like LLVM is generating calls to an undefined symbol memset (if I'm interpreting the U correctly). On a whim, I created stub functions with all of the names mentioned in the linker errors, and linking proceeds without a hitch! The resulting executable runs as expected in qemu as well.
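For reference, a stub that actually does the work, rather than doing nothing, might look like the sketch below. It mirrors C's `void *memset(void *s, int c, size_t n)`; the name `stub_memset` is hypothetical, chosen so the example does not collide with the host libc when tested. In the `no_std` build it would instead be exported as `#[no_mangle] pub unsafe extern "C" fn memset(...)`. One known pitfall: LLVM can recognise a byte-store loop like this one as a memset idiom and replace it with a call to `memset`, which, once the function is itself named `memset`, recurses forever.

```rust
use core::ffi::c_int;

/// Hypothetical stand-in for C's `memset`. Renamed so it does not clash
/// with the host libc; in a no_std build it would be exported as `memset`.
///
/// Safety: `dest` must be valid for writes of `n` bytes.
pub unsafe extern "C" fn stub_memset(dest: *mut u8, c: c_int, n: usize) -> *mut u8 {
    // C converts the fill value to `unsigned char`, i.e. keeps the low byte.
    let byte = c as u8;
    let mut i = 0;
    while i < n {
        // Caveat: LLVM may turn this loop back into a `memset` call.
        unsafe { dest.add(i).write(byte) };
        i += 1;
    }
    // Like C's memset, return the destination pointer.
    dest
}
```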

This got me reading about compiler intrinsics, but I still haven't made full sense of this situation. In particular:

Why does LLVM generate calls to these functions/symbols in the first place? Is there some advantage to leaving these up to the user/lib to define?
Why does rustc include these calls in the compiled lib....a, even when I haven't used any core functions that rely on them (and am building with --profile release)?
What's the best way to proceed?

I can see that my stub functions are being called if I include certain code. For instance, if I compare two slices, then my stub memcmp is called. Should I just write a version of memcmp (and the others) and link it?
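If you do hand-roll these, `memcmp` follows the same pattern. This is a sketch assuming C's `int memcmp(const void *a, const void *b, size_t n)` semantics (bytes compared as `unsigned char`, only the sign of the result is meaningful); `stub_memcmp` is again a hypothetical name. Note that the `compiler_builtins` crate visible in the nm output above also contains Rust implementations of these routines, exported under their C names only when its `mem` feature is enabled (on nightly, `-Z build-std` exposes this as `build-std-features = ["compiler-builtins-mem"]`); check the current Cargo documentation before relying on that.

```rust
use core::ffi::c_int;

/// Hypothetical stand-in for C's `memcmp`. Same caveats as a hand-written
/// memset: export it as `memcmp` only in the no_std build, and check that
/// LLVM does not turn the loop back into a `memcmp` call.
///
/// Safety: `a` and `b` must both be valid for reads of `n` bytes.
pub unsafe extern "C" fn stub_memcmp(a: *const u8, b: *const u8, n: usize) -> c_int {
    for i in 0..n {
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            // Bytes are compared as unsigned values; the sign says which is larger.
            return (x as c_int) - (y as c_int);
        }
    }
    // All `n` bytes are equal.
    0
}
```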

Thank you!

Michael-F-Bryan · November 18, 2022, 12:25pm

It's very common for LLVM to emit a memcpy() when moving large structs around instead of writing the instructions itself, and it'll use memset() for initializing things like arrays.

Those two operations are pretty common, and almost certainly used by something from core that your code uses, so that's probably where the references are coming in.
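A minimal sketch of the two patterns described above. `Big`, `zeroed_big`, and `with_first` are hypothetical; whether LLVM actually emits a `memset`/`memcpy` call for them, rather than inline stores, depends on the size, optimization level, and target, so check the generated code (for example with `cargo rustc --release -- --emit asm`, or with nm as in the question).

```rust
/// A deliberately large value (512 * 8 bytes = 4 KiB).
#[derive(Clone, Copy)]
pub struct Big {
    pub data: [u64; 512],
}

/// Zero-initialising a large array is a typical `memset` candidate.
pub fn zeroed_big() -> Big {
    Big { data: [0; 512] }
}

/// Taking and returning a `Big` by value moves 4 KiB around,
/// a typical `memcpy` candidate.
pub fn with_first(mut big: Big, value: u64) -> Big {
    big.data[0] = value;
    big
}
```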

H2CO3 · November 18, 2022, 12:25pm

In general, Rust and LLVM assume the existence of a functioning C environment/toolchain.

As for whether there is any advantage to leaving these functions to the user or library to define: yes, several.

It's much easier and shorter for LLVM to emit a memcpy() than to emit a long-winded and highly repetitive set of literal copy-in-loop instructions.
Since copying memory is one of the most frequent operations (e.g. every move in Rust can potentially invoke it; after all, we have to heat our dwellings somehow!), calling memcpy() and memset() results in a substantial reduction in code size, at both the bitcode and the machine-code level. This is basically the inverse of inlining.
Smaller and simpler code in turn reduces compilation/optimization time and instruction-cache pressure.
The memcpy() function provided by the C standard library can be significantly more optimized than a naïve byte-wise copy. It usually takes at least the platform word size and alignment into account (so it can copy 4 to 8 bytes per iteration instead of 1), and sometimes does even cleverer things, such as using SIMD registers.
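To make the last point concrete, here is a simplified word-at-a-time copy. It is a sketch, not how any particular libc does it: `copy_wordwise` is a hypothetical name, it assumes non-overlapping buffers (memcpy semantics, not memmove), and it uses unaligned word reads/writes for simplicity, whereas tuned implementations usually align the pointers first because unaligned access can be slow on some targets.

```rust
use core::mem::size_of;

/// Hypothetical word-at-a-time copy with `memcpy` semantics.
///
/// Safety: `src` must be valid for reads and `dst` for writes of `n` bytes,
/// and the two regions must not overlap.
pub unsafe fn copy_wordwise(dst: *mut u8, src: *const u8, n: usize) {
    let w = size_of::<usize>(); // 4 bytes on RV32, 8 on RV64
    let mut i = 0;
    // Bulk: one word-sized load and store per iteration.
    while i + w <= n {
        unsafe {
            let word = src.add(i).cast::<usize>().read_unaligned();
            dst.add(i).cast::<usize>().write_unaligned(word);
        }
        i += w;
    }
    // Tail: the remaining (fewer than `w`) bytes, one at a time.
    while i < n {
        unsafe { *dst.add(i) = *src.add(i) };
        i += 1;
    }
}
```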

As for writing your own memcmp (and the others): no, you should link against your platform's libc.

wjlewis · November 18, 2022, 12:48pm

Thank you both! That makes a lot of sense.

system · February 16, 2023, 12:48pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
