Down we go!
The undefined symbols you're seeing are from the inappropriate inclusion of startfiles, specifically crt0.o.
memset is used to clear bss and would normally come from some combination of -lgcc, -lgloss, and -lc, depending on the platform.
__libc_init_array normally comes from libc (in Newlib it walks the init-array symbols exposed by the linker script).
Passing through -nostartfiles would "fix" this, but it isn't actually necessary, because...
...the fundamental problem is the first line, which reads (in part):
fp16.0.o uses VFP register arguments, fp16 does not
This is a sign that different parts of the application have been compiled targeting different ARM processor variants. Specifically, rustc is using the thumbv7em-none-eabihf target, which specifies the Thumb-2 instruction set with DSP extensions and floating point, the microcontroller subset, and the hard-float ABI.
But when gcc is invoked to act as linker, no information about the triple is passed. ARM processors in particular have enough variants that it's critical to pass information about the processor to the linker:
- It uses it to fill in special ELF sections used to detect inappropriate mixing of architectures, and
- It controls GCC's multilib behavior, which will choose different versions of crt0.o and libc compiled for the appropriate instruction set variant.
Basically: the thumbv7em-none-eabi target now built into rustc is slightly wrong, in that it won't work correctly unless the application adds link-args. At minimum we need to select the right instruction set (!):
Then we must control multilib lookup for -lm and friends:
-mcpu=cortex-m4 (or m3 or m7 or...)
-mfloat-abi=hard
-mfpu=fpv4-sp-d16
...because the toolchain has not one libc, but many:
./lib/thumb/libc.a
./lib/armv7-ar/thumb/libc.a
./lib/armv7-ar/thumb/softfp/libc.a
./lib/armv7-ar/thumb/fpu/libc.a
./lib/armv7-m/libc.a
./lib/libc.a <-- currently getting this
./lib/armv6-m/libc.a
./lib/fpu/libc.a
./lib/armv7e-m/libc.a
./lib/armv7e-m/softfp/libc.a
./lib/armv7e-m/fpu/libc.a <-- we want this
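One way to check which of these GCC will actually pick is to ask it directly. A minimal sketch, assuming an arm-none-eabi-gcc toolchain on PATH and a Cortex-M4 part; -mthumb is my assumption for the instruction-set flag, and the directory names printed will depend on how the toolchain was built:

```shell
# Architecture flags under test. -mthumb is an assumed instruction-set flag;
# the other three are the multilib-selecting flags listed above.
ARCH_FLAGS="-mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16"

if command -v arm-none-eabi-gcc >/dev/null 2>&1; then
  # Every multilib variant this toolchain ships, with the flags that select it.
  arm-none-eabi-gcc -print-multi-lib
  # The multilib directory chosen for our flags.
  arm-none-eabi-gcc $ARCH_FLAGS -print-multi-directory
  # The libc.a that would be linked for our flags.
  arm-none-eabi-gcc $ARCH_FLAGS -print-file-name=libc.a
else
  echo "arm-none-eabi-gcc not found; skipping multilib query"
fi
```

Running the last two commands with and without ARCH_FLAGS should show the lookup moving away from the top-level libc.a.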
Note that the information GCC wants to receive is modeled differently from the way LLVM represents it. In particular, the switches for specifying ARM architecture variants are less flexible, and are mostly controlled by -mcpu. To make the rustc Cortex-M targets useful out-of-the-box without writing .cargo/config in every project, we might need to separate them into per-CPU variants that contain the right GCC link flags. I'm happy to take a crack at this if nobody objects.
Anyway.
We can fix this problem (needing to specify the architecture and variant) using linker flags that are not position-sensitive. They conventionally go first, i.e.:
ld $(PRE_ARGS) $(OBJECTS)
But they can also technically go after; see below.
The code @japaric linked was also adding arguments to model universal dependencies on C libraries. This is not unusual; on ARM, you'll almost certainly need -lgcc. The gcc linker frontend normally stuffs it in for you, unless you use -nodefaultlibs (which rustc appears to be adding to the flags by default) or -nostdlib.
You may need -lm depending on how Rust compiles operations like sin, or if you use any fp64, since the variant in question only supports single-precision floating point.
That code is also using linker object groups. This is an emergency feature for handling cyclic dependencies between objects, which are normally processed in the linear order I described in my last post. It's most commonly deployed for handling the (somewhat necessary) interdependence of -lgcc, -lc, and -lm (and in this case Newlib's -lnosys).
For correctness, all of these -lfoo flags, including the group, need to appear last in the list of objects, i.e. in this position:
ld $(PRE_ARGS) $(OBJECTS) $(DEPENDENCIES)
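Spelled out with the gcc frontend acting as linker, that ordering might look like the sketch below. It only assembles and prints the command. The object names and output name are hypothetical, -mthumb is my assumption for the instruction-set flag, and a real link would also need a linker script.

```shell
# Sketch only: builds and prints a link command, it does not run the linker.
# Position-insensitive architecture flags, conventionally first.
PRE_ARGS="-mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16"
# Hypothetical application objects.
OBJECTS="main.o startup.o"
# Mutually dependent libraries wrapped in a group, placed last.
# -Wl, forwards the group switches from gcc to ld.
DEPENDENCIES="-Wl,--start-group -lgcc -lc -lm -lnosys -Wl,--end-group"

LINK_CMD="arm-none-eabi-gcc $PRE_ARGS $OBJECTS $DEPENDENCIES -o app.elf"
echo "$LINK_CMD"
```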
Now, as it turns out, rustc is adding flags provided by -C link-arg=foo on the end of the linker command line. With GNU ld, I believe that almost all the relevant switches are insensitive to their precise location on the command line, so it should be legit to do this:
ld $(OBJECTS) $(PRE_ARGS) $(DEPENDENCIES)
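From the rustc side, that means every flag can be handed over as its own -C link-arg. A rough sketch that prints (rather than runs) such an invocation, assuming arm-none-eabi-gcc as the linker, a hypothetical main.rs, and -mthumb as the instruction-set flag:

```shell
# Sketch only: prints a rustc invocation instead of running it.
PRE_ARGS="-mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16"
DEPENDENCIES="-Wl,--start-group -lgcc -lc -lm -lnosys -Wl,--end-group"

# Wrap each linker flag in its own -C link-arg=...; rustc appends these
# after the objects, giving the OBJECTS / PRE_ARGS / DEPENDENCIES order.
RUSTC_LINK_ARGS=""
for arg in $PRE_ARGS $DEPENDENCIES; do
  RUSTC_LINK_ARGS="$RUSTC_LINK_ARGS -C link-arg=$arg"
done
RUSTC_LINK_ARGS="${RUSTC_LINK_ARGS# }"   # drop the leading space

echo "rustc --target thumbv7em-none-eabihf -C linker=arm-none-eabi-gcc $RUSTC_LINK_ARGS main.rs"
```

The same list of flags is what would end up in a per-project .cargo/config.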
The main exception is -u (GNU ld spells it -u symbol or --undefined=symbol). It is occasionally useful to add -u foo to the linker command line, which explicitly adds an undefined symbol to the symbol table (at the point in argument processing where the flag appears). By putting this on the command line before the linked objects, you ensure that definitions of the named symbols are collected from objects (which otherwise isn't guaranteed to happen, particularly in the presence of --gc-sections).
So that's the main case where using GNU ld really needs a "pre" link arg, in my experience. There are some other obscure cases.
I have mostly had to do this in cases where a library deliberately leaves an undefined 'hook' symbol to be filled in by the application, as in some of FreeRTOS's error handlers. I usually encounter this when I'm cramming pre-existing software into my build system; you can probably get pretty far while ignoring this use case.
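A sketch of that hook case, using GNU ld's --undefined=symbol spelling of the flag. The symbol app_error_hook and the archives libapp_hooks.a and librtos.a are all hypothetical, as is -mthumb. The command is only printed, not run.

```shell
# Sketch only: builds and prints a link command, it does not run the linker.
HOOK_SYMBOL="app_error_hook"   # hypothetical hook the RTOS library leaves undefined

# The undefined-symbol request goes in the "pre" position, ahead of all
# objects, so the archive member defining the hook is pulled in when seen.
PRE_ARGS="-mthumb -mcpu=cortex-m4 -Wl,--undefined=$HOOK_SYMBOL"
# Hypothetical inputs: libapp_hooks.a defines the hook, librtos.a refers to it.
OBJECTS="main.o libapp_hooks.a librtos.a"
DEPENDENCIES="-lc -lgcc"

LINK_CMD="arm-none-eabi-gcc $PRE_ARGS $OBJECTS $DEPENDENCIES -o app.elf"
echo "$LINK_CMD"
```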
Finally, there are cases where it is useful to emit flags alongside objects in the middle section of the linker command line. You've already seen object groups used in teensy3-rs-demo; --whole-archive is another useful case that I discuss here.
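For illustration, a mid-command-line flag pair might be placed like this. The archive names are hypothetical and the command is only printed. The point is that --whole-archive and --no-whole-archive bracket exactly one archive, so they have to travel with it in the object list.

```shell
# Sketch only: builds and prints a link command, it does not run the linker.
PRE_ARGS="-mthumb -mcpu=cortex-m4"   # -mthumb is an assumed flag
# libdrivers.a (hypothetical) is linked in full; libutil.a (hypothetical)
# gets the normal pull-only-what-is-referenced treatment.
OBJECTS="main.o -Wl,--whole-archive libdrivers.a -Wl,--no-whole-archive libutil.a"
DEPENDENCIES="-lc -lgcc"

LINK_CMD="arm-none-eabi-gcc $PRE_ARGS $OBJECTS $DEPENDENCIES -o app.elf"
echo "$LINK_CMD"
```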
Cobble models these two concepts separately using the link_flags key (switches that go to the left of the objects on the command line) and the link_srcs key (objects -- and flags! -- that are collected across the whole graph and topologically sorted). Any library target can emit both link_flags and link_srcs into the environment of any user, so they are accumulated into the top-level (typically executable) target.
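To make the topological-sort idea concrete, here is a toy sketch using the standard tsort utility. This is not how Cobble implements it, just an illustration of the ordering; the item names are hypothetical.

```shell
# Each input line is "user dependency": the left item must appear before
# the right one on the linker command line.
LINK_SRCS=$(printf '%s\n' \
  "app.o librtos.a" \
  "librtos.a -lc" \
  "app.o -lc" \
  "-lc -lgcc" | tsort | paste -sd' ' -)

echo "$LINK_SRCS"
```

Because these edges form a single chain, the only valid order is app.o, librtos.a, -lc, -lgcc. With a wider graph tsort is free to pick any order that respects the edges.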