I'm exploring building Objective-C FFI bindings with a focus on producing machine code from Rust that matches Clang's output. But it seems the codegen requirements are not compatible with Rust's compilation model and I'm not aware of a language facility to bridge the gap.
I opened this Topic to discuss potential ways to support this scenario, or to determine that it should not be supported. If there's a Language or Compiler feature that's worth exploring here, I'll be happy to drive it forward.
Scenario
Objective-C is a strict superset of C. All implementations I'm aware of use the C ABI calling convention. So, crossing the language boundary is primarily a matter of passing Objective-C runtime data structures (which use the C ABI layout) to Objective-C runtime functions.
A method call in Objective-C (or message send, in ObjC parlance) is resolved dynamically at runtime as part of the invocation, and each and every invocation performs that resolution. The method being invoked is identified by its selector, which is essentially a key into the class's method hash map, where the value is a function pointer to the implementation.
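Conceptually, the dispatch looks something like the sketch below. The types and names here are invented for illustration and are nothing like the real runtime's data structures (which also involve caching, inheritance, forwarding, etc.); the point is only that the implementation is found by keying on the selector at call time.

```rust
use std::collections::HashMap;

// Illustrative types only; not the real Objective-C runtime structures.
type Sel = &'static str; // in reality: a pointer to a uniqued C string
type Imp = fn(&Object, Sel) -> usize; // a function pointer to the implementation

struct Class {
    methods: HashMap<Sel, Imp>, // selector -> implementation
}

struct Object {
    class: &'static Class,
}

// Conceptually what every message send does: resolve the selector against the
// receiver's class at call time, then jump to the implementation it found.
fn msg_send(receiver: &Object, sel: Sel) -> usize {
    let imp = receiver.class.methods[&sel];
    imp(receiver, sel)
}
```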
In Apple's toolchain, the flow of a selector from a source file to a method identity at runtime is roughly as follows:
1. Clang emits a local symbol into the object file for each selector used in the TU.
2. The linker de-duplicates selectors by value (the symbol name is discarded) when linking objects into an image. The output image contains a definition for each selector used.
3. The dynamic linker uniques selector values across images when loading an image into a process.
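As an aside on why (3) matters: once the dynamic linker has fixed up every selector reference to the canonical string, two references to the same selector hold the same pointer, so the runtime can compare selectors by address rather than by string contents. In Rust terms (a trivial sketch, names invented here):

```rust
/// After the dynamic linker's fixups, selector equality is pointer equality.
fn same_selector(a: *const u8, b: *const u8) -> bool {
    core::ptr::eq(a, b)
}
```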
I'm not able to find a (good) way to emulate (1), which is a requirement for a few reasons:
- Apple's linker does not link selectors across images, which is a requirement for binary compatibility—there's no guarantee any image will have any particular selector definition across versions.
- Apple's linker uses only local symbols (i.e. not `.globl` or `.private_extern`) to resolve selector symbols when linking an image.
Duct Tape and WD-40
I have found a less-than-ideal way to emulate (1):

1. Set the codegen option `codegen-units = 1`, which effectively creates a single object file for the crate.
   - Unfortunately, as far as I can tell, there's not a way to set this on a per-crate basis. Downstream users of the crate must manually modify their build settings, affecting the entire build graph.
2. Wrap `global_asm!()` with a macro to define the selector value and a reference to the selector. (Each image must define a value. The dynamic linker will fix up the references with the uniqued value.) A sketch of one possible expansion follows this list. The macro serves three purposes:
   - It's the only facility I am aware of that can guarantee a symbol is local to the object file into which it's emitted.
   - It provides a way to guarantee consistent symbol names across crates (used in the next step).
   - It prevents the Rust compiler from optimizing away the read of the selector value through the selector reference.
3. Define an `all_selectors!()` macro in each crate that invokes the macro from (2) to define a value and reference for each selector it adds relative to its dependencies. That crate and all of its downstream dependents instantiate that macro and all upstream macros to define symbols for all selectors used.
   - Because there is only one object file per crate and the `global_asm!()` macro generates stable selector symbol names, when upstream uses of a selector are linked into a downstream call, a symbol by that name exists!
   - The macro system will become complex to maintain as the dependency graph grows, as there can only be one crate that defines any particular selector due to the stable symbol name.
   - Instantiating the macro in the downstream crate is another manual, non-standard integration step.
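To make (2) concrete, here is a rough sketch of what one expansion of such a macro might look like for the `init` selector on a 64-bit Mach-O target, mirroring the Clang output shown under Sample Assembly Code below. The symbol names (`SEL_NAME_INIT`, `SEL_REF_INIT`) and the overall shape are invented for this post; it only works when the assembly and the code reading the reference land in the same object file, which is what the `codegen-units = 1` requirement in (1) is about.

```rust
// Hypothetical expansion of a sel_def!-style macro built on global_asm!.
// Assumes a 64-bit Mach-O target and codegen-units = 1.
core::arch::global_asm!(
    // The selector value: a C string in __objc_methname. The `l` prefix makes
    // it an assembler-local label, matching Clang's output.
    ".section __TEXT,__objc_methname,cstring_literals",
    "l_SEL_NAME_INIT:",
    ".asciz \"init\"",
    // The selector reference: a pointer in __objc_selrefs that the dynamic
    // linker rewrites to the uniqued selector at load time. No .globl, so the
    // symbol stays local to this object file. (The leading underscore is the
    // Darwin C symbol prefix, so the extern declaration below can find it.)
    ".section __DATA,__objc_selrefs,literal_pointers,no_dead_strip",
    ".p2align 3",
    "_SEL_REF_INIT:",
    ".quad l_SEL_NAME_INIT",
);

extern "C" {
    // Resolves against the local definition above only because both end up in
    // the same object file.
    static SEL_REF_INIT: *const u8;
}

pub fn sel_init() -> *const u8 {
    // Volatile read so the compiler can't fold the load to the pre-fixup value.
    unsafe { core::ptr::read_volatile(core::ptr::addr_of!(SEL_REF_INIT)) }
}
```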
Sample Assembly Code
The following is the assembly code generated by Clang for a selector named `init`, targeting a 64-bit platform:
```asm
    .section __TEXT,__objc_methname,cstring_literals
l_OBJC_METH_VAR_NAME_:
    .asciz "init"

    .section __DATA,__objc_selrefs,literal_pointers,no_dead_strip
    .p2align 3
_OBJC_SELECTOR_REFERENCES_:
    .quad l_OBJC_METH_VAR_NAME_
```
In Apple's Objective-C runtime, selector values are C-style strings, but emitted into a specific section of the binary. The selector reference is simply a pointer to that string, though the particular string instance it points to may change when the image is loaded into a process. The reference is emitted into a specific section as well, enabling the dynamic linker to quickly perform the fixups at load time.
I got pretty close to emulating this in Rust, but wasn't able to find a way to make the symbol private to the object file. Even if that were solved, I'd still need to replicate the symbol into each object file that uses the selector.
```rust
macro_rules! sel {
    [$cmd:literal] => {
        {
            // The selector value: the method name as a C string.
            #[link_section = "__TEXT,__objc_methname,cstring_literals"]
            static _SELECTOR_NAME: [u8; $cmd.len()] = *$cmd;

            // The selector reference: a pointer to the value, which the
            // dynamic linker fixes up to the uniqued selector at load time.
            #[link_section = "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"]
            static _SELECTOR: $crate::SEL = $crate::SEL {
                _name: _SELECTOR_NAME.as_ptr(),
            };

            // Volatile read so the compiler doesn't fold in the pre-fixup value.
            let ptr: *const *const u8 = &_SELECTOR._name;
            unsafe { core::ptr::read_volatile(ptr) }
        }
    };
}

extern "C" {
    fn objc_msgSend(receiver: *const std::ffi::c_void, cmd: *const u8) -> *const std::ffi::c_void;
}

let initialized_object = unsafe { objc_msgSend(uninitialized_object, sel![b"init\0"]) };
```
Possible Solutions
I've identified some potential approaches that might create some (semi-)supported path for this scenario, roughly ordered by increasing amount of estimated effort to complete. I think (4) might be the most viable option and would appreciate your feedback.
1. It could be that this is too much of a niche case and it's not worth building a facility to accommodate it.
2. Build support for crate-specific compiler flags (e.g. always use `codegen-units = 1`).
   - This isn't a particularly clean solution: flag resolution in the presence of downstream settings/overrides gets murky fast.
   - It may not even solve the problem: if use of a selector leaks into a downstream crate (e.g. via inlining), that crate is then forced into the `codegen-units = 1` workaround too.
3. Sidestep the issue by implementing workarounds in `lld`.
   - There's no obvious reason to me why the linker shouldn't be able to use a selector symbol defined in some object file if it has non-local visibility.
   - It could also "import" selectors from upstream images so the linked image always has a definition.
   - The first point doesn't seem unreasonable, but the second seems like a hack.
   - This approach isn't ideal because it creates diverging conventions for Objective-C object code.
4. Add new `#[link_visibility = local]` and `#[no_elide]` attributes (spellings are just for the purposes of discussion!) that instruct the compiler to emit the symbol and its value into every compilation unit, and to not optimize away reads through the symbol, respectively. I like this because it captures the linking and functional requirements, doesn't appear overtly niche, and doesn't leak anything about Rust's compilation model (i.e. the number of codegen units). But I'm sure there's plenty of nuance that's not immediately obvious to me in my naïve speculation! (A usage sketch follows this list.)
5. Create C wrappers for all Objective-C code and use the C interface in Rust.
   - While this would work, it adds indirection. The indirection could be optimized away by LTO, but that requires the Objective-C compiler to generate bitcode compatible with the Rust compiler. Given that we're targeting Apple's platforms, and Apple's fork of LLVM can vary substantially from mainline, this may not be tenable.
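To illustrate (4), here is roughly how I imagine the `sel!` macro from above could look if something like these attributes existed. Neither attribute exists today, the spellings are the placeholders from the list, and this snippet does not compile; it is only meant to show the shape of the usage, not the design.

```rust
// Hypothetical: #[link_visibility = local] and #[no_elide] do not exist.
// The idea is that the compiler would emit these statics as local symbols
// into every codegen unit that uses them, and would never fold away reads
// through the selector reference.
macro_rules! sel {
    [$cmd:literal] => {{
        #[link_visibility = local]
        #[link_section = "__TEXT,__objc_methname,cstring_literals"]
        static _SELECTOR_NAME: [u8; $cmd.len()] = *$cmd;

        #[link_visibility = local]
        #[no_elide]
        #[link_section = "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"]
        static _SELECTOR: $crate::SEL = $crate::SEL {
            _name: _SELECTOR_NAME.as_ptr(),
        };

        // No read_volatile needed: #[no_elide] would carry that guarantee.
        _SELECTOR._name
    }};
}
```

With something along these lines, neither `codegen-units = 1` nor the `global_asm!()` machinery would be needed, and downstream crates wouldn't need any manual integration steps.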
Thanks for reading through this! I look forward to hearing any thoughts, ideas, questions, and/or suggestions you may have!