Linking issues when designing a dynamic plugin-based architecture

I'm struggling to understand the subtleties of Rust library linking with respect to loading dynamic plugin libraries completely at runtime, particularly where I'd like the host library to expose some types that plugins can use.

If I wanted to do this in C/C++, I'd compile a host library to a .dll or .so, where the host library was responsible for loading plugin libraries at runtime. In order to create a plugin, I'd import just the host headers into my plugin library's project when I build it. Generally I wouldn't require the plugin library to know anything about the functions to call on the host - if it needed to do so, I'd pass a struct to the plugin containing callback functions that comprised an API, and this struct would be described in the host's public headers.

The plugin library would expose functions of a known signature (eg. to report the interface version it uses, and provide one or more entry points for the host to call), and the host would look up these symbols at runtime to make use of the plugin.

Using this model, the plugin library does not need to link against the host object code, either statically or dynamically. It just responds to calls that the host makes of it, potentially calling back into the host via the callbacks, and may make use of any types provided in the host headers.

I'm trying to work out how to implement something like this in Rust. Originally I tried specifying the host library as ["rlib", "dylib"] (so that other Rust code can use and call into it), and the plugin library as ["cdylib"]. Here the plugin exposes its functions as extern "C", and has a Cargo dependency on the host in order to access the public host types. The functions are extern "C" not specifically for C compatibility, but in order to conform to a stable ABI. They would only be expected to be called by Rust code.

Unfortunately, it seems this configuration causes the plugin library to link statically against the host. For example, when I look at the compiled plugin DLL in the Dependencies utility, any extern "C" functions that I've defined on the host end up being defined on the plugin library too. This is obviously not what I want.

My next step was to try dropping rlib from the host and just compiling with ["dylib"]. This removed the erroneous static linking from the plugin, but I got an error when trying to use the host (example from Windows):

The code execution cannot proceed because std-5740e47ddabbb05c.dll was not found.
Reinstalling the program may fix this problem. 

Presumably the host library now cannot find the Rust standard library, and has not linked it in statically. No std-*.dll is present in my output directory either. I can understand wanting to avoid linking different versions of the standard library into different targets, but here it shouldn't matter as far as I understand, since the host (and any launcher binary) are compiled at the same time, and the plugin library is completely external and should be able to statically link whatever it wants.

What would be the best way to produce this kind of plugin architecture in Rust? Essentially what I'm looking for is a way to produce an external cdylib that uses "just the headers" (in C/C++ terms) of the host library, without depending on any of its implementation. I understand that running cbindgen is one option here, but given I'm planning only to write plugin libraries in Rust and not actually in C or C++, generating headers just to parse them back again with bindgen seems quite redundant.

there's no equivalent to C "headers" in rust. if you want to implement the callback based approach in rust, the plugin definitely should NOT use the host library as a dependency. you can just declare the callback struct directly in the plugin. also, your plugin should export symbols with #[unsafe(no_mangle)] for it to be dynamically loaded.

to make sure the host and plugin are in-sync, you can create an "API" lib crate which contains the declaration of the plugin interface, and both the host and plugin then use this API crate as dependency. essentially, it's an analog of the shared "header" file in your C solution.

note, rust doesn't have a stable ABI, so your plugin API should use C ABI, i.e. #[repr(C)] for shared data types, and extern "C" fn for callback function pointers.

Right. So the API crate is a library crate that just includes types, and any code that does exist (eg. member functions on structs) is linked statically into any client of the API crate?

it could, but it'd be better to have just the signatures of the functions, while the actual implementation lives in the host crate, which more closely mimics the callback-based architecture you described.

here's a minimal example of this idea:

in the api crate:

// crates/plugin-api/src/lib.rs

use std::ffi::c_void;

/// host and plugin can use this constant to avoid api version mismatch
pub const API_VERISON: u32 = 1;

/// the entry point of a plugin
///
/// when the host loads the plugin, it calls this function with an opaque context
/// pointer and a struct containing the provided API functions.
///
/// returns `true` if the plugin initialized successfully.
pub type PluginInitFunction =
	unsafe extern "C" fn(ctx: *mut c_void, functions: *const HostFunctions) -> bool;

// other plugin exported function types can be declared

/// some example APIs the host provides to plugins
/// may be marked as `non_exhaustive`, especially for the plugin side
#[repr(C)]
pub struct HostFunctions {
	/// retrieve the version
	pub get_version: extern "C" fn(ctx: *mut c_void) -> u32,
	/// print a greeting message to the console
	pub log: unsafe extern "C" fn(ctx: *mut c_void, message: *const u8, message_len: usize),
	// ...
}

in one of the plugin crates:

// crates/first_plugin/src/lib.rs

use std::ffi::c_void;

use plugin_api::HostFunctions;

/// entry point of the plugin
#[unsafe(no_mangle)]
pub extern "C" fn init_plugin(ctx: *mut c_void, functions: *const HostFunctions) -> bool {
	let functions = unsafe { &*functions };
	if (functions.get_version)(ctx) != plugin_api::API_VERISON {
		return false;
	}
	unsafe {
		let message = "successfully initialized my first plugin";
		(functions.log)(ctx, message.as_ptr(), message.len());
	}
	true
}

in the host program:

// crates/host/src/main.rs

use std::ffi::c_void;
use std::{slice, str};

use plugin_api::HostFunctions;

fn main() {
	// prepare the parameters to call the plugin entry point
	let ctx = std::ptr::null_mut();
	let functions = HostFunctions {
		get_version: plugin_callback_get_version,
		log: plugin_callback_log,
	};

	// load the plugin and resolve the symbol
	let first_plugin = unsafe { libloading::Library::new("./plugins/libfirst_plugin.so").unwrap() };
	let init_function = unsafe {
		first_plugin
			.get::<plugin_api::PluginInitFunction>("init_plugin")
			.unwrap()
	};

	// call the init function
	let ok = unsafe { init_function(ctx, &functions) };
	if !ok {
		first_plugin.close().unwrap();
	}
}

// the implementation of the callback APIs

extern "C" fn plugin_callback_get_version(_ctx: *mut c_void) -> u32 {
	plugin_api::API_VERISON
}

/// SAFETY: message and message_len should point to a utf8 string
unsafe extern "C" fn plugin_callback_log(
	_ctx: *mut c_void,
	message: *const u8,
	message_len: usize,
) {
	let message =
		unsafe { str::from_utf8_unchecked(slice::from_raw_parts(message, message_len)) };
	println!("plugin log: {}", message);
}

the "raw" plugin api must go through the C ABI, making it rather hard to use. but you can also provide some safe wrapper/adapter code, which just converts the C ABI to the rust native ABI, allowing both the host and plugins to be written using idiomatic rust, similar to many "bindings" of ffi libraries.

below is an example using a rust trait for the api, which doesn't need unsafe unless the API is inherently unsafe. the host just provides an implementation of the trait, and the plugin can use a "proxy" object [1] to call the API, something like this:

the safe version of the raw API as a trait:

// crates/plugin-api/src/safe-api.rs

/// the safe version of the raw API
/// 
/// the host provides an implementation of this trait
pub trait HostApi {
	fn get_version(&mut self) -> u32 {
		API_VERISON
	}
	fn log(&mut self, message: &str);
}

helpers for the host side, which can be an optional feature, e.g. `#[cfg(feature = "host")]`:

// crates/plugin-api/src/safe-api/host.rs

extern "C" fn api_shim_get_version<Api: HostApi>(ctx: *mut c_void) -> u32 {
	let api = unsafe { &mut *(ctx as *mut Api) };
	api.get_version()
}

unsafe extern "C" fn api_shim_log<Api: HostApi>(
	ctx: *mut c_void,
	message: *const u8,
	message_len: usize,
) {
	let api = unsafe { &mut *(ctx as *mut Api) };
	let message =
		unsafe { str::from_utf8_unchecked(std::slice::from_raw_parts(message, message_len)) };
	api.log(message)
}

pub struct ApiProvider<Api>(PhantomData<Api>);

impl<Api: HostApi> ApiProvider<Api> {
	pub const FUNCTIONS: HostFunctions = HostFunctions {
		get_version: api_shim_get_version::<Api>,
		log: api_shim_log::<Api>,
	};
	pub fn load_plugin(api: &mut Api, plugin_path: &Path) -> Option<Plugin> {
		// use libloading to load the plugin and call `init_plugin`
		todo!()
	}
	//...
}

how the host can use the safe API:

// crates/host/src/main.rs

fn main() {
	let mut api = Api {
		plugin_name: "first_plugin".to_string(),
	};
	// `load_plugin` takes a `&Path`, so wrap the string literal
	ApiProvider::load_plugin(&mut api, std::path::Path::new("plugins/libfirst_plugin.so")).unwrap();
}

/// this struct is stateful
struct Api {
	plugin_name: String,
	//...
}

impl HostApi for Api {
	fn log(&mut self, message: &str) {
		println!("plugin [{}]: {}", &self.plugin_name, message)
	}
}

and here are the helpers for the plugin-side API:

// crates/plugin-api/src/safe-api/plugin.rs

/// basically a trait object
pub struct DynApi<'b> {
	ctx: *mut c_void,
	functions: &'b HostFunctions,
}

impl<'b> DynApi<'b> {
	pub unsafe fn from_raw(ctx: *mut c_void, functions: *const HostFunctions) -> Self {
		DynApi {
			ctx,
			functions: unsafe { &*functions },
		}
	}
	// example of an "extra" api on top of the raw api
	pub fn check_version(&mut self) -> bool {
		self.get_version() == API_VERISON
	}
}

/// the trait implementation simply forwards to the function pointers
impl HostApi for DynApi<'_> {
	fn get_version(&mut self) -> u32 {
		(self.functions.get_version)(self.ctx)
	}
	fn log(&mut self, message: &str) {
		unsafe { (self.functions.log)(self.ctx, message.as_ptr(), message.len()) }
	}
}

how a plugin can use the safe API:

// crates/first_plugin/src/lib.rs

#[unsafe(no_mangle)]
pub extern "C" fn init_plugin(ctx: *mut c_void, functions: *const HostFunctions) -> bool {
	// only need this one `unsafe` block
	// (`check_version` and `log` take `&mut self`, so the binding must be `mut`)
	let mut api = unsafe { DynApi::from_raw(ctx, functions) };
	if !api.check_version() {
		return false;
	}
	api.log("successfully initialized my first plugin");
	true
}

  1. a custom type that emulates a rust trait object, because the representation of real trait objects is unstable ↩︎

oh, forgot to mention: most of this tedious boilerplate (or something very similar) is handled by the thin_trait_object crate, so you don't have to manually juggle the low-level ABI stuff. the trade-off is that you lose fine control over, e.g., the layout of the vtable.

This is a lot of detail, thanks! I'll have to spend some time properly going through it to understand everything. One question I do have so far, though, is whether it's always necessary to pass pointers back and forth, as opposed to Rust references? So far, where I've used extern "C" functions, the compiler hasn't complained if I pass a reference to a repr(C) struct. Is there an advantage to using pointers instead?

no, it's not necessary to use raw pointers only.

rust references have the same representation as raw pointers of the same type. in many cases, when it comes to declaring ffi function parameters and return types, references and raw pointers to "regular" (sized) types are interchangeable, except that raw pointers can be NULL while rust references must always point to valid values. you can even use Option<&SomeType> in ffi function signatures, which is guaranteed to have the same representation as a raw pointer *const SomeType.
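As a small illustration of that point, here is a minimal sketch of an extern "C" function that takes Option<&T> where a C API would take a nullable pointer. The `Config` type and the function names are hypothetical and exist only for this example.

```rust
use std::mem::size_of;

/// Hypothetical shared type, standing in for anything declared in the API crate.
#[repr(C)]
pub struct Config {
    pub verbosity: u32,
}

/// `Option<&Config>` is passed as one thin pointer; `None` plays the role of NULL.
pub extern "C" fn verbosity_or_default(config: Option<&Config>) -> u32 {
    match config {
        Some(c) => c.verbosity,
        None => 0,
    }
}

/// Sizes of the raw pointer, the reference, and the optional reference, in that order.
pub fn pointer_sizes() -> [usize; 3] {
    [
        size_of::<*const Config>(),
        size_of::<&Config>(),
        size_of::<Option<&Config>>(),
    ]
}
```

All three sizes come out equal, and the callee gets a null check enforced by the type system instead of a manual `is_null()` test.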

when you use rust references in an ffi function signature, you do need to understand the implications of lifetimes though.

see the ffi chapter of the nomicon book for details.

in some cases, you cannot replace raw pointers with rust references. these are mostly the cases where the pointee type is not accurate, e.g. untyped or opaque typed pointers (e.g. *mut c_void), or runtime type-erased pointers (e.g. some form of thin trait object proxy type).

note, dynamic sized types are NOT ffi safe: their pointers are "fat", and have unstable memory layout -- neither rust references nor raw pointers of DSTs can be used in ffi. this includes many common rust types, such as slices &[SomeType], the family of string slices &str, &OsStr, &CStr [1], and trait objects &dyn SomeTrait, etc.
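To make the "fat pointer" point concrete, here is a rough sketch that compares pointer sizes and shows one way to carry a string slice across the boundary as an explicit pointer and length. `StrRef` and `is_fat` are hypothetical names, not part of any crate mentioned in this thread.

```rust
use std::mem::size_of;
use std::{slice, str};

/// Hypothetical FFI-friendly stand-in for `&str`: an explicit pointer plus length.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrRef {
    pub ptr: *const u8,
    pub len: usize,
}

impl StrRef {
    /// Split a string slice into its two components.
    pub fn new(s: &str) -> Self {
        StrRef { ptr: s.as_ptr(), len: s.len() }
    }

    /// SAFETY: `ptr` and `len` must still describe valid UTF-8 that lives for `'a`.
    pub unsafe fn as_str<'a>(self) -> &'a str {
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

/// True when a reference to `T` is wider than one machine word.
pub fn is_fat<T: ?Sized>() -> bool {
    size_of::<&T>() > size_of::<usize>()
}
```

This is the same pointer-plus-length convention the `log` callback in the earlier example uses, just packaged as a #[repr(C)] struct.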


  1. CStr explicitly does NOT guarantee its representation for the possibility of future changes ↩︎

To clear up one point of potential confusion here: if you're saying &str is not FFI-safe, I guess that's technically different from saying it's not usable in an extern "C" function call?

So far in what I've been writing, I've been basing my interface function signatures on what the linter says is acceptable based on the extern "C" attribute. It does not seem to complain if I use &str, whereas it does complain if I use, for example, &CStr. Given what you said above, I suspect I should in fact be considering "FFI-safe" types as a subset of extern "C"-safe types, and not just rely on the latter.

technically yes, an extern "C" function isn't necessarily always used for ffi, and can be consumed by rust code only.

but if you use non-ffi-safe types in an extern "C" function, you do get a built-in lint warning by default, nevertheless the compiler will accept the code, and I believe it will generate the correct machine code too.

what it really means for &str to be not FFI-safe is that &str is a "fat" pointer, which is a #[repr(Rust)] type and does not have a stable memory layout guarantee. however, this is NOT saying the representation is nondeterministic. rather, when you have &str as a parameter or return type of an extern "C" function, rustc will simply use the "native" rust representation of the type, which is totally valid.

I'm not sure what's going on in your configuration, but it's not the default behavior that it doesn't complain about &str.

the improper_ctypes_definitions lint (warn by default) does fire for &str:
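A minimal sketch of what that looks like in practice (function names are made up for illustration). The first definition only compiles without a warning because of the `allow` attribute; the second passes the same data as a pointer and a length and needs no attribute.

```rust
/// Takes a fat pointer. Remove the `allow` line below to see the
/// warn-by-default `improper_ctypes_definitions` lint fire on this definition.
#[allow(improper_ctypes_definitions)]
pub extern "C" fn count_spaces(message: &str) -> usize {
    message.bytes().filter(|b| *b == b' ').count()
}

/// Lint-clean variant: the slice is split into a thin pointer and a length.
///
/// SAFETY: `message` must point to `message_len` readable bytes.
pub unsafe extern "C" fn count_spaces_raw(message: *const u8, message_len: usize) -> usize {
    let bytes = unsafe { std::slice::from_raw_parts(message, message_len) };
    bytes.iter().filter(|b| **b == b' ').count()
}
```

Both versions work when the caller is also Rust built by the same compiler; only the second has a layout you can describe to anything else.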

while technically ffi-safe types are not a hard requirement for extern "C" functions, practically, extern "C" is almost always used for ffi situations, because for pure rust code, it has no benefit to use extern "C" as opposed to extern "Rust".

I say ffi-safe very loosely here. sorry for the confusion.

Technically there is no such thing as being "FFI safe": every type can be used across an FFI boundary, so long as the caller and callee agree about the size, layout and alignment of all used types.

In practice this type of agreement is essentially impossible for all "normal" Rust types, including reference types and structs. This is because the compiler may freely rearrange struct member order and padding as an optimization. Even slightly different compiler flags could suddenly change your struct layouts. Only #[repr(C)] types (and primitives like pointers) have a well-defined representation (size, layout, alignment). If you are REALLY brave, you could pass Rust references to C and successfully manipulate them with C code, but doing so in real programs is probably a bad idea.

The subset of types for which it's "easy enough" to reason about their representation are called "FFI safe" types.
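As a sketch of what "well defined" buys you, the layout of a #[repr(C)] struct can be written down and checked. The `Header` type is hypothetical, and the numbers in the comments assume a mainstream target where u32 is 4-byte aligned and u16 is 2-byte aligned.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical struct; fields are laid out in declaration order, as in C.
#[repr(C)]
pub struct Header {
    pub tag: u8,    // offset 0, followed by 3 padding bytes
    pub value: u32, // offset 4
    pub flags: u16, // offset 8, followed by 2 bytes of tail padding
}

/// Field offsets, then total size, then alignment.
pub fn header_layout() -> [usize; 5] {
    [
        offset_of!(Header, tag),
        offset_of!(Header, value),
        offset_of!(Header, flags),
        size_of::<Header>(),
        align_of::<Header>(),
    ]
}
```

Without #[repr(C)] the compiler is allowed to reorder these fields, so no such table could be relied on by a separately compiled plugin.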

I think that's usually what safe means. For comparison, "safe" vs "unsafe" code isn't about what's sound or unsound, only about what could be unsound. In the same way, FFI-safety isn't about whether or not you're guaranteed to get ABI mismatches, it's about whether you can.

After having got a bit further, I've got a question about passing functions between the plugin and the host. Let's say I have this setup for the sake of an example:

pub type CallbackFn = extern "C" fn(i32) -> i32;

// In the plugin, callable by the host:
pub extern "C" fn get_plugin_callback() -> CallbackFn
{
    return plugin_internal_callback;
}

extern "C" fn plugin_internal_callback(val: i32) -> i32
{
    // This can do whatever, the impl is not important
    return val + 123;
}

The callback that's returned is static in the sense that it's code built into the plugin library, but if the plugin library itself is unloaded, I guess the callback will no longer point to valid code.

Where I've been loading libraries using libloading, any symbol looked up in the library is either tied to the lifetime of the library, or is designated unsafe (so it's the programmer's job to make sure it is not used if the library is unloaded). However, if I call a function symbol and have the function return a callback from the plugin library, it appears that Rust will just let me pass this around as a fn, with neither a lifetime attached nor an unsafe designation. As far as I can see, this doesn't relate to the lifetime of the symbol wrapper I called.

I'm not planning on unloading any plugin libraries while I do this, but nevertheless I suspect I'm not quite understanding how callback functions are supposed to be transferred between plugin and host. Perhaps this would be solved by following @nerditation's method to the letter, but I'm interested in what I'm missing here.

you are right that the function pointer type cannot be checked by the borrow checker because function pointer types don't have lifetime markers in themselves (yet they are Copy).

the reason is that dynamic loading is a low-level platform specific feature and outside the rust abstract machine[1]. the rust ownership and borrow checker semantic model does not fit well for these kinds of constructs, you need to use unsafe code to handle them (ffi, raw pointers, or even inline assembly), and it's very hard, if possible at all, to create wrappers/abstractions in safe rust that are sound.

so you have several options to make this dynamically loaded callback API sound (hopefully):

  • mark all the callback function pointers as unsafe so they cannot be called in safe code. however, this makes it (almost?) impossible to create a safe (or mostly-safe) API on top of the low level one.

  • mark the API that returns the callbacks as unsafe, shifting the responsibility to the user to make sure not to call the returned callbacks after the module is unloaded. but, as is commonly said, the compiler doesn't check your comments. so even if you write comprehensive safety invariants in the API documentation, it is still very hard to reason about the soundness of the API itself.

  • instead of returning a regular function pointer, you can return some wrapper type with a lifetime marker, similar to the libloading::Symbol type. so the callback "function pointers" become callback "objects". with careful design, this approach is most likely to have a safe API (almost, you still need libloading).

    note, because the Fn* family traits are nightly only, libloading uses Deref to support the function call syntax. for your own API, you don't need to support the function call operator, though it's nice to have.


  1. other such examples include inter-process shared memory, or memory mapped files, or volatile io, etc ↩︎

Makes sense. I suppose what I want would be to have a function "object" with a lifetime that's tied to the original libloading::Library I used for the call.

From what I understand from my experiments so far, PhantomData<T> will make your struct act like it has a lifetime-bound reference to some data T. Would it be possible/appropriate to use it in this instance to act like the struct has a lifetime-bound reference to the libloading::Library that the function is implemented in? I'm thinking something like the pseudo-Rust below. (I've not got all the syntax down yet.)

// Safety concerns: it is the caller's responsibility
// to ensure that the lifetime of the function argument
// passed in lives at least as long as the owner
// library whose lifetime it will be bound to.
pub unsafe fn to_fn_object<'l, Params, Ret>(
  // Here, the owner argument is used in order
  // to assign the lifetime of the returned
  // function object, but nothing more.
  owner: &'l libloading::Library,
  // The syntax below is probably not quite
  // correct, but you get the intent...
  function: extern "C" fn<Params, Ret>
) -> PluginCallback<'l, Params, Ret> {
    // I can deref this object later in order
    // to call the wrapped function.
    return PluginCallback {
        wrapped_fn: function,
        // The PhantomData encodes the lifetime
        // information, but I'm not yet sure of the
        // syntax for how to bind to the owner's
        // lifetime here.
        phantom: PhantomData
    }
}

One subsequent point on PhantomData and #[repr(C)]/extern "C" too: I expect that as far as the plugin library is concerned, it can just pass functions back to the host as the default fn type, since in the context of the library the function does indeed have static lifetime. Once a callback is passed to the host, it's the host's responsibility to wrap it in an appropriate lifetime that is bound to the Library that was loaded. In this way, I don't have to use PhantomData in my host->plugin interface structs or functions, just in the host implementation. Is this understanding correct?
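For reference, the pseudo-Rust above can be turned into something that compiles roughly as follows. This is only a sketch under stated assumptions: `Library` here is an empty stand-in for libloading::Library so the example builds without the crate, `PluginCallback` is a hypothetical type, and the callback is defined locally rather than loaded from a plugin.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Stand-in for `libloading::Library` (assumption, so the sketch is self-contained).
pub struct Library;

pub type CallbackFn = extern "C" fn(i32) -> i32;

/// A function pointer whose use is limited to the borrow of the library it came from.
pub struct PluginCallback<'lib, F> {
    wrapped_fn: F,
    // Behaves as if the struct held a `&'lib Library`, without storing one.
    phantom: PhantomData<&'lib Library>,
}

impl<'lib, F> PluginCallback<'lib, F> {
    /// SAFETY: `function` must point to code that stays valid for as long as
    /// `owner` is loaded; the compiler cannot verify that relationship.
    pub unsafe fn new(_owner: &'lib Library, function: F) -> Self {
        PluginCallback { wrapped_fn: function, phantom: PhantomData }
    }
}

/// `Deref` lets the wrapper be called with ordinary function call syntax.
impl<F> Deref for PluginCallback<'_, F> {
    type Target = F;
    fn deref(&self) -> &F {
        &self.wrapped_fn
    }
}

/// In a real setup this would live in the plugin; it is local here for the demo.
pub extern "C" fn plugin_internal_callback(val: i32) -> i32 {
    val + 123
}
```

With this shape, moving or dropping the library while a `PluginCallback` is still in use is rejected by the borrow checker. One caveat: function pointers are Copy, so `*callback` can still copy the raw pointer out of the wrapper and escape the lifetime, which means the protection is a guard rail rather than a guarantee.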

Function pointers aren't the only type that is unsafe to pass across the library boundary if there might be any library unloading, unfortunately - the general issue is that what is 'static from the loaded library's perspective is not necessarily 'static from the host executable's.

e.g. unloading the library can also potentially break:

  • Any dyn SomeTrait whose real type is defined in the library (unless it has the library's lifetime associated), because the pointer to the vtable will be invalid if the library has been unloaded.
  • Any &'static T created by the library, because it might refer to a static defined in the library (even just a string literal).
  • Any T that transitively contains any reference, even if T: 'static, because of the previous one.

Rust doesn't distinguish between:

  1. This type is 'static because it's totally self-contained and doesn't refer to anything else at all, like i32.
  2. This type is 'static because it only refers to owned heap allocations, like String.
  3. This type is 'static because it only refers to data defined in the compiled binary, like Box<dyn Foo> - not safe in the presence of library unloading.

You can't always tell just from the type, either - &'static str is safe if it's the result of calling String::leak, or if it's derived from a string literal in the host binary, but it's not safe if it's derived from a string literal in the unloaded library.

So, just wrapping things that deal with function pointers is not sufficient to avoid all problems here, and personally I wouldn't allow library unloading at all unless the interface with the library was extremely simple.
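One common mitigation, which is not something proposed in this thread, is to copy anything the plugin hands over into host-owned memory at the boundary instead of keeping references into the plugin's image. A rough sketch, with a hypothetical helper name:

```rust
/// Hypothetical host-side helper: copy plugin-provided bytes into a `String`
/// allocated by the host, so nothing keeps pointing into the plugin's memory.
///
/// SAFETY: `ptr` must point to `len` readable bytes for the duration of the call.
pub unsafe fn copy_plugin_str(ptr: *const u8, len: usize) -> String {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Invalid UTF-8 is replaced rather than trusted.
    String::from_utf8_lossy(bytes).into_owned()
}
```

The returned String falls into the "owned heap allocation" category from the list above, so it does not depend on the plugin staying loaded. This only covers plain data; vtables and function pointers have no equivalent copy.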

That's a good point. In my current implementation I'm not planning on unloading any plugin libraries (at least, not once I've verified that the API version a library conforms to is the one that the host supports), but it would be nice to make use of Rust's available features as much as I can.