How to implement a function declared by C in Wasmtime?

Hi, I'm trying to implement, in Wasmtime, a function that is declared in test.c as shown below.

// test.c
extern void my_extern_fun();
int main(){
    my_extern_fun();
    return 0;
}

I use clang, llc, and wasm-ld to compile test.c to test.wasm. I hope to run it with wasmtime test.wasm, but instead I get the error below.

Error: failed to run main module `./test.wasm`

Caused by:
    0: failed to instantiate "./test.wasm"
    1: unknown import: `env::my_extern_fun` has not been defined

Of course, I have to modify the source code of Wasmtime. What troubles me is how to do it.

I have tried a lot of approaches: adding an interface in the .witx file and implementing it, creating a crate that implements the function my_extern_fun and registering it in the Store, and adding extern "C" and #[no_mangle]. None of them worked.

So, I'm wondering if anybody can offer any help.

You shouldn't be making any modifications to Wasmtime for this. Your error message is saying that you didn't provide the import for env::my_extern_fun() when you were instantiating the WebAssembly module.

This can be done by passing a list of Externs (functions, globals, memories, etc.) to Instance::new(), but that's fairly low level, and using the wasmtime::Linker can save you a bunch of hassle.

To show how it's done, here's the example from Linker::func_wrap() where they give the guest access to host::double(), host::log_i32(), and host::log_str().

let mut linker = Linker::new(&engine);
linker.func_wrap("host", "double", |x: i32| x * 2)?;
linker.func_wrap("host", "log_i32", |x: i32| println!("{}", x))?;
linker.func_wrap("host", "log_str", |caller: Caller<'_, ()>, ptr: i32, len: i32| {
    // ...
})?;

let wat = r#"
    (module
        (import "host" "double" (func (param i32) (result i32)))
        (import "host" "log_i32" (func (param i32)))
        (import "host" "log_str" (func (param i32 i32)))
    )
"#;
let module = Module::new(&engine, wat)?;

// instantiate in multiple different stores
for _ in 0..10 {
    let mut store = Store::new(&engine, ());
    linker.instantiate(&mut store, &module)?;
}

Thanks! This really helps me. :smile:

To make it clearer to other developers, I'll show my code details below.

I added it to run.rs in the Wasmtime repository (wasmtime/run.rs at main · bytecodealliance/wasmtime · GitHub):

// run.rs
// ...
let mut linker = Linker::new(&engine);
linker.allow_unknown_exports(self.allow_unknown_exports);

linker.func_wrap("env", "my_extern_fun", || println!("call [my_extern_fun]"))?;

populate_with_wasi(
    &mut store,
// ...

And it works!

You might also want to look into the wit-bindgen project for automating this sort of thing.

For Rust hosts, they've got a proc-macro which generates the glue code for passing higher-level objects into/out of the WebAssembly module, and host functions are provided by implementing a trait.

Thanks! :smile: I'll check it later, it can be helpful.

But I got another problem here. :persevere:

What should I do if I want to call a much more complex function, like the my_extern_func below, from linker.func_wrap()?

use libc::{c_ulong, c_uint, c_long};
struct Example {
    // ...
}
extern "C" {
    fn my_extern_func(e: *mut Example, ptr: *mut c_ulong, cptr: *mut c_uint) -> c_long ;
}
// ...

I can't use the same approach as above, because the parameter types have to be simple ones like i32, u32, etc., so that they match the Wasm code.

// this won't work
linker.func_wrap("env", "my_extern_fun", my_extern_func)?;

So how could I solve it?

This is where things get a lot more complex and you start to see why tools like wit-bindgen were created.

First, let's assume our guest code (written in C) looks something like this:

typedef struct {
  char first_name[16];
  uint16_t age;
} Example;

// The host function we want to use
int64_t my_extern_func(Example *e, int64_t *ptr, uint64_t *cptr);

This is perfectly valid for the guest to do. Note that I've chosen integer types with explicit sizes - this is important because WebAssembly is typically compiled as 32-bit, while your host is typically a 64-bit machine, and we don't want to mess things up because some integer type decided to be a different size on one architecture versus the other.

Now the interesting thing to note is that when the guest passes the host a "pointer", it's really just passing you a u32 index into its linear memory. That means the implementation of our host function looks something like this:

linker.func_wrap("env", "my_extern_fun", my_extern_func)?;

fn my_extern_func(caller: Caller<'_, ()>, e: u32, ptr: u32, cptr: u32) -> i64 {
  ...
}

(the Caller parameter is what we use to access the internals of a WebAssembly module)
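
To make the "a pointer is just an offset" idea concrete, here is a minimal sketch that treats linear memory as a plain byte slice and reads the int64_t the guest stored at ptr. The helper name read_i64 is my own invention, not a Wasmtime API, and it relies on WebAssembly's linear memory being little-endian.

```rust
/// Hypothetical helper (not part of Wasmtime): reads the little-endian i64
/// that the guest stored at `guest_ptr`, an offset into linear memory.
/// Returns None if the 8 bytes would fall outside linear memory.
fn read_i64(linear_memory: &[u8], guest_ptr: u32) -> Option<i64> {
    let start = guest_ptr as usize;
    let end = start.checked_add(8)?;
    // `get` does the bounds check for us instead of panicking
    let slice = linear_memory.get(start..end)?;

    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(slice);
    Some(i64::from_le_bytes(bytes))
}
```

A bogus "pointer" from the guest then just yields None instead of reading outside the guest's memory.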

Now, the tricky part is to turn that e "pointer" back into something that looks like an Example struct. We do this by asking the caller to give us the Memory object exported by this WebAssembly module (typically called "memory", I think) that represents the module's linear memory, then asking for mutable/immutable access to the bytes.

Once we've got access to the linear memory as a range of bytes, we can do some unsafe pointer arithmetic and casting to get the *const Example back.

#[repr(C)] // Safety: it's our responsibility to make sure this
           // matches the C struct's layout *exactly*.
struct Example {
  first_name: [u8; 16],
  age: u16,
}

// `caller` must be `mut` because `get_export()` takes `&mut self`
fn my_extern_func(mut caller: Caller<'_, ()>, e: u32, ptr: u32, cptr: u32) -> i64 {
  // First, get a reference to the memory object
  let memory = caller.get_export("memory")
    .unwrap()
    .into_memory()
    .unwrap(); // TODO: proper error handling

  // Next, we need to access the linear memory as a bunch of bytes
  let linear_memory: &[u8] = memory.data(&caller);

  // Safety: This is safe because
  // - We've manually verified that the layout for our two Example structs
  //   match, and Example doesn't contain any internal pointers which we 
  //   might accidentally interpret as pointers on the host instead of offsets 
  //   into the guest's linear memory
  // - The `my_extern_func` function doesn't call back into WebAssembly,
  //   (which could accidentally lead to it mutating/destroying the values
  //   while the host has a reference to them, or growing linear memory
  //   which could leave us with dangling pointers)
  // - The guest promises that the pointers are within linear memory
  // - The start of linear memory is guaranteed to be aligned, and the
  //   Example object is guaranteed (by the compiler/guest's allocator) to
  //   be aligned correctly with respect to the start of linear memory
  unsafe {
    let e: *const Example = linear_memory.as_ptr().add(e as usize).cast();
    // If these are meant to be slices, then use std::slice::from_raw_parts()
    let ptr: *const i64 = linear_memory.as_ptr().add(ptr as usize).cast();
    let cptr: *const u64 = linear_memory.as_ptr().add(cptr as usize).cast();

    // magic goes here
    ...
  }
}

(I haven't actually run this code or checked whether it compiles, but it should work)

You could also parse the byte slice you get from &linear_memory[e as usize..e as usize + 18] to get a [u8; 16] followed by a little-endian u16 like you would when implementing the parser for a binary format, but both pointer arithmetic and parsing intimately rely on the layout of Example so they're pretty much equivalent[1]. I'm okay with taking the unsafe route because I wrote both sides of the code, they both have the same failure mode, and an unsafe cast requires 1 line whereas safely parsing would require 5-15 lines with no real benefit.
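
For comparison, the parsing route could look roughly like the sketch below. It assumes the 18-byte layout of the Example struct above (16 name bytes followed by a little-endian uint16_t); parse_example and EXAMPLE_SIZE are names I made up for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Example {
    first_name: [u8; 16],
    age: u16,
}

/// Assumed size of the guest's Example: 16 name bytes + a 2-byte age.
const EXAMPLE_SIZE: usize = 18;

/// Hypothetical helper: copies an Example out of linear memory without
/// any unsafe code. Returns None if the struct would be out of bounds.
fn parse_example(linear_memory: &[u8], guest_ptr: u32) -> Option<Example> {
    let start = guest_ptr as usize;
    let end = start.checked_add(EXAMPLE_SIZE)?;
    let bytes = linear_memory.get(start..end)?;

    let mut first_name = [0u8; 16];
    first_name.copy_from_slice(&bytes[..16]);
    // WebAssembly linear memory is little-endian
    let age = u16::from_le_bytes([bytes[16], bytes[17]]);

    Some(Example { first_name, age })
}
```

Note that this copies the struct instead of borrowing it, so the guest can't change it underneath you, at the cost of losing zero-copy access.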

Before I switched to wit-bindgen I was maintaining about 4kloc of this sort of binding code at work (although using Wasmer, not Wasmtime), so I can attest to how monotonous and error-prone it is.


  1. Although, a bug in parsing just gives you garbage/index-out-of-bounds panics, whereas bad pointer arithmetic could result in UB if you mess up (e.g. you turn it into a &Example that gets mutated by the guest, you are actually pointing to a Foo, the pointer isn't a pointer to anything but actually just a random number the guest plucked out of thin air, etc.). ↩︎


I get it now! That makes sense. Thanks a lot! :smile:

But I got a little confused here. :persevere:

What if Example contains internal pointers, like below? Does that mean we have to confirm the offsets they hold? And what if we can't?

// c code
typedef struct {
  uint8_t* buf;
  void* pointer;
  uint8_t** buf2;
} Example;
// =================== //
// rust code
struct Example {
  buf: *mut c_uchar,
  pointer: *mut c_void,
  buf2: *mut *mut c_uchar
}

My goal is to take an inner function out of a C project and make it run faster by implementing it in the WebAssembly runtime. Because the C runtime's behavior can be hard to know, I may not be able to get the offsets of the inner pointers or of the other pointer parameters.

So in this case, is there any good solution? Or do I need to take some time to learn wit-bindgen?

If the guest defined Example to look something like this:

typedef struct {
  uint8_t* buf;
  void* pointer;
  uint8_t** buf2;
} Example;

Then you would probably do something like this on the host side:

#[repr(C)]
pub struct Example {
  buf: u32,
  pointer: u32,
  buf2: u32,
}

pub fn get_example(linear_memory: &[u8], ptr: u32) -> &Example {
  unsafe { &*linear_memory.as_ptr().add(ptr as usize).cast() }
}

// Note: we can't expose fields like `buf` directly because they're just 
// offsets, so we use getters

impl Example {
  // The returned slice borrows from linear memory, not from `self`
  pub fn buf<'m>(&self, linear_memory: &'m [u8]) -> &'m [u8] {
    unsafe { 
      let ptr = linear_memory.as_ptr().add(self.buf as usize);
      let len = 42; // I'm assuming this is a hard-coded constant
      std::slice::from_raw_parts(ptr, len)
    }
  }

  ...
}

The buf2 example is a bit trickier because you require &mut access to linear memory and the *mut *mut c_uchar is generally the equivalent of Vec<Vec<u8>>, except we don't know the lengths of anything... So I'm just going to guess some random numbers and assume you know how big the buffers are.

impl Example {
  ...

  // The returned slices borrow from linear memory, not from `self`
  pub fn buf2<'m>(&self, linear_memory: &'m mut [u8]) -> Vec<&'m mut [u8]> {
    // Safety: This has all the normal safety requirements around the pointer being 
    // within bounds, aligned, and pointing at a valid object.
    // 
    // We also assume that no elements in buf2 overlap with any other element
    // (or buf2 itself) because that would lead to aliased mutation.
    unsafe {
      let buf2_ptr = linear_memory.as_mut_ptr().add(self.buf2 as usize).cast();
      let len = 42;
      let buf2: &mut [u32] = std::slice::from_raw_parts_mut(buf2_ptr, len);
 
      // Note that buf2 just contains offsets into the guest's linear memory,
      // so we need to do more pointer math to get a reference to the buffers
      // they point to
      let mut elements = Vec::new();

      for offset in buf2 {
        // `offset` is a `&mut u32`, so dereference it before casting
        let ptr = linear_memory.as_mut_ptr().add(*offset as usize);
        let len = 123;
        let element = std::slice::from_raw_parts_mut(ptr, len);
        elements.push(element);
      }

      elements
    }
  }
}

In general, the pattern is that whenever your guest introduces some level of indirection (i.e. a pointer), you'll need to load that as a u32 offset into linear memory and do pointer arithmetic to get the item it points at.
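
As a rough illustration of that pattern without any unsafe, here is a bounds-checked sketch that follows two levels of indirection, in the spirit of buf2: a table of u32 offsets, each pointing at a fixed-size buffer. The helper names (read_u32, bytes_at, read_buffers) and the idea that the host already knows count and element_len are my assumptions.

```rust
/// Hypothetical helper: loads a little-endian u32 (i.e. a guest pointer)
/// stored at `guest_ptr`. Returns None if it is out of bounds.
fn read_u32(mem: &[u8], guest_ptr: u32) -> Option<u32> {
    let start = guest_ptr as usize;
    let bytes = mem.get(start..start.checked_add(4)?)?;
    Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Hypothetical helper: borrows `len` bytes starting at `guest_ptr`.
fn bytes_at(mem: &[u8], guest_ptr: u32, len: usize) -> Option<&[u8]> {
    let start = guest_ptr as usize;
    mem.get(start..start.checked_add(len)?)
}

/// Follows a `uint8_t **`-style table: `table_ptr` points at `count`
/// consecutive u32 offsets, each pointing at `element_len` bytes.
fn read_buffers(
    mem: &[u8],
    table_ptr: u32,
    count: u32,
    element_len: usize,
) -> Option<Vec<&[u8]>> {
    (0..count)
        .map(|i| {
            // First level: find the slot in the table (4 bytes per pointer)
            let slot = table_ptr.checked_add(i.checked_mul(4)?)?;
            // Second level: load the offset, then the buffer it points at
            let element_ptr = read_u32(mem, slot)?;
            bytes_at(mem, element_ptr, element_len)
        })
        .collect()
}
```

Any offset that points outside linear memory makes the whole call return None rather than producing a dangling reference.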

You also need to be pretty careful with what you're doing, because it is easy to have a bad time by missing a level of indirection, having a bad struct definition, or using & when you should have been using &mut. If you are mutating a buffer in the guest's memory, you'll want to write your code in such a way that only one thing has &mut access to linear memory at any one time[1].

Pointer casting like this lets the host have zero-copy communication with the guest, but it's also easy to shoot yourself in the foot.
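
If you only need mutable access to a couple of regions at once, one way to keep the "only one &mut" rule without unsafe is to let the standard library split the slice for you. This is just a sketch; two_regions_mut is a made-up helper, and it assumes the first region lies entirely before the second.

```rust
/// Hypothetical helper: hands out two non-overlapping mutable regions of
/// linear memory, each given as (start, len). Returns None if the first
/// region doesn't end before the second starts, or if anything is out of
/// bounds.
fn two_regions_mut(
    linear_memory: &mut [u8],
    first: (usize, usize),
    second: (usize, usize),
) -> Option<(&mut [u8], &mut [u8])> {
    let (first_start, first_len) = first;
    let (second_start, second_len) = second;
    let first_end = first_start.checked_add(first_len)?;
    let second_end = second_start.checked_add(second_len)?;

    if first_end > second_start || second_end > linear_memory.len() {
        return None;
    }

    // split_at_mut gives two halves that the borrow checker knows are disjoint
    let (head, tail) = linear_memory.split_at_mut(second_start);
    let a = &mut head[first_start..first_end];
    let b = &mut tail[..second_len];
    Some((a, b))
}
```

Overlapping requests are rejected up front, so aliased mutation can't happen by accident.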


  1. Technically this isn't required if you pass linear memory around as a *mut u8 pointer to its start and are "careful" to never create overlapping &mut references into the buffer, but I think I've written enough unsafe code for one post. ↩︎


The Rust Example struct in the question is missing #[repr(C)] before it.


This really helps me!

So the main solution here is to find a way to confirm the size of the data behind each pointer and allocate it in linear memory.

Thanks again! :muscle:

And I think I really need to find time to learn wit-bindgen. :tired_face:

Sorry to bother you again.

I've got another problem. If I want to implement a function whose return type is long, how can I do that?

// C function
long test() {
  return 2 >> 35;
}

And in Rust, it will be like this.

extern "C" {
  fn test() -> i64;
}

And to implement it in Wasmtime, it'll be like this.

linker.func_wrap("env", "test", || -> i64 {
    unsafe {
        test()
    }
})?;

But the problem here is that my target arch is wasm32, so the return type in xxx.wasm will be changed to i32, and the error will be like below.

Error: failed to run main module `main.wasm`

Caused by:
    0: failed to instantiate "main.wasm"
    1: incompatible import type for `env::test`
    2: function types incompatible: expected func of type `() -> (i32)`, found func of type `() -> (i64)`

So how could I fix this?

You can either use int64_t on the Wasm side or i32 on the Rust side. On wasm32, long is only 32 bits wide, which is why the module expects an i32.
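
If you go with i32 on the Rust side, the 64-bit result of the native function has to be narrowed somewhere. A minimal sketch, assuming you would rather detect overflow than silently truncate (narrow_to_wasm32_long is a hypothetical helper name, not a Wasmtime API):

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts the host's 64-bit `long` result into the
/// 32-bit `long` that a wasm32 guest expects. Returns None if the value
/// doesn't fit, so the caller can decide how to report that.
fn narrow_to_wasm32_long(value: i64) -> Option<i32> {
    i32::try_from(value).ok()
}
```

The closure passed to linker.func_wrap() would then return i32, matching the `() -> (i32)` type in the error message.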
