Calling a WASM method that accepts a byte array with embedded wasmtime

I'm working on a sandbox environment for user-defined functions in a data processing engine. It is a perfect use case for wasm, and even better, it doesn't need interface types. My needs for the types on the wasm methods are very simple: I just need to pass a borrowed byte array into the method and have it return an owned byte array. This data processing engine expects to use a specific data serialization method inside WASM functions. I've been experimenting with building a simple method that takes a byte array, both with wasm-bindgen and by building a similar library for the wasm32-unknown-unknown target without it. I'm hitting a surprising mismatch when I then try to run these methods in a small runner program I wrote using wasmtime. When using wasm-bindgen the signature just looks like:

#[wasm_bindgen]
pub fn udf(input_buffer: &[u8]) -> u32

When trying without wasm-bindgen it looks like:

#[no_mangle]
pub unsafe extern "C" fn add(input: *mut u8) -> u32 {

In both of these situations, when I decompile the wasm produced, the function takes two i32s, which is what I expect. So now when I try to run this I am using this:

    let module = Module::from_file(store.engine(), "../adder/pkg/adder_bg.wasm").unwrap();
    let instance = Instance::new(&store, &module, &[]).unwrap();
    let add = instance.get_typed_func::<Option<ExternRef>, u32>("udf").unwrap();
    let ext_ref = ExternRef::new([0,1]);
    println!("0 + 1 = {}", add.call(Some(ext_ref)).unwrap());

Which is generating the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: type mismatch with parameters

Caused by:
    expected externref found i32', src/main.rs:23:72
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'm pretty convinced that the externref on my runner is wrong but I don't know what is correct. What type should I pass in here?

Something to note is that wasm-bindgen is designed to be used with its JavaScript shims and called from a JavaScript environment (e.g. Node.js or the browser). That way it controls both sides of the function call and can change the signature however it wants, letting it pass almost any type to/from WebAssembly even though WebAssembly itself only supports a handful of integer and float types. If you aren't running your code in the browser and using its JavaScript shims, you'll need to implement a lot of wasm-bindgen's glue yourself.

For example when wasm-bindgen sees #[wasm_bindgen] pub fn udf(input_buffer: &[u8]) -> u32, the function signature may be rewritten to the more WebAssembly-friendly extern "C" fn udf(input_buffer_ptr: *const u8, len: u32) -> u32 (rustc compiles pointers and u32 as i32).
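To make the (pointer, length) lowering concrete, here is a minimal hand-written sketch of what such a function could look like without wasm-bindgen. The body (summing the bytes) is only a placeholder I made up for illustration; the thread does not say what the real udf computes.

```rust
use std::slice;

/// Hand-written equivalent of the lowered signature: the borrowed slice
/// arrives as a raw pointer plus a length.
///
/// # Safety
/// `input_buffer_ptr` must point to at least `len` readable bytes.
#[no_mangle]
pub unsafe extern "C" fn udf(input_buffer_ptr: *const u8, len: u32) -> u32 {
    // Rebuild the borrowed slice from its two raw halves.
    let input = unsafe { slice::from_raw_parts(input_buffer_ptr, len as usize) };
    // Placeholder body (an assumption): add up all the bytes.
    input.iter().map(|&b| u32::from(b)).sum()
}
```

When compiled for a wasm32 target, both parameters show up as i32 in the exported function's signature, which matches what the decompiled module showed.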

WebAssembly is sandboxed, so there's no way for the guest to directly read a reference/pointer from the host's address space. In general, to pass data from the host into the guest you'll need to ask it to allocate some memory and copy your data to that (unless the guest provides a buffer the host can write to).

That normally looks something like this:

let module = Module::from_file(store.engine(), "../adder/pkg/adder_bg.wasm").unwrap();
let instance = Instance::new(&store, &module, &[]).unwrap();

// our input
let data: [u8; 2] = [0, 1];

// allocate some memory inside the WebAssembly address space
// Note: this assumes the module exports `malloc(size) -> ptr`
let malloc = instance.get_typed_func::<i32, i32>("malloc").unwrap();
let data_ptr = malloc.call(data.len() as i32).unwrap();

// copy the data from the host's address space to the WebAssembly address space
let memory = instance.get_memory("memory").unwrap();
memory.write(data_ptr as usize, &data).unwrap();

// invoke our `udf` function
// Note: this assumes your function was declared as
// #[no_mangle] pub extern "C" fn udf(input_buffer_ptr: *const u8, len: u32) -> u32
let add = instance.get_typed_func::<(i32, u32), u32>("udf").unwrap();
let ret = add.call((data_ptr, data.len() as u32)).unwrap();

// I'm not sure how you are passing the result back, but I assume it'd
// require memory.read(...)

// free our temporary buffer
// Note: this assumes the module exports `free(ptr)`
let free = instance.get_typed_func::<i32, ()>("free").unwrap();
free.call(data_ptr).unwrap();
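The thread doesn't say how the UDF hands its owned byte array back. One common convention, shown here purely as an assumption, is for the guest to return a single 64-bit value that packs the result's pointer and length together; the host then reads that many bytes out of guest memory and frees them. The names pack and unpack are hypothetical helpers, not part of wasmtime or wasm-bindgen.

```rust
/// Guest side: pack a pointer (an offset into linear memory) and a length
/// into one u64 so both fit in a single return value.
pub fn pack(ptr: u32, len: u32) -> u64 {
    (u64::from(ptr) << 32) | u64::from(len)
}

/// Host side: split the packed value back into (ptr, len).
pub fn unpack(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}
```

This works because pointers on wasm32 are 32 bits wide, so a pointer and a u32 length fit in one 64-bit return value.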

Often the WebAssembly runtime will provide some nice abstractions for working with pointers into WebAssembly memory (e.g. WasmPtr<T> in wasmer). However, you'll still need to do the allocate-copy-invoke-free dance when moving data from the host to the guest unless your wasm provides a buffer that the host can write into.
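The allocate-copy-invoke-free sequence can also be sketched without any runtime at all. The toy model below is not the wasmtime API: ToyGuest, its bump allocator, and the byte-summing udf are all made up for illustration. It only shows that the guest sees offsets into its own memory, never host pointers.

```rust
/// Toy stand-in for a guest and its linear memory; NOT the wasmtime API.
struct ToyGuest {
    memory: Vec<u8>,
    next_free: usize, // bump-allocator cursor
}

impl ToyGuest {
    fn new(size: usize) -> Self {
        ToyGuest { memory: vec![0; size], next_free: 0 }
    }

    /// "malloc": hand out an offset into guest memory, or None when full.
    fn malloc(&mut self, len: usize) -> Option<u32> {
        let start = self.next_free;
        let end = start.checked_add(len)?;
        if end > self.memory.len() {
            return None;
        }
        self.next_free = end;
        Some(start as u32)
    }

    /// Host-side copy into guest memory.
    fn write(&mut self, ptr: u32, data: &[u8]) {
        let start = ptr as usize;
        self.memory[start..start + data.len()].copy_from_slice(data);
    }

    /// The guest function: it only ever sees offsets into its own memory.
    fn udf(&self, ptr: u32, len: u32) -> u32 {
        let start = ptr as usize;
        let bytes = &self.memory[start..start + len as usize];
        bytes.iter().map(|&b| u32::from(b)).sum()
    }

    /// "free": a bump allocator can only release the most recent block.
    fn free(&mut self, ptr: u32) {
        self.next_free = ptr as usize;
    }
}

/// The four steps in order: allocate, copy, invoke, free.
fn allocate_copy_invoke_free(guest: &mut ToyGuest, data: &[u8]) -> Option<u32> {
    let ptr = guest.malloc(data.len())?;
    guest.write(ptr, data);
    let result = guest.udf(ptr, data.len() as u32);
    guest.free(ptr);
    Some(result)
}
```

A real guest would use a proper allocator rather than a bump pointer, but the order of the four steps on the host side stays the same.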

The error message is saying that wasmtime expected udf()'s first argument to be an externref (because that's the type you gave to get_typed_func()), but the argument's actual type was an i32 (i.e. a pointer into WebAssembly memory).

Hmm. I think it is missing one more thing. I took your example code with just a few modifications to make the type system happy and now I'm getting:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: failed to find function export `malloc`', src/main.rs:39:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I tried building for wasi and linking in those modules by changing my module set up to:

    let store = Store::default();
    let mut linker = Linker::new(&store);
    let mut builder = WasiCtxBuilder::new();
    Wasi::new(linker.store(), builder.build().unwrap()).add_to_linker(&mut linker).unwrap();
    let module = Module::from_file(store.engine(), "../adder/target/wasm32-wasi/debug/adder.wasm").unwrap();
    let instance = Instance::new(&store, &module, &[]).unwrap();

But that still produces the same "failed to find function export malloc" error. This seems like it's on the right track, but I need to make the resolution of malloc happy. Any help there would be massively appreciated.

So while this will work to get things rolling, ideally I'd like to make this data available to the sandbox without a copy, for performance reasons. I've found this proposal and it looks like some of the work has landed in wasmtime and wasm-bindgen. Unfortunately, for the life of me I can't get it to work. The dead end I finally hit is an unresolved import for __wbindgen_init_externref_table in my final wasm.

Your code needs to expose malloc() and free() functions for allocating memory inside the WebAssembly memory space using the default allocator (e.g. by deferring to alloc::alloc::alloc()).

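For example (note that these versions also take an alignment, so the host must look them up and call them with matching signatures):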

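#![no_std]
// Note: a `#![no_std]` cdylib also needs a `#[global_allocator]` and a
// `#[panic_handler]`; drop the attribute to use std's default allocator.
extern crate alloc;

use core::alloc::Layout;

// `#[no_mangle]` keeps the symbol name so the function is exported as `malloc`.
// Safety: `alignment` must be a non-zero power of two.
#[no_mangle]
pub unsafe extern "C" fn malloc(size: u32, alignment: u32) -> *mut u8 {
    let layout = Layout::from_size_align_unchecked(size as usize, alignment as usize);
    alloc::alloc::alloc(layout)
}

// Safety: `ptr`, `size` and `alignment` must match an earlier `malloc` call.
#[no_mangle]
pub unsafe extern "C" fn free(ptr: *mut u8, size: u32, alignment: u32) {
    let layout = Layout::from_size_align_unchecked(size as usize, alignment as usize);
    alloc::alloc::dealloc(ptr, layout);
}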
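If tracking the alignment on the host side is inconvenient, a variation is to let a boxed slice do the layout bookkeeping. This is only a sketch: the names alloc_buffer and free_buffer are hypothetical (chosen to avoid clashing with the C library's malloc and free when built natively), and the host still has to remember the length it asked for.

```rust
use std::ptr;

/// Allocate `len` zeroed bytes and hand ownership of them to the caller.
#[no_mangle]
pub extern "C" fn alloc_buffer(len: u32) -> *mut u8 {
    let boxed: Box<[u8]> = vec![0u8; len as usize].into_boxed_slice();
    // Leak the box on purpose; `free_buffer` reclaims it later.
    Box::into_raw(boxed) as *mut u8
}

/// Release a buffer previously returned by `alloc_buffer`.
///
/// # Safety
/// `ptr` must come from `alloc_buffer(len)` with the same `len`, and must
/// not be freed twice.
#[no_mangle]
pub unsafe extern "C" fn free_buffer(ptr: *mut u8, len: u32) {
    // Rebuild the boxed slice so its destructor frees the allocation.
    let raw: *mut [u8] = ptr::slice_from_raw_parts_mut(ptr, len as usize);
    drop(unsafe { Box::from_raw(raw) });
}
```

On the host, these would be looked up with signatures along the lines of i32 -> i32 for alloc_buffer and (i32, i32) -> () for free_buffer.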

The unresolved __wbindgen_init_externref_table import is one of those "shims" I was referring to when I said that wasm-bindgen generates code for running in a JavaScript environment. It's provided by the JavaScript that wasm-bindgen generates for loading your WebAssembly module.

Do you need to invoke the wasm-bindgen CLI manually so it can patch in those functions?

What about avoiding the copy by writing the data into WebAssembly memory from the outset?

Say you were reading an HTTP response: you could allocate a suitably sized buffer in WebAssembly memory, and every time you read part of the response body you write it straight into that buffer. That way there aren't any unnecessary copies.
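A rough sketch of that idea, assuming the runtime can give you the guest's linear memory as a mutable byte slice. How you obtain that slice depends on the runtime and its version, so it is left as a parameter here. fill_guest_buffer is a hypothetical helper, and ptr and len are assumed to come from an allocation the guest made earlier.

```rust
use std::io::{self, Read};

/// Read up to `len` bytes from `source` straight into
/// `guest_memory[ptr..ptr + len]`, with no intermediate host buffer.
/// Returns the number of bytes actually written.
fn fill_guest_buffer<R: Read>(
    source: &mut R,
    guest_memory: &mut [u8],
    ptr: usize,
    len: usize,
) -> io::Result<usize> {
    // Reject ranges that fall outside the guest's memory.
    let end = ptr
        .checked_add(len)
        .filter(|&end| end <= guest_memory.len())
        .ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "buffer outside guest memory")
        })?;
    let target = &mut guest_memory[ptr..end];

    // Keep reading until the buffer is full or the source is exhausted.
    let mut filled = 0;
    while filled < target.len() {
        match source.read(&mut target[filled..])? {
            0 => break, // end of input
            n => filled += n,
        }
    }
    Ok(filled)
}
```

Anything implementing Read (a socket, a file, a decompressor) can be passed as the source, so the bytes land in guest memory once and are never staged in a separate host-side Vec.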

Ah! Whether I invoke the wasm-bindgen CLI might be part of my problem. When using the wasm-bindgen macro without the wasm-bindgen CLI I get 4 unresolved imports. Using the CLI reduces that to the one import. I'll look at the JavaScript produced; it might have a clue about that last import. It's possible that I'm invoking wasm-bindgen with the wrong arguments/target. Or maybe using wasm-bindgen is a bad idea if I'm not going to be using this in a JS env?

Here is an example of how I'm invoking it:

RUSTFLAGS="-C target-feature=+reference-types" cargo build --target wasm32-unknown-unknown; rm -rf bound; wasm-bindgen --no-typescript --reference-types --target no-modules target/wasm32-unknown-unknown/debug/adder.wasm --out-dir bound

I've never used the wasm-bindgen CLI directly. I was using wasm-pack before, but they haven't merged the change to add --reference-types yet.

You read my mind about writing the data into WebAssembly memory from the outset. I was thinking about that. I don't know the internal architecture of the execution pipeline well, but this could be possible. The role for wasm will be user-defined functions, so we will be feeding this data into the wasm engine only a minority of the time. I'll talk to the maintainer of the project. He's got a better feel for that.

This gives me a lot to work with. And the fact that __wbindgen_init_externref_table is a shim for JS environments sheds a lot of light on things. Thank you so much for your help. After a little more research and experimentation I'll report back.

So this was extremely useful to me and I eventually got things sorted out. I posted a minimal project to show what I did and how it works. For anyone finding this post later via Google, it can be found here: https://github.com/stusmall/wasm-udf-example

The key problem I had was that ExternRef isn't what I hoped it was and isn't relevant here.
