Enum deserialization issue with bincode

I'm using serde and bincode to serialize and deserialize one struct (input) and one enum (output) to be passed to a WebAssembly module. Sometimes, the deserialization of the enum fails with this error:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: invalid value: integer `1233992`, expected variant index 0 <= i < 2', wa_run/src/threadpool.rs:129:44
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: invalid value: integer `1233992`, expected variant index 0 <= i < 2', wa_run/src/threadpool.rs:129:44
thread 'threadpool::tests::it_pools' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', wa_run/src/threadpool.rs:101:31

The enum holds a serde_json::Value, so serialization and deserialization take place in two steps, since that won't serialize directly using bincode::serialize. I'm using an intermediary enum for that, and implement TryFrom in both directions between the two. Without going into too much detail, it looks like this:

#[derive(Debug, Serialize, Deserialize, PartialEq)]
enum ResultTemp {
    Value(Vec<u8>),
    Error(Vec<u8>),
}

#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
pub enum ResultDto {
    Value(JsonValue),
    Error(String),
}

impl crate::Decode<ResultDto> for ResultDto {
    fn decode(bytes: Vec<u8>) -> anyhow::Result<ResultDto> {
        let temp = bincode::deserialize::<ResultTemp>(&bytes[..])?;
        let dto = <ResultTemp as TryInto<ResultDto>>::try_into(temp)?;
        Ok(dto)
    }
}

impl crate::Encode for ResultDto {
    fn encode(self) -> anyhow::Result<Vec<u8>> {
        let temp = <ResultDto as TryInto<ResultTemp>>::try_into(self)?;
        let bytes = bincode::serialize::<ResultTemp>(&temp)?;
        Ok(bytes)
    }
}
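
The elided conversions are roughly this shape (a sketch of the idea, not the exact code):

impl TryFrom<ResultDto> for ResultTemp {
    type Error = anyhow::Error;

    fn try_from(dto: ResultDto) -> Result<Self, anyhow::Error> {
        Ok(match dto {
            // The JSON value is serialized to bytes in this first step...
            ResultDto::Value(v) => ResultTemp::Value(serde_json::to_vec(&v)?),
            ResultDto::Error(e) => ResultTemp::Error(e.into_bytes()),
        })
    }
}

impl TryFrom<ResultTemp> for ResultDto {
    type Error = anyhow::Error;

    fn try_from(temp: ResultTemp) -> Result<Self, anyhow::Error> {
        Ok(match temp {
            // ...and parsed back out of the bytes on the way back.
            ResultTemp::Value(v) => ResultDto::Value(serde_json::from_slice(&v)?),
            ResultTemp::Error(e) => ResultDto::Error(String::from_utf8(e)?),
        })
    }
}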

To my untrained eyes it looks like it's misinterpreting some number as the enum variant index, but since the enum doesn't have a variant 1233992, the deserialization fails. What I really can't wrap my head around is why some values serialize and deserialize fine while other values fail.

Very grateful for any pointers or ideas!

I'm not familiar with bincode, but from the error message:

it seems like either you passed the wrong data, or the data got corrupted in some way. Apparently, bincode encodes the variant of an enum using an integer tag, as opposed to JSON, which encodes the variant using a string key as the tag (whether internally, externally, or adjacently tagged).
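
Something like this should make the difference visible (untested - I'm going off the docs here, and assuming bincode 1.x with default options):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum Out {
    Value(Vec<u8>),
    Error(Vec<u8>),
}

fn main() -> anyhow::Result<()> {
    let out = Out::Error(b"boom".to_vec());
    // bincode: 4-byte little-endian variant tag, 8-byte length, then payload:
    // [01, 00, 00, 00, 04, 00, 00, 00, 00, 00, 00, 00, 62, 6f, 6f, 6d]
    println!("{:02x?}", bincode::serialize(&out)?);
    // serde_json: the variant name is a string key ("externally tagged"):
    // {"Error":[98,111,111,109]}
    println!("{}", serde_json::to_string(&out)?);
    Ok(())
}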

I'd suggest you log the to-be-deserialized data (a.k.a. the wire format) and compare it to the bytes of the original serialized output, and find out why they differ.

Hmm, looking at the bytes, there's indeed something strange going on :thinking:

Returned:
[54, fa, 11, 0, 54, fa, 11, 0, 0, 0, 0, 0, 7b, 22, 61, 72, 67, 22, 3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 31, 7d, 7d]

Expected:
[0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 7b, 22, 61, 72, 67, 22, 3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 31, 7d, 7d]

It's the leading 7 bytes that differ:
54, fa, 11, 0, 54, fa, 11
0, 0, 0, 0, 13, 0, 0

...the rest seems identical between failing and working examples:
0, 0, 0, 0, 0, 7b, 22, 61, 72, 67, 22, 3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 31, 7d, 7d
0, 0, 0, 0, 0, 7b, 22, 61, 72, 67, 22, 3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 31, 7d, 7d

My best guess is that the unchanged part of the array is the value serialized in the first step, into the temporary enum from the code example above, and that the leading bytes denote the enum variant index.
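
If I read them with bincode's default fixed-width integer encoding (an assumption on my part - I'm on bincode 1.x with default options), the expected bytes line up exactly with that guess:

// variant index (u32, little-endian): 0 = the first variant, Value
0, 0, 0, 0,
// Vec<u8> length (u64, little-endian): 0x13 = 19
13, 0, 0, 0, 0, 0, 0, 0,
// 19 payload bytes: the UTF-8 JSON {"arg":{"value":1}}
7b, 22, 61, 72, 67, 22, 3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 31, 7d, 7d

Read the same way, the failing prefix 54, fa, 11, 0 is 0x0011fa54 = 1178196 - way out of range for a variant index, and suspiciously pointer-like (it even repeats four bytes later).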

Kind of tricky to debug this :thinking:

Reading a little about repr(C), it seems that enums with values are not really FFI-safe. When passing values back and forth to WebAssembly I use extern "C", since that's effectively defining an FFI, even though it's Rust on both sides. Also, I've noted that the values I pass in do not seem to fail - and they don't contain any enums.

here's the reference:

Could this be a likely cause of this annoying issue..? Are enums with values a no-no in code that compiles to wasm? ...At least when that value needs to be passed through the FFI boundary?

You haven't shown any code relevant to FFI, but since you are doing FFI you have some unsafe somewhere. Those might be points of interest. "Here be dragons."

No, I haven't posted that code, since I thought the issue was with the serialization. I did mention I was passing the values to WASM though, which requires an FFI :slight_smile: Anyway - the tests just started passing after I replaced the ResultDto and ResultTemp enums with structs instead:

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[repr(C)]
struct ResultTemp {
    value: Vec<u8>,
    error: bool,
}

#[derive(Debug, Serialize, Deserialize, PartialEq, Clone)]
#[repr(C)]
pub struct ResultDto {
    value: JsonValue,
    error: bool,
}

I've only run basic tests so far - hoping it did the trick.

Thinking I may get rid of the intermediary struct, or even try to just get the bytes of the ResultDto and directly copy them to the WASM memory - but that's another story I guess :wink:

To be fair, the first mention of WASM or FFI was the post just prior to my reply. No indication of unsafe otherwise until that point. This is the only distinction I am making. Now that the existence of unsafe is apparent, that changes the original question completely.


no it wasn't - it was in the first sentence of the first post

You're right, my mistake. That's what I get for looking for "WASM" and "FFI" specifically. The point stands that unsafe has a lot of gotchas and "the data looks corrupted" is only one possible outcome of misusing it.

I didn't really phrase the question in terms of WASM or FFI though - since I didn't really understand the error in the first place. It was pretty easy to overlook, since the question was mostly about bincode and serialization.

Yep, your point indeed stands :slight_smile: Lesson learned I guess! Next time I'll hopefully fail at something new :wink:

Can you put together a failing example without WASM or other FFI? You should be able to put together a reproduction example that works in the Rust Playground, since bincode simply serializes to a Vec<u8>, and deserializes from a &[u8].
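
Something along these lines ought to round-trip cleanly there, using the Encode/Decode impls from your first post (a sketch, untested):

fn round_trip() -> anyhow::Result<()> {
    let dto = ResultDto::Value(serde_json::json!({"arg": {"value": 1}}));
    let bytes = dto.clone().encode()?;
    let decoded = ResultDto::decode(bytes)?;
    // If this holds in pure Rust, the serialization itself is fine.
    assert_eq!(dto, decoded);
    Ok(())
}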

My suspicion is that it's not the serialization/deserialization at fault, but rather whatever is supposed to take the bytes and pass them around, and that by permuting the data structure layout, you're simply hiding the bug.

The derived Serialize and Deserialize implementations handle the repr stuff correctly, by not relying on it, but instead using their own numbering system for the enum variants in formats like bincode, where the first variant in code order is numbered 0, the second is numbered 1, and so on. As a result, I don't think your change fixes anything - it just moves the corruption to a part of the struct that the assertions aren't checking.


I don't think I've seen any errors when serializing and deserializing the same values in Rust without WASM; all errors have occurred when passing values (enums, or structs containing enums) from WASM to Rust (and not the other way around). If I run into any errors that are reproducible without WASM I'll make sure to post an example playground, but so far I haven't.

One thing I feel unsure of when using bincode's serialization is how it enumerates the enum variants. My DTO structs are defined in a separate crate that I use both in Rust and in WASM. Is there any chance the enumeration gets screwed up somehow, since it's compiled into two different binaries that later communicate through FFI...? The fact that it did not always fail speaks against that, I think - then it should have failed regardless?

Though I'm new to working with FFIs and WASM, it does seem pretty clear that enums with values aren't really supported in the FFI context (see the link above) - but then again, it feels like it should work, since the structs serialize to bytes before they're passed in either direction. The enums were never passed directly to a foreign function; they were serialized to bytes, written to linear memory, and then read back on the other side.

Do you think it could have been an issue where some other process writes to the same memory...? I'm using wasmtime as the runtime, and its Store and Memory structs to communicate between Rust and WASM. The only unsafe code I have is some calls to host functions from inside WASM, passing u32 pointers and lengths for the data to read and write.

Another thought that struck me is that I compile the WASM for wasm32-wasi but run the Rust code on my 64-bit system; could that create some confusion around the integers being passed around? I don't think I pass any usize, but if, for example, bincode does that, then maybe that could be a source...?

I haven't seen any errors since I changed the enums to structs, though. Not sure where the corrupted bytes might be hiding, since I only pass values in once and then out once, and then the linear memory is dropped by wasmtime. Memory is then reinitialized on subsequent calls.

Any tips on how to proceed analyzing this would be greatly appreciated - I feel a bit lost as to how to check whether I just managed to hide the bug, as you say.

The enumeration is done by Serde, not bincode; the derive macro for Serialize numbers the variants in the order it finds them in the code. If, however, you have a mismatch between the two versions - the one in native Rust, the other in WASM - all bets are off, because bincode is not self-describing: it needs the definitions to be unchanged.
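
As a contrived illustration (hypothetical enums, not your code) - if the two sides disagree on variant order, bincode won't even error, it'll just hand you the wrong variant:

use serde::{Deserialize, Serialize};

// What the serializing side was compiled with...
#[derive(Serialize)]
enum Sender {
    A(u32),
    B(u32),
}

// ...and what the deserializing side was compiled with: variants reordered.
#[derive(Deserialize, Debug)]
enum Receiver {
    B(u32),
    A(u32),
}

fn main() -> anyhow::Result<()> {
    let bytes = bincode::serialize(&Sender::B(7))?; // serialized with tag 1
    let got: Receiver = bincode::deserialize(&bytes)?;
    println!("{:?}", got); // Receiver::A(7) - no error, just the wrong variant
    Ok(())
}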

I thus suspect that the most likely bug is that you're passing round the wrong set of bytes; I would be inclined, since the data is a JSON value anyway, to remove bincode completely, and just use serde_json to serialize and deserialize. JSON is self-describing, and if you're passing round corrupted data, it'll get you a better error message than passing a chunk of bincode will.

This code is almost certainly the issue - you're reading the wrong bit of wasmtime's structs (e.g. because it's already reused the data), and thus getting into trouble. It's also possible that you're doing something wrong accessing the data inside wasmtime from your Rust code.

This is fine. bincode's serialization format for integers uses 64 bits for usize regardless of machine size - on a 32-bit machine like wasm32, the top 32 bits will just be zero.
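
If you want to convince yourself, this assertion holds on both wasm32 and x86_64 (assuming bincode 1.x with default options):

fn check() -> anyhow::Result<()> {
    let bytes = bincode::serialize(&1usize)?;
    assert_eq!(bytes.len(), 8); // usize is widened to u64 on the wire
    Ok(())
}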

Remove bincode - it's not getting you anything, but it's making the debugging harder, since you're wrapping a self-describing format (JSON) inside a non-self-describing one. Pass UTF-8 encoded JSON around, using serde_json::from_slice to decode and serde_json::to_vec to encode, and look for the corruption there.
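
Your Encode/Decode impls then collapse to something like this (a sketch against the traits from your first post):

impl crate::Decode<ResultDto> for ResultDto {
    fn decode(bytes: Vec<u8>) -> anyhow::Result<ResultDto> {
        // JSON is self-describing; corrupt input fails with a parse error
        // that points at the offending position, not a bogus variant index.
        Ok(serde_json::from_slice(&bytes)?)
    }
}

impl crate::Encode for ResultDto {
    fn encode(self) -> anyhow::Result<Vec<u8>> {
        Ok(serde_json::to_vec(&self)?)
    }
}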

My guess is that you're not spotting the bug because it's sensitive to something like timing or output size, and that by perturbing the format, you've changed the situation. Using JSON instead of bincode-wrapped JSON will make this more obvious.


Thank you, that's great!
I'm going to give that a shot and see what I can find out. It'll have to be later though; the hour is late here in Sweden :slightly_smiling_face:

Stay tuned for more errors!

My guess is that you're not spotting the bug because it's sensitive to something like timing or output size, and that by perturbing the format, you've changed the situation. Using JSON instead of bincode-wrapped JSON will make this more obvious.

I think you were right about this. I removed bincode and the intermediary ResultTemp struct, and just serialize to/from JSON with from_slice and to_vec as you advised. Now I get the error again - again only for some cases, and again I can't reproduce it in pure Rust; it only happens when I pass values back from WASM.

// to_vec() called on this value
Value(Object {"value": Array [Number(1), Number(2), Number(3)]})

// Expected value
// This is what I get when running tests in pure Rust on my DTO crate, where encoding/decoding works just fine
[7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 5b, 31, 2c, 32, 2c, 33, 5d, 7d, 2c, 22, 65, 72, 72, 6f, 72, 22, 3a, 66, 61, 6c, 73, 65, 7d]

// These are the bytes returned from WASM
[74, f8, 11, 0, 74, f8, 11, 0,   3a, 7b, 22, 76, 61, 6c, 75, 65, 22, 3a, 5b, 31, 2c, 32, 2c, 33, 5d, 7d, 2c, 22, 65, 72, 72, 6f, 72, 22, 3a, 66, 61, 6c, 73, 65, 7d]

It now errors with this message - presumably because the first returned byte, 74 (ASCII 't'), makes serde_json start parsing the literal true, which falls apart at column 2:

thread 'runner::tests::it_runs' panicked at 'called `Result::unwrap()` on an `Err` value: expected ident at line 1 column 2', wa_run/src/runner.rs:151:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

These are the relevant parts of the code that compiles to WASM:

// Host functions defined in "extern C" block
extern "C" {
    fn argp() -> u32;
    fn argl() -> u32;
    fn resp(ptr: u32);
    fn resl(len: u32);
}

#[no_mangle]
pub extern "C" fn run() {
    // Rebuild the input Vec from the pointer and length the host provided.
    let bytes = unsafe {
        let arg_ptr = argp() as *mut u8;
        let arg_len = argl();
        Vec::from_raw_parts(arg_ptr, arg_len as usize, arg_len as usize)
    };
    let dto = CallDto::decode(bytes).unwrap();

    let res = capture_error(dto, <CallDto as Call>::call)
        .encode()
        .unwrap();

    // Report the result's address and length back to the host.
    let res_ptr = res.as_ptr() as u32;
    let res_len = res.len() as u32;

    unsafe {
        resl(res_len);
        resp(res_ptr);
    };
}

And in Rust, using wasmtime, it looks like the code below:


pub struct Runner {
    engine: Engine,
    module: Module,
    linker: Linker<State>,
}

impl Runner {
    pub fn new(file: &str) -> anyhow::Result<Self> {
        let engine = Engine::default();
        let module = Module::from_file(&engine, file)?;
        let mut linker = Linker::new(&engine);
        wasmtime_wasi::add_to_linker(&mut linker, |state: &mut State| &mut state.wasi)?;

        let mut runner = Self {
            engine,
            module,
            linker,
        };

        let argp = "argp";
        runner
            .linker
            .func_wrap("env", argp, |mut caller: Caller<'_, State>| {
                caller.data_mut().input.ptr
            })?;

        let argl = "argl";
        runner
            .linker
            .func_wrap("env", argl, |mut caller: Caller<'_, State>| {
                caller.data_mut().input.len
            })?;

        let resp = "resp";
        runner
            .linker
            .func_wrap("env", resp, |mut caller: Caller<'_, State>, ptr: u32| {
                caller.data_mut().output.ptr = ptr;
            })?;

        let resl = "resl";
        runner
            .linker
            .func_wrap("env", resl, |mut caller: Caller<'_, State>, len: u32| {
                caller.data_mut().output.len = len;
            })?;

        Ok(runner)
    }

    pub fn call(&self, dto: CallDto) -> anyhow::Result<ResultDto> {
        let wasi = WasiCtxBuilder::new().build();
        let data = State {
            input: Mem { ptr: 0, len: 0 },
            output: Mem { ptr: 0, len: 0 },
            wasi,
        };

        let mut store = Store::new(&self.engine, data);
        let instance = self.linker.instantiate(&mut store, &self.module)?;
        let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
        let memory = instance.get_memory(&mut store, "memory");

        if let Some(m) = memory {
            let ptr = 0;
            let bytes = dto.encode()?;
            let length = bytes.len() as u32;
            store.data_mut().input.ptr = ptr;
            store.data_mut().input.len = length;
            m.write(&mut store, ptr as usize, &bytes)?;
        }

        run.call(&mut store, ())?;

        let ptr = store.data_mut().output.ptr as usize;
        let len = store.data_mut().output.len as usize;
        let mut buf = vec![0; len];
        if let Some(m) = memory {
            m.read(&mut store, ptr, &mut buf)?;
        }

        drop(store);
        let result = ResultDto::decode(buf)?;
        Ok(result)
    }
}

For reference, this is their documentation on using linear memory from Rust:
https://docs.wasmtime.dev/examples-rust-memory.html

As you can see, the functions resl() and resp() are defined in Rust and called from the WASM module to set the pointer and length values on the State struct within the wasmtime Store. The values are then read back in Rust afterwards, not within the scope of argp().

One idea could be that since reading the bytes happens after the scope of argp() has ended, the memory I try to read from is not really owned by the current scope, and other processes may have written to it?

On the other hand, the WebAssembly linear memory is allocated by the wasmtime function get_memory(). To my understanding, that piece of memory is still in scope and shouldn't be dropped until I call drop(store). There are no other calls within the WASM module after argp(), so I don't think there are any processes that should be able to write to that memory before I read it?

You have a use-after-free bug, and all bets are off.

When run returns, one of the things that happens is that the drop code for res is run. This does whatever's appropriate to mark the memory owned by res as reusable by later code, which could well result in something else overwriting part of the data you're about to read.

The fix is to change resp to do the copy while run is still in progress, so that by the time res's drop code runs, the data has already been copied out of the runtime.
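
A sketch of what I mean - this assumes State grows an output_bytes: Vec<u8> field (my name, not yours), and relies on the guest calling resl before resp, as your run() does:

runner
    .linker
    .func_wrap("env", "resp", |mut caller: Caller<'_, State>, ptr: u32| {
        // run() hasn't returned yet, so the guest's result buffer is still
        // alive; copy it out of linear memory right now.
        let memory = caller
            .get_export("memory")
            .and_then(|e| e.into_memory())
            .expect("guest exports its linear memory");
        let len = caller.data().output.len as usize;
        let mut buf = vec![0u8; len];
        memory
            .read(&caller, ptr as usize, &mut buf)
            .expect("in-bounds read");
        caller.data_mut().output_bytes = buf;
    })?;

Runner::call then takes the bytes out of store.data_mut().output_bytes after run.call(...) returns, instead of reading linear memory itself.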


I added some code to resp() to read and decode the bytes and write them to the State in the store. I then get the values from the store before calling drop(), and now the tests seem to pass again. I thought copying the bytes to buf before dropping would be enough, but it seems the copy has to happen within the scope of run().

...do you think I could move back to using enums instead of structs? I think that would be a lot nicer.

Yes - your issue was nothing to do with the enum, but with your handling of the Vec<u8> that encode generates, and the &[u8] that you fed into decode.

