Hey all, I am looking for a serialization format that works at least in Rust and JS and lets you manipulate binary data (Vec<u8>) in a nice way: you define your struct and then you read from and write to it. Behind the scenes, no copies are made.
I understand that the latter limitation allows maximum efficiency, as there is no need to fetch a size before accessing a field, but I do not mind some indirection, or even the data needing to be shuffled around afterwards.
I looked at various formats:
FlatBuffers does not support writing (default values do not exist in the serialized data, for example).
CBOR requires tags, and all the libraries I found expect to serialize and deserialize the whole thing (a copy).
Others, like MessagePack, have similar issues.
My use case is a WebAssembly module, where I would like to avoid having the host keep copying data around just so the module can see the latest state. Instead, both host and module would read and write to the same memory (a Vec<u8>).
I think it will be difficult to find a library that does what you want.
Consider the following problem:
Any general-purpose data serialization format will certainly have support for some kind of string (text) datatype. So the serialization format must make one of two choices:
The format mandates a text encoding for string fields, but which encoding do you use? Rust employs UTF-8 natively, whereas JS uses UTF-16. Whatever you decide on, at least one of these languages has to copy and convert its native string data during serialization/deserialization, which violates your no-copy requirement.
The format treats strings as raw bytes and you can use any encoding you want. But without a standardized encoding, you won't have convenient access to the string data in both languages. Now you could, for example, mandate UTF-8, and write JavaScript code to work on UTF-8 strings rather than native JS strings, but this is quite awkward for the programmer. After all, these serialization libraries are supposed to make the developer's experience more convenient, even if there is a(n acceptable) cost in machine performance.
So in conclusion, zero-copy serialization formats tend to impose costs on developer ergonomics, which is one of the main reasons why cross-language serialization formats don't tend to support it. On the other hand, if performance is really that important to you, you could write a JS library to manipulate data from the binary-layout crate you linked to, but I don't expect you'll find a widely-used alternative that meets all your requirements.
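To make the encoding mismatch above concrete, here is a minimal sketch using only the Rust standard library. The `encodings` helper is a name I made up for illustration. It shows that the same text has different in-memory representations in UTF-8 (Rust's native encoding) and UTF-16 (what JS strings use), so one side always has to convert.

```rust
/// Returns the UTF-8 bytes and the UTF-16 code units of the same text.
/// (Hypothetical helper, for illustration only.)
fn encodings(text: &str) -> (Vec<u8>, Vec<u16>) {
    // Rust strings are already UTF-8, so this is just a byte copy.
    let utf8 = text.as_bytes().to_vec();
    // A JS engine would hold the string as UTF-16 code units instead.
    let utf16: Vec<u16> = text.encode_utf16().collect();
    (utf8, utf16)
}

fn main() {
    let (utf8, utf16) = encodings("héllo");
    println!("UTF-8:  {} bytes      {:02x?}", utf8.len(), utf8);
    println!("UTF-16: {} code units {:04x?}", utf16.len(), utf16);
}
```

Neither buffer can be reinterpreted as the other without a conversion pass, which is exactly the copy a zero-copy format is trying to avoid.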
Depending on your desired ergonomics and how complex your structures are, this is somewhere between trivial to roll with the bare APIs on both sides and literally impossible.
WASM memory can (and in fact, for anything bigger than a number, must) be read and written directly from JavaScript, so tools like bindgen are already generating JavaScript that reads out the internal Rust structures whenever you return them (which is why they must be FFI-safe with a known layout to be returned). Perhaps you can get away with simply returning a pointer to the Rust-side structures and peeking and poking into them from JS?
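A rough sketch of the "return a pointer and poke at known offsets" idea, seen from the Rust side. The `State` struct, its fields, and the `state_new` and `layout` functions are all hypothetical names I picked for illustration. I am assuming `#[repr(C)]` to pin the layout and a toolchain recent enough for `std::mem::offset_of!` (Rust 1.77 or later). In a real wasm build you would also export `state_new` to the host; that part is left out so the sketch runs as a plain program.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical shared state. `#[repr(C)]` fixes field order and padding,
/// so the offsets below are stable and can be mirrored on the host side.
#[repr(C)]
pub struct State {
    pub counter: u32, // offset 0
    pub flags: u16,   // offset 4
    pub level: u8,    // offset 6
    pub health: f32,  // offset 8 (one padding byte before it)
}

/// Allocates the state and hands it out as a raw pointer.
/// Under wasm, the host would see this pointer as an offset into linear memory.
pub fn state_new() -> *mut State {
    Box::into_raw(Box::new(State { counter: 0, flags: 0, level: 1, health: 100.0 }))
}

/// Offsets of counter, flags, level, health, followed by the total size.
pub fn layout() -> [usize; 5] {
    [
        offset_of!(State, counter),
        offset_of!(State, flags),
        offset_of!(State, level),
        offset_of!(State, health),
        size_of::<State>(),
    ]
}

fn main() {
    let ptr = state_new();
    let base = ptr as *mut u8;
    unsafe {
        // "Poke": write a field through a byte offset, as a host would.
        base.add(offset_of!(State, counter)).cast::<u32>().write(42);
        // "Peek": read another field the same way.
        let health = base.add(offset_of!(State, health)).cast::<f32>().read();
        println!("counter = {}, health = {}", (*ptr).counter, health);
        // Reclaim the allocation so the sketch does not leak.
        drop(Box::from_raw(ptr));
    }
    println!("layout = {:?}", layout());
}
```

The host would need the same offset table, either hard-coded or queried once through something like `layout`, and it is on you to keep both sides in sync when the struct changes.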
Thanks for your replies, that makes sense.
I think I am just going to use byteorder on the Rust side and DataView on the JS side.
Then I can build a custom API on top of each.
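For reference, here is roughly what I have in mind for the Rust half. This is only a sketch: `PlayerView` and its field layout are made up, and I am using the standard library's `to_le_bytes` / `from_le_bytes` in place of byteorder so the example has no dependencies. Everything is little-endian at fixed offsets; on the JS side, the matching `DataView` calls would need the `littleEndian` argument set to `true`, since `DataView` defaults to big-endian.

```rust
/// Hypothetical fixed layout over a shared byte buffer (little-endian):
///   id: u32 at offset 0, x: f32 at offset 4, y: f32 at offset 8.
pub struct PlayerView<'a> {
    buf: &'a mut [u8],
}

impl<'a> PlayerView<'a> {
    pub const SIZE: usize = 12;

    /// Wraps the buffer without copying; fails if it is too short.
    pub fn new(buf: &'a mut [u8]) -> Option<Self> {
        if buf.len() >= Self::SIZE { Some(Self { buf }) } else { None }
    }

    /// Copies the 4 bytes of one field out of the buffer.
    fn get4(&self, at: usize) -> [u8; 4] {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&self.buf[at..at + 4]);
        bytes
    }

    pub fn id(&self) -> u32 { u32::from_le_bytes(self.get4(0)) }
    pub fn set_id(&mut self, v: u32) { self.buf[0..4].copy_from_slice(&v.to_le_bytes()); }

    pub fn x(&self) -> f32 { f32::from_le_bytes(self.get4(4)) }
    pub fn set_x(&mut self, v: f32) { self.buf[4..8].copy_from_slice(&v.to_le_bytes()); }

    pub fn y(&self) -> f32 { f32::from_le_bytes(self.get4(8)) }
    pub fn set_y(&mut self, v: f32) { self.buf[8..12].copy_from_slice(&v.to_le_bytes()); }
}

fn main() {
    // Stand-in for the memory shared between host and module.
    let mut memory = vec![0u8; 16];
    let mut player = PlayerView::new(&mut memory).expect("buffer too small");
    player.set_id(7);
    player.set_x(1.5);
    println!("id = {}, x = {}, y = {}", player.id(), player.x(), player.y());
    println!("raw = {:?}", &memory[..PlayerView::SIZE]);
}
```

Only the individual fields are copied in and out, never the whole structure, which is good enough for my use case.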