Hi all, I'm pretty new to Rust and I was trying to implement a custom binary serialization. I've seen that serde (and bincode) seem to be the de facto standard in Rust, which is actually nice.
My problem is that I have custom structs (coming from the Java world) and I want to write bytes into a byte buffer (like DataOutputStream in Java) to produce my own format.
The struct is very simple:

```rust
pub struct MyStruct {
    size: u32,
    offsets: Vec<u32>,
    strings: Vec<u8>,
}
```
- `size` indicates how many strings I'm storing
- `offsets` indicates where each string starts and ends (it has one more entry than `size`)
- `strings` is the vector of all the strings' bytes, one after the other.
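To make the relationship between `offsets` and `strings` concrete, here is a minimal sketch of a hypothetical `get` accessor (not part of the original code). It assumes, as in the example below, that `offsets` has `size + 1` entries, and it subtracts `offsets[0]` so the same code works whether the offsets count from the start of `strings` or from the start of the serialized buffer.

```rust
pub struct MyStruct {
    size: u32,
    offsets: Vec<u32>,
    strings: Vec<u8>,
}

impl MyStruct {
    /// Hypothetical accessor: returns the `i`-th string, or `None` if `i` is
    /// out of range, the offsets are inconsistent, or the bytes are not UTF-8.
    pub fn get(&self, i: usize) -> Option<&str> {
        if i >= self.size as usize {
            return None;
        }
        // Make offsets relative to the first one, so absolute positions
        // (like 16, 19, 22 in the example) map onto indices into `strings`.
        let base = *self.offsets.first()? as usize;
        let start = (*self.offsets.get(i)? as usize).checked_sub(base)?;
        let end = (*self.offsets.get(i + 1)? as usize).checked_sub(base)?;
        // `get` on a range returns None if start > end or end is out of bounds.
        let bytes = self.strings.get(start..end)?;
        std::str::from_utf8(bytes).ok()
    }
}
```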
For instance, when storing "foo" and "bar", I will have:
- `size` set to 2
- `offsets` set to [16, 19, 22], which represent where the strings start and end in the sequence of bytes
- `strings` set to [102, 111, 111, 98, 97, 114] (the byte representation of "foobar")

The serialized structure gives this sequence of bytes:

[0, 0, 0, 2, 0, 0, 0, 16, 0, 0, 0, 19, 0, 0, 0, 22, 102, 111, 111, 98, 97, 114]

As you can see, there is the size 2 (the first 4 bytes), followed by the offsets that say where each string starts and ends: "foo" starts at 16 and ends at 19 (19 - 16 = 3 bytes), and "bar" starts at 19 and ends at 22 (22 - 19 = 3 bytes). The first string starts at 16 because the header is 4 bytes for `size` plus 3 × 4 bytes for the offsets.
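The offsets in this example can be derived mechanically: the header takes 4 bytes for `size` plus 4 bytes for each of the `size + 1` offsets, and each subsequent offset adds the length of one string. A minimal sketch of a hypothetical `from_strs` constructor doing exactly that; it returns `None` rather than silently overflowing a `u32`.

```rust
use std::convert::TryFrom; // already in the prelude since the 2021 edition

pub struct MyStruct {
    size: u32,
    offsets: Vec<u32>,
    strings: Vec<u8>,
}

impl MyStruct {
    /// Hypothetical constructor: offsets are positions in the serialized
    /// byte sequence, as in the "foo"/"bar" example.
    pub fn from_strs(items: &[&str]) -> Option<MyStruct> {
        let size = u32::try_from(items.len()).ok()?;
        // Header: 4 bytes for `size` plus 4 bytes for each of the `size + 1` offsets.
        let header_len = size.checked_add(1)?.checked_mul(4)?.checked_add(4)?;
        let mut offsets = Vec::with_capacity(items.len() + 1);
        let mut strings = Vec::new();
        let mut pos = header_len;
        offsets.push(pos); // the first string starts right after the header
        for s in items {
            strings.extend_from_slice(s.as_bytes());
            pos = pos.checked_add(u32::try_from(s.len()).ok()?)?;
            offsets.push(pos); // end of this string = start of the next one
        }
        Some(MyStruct { size, offsets, strings })
    }
}
```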
Now, doing this in Java is pretty simple (sorry for the comparison). When serializing I just make a bunch of DataOutputStream.writeInt() and DataOutputStream.write() calls, and when deserializing I do the opposite with ByteBuffer.getInt() and ByteBuffer.get() in the corresponding for loops.
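For comparison, the same DataOutputStream-style approach works in Rust with only the standard library and no serde: `u32::to_be_bytes` produces the same big-endian bytes as `writeInt`, and `std::io::Write::write_all` plays the role of `write`. A minimal sketch, where `write_to` is a hypothetical method name:

```rust
use std::io::{self, Write};

pub struct MyStruct {
    size: u32,
    offsets: Vec<u32>,
    strings: Vec<u8>,
}

impl MyStruct {
    /// Writes `size`, then every offset, then the raw string bytes, all
    /// big-endian and with no length prefixes (unlike bincode's Vec encoding).
    pub fn write_to<W: Write>(&self, out: &mut W) -> io::Result<()> {
        out.write_all(&self.size.to_be_bytes())?; // like writeInt(size)
        for offset in &self.offsets {
            out.write_all(&offset.to_be_bytes())?; // like writeInt(offset)
        }
        out.write_all(&self.strings)?; // like write(byte[])
        Ok(())
    }
}
```

Because it takes any `W: Write`, the same method can write into a `Vec<u8>`, a `File`, or a `BufWriter`.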
I tried to do the same in Rust and came across serde, which looked like a neat way to remove a bunch of boilerplate I didn't really need in the Java version. But I've ended up with even more code, so I suspect I'm doing something wrong.
Here's my implementation:
```rust
#[cfg(test)]
mod test {
    use super::*;
    use bincode::Options;
    use serde::ser::SerializeStruct;
    use serde::{Deserialize, Serialize, Serializer, Deserializer, de};
    use serde::de::{Visitor, SeqAccess};
    use std::fmt;

    #[derive(PartialEq, Debug)]
    pub struct MyStruct {
        size: u32,
        offsets: Vec<u32>,
        strings: Vec<u8>,
    }

    impl Serialize for MyStruct {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: Serializer,
        {
            let mut state = serializer.serialize_struct("MyStruct", 3)?;
            state.serialize_field("size", &self.size)?;
            for i in &self.offsets {
                state.serialize_field("offsets", i)?;
            }
            for o in &self.strings {
                state.serialize_field("strings", o)?;
            }
            state.end()
        }
    }

    impl<'de> Deserialize<'de> for MyStruct {
        fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where
            D: Deserializer<'de>,
        {
            struct SymbolTableVisitor;

            impl<'de> Visitor<'de> for SymbolTableVisitor {
                type Value = MyStruct;

                fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                    formatter.write_str("MyStruct")
                }

                fn visit_seq<V>(self, mut seq: V) -> Result<MyStruct, V::Error>
                where
                    V: SeqAccess<'de>,
                {
                    let size: u32 = seq
                        .next_element()?
                        .ok_or_else(|| de::Error::invalid_length(0, &self))?;
                    println!("Deserialized size: {}", size);
                    let capacity = (size + 1) as usize;
                    let mut offsets = Vec::with_capacity(capacity);
                    let mut strings = Vec::with_capacity(size as usize);
                    for i in 0..capacity {
                        let element = seq
                            .next_element()?
                            .ok_or_else(|| de::Error::invalid_length(i, &self))?;
                        println!("Deserialized element: {}", element);
                        offsets.push(element)
                    }
                    for i in 0..size {
                        let symbol = seq
                            .next_element()?
                            .ok_or_else(|| de::Error::invalid_length(i as usize, &self))?;
                        println!("Deserialized symbol: {}", symbol);
                        strings.push(symbol)
                    }
                    Ok(MyStruct {
                        size,
                        offsets,
                        strings,
                    })
                }
            }

            const FIELDS: &'static [&'static str] = &["size", "offsets", "strings"];
            deserializer.deserialize_struct("MyStruct", FIELDS, SymbolTableVisitor)
        }
    }

    #[test]
    fn it_can_get_serialized_and_deserialized() {
        let mut offsets = Vec::with_capacity(2);
        offsets.push(1);
        offsets.push(2);
        let mut strings = Vec::with_capacity(2);
        strings.append(&mut "hello".to_string().into_bytes());
        strings.append(&mut "world".to_string().into_bytes());
        let my_struct = MyStruct {
            size: 2,
            offsets,
            strings,
        };
        let my_options = bincode::DefaultOptions::new()
            .with_fixint_encoding()
            .with_big_endian();
        let serialized = my_options.serialize(&my_struct).unwrap();
        println!("serialized = {:?}", serialized);
        let deserialized: MyStruct = my_options.deserialize(&serialized).unwrap();
        println!("deserialized = {:?}", deserialized);
    }
}
```
Spoiler: the test doesn't work.
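The reading side can be sketched by hand in the same way, using `Read::read_exact` and `u32::from_be_bytes` much like `ByteBuffer.getInt()`. This is a sketch under the assumptions stated in the post (big-endian, `size + 1` offsets, string bytes spanning from the first to the last offset); `read_from` and `read_u32` are hypothetical names, and the only consistency check is that the last offset is not before the first.

```rust
use std::io::{self, Read};

#[derive(Debug, PartialEq)]
pub struct MyStruct {
    size: u32,
    offsets: Vec<u32>,
    strings: Vec<u8>,
}

/// Reads one big-endian u32, like Java's ByteBuffer.getInt().
fn read_u32<R: Read>(input: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    input.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

impl MyStruct {
    pub fn read_from<R: Read>(input: &mut R) -> io::Result<MyStruct> {
        let size = read_u32(input)?;
        // Read `size + 1` offsets. No capacity is reserved up front because
        // `size` comes from untrusted input.
        let mut offsets = Vec::new();
        for _ in 0..=size {
            offsets.push(read_u32(input)?);
        }
        // The loop runs at least once, so `offsets` is never empty here.
        let first = offsets[0];
        let last = offsets[offsets.len() - 1];
        let len = last.checked_sub(first).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, "last offset is before the first one")
        })?;
        // Read at most `len` bytes, then check that we actually got all of them.
        let mut strings = Vec::new();
        input.by_ref().take(u64::from(len)).read_to_end(&mut strings)?;
        if strings.len() as u64 != u64::from(len) {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "string bytes are truncated"));
        }
        Ok(MyStruct { size, offsets, strings })
    }
}
```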
I have several questions about this snippet; let me try to summarize them here:
- Is this really the best practice for serializing/deserializing a custom format, or should I change direction here? It seems that serde/bincode are very good as long as you don't have format constraints, but when you want something more custom they start to feel too restrictive.
- Deserialization doesn't work. I suspect it's because I've declared 3 fields but I'm trying to deserialize more: each `next_element` call counts as a field, while I'm actually trying to deserialize only part of a field. I'm not sure how to solve this.
- I'm not quite sure why I need to name the fields and the struct, since I never use those names. This happens, for instance, in `deserialize_struct`, `serialize_field`, `serialize_struct` and `write_str`.
- One other question is about all the `usize` to `u32` conversions: do I really need all of them?
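On the `usize`/`u32` question, one common approach (a suggestion, not the only option) is to keep `usize` in memory and convert only at the format boundary, using `TryFrom` so an out-of-range value becomes an error instead of the silent truncation that `as u32` performs. The helper names below are hypothetical.

```rust
use std::convert::TryFrom; // already in the prelude since the 2021 edition
use std::io;

/// Converts an in-memory length to the u32 used by the format.
/// `len as u32` would silently drop the high bits; this fails instead.
fn len_to_u32(len: usize) -> io::Result<u32> {
    u32::try_from(len).map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidInput, "length does not fit in a u32")
    })
}

/// Widens a u32 read from the wire into a usize for indexing. This cannot
/// fail on 32- or 64-bit targets, but try_from keeps it correct everywhere.
fn u32_to_index(n: u32) -> io::Result<usize> {
    usize::try_from(n).map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidData, "value does not fit in a usize")
    })
}
```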
Sorry for the long post... I hope you can give me a lead on what I'm doing wrong, since I thought this would be easier than what I'm experiencing.
Thanks a lot!