I need to save a complex nested structure to disk. I took the time to learn how to write and read simple string files and feel fairly competent with that, but I can't find any good resources describing how to save complex data structures. I brainstormed with my son, who spends a lot of time working with PHP, and he mentioned using JSON or YAML. We also found references to a crate called serde. Still, I'm mostly clueless about how to get started on this problem. Here's a sample code snippet showing something similar to the structs I need to work with:
struct Sub {
    one: String,
    two: u32,
    three: bool,
}
struct Super {
    ten: String,
    twenty: Vec<Sub>,
    thirty: f64,
}
Notice that the Super struct contains a field that is a vector whose elements are of type Sub. Can anyone point me in the right direction for learning how to save this kind of data structure to disk? Thanks.
serde is a crate which handles serialization (ser) and deserialization (de). It allows structs, including ones that contain nested structures such as Vec, to be serialized to and deserialized from many different formats. If you don't need your data to be readable by users or other programs, you might want to serialize to a binary format.
I recommend reading further in the official serde documentation here. Your data should be trivial for the library to serialize, requiring just a #[derive(Serialize, Deserialize)] annotation on each struct.
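// Assumes Sub and Super also derive PartialEq and Debug, which assert_eq! needs.
// `super` is a reserved keyword in Rust, so the variable is named `original`.
let original = Super {...}; // struct contents omitted for brevity
let json: String = serde_json::to_string(&original).unwrap(); // serialize to a JSON string
// The `json` string could be written to a file, transmitted over the network, etc.
let new: Super = serde_json::from_str(&json).unwrap(); // deserialize from the JSON string
assert_eq!(original, new);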
Note that serde also supports serialization formats other than JSON; it's quite a flexible library.
I would like to caution against defaulting to a binary format, especially for people who are new to serialization. Binary formats do not necessarily result in smaller serialized data; they can, but there's no guarantee.
As with all things in software development, KISS reigns supreme.
Reading is a bit more complex, as one has to know the size of each piece of data before reading it. There's std::mem::size_of::<u32>() for getting the size of primitives. As you can see, the code above writes the length of the String before writing the string itself, so one can read the appropriate number of bytes back.
This gives a binary file, obviously, so read and write methods have to exactly match. Very compact and fast, great for large amounts of data.
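The code that post refers to is not shown in this thread, so here is a minimal sketch of the length-prefix idea using only the standard library. The little-endian layout, the u32 length prefix and the function names write_sub, read_u32 and read_sub are assumptions made for illustration, not the original poster's code.

```rust
use std::io::{self, Cursor, Read, Write};

#[derive(Debug, PartialEq)]
struct Sub {
    one: String,
    two: u32,
    three: bool,
}

// Write one `Sub`: string length (u32), string bytes, then the u32 and the bool.
fn write_sub<W: Write>(w: &mut W, sub: &Sub) -> io::Result<()> {
    let bytes = sub.one.as_bytes();
    w.write_all(&(bytes.len() as u32).to_le_bytes())?;
    w.write_all(bytes)?;
    w.write_all(&sub.two.to_le_bytes())?;
    w.write_all(&[sub.three as u8])
}

// Read exactly four bytes and interpret them as a little-endian u32.
fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; std::mem::size_of::<u32>()];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

// Read one `Sub` back, in exactly the order it was written.
fn read_sub<R: Read>(r: &mut R) -> io::Result<Sub> {
    let len = read_u32(r)? as usize;
    let mut bytes = vec![0u8; len];
    r.read_exact(&mut bytes)?;
    let one = String::from_utf8(bytes)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let two = read_u32(r)?;
    let mut flag = [0u8; 1];
    r.read_exact(&mut flag)?;
    Ok(Sub { one, two, three: flag[0] != 0 })
}

fn main() -> io::Result<()> {
    let sub = Sub { one: "hello".to_string(), two: 42, three: true };
    // A Vec<u8> stands in for a file here; std::fs::File implements the same traits.
    let mut buffer = Vec::new();
    write_sub(&mut buffer, &sub)?;
    let restored = read_sub(&mut Cursor::new(buffer))?;
    assert_eq!(sub, restored);
    Ok(())
}
```

Writing a Vec<Sub> would follow the same pattern: write the element count first, then each element in turn.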
Many prefer to serialize data before writing it to a file, e.g. to JSON. Human-readable data sounds great, but in practice one still has to know pretty exactly what to expect when reading it programmatically, so I serialize only for exchange with other programs, e.g. a web browser.
For JSON I prefer the json crate over serde, as serde comes with a huuuge chunk of dependencies and, if I understood it correctly, is more a generalized crate for all kinds of serialization. Triggers my overengineering itch. The json crate does its job perfectly, in an easily understandable fashion.
Oh, and don't forget a versioning marker at the beginning of the file or inside JSON, in case data format might change in the future.
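For a binary file, the version marker could look something like this. It is only a sketch: the single-byte marker, the constant FORMAT_VERSION and the two helper functions are hypothetical names chosen for the example.

```rust
use std::io::{self, Cursor, Read, Write};

// Hypothetical format version; bump it whenever the layout changes.
const FORMAT_VERSION: u8 = 1;

// Write the version byte first, then the payload.
fn write_versioned<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&[FORMAT_VERSION])?;
    w.write_all(payload)
}

// Read the version byte and refuse data written in an unknown layout.
fn read_versioned<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut version = [0u8; 1];
    r.read_exact(&mut version)?;
    if version[0] != FORMAT_VERSION {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported format version {}", version[0]),
        ));
    }
    let mut payload = Vec::new();
    r.read_to_end(&mut payload)?;
    Ok(payload)
}

fn main() -> io::Result<()> {
    let mut file = Vec::new(); // stands in for a real file
    write_versioned(&mut file, b"payload")?;
    let payload = read_versioned(&mut Cursor::new(file))?;
    assert_eq!(payload, b"payload");
    Ok(())
}
```

In JSON the same idea would just be an extra top-level field holding the version number.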
You completely missed the point of Serde. Serde allows working with static domain types. The crate you linked to doesn't. You'd need to do all the type validation and conversion manually.
See, I gave Serde a fair chance and tried for several hours to get it working. I even ignored these long compile times. I watched this video which tries to explain how Serde works: Decrusting the serde crate - YouTube
It didn't work out, so it's not for me.
And yes, sometimes I really miss the simplicity of 1980s C, modern PHP or JavaScript when writing Rust code. All too often Rust looks much like a playfield for programming language experts.
I didn't get it working. Maybe I would have gotten it working if I had had the sample code above.
Now I'm a happy user of the json crate. It requires writing a couple of lines, which I don't mind as long as it's easy to do. While doing so I can do some data refinement, e.g. rounding floats to avoid bloating JSON with 15 digits after the decimal where 2 are sufficient.
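The rounding step mentioned there could be as small as the following sketch. The helper name round2 is made up for the example, and it assumes the rounded f64 is then handed to whatever JSON writer is in use.

```rust
// Round to two decimal places before handing the value to a JSON writer.
fn round2(value: f64) -> f64 {
    (value * 100.0).round() / 100.0
}

fn main() {
    let raw = 0.1 + 0.2; // carries floating-point noise in the last digits
    println!("{} -> {}", raw, round2(raw));
}
```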
I have no idea what you are talking about. Serde is not hard to use. You literally only need to add #[derive(Serialize, Deserialize)] to your type and they become (de)serializable. A full working example is only a couple lines of code.
If it took you more than that, then that's a sign that either you did something absolutely wrong, and/or you need to learn the language, the tools, and the ecosystem better.
You have to know it works this way. That's a distinction.
Now imagine you come along naively and think: "I want to write my data as JSON!" In JavaScript it's just JSON.stringify(data), so what could be so hard with this plan? Then compare these two documentation pages to see what gets closer to the plan:
The first one comes along with a number of examples, doing exactly what you want: turn some data into JSON. The second one talks about "generically", "ecosystem", "provides a layer", for JSON there's only a link. At this point it's obvious: serde is for experts, json is for "just do it". Whether this evaluation is justified or not doesn't matter at this point.
That said, even with the knowledge I gained here in this discussion I won't swap for serde, because my incoming JSON is of only vaguely known structure and the known parts don't match what I use in Rust, so I have to parse and/or convert manually anyways.
Coincidentally, it was only this morning that I had the urge to see how I would serialise data with protocol buffers, which might be useful in an up-and-coming project. A quick google told me that there was a protobuf crate for Rust that would do the job. I soon found the docs: protobuf - Rust, where the first thing it says is: "Library to read and write protocol buffers data". Great, I thought, perfect, just what I need.
Three hours later.... Perhaps I am stupid, but after hunting around in that morass of Modules, Structs, Enums, Constants, Traits and Type definitions, I have no idea how to begin using this thing. I'm sure it's in there somewhere, but jeez.
// Encode a struct into binary:
let out_bytes: Vec<u8> = my_struct.write_to_bytes().unwrap();
// Decode binary into a struct:
let my_struct = MyStruct::parse_from_bytes(&in_bytes).unwrap();
But there are a lot of steps to set that up, none very complicated, but everything is impossible until you find out how.
The moral of the story...
I have been programming in languages like C, C++, Pascal, PL/M, Coral and Ada for decades, using all kinds of libraries. I have been into Rust on and off for three years or so, but I often find the language and crate documentation impenetrable.
So yes, "we need to learn the language, the tools, and the ecosystem" as you say. But would that really help much in the many cases like this?
Some simple examples in the documentation to get started with would be wonderful.
P.S. A big thanks to Jeff Garzik for that wonderful protobuf example.
I've written Rust, Java, Scala, Haskell and a little C++. I've found the documentation in the Rust ecosystem, i.e. the language, the standard library and popular libraries, to be by far the best. It's good practice to have doctests for every public function, which often serve as examples. From my point of view, the cpp documentation is borderline incomprehensible. So that varies very much depending on your background.
Yes, this is exactly what I have been battling for the last year as I've been trying to learn Rust. Honestly, I love Rust, especially its built-in safety features. I'm just now coming back to coding after a hiatus of decades and it has been a difficult struggle, primarily because the Rust documentation is either poorly written or the writers assume a level of familiarization with programming languages and practices that I simply don't have.
That said, it would be unreasonable for me to insist that all writers of this kind of documentation be able to write at a level accessible by novices. That's a skill/quality that goes beyond a person's expertise/skill in his/her profession. I spent much of my vocational career as a math teacher where it was vital that I present the knowledge and skills I taught at a level that my students could access. The thing is, teaching is/was in my blood. It came naturally. (My mother said I was born to be a teacher, and nobody knows you as well as your mother. :>) Other people are born to be computer programmers and may or may not have the ability to write their documentation at novice level.
And...that said, it is not unreasonable for me to expect that the Rust community should make changes in the format and common conventions that Rust documentation tends to follow. They also could, as @ZiCog suggested, include lots and lots -- and lots -- more examples of how to use each feature. I'll say this, too. Examples should end with a tested result that isn't just some assert() statement. I think that if these issues had been addressed before I started learning Rust, the learning curve I've been working through would likely have been cut in half.
I agree that in general the documentation for Rust and crates is very good. It is certainly comprehensive. But really, where would you have started with the protobuf case I described above?
Ha, yes. That is because C++ is incomprehensible. C++ has the benefit of an ocean of books and tutorials about just about anything that have accreted around it over decades.
I would not say the documentation is poorly written. It's just that it is almost exclusively a reference work rather than a tutorial, as typified by the protobuf docs: protobuf - Rust. Everything is listed and specified there in glorious detail. It's a dictionary rather than a literary work, if you see what I mean.
Yes, indeed, but how does one find out what one should be doing? I'm sure it could be done from the protobuf docs but it would take forever to comb through all those details and figure out how to piece it together.
How should I have approached my protobuf adventure?
More philosophically, the Rust headline is "A language empowering everyone to build reliable and efficient software". Well, OK, that includes empowering all those who don't yet know what they are doing with Rust or its crates. They (we) need some help.
I disagree and strongly detest unnecessary creep of JSONs and XMLs.
If your serialized data is not intended for direct human consumption or editing, my rule of thumb is to use a binary format by default, especially considering that serde + bincode makes it a breeze. More often than not, binary formats will be smaller and significantly faster to encode and decode. They also provide more opportunities for future optimizations, e.g. if your data contains a lot of binary blobs or strings, you can deserialize it in a borrowed fashion, so instead of needlessly copying data around you work with references into the blob of serialized data.
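To illustrate what "borrowed" means here without pulling in any crates, this is a hand-rolled sketch, not how serde or bincode implement it. The Record type, the byte layout and the helper names are all assumptions for the example; the point is only that `name` is a &str pointing into the input buffer rather than a freshly allocated String.

```rust
// A view into the serialized buffer: `name` borrows from the input, nothing is copied.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    name: &'a str,
    value: u32,
}

// Read a little-endian u32 at offset `at`, or None if the buffer is too short.
fn read_u32_le(input: &[u8], at: usize) -> Option<u32> {
    let b = input.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

// Hypothetical layout: u32 length (little-endian), UTF-8 bytes, then a u32 value.
fn parse_record<'a>(input: &'a [u8]) -> Option<Record<'a>> {
    let len = read_u32_le(input, 0)? as usize;
    let end = 4usize.checked_add(len)?;
    let name = std::str::from_utf8(input.get(4..end)?).ok()?;
    let value = read_u32_le(input, end)?;
    Some(Record { name, value })
}

fn main() {
    let mut buffer = Vec::new();
    buffer.extend_from_slice(&5u32.to_le_bytes());
    buffer.extend_from_slice(b"hello");
    buffer.extend_from_slice(&42u32.to_le_bytes());
    let record = parse_record(&buffer).expect("well-formed buffer");
    assert_eq!(record, Record { name: "hello", value: 42 });
}
```

The trade-off is that the Record cannot outlive the buffer it was parsed from, which the lifetime 'a makes explicit.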