What is the best way to de-/serialize a simple struct?

Hi,

I have a struct like this:

```rust
use std::collections::HashSet;

struct MyStruct<'a> {
    foo: HashSet<&'a str>,
    bar: Option<HashSet<&'a str>>,
    baz: Option<String>,
}
```

serde seems a bit overkill, so I'd like to avoid it as a dependency.

I could create a simple custom text format, but I don't want to handle string escaping by hand.

I found miniserde (by the author of serde), microserde (based on miniserde, very new), and speedy (which looks interesting), but I'm wondering what other options there are.

Speed is important, but the struct should only be 4 to 8 MB on the hard drive in a text format, so most Rust solutions would probably be fast enough. I don't care much whether the format is text or binary (I've never used a binary format directly, but one would most likely be faster).
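For comparison, here is roughly what a dependency-free binary format for this struct could look like. It is a minimal sketch, not a specification: it assumes little-endian `u32` length prefixes and a one-byte tag per `Option`, and `encode`, `decode` and `Reader` are hypothetical names. Length prefixes sidestep string escaping entirely, and decoding can borrow the `&str`s straight out of the byte buffer.

```rust
use std::collections::HashSet;

// Same shape as the struct in the question.
#[derive(Debug, PartialEq)]
struct MyStruct<'a> {
    foo: HashSet<&'a str>,
    bar: Option<HashSet<&'a str>>,
    baz: Option<String>,
}

// A string is a little-endian u32 byte length followed by its UTF-8 bytes,
// so quotes, newlines, etc. never need escaping.
fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

// A set is its element count followed by its elements (in HashSet order).
fn write_set(out: &mut Vec<u8>, set: &HashSet<&str>) {
    out.extend_from_slice(&(set.len() as u32).to_le_bytes());
    for s in set {
        write_str(out, s);
    }
}

fn encode(value: &MyStruct<'_>) -> Vec<u8> {
    let mut out = Vec::new();
    write_set(&mut out, &value.foo);
    // Each Option is a tag byte (0 = None, 1 = Some) followed by the payload.
    out.push(value.bar.is_some() as u8);
    if let Some(set) = &value.bar {
        write_set(&mut out, set);
    }
    out.push(value.baz.is_some() as u8);
    if let Some(s) = &value.baz {
        write_str(&mut out, s);
    }
    out
}

// Cursor over the input; decoded `&str`s borrow from the buffer itself,
// matching the `&'a str` fields without copying.
struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let buf: &'a [u8] = self.buf;
        let head = buf.get(..n)?;
        self.buf = &buf[n..];
        Some(head)
    }
    fn read_u8(&mut self) -> Option<u8> {
        self.take(1).map(|b| b[0])
    }
    fn read_u32(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
    fn read_str(&mut self) -> Option<&'a str> {
        let len = self.read_u32()? as usize;
        std::str::from_utf8(self.take(len)?).ok()
    }
    fn read_set(&mut self) -> Option<HashSet<&'a str>> {
        let len = self.read_u32()?;
        (0..len).map(|_| self.read_str()).collect()
    }
    fn read_tag(&mut self) -> Option<bool> {
        match self.read_u8()? {
            0 => Some(false),
            1 => Some(true),
            _ => None,
        }
    }
}

// Returns None for truncated or malformed input, or for trailing bytes.
fn decode(buf: &[u8]) -> Option<MyStruct<'_>> {
    let mut r = Reader { buf };
    let foo = r.read_set()?;
    let bar = if r.read_tag()? { Some(r.read_set()?) } else { None };
    let baz = if r.read_tag()? { Some(r.read_str()?.to_owned()) } else { None };
    if !r.buf.is_empty() {
        return None;
    }
    Some(MyStruct { foo, bar, baz })
}
```

This format is neither self-describing nor versioned; if it ever needs to evolve, a leading version byte would be a cheap first step.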

Serde is an almost-standard, high-quality, battle-tested library. Serialization and deserialization involve so many little details that you are almost guaranteed to get it wrong if you try to implement it yourself. Inventing ad-hoc formats isn't going to end well, either, if you eventually need to expose your data to the outside world (which you might not want to do just yet, but requirements can and do change).

Just use Serde, really. You can choose from a variety of popular formats such as JSON and Bincode, available in the serde_json and bincode crates, respectively.


Serialization is not some impossibly complex dark art that should be forbidden to everyone but the serde authors; specify your format thoroughly, write some tests, and you will be OK. And there are legitimate reasons not to use serde, as nice a library as it is: it is in fact a heavy dependency, and its model is not always a good fit for applications where protocol evolution or forwards-compatible (de)serialization are important. I think the original poster's question is valid.

I have done that: I came up with my own serialization format before. It certainly isn't rocket science. But it took me several weeks to design, think through and document the specification, write parser/writer implementations, and add enough tests that I could be reasonably confident it worked correctly, precisely because there are so many little details to think about and get right. It's not that it's impossible, but repeating the boilerplate (which serde abstracts away) shouldn't be done without a very good reason.

Of course. However, OP listed exactly none of them, other than "I want to avoid it".

Thanks, everybody!

@H2CO3: I'm a big fan of serde, and it is one of the reasons why I love Rust. The main reason why I'm looking for alternatives is, as @glowcoil mentioned, that it's a heavy dependency. I don't care about this for myself, but I'm working on something that probably would be useful to other people, and I don't want to force serde on them for my small crate.

I was thinking about having two options for de-/serialization (one would be serde, for people who use serde anyway), and putting them behind feature flags.

there are legitimate reasons not to use serde

Of course. However, OP listed exactly none of them, other than "I want to avoid it".

Yes, I probably should have provided more information regarding my use-case.

In addition to the above: I'm working on a command-line tool and library that checks whether a CSS class exists in a CSS stylesheet. For example, I'm using Tailwind CSS with a static-website generator I'm building in Rust, and I'd like to validate CSS classes at build and/or compile time (e.g. in my tera templates). I'm pretty sure that would be useful to other people.

There is one step where CSS classes are extracted from a CSS stylesheet and cached in a file (this would happen only when the stylesheet changes, for example via the CLI tool and a file watcher).

Then later, the application would load the data from the file to validate CSS classes.
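At that point, validation itself is just a set lookup. A minimal sketch, assuming the cached classes have already been loaded into a `HashSet<&str>` and that a `class` attribute is a whitespace-separated list of names; `unknown_classes` is a hypothetical helper, not part of any existing crate.

```rust
use std::collections::HashSet;

/// Returns the names in a `class="..."` attribute value that are not in
/// `known`, in the order they appear. Hypothetical helper for illustration.
fn unknown_classes<'a>(class_attr: &'a str, known: &HashSet<&str>) -> Vec<&'a str> {
    class_attr
        .split_whitespace()
        // `*class` is a `&str`, so the lookup works on `str` without allocating.
        .filter(|class| !known.contains(*class))
        .collect()
}
```

For example, with `known` containing `px-4` and `md:flex`, `unknown_classes("px-4 md:flex typo", &known)` returns `["typo"]`.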

One approach would be to offer serde_json and miniserde (which is by the same author as serde, and therefore most likely properly implemented) as options for de-/serialization via cargo features.

Besides that, I just had another idea:

For a different project, I used the quote crate (also by the amazing author of serde) to generate Rust API code (a command-line tool that takes JSON as input and writes Rust source code to files).

I could do the same with my struct from above.

The benefit would be that the CSS validation crate wouldn't require any dependency for deserializing at all.

The workflow would be like this:

  1. The stylesheet is changed
  2. The CLI tool extracts the CSS classes and writes them in the form of Rust source code to a file that is defined by the user
  3. The application is re-compiled (e.g. a web application, or a static-website generator)

The main disadvantage I see is that every change to the stylesheet would require a re-compilation (which wouldn't be a problem for me, but maybe for others).

However, I guess this would be the best option for binary size (which would probably only matter for people who want to include my crate in a WASM application to check CSS classes dynamically at run time, for example from within a tera template, or when CSS classes are computed at run time).

I'm also not sure whether most people would be comfortable with a code generator adding source files directly to their 'src' directory.

A procedural macro would probably be another option, but it would also require re-compilation on every CSS change. I don't have any experience with procedural macros, so I don't know whether there would be other disadvantages.

Is de/serialization an essential part of the normal operation of your library? If not, you can simply use Cargo features and #[cfg] configurations in order to give users the ability to opt in to (or out from) having Serde as a dependency.
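A sketch of that opt-in, assuming the library exposes a Cargo feature named `serde` that enables an optional `serde` dependency (the manifest part is shown as comments so the example stays a single file). Without the feature, everything below compiles with no serde at all.

```rust
// Assumed Cargo.toml of the library (shown as comments to keep this one file):
//
//     [dependencies]
//     serde = { version = "1", features = ["derive"], optional = true }
//
//     [features]
//     serde = ["dep:serde"]

use std::collections::HashSet;

// The derive is only applied when the `serde` feature is enabled,
// so users who don't enable it never compile serde.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct MyStruct<'a> {
    // Borrowed `&str`s nested inside other types need `#[serde(borrow)]`;
    // the attribute is gated too, so it disappears along with the derive.
    #[cfg_attr(feature = "serde", serde(borrow))]
    pub foo: HashSet<&'a str>,
    #[cfg_attr(feature = "serde", serde(borrow))]
    pub bar: Option<HashSet<&'a str>>,
    pub baz: Option<String>,
}
```

Users then opt in with `features = ["serde"]` on the dependency; a second backend could be gated the same way behind its own feature.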

I believe the majority of Rust binaries already contain serde as a dependency, either directly or indirectly. In those cases, adding another ser/de crate like miniserde doesn't improve anything.


Is de/serialization an essential part of the normal operation of your library?

@H2CO3: No, it's not essential, and I will make de/serialization optional.

I believe the majority of Rust binaries already contain serde as a dependency, either directly or indirectly. In those cases, adding another ser/de crate like miniserde doesn't improve anything.

@Hyeonu: Yes, I'm aware of this. My plan was therefore to give users the choice between serde and something else (so people who already use serde don't have to pay for the additional "something else").

It turns out that my CSS class extractor extracts the classes from the Tailwind CSS stylesheet (3.9 MB) in 1025 ms in debug mode and in 75-125 ms in release mode.

If I enable full optimization for my crate in debug mode (I think this is possible), it would most likely be more than fast enough to simply run the extraction every time (with no de/serialization at all).
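Cargo supports this through per-package profile overrides. A sketch, assuming the library's package is named `my_css_crate` (a placeholder). Cargo only reads profile settings from the root manifest of the workspace being built, so this has to go into the application's `Cargo.toml`; a library cannot turn it on for its users.

```toml
# In the application's Cargo.toml (profiles in a dependency's manifest are ignored).
# `my_css_crate` is a placeholder for the library's package name.
[profile.dev.package.my_css_crate]
opt-level = 3

# The CSS parser likely does most of the work, so it is worth optimizing too.
[profile.dev.package.cssparser]
opt-level = 3
```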

However, this would add cssparser as a dependency, which seems to be bigger than serde.

So after thinking about it, and with the feedback I got in this thread, I think a procedural macro would be the ideal approach if binary size is important (or if the 75-125 ms are a problem).

Basically, the macro would:

  1. check if a cache file exists (that is newer than the CSS stylesheet)
  2. if not, parse the provided CSS file, serialize the extracted classes with serde, and write them to the cache file
  3. embed the data where the macro is called
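Step 1 could be a plain modification-time comparison, the same signal tools like make rely on. A minimal sketch using only `std::fs`; `cache_is_fresh` is a hypothetical helper that a proc macro (or the CLI tool) could call.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper for step 1: the cache is reusable only if it exists
/// and was modified after the stylesheet.
fn cache_is_fresh(cache: &Path, stylesheet: &Path) -> io::Result<bool> {
    let cache_meta = match fs::metadata(cache) {
        Ok(meta) => meta,
        // A missing cache just means "not fresh"; other errors are reported.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    let stylesheet_modified = fs::metadata(stylesheet)?.modified()?;
    Ok(cache_meta.modified()? > stylesheet_modified)
}
```

The strict `>` treats equal timestamps as stale, erring on the side of re-parsing, and a missing stylesheet is reported as an error rather than silently treated as stale.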

This would eliminate serde and cssparser as dependencies from the binaries of applications that use my crate, right?

I'm curious what other people think about this approach.