Serializing a crate that doesn't expose everything


#1

Hi Rust friends, I’m looking for a little guidance about how I can serialize a struct exported by a crate, but whose dependencies are not exported. Specifically from this great and super fast Cuckoo Filter crate: https://github.com/seiflotfy/rust-cuckoofilter

pub struct CuckooFilter<H> {
    buckets: Box<[Bucket]>,  // Not pub use'd
    len: usize,
    _hasher: std::marker::PhantomData<H>,
}

The struct CuckooFilter is exported, but one of its inner types, Bucket, is not, so I can’t derive Serialize for it. So two questions:

  1. Am I wrong, and is it actually still possible to derive Deserialize on this? I started to follow the instructions in Serde’s docs on how to serialize structs from a remote crate, and it complained that Bucket was not public, which indeed it’s not, and there’s nothing I can do about that without forking.

  2. If I do fork, why don’t I just add serde as a dependency and derive Deserialize and Serialize there? I realize you don’t want to add dependencies for features people might never use, but it’s Serde, and Serde ain’t going away anytime soon. Is this bad practice?

Thanks for your help and thoughts!


#2

Deriving Deserialize for a data structure can be dangerous. It can allow structures to be created that violate invariants of the type which are normally protected by its public API.

(as a more general rule of thumb, a Deserialize impl should never be capable of producing a value that you couldn’t produce by writing a manual impl that uses only the public API)
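To make the invariant concern concrete, here is a minimal sketch. SlotTable and all of its methods are hypothetical stand-ins, not part of rust-cuckoofilter; the assumed invariant is that len always equals the number of non-zero slots. A derived Deserialize would behave roughly like from_fields_unchecked, while a manual impl can route through a checked constructor like from_fields.

```rust
/// Hypothetical stand-in type: a slot value of 0 means "empty", and `len`
/// must always equal the number of occupied (non-zero) slots.
#[derive(Debug, PartialEq)]
pub struct SlotTable {
    slots: Vec<u8>,
    len: usize,
}

impl SlotTable {
    pub fn new(capacity: usize) -> Self {
        SlotTable { slots: vec![0; capacity], len: 0 }
    }

    /// Public API: fills the first empty slot and keeps `len` in sync.
    pub fn insert(&mut self, fingerprint: u8) -> bool {
        if fingerprint == 0 {
            return false; // 0 is reserved for "empty"
        }
        match self.slots.iter_mut().find(|slot| **slot == 0) {
            Some(slot) => {
                *slot = fingerprint;
                self.len += 1;
                true
            }
            None => false, // table is full
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Roughly what a derived Deserialize does: trust the incoming fields.
    pub fn from_fields_unchecked(slots: Vec<u8>, len: usize) -> Self {
        SlotTable { slots, len }
    }

    /// What a manual impl can do instead: reject inconsistent input.
    pub fn from_fields(slots: Vec<u8>, len: usize) -> Result<Self, String> {
        let occupied = slots.iter().filter(|slot| **slot != 0).count();
        if occupied != len {
            return Err(format!("len is {} but {} slots are occupied", len, occupied));
        }
        Ok(SlotTable { slots, len })
    }
}
```

With the unchecked path, input claiming len = 99 for a three-slot table is accepted as-is, which is a state the public insert method could never produce.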

Normally the better thing to do for data structures is to manually implement the traits. Serialize should generally be implemented by iterating over the collection to serialize it as either a sequence or mapping. But it looks like this data structure does not actually store its elements? I have to wonder whether serialization is even appropriate for this type at all…


To answer your actual questions:

  • In most use cases you can get away with just making a newtype and implementing (not deriving) Serialize and Deserialize on that.
  • If you want to add impls to the crate, they can be hidden behind a "serde" cargo feature so that the dependency is optional.
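As a rough sketch of the second bullet: the impls can be written by hand and compiled only when a "serde" feature is enabled. Snapshot, its multiple-of-4 invariant, and its methods are hypothetical and only stand in for a real type; the Cargo.toml lines are shown as comments. Without the feature, the two gated impls are stripped and the crate builds with no serde dependency at all.

```rust
// Cargo.toml of the fork (shown as a comment so this block stays plain Rust):
//
//   [dependencies]
//   serde = { version = "1", optional = true }
//
// An optional dependency implicitly defines a cargo feature of the same
// name, so downstream users opt in with `features = ["serde"]`.

/// Hypothetical stand-in type. Assumed invariant: the byte length is a
/// multiple of 4.
#[derive(Debug, PartialEq, Clone)]
pub struct Snapshot {
    bytes: Vec<u8>,
}

impl Snapshot {
    /// The only way to build a Snapshot; enforces the invariant.
    pub fn new(bytes: Vec<u8>) -> Result<Self, String> {
        if bytes.len() % 4 != 0 {
            return Err(format!("length {} is not a multiple of 4", bytes.len()));
        }
        Ok(Snapshot { bytes })
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

// Compiled only when the "serde" feature is on.
#[cfg(feature = "serde")]
impl serde::Serialize for Snapshot {
    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Emit the raw bytes as a plain sequence of u8.
        serde::Serialize::serialize(&self.bytes, serializer)
    }
}

#[cfg(feature = "serde")]
impl<'de> serde::Deserialize<'de> for Snapshot {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // Read the sequence back, then go through the checked constructor so
        // malformed input becomes a deserialization error.
        let bytes: Vec<u8> = serde::Deserialize::deserialize(deserializer)?;
        Snapshot::new(bytes).map_err(serde::de::Error::custom)
    }
}
```

Routing Deserialize through Snapshot::new means the serialized form can never produce a value that the public API could not.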

#3

I definitely see the concern regarding the safety of re-creating the object in this manner; however, the value of a CuckooFilter or BloomFilter comes from the fact that I don’t need access to all the original data. If I did, I would just use a big hash set. Serializing data structures that are expensive to generate feels fairly reasonable.

Either way, implementing it myself or deriving it would run into the same issue: the type Bucket is not exported. The whole tree looks like this:

// Not exported
#[derive(PartialEq, Copy, Clone, Hash)]
pub struct Fingerprint {
    pub data: [u8; FINGERPRINT_SIZE],
}

// Not exported
#[derive(Clone)]
pub struct Bucket {
    pub buffer: [Fingerprint; BUCKET_SIZE],
}

// Exported
pub struct CuckooFilter<H> {
    buckets: Box<[Bucket]>,
    len: usize,
    _hasher: std::marker::PhantomData<H>,
}

So really just a bunch of nested u8 arrays. I don’t see a way around forking; however, I’m thinking I can just add some from_data() and get_data() functions. Or, it does look like the Hashbrown crate uses Serde behind a cargo feature, so at least there’s some precedent there.
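For what it’s worth, here is a rough sketch of what that get_data() / from_data() pair could look like in a fork. The types mirror the layout above, but the constant values (FINGERPRINT_SIZE = 1, BUCKET_SIZE = 4), the name Filter, the dropped hasher parameter, and both method signatures are my own assumptions, not the crate’s actual API.

```rust
// Stand-in types mirroring the layout quoted above. The constant values are
// assumptions for illustration; they are not taken from the real crate.
const FINGERPRINT_SIZE: usize = 1;
const BUCKET_SIZE: usize = 4;
const BUCKET_BYTES: usize = FINGERPRINT_SIZE * BUCKET_SIZE;

#[derive(PartialEq, Copy, Clone, Debug)]
struct Fingerprint {
    data: [u8; FINGERPRINT_SIZE],
}

#[derive(PartialEq, Clone, Debug)]
struct Bucket {
    buffer: [Fingerprint; BUCKET_SIZE],
}

// The hasher type parameter is left out to keep the sketch short.
#[derive(PartialEq, Debug)]
pub struct Filter {
    buckets: Box<[Bucket]>,
    len: usize,
}

impl Filter {
    /// Hypothetical export: flatten every fingerprint into one byte vector
    /// and return it together with the stored item count.
    pub fn get_data(&self) -> (Vec<u8>, usize) {
        let mut bytes = Vec::with_capacity(self.buckets.len() * BUCKET_BYTES);
        for bucket in self.buckets.iter() {
            for fingerprint in bucket.buffer.iter() {
                bytes.extend_from_slice(&fingerprint.data);
            }
        }
        (bytes, self.len)
    }

    /// Hypothetical import: rebuild the buckets, rejecting malformed input.
    pub fn from_data(bytes: &[u8], len: usize) -> Result<Filter, String> {
        if bytes.is_empty() || bytes.len() % BUCKET_BYTES != 0 {
            return Err(format!(
                "byte length {} is not a positive multiple of {}",
                bytes.len(),
                BUCKET_BYTES
            ));
        }
        let capacity = bytes.len() / FINGERPRINT_SIZE;
        if len > capacity {
            return Err(format!("len {} exceeds capacity {}", len, capacity));
        }
        let buckets: Vec<Bucket> = bytes
            .chunks_exact(BUCKET_BYTES)
            .map(|chunk| {
                let mut buffer = [Fingerprint { data: [0; FINGERPRINT_SIZE] }; BUCKET_SIZE];
                for (slot, raw) in buffer.iter_mut().zip(chunk.chunks_exact(FINGERPRINT_SIZE)) {
                    slot.data.copy_from_slice(raw);
                }
                Bucket { buffer }
            })
            .collect();
        Ok(Filter { buckets: buckets.into_boxed_slice(), len })
    }
}
```

The byte vector and count from get_data() can then be written with whatever format I like, and from_data() only accepts input whose size lines up with whole buckets, which addresses at least part of the invariant concern raised above.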