API Design Question: Flexible but Generic Typing [solved]


#1

Hello everyone,

I’m currently working on some bindings to a “flat-file” database (two, actually), and I have a question about the best approach to designing the API.

My first inclination was to make it look like HashMap<K, V> in the standard library, with get, put, and arbitrary key and value types. However, those types are a bit more constrained in this case. In particular, everything must be translatable to and from a slice of bytes, because an array of bytes is what the underlying library code uses.

My basic test cases were all about strings, so my API currently looks like this:

// a zero-cost custom opaque type for handling C-returned data
type OpaqueDatumBox = Box<dyn Deref<Target = [u8]>>;

pub fn exists<K: ?Sized + AsRef<[u8]>>(&self, key: &K) -> DbResult<bool> { ... }
pub fn get<K: ?Sized + AsRef<[u8]>>(&self, key: &K) -> DbResult<Option<OpaqueDatumBox>> { ... }
pub fn put<K: ?Sized + AsRef<[u8]>, V: ?Sized + AsRef<[u8]>>(
        &mut self,
        key: &K,
        value: &V,
    ) -> DbResult<()> {
        ...
}

You get the idea. Everything comes from or turns into an array of bytes. Strings are covered, but I’m not sure about a lot of other things.

Deref<Target=[u8]> was the most compact way to get data out, and I think stuff like this in unit tests looks good (feel free to disagree):

assert_eq!(
     str::from_utf8(&*db.get("Mr. Rogers").unwrap().unwrap()).unwrap(),
     "Won't you be my neighbor?"
);

But for other types, I don’t see a lot of AsRef<[u8]> trait implementations in the standard library – or From<[u8]> or Into<[u8]> for that matter. It would be a lot harder to do something like this:

assert!(db.put("Bottles of Beer on the Wall", 99).is_ok());

This is obviously possible with things like the byteorder crate, but the caller has to put in the same boilerplate every time (“turn my integer into a big-endian u16 and unwrap it”).
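
To make that boilerplate concrete, here is a minimal sketch of what each caller would end up repeating. It uses the standard library's to_be_bytes and from_be_bytes rather than byteorder, and the helper names encode_u16 and decode_u16 are made up for illustration; they are not part of the bindings.

```rust
use std::convert::TryInto;

// Hypothetical helper: the conversion every caller would repeat before `put`.
fn encode_u16(n: u16) -> [u8; 2] {
    n.to_be_bytes()
}

// Hypothetical helper: the reverse conversion every caller would repeat after `get`.
fn decode_u16(bytes: &[u8]) -> Option<u16> {
    // Returns None if the stored datum is not exactly two bytes long.
    let arr: [u8; 2] = bytes.try_into().ok()?;
    Some(u16::from_be_bytes(arr))
}

fn main() {
    let raw = encode_u16(99);
    assert_eq!(raw, [0, 99]);
    assert_eq!(decode_u16(&raw), Some(99));
    assert_eq!(decode_u16(&[1, 2, 3]), None);
}
```

The encoded array could be passed straight to put, since [u8; 2] implements AsRef<[u8]>, but the decode step and its failure case still land on the caller.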

Even if I were to apply the generics to the DbFile struct itself, the way HashMap<K, V> does, instead of taking them per method, I would still need some unified way to convert values into what the library requires: a string of bytes. I’ve thought about adding a “serializer” to the object, but I’m not sure if that would actually make things better or just move the complexity to a different place.

Any ideas on how to make this more ergonomic?


#2

I’ve thought about adding a “serializer” to the object, but I’m not sure if that would actually make things better or just move the complexity to a different place.

From an API point of view, this may be the way to go. Here is a minimal prototype using JSON but the format would be up to you.

let mut db = Jhwgh1968::new();
db.put("Mr. Rogers", "Won't you be my neighbor?");
db.put("Bottles of Beer on the Wall", 99);

println!("{}", db.exists("Mr. Rogers")); // true
println!("{}", db.exists(2.71828)); // false
println!("{}", db.get("Bottles of Beer on the Wall") == Some(99)); // true

#3

I agree with @dtolnay - require the key and value types to be de/serializable from byte slices. The user will know the concrete types they want to use and will also know how to de/serialize them. Your lib then provides the file management, loading and saving the byte blobs.
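
A rough sketch of what I mean, using only the standard library. The ToBytes and FromBytes traits and the MemDb type are hypothetical names invented for this example, and a HashMap stands in for the real flat-file handle:

```rust
use std::collections::HashMap;
use std::convert::TryInto;

/// Hypothetical trait: anything that can be written out as bytes.
trait ToBytes {
    fn to_bytes(&self) -> Vec<u8>;
}

/// Hypothetical trait: anything that can be rebuilt from bytes.
trait FromBytes: Sized {
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

impl ToBytes for str {
    fn to_bytes(&self) -> Vec<u8> {
        self.as_bytes().to_vec()
    }
}

impl ToBytes for u32 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
}

impl FromBytes for String {
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        String::from_utf8(bytes.to_vec()).ok()
    }
}

impl FromBytes for u32 {
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        let arr: [u8; 4] = bytes.try_into().ok()?;
        Some(u32::from_be_bytes(arr))
    }
}

/// In-memory stand-in for the database handle; it only ever sees byte blobs.
#[derive(Default)]
struct MemDb {
    blobs: HashMap<Vec<u8>, Vec<u8>>,
}

impl MemDb {
    fn put<K: ToBytes + ?Sized, V: ToBytes + ?Sized>(&mut self, key: &K, value: &V) {
        self.blobs.insert(key.to_bytes(), value.to_bytes());
    }

    fn get<K: ToBytes + ?Sized, V: FromBytes>(&self, key: &K) -> Option<V> {
        self.blobs
            .get(&key.to_bytes())
            .and_then(|b| V::from_bytes(b.as_slice()))
    }

    fn exists<K: ToBytes + ?Sized>(&self, key: &K) -> bool {
        self.blobs.contains_key(&key.to_bytes())
    }
}

fn main() {
    let mut db = MemDb::default();
    db.put("Mr. Rogers", "Won't you be my neighbor?");
    db.put("Bottles of Beer on the Wall", &99u32);

    let bottles: Option<u32> = db.get("Bottles of Beer on the Wall");
    assert_eq!(bottles, Some(99));
    assert!(db.exists("Mr. Rogers"));
    assert!(!db.exists("Nobody"));
}
```

The caller picks the concrete type at each get, and the store itself never needs to know anything beyond the two traits.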


#4

Thanks for the input to you both. That code example helps, @dtolnay. It got me reading about Serde, something I’ve been meaning to do for a while.

I’ll be working through the code in more detail tomorrow, but from the serde documentation, it looks like I will need four pieces if I want maximum flexibility:

  1. Type K, which is the key of the pair.
  2. Types PV and GV, which are the input (“put”) and output (“get”) types for the value of the pair. These must be different, because a deserializer returns e.g. Result<V>, whereas the serializer accepts type V. (That is, for values, PV=V and GV=Result<V>.)
  3. SerializeFn<T> and DeserializeFn<U> types, which are function types that translate between &[u8] and T, or a type the user can convert to T.
    In the case of Serde, these two functions would be serde_json::to_vec and serde_json::from_slice. In the simpler case of strings, they would be just as_ref and str::from_utf8. (See item 2 above about GV and PV.)
  4. The “full” constructor would accept four functions in addition to its current parameters: a serializer for type K, a deserializer for type K (which would actually return Option<K>), a serializer that accepts PV, and a deserializer that generates GV.
    While this is the extra complexity I thought I was moving around, I am now thinking that the vast majority of use cases can be covered with at most three or four “convenience” constructors.
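
As a first rough sketch of the four pieces above (not the final design): TypedStore and its strings constructor are placeholder names, the serializers return an owned Vec<u8> to keep lifetimes out of the picture, and a HashMap stands in for the real database handle.

```rust
use std::collections::HashMap;
use std::str::Utf8Error;

/// Placeholder typed view: K is the key, PV the "put" value, GV the "get" value.
struct TypedStore<K, PV, GV> {
    ser_key: fn(&K) -> Vec<u8>,
    de_key: fn(&[u8]) -> Option<K>,
    ser_value: fn(&PV) -> Vec<u8>,
    de_value: fn(&[u8]) -> GV,
    // Stand-in for the real byte-oriented database handle.
    blobs: HashMap<Vec<u8>, Vec<u8>>,
}

impl<K, PV, GV> TypedStore<K, PV, GV> {
    /// The "full" constructor: four conversion functions.
    fn new(
        ser_key: fn(&K) -> Vec<u8>,
        de_key: fn(&[u8]) -> Option<K>,
        ser_value: fn(&PV) -> Vec<u8>,
        de_value: fn(&[u8]) -> GV,
    ) -> Self {
        TypedStore { ser_key, de_key, ser_value, de_value, blobs: HashMap::new() }
    }

    fn put(&mut self, key: &K, value: &PV) {
        self.blobs.insert((self.ser_key)(key), (self.ser_value)(value));
    }

    fn get(&self, key: &K) -> Option<GV> {
        self.blobs
            .get(&(self.ser_key)(key))
            .map(|b| (self.de_value)(b.as_slice()))
    }

    /// Keys that fail to deserialize are skipped.
    fn keys(&self) -> Vec<K> {
        self.blobs
            .keys()
            .filter_map(|b| (self.de_key)(b.as_slice()))
            .collect()
    }
}

/// One "convenience" constructor: keys and values are UTF-8 strings.
impl TypedStore<String, String, Result<String, Utf8Error>> {
    fn strings() -> Self {
        fn ser(s: &String) -> Vec<u8> {
            s.as_bytes().to_vec()
        }
        fn de_key(b: &[u8]) -> Option<String> {
            std::str::from_utf8(b).ok().map(String::from)
        }
        fn de_value(b: &[u8]) -> Result<String, Utf8Error> {
            std::str::from_utf8(b).map(String::from)
        }
        Self::new(ser, de_key, ser, de_value)
    }
}

fn main() {
    let mut db = TypedStore::strings();
    let key = "Mr. Rogers".to_string();
    db.put(&key, &"Won't you be my neighbor?".to_string());

    assert_eq!(db.get(&key), Some(Ok("Won't you be my neighbor?".to_string())));
    assert_eq!(db.keys(), vec![key]);
}
```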

I’ll have to see what the code looks like to see if it’s really that ugly, but it certainly looks like a promising approach at this time.

Thanks again, and if you have any further comments or simplifications, please reply.


#5

Just to close the loop (and perhaps help others with design problems in the future), I think it looks good.

I made a separate “type helper” struct that wraps a “core” database file handle that only uses byte slices. The definitions are really ugly:

pub type SerializeFn<'a, T: 'a> = dyn Fn(T) -> &'a [u8];
pub type DeserializeFn<'a, T: 'a, E: 'a> = dyn Fn(OpaqueDatumBox) -> Result<T, E>;
pub struct DbStore<'a, K: 'a, KD: 'a, KE: 'a, V: 'a, VD: 'a, VE: 'a, DBE: 'a> {
    serialize_key: &'a SerializeFn<'a, K>,
    deserialize_key: &'a DeserializeFn<'a, KD, KE>,
    serialize_value: &'a SerializeFn<'a, V>,
    deserialize_value: &'a DeserializeFn<'a, VD, VE>,
    db: &'a mut DbFile<DBE>,
}

impl<'a, K: 'a, KD: 'a, KE: 'a, V: 'a, VD: 'a, VE: 'a, DBE: 'a> DbStore<'a, K, KD, KE, V, VD, VE, DBE> {
    fn get(&self, key: K) -> Result<Option<Result<VD, VE>>, DBE> {
        // Serialize the key, then deserialize the value only if one was found.
        let key_slice = (self.serialize_key)(key);
        self.db
            .get(key_slice)
            .map(|val_option| val_option.map(|val_slice| (self.deserialize_value)(val_slice)))
    }
    // .. other functions look similar ...
}

But the actual implementations are not half bad:

///
/// A ``DbStore`` view where keys and values are UTF-8 strings.
///
pub type DbStoreStrings<'a, E> = DbStore<
    'a,
    &'a str,
    String,
    str::Utf8Error,
    &'a str,
    String,
    str::Utf8Error,
    E
>;

fn deserialize_string(b: OpaqueDatumBox) -> Result<String, str::Utf8Error> {
    // Validate the bytes as UTF-8, then copy them into an owned String.
    str::from_utf8(&*b).map(String::from)
}

impl<'a, E> DbStoreStrings<'a, E> {
    ///
    /// Creates a new view over the passed ``DbFile``.
    pub fn new(db: &'a mut DbFile<E>) -> DbStoreStrings<'a, E> {
        DbStoreStrings {
            serialize_key: &str::as_bytes,
            deserialize_key: &deserialize_string,
            serialize_value: &str::as_bytes,
            deserialize_value: &deserialize_string,
            db,
        }
    }
}

And the code to use them is even easier (just call new on the right type, and be done with it).

Thanks again!