Replacing a hash of (mostly) ZSTs with a compile time constant


#1

I’ve been working on adding prepared statement caching to Diesel, and had an interesting idea. The initial implementation was similar to how we do it in Rails, where we construct the SQL string, and then hash that to determine a unique prepared statement name. However, the structure in Diesel can likely eliminate this cost entirely, as our queries tend to have unique types.

Our AST is primarily composed of zero-sized types. Every column and table gets a unique type to represent it, such as users::id. Most of our AST nodes are entirely generic, and sized based on their fields, such as And<Lhs, Rhs>. As such, a query like users.left_outer_joins(posts).filter(users::name.eq(posts::author_name)) would still have a size of 0, yet be uniquely identifiable by its type.
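To make the size claim concrete, here is a minimal sketch. The names UsersName, PostsAuthorName, Equals and Predicate are hypothetical stand-ins for illustration, not Diesel's actual types. It only shows that generic nodes built from unit structs stay zero sized, and that a node carrying real data does not.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for generated column types such as `users::name`.
pub struct UsersName;
pub struct PostsAuthorName;

// Generic AST nodes that store nothing except their children.
pub struct Equals<Lhs, Rhs> {
    pub lhs: Lhs,
    pub rhs: Rhs,
}

pub struct And<Lhs, Rhs> {
    pub lhs: Lhs,
    pub rhs: Rhs,
}

// A nested predicate built only from zero-sized pieces.
pub type Predicate =
    And<Equals<UsersName, PostsAuthorName>, Equals<PostsAuthorName, UsersName>>;

pub fn predicate_size() -> usize {
    size_of::<Predicate>()
}

fn main() {
    // Every field is zero sized, so the whole tree is too.
    assert_eq!(predicate_size(), 0);
    // A node that carries a value takes on the size of that value.
    assert_eq!(size_of::<Equals<UsersName, i32>>(), 4);
}
```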

I had originally thought that we could do this with TypeId::of, but that function requires its type parameter to be 'static. We have only one node where that isn’t true, which is Bound<T, U>. For Bound, T represents the SQL type (always zero sized), and U is the data being serialized. U is allowed to be a reference, so I can’t guarantee 'static. Even if we removed the 'static bound from TypeId::of, presumably &'a i32 and &'b i32 would be considered different types (if this is incorrect, please let me know, as TypeId::of would probably work in that case).
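For anyone who hasn't hit the 'static requirement before, a small sketch of it follows. The helper same_type is made up for illustration. The commented-out function is the shape that gets rejected, since the reference's lifetime is not 'static.

```rust
use std::any::TypeId;

// `TypeId::of` is declared as `fn of<T: ?Sized + 'static>() -> TypeId`,
// so both parameters here have to carry the same bound.
pub fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

// This does not compile, because `&'a i32` is not `'static`
// for an arbitrary lifetime `'a`:
//
// fn id_of_ref<'a>(_value: &'a i32) -> TypeId {
//     TypeId::of::<&'a i32>()
// }

fn main() {
    assert!(same_type::<String, String>());
    // Different types get different ids, even when both are string-like.
    assert!(!same_type::<&'static str, String>());
}
```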

What’s especially interesting about this case is that for Bound<T, U> I would actually prefer to eliminate U entirely. I don’t care whether it’s Bound<VarChar, &str> or Bound<VarChar, String>, as it doesn’t affect the query as a whole. That said, I think having two different statements for those two types is an acceptable cost, as long as lifetimes don’t result in unbounded growth in the number of prepared statements.

I’ve been trying to think about ways to solve this, but without the ability to effectively control the return value for TypeId::of for that specific type, I’m a bit at a loss. So I thought I’d reach out to see if there were any ideas.

Thanks for taking the time to read through this.


#2

Instead of using TypeId::of directly, you could write a new UniqueId trait that does basically the same thing. The implementation for all of the types save Bound could just forward to TypeId::of, whilst Bound uses TypeId::of::<Bound<T, ()>> or something.

At which point, you cross your fingers and hope the optimiser isn’t feeling lazy today. :)


#3

How would that work for something like And<Lhs, Rhs> though? I could delegate to UniqueId for both sides, but I need a way to uniquely combine them.


#4

use std::any::{Any, TypeId};
use std::marker::PhantomData;

pub struct Term;
pub struct And<T, U>(PhantomData<(T, U)>);
pub struct Bound<T, U>(PhantomData<(T, U)>);

pub trait UniqueId {
    type Id: Any;
}

pub fn unique_id_of<T: UniqueId>() -> TypeId {
    TypeId::of::<T::Id>()
}

impl UniqueId for Term {
    type Id = Self;
}
impl<T, U> UniqueId for And<T, U> where T: UniqueId, U: UniqueId {
    type Id = And<T::Id, U::Id>;
}
impl<T, U> UniqueId for Bound<T, U> where T: UniqueId {
    type Id = Bound<T::Id, ()>;
}

fn main() {
    println!("id: {:?}", unique_id_of::<And<Bound<Term, String>, Term>>());
    println!("id: {:?}", unique_id_of::<And<Term, Bound<Term, String>>>());
    println!("id: {:?}", unique_id_of::<And<Term, Bound<Term, &'static str>>>());
}
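
As a quick check of the lifetime concern from the first post, the sketch below reuses the same trait and adds two hypothetical helpers, id_for_borrowed and id_for_owned. Because the impl for Bound places no constraint on U, a borrowed value with an arbitrary lifetime should map to the same id as an owned one.

```rust
use std::any::{Any, TypeId};
use std::marker::PhantomData;

pub struct Term;
pub struct Bound<T, U>(PhantomData<(T, U)>);

pub trait UniqueId {
    type Id: Any;
}

pub fn unique_id_of<T: UniqueId>() -> TypeId {
    TypeId::of::<T::Id>()
}

impl UniqueId for Term {
    type Id = Self;
}

// `U` is unconstrained and erased to `()`, so it never needs to be 'static.
impl<T: UniqueId, U> UniqueId for Bound<T, U> {
    type Id = Bound<T::Id, ()>;
}

// Hypothetical helper: `'a` is an arbitrary, non-'static lifetime.
pub fn id_for_borrowed<'a>(_value: &'a str) -> TypeId {
    unique_id_of::<Bound<Term, &'a str>>()
}

// Hypothetical helper: the same node holding an owned value.
pub fn id_for_owned() -> TypeId {
    unique_id_of::<Bound<Term, String>>()
}

fn main() {
    let local = String::from("sean");
    // The borrowed and owned forms collapse to a single id.
    assert_eq!(id_for_borrowed(&local), id_for_owned());
}
```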

#5

Hm, this is interesting. I’m mildly concerned about the burden this would push on third party crates adding new expressions, but this certainly seems to be the most plausible solution. I will explore this further.


#6

I suppose that eventually, with specialization, I can fall back to the less performant “hash the SQL query” form when UniqueId isn’t implemented, rather than requiring it. Something like:

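// Requires nightly Rust with #![feature(specialization)].
use std::any::TypeId;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub enum PreparedStatementLookup {
    TypeId(TypeId),
    HashedQuery(u64),
}

pub trait PreparableStatement: QueryFragment {
    fn prepared_statement_key(&self) -> PreparedStatementLookup;
}

// Fallback for any query fragment: hash the generated SQL.
impl<T: QueryFragment> PreparableStatement for T {
    default fn prepared_statement_key(&self) -> PreparedStatementLookup {
        let query = somehow_create_query_without_db_backend_specifics(self);
        let mut hasher = DefaultHasher::new();
        query.hash(&mut hasher);
        PreparedStatementLookup::HashedQuery(hasher.finish())
    }
}

// Specialized case: the type alone identifies the query, so nothing is hashed.
// The QueryFragment bound is repeated so this impl is strictly more specific.
impl<T: QueryFragment + UniqueId> PreparableStatement for T {
    fn prepared_statement_key(&self) -> PreparedStatementLookup {
        PreparedStatementLookup::TypeId(unique_id_of::<T>())
    }
}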
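
For completeness, a rough sketch of how such a key might be consumed on stable Rust. StatementCache, name_for and the "__diesel_stmt_" prefix are all hypothetical names for illustration, not anything that exists in Diesel. The only change to the enum is the derives it needs to serve as a HashMap key.

```rust
use std::any::TypeId;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum PreparedStatementLookup {
    TypeId(TypeId),
    HashedQuery(u64),
}

/// Hypothetical cache mapping a lookup key to a prepared statement name.
#[derive(Default)]
pub struct StatementCache {
    names: HashMap<PreparedStatementLookup, String>,
}

impl StatementCache {
    /// Returns the cached name for `key`, generating a fresh one on first use.
    pub fn name_for(&mut self, key: PreparedStatementLookup) -> &str {
        let next = self.names.len();
        self.names
            .entry(key)
            // The naming scheme is an assumption made up for this sketch.
            .or_insert_with(|| format!("__diesel_stmt_{}", next))
            .as_str()
    }

    /// Number of distinct statements seen so far.
    pub fn len(&self) -> usize {
        self.names.len()
    }
}

fn main() {
    let mut cache = StatementCache::default();
    let by_type = PreparedStatementLookup::TypeId(TypeId::of::<u8>());

    // The same key always resolves to the same statement name.
    let first = cache.name_for(by_type).to_owned();
    let second = cache.name_for(by_type).to_owned();
    assert_eq!(first, second);

    // A hashed-query key lives in the same map as a type-based key.
    cache.name_for(PreparedStatementLookup::HashedQuery(42));
    assert_eq!(cache.len(), 2);
}
```

Both key kinds share one map, so the type-based fast path and the hashing fallback don't need separate caches.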