I'm developing (in my spare spare time) a little crate called polygraph
that creates data types for multiple interconnected graph structures. This was somewhat motivated by my development of a build system fac, which needs to track relationships between files (both input, output, and both) and build rules. It's got two kinds of node and multiple kinds of relationships, so you can't easily use something like petgraph.
I'd appreciate suggestions on the API for querying my graph data structure (see below after I've explained an example graph).
Anyhow, the basic idea of polygraph is to use a macro to declare both a set of interrelated data types, and a graph type (or in-memory database type) that will hold all the data, including relationships between the objects. This system will store both forward and reverse linkages, so iterating over links in any direction will cost O(1) per element returned. Accessing any data requires a reference to the graph, and a Key<T>
which is a Copy
type that internally is just an index into an array, but its type indicates which array it is an index into. Conceptually I think of a Key
as like a kind of pointer, but it requires a reference to the graph in order to be dereferenced. If this were C or C++, I probably would have used a pointer for this (and had all the accompanying unsafety). Requiring that reference, however, makes exclusive access via &mut
very clear, so I think it's a good thing, it's just a little hard to figure out how best to arrange that access.
Here is an example (see also the documentation) which generates a family tree data structure with pets (well, only dogs).
polygraph_macro::schema! {
/// The database (or graph) itself will be a type called FamilyTree.
type FamilyTree;
/// By defining a `Surname` type, we can query which `Person` have a
/// given surname.
pub struct Surname(pub String);
pub struct Person {
pub last_name: Key<Surname>,
/// `father` is a one-to-many relationship
pub father: Option<Key<Person>>,
/// `mother` is a one-to-many relationship
pub mother: Option<Key<Person>>,
/// We use a raw String for given name, since we don't care to
/// iterate over all "Davids". Otherwise we'd do the same as with [`Surname`].
pub name: String,
/// `dog` is a many-to-many relationship
pub dog: KeySet<Dog>,
}
pub struct Dog {
pub name: String,
}
}
I can construct a simple graph with the following:
let mut db = Tree::new();
let mickey = db.insert_dog(Dog {
name: "Mickey".to_string(),
});
let minnie = db.insert_dog(Dog {
name: "Minnie".to_string(),
});
let my_dogs: tinyset::Set64<_> = [mickey, minnie].iter().cloned().collect();
let roundy = db.insert_surname(Surname("Roundy".to_string()));
let maiden_name = db.insert_surname(Surname("Maiden".to_string()));
let me = db.insert_person(Person {
last_name: roundy,
father: None,
mother: None,
name: "David".to_string(),
dog: my_dogs.clone(),
});
let wife = db.insert_person(Person {
last_name: maiden_name,
father: None,
mother: None,
name: "Monica".to_string(),
dog: my_dogs.clone(),
});
let kid = db.insert_person(Person {
last_name: roundy,
father: Some(me),
mother: Some(wife),
name: "Kid".to_string(),
dog: my_dogs.clone(),
});
What I'd like help with is in writing a query to this graph. I've implemented a couple of APIs, but I'm not really happy with any of them.
The first option would be to simply index the db
using a key in order to query that element. When querying one deep this looks nice:
assert!(db[me].father_of.contains(kid));
However, when you start wanting to query something that requires two accesses to the database (like the text of my last name) it gets harder to read because the db
access is nested rather than sequential:
assert_eq!(db[db[me].last_name].0, "Roundy");
Or as a third possible query, you could check if two people have at least one dog in common:
assert_eq!(db[me].dog.iter().any(|dog| db[dog].dog_of.contains(kid));
or I suppose we could have looked at the intersection of my dogs with those of my kid. This doesn't seem too bad, but I really dislike the nested square brackets on the last_name assertion above.
A second option would be to create a method on Key
that takes a &FamilyTree
as an argument, so that chaining works. For now I'll call this method d(&db)
, but giving it a decent name (short, but clear) is one thing I'd like advice on, if this seems the best way. The three queries above become:
assert!(me.d(&db).father_of.contains(kid));
assert_eq!(me.d(&db).last_name.d(&db).0, "Roundy");
assert_eq!(me.d(&db).dog.iter().any(|dog| dog.(&db).dog_of.contains(kid));
I like that you can read this from left to right in a more natural way (for those of us accustomed to methods, anyhow), but I don't like my name d
for the method, and am having trouble inventing a short name that is decent.
Other options that I have considered and rejected (but could reconsider) are:
-
To have a global variable for the graph data structure, and implement
Deref
onKey
. This would be extremely ergonomic, but would preclude having two different graphs, which could be a problem. It would also make any sort of mutation challenging, since we wouldn't every be able to ensure there is exclusive access. If we used some sort of a lock-free data structure we might be able to mitigate this, but on the whole it doesn't seem like a great idea (unless one of you has a brilliant suggestion to revive it). -
Embed a reference to the
FamilyTree
in eachKey
and then implementDeref
. This gives the same ergonomics benefit, but now we can't hang onto aKey
without preventing any mutation of the graph, which seems like a pain. Also, keys become larger, and are presumably no longerCopy
? -
Perhaps a hybrid system where you have both
Key<T>
andKeyRef<'a,T>
with the latter like #2 above, holding a reference to the graph and implementingDeref
, and a method to convert aKey
into aKeyRef
(which could either be indexing or a method that would look like my.d(&db)
above).
As I said, I'm not satisfied by any of my ideas, and am hoping that y'all can come up with either something better, or a fix for one of the issues I've identified. Or at least you can tell me that you like one of these ideas, and I can move forward with fewer doubts that I'm making something that will be incomprehensible to use.