Review: Relational algebra library overview (MS thesis chapter)

First, I want to thank everyone who read and gave feedback on my previous post. It's been extremely helpful.

This time, I have the first draft of a different section that I'd like feedback on. The intent of this section is to give a high-level ("guide-level" in Rust terms) overview of the Rust library which forms the core part of my thesis. I suspect that there may be some pretty large omissions in this section-- I'm intimately familiar with the whole project, which puts me in a poor position for figuring out what is and isn't obvious.

Edit: There are a few references in here to Tylisp, which is described in detail in the previous chapter. It’s a domain-specific language which describes type-level functions; a Tylisp function takes Ryst types as arguments and then returns some other Rust type as its output. It operates entirely at compile-time, and never touches any runtime values.

The remaining sections in this chapter contain reference-level information about everything mentioned here; I have deliberately omitted them from this document -- I want to find out if the overview is understandable in isolation, without referring to the details later in the paper.

NB: This still needs a formatting pass to do things like put inline code in an appropriate font.


First pass

I hit a bit of a wall on page 63:

To associate this plan with Unique, we need to define the UniquePlanner tylisp function.
[Dense macro to presumably define UniquePlanner followed by a couple non-macro impls that don't (directly) involve UniquePlanner]

A quick google search for "tylisp" brings up your crate, so presumably this is a system that's been covered in the preceding chapter(s).

Outside of that, nothing major jumped out at me. However, I was primarily just getting the gist of things as I read along, and mostly skimmed the code blocks. The summary (section 4.2) really solidified what I had come around to as a read. This slowness on my part is at least partially, and perhaps mostly, because I haven't thought about things in relational algebra terms for some time. Now that I'm in the right head space, I'll give it a...

Second pass

I ramble a bit

4.1.1 Data Model

Now that I'm in the right head space, this all makes perfect sense.

4.1.2 For Application Programmers

If I read this as a "how to use this" guide, everything reads well. From a Rust perspective there's a lot of wondering what's going on under the hood. :slight_smile:

Maybe the previous section would benefit from highlighting that records are usually tuples of column values (or if this isn't in fact typical, perhaps this section and the next should highlight that in these examples, the records are tuples of column values).

4.1.3 For Architects

You sort of skip strait from columns to relations, with this note:

The most convenient way to define a schema is to define a singleton structure which contains all of the schema’s relations as fields.

Looking at the code, I conclude that the records are (PartId, PartName) -- let's call this Tuple -- and that the relation is a Vec<Tuple>. The rest of the section then flows well in terms of getting the point of the data structures. These statements did throw me off for awhile though:

A BTreeIndex<K,R>, for example, contains several instances of the relation type R, one for each unique value of the column K.
Unlike BTreeIndex, RedundantIndex contains only a single instance of its declared subrelation.

In BTreeIndex, R is Option<Tuple>, so for awhile in my head I stumbled with "they really mean the record type" and "that [Option] is not the relation [Vec], though". Then with RedundantIndex (which also has a Option<Tuple> in the signature) I struggled with how this could only hold a single instance in some sense.

After reading back and forth a couple times, I realized that the (sub)relation in question was changing each time, and that the (not shown) struct of RedundantIndex contains a single BTreeIndex.

If I'd read more carefully at first to map R to the generic parameter, I probably wouldn't have stumbled so bad, so it's somewhat on me. But perhaps the quote below could be reworded, as the parts relation at that point (Vec<Tuple>) isn't present in the other (indexed) data structures.

Correcting these deficiencies requires adding an index to the parts relation.

For Library Authors

This is very dense, but that's not a fault, it's just a dense subject. The way a few of the pieces fit together are still not clear to me, but I'm not sure if those pieces being clear is a goal of this section or not. In case it is:

A lot of the associated types and trait bounds only appear in this section of the excerpt (FastCols, QueryRequest). Most of them are pretty intuitive. Some of them don't have entirely clear origins or relationships, though. ExternalRecord is one example.

The mechanics of the tylisp function are still a mystery to me. Here's roughly where I got to. I make an iterator UniquePlan, I implement QueryPlanImpl for it, and now I can get a UniquePlan out of a Unique and an appropriate query Q. Is Unique is now Queryable<Q> via blanket impl?

Based on the definition of prepare here, the way we query a relation is through the query() method. Perhaps this method has a default definition, which uses Self::Planner? E.g. in this case, Unique::query uses UniquePlanner to create a QueryPlan, on which we call prepare()?

In summary of this section, I think it's pretty analogous to the "for application programmers" section. It seems pretty complete as a reference or example on how to write an adapter, but leaves out a lot of details of the machinery going on. I suspect it's perfectly fine for what are probably the goals of the excerpt. Perhaps some notes (reminders?) of how things tie together are possible, e.g. what makes Unique Queryable.

I still have the impression that the tylist concept and macro were covered in a section prior to this excerpt. I think it would have been useful as a reviewer to have that included in this excerpt.

In summary

I wrote a lot of text but, if I understand the purpose of these sections, there are only a couple minor suggestions. The rest is me trying to suss out the larger picture and technical details which probably aren't germane. Let me know if you want me to boil all this down to a short bullet list of minor suggestions.

Knowing what the tylist function/macro is about would have made reviewing smoother. I assume it's covered earlier in your thesis.

Misc. Typos

  • Page 61: quert in "direct lookup for any quert"
  • Page 61: extra comma in "continuous, memory allocation"

Thanks; this is really helpful. Inline responses below.

Sorry about that; I should have mentioned that in my initial post. Tylisp is the main subject of the previous chapter; it’s a domain-specific language for expressing type-level functions and complex type bounds.

This is essentially all you should need to know to understand what’s going on (though actual implementations might have more complicated functions). A tylisp function takes Rust types as arguments and returns a Rust type as output. In this case, it’s taking the relation and query types as input and producing some type that implements QueryPlanImpl. The library will then use QueryPlanImpl::prepare() to construct an instance of that object to actually perform the query and IntoIterator to step through the results (implemented in this case by implementing Iterator directly).

I’ll clean this language up and add it the section; it doesn’t hurt to remind the reader what the main point of the previous chapter was, and how it fits into the bigger picture.

That’s a good piont. Records can have a pretty arbitrary shape, but flat tuples are the most convenient for examples. “Usually” may be too strong, but flat tuples are extremely common, especially in the main definitions. The more exotic types are usually generated as a view output or similar.

This is definitely worth calling out explicitly in the text.

I make heavy use of blanket impls to verify that several pieces agree with each other, and Queryable is a big one. I should probably call those out here: The rest of the chapter talks about each trait/struct individually, but this section needs to provide the big picture of how it all fits together.

This is basically correct; I should say this explicitly somewhere. Note that prepare() is an associated function: it takes no Self argument. What Planner returns is a type which implements QueryPlan, and then prepare() is called to construct an instance.

To use an analogy, this chapter is intended to be the picture on a jigsaw puzzle box. The rest of the chapter are the individual pieces: lots of detail, but a very narrow focus. As the reader is trying to piece together an understanding of the entire system, this is supposed to give some idea of what role the pieces that they don’t fully understand yet serve.

1 Like

Here's what I added to the top of the "For Library Authors" section to describe the overall query process:

All of Memquery’s machinery for executing queries is contained in a blanket implementation of the Queryable<'a, Q> trait (§4.2.5). Library authors need to be familiar with how Queryable works in order to make their extension types interact with the rest of the Memquery ecosystem. Also, unlike application programmers, library authors will usually call Queryable::query() directly instead of constructing adapters when they need to retrieve records from a relation.

Every query request is represented by a type that implements the QueryRequest trait (§4.6.1), which specifies a set of selection filters and a presentation order for records. Given a query request q of type Q , Queryable<'a, Q>::query(&'a self, q) performs a three-step process:

  1. Execute the tylisp expression {<Self as RelationImpl>::Planner, &'a Self, Q} to obtain a query plan type P .
  2. Construct an instance p of type P by calling <P as QueryPlanImpl>::prepare(&'a self, q)
  3. Obtain an iterator of result records by calling <P as IntoIterator>::into_iter(p)

Queryable<'a, Q> is automatically implemented for all types where this process produces an iterator that yields items of type <Self as QueryOutput<'a>>::QueryRow. The primary task of a library author, then, is to provide implementations of these traits such that their extension type implements Queryable for every possible QueryRequest type Q.

1 Like

I think that helps a lot (especially without the context of the preceding chapters).

This is pretty interesting generally, and was also key to wrapping my head around it.

1 Like