Request for feedback on database API design

droundy · September 29, 2019, 4:09pm

I've been thinking about writing a "simple" NoSQL, strongly typed relational database, and I find myself tempted to make it use a global variable. That of course horrifies me, but I can't see a cleaner approach. So I'd like to talk through my reasoning here and see if y'all have a better idea.

By a relational database, I mean that data can refer to other data, including in cycles. By strongly typed, I mean that users will specify their "tables" as Rust types, e.g.:

struct Person {
  name: String,
  birth_year: i64,
  mother: Option<Key<Person>>,
}

To make the database relational, when a datum is inserted into the database, a Key<T> will be created, which is the primary key, and functions like a pointer to the datum. In the above example, the mother field could hold a key to the mother (if mother is in the database).

I would prevent the use of invalid keys by not making public any constructors that would allow the creation of an invalid Key<T>. This would avoid numerous possible runtime errors and bugs. There is a catch, though: what if a user creates two databases and tries to use a key from one database in the other? All those runtime errors become possible again.
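
To make the private-constructor idea concrete, here is a minimal sketch. The module name `db`, the `Table<T>` type and its methods are hypothetical, chosen only for illustration, and the storage is a plain in-memory Vec rather than a real database.

```rust
mod db {
    use std::marker::PhantomData;

    /// A typed primary key. Its fields are private, so code outside this
    /// module cannot build a `Key` by hand.
    pub struct Key<T> {
        index: usize,
        _marker: PhantomData<fn() -> T>,
    }

    // Manual impls so that `Key<T>` is `Copy` even when `T` is not.
    impl<T> Clone for Key<T> {
        fn clone(&self) -> Self {
            *self
        }
    }
    impl<T> Copy for Key<T> {}

    /// One "table" of `T` values (hypothetical name).
    pub struct Table<T> {
        rows: Vec<T>,
    }

    impl<T> Table<T> {
        pub fn new() -> Self {
            Table { rows: Vec::new() }
        }

        /// Inserting a value is the only way to obtain a key.
        pub fn insert(&mut self, value: T) -> Key<T> {
            self.rows.push(value);
            Key { index: self.rows.len() - 1, _marker: PhantomData }
        }

        /// With a single table every key is valid. With two tables, a key
        /// from the other one may panic here or silently return the wrong row.
        pub fn get(&self, key: Key<T>) -> &T {
            &self.rows[key.index]
        }
    }
}

use db::{Key, Table};

pub struct Person {
    pub name: String,
    pub birth_year: i64,
    pub mother: Option<Key<Person>>,
}
```

The comment on `get` is exactly the catch described above: nothing in the type `Key<Person>` records which table produced it.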

So how to handle this?

Plan A: There Can Be Only One

Put the database in a global variable. Don't let users close it and open another.

This solves all the consistency issues (apart from a corrupt database, of course) and allows panic-free operation (provided users don't do stupid unsafe things). It also means that we could look up key data more simply (e.g. implement Deref for Key<T> even if it is Copy).
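
A rough sketch of what Plan A could look like, under several simplifying assumptions: a single non-generic table, a static Mutex<Vec<Person>> as the "database", and a `get` method that returns a clone instead of implementing Deref. The names `PEOPLE`, `insert` and `get` are made up for this example.

```rust
use std::sync::Mutex;

#[derive(Clone, Debug, PartialEq)]
pub struct Person {
    pub name: String,
    pub birth_year: i64,
    pub mother: Option<Key>,
}

/// The one and only database: a process-wide table of people.
static PEOPLE: Mutex<Vec<Person>> = Mutex::new(Vec::new());

/// A lightweight, `Copy` key. In a real crate the field would be private
/// to the library, so users could only get keys from `insert`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Key {
    index: usize,
}

/// Insert into the global table and hand back the new key.
pub fn insert(person: Person) -> Key {
    let mut people = PEOPLE.lock().unwrap();
    people.push(person);
    Key { index: people.len() - 1 }
}

impl Key {
    /// Look the row up without naming a database: there is only one.
    /// Rows are never removed in this sketch, so the index is always valid.
    pub fn get(self) -> Person {
        let people = PEOPLE.lock().unwrap();
        people[self.index].clone()
    }
}
```

Because there is only one table, a key can never be presented to the wrong database, and lookups need no database argument.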

Plan B: Trust/blame the User

Document that you run into trouble if you mix up the keys for two different databases, and leave it at that. This requires a panic if the user is naughty, and that panic could happen deep in the code. Also, looking up data from keys requires the user to say which database the key is for. Clumsy and panic-ridden doesn't make this sound appealing.

Plan C: Heavyweight Keys

Store a reference (presumably Arc<Mutex<...>>) to the database in the keys themselves. When using keys, check that they are for the right database. We then have panics at the leaves of the API. Deref becomes a possibility, but not a great idea if it might lead to deadlock. But some sort of nice API is probably possible. It does mean keys can't be Copy, so the API cannot be as pretty as for Plan A.
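
A minimal sketch of Plan C, assuming a single in-memory table per database. The names `Inner`, `get` and `read` are hypothetical, and this version returns None for a foreign key instead of panicking, which is one possible choice rather than the one described above.

```rust
use std::sync::{Arc, Mutex};

/// Shared storage for one database (a single table of `T` in this sketch).
struct Inner<T> {
    rows: Vec<T>,
}

/// Handle to a database.
pub struct Database<T> {
    inner: Arc<Mutex<Inner<T>>>,
}

/// A key that remembers which database it belongs to.
/// It owns an `Arc`, so it can be `Clone` but never `Copy`.
pub struct Key<T> {
    db: Arc<Mutex<Inner<T>>>,
    index: usize,
}

impl<T: Clone> Database<T> {
    pub fn new() -> Self {
        Database { inner: Arc::new(Mutex::new(Inner { rows: Vec::new() })) }
    }

    pub fn insert(&self, value: T) -> Key<T> {
        let mut inner = self.inner.lock().unwrap();
        inner.rows.push(value);
        Key { db: Arc::clone(&self.inner), index: inner.rows.len() - 1 }
    }

    /// Returns a copy of the row, or `None` if the key was issued by a
    /// different database (detected by comparing the `Arc` pointers).
    pub fn get(&self, key: &Key<T>) -> Option<T> {
        if !Arc::ptr_eq(&self.inner, &key.db) {
            return None;
        }
        let inner = self.inner.lock().unwrap();
        inner.rows.get(key.index).cloned()
    }
}

impl<T: Clone> Key<T> {
    /// Look the row up through the key itself, with no database argument.
    pub fn read(&self) -> T {
        let inner = self.db.lock().unwrap();
        inner.rows[self.index].clone()
    }
}
```

Returning a clone rather than a reference keeps the lock from escaping the call, which sidesteps the deadlock concern with Deref at the cost of a copy.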

Plan D: Wish for Witness Types

If I were writing in Haskell, I'd use existential witness types. Basically, the function that opens a database could create a brand new type, which would ensure that any keys used for that database have a different type than keys for any other database. So far as I know, Rust does not have this capability.
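
For comparison, one Rust idiom that approximates a witness type is to "brand" values with an invariant lifetime that is chosen inside a closure-taking function. The sketch below is only an illustration of that idiom, not something proposed in this post; `Brand`, `with_database` and the other names are hypothetical, and the closure-based API is noticeably less convenient than an ordinary constructor.

```rust
use std::marker::PhantomData;

/// An invariant lifetime marker used as a per-database "brand".
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

pub struct Database<'id, T> {
    rows: Vec<T>,
    _brand: Brand<'id>,
}

/// A key carries the brand of the database that issued it.
#[derive(Clone, Copy)]
pub struct Key<'id> {
    index: usize,
    _brand: Brand<'id>,
}

/// Runs `f` with a fresh database. The closure must accept every possible
/// `'id`, so it cannot assume two databases share the same brand.
pub fn with_database<T, R, F>(f: F) -> R
where
    F: for<'id> FnOnce(Database<'id, T>) -> R,
{
    f(Database { rows: Vec::new(), _brand: PhantomData })
}

impl<'id, T> Database<'id, T> {
    pub fn insert(&mut self, value: T) -> Key<'id> {
        self.rows.push(value);
        Key { index: self.rows.len() - 1, _brand: PhantomData }
    }

    /// Only keys with this database's brand type-check here.
    pub fn get(&self, key: Key<'id>) -> &T {
        &self.rows[key.index]
    }
}
```

With two nested `with_database` calls, passing a key from the outer database to the inner one's `get` is rejected by the borrow checker, because the two `'id` lifetimes cannot be unified. Keys also cannot leave the closure, since the return type `R` cannot mention `'id`.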

Conclusion

On the whole, I lean towards the single global database option. But I'm also uncomfortable with it. Any suggestions?

leudz · September 29, 2019, 4:47pm

The slotmap crate solved this issue by allowing a custom key type. This seems very close to Plan D.

droundy · September 29, 2019, 5:07pm

Yes, that does sound similar and is indeed trying to solve the same problem. Their solution, however, is sort of halfway between Plan B and Plan D. It allows a user to protect themselves, but does nothing to protect the library from a clueless user, so it still falls in the "blame the user" category, and doesn't allow the library to assume in any unsafe code that invariants are held.

leudz · September 29, 2019, 5:26pm

You could force users to provide a different type for every database, but the API will suffer a bit. Something like this on the user side:

#[derive(Key)]
struct MyKey;

Database::new::<MyKey>();
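
On the library side, that idea might look roughly like the sketch below. It is only an assumption about the shape of such an API: the trait `KeyMarker` stands in for whatever the derive macro would implement, the marker types are implemented by hand, and the constructor is spelled Database::<K, T>::new() rather than Database::new::<K>().

```rust
use std::marker::PhantomData;

/// Marker trait for user-defined key types (hypothetical name).
pub trait KeyMarker: 'static {}

/// A database tagged with the user's marker type `K`.
pub struct Database<K: KeyMarker, T> {
    rows: Vec<T>,
    _marker: PhantomData<fn() -> K>,
}

/// Keys carry the same tag as the database that issued them.
pub struct Key<K: KeyMarker> {
    index: usize,
    _marker: PhantomData<fn() -> K>,
}

impl<K: KeyMarker> Clone for Key<K> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<K: KeyMarker> Copy for Key<K> {}

impl<K: KeyMarker, T> Database<K, T> {
    pub fn new() -> Self {
        Database { rows: Vec::new(), _marker: PhantomData }
    }

    pub fn insert(&mut self, value: T) -> Key<K> {
        self.rows.push(value);
        Key { index: self.rows.len() - 1, _marker: PhantomData }
    }

    /// Accepts only keys tagged with this database's marker type.
    pub fn get(&self, key: Key<K>) -> &T {
        &self.rows[key.index]
    }
}

// User side: one marker type per database.
pub struct PeopleDb;
impl KeyMarker for PeopleDb {}

pub struct PetsDb;
impl KeyMarker for PetsDb {}
```

A Key<PeopleDb> cannot be passed to a Database<PetsDb, _>, but nothing here stops a user from creating two Database<PeopleDb, _> values and mixing their keys.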

droundy · September 29, 2019, 8:23pm

I can see how you'd force the user to define one key, but I don't see how you're thinking to make them have one per database.

H2CO3 · September 29, 2019, 8:45pm

Couple of thoughts on this.

Since databases by definition manage data that is dynamic but at least decoupled/independent from precompiled programs, I don't believe it is practical or even possible to design a database without any sort of runtime error.

"This key does not belong in this database" is a quite niche kind of error anyway – in fact I don't recall ever running into it. And there are several other, much more frequent mistakes one can make to confuse the DB engine.

I'd advise you to focus on extensive (rather, complete) and ergonomic runtime error handling instead. You mentioned how compile-time checks would allow your code to be "panic-free". I'm a bit worried: are you planning to handle all or most logical errors in the data by panicking? I think you should probably treat such errors as non-fatal and use Results (as it is idiomatic in Rust) instead. Have a look at some popular database drivers (rusqlite comes to mind) – they almost always return Results from functions that execute DML instead of panicking.

I don't exactly see how this is true if you can differentiate between keys that belong in a given DB from those that do not. If you can say "this incoming key is in DB #1, but the silly user wants to use it for querying DB #2", then you could just use that bit of information embedded in the key to choose the correct database at runtime anyway.

In any case, I think having a pre-defined global variable with the single "connection" to the DB is the worst option of all. It's very ugly, not at all flexible, and the users pay all this price only to get rid of a specific, rare error, which could have been prevented by reasonable architecture of the user code (e.g. by means of the repository design pattern).

leudz · September 29, 2019, 9:37pm

I'd probably do something like this but not with a Mutex<Vec<TypeId>> (off the top of my head a lock-free linked list seems good enough) and properly handle the error.
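
As a rough illustration of the registry idea, the sketch below uses the simple Mutex<Vec<TypeId>> variant for brevity, not the lock-free list suggested above. The names `OPEN_KEY_TYPES`, `AlreadyOpen` and `open` are hypothetical, and the database itself holds no data here.

```rust
use std::any::TypeId;
use std::marker::PhantomData;
use std::sync::Mutex;

/// Key types that already have an open database in this process.
static OPEN_KEY_TYPES: Mutex<Vec<TypeId>> = Mutex::new(Vec::new());

/// Error returned when a key type is reused for a second database.
#[derive(Debug, PartialEq)]
pub struct AlreadyOpen;

/// A database identified by the user's key type `K`.
pub struct Database<K: 'static> {
    _marker: PhantomData<K>,
}

impl<K: 'static> Database<K> {
    /// Succeeds at most once per key type for the life of the process.
    /// There is deliberately no `Drop` impl that frees the type again,
    /// because old `Copy` keys could otherwise outlive their database.
    pub fn open() -> Result<Self, AlreadyOpen> {
        let mut open = OPEN_KEY_TYPES.lock().unwrap();
        let id = TypeId::of::<K>();
        if open.contains(&id) {
            return Err(AlreadyOpen);
        }
        open.push(id);
        Ok(Database { _marker: PhantomData })
    }
}

// User side: one marker type per database.
pub struct PeopleKey;
pub struct PetsKey;
```

The only runtime check is at open time; every later insert or lookup can rely on the key type identifying exactly one database.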

droundy · September 29, 2019, 10:49pm

I like that! It does have panicking (which is potentially annoying), but only on database opening, which is way nicer than having every insert or read as a potential wrong-database bug. I far prefer unreachable!() to an actual error that can be triggered by an unwary user.

I could imagine going one step further by removing the Database type altogether. I'm thinking something like this where the "key type" defines the database. But it's probably actually prettier to have a database type, so you don't have to keep turbofishing the key type for each insert.

I'm currently thinking something more like this.

droundy · September 29, 2019, 10:51pm

I agree that error handling is important, but I prefer that to be for errors, not bugs. For example, reading a database file can always fail in lots of ways. But a user of the library giving invalid input is a bug, and I prefer that bugs be caught at compile time rather than at run time.

H2CO3 · September 30, 2019, 6:01am

Well, it might be a bug from the perspective of the user's code, but letting it crash the DB engine itself would be weird and, I'd say, bad design. I also don't really see how being unable to read the DB file is better. Surely there's also something involuntarily bad happening if the user's code or yours insists on reading a file that e.g. doesn't exist?

I understand the desire to move detection of errors to compile time as much as possible; that's basically what Rust itself is all about. I just think that what you want here is outside of the limits of what is possible with a good design.

droundy · September 30, 2019, 3:44pm

That's different because the existence and correctness of the file are outside the control of the programmer. Using the library correctly is within the control of the programmer. So if the programmer uses it incorrectly, it's best they are notified as early as possible so the bug can be fixed.

system · December 29, 2019, 3:44pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
