I’ve been working on a datalog implementation in Elm, just to learn how datalog works. The summary motivation: datalog is great because it provides a relational and total language and guarantees termination for finite data sets. If you do a few tricks, you can also extend it to handle negation and aggregation as well.
I’m excited to try this as the data layer in an Elm application, since you can join whatever things you need into the data needed for your view.
Let’s model a simple contact book application with this: suppose you have people. Each person has many email addresses, and people can also share an email address (for example support@fruits.com
.)
You might model it like this in Elm:
type alias Person =
{ id : Int
, name : String
}
type alias Email =
{ id : Int
, address : String
}
How do you join these two data structures? You might:
- embed
emails : List Email
inPerson
, orpeople : List Person
inEmail
(or both) - add
id
toPerson
andEmail
, and then embed a list of IDs in each ofPerson
andEmail
- make another joining structure (
{ email : EmailID, person : PersonID }
) to join them
All these are reasonable, depending on the constraints of your application and how OK you are with things getting out of sync (e.g. with emails being associated with people but not the other way around.)
But to be honest, I can way more about the experience that the person using my apps get than I care about the fine-grained semantics of how data is stored. I more-or-less just want a database that I can query instead of having to join things on IDs in Dict
s or whatever.
This is where datalog shows up! It lets you specify a bunch of tuples (which, uh, aren’t typed by default… although typed datalogs do exist!) You might go from this Elm:
people : List Person
people =
[ { id = 1, name = "Brian" }
, { id = 2, name = "Atlas" } -- my cat
]
emails : List Email
emails =
[ { id = 1, address = "brian@domain.com" }
, { id = 2, address = "support@fruits.com" }
]
To this datalog:
person(1, "Brian").
person(2, "Atlas"). -- my cat
email(1, "brian@domain.com").
email(2, "support@fruits.com").
To join things up, we can define a relationship between person
and email
:
personToEmail(1, 1).
That’s not really that friendly, though—we probably want to present all of someone’s emails at once to the user of our program. We can do that by writing a relation:
personEmail(Name, Email) :- -- read as: personEmail for Name and Email exists IF
person(PersonID, Name), -- … there's a person named Name
email(EmailID, Email), -- … and an email named Email
personIdToEmailId(PersonID, EmailID). -- … and their IDs appear together in the join
This involves a few new concepts: :-
means “implies”, as in “the things to the right of me imply the thing to the left of me.” Multiple clauses in a rule (separated by commas) all have to be true, and variables are introduced implicitly the first time they appear.
The output of this looks like:
personEmail("Brian", "brian@domain.com").
We can also see people without an email address, or email addresses without people:
peopleWithoutEmails(Name) :-
person(_, Name), -- person exists
not personEmail(Name, _). -- but doesn't have an email
-- produces peopleWithoutEmails("Atlas").
emailsWithoutPeople(Email) :-
email(_, Email), -- email exists
not personEmail(_, Email). -- but doesn't have a person
-- produces emailsWithoutPeople("support@fruits.com").
At no point did I have to write my own joining plan with dictionaries or whatever! I just encoded the relationship between different domain objects and their fields, and the datalog system took care of all the logic for me!
At this point you might ask something like “but, how is that better than SQL?” Well, first of all it’s all in the client so your server can do much less work (just send up the data, or sync it from browser storage.) Second, that’s not all it can do: recursion is also allowed!
link(1, 2). -- there's a directional link from node 1 to node 2
link(2, 3).
link(3, 3).
link(3, 4).
-- you can get to node B by starting from node A
reachable(A, B) :- link(A, B).
reachable(A, C) :- link(A, B), reachable(B, C).
-- reachable(1, 4)
-- reachable(2, 4)
-- reachable(1, 3)
-- reachable(3, 4)
-- reachable(3, 3)
-- reachable(2, 3)
-- reachable(1, 2)
-- you can also make the graph here undirected (links can be walked both ways) by adding either of these two clauses:
reachable(A, B) :- link(B, A). -- to just change it for the reachability relation
-- or
link(A, B) :- link(B, A). -- to extend the `link` set directly
So with this engine, you get a whole recursive database in the browser, and you know queries will terminate. I’ve got quite a bit of work left to do on this to make it fast enough to handle large data sets and massive amounts of queries, not to mention the work of making a nice Elm-flavored API on top of it, but it’s pretty good already!
So, if you’ve read this far, thank you! Now, I’m curious: what would you build with this? What kinds of data modeling challenges would you see coming up because of it?
If you want to play around with it more, you can get an in-browser interpreter here: datalog.bytes.zone. You can save and share programs from it, as long as they can fit in a URL. I’ve done that in several links above here if you want something to start with!