Brainstorming: relational programming in Elm

I’ve been working on a datalog implementation in Elm, just to learn how datalog works. The summary motivation: datalog is great because it provides a relational and total language and guarantees termination for finite data sets. If you do a few tricks, you can also extend it to handle negation and aggregation as well.

I’m excited to try this as the data layer in an Elm application, since you can join whatever things you need into the data needed for your view.

Let’s model a simple contact book application with this: suppose you have people. Each person has many email addresses, and people can also share an email address (for example support@fruits.com.)

You might model it like this in Elm:

type alias Person =
    { id : Int
    , name : String
    }

type alias Email =
    { id : Int
    , address : String
    } 

How do you join these two data structures? You might:

  • embed emails : List Email in Person, or people : List Person in Email (or both)
  • add id to Person and Email, and then embed a list of IDs in each of Person and Email
  • make another joining structure ({ email : EmailID, person : PersonID }) to join them

All these are reasonable, depending on the constraints of your application and how OK you are with things getting out of sync (e.g. with emails being associated with people but not the other way around.)

But to be honest, I can way more about the experience that the person using my apps get than I care about the fine-grained semantics of how data is stored. I more-or-less just want a database that I can query instead of having to join things on IDs in Dicts or whatever.

This is where datalog shows up! It lets you specify a bunch of tuples (which, uh, aren’t typed by default… although typed datalogs do exist!) You might go from this Elm:

people : List Person
people =
    [ { id = 1, name = "Brian" }
    , { id = 2, name = "Atlas" } -- my cat
    ]

emails : List Email
emails =
    [ { id = 1, address = "brian@domain.com" }
    , { id = 2, address = "support@fruits.com" }
    ]

To this datalog:

person(1, "Brian").
person(2, "Atlas"). -- my cat

email(1, "brian@domain.com").
email(2, "support@fruits.com").

To join things up, we can define a relationship between person and email:

personToEmail(1, 1).

That’s not really that friendly, though—we probably want to present all of someone’s emails at once to the user of our program. We can do that by writing a relation:

personEmail(Name, Email) :-             -- read as: personEmail for Name and Email exists IF
  person(PersonID, Name),               -- … there's a person named Name
  email(EmailID, Email),                -- … and an email named Email
  personIdToEmailId(PersonID, EmailID). -- … and their IDs appear together in the join

This involves a few new concepts: :- means “implies”, as in “the things to the right of me imply the thing to the left of me.” Multiple clauses in a rule (separated by commas) all have to be true, and variables are introduced implicitly the first time they appear.

The output of this looks like:

personEmail("Brian", "brian@domain.com").

We can also see people without an email address, or email addresses without people:

peopleWithoutEmails(Name) :-
  person(_, Name),          -- person exists
  not personEmail(Name, _). -- but doesn't have an email

-- produces peopleWithoutEmails("Atlas").

emailsWithoutPeople(Email) :-
  email(_, Email),           -- email exists
  not personEmail(_, Email). -- but doesn't have a person

-- produces emailsWithoutPeople("support@fruits.com").

At no point did I have to write my own joining plan with dictionaries or whatever! I just encoded the relationship between different domain objects and their fields, and the datalog system took care of all the logic for me!

At this point you might ask something like “but, how is that better than SQL?” Well, first of all it’s all in the client so your server can do much less work (just send up the data, or sync it from browser storage.) Second, that’s not all it can do: recursion is also allowed!

link(1, 2). -- there's a directional link from node 1 to node 2
link(2, 3).
link(3, 3).
link(3, 4).

-- you can get to node B by starting from node A
reachable(A, B) :- link(A, B).
reachable(A, C) :- link(A, B), reachable(B, C).

-- reachable(1, 4)
-- reachable(2, 4)
-- reachable(1, 3)
-- reachable(3, 4)
-- reachable(3, 3)
-- reachable(2, 3)
-- reachable(1, 2)

-- you can also make the graph here undirected (links can be walked both ways) by adding either of these two clauses:
reachable(A, B) :- link(B, A). -- to just change it for the reachability relation
                               -- or
link(A, B) :- link(B, A).      -- to extend the `link` set directly

So with this engine, you get a whole recursive database in the browser, and you know queries will terminate. I’ve got quite a bit of work left to do on this to make it fast enough to handle large data sets and massive amounts of queries, not to mention the work of making a nice Elm-flavored API on top of it, but it’s pretty good already!

So, if you’ve read this far, thank you! Now, I’m curious: what would you build with this? What kinds of data modeling challenges would you see coming up because of it?

If you want to play around with it more, you can get an in-browser interpreter here: datalog.bytes.zone. You can save and share programs from it, as long as they can fit in a URL. I’ve done that in several links above here if you want something to start with!

21 Likes

Oh, just to spark the imagination a little, here’s how you would write the same relations if you just wanted to have bags of attributes for each ID:

attr(1, "name", "Brian").
attr(1, "job", "developer").
attr(1, "email", "brian@company.com").
attr(1, "email", "brian@personal.com").

attr(2, "name", "Atlas").

attr(3, "email", "support@fruits.com").

personEmail(Name, Email) :-
  attr(Id, "name", Name),
  attr(Id, "email", Email).

peopleWithoutEmails(Name) :-
  attr(_, "name", Name),
  not personEmail(Name, _).

emailsWithoutPeople(Email) :-
  attr(_, "email", Email),
  not personEmail(_, Email).
1 Like

Thank you Brian, I very much enjoyed reading about this.

I would be interested in building a datalog model for the purpose of recording facts derived from source code, in such a way as to make that information more easily queriable.

Perhaps ‘easily’ isn’t quite the right way of putting it. Source code is parsed into an AST, and that is a top-down structure - analogous to making say Person the root of the data model, and puting Email directly inside it as a field. Ok when you want to find someones email address, not so good when you have the email address and want to find the person.

Similarly with code. The top-down AST structure is good when you want to do something like say iterating over a block of code, and all child nodes underneath it, for the purpose of pretty printing it, or maybe compiling it and so on. Its less good when you want to query it in a different way - for example say I want to query over a whole codebase and find all usages of a particular function.

I think this might have been posted here before, gives a good idea of the kind of setup I am interested in:

https://petevilter.me/post/datalog-typechecking/

===

I am writing queries over the AST for my Salix language, and ended up pulling out re-usable query functions as I went: Query - salix 4.0.0

This kind of querying over the AST has the advantage of being simpler to get started with particularly when I am at the stage of experimenting with the language design. In practice, I end up using these queries combined with ad-hoc code that can be very complex and time consuming to write (For example, elm-aws-codegen/AWSStubs.elm at master · the-sett/elm-aws-codegen · GitHub). With a datalog model maybe I can crack open the AST in such a way that I end up with a query API onto it that is more complete and lets me tackle those more complex queries in a simpler way.

My Salix work actually derives from an earlier project I did, which was mostly written in Java. But the data modelling language was translated into Prolog facts and the queries written in Prolog. I made a mistake on that project of storing facts in Prolog as lists of name/value pairs. The Prolog code ended up a lot like functional code doing recursion over lists, rather than letting the Prolog engine figure out the querying. I did at one point have the idea of refactoring the Prolog code into flat tuples - also known as Datalog - and I wish I had done it that way in the first instance.

===

Some thoughts:

  1. Great to start out with untyped datalog to get a feel for how it works and how to write a query engine for it in Elm. But Elm is typed, and I think to be really useful in Elm it needs to be able to fit into Elms type model. This presents some difficulties. For example, if everything is stringly typed the datalog model can in the first instance just be a Dict Int (String, String) or maybe Dict String (Dict Int String). Elm is strict with typing so a typed version won’t let you have a single database - or else you need to encode/decode to some uniform representation such as String or Json.Encode.Value.

  2. A really slick query API in Elm is what would make this great to use. Would love to see a draft of the query API as it takes shape. Some kind of query combinator DSL?

2 Likes

me too! That’s something I’m really excited about with this work.

I’ve seen Pete Vilter’s post on this—that’s one of the things that motivated me to really dig in here!

This looks really neat. Thanks for sharing, and I’m looking forward to seeing where it goes!

You’re right that it needs some type information. The implementation is closer to Json.Encode.Value than String. The relevant code looks like:

type Term
    = Constant Constant
    | Variable Variable


type Constant
    = String String
    | Int Int


type Variable
    = Named String
    | Anonymous

It’s definitely possible to add more stuff to Constant!

I’d love suggestions here, especially if you could write out a program in your suggested DSL!

Right now, this program:

human("Socrates").

mortal(thing) :- human(thing).

Is written almost like this in the Elm DSL:

import Datalog
import Datalog.Atom exposing (atom)
import Datalog.Negatable exposing (positive)
import Datalog.Rule exposing (fact, rule)
import Datalog.Term exposing (string, variable)


Datalog.program
    [ fact (atom "human" [ string "Socrates" ])
    , rule (atom "mortal" [ variable "thing" ])
        [ positive (atom "human" [ variable "thing" ]) ]
    ]

I say “almost” because rule actually returns a Result. My test code strips that off and crashes on failure :laughing: That’s part of why I’d like a better API! The other part is… well, my gosh, those imports. Wow.

Is your WIP code available on github? Would quite like to have a go at trying it out, that might help get me into it far enough that I start having some ideas.

Ah wait - I found it: brian/bad-datalog: A bad datalog, which hopefully will become better over time. Maybe someday it'll even be released as a package! - bad-datalog - Git in the Bytes Zone

Will have a go.

1 Like

For those like me who are both fascinated and ignorant, here is a fantastic primer in Datalog:

x775 - Introduction to Datalog

Many thanks Brian for bringing this to Elm, I can look forward to many sleepless nights.

4 Likes

I think this might be more interesting than actually useful, but the Berlin Functional Programming group just hosted a talk about using Souffle (a typed datalog) in Haskell: Leverage the Power of Logic Programming With Souffle-haskell - YouTube

There is plenty of stuff about API design & DSLs for datalog in there, although I suspect a lot of it probably relies on Haskell magic that might not be possible in Elm (I don’t really know enough about Haskell though).

Oh, cool! I’ve done a little prototyping with Souffle. I’ll have to watch that, thank you for sharing it!

Hey @rupert did you have a chance to sketch out any API ideas? My current module architecture is so full of holes that I need to find something else and I would love to collaborate on something nice!

Not had a chance to look into yet.

All good. Thank you!

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.