Elm, Postgres, and Nothing In Between

Hello friends, happy 2024! :confetti_ball:

This year, I’m planning to open-source a product that I have developed and managed as a business for many years: a communities-of-practice management platform, similar in spirit to Discourse or Google Groups, but for a somewhat less general market.

The app contains a significant amount of Elm code, used in production starting from Elm 0.16 (now updated to Elm 0.19), and has been exposed to approximately 1 million end users over time. In these last 7-8 years, about 70% of all user-visible features have been implemented in Elm. (The service launched in 2004, so it still includes legacy code that I don’t intend to open-source. The Elm portion works standalone, albeit with fewer features.)

A few months ago, I decided to sunset this business and open-source the code. No particular goals with open-sourcing; it just seems like a nice way to close a chapter. If anything, it could be a data point in the “Elm used in commercial production” argument. Or, dreaming a bit, with enough contributors it could become one of the Elm reference codebases.

I’ve never open-sourced anything before, so I’d like to take a gradual approach and “market” it to you in hopes of getting feedback, generating interest, and potentially finding interested contributors.

A good first step appears to be hosting an online meetup to share the story, demonstrate what’s available, and delve into details relevant to this community of practice, in hopes of receiving advice on how to proceed.

The most attractive part, the one I’m most proud of, I’ll show first: Elm talking to Postgres directly in an efficient way, with nothing in between. It went into production back in 2017, so it has that nice “proven in production” stamp.

This in turn is connected to the code generation tool I had to develop to generate protocol parsers from SQL queries. The approach differs somewhat from the codegen solutions I’ve seen in this community, so it hopefully brings yet another point of view to the subject.

I’d like to present this in an interactive online meetup on any day from January 30th onwards.

If this proves to be well received, we can have follow-up meetups to inspect other parts of the product and discuss how best to proceed.

Technicalities

Those of you with experience, please help with advice on how to go about organizing such a meetup.

Is it enough for everyone to express interest here in the comments, or is there a usual way this is done in this community?

Presumably, we’ll need a platform similar to Google Meet. I’ll secure a subscription that accommodates the size of our interested audience. (I’ll make this subscription available for the community to host other meetings if there’s interest in the future).

Is there a particular day of the week that works best for most interested participants? Has anyone tried multiple sessions to accommodate multiple time zones?

35 Likes

Very nice of you to share your work like this! I hope I’m able to attend the meetup when it happens.

I do know there’s https://meetdown.app/group/10561/Elm-Online-Meetup which meets semi-regularly, that might be a practical place for you to share. @supermario is the organizer and might have more information.

4 Likes

I’d love to hear more about the code generation, sounds very cool! I hope there’s a recording or some written material for those of us who miss the presentation.

1 Like

Hello friends, the date is now set for Wednesday February 7, 2024 at 20.30 CET / 7.30pm UK / 2.30pm EST. I hope you can make it!

@supermario has graciously agreed to emcee the event. It will be 5.30am where he is!

The format will be more or less like the last Elm Meetup (the one with breakout rooms in Google Meet instead of Discord), except with only one speaker.

@supermario will shortly set up the usual Meetdown link for you to sign up. He’ll announce the link in this thread.

We’ll record the event.

Topic

The first showing of the product I intend to open-source: a niche community management tool with an Elm frontend and a Postgres backend. The product presently has no name (it’s offered as white-label SaaS), but by the time we meet I’ll hopefully have sorted out the trademark for the name of the open-source version.

The demo will be talking + live demo + maybe slides. The source might not be immediately accessible, as there are details to check in terms of rights, licenses, existing contracts, etc., and I don’t yet know how long that will take.

Agenda

A 10-minute presentation of the project and its history, then 15-ish minutes getting into the details of the Elm/Postgres communication and the associated codegen, followed by Q&A and then, hopefully, a discussion about open-sourcing the code.

I’d like to call the event a consultation: while I’m keen to show my work and hear comments, I’m even more excited to learn from everyone about open sourcing code and managing an open source project.

Very excited to connect on February 7! Hope you can make it!

3 Likes

As prep for the presentation: in my experience, talking about the details of how Elm works with Postgres is fairly underwhelming unless one has (good or bad) experience with connecting a frontend to a database.

I’m wondering what some of you use for database connections. Presumably GraphQL is popular, either with a roll-your-own Node.js backend or a “GraphQL server.” Postgrest was mentioned a few times, so I presume there’s also that.

Also, does anyone build their backend using Postgres functions written in LANGUAGE sql and LANGUAGE plpgsql?

I don’t work on our backend at work, but we do use Postgres, and our backend ends up being somewhat of a “forward messages between Elm & Postgres” (by way of GQL) situation. There’s other backend stuff as well, but I’ve been quite excited at the possibility of removing those layers, or shifting them to Elm if at all possible.

I’m wondering what some of you use for database connections. Presumably GraphQL is popular, either with a roll-your-own Node.js backend or a “GraphQL server.”

Yep! Hasura is a Postgres-to-GraphQL server (it also supports other databases) that we used at my previous company. End-to-end type safety and all that niceness, because GraphQL is actually typed. More recently, Supabase has become quite popular as well.

The Meetdown event is now up! https://meetdown.app/group/10561/Elm-Online-Meetup

4 Likes

Hasura is kind of a proxy for the database, but someone still had to manage all the Postgres tables and queries, etc. Was it all just GQL converted on the fly, or did you also program Postgres functions?

In my experience, massive organizations have the institution of a DBA, but those are usually Microsoft or Java shops with massive IT bureaucracies where the culture would dictate using something populist like React or Angular anyway. I’m also not sure how many of those use Postgres instead of Oracle or MSSQL.

I’m curious how that works in companies that work with Elm. Is there an expectation or a cultural bias for Elmers to also do Postgres (“full stack”), or is there a division of labor between “frontend people” and “DBAs”?

If anyone else has any insight into this, please share.

We are using https://nhost.io. It uses Hasura/Postgres amongst other technologies. So far, it has been great.

When you say end-to-end type safety and GQL being typed: how exactly does that work from a programming perspective? You don’t have to write anything in SQL? Or does SQL become type-safe?

Genuinely curious, never used GQL in production nor did we ever set up an “API server.”

Do you host your own open-source version, or do you subscribe to a hosted plan?

Hopefully you’ll get some inspiration and won’t be disappointed with the presentation. I’ve been happily running nothing but Postgres and Nginx on the servers for years. It’s really a simple solution, nothing sophisticated, yet still feels liberating. And low latency delights me to this day!

Still, it often feels like a non-issue, very invisible. Hope the live demo conveys it well. :crossed_fingers:

1 Like

We subscribe to the hosted plan.

Hasura is kind of a proxy for the database, but someone still had to manage all the Postgres tables and queries, etc. Was it all just GQL converted on the fly, or did you also program Postgres functions?

Postgres functions are supported, but I personally used them very little (only to enforce certain constraints that otherwise wouldn’t be representable in the table schema). Everything was GQL converted on the fly by Hasura.

When you say end-to-end type safety and GQL being typed: how exactly does that work from a programming perspective? You don’t have to write anything in SQL? Or does SQL become type-safe?

From a programming perspective, you can think of Postgres’s relational database as being Strongly Typed :tm:. From there, Hasura generates GQL mutation and query commands from the schema (Hasura provides a pgAdmin-style web UI to manage the underlying Postgres schema, as well as to create manual relations if the built-in FK constraints aren’t enough to express some relationship). Migrations to the database schema are written in SQL, as are complex default values for a given column, but otherwise nothing is actually written in SQL.

Then, in order to query or modify the DB, we write GQL by hand, which is compiled to type-safe Elm code using the graphql-to-elm library. The Elm code is checked against a generated GQL schema (pulled down from the Hasura instance, based on the underlying Postgres schema).

If I write an invalid GQL query, it simply won’t compile. And if I write Elm code that is trying to query or mutate fields in a way unsupported by the GQL endpoint/Postgres schema, that also won’t compile.

Every time I make a Postgres migration that breaks the existing code, the GQL schema changes, which cascades down to the auto-generated Elm code; Elm refuses to compile, and I have to fix whatever changed in the schema.

Essentially, at any given point, there are 3 moving pieces to maintain:

  1. Postgres schema
  2. GQL queries
  3. Elm code

If any of these don’t agree, the program won’t compile.
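
To make those three pieces concrete, here is a tiny, entirely hypothetical illustration (the table, the query, and the record below are all made up, and in practice the Elm side is generated by graphql-to-elm rather than written by hand):

```elm
module ThreePieces exposing (Post)

{- Piece 1: the Postgres schema, written as a SQL migration

       CREATE TABLE post (
           id    uuid PRIMARY KEY,
           title text NOT NULL
       );

   Piece 2: a hand-written GQL query against the schema Hasura derives

       query Posts {
           post { id title }
       }

   Piece 3: the Elm code. graphql-to-elm generates types and decoders from
   queries like the one above; the response record is shaped roughly like:
-}


type alias Post =
    { id : String
    , title : String
    }
```

Rename the title column in a migration and the query in piece 2 no longer validates against the regenerated schema; drop title from the query and any Elm code reading that field no longer compiles.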

I’m curious how that works in companies that work with Elm. Is there an expectation or a cultural bias for Elmers to also do Postgres (“full stack”), or is there a division of labor between “frontend people” and “DBAs”?

Would love to know this as well, but I’ve never worked in a large corporation (the largest company I worked at was 600 people when I left, 350 when I joined). All I can say is that in the tech echo chamber of Silicon Valley, most companies require engineers to be full stack. I’ve never worked with a DBA in my 12 years of professional development, and I have done some deep dives into the inner bowels of Postgres myself to resolve performance issues or odd behaviours here and there. And I’m just a generic “software engineer”.

Hi! I do have a general question about this interesting concept: if you don’t have a backend, how do you prevent malicious users from executing arbitrary SQL queries?

I’m not sure I will be able to attend the online meetup, but there seems to be some confusion around the date: the meetdown event says Tuesday February 6th while your post says Wednesday February 7th. Can we get a confirmation of the day it’s happening on?

The correct day is Wednesday, Feb 7, the usual time slot for Elm Meetup.

Meetdown now correctly shows this date as well.

Regret the inconvenience this confusion might have caused.

2 Likes

You don’t send SQL query text from the client. Instead, you send some kind of query ID and parameters to the server, which then executes the query based on that ID.

To execute an extended query using the Postgres wire protocol, a client has to send three messages: Parse {query text, parameter types}, Bind {numparams, param values}, Execute.

We keep the P{query text, types}, B{numparams}, and E message fragments on the web server, giving each query an opaque 16-byte ID. All queries are packaged together into a read-only hash table, and that single file is copied to a web server that listens for query requests.

The app POSTs a binary message body starting with the ID followed by all parameter values in the order expected by the Bind message.
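
For a concrete feel of the client side, here is a minimal sketch in Elm of such a POST, using elm/bytes and elm/http. Everything specific in it is hypothetical: the URL, the 16-byte ID, the single text-format parameter, and the names; the real encoders come out of the codegen tool.

```elm
module QuerySketch exposing (Msg(..), fetchMemberPosts)

{- A minimal sketch, not the actual generated code: POST a binary body that
   starts with the 16-byte query ID, followed by the parameter values in the
   order the stored Bind fragment expects. The URL, ID, parameter layout,
   and names are all made up for illustration.
-}

import Bytes exposing (Bytes)
import Bytes.Encode as Encode
import Http


type Msg
    = GotMemberPosts (Result Http.Error Bytes)


{-| A hypothetical opaque 16-byte ID assigned to one particular query. -}
memberPostsQueryId : Encode.Encoder
memberPostsQueryId =
    [ 0x6D, 0x62, 0x70, 0x71, 0x01, 0x00, 0x00, 0x00
    , 0xDE, 0xAD, 0xBE, 0xEF, 0xCA, 0xFE, 0x00, 0x01
    ]
        |> List.map Encode.unsignedInt8
        |> Encode.sequence


{-| One text-format parameter value, laid out the way a Bind message expects:
an Int32 byte length followed by the value bytes.
-}
textParam : String -> Encode.Encoder
textParam value =
    Encode.sequence
        [ Encode.unsignedInt32 Bytes.BE (Encode.getStringWidth value)
        , Encode.string value
        ]


{-| POST the query ID plus its parameters as one binary body. -}
fetchMemberPosts : String -> Cmd Msg
fetchMemberPosts memberId =
    Http.post
        { url = "/api/member" -- hypothetical location served by the web server
        , body =
            Encode.sequence [ memberPostsQueryId, textParam memberId ]
                |> Encode.encode
                |> Http.bytesBody "application/octet-stream"
        , expect = Http.expectBytesResponse GotMemberPosts keepRawBytes
        }


{-| Keep the raw response bytes; query-specific decoders take over from here. -}
keepRawBytes : Http.Response Bytes -> Result Http.Error Bytes
keepRawBytes response =
    case response of
        Http.GoodStatus_ _ body ->
            Ok body

        Http.BadStatus_ metadata _ ->
            Err (Http.BadStatus metadata.statusCode)

        Http.BadUrl_ url ->
            Err (Http.BadUrl url)

        Http.Timeout_ ->
            Err Http.Timeout

        Http.NetworkError_ ->
            Err Http.NetworkError
```

Note that the client never holds any SQL text, only the opaque ID and the parameter values.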

The web server looks up the query based on the first sixteen bytes of the request body and rebuilds the original P/B/E sequence by zipping the parameter values into the B fragment. The number and types of parameters in the request body must correspond to what was stored in the query hash table. The rebuilt sequence is sent off to Postgres over TCP. No db driver is involved; I think of it as a compiled driver.

Postgres will only ever see valid and authorized queries exposed through our API definition hash tables.

The wrapping/unwrapping of the wire protocol into/from HTTP happens in-process in an Nginx module. Nginx also has a connection pool that we hook into, so we don’t need a connection pooler process.

The module doesn’t have any smarts: it literally looks up the value in a hash table, merges the request body into a designated slot, and puts the combined bytes on the wire to upstream Postgres. Being in-process, it pays none of the latency penalty of FCGI or a proxy hop. It doesn’t even build the hash table on startup; it just loads a precompiled one into memory.

The web server can handle any number of these hash tables. We can separately package admin-related, user-related, or public queries (things like login, register, etc.). Each API has its own web server location, letting us configure security checks using established practices so that, for example, a non-admin user can never touch the admin-related API, nor can a non-authenticated user touch any APIs that require auth. We can rate limit, etc.

Each location can connect to the db server with its own credentials, so that even if a malicious developer somehow sneaks a backdoor API request into, say, the non-authenticated API hash table, it might not have the privileges to do damage in the db.

1 Like

My previous reply feels out-of-touch with Elm. Let me try to correct that with a philosophical version. :grinning:

When it comes to stuff exposed to the web, we don’t want the client app to ever know the text of our queries*, our secret keys, or similar details. “Don’t trust the client” is our maxim as web developers.

Also, clients only speak HTTP. Database servers prefer their own protocols.

Ever since 1989, programming db-driven web apps has gone the same way: the browser sends some serialized values with some moniker (CGI script name, PHP file name, REST URL, query string parameter …) to the server; some code behind the web server associates that moniker with a query the DB server understands and passes it on through a db driver to the database.

As we started moving more and more stuff to the client circa 2004, backends started to feel tedious to write. Granted, JavaScript is as dreadful as most backend languages; the tedium mostly came from having to do stuff twice. Then Elm’s elegance created a stark contrast to the boring backend implementation languages and made it even more painful and tedious to even think about these things.

Tedium is a programmer’s midnight oil, so inevitably the “generic backend” was born, exemplified by the likes of Postgrest and “GraphQL servers”.

A relief for sure, but “generic backends” are still backends, implemented with some runtime/language that speaks to the database through a stock db driver. They must make up for the genericness by working harder at interpreting their inputs, so a generic backend == a giant interpreter in the cloud. A db driver is also an interpreter, so we get a giant double interpreter. A tad fewer tedium points for not having to write things twice, but plenty more complexity and latency.

It occurs to me that we always used the defaults we had within reach to convert from requests the browser knows how to send into requests the db server requires: from unixy text lines, to the XML that Java and .NET proselytize, to JSON, only because the parser was built into both the browser and Node. Server-side, we went from inetd/CGI/Perl to FCGI with Ruby/PHP or WSGI/Python; we developed full web servers in scripting languages to get rid of FCGI, then required reverse-proxy hops (nginx, haproxy, …) to make up for their slowness. Ditto for “cloud functions” and microservices. But we were always stuck accommodating the default parsers to feed the beast of a backend that increasingly did nothing more than forward database queries/replies.

Elm comes with elegant binary encoding/decoding in the standard library. It’s a textbook definition of visionary to create a nice default—before it has obvious applications—that elegantly nudges users towards new solutions.

It was the existence of binary encoders/decoders that guided me towards a path to eliminate the backend.**
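
As a flavor of what those decoders look like, here is a hand-written sketch (in the real app these parsers are generated per query) of decoding a single Postgres DataRow message with elm/bytes; the two-column Post record is hypothetical.

```elm
module RowSketch exposing (Post, dataRow)

{- A hand-written sketch for flavor only: one Postgres DataRow message is
   the byte 'D', an Int32 message length, an Int16 column count, and then
   one length-prefixed value per column. The Post record is made up; the
   real decoders are generated from the SQL queries.
-}

import Bytes
import Bytes.Decode as Decode exposing (Decoder)


type alias Post =
    { title : String
    , body : String
    }


{-| Decode one DataRow carrying two text-format columns. -}
dataRow : Decoder Post
dataRow =
    Decode.unsignedInt8
        |> Decode.andThen
            (\tag ->
                if tag == 0x44 then
                    -- 0x44 = 'D', the DataRow message tag
                    Decode.map2 (\_ _ -> ())
                        (Decode.signedInt32 Bytes.BE)
                        -- message length, ignored here
                        (Decode.unsignedInt16 Bytes.BE)
                        -- column count, assumed to be 2
                        |> Decode.andThen
                            (\_ -> Decode.map2 Post textColumn textColumn)

                else
                    Decode.fail
            )


{-| One column value: an Int32 byte length followed by UTF-8 text.
(A real parser would also handle the -1 length that marks NULL.)
-}
textColumn : Decoder String
textColumn =
    Decode.signedInt32 Bytes.BE
        |> Decode.andThen Decode.string
```

Running Decode.decode dataRow over one message’s bytes yields a Maybe Post.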

Ever since, whenever I see the occasional “Why hasn’t Elm changed in so long?”, I think to myself: how much more potential is there that we haven’t yet put to use?


(*) GraphQL forgoes the dictum “the client shall not know your query text” in the name of flexibility, paying for it with increased complexity and latency to protect against malicious queries.

(**) To be clear, I don’t think of myself as inventing electricity or such; I simply used what was already there. While I am proud of the solution and find it very liberating, I’m certain other people came to the same logical conclusion following a simple elimination game.

8 Likes

Hello Friends! We’re still on track to meet on Wednesday.

The agenda remains as announced: first, a few notes from the 20-year period to frame the deliverables and technology choices (I’ll talk at 1.5x to keep it under 15 mins :smile:). As I think about how to compress 20 years into a short time, most of what comes to mind are more general observations, more or less a “CTO perspective.” I’ll only say things that I can link back to Elm, so I’m hoping it’ll be interesting even if that’s not your angle. That will all lead to my favorite part, the Postgres communication, and then the codegen. Finally, I’ll talk briefly about how I’d like to contribute to the community.

Does the reply I gave about the protocol sound too technical? Irrelevant? Please let me know in the comments or during the presentation; I can provide a more general overview of the protocol with a few slides or a quick Wireshark session. From experience I know that it’s hard to get excited about something invisible, hence the story to help frame it.

I took a look at the code I want to show, and the codegen tools are rather old: 2019! Not nearly as polished as the most excellent elm-codegen. Still, I believe I’ll be able to convey how and why I think it’s a complementary approach. Sadly, more words than live code, but if it turns out to be interesting to anyone, we can organize a session to delve into it, and I can commit to elaborating a prototype further.

Looking forward to learning about your perception of open source in general. One thing I’m very curious to understand is the present situation. As an avid reader of ‘Made with Elm’ and listener of Elm podcasts, my first impression is that most open-source projects are either tools or hobby apps, almost always written and maintained by one person, at most two. Is that your impression, too?

When I first thought of open-sourcing the code, I was imagining a multi-contributor project. That certainly won’t be easy or quick! If you have advice or thoughts on how to attract contributors and create a community around it, I’d love to learn more, whether in the comments, by DM, during the meetup, or after.

Looking forward to meeting tomorrow!

5 Likes