I’m pretty new to the Elm world, but at first glance it seems there isn’t a strong set of tools for manipulating tabular / relational data in a declarative way. I remember seeing a talk by Evan where he mentioned that one value proposition Elm could strive for is in-browser analytics and scientific computing, specifically data visualization (see edited note). I work as a data scientist who primarily writes internal packages for other data scientists, and it seems like a DataFrame construct with declarative grammars for data manipulation and graphics (like Python’s pandas or the tidyverse in R) would be a good foundation for data science and analytics work. I wouldn’t start on this right away, but it has been swirling around my head and I wanted to see what others thought.
First question: Is it true that this is a use-case that doesn’t have a strong toolset? Does this even seem valuable for the users of Elm?
I can imagine implementing a DataFrame either as a list of records or a list of dicts. There are upsides and downsides to either approach.
Using records, there is native support for varying column types, and it becomes easy to pass around accessors to row attributes using the .x syntax. But joining two tables made of records would be difficult, since building the merged record means knowing every attribute of both records in advance. This fits in well with type safety but doesn’t scale. If I have two tables with 20 fields each and wish to join on a single key field, listing the other 19 fields from each table during a join is tedious. I can think of some ways around this, such as treating joins like a linked list of records, but that seems like a difficult mental model to reason about.
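To make the record trade-off concrete, here is a rough sketch of what I have in mind. The `Person`, `Salary`, and `PersonSalary` types and the sample rows are made up for illustration; this is only one way the join could be written, not a proposed API.

```elm
module RecordJoin exposing (joined, names)

-- Hypothetical tables, each row is a plain record.


type alias Person =
    { id : Int, name : String }


type alias Salary =
    { id : Int, amount : Float }


type alias PersonSalary =
    { id : Int, name : String, amount : Float }


people : List Person
people =
    [ { id = 1, name = "Ada" }, { id = 2, name = "Grace" } ]


salaries : List Salary
salaries =
    [ { id = 1, amount = 100.0 } ]


-- The upside: .name is an ordinary function, so selecting a column is easy.
names : List String
names =
    List.map .name people


-- The downside: the joined row has to spell out every output field by hand.
joinRow : Person -> Salary -> Maybe PersonSalary
joinRow person salary =
    if person.id == salary.id then
        Just { id = person.id, name = person.name, amount = salary.amount }

    else
        Nothing


-- Naive inner join on id (nested loop over both tables).
joined : List PersonSalary
joined =
    List.concatMap (\person -> List.filterMap (joinRow person) salaries) people
```

With three fields this is fine, but `joinRow` and `PersonSalary` grow with every column in either table, which is the part that worries me.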
Using dicts, the big upside is that the schema is much more flexible, and could even be decided at runtime. This obviously isn’t the most idiomatic approach in a language with a type system like Elm’s, but I think careful wrapping in Maybe / Result or some other custom monadic type could handle this and still force the user to handle undefined operations, invalid schemas, etc. The biggest complication here is that to put values of different types in a dict, they would need to be wrapped in some other type. Because of this, it would probably be limited to a pre-defined set of types, most likely just numbers, strings, chars, and dates.
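A minimal sketch of the dict version, again with made-up names (`Cell`, `Row`, `Error`, `getFloat`, `innerJoin` are all hypothetical). I left dates out of the wrapper type to keep the example dependency-free; they would just be another variant.

```elm
module DictFrame exposing (Cell(..), Error(..), Row, getFloat, innerJoin)

import Dict exposing (Dict)


-- The closed set of value types a column may hold.
type Cell
    = Number Float
    | Text String
    | Character Char


-- A row maps column names to wrapped values; the schema lives at runtime.
type alias Row =
    Dict String Cell


type Error
    = MissingColumn String
    | WrongType String


-- Reading a column forces the caller to handle a bad schema via Result.
getFloat : String -> Row -> Result Error Float
getFloat column row =
    case Dict.get column row of
        Just (Number n) ->
            Ok n

        Just _ ->
            Err (WrongType column)

        Nothing ->
            Err (MissingColumn column)


-- Naive inner join on one key column. No field has to be listed by hand;
-- on a column name collision, Dict.union keeps the value from the left row.
-- Rows that lack the key column are silently dropped in this sketch.
innerJoin : String -> List Row -> List Row -> List Row
innerJoin key left right =
    List.concatMap
        (\l ->
            List.filterMap
                (\r ->
                    case ( Dict.get key l, Dict.get key r ) of
                        ( Just a, Just b ) ->
                            if a == b then
                                Just (Dict.union l r)

                            else
                                Nothing

                        _ ->
                            Nothing
                )
                right
        )
        left
```

The join no longer cares how many columns either table has, but every read goes through something like `getFloat`, so the type checking that records give for free moves into Result handling at each call site.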
Second question: Does either of these approaches seem reasonable? My gut says that the flexibility of a DataFrame and performing SQL-like operations lends itself to the dict approach, but I know I come from languages with a very different tradition of dynamic typing compared to Elm. I would want a solution to feel as natural and idiomatic to Elm as possible given the use cases.
Edited Note
I found the talk I was referring to: it was “What is Success” at Elm Europe 2018. He was specifically referring to data visualization, but I think most data scientists / analysts are introduced to code-driven data visualization through tabular data with declarative semantics for data wrangling and plotting.