So, I’m here researching an article I want to write on “Scaling Elm Web Applications” and I come across this 8-year-old comment by Evan. It’s not the first time I’ve seen him make a comment like this, so I find it very telling that he has to repeat it over and over. Do software developers not know about data abstraction? Are they not taught it anymore? Data abstraction is what Evan is describing in his comment. It’s what he described in his presentation “The life of a file”. (And it’s what Richard Feldman described in “Make Data Structures”.)
When I was taught Pascal programming in secondary school in the 90s, we learned about data abstraction. It was one of the key concepts that was stressed: build your program structures around your data structures. Pascal programmers valued structured programming, modularity, and data abstraction. You can shrug at Pascal’s syntax all you want, but those who took the time to learn the language were also exposed to programming practices that have stood the test of time. On reflection, I realized that I’ve been implicitly using those ideas I learned in Pascal in my Elm programs.
Well, as you can imagine, I got sidetracked and decided to look into books on Pascal programming to see what they taught developers at the time. I found a book called “Data abstraction and program development using Pascal” from 1988. Here are a few excerpts:
This book integrates data abstraction and the use of abstract data types in programming into a course on data structures and algorithms.
… the main theme is the useful guide to the design of programs in modular form provided by data abstraction. The choice of suitable modules out of which to build large programs is the main problem in top-down design or step-wise refinement. Once the modules have been designed, programming can proceed at a much higher level than that provided by the usual programming language constructs.
There has been extensive discussion of these issues in the literature. Over the past 10 to 15 years it has become increasingly clear that programming must be liberated from the burden of implementation detail. This means that low-level hardware features and peculiarities of particular programming languages should be kept out of programming for as long as possible, noting that the semantics of a program are determined only by the semantics of the data. Programming should therefore take place in two stages:
- The design of the abstract data types which are required and of the algorithms for the data types. This is the actual problem solving part of programming.
- The representation of the abstract data types and of their algorithms in a programming language.
The first stage begins with the recognition of the information which is available. Decisions must be made about the operations which will be performed on the data. In other words, programming begins with an abstract view of the data to be used by the program. This is the most important phase of programming because all decisions about modularization of the program ultimately depend on it.
The book demonstrates how the use of abstract data types can lead to the design of programs which are modularized in a way that makes them easy to understand and easy to modify. M.A. Jackson in Principles of Program Design (Academic Press, London, 1975) stated that the structure of the data should be reflected in the structure of the program.
… A good modularization becomes a natural consequence of the use of abstract data types.
Are you kidding me? He said, “Over the past 10 to 15 years it has become increasingly clear that programming must be liberated from the burden of implementation detail.” So that’s, what, since the early 1970s? And we’re still programming in languages like JavaScript, Ruby, and Python that couldn’t care less about letting you hide your implementation details (i.e. no opaque types). Anyway, rather than get sidetracked, let’s dig into the book he mentions, “Principles of Program Design” from 1975. Let’s see what it says:
page 10, … program structures should be based on data structures.
page 11, … If, therefore, we give our program the same structure as the data it processes we can expect that we will have no difficulty. The design technique:
- We start by considering the data structures, which we then use to form a program structure.
- We list the executable operations needed to carry out the task.
- We allocate each operation to a component of the program structure.
The quality of the work we do as we take these steps will determine the quality of the programs we write.
This must be a joke. And this book gives all its examples in COBOL. When did all this useful information get lost? So you’re telling me that in 1975, COBOL programmers knew how to design better programs than present-day JavaScript developers? Are we too focused on programming language syntax and features, and not enough on the timeless principles of program design?
Out of curiosity, I skipped ahead to Chapter 5 on “Errors and Invalidity” to see what he had to say about that.
Error processing accounts for a high proportion of the program code in a data processing system.
… We will use the term “error data” for data containing errors of this kind, and we will contrast error data with “good data”. The distinction between error data and good data is meaningful to the user of a system: broadly, good data is what he tries to present to the system; error data is what he presents when he makes a mistake. But no such distinction is meaningful to the system itself. From the point of view of the system and its programs, both error data and good data require processing, and the processing must be correct according to the specifications.
… Because error data must be processed correctly, just as good data must, we must design our programs to take account of error data. And this means that the data structures, on which the program design is based, must also take account of errors. It would be wrong to design a program to handle only good data, hoping to fit the error processing into a structure determined solely by the good data. The result would be a partially designed program–partially correct, partially intelligible and partially maintainable.
Pure gold. Impeccable deduction. You can expect that I’m going to be drawing inspiration from these books for my future articles.
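This is also where opaque types earn their keep. Here’s a minimal Elm sketch of Jackson’s point (the module and names are hypothetical, my own illustration): the data structure accounts for error data from the start, and because the constructor isn’t exposed, no other module can fabricate an unvalidated value.

```elm
module Quantity exposing (Error(..), Quantity, fromString, toInt)

{-| Error data is part of the design, not an afterthought:
parsing yields either a validated Quantity or a descriptive
Error, and both are first-class values the program must handle.
-}


type Quantity
    = Quantity Int


type Error
    = NotANumber String
    | OutOfRange Int


fromString : String -> Result Error Quantity
fromString raw =
    case String.toInt (String.trim raw) of
        Nothing ->
            Err (NotANumber raw)

        Just n ->
            if n < 1 || n > 99 then
                Err (OutOfRange n)

            else
                Ok (Quantity n)


toInt : Quantity -> Int
toInt (Quantity n) =
    n
```

The 1 to 99 range is arbitrary; the point is that “error data” gets a structure of its own rather than being squeezed in after the fact.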
Data abstraction by example
Tic-tac-toe
You can play a full game without a UI.
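To make that concrete, here’s a minimal sketch of the shape such a module can take (hypothetical names and signatures, not the actual code): the Game type is opaque, so the only way to advance a game is through play, and a full game is just a pipeline of function calls in a test or the REPL.

```elm
module TicTacToe exposing (Game, Player(..), Status(..), new, play, status)

import Dict exposing (Dict)


type Player
    = X
    | O


type Status
    = Turn Player
    | Won Player
    | Draw


-- Opaque: the board representation is invisible outside this module.
type Game
    = Game { board : Dict ( Int, Int ) Player, next : Player }


new : Game
new =
    Game { board = Dict.empty, next = X }


status : Game -> Status
status (Game g) =
    case winner g.board of
        Just p ->
            Won p

        Nothing ->
            if Dict.size g.board == 9 then
                Draw

            else
                Turn g.next


-- Moves on finished games or occupied cells are ignored in this sketch;
-- a Result with an error type would also work (see Jackson on error data).
play : ( Int, Int ) -> Game -> Game
play pos ((Game g) as game) =
    if status game /= Turn g.next || Dict.member pos g.board then
        game

    else
        Game { board = Dict.insert pos g.next g.board, next = other g.next }


other : Player -> Player
other p =
    if p == X then
        O

    else
        X


winner : Dict ( Int, Int ) Player -> Maybe Player
winner board =
    let
        lines =
            [ [ ( 0, 0 ), ( 0, 1 ), ( 0, 2 ) ]
            , [ ( 1, 0 ), ( 1, 1 ), ( 1, 2 ) ]
            , [ ( 2, 0 ), ( 2, 1 ), ( 2, 2 ) ]
            , [ ( 0, 0 ), ( 1, 0 ), ( 2, 0 ) ]
            , [ ( 0, 1 ), ( 1, 1 ), ( 2, 1 ) ]
            , [ ( 0, 2 ), ( 1, 2 ), ( 2, 2 ) ]
            , [ ( 0, 0 ), ( 1, 1 ), ( 2, 2 ) ]
            , [ ( 0, 2 ), ( 1, 1 ), ( 2, 0 ) ]
            ]

        owns p =
            List.all (\pos -> Dict.get pos board == Just p)
    in
    if List.any (owns X) lines then
        Just X

    else if List.any (owns O) lines then
        Just O

    else
        Nothing
```

`new |> play ( 0, 0 ) |> play ( 1, 1 ) |> play ( 0, 1 ) |> play ( 2, 2 ) |> play ( 0, 2 ) |> status` evaluates to `Won X`, with no UI in sight.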
Wordle
- Bag: also known as a multiset (sketched below).
- Dictionary
- History
- Letter
You can play a full game without a UI.
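Of those modules, Bag is the one that standard libraries usually lack, so here’s a rough Elm sketch (a hypothetical API, not necessarily the article’s module). Wordle scoring needs it: when marking a guess you count down the letters of the answer that haven’t been accounted for yet.

```elm
module Bag exposing (Bag, count, empty, insert, remove)

import Dict exposing (Dict)


-- Opaque multiset: a Dict from element to its number of occurrences.
type Bag comparable
    = Bag (Dict comparable Int)


empty : Bag comparable
empty =
    Bag Dict.empty


insert : comparable -> Bag comparable -> Bag comparable
insert x (Bag d) =
    Bag (Dict.update x (\n -> Just (Maybe.withDefault 0 n + 1)) d)


-- Removes one occurrence, if any.
remove : comparable -> Bag comparable -> Bag comparable
remove x (Bag d) =
    case Dict.get x d of
        Just n ->
            if n > 1 then
                Bag (Dict.insert x (n - 1) d)

            else
                Bag (Dict.remove x d)

        Nothing ->
            Bag d


count : comparable -> Bag comparable -> Int
count x (Bag d) =
    Maybe.withDefault 0 (Dict.get x d)
```

Nothing outside the module knows a Dict is underneath; it could be swapped for an association list without touching any caller.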
Calculator
The calculator is fully unit tested and driven by tests instead of by a UI. There’s no need for elm-program-test here (not to imply that it’s never needed).
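Here’s a sketch of what “driven by tests” looks like with elm-test, assuming a hypothetical Calculator module that exposes an opaque Calculator, a Key type, and press and display functions:

```elm
module CalculatorTest exposing (suite)

-- Assumes a hypothetical Calculator module; the test is the driver,
-- exercising the same public API a view would use. No UI required.

import Calculator exposing (Key(..), display, new, press)
import Expect
import Test exposing (Test, test)


suite : Test
suite =
    test "1 + 2 = 3" <|
        \_ ->
            new
                |> press (Digit 1)
                |> press Plus
                |> press (Digit 2)
                |> press Equals
                |> display
                |> Expect.equal "3"
```

Because the calculator is just data and functions, the UI becomes a thin layer that renders display and translates clicks into press calls.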
And more…
SICP
SICP has an entire chapter on “Building Abstractions with Data”.
Java
I recall learning about it in Java as well. For example, here are some notes on it from a Princeton course.
The Clean Architecture, Hexagonal Architecture, Onion Architecture, Screaming Architecture, DDD, etc.
These all seem to me to be derivable from data abstraction. The literature is couched in OOP terms, but the ideas have nothing inherently to do with OOP. Read “Domain Modeling Made Functional” by Scott Wlaschin to learn more.
Domain modeling
… Domain Driven Type Narrowing, CQRS, state machines (see the state machine sketch below). Just start with data abstraction.
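For instance, a state machine falls straight out of the types. A toy Elm sketch (hypothetical domain and names): each state is a phantom type parameter on an opaque type, so only the exposed transitions exist.

```elm
module Door exposing (Closed, Door, Locked, Open, close, lock, new, open, unlock)

-- Phantom state markers: never constructed, only used in signatures.
type Open
    = Open


type Closed
    = Closed


type Locked
    = Locked


-- Opaque: callers can't conjure a Door in an arbitrary state.
type Door state
    = Door


new : Door Closed
new =
    Door


open : Door Closed -> Door Open
open _ =
    Door


close : Door Open -> Door Closed
close _ =
    Door


lock : Door Closed -> Door Locked
lock _ =
    Door


unlock : Door Locked -> Door Closed
unlock _ =
    Door
```

`new |> lock |> unlock |> open` compiles; `new |> open |> lock` does not, because lock requires a `Door Closed`. The illegal transition is unrepresentable, not merely checked at runtime.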
A short history lesson
I recently discovered this and found it quite interesting: it was Barbara Liskov who introduced abstract data types and the principle of data abstraction.
- TED Video: How Data Abstraction changed Computing forever
- Paper: Programming with abstract data types
Conclusion
Data abstraction is a useful concept that transcends programming language paradigms. It’s a technique that software developers learned a long time ago to help them tame complexity in large software systems. Computer scientists have traditionally shown how to use the concept with various data structures like stacks, queues, trees, and graphs, but it extends beyond that to other domains as well.
P.S. These are unpolished thoughts that still need refinement, but I couldn’t help but share my findings. Back to thinking about scaling Elm web applications.