Hi, I recently had an interesting exchange with Jeroen Engels on (among other things) Elm’s packaging and decided to write down my thoughts. I would love some feedback on the proposed additions.
Elm’s package system gets some things right. In particular, automatic enforcement of semantic versioning is its strong point. Other languages may have this too, but few have done it as well as Elm.
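To make “automatic enforcement” concrete: as I understand it, `elm bump` diffs the exposed API of the new code against the last published release and picks the version increment for you. Anything removed or changed means a major bump, additions alone mean a minor bump, and otherwise it is a patch. The sketch below is a heavily simplified model of that rule, assuming an API can be represented as a hypothetical map from exposed names to type signatures. The real compiler compares types structurally, not as strings.

```python
from typing import Dict, Tuple

# Simplified, hypothetical model: an API maps each exposed name to its type signature.
Api = Dict[str, str]
Version = Tuple[int, int, int]


def required_bump(old: Api, new: Api) -> str:
    """Classify the change between two API snapshots as MAJOR, MINOR or PATCH."""
    removed = old.keys() - new.keys()
    changed = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    added = new.keys() - old.keys()
    if removed or changed:
        return "MAJOR"  # existing callers may no longer compile
    if added:
        return "MINOR"  # additions only: existing code keeps compiling
    return "PATCH"  # exposed API is unchanged


def bump(version: Version, kind: str) -> Version:
    """Apply the increment, resetting the lower components."""
    major, minor, patch = version
    if kind == "MAJOR":
        return (major + 1, 0, 0)
    if kind == "MINOR":
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


old_api = {"map": "(a -> b) -> List a -> List b"}
new_api = {
    "map": "(a -> b) -> List a -> List b",
    "filterMap": "(a -> Maybe b) -> List a -> List b",
}
print(required_bump(old_api, new_api))  # MINOR
print(bump((1, 2, 3), required_bump(old_api, new_api)))  # (1, 3, 0)
```

The point is that the author never chooses the number; the tooling derives it from the API, which is what makes the guarantee trustworthy.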
Elm piggybacks on GitHub to serve packages, which makes sense in terms of the effort required to build the package system and the cost of running it. Given the economics and the time needed to get a reasonably robust system working, it makes sense to me that Elm ended up with the system it has.
Centralization cuts both ways. Some might prefer more freedom to manage packages in different ways, but centralization has helped the community grow the number of packages and made them easy to find. Centralization may have had the most benefit early on.
Some of the downsides are:
Renaming a GitHub repo or removing a tag means packages can disappear. It’s a shame when this happens, because repeatable builds are a real strength of Elm, and this occasionally undermines them.
GitHub can occasionally go down, leaving us without a plan B.
GitHub has occasionally threatened to change how archive checksums are calculated, which would really mess things up. So far, they have not permanently gone through with it.
elm publish only does a local build. On case-insensitive filesystems, this means packages can be published that do not build on case-sensitive filesystems.
There is no easy support for private packages. Of course, you can always just keep your code private, but it might be useful to have semantically versioned packages shared within a single organisation, for example.
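The case-sensitivity problem is cheap to catch. Elm expects module `Foo.Bar` to live at `Foo/Bar.elm` under a source directory, so a publish step (or CI on a case-sensitive machine) could flag modules that only resolve because the filesystem ignores case. A minimal sketch: `case_mismatches` is a hypothetical helper, and it takes the list of files as input rather than scanning a disk.

```python
from typing import Iterable, List


def module_to_path(module_name: str) -> str:
    # Elm convention: module Foo.Bar lives in Foo/Bar.elm under a source directory.
    return module_name.replace(".", "/") + ".elm"


def case_mismatches(module_names: Iterable[str], files_on_disk: Iterable[str]) -> List[str]:
    """Return modules that would only resolve on a case-insensitive filesystem."""
    exact = set(files_on_disk)
    folded = {path.casefold(): path for path in exact}
    problems = []
    for name in module_names:
        expected = module_to_path(name)
        if expected in exact:
            continue  # resolves on every filesystem
        actual = folded.get(expected.casefold())
        if actual is not None:
            problems.append(f"{name}: expected {expected}, found {actual}")
    return problems


print(case_mismatches(["Json.Helpers", "Utils"], ["Json/helpers.elm", "Utils.elm"]))
# ['Json.Helpers: expected Json/Helpers.elm, found Json/helpers.elm']
```

Modules that are missing entirely are ignored here on purpose: the compiler already reports those, whereas the case mismatch is exactly what a local build on macOS or Windows lets through.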
Here is a proposal I wrote up a few years back that discusses some of the features I think could be beneficial: Elm Package Server (GitHub)
By now most languages come with their own dependency manager
There are new languages that forgo package management entirely, e.g. Odin. I think it’s worth starting from a premise that includes that option, as dependency management is not one-size-fits-all.
The dependency management is, depending on the language, one of the most-used tools after the compiler/interpreter
Most projects don’t have their dependencies managed that often. If you’re writing tests, you’re likely running them at least once for every significant change. You’re likely running your linter/static analysis tool many, many times per session, particularly if you’re using editor integrations. JS might be an exception here, as dependency trees can easily reach into the hundreds and any one of those can cause a cascade of changes. Rust may also be an exception, as Cargo is modeled after npm.
[Elm] combines the worst from all other dependency managers
I’ve used far worse, and Elm has good features that nearly no other dependency manager has, such as automated version bumping. I do think Unison does better here though.
Elm is dead
You deride Elm’s use of GitHub and then claim Elm is dead based on a tool that uses GitHub activity for 20% of its “deadness” score.
In fact, the namespaces feature was in part inspired by the “Package Domain Tags” section, though I took the concept and just dragged it straight into the code.
I would beg to differ. The dependency manager is usually invoked on every build to fetch missing dependencies (Go, Rust, Elm). Having it invoked separately might actually be the outlier (outside of adding dependencies).
Unison seems interesting. If I understand correctly, it brings its own repository format and tool as well. Not being able to release a version from my work environment (at least, the docs only speak of drafts) sounds somewhat tedious, even if it’s just a matter of visiting the website to remove the draft status.
I think the dependence on GitHub is, in fact, a bad thing, yes. Since GitHub is the most popular code hosting platform, though, it makes sense to gauge a language’s activity based on it. As stated, the inability to use an alternative platform for hosting code also makes it harder for Elm to be used in certain industries (in medtech, for example, each and every dependency is vendored on a company server and retrieved from there only after an extensive review).
It might be invoked on every build (if you’re online, not if you’re offline), but I think that’s more coincidental than something I’d categorize as managing dependencies. For me managing dependencies is more about adding, removing, and upgrading and not the act of downloading files. I’d say that’s more of a subset of the actions taken during the process.
A bad good example
When I was at Vendr we had:
about 600K lines of Elm
about 1.2M lines of TS
roughly 30-40 total dependencies in our elm.json, and an additional 1-2 vendored dependencies
I’m not even sure how to count the TS dependencies; likely hundreds
Super-centralisation
Centralization makes it easy to find common projects. GitHub might not be one’s choice today, but there’s a lot more that goes into this than “M$ bad” (and yes, I share the dislike of M$’s approach to tech/GitHub).
But we can just add the dependency using a link to the repo, right?
This is definitely a risk, but not really any more so than with any other centralization. I’ve had builds fail in a variety of other languages (Go, JS) due to their centralized repos going down. And unlike, say, npm, Elm defaults to caching dependencies, meaning you can almost always rely on that cache being present and only hit the centralization issue when you blow away your cache.
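To make the caching point concrete, assume the Elm 0.19.1 cache layout of `$ELM_HOME/0.19.1/packages/<author>/<name>/<version>/`, with `ELM_HOME` defaulting to `~/.elm`. The sketch below works out which dependencies would actually need GitHub; everything else builds from the local cache. The helper names are hypothetical and not part of the Elm CLI.

```python
import os
from pathlib import Path
from typing import Dict, List


def default_elm_home() -> Path:
    # ELM_HOME, when set, overrides the default ~/.elm location.
    return Path(os.environ.get("ELM_HOME", str(Path.home() / ".elm")))


def cached_package_dir(elm_home: Path, package: str, version: str,
                       compiler: str = "0.19.1") -> Path:
    # Assumed layout: <ELM_HOME>/<compiler version>/packages/<author>/<name>/<version>/
    author, name = package.split("/")
    return elm_home / compiler / "packages" / author / name / version


def missing_from_cache(elm_home: Path, deps: Dict[str, str]) -> List[str]:
    """Return the "author/name version" entries that would need a download."""
    return [
        f"{package} {version}"
        for package, version in sorted(deps.items())
        if not cached_package_dir(elm_home, package, version).is_dir()
    ]


if __name__ == "__main__":
    # Hypothetical usage: pass the exact versions from an application elm.json.
    print(missing_from_cache(default_elm_home(), {"elm/core": "1.0.5"}))
```

An empty list means the build can proceed with GitHub unreachable, which is the common case unless the cache has been wiped.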
“No integrity check” (more accurately: not good enough integrity checks for a small subset of possible use cases)
Easily and quickly solved with vendoring. If your project needs absolute certainty that things aren’t changing out from underneath you, you should be vendoring.
Use SemVer!
Elm does, and better than most others!
no development packages
I suppose this is mildly annoying, but in practice it doesn’t really matter. I’m not speculating here; this has been my experience in both Elm and non-Elm languages. Some companies are strict about versioning, but I find it’s more of a formality and adherence to process for process’s sake than actually looking at the content of the versioning. There are exceptions, but I’m talking about the majority of practice.
No maintenance tools
Not sure what this section is about. There are maintenance tools like elm-json.
the only way to update the description is releasing a new version.
I’m not sure I’ve seen a package in another ecosystem update the description of an old release. Maybe that’s a thing? Usually, when you’re dealing with important security issues, there’s a dedicated place to get information. In practice, Elm projects don’t have that many dependencies, so it’s easy to follow them if that’s what you want to do.
A real-world example of this that I’ve implemented is depending on timezone-data 11.1.0. You don’t want your TZ data to be behind (I’ve experienced this), so I added a tiny step to our CI to automatically pull down the latest version and add it as a dependency. In ~4 years at Vendr, this script was used about 10 times. I think you could spend another 2-3 paragraphs on the benefits and safety of this, but succinctly, it was an easy and huge benefit.
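A sketch of that kind of CI step, assuming the package is `justinmimbs/timezone-data` and that the latest version number has already been looked up elsewhere (not shown). `pin_direct_dependency` is a hypothetical helper that only rewrites an application elm.json; it does not run a dependency solver.

```python
import json


def pin_direct_dependency(elm_json_text: str, package: str, version: str) -> str:
    """Return application elm.json text with `package` pinned as a direct dependency.

    Sketch only: indirect dependencies are not re-solved, so a new version that
    needs different indirect versions still has to be fixed up separately.
    """
    elm_json = json.loads(elm_json_text)
    if elm_json.get("type") != "application":
        raise ValueError("only application elm.json files pin exact versions")
    deps = elm_json["dependencies"]
    deps["indirect"].pop(package, None)  # a package must not appear in both lists
    deps["direct"][package] = version
    deps["direct"] = dict(sorted(deps["direct"].items()))  # keep a stable, sorted order
    return json.dumps(elm_json, indent=4) + "\n"
```

If the new version does need different indirect dependencies, the next `elm make` will most likely complain about elm.json; for that case a tool like elm-json, which does run a solver, is the better fit.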
No separated namespaces
This can definitely suck, and elm-units-prefixed 2.8.0 is a terrific example! I think I’ve also encountered this once in the past 7 years of writing Elm.
No Licensing information
There are at least two tools for this that have been around since at least 2019.
I built mine after having to deal with this issue with JS for work back in like 2017 / '18. It was far easier to trust a tool to give a quick and useful report than to hope a developer read a website/file.
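For a sense of how little is needed: package elm.json files declare a `license` field (an SPDX identifier such as `BSD-3-Clause`), so a basic report can be built from the local cache. This is a minimal sketch assuming the 0.19.1 cache layout; it is not how either of the existing tools works, and the allowlist is a hypothetical organisation policy.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def license_report(elm_home: Path, deps: Dict[str, str],
                   compiler: str = "0.19.1") -> Dict[str, str]:
    """Map each "author/name" dependency to the license in its cached elm.json."""
    report = {}
    for package, version in sorted(deps.items()):
        author, name = package.split("/")
        # Assumed cache layout, same as Elm 0.19.1's ELM_HOME.
        manifest = elm_home / compiler / "packages" / author / name / version / "elm.json"
        if manifest.is_file():
            report[package] = json.loads(manifest.read_text()).get("license", "UNKNOWN")
        else:
            report[package] = "NOT CACHED"
    return report


def disallowed(report: Dict[str, str], allowed: Iterable[str]) -> List[str]:
    """Packages whose license is not on the (hypothetical) organisation allowlist."""
    allowed_set = set(allowed)
    return [pkg for pkg, lic in report.items() if lic not in allowed_set]
```

Anything "NOT CACHED" or "UNKNOWN" also ends up in the disallowed list, which is the safer default for a compliance check.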
Alternatives
simply imports a git repo
This is essentially vendoring, and I’m a huge fan of it. I vendor in Elm, Odin, and JS. I recommend it to anyone who’ll listen.
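In Elm specifically, one way to vendor a package is to copy its `src` folder into the project and list that folder under `source-directories` in an application elm.json. The helper below is hypothetical and only handles the elm.json part; the path `vendor/some-package/src` is a placeholder.

```python
import json


def add_vendored_source(elm_json_text: str, path: str) -> str:
    """Add a vendored package's source folder to an application's source-directories.

    The vendored package's own dependencies still have to be added to
    "dependencies" by hand; this only makes its modules visible to the compiler.
    """
    elm_json = json.loads(elm_json_text)
    if elm_json.get("type") != "application":
        raise ValueError("source-directories is only available in application elm.json files")
    dirs = elm_json.setdefault("source-directories", ["src"])
    if path not in dirs:  # keep the operation idempotent
        dirs.append(path)
    return json.dumps(elm_json, indent=4) + "\n"
```

Package elm.json files have no `source-directories` field (packages always use `src`), which is why this approach only works for applications.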
add a checksum
Seems reasonable, I suppose.
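For reference, a project-level checksum could look something like this. A hypothetical lock file records a SHA-256 of each dependency’s archive, and the build refuses any download that does not match. The lock-file format and function names are assumptions for illustration.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_archive(lock: Dict[str, str], package: str, version: str, archive: bytes) -> None:
    """Raise unless the archive matches the hash pinned in the (hypothetical) lock file."""
    key = f"{package}@{version}"
    expected = lock.get(key)
    if expected is None:
        raise KeyError(f"{key} is not in the lock file")
    actual = sha256_hex(archive)
    if actual != expected:
        raise ValueError(f"{key}: checksum mismatch (expected {expected}, got {actual})")
```

Hashing the archive bytes is also why a change in how GitHub generates archives would break builds: the files inside could be identical while the compressed bytes differ. Hashing the extracted file contents instead would sidestep that.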
Allow for custom registries
Nice to have, I guess, but definitely not make or break. It’s also been proven in Elm, JS, and other languages that the community can do this when they want to.
Package maintenance through a file
mistakes can quickly be remedied
I’m not sure this has to do with dependencies. Feels like more about the language itself. E.g. if a vulnerability is found and you work at “place that only allows vetted dependencies” then you still have to go through the whole process of vetting the dependency changes.
Import-Aliases
My guess is this would be a whole language change and not just a dependency management issue, but I could be wrong.
Attestation
IMO this gets weird quickly. I’m specifically thinking of things like
I wouldn’t want a review by user1968431 to have the same weight as one by Google.
and incidents like left-pad, where a “reliable” person unpublished a tiny package that a huge number of projects depended on. There are other examples too, but that one is common enough that I expect most people have heard of it.
How to store artifacts/sources
Is there a problem with how Elm stores source today? Is ELM_HOME bad in some way? From what I can tell, you like the Elm approach and maybe aren’t aware of it, but it’s hard to say.
Their approach of essentially vendoring, but using the CLI to do so, is interesting but not really necessary. I also don’t fully trust the Go team when it comes to things like versioning, as they’ve broken versioning norms before by releasing minor versions of the CLI that included breaking changes.
Unison
There’s not much else like it and I think there’s still a lot to be explored and learned here.
Roc
They’re also trying some novel ideas that could be worth learning from.
I was after something a little different than namespacing with the domain tags idea:
"A potential alternate compiler for the Elm language, as opposed to The Elm Architecture, may support only a sub-set of these tags. This feature is not useful at the present, but is included now, so that it is part of the package server API from the start.
The intention is to be able to have an Elm build system that only supports say core, for running Elm code outside of the browser, with no virtual DOM to render as a view.
A future version may cater for private tags. For example ACME Corp can tag all its packages as com.acme, making it obvious when code depends on that private package domain. The private package domain at the head of those packages will not allow them to be published upstream, unless the private tag is removed by using only packages with public tags."
Another use might be if Elm was ported to some other runtime target. For example, if I were to compile Elm to run on the JVM, at a minimum I would want to support “org.elm-lang.core” there. Any packages that depend only on core would also be able to run. If more of the standard libraries were ported, larger subsections of the standard runtime would also be supported, as would any packages that depend only on whatever is ported.
It’s quite similar to the idea of “platforms” in Roc: a way of tagging the Elm standard library with a view to splitting it up to support a variety of different platforms, as well as the possibility of introducing new private domains or a new standard library.
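A rough model of how domain tags could be checked, under assumptions that go beyond the proposal text: each package may carry one tag (hypothetical values like `org.elm-lang.core`, `org.elm-lang.browser` or `com.acme`), and what matters is the set of tags reachable through its dependencies. The same closure then answers both questions above: can this run on a given platform, and can it be published upstream? All package names other than `elm/core` and `elm/html` are made up.

```python
from typing import Dict, List, Set

# Hypothetical registry data model: optional domain tag per package,
# plus each package's direct dependencies.
Tags = Dict[str, str]
Deps = Dict[str, List[str]]


def tags_in_closure(package: str, tags: Tags, deps: Deps) -> Set[str]:
    """Collect the domain tags of a package and of everything it depends on."""
    seen: Set[str] = set()
    found: Set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        tag = tags.get(current)  # untagged packages contribute nothing themselves
        if tag is not None:
            found.add(tag)
        stack.extend(deps.get(current, []))
    return found


def runs_on(package: str, tags: Tags, deps: Deps, platform_tags: Set[str]) -> bool:
    """A package can run on a platform if every tag it pulls in is supported there."""
    return tags_in_closure(package, tags, deps) <= platform_tags


def publishable_upstream(package: str, tags: Tags, deps: Deps, public_tags: Set[str]) -> bool:
    """Any private tag (e.g. com.acme) anywhere in the closure blocks publishing upstream."""
    return tags_in_closure(package, tags, deps) <= public_tags


TAGS = {"elm/core": "org.elm-lang.core", "elm/html": "org.elm-lang.browser",
        "acme/billing": "com.acme"}
DEPS = {"elm/html": ["elm/core"], "someone/list-helpers": ["elm/core"],
        "acme/billing": ["someone/list-helpers"], "acme/reports": ["acme/billing"]}
JVM = {"org.elm-lang.core"}  # a hypothetical port that only supports core
PUBLIC = {"org.elm-lang.core", "org.elm-lang.browser"}

print(runs_on("someone/list-helpers", TAGS, DEPS, JVM))  # True
print(runs_on("elm/html", TAGS, DEPS, JVM))  # False
print(publishable_upstream("acme/reports", TAGS, DEPS, PUBLIC))  # False
```

Note that `acme/reports` carries no tag itself but is still blocked, because the private domain is visible through its dependency on `acme/billing`.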
I agree that for small-scale projects, subtrees are a decent option. But not everyone wishes to vendor all their dependencies (be it to keep code search on topic, or just to have a leaner repository). Subtrees have also confused some colleagues I’ve worked with (mainly because they were unaware subtrees exist, and their graphical client did a terrible job of displaying them).
I have since revised that section based on feedback.
Retracting versions, marking known CVEs (not that it is likely to happen that often), and, in general, updates to package metadata. Sure, manually following the sources is an option, but not an ideal one, in my opinion.
Yes, this section has been removed. I mistakenly took an application elm.json and extrapolated from there.
I am not sure myself, but since the “namespace” is defined in the pseudo-package-definition, I thought it might be worthwhile to mention.
my go-to example for why I dont like central registries
This was more about the aptly named “elm-stuff” in the project. Personally, I would much prefer the build cache to be in ELM_HOME.
I still don’t understand the reasoning behind designing Cargo around it. It seems like a terrible example to follow by most metrics.
It is? You mean if a bad package is published, it would never go away? I can think of ways to ameliorate that issue. For example, a blacklist that marks things as bad dependencies, requiring an explicit opt-in to override.
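The blacklist idea is simple to sketch. Assume the registry publishes a list of flagged releases, each with a reason such as a retraction or a known CVE, and a project keeps its own explicit overrides. A build could then refuse flagged releases unless they have been opted into. All names here, including `someone/bad-package`, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Release = Tuple[str, str]  # ("author/name", "version")


def blocked_dependencies(deps: Dict[str, str], blocklist: Dict[Release, str],
                         overrides: Set[Release]) -> List[str]:
    """Report flagged releases that the project has not explicitly opted back into."""
    problems = []
    for package, version in sorted(deps.items()):
        reason = blocklist.get((package, version))
        if reason is not None and (package, version) not in overrides:
            problems.append(f"{package} {version}: {reason}")
    return problems


flagged = {("someone/bad-package", "1.0.0"): "retracted: known vulnerability"}
deps = {"elm/core": "1.0.5", "someone/bad-package": "1.0.0"}
print(blocked_dependencies(deps, flagged, set()))
# ['someone/bad-package 1.0.0: retracted: known vulnerability']
```

Keeping the override per release (not per package) means opting into one flagged version does not silently allow future flagged versions of the same package.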