Ideas on how to improve Elm's packaging in a potential future version

Hi, I recently had an interesting exchange with Jeroen Engels on (among other things) Elm’s packaging and decided to write down my thoughts. Would love some feedback and thoughts on the proposed additions.

Elm’s package system gets some things right - in particular, automatic enforcement of semantic versioning is its strong point. Other languages may have this too, but not many have done it as well as Elm has.
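
For readers unfamiliar with what that enforcement amounts to, here is a rough sketch in Python. It is not Elm's actual implementation. It assumes a package's public API can be represented as a dict from exposed name to type signature; that representation and the name `required_bump` are made up for illustration.

```python
def required_bump(old_api: dict[str, str], new_api: dict[str, str]) -> str:
    """Return the smallest version bump that an API change permits."""
    shared = old_api.keys() & new_api.keys()
    removed = old_api.keys() - new_api.keys()
    added = new_api.keys() - old_api.keys()
    # A shared name whose type signature differs counts as a breaking change.
    changed = {name for name in shared if old_api[name] != new_api[name]}

    if removed or changed:
        return "major"
    if added:
        return "minor"
    return "patch"
```

The point is that the bump is derived from the API difference rather than chosen by the author, so a breaking change cannot be published as a patch release.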

Elm piggybacks on GitHub to serve the packages, which makes sense in terms of the effort required to create the package system and the cost to run it. So it makes sense to me that Elm has the system it has, just from the economics and time involved in getting a reasonably robust system working.

Centralization cuts both ways. Some might prefer more freedom to manage packages in different ways, but centralization has helped the community’s effort to grow the number of packages and to make them easy to find. Centralization may have had the most benefit earlier on.

Some of the downsides are:

  • Renames or tag removals on GitHub repos mean packages can disappear. It’s a shame when this happens, because repeatable builds are a real strength of Elm, occasionally let down by this.
  • GitHub goes down occasionally, leaving us without a plan B.
  • GitHub occasionally threatens to change how checksums are calculated, which would really mess things up. They have never actually gone through with this, though.
  • elm publish only does a local build. On case-insensitive filesystems, this means packages can be published that do not build on case-sensitive filesystems.
  • There is no easy support for private packages. Of course you can always just keep your code private, but it might be useful to have semantic versioning of packages shared just within one organisation, for example.
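
The case-sensitivity problem in the list above is the kind of thing a publish step or package server could check for. A minimal sketch in Python, assuming package sources live under `src/` and that the exposed module names and the repository's file paths are already available as lists; the function name `find_case_mismatches` is hypothetical and not part of any Elm tool.

```python
def find_case_mismatches(modules: list[str], files: list[str]) -> list[tuple[str, str]]:
    """Find modules whose file exists only under a differently-cased path.

    Returns (expected_path, actual_path) pairs. Such a package builds on a
    case-insensitive filesystem but fails on a case-sensitive one.
    """
    exact = set(files)
    by_folded = {path.casefold(): path for path in files}
    mismatches = []
    for module in modules:
        # Assumption: module Foo.Bar is expected at src/Foo/Bar.elm.
        expected = "src/" + module.replace(".", "/") + ".elm"
        actual = by_folded.get(expected.casefold())
        if expected not in exact and actual is not None:
            mismatches.append((expected, actual))
    return mismatches
```

For example, a module `Html.Extra` stored as `src/Html/extra.elm` would be reported, while an exactly matching path would not.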

Here is a proposal I wrote up a few years back which discusses some of the features I think could be beneficial: Elm Package Server · GitHub

By now most languages come with their own dependency manager

There are new languages that forgo package management altogether, e.g. Odin. I think starting from a basis that includes that option is worthwhile, as dependency management is not one-size-fits-all.

The dependency management is, depending on the language, one of the most-used tools after the compiler/interpreter

Most projects don’t manage their dependencies that often. If you’re writing tests, you’re likely running those 1+ times for every significant change. You’re likely running your linter/static analysis tool many, many times per session, particularly if you’re using editor integrations. You might be able to make an exception here for JS, as the dependency trees can easily reach into the hundreds and any 1 of those can cause a cascade of changes. Rust may also be an exception, as cargo is modeled after npm.

[Elm] combines the worst from all other dependency managers

I’ve used far worse, and Elm has good features that nearly no other dependency manager has, such as automated version bumping. I do think Unison does better here though.

Elm is dead

You deride Elm’s use of GitHub and then claim Elm is dead based on a tool that uses GitHub activity as 20% of the reasoning for that “deadness”.

In fact, the namespaces feature was in part inspired by the “Package Domain Tags” section. Though I took the concept and just dragged it straight into the code.

I would beg to differ. The dependency manager is usually invoked on every build to fetch missing dependencies (Go, Rust, Elm). Having it invoked separately might actually be the outlier (outside of adding dependencies).

Unison seems interesting. If I understand correctly, it’s bringing its own repository format+tool as well. Not being able to release a version from my work environment (at least the docs only speak of drafts) sounds somewhat tedious, even if it’s just visiting the website to remove the draft status.

I think the dependence on GitHub is, in fact, a bad thing, yes. As it is the most popular code hosting platform, though, it makes sense to gauge the activity of a language based on it. As stated, the inability to use an alternative platform for hosting code also makes it harder for Elm to be used in certain industries (in medtech, for example, each and every dependency is vendored on a company server and retrieved from there only after an extensive review).

It might be invoked on every build (if you’re online, not if you’re offline), but I think that’s more coincidental than something I’d categorize as managing dependencies. For me managing dependencies is more about adding, removing, and upgrading and not the act of downloading files. I’d say that’s more of a subset of the actions taken during the process.

A bad good example

When I was at Vendr we had:

  • about 600K lines of Elm
  • about 1.2M lines of TS
  • roughly 30-40 total dependencies in our elm.json, and an additional 1-2 vendored dependencies
  • I don’t even know how to count how many TS dependencies, likely hundreds

Super-centralisation

Centralization makes it easy to find common projects. GitHub might not be one’s choice today, but there’s a lot more that goes into this than “M$ bad” (and yes, I agree on disliking M$’s approach to tech/GitHub).

But we can just add the dependency using a link to the repo, right?

Yep! We can do that easily with git subtree (a resource I’ve used for learning this: Git Subtree: Alternative to Git Submodule | Atlassian Git Tutorial), an option that’s been around for over a decade now and works very well.

risk of the Elm registry “disappearing”

This is definitely a risk, but not really any more so than with any other centralization. I’ve had builds fail in a variety of other languages (Go, JS) due to their centralized repos going down. And unlike, say, npm, Elm defaults to caching dependencies, meaning you can almost always rely on that cache being present and only hit the centralization issue when you blow away your cache.

No integrity check Not good enough integrity checks for a small subset of possible use cases

Easily and quickly solved with vendoring. If your project needs absolute certainty that things aren’t changing out from underneath you, you should be vendoring.

Use SemVer!

Elm does, and better than most others!

no development packages

I suppose this is mildly annoying, but in practice it doesn’t really matter. I’m not speculating here; this has been my experience both in Elm and non-Elm languages. Some companies are strict about versioning, but I find it’s more of a formality and adherence to process for process’s sake than actually looking at the content of the versioning. There are exceptions, but I’m talking about the majority of practice.

No maintenance tools

Not sure what this section is about. There are maintenance tools like elm-json.

the only way to update the description is releasing a new version.

I’m not sure I’ve seen a package in other projects update the description of an old release. Maybe that’s a thing? Usually when you’re dealing with important security issues there’s a place to get information. In practice Elm projects don’t have that many dependencies, so it’s easy to follow them, if that’s what you want to do.

A real world example of this that I’ve implemented is depending on timezone-data 11.1.0. You don’t want your TZ data to be behind (I’ve experienced this), so I added a tiny step to our CI to automatically pull down the latest version and add it as a dependency. In ~4 years at Vendr, this script was used about 10 times. I think you could spend another 2-3 paragraphs on the benefits & safety of this, but succinctly it was an easy and huge benefit.
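
To give an idea of how small such a CI step can be, here is a minimal sketch in Python. It is not the script used at Vendr. It assumes an application-style elm.json already loaded as a dict, with versions pinned under `dependencies.direct`, and it assumes the latest published version has been looked up elsewhere. The names `parse_version` and `bump_direct_dependency` are hypothetical, and a real step would also need to re-resolve indirect dependencies.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version such as '11.1.0' into (11, 1, 0) for comparison."""
    return tuple(int(part) for part in text.split("."))


def bump_direct_dependency(elm_json: dict, package: str, latest: str) -> bool:
    """Pin `package` to `latest` if that is newer than the current pin.

    Mutates `elm_json` in place and returns True when something changed.
    """
    direct = elm_json["dependencies"]["direct"]
    current = direct.get(package)
    if current is not None and parse_version(current) >= parse_version(latest):
        return False
    direct[package] = latest
    return True
```

The return value lets CI decide whether to commit the updated elm.json or do nothing.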

No separated namespaces

This can definitely suck, and elm-units-prefixed 2.8.0 is a terrific example! I think I’ve also encountered this once in the past 7 years of writing Elm.

No Licensing information

There are at least 2 tools for this that have been around since at least 2019.

I built mine after having to deal with this issue with JS for work back in like 2017 / '18. It was far easier to trust a tool to give a quick and useful report than to hope a developer read a website/file.

Alternatives

simply imports a git repo

This is essentially vendoring, and I’m a huge fan of it. I vendor in Elm, in Odin, and JS. I recommend it to anyone that’ll listen.

add a checksum

Seems reasonable I suppose
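
For concreteness, a checksum check on a downloaded package archive can be sketched in a few lines of Python. This assumes the expected SHA-256 digest is recorded somewhere trusted (a lock file, for instance) and that the archive bytes have already been fetched; the function name `verify_archive` is hypothetical and no existing Elm tool is implied.

```python
import hashlib


def verify_archive(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded archive bytes against a recorded SHA-256 hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalise the recorded digest so upper-case hex and stray spaces still match.
    return actual == expected_sha256.strip().lower()
```

A build would refuse to use the archive when this returns False, whether the mismatch comes from tampering or from the host changing how archives are generated.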

Allow for custom registries

Nice to have I guess, but definitely not make or break. It’s also been proven in Elm, JS, and other languages that the community can do this when they want to.

Package maintenance through a file

mistakes can quickly be remedied

I’m not sure this has to do with dependencies. Feels like more about the language itself. E.g. if a vulnerability is found and you work at “place that only allows vetted dependencies” then you still have to go through the whole process of vetting the dependency changes.

Import-Aliases

My guess is this would be a whole language change and not just a dependency management issue, but I could be wrong.

Attestation

IMO this gets weird quickly. Specifically thinking about things like

I wouldn’t want a review by user1968431 to have the same weight as one by Google.

and incidents like left-pad where a “reliable” person took down a huge repository. There are other examples too, but that’s a common enough one that I expect most to have heard of it.

How to store artifacts/sources

Is there a problem with how Elm stores source today? Is ELM_HOME bad in some way? From what I can tell, you like the Elm approach and maybe aren’t aware of it, but it’s hard to say.

Inspirations

Rust / cargo

Cargo isn’t all roses either, see Why doesn’t Rust care more about compiler performance? | Lobsters. Cargo is also heavily based on npm, which I would categorize as one of my least favorite package managers.

Go

Their use of what is essentially vendoring, but done through the CLI, is interesting but not really necessary. I also don’t fully trust the Go team when it comes to things like versioning, as they’ve broken versioning norms before by releasing minor versions of the CLI that included breaking changes.

Unison

There’s not much else like it and I think there’s still a lot to be explored and learned here.

Roc

They’re also trying some novel ideas that could maybe be learned from.

I was after something a little different than namespacing with the domain tags idea:

"A potential alternate compiler for the Elm language, as opposed to The Elm Architecture, may support only a sub-set of these tags. This feature is not useful at the present, but is included now, so that it is part of the package server API from the start.

The intention is to be able to have an Elm build system that only supports say core, for running Elm code outside of the browser, with no virtual DOM to render as a view.

A future version may cater for private tags. For example ACME Corp can tag all its packages as com.acme, making it obvious when code depends on that private package domain. The private package domain at the head of those packages will not allow them to be published upstream, unless the private tag is removed by using only packages with public tags."

Another use might be if Elm was ported to some other runtime target. For example, if I was to compile Elm to run on the JVM, at a minimum I would want to support the “org.elm-lang.core” there. Any packages that only depend on core would also be able to run. If more of the standard libs were ported then support for larger subsections of the standard runtime would also work, as would any packages that depend only on whatever is ported.

It’s quite similar to the idea of “platforms” in Roc: a way of tagging the Elm standard lib with a view to splitting it into support on a variety of different platforms, as well as the possibility of introducing new private domains or a new standard library.
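
As a rough illustration of the domain tags idea, here is a sketch in Python of how a build system could decide which packages run on a target that supports only some tags. The data layout (each package has one `tag` and a list of `deps`) and the function name `runnable_packages` are assumptions made for this example and are not taken from the proposal.

```python
def runnable_packages(packages: dict[str, dict], supported_tags: set[str]) -> set[str]:
    """Return the packages whose own tag and all transitive deps' tags are supported."""

    def supported(name: str, seen: set[str]) -> bool:
        if name in seen:
            # Already checked (or being checked) in this traversal.
            return True
        seen.add(name)
        info = packages[name]
        return info["tag"] in supported_tags and all(
            supported(dep, seen) for dep in info["deps"]
        )

    return {name for name in packages if supported(name, set())}
```

With only `org.elm-lang.core` supported, a package that depends solely on core is runnable, while anything that reaches a browser-tagged package, directly or indirectly, is not.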

I agree that for small-scale projects, subtrees are a decent option. But not everyone wishes to vendor all their dependencies (be it to keep their code search more on topic, or just to have a leaner repo). Subtrees have also confused some colleagues I’ve worked with (mainly because they were unaware that subtrees exist, and their graphical client did a terrible job of displaying them).

I have since revised that section based on feedback.

Retracting versions, marking known CVEs (not that it is likely to happen that often). In general updates to package metadata. Sure, manually following the sources is an option, but not ideal, in my opinion.

Yes, this section has been removed. I had taken an application elm.json and extrapolated from there, which was a mistake.

I am not sure myself, but since the “namespace” is defined in the pseudo-package-definition, I thought it might be worthwhile to mention.

My go-to example for why I don’t like central registries :)

This was more about the aptly named “elm-stuff” in the project. Personally, I would much prefer the build cache to be in ELM_HOME.

I still don’t understand the reasoning of designing cargo around it. It seems like a terrible example to follow in most metrics.

Unless the registry is designed to be “append only”. Once something is there, it’s there forever.

Which is a bad idea for that exact reason. That’s just inviting supply chain issues.

It is? You mean if a bad package is published it would never go away? I can think of ways of ameliorating that issue. For example, a blacklist that marks things as bad dependencies, requiring an explicit opt-in to override.
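
A minimal sketch of that idea in Python, assuming the registry publishes a set of (package, version) pairs marked as bad and that a project lists its explicit overrides in the same form. The function name `check_dependencies` and both data shapes are hypothetical.

```python
def check_dependencies(
    deps: dict[str, str],
    blocked: set[tuple[str, str]],
    overrides: set[tuple[str, str]],
) -> list[str]:
    """Return one error per dependency that is marked bad and not opted into."""
    errors = []
    for name, version in sorted(deps.items()):
        pin = (name, version)
        if pin in blocked and pin not in overrides:
            errors.append(f"{name} {version} is marked as bad; opt in explicitly to use it")
    return errors
```

The package stays in the append-only registry, so old builds remain reproducible, but nothing new can pick it up without an explicit override.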
