What do folks think about the state of the Elm package index?
A few observations…
- there are some packages that don’t compile (e.g. due to dependency issues), or are just plain missing because the source repos have been moved or deleted
- the search functionality on the Elm package website appears quite limited, making discovery harder as the ecosystem grows (e.g. recent Slack thread)
- it’s possible to pollute the package index (unintentionally, one would hope) with, say, trivial forks (no offense intended, but this recent Slack thread for a fork of elm-pages is illustrative – it seems the community would have been better served by a PR instead of a published fork)
I asked about removing obsolete packages in Slack and received a suggestion to raise a GH issue for elm/package.elm-lang.org – which I’ve done – as well as a suggestion that it may be possible to hide some packages. If there is in fact a process to request removal / hiding of packages from the index, perhaps we should find a way to advertise it?
I’m not sure how the governance works now, and of course how it should work is an age-old debate. But it seems like at a minimum, there should be a widely accepted way to report and resolve packages that no longer exist.
I’m not sure what kind of dependency/compilation issues you’re referring to here; could you link to some examples and maybe share some error messages? This seems like it should be fixable in the case of compilation issues. For packages that are moved/deleted, that may be a separate discussion, but this does seem to have happened a few times, and it would be nice to have immutable releases (i.e. independent of GitHub hosting). I know this would require some additional infrastructure, though.
It seems like the changes are fairly useful, but I’m not sure it’s widely used as an alternative to package.elm-lang.org. And I’m not sure if there are any new findings since that thread; I’d be curious to hear whether we’ve learned anything from that experiment.
It’s unfortunate when we have pollution in the package repository. I’m not sure there are any easy answers here, though, as deleting packages introduces a whole new set of problems which are probably worse than the ones it attempts to solve. I do think it would be nice if there were an official way to mark a published package as deprecated, and have a deprecation message show up, similar to deprecated NPM packages.

One way to handle deprecation could be to use the existing summary field in a package’s elm.json and check for a specific pattern. So if you had "summary": "Deprecated: This package is no longer maintained. Use username/package-name instead.", then the package website could display the deprecation warning for that version of the package and in the main package search. A feature like that could add visibility to the status of a package in a safe way that doesn’t risk installation problems caused by removing packages.
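To make the idea concrete, here is a minimal sketch (in Elm, with a hypothetical module and function name) of the check the package website could run against the summary field; the "Deprecated:" prefix is just the convention assumed above, not anything official:

```elm
module Deprecation exposing (deprecationMessage)

-- Hypothetical helper: detect the assumed "Deprecated:" convention in a
-- package's elm.json summary and extract the message to show as a banner.


deprecationMessage : String -> Maybe String
deprecationMessage summary =
    let
        prefix =
            "Deprecated:"
    in
    if String.startsWith prefix summary then
        Just (String.trim (String.dropLeft (String.length prefix) summary))

    else
        Nothing
```

For the example summary above, this would return `Just "This package is no longer maintained. Use username/package-name instead."`, which the site could render both on the package page and in search results.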
Creating an ecosystem for building and distributing packages is a really hard problem!
I’ve just attended (and presented at) the first PackagingCon 2021, which took place over the past 2 days. The schedule of all the talks is here: https://pretalx.com/packagingcon-2021/schedule/
It was a very interesting event, full of knowledge from people handling the packaging ecosystems of Julia, Python, Rust, LaTeX, containers, Linux distributions, with talks about all the subjects that matter (user interface, supply chain security, dependency resolution, funding, etc.). All the talks will be released in a few weeks, I think. If I have time, I’d love to try to write a report of all the good and the bad happening out there, and what could be done for Elm to improve the situation. I wanted to wait until the talks are published before mentioning this, but obviously I had to mention it in this post ^^.
So I’d say, “go watch some of the talks there!” but instead I’ll say, “wait a bit, and then go watch the talks there!”
To be clear, the problems listed there are only related to dependency resolution and are thus only a fraction of the problematic packages, since this does not account for failures at tarball download and build time (for example when a package author changed their username, or when the GitHub tarball checksums changed relative to how the elm binary was computing them).
Maybe I can answer that, as I spent quite a bit of time last year writing a build system for all Elm packages. Code available here: GitHub - eco-pro/eco-server: Alternate Elm package server. Unfortunately, I have been too busy over the last 6 months to complete this, but I hope to come back to it soon.
As far as I recall, packages can and do fail for these reasons:

- A package dependency cannot be made to work for any of these reasons (take the transitive closure of failed packages).
- Files in the package have incorrectly cased names, suggesting they were built on a Windows box or a Mac with a case-insensitive file system. E.g. the module is called ‘MyModule’ but the file is ‘mymodule.elm’, which fails on a case-sensitive Unix file system.
- A package is too big - it builds from a clean compilation, but fails on an incremental build. The binary .elmo files somehow consume too much memory in the compiler for a few very big (codegen) packages.
- The elm.json is invalid - maybe elm 0.19.0 was more lenient in checking validity than 0.19.1.
- The package is a cheeky ‘native’ one published by a version of the compiler with the native restrictions removed.
I hoped to build a new index of all the packages and tag them with a build status, so you can easily filter out the stuff that is broken.
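As a rough sketch of what I mean (illustrative Elm only, not actual eco-server code; all names are made up), each index entry could carry a build status mirroring the failure modes above, so search can filter on it:

```elm
module PackageIndex exposing (BuildStatus(..), Entry, buildable)

-- Illustrative data model only; the type and field names are hypothetical.


type BuildStatus
    = Built
    | DependencyFailed -- a direct or transitive dependency failed to build
    | CaseMismatch -- module name and file name differ in casing
    | TooBigForIncremental -- .elmo files blow up compiler memory
    | InvalidElmJson
    | NativeCode -- published with the native restrictions removed


type alias Entry =
    { name : String
    , version : String
    , status : BuildStatus
    }


{-| Keep only the entries that are confirmed to build cleanly.
-}
buildable : List Entry -> List Entry
buildable entries =
    List.filter (\entry -> entry.status == Built) entries
```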
It’s a big job to make a package system relatively infallible, so I can understand how Elm currently has a fairly primitive system, given all the other work that has gone into the compiler.
When building an index of packages that are confirmed to work, I notice that it is possible for a package’s state to go from working to broken to working again without its version changing. A package can only work if its dependencies work. If a new minor or patch version of a dependency is published that is broken, and the depending package has its dependency range set such that it will pick up the broken version, it will also be broken. A fix to the dependency being published can then un-break it.
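A toy illustration of that timeline (made-up package names, versions modelled as ( major, minor, patch ) triples):

```elm
module RangeFlip exposing (Version, inRange)

-- Versions as comparable ( major, minor, patch ) triples, for brevity.


type alias Version =
    ( Int, Int, Int )


{-| True when `lower <= v < upper`, like an elm.json range
"1.0.0 <= v < 2.0.0".
-}
inRange : Version -> Version -> Version -> Bool
inRange lower upper v =
    compare lower v /= GT && compare v upper == LT



-- Suppose my-package 2.3.0 depends on some-dep at "1.0.0 <= v < 2.0.0":
--
--   inRange ( 1, 0, 0 ) ( 2, 0, 0 ) ( 1, 0, 0 )  -- True: working some-dep, my-package builds
--   inRange ( 1, 0, 0 ) ( 2, 0, 0 ) ( 1, 1, 0 )  -- True: a broken 1.1.0 is picked up, my-package breaks
--   inRange ( 1, 0, 0 ) ( 2, 0, 0 ) ( 1, 1, 1 )  -- True: a fixed 1.1.1 un-breaks it
--
-- my-package itself stays at 2.3.0 the whole time.
```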
My build of all packages processes them in publishing order, by index number. At the point of building index number n, I only have packages up to n-1 available to build against locally. This is because I am building up a copy of the package index locally and fooling the elm compiler into running against that, rather than feeding the build from package.elm-lang.org, which could (would) pick stuff up from above n, due to dependency version ranges and new versions being published later.
Alternatively, I could build against package.elm-lang.org or download all available packages before building. But whether or not a package works correctly can then depend on the point in time this is done! So it is not a repeatable process.
Building everything strictly in the sequence it was published will definitely find stuff that is broken but later fixed (I know because I tried it both ways), but it is also 100% repeatable. Unfortunately, it can also miss stuff that works at publish time but later becomes broken due to a bad dependency. Every time an in-range dependency (direct or transitive) of some package changes, that package needs to be re-built to check it still works. If the new dependency release breaks something earlier in the index, it should be rejected so that the published index is always unbroken.
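In sketch form (illustrative Elm with hypothetical names, not my actual build scripts), the re-check step when a new release arrives would look something like this:

```elm
module Recheck exposing (IndexEntry, Range, Release, Version, needsRebuild)

-- Hypothetical sketch: when a new release of some package is published,
-- find every already-built package whose dependency range admits it;
-- those packages must be re-built before the release is accepted.


type alias Version =
    ( Int, Int, Int )


type alias Release =
    { name : String
    , version : Version
    }


type alias Range =
    { name : String
    , lower : Version -- inclusive
    , upper : Version -- exclusive
    }


type alias IndexEntry =
    { name : String
    , version : Version
    , dependencies : List Range
    }


admits : Release -> Range -> Bool
admits release range =
    (range.name == release.name)
        && (compare range.lower release.version /= GT)
        && (compare release.version range.upper == LT)


needsRebuild : Release -> List IndexEntry -> List IndexEntry
needsRebuild release index =
    List.filter (\entry -> List.any (admits release) entry.dependencies) index
```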
This does seem like something I could implement in my build scripts, though, and is where your dependency solver would come in handy, @mattpiz.
So you can see, building an index of guaranteed buildable packages is quite tricky! Repeatability of builds is a nice property for packaging systems to have.
@rupert I believe that, exactly for the problem of reproducibility, Go has chosen to always use the lowest compatible version of a dependency range when resolving dependencies. This means that new package releases do not impact your code unexpectedly unless you explicitly ask for the new version. Another advantage is that it makes sure your lower bounds are actually correct, which is often not the case when people do not realize they are using features that appeared in a later minor release (Major.Minor.Patch). And finally, it removes the need to pin exact versions in applications, since with lower bounds selected by default, ranges are almost equivalent to pins (not 100% true, but close).
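A minimal sketch of that selection rule (illustrative Elm, versions as ( major, minor, patch ) triples; not how any actual solver is written):

```elm
module LowestCompatible exposing (Version, pick)

-- Of all published versions that satisfy a range, pick the lowest
-- rather than the highest. Purely illustrative.


type alias Version =
    ( Int, Int, Int )


pick : Version -> Version -> List Version -> Maybe Version
pick lower upper available =
    available
        |> List.filter (\v -> compare lower v /= GT && compare v upper == LT)
        |> List.minimum



-- pick ( 1, 0, 0 ) ( 2, 0, 0 ) [ ( 1, 0, 0 ), ( 1, 1, 0 ), ( 1, 2, 3 ) ]
--     == Just ( 1, 0, 0 )
```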
I think the package system should allow publishing packages without GitHub. It’s very unethical to only allow packages whose hosting is controlled by GitHub. I want to eventually leave GitHub because they do evil things such as censorship of repositories and unnecessary account suspensions.
Maybe the packages should instead be stored with IPFS.