9% of Elm packages are broken

169 of 1860 Elm packages currently have their latest version broken (they cannot be installed via elm install abc/def).

It’s 9%.

Plain list of broken packages:

Detailed list:

I hoped to get some useful insights from this test, but I don’t have much.
:man_shrugging:t2:


As far as my packages go, they are probably broken because I have since renamed my GitHub username (andreasewering → anmolitor).
Can I unpublish them somehow?

Are these the packages that are directly broken, or all packages that are broken directly or indirectly?

By directly, I mean that the actual package is no longer present on GitHub, or has been renamed (user or package) on GitHub, or no longer has the expected package hash on GitHub due to things like the publish tags being removed or reassigned to a different commit.

By indirectly, I mean that a package will no longer install because one or more of its transitive dependencies are broken.

If it is possible to figure out the difference between direct/indirect breakage and split into 2 separate lists, that could be an insight, although not sure how useful.
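To make the direct/indirect split concrete, here is a minimal Haskell sketch that takes the directly broken packages and propagates breakage to every package that depends on them, repeating until nothing changes. It is a simplification: packages depend on fixed packages rather than version ranges, so a real check would only mark a package as indirectly broken when every version its constraint allows is broken. `Deps`, `transitivelyBroken`, `indirectlyBroken` and the package names are hypothetical.

```haskell
import Data.List (nub, sort)

type Package = String

-- Hypothetical dependency table: each package with the packages it depends on.
-- Version ranges are ignored in this simplified model.
type Deps = [(Package, [Package])]

-- Grow the broken set until it reaches a fixpoint: a package becomes broken
-- once any of its dependencies is broken.
transitivelyBroken :: Deps -> [Package] -> [Package]
transitivelyBroken deps = go . sort . nub
  where
    go broken =
      let newlyBroken =
            [ p
            | (p, ds) <- deps
            , p `notElem` broken
            , any (`elem` broken) ds
            ]
       in if null newlyBroken
            then broken
            else go (sort (nub (broken ++ newlyBroken)))

-- Packages that fail only because of a dependency (indirect breakage).
indirectlyBroken :: Deps -> [Package] -> [Package]
indirectlyBroken deps direct =
  filter (`notElem` direct) (transitivelyBroken deps direct)
```

For example, with `lib/b` depending on `lib/c` and `app/a` depending on `lib/b`, a broken `lib/c` makes the other two indirectly broken.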

This does not matter a great deal, it's just noise. A little irritating, but not a problem I run into often. It would be better if the Elm package server did not show broken stuff. The ideal solution would be an append-only log of packages, completely immutable, that never breaks. That would mean self-hosting the package binaries rather than relying on GitHub to serve them, which will come with a $ cost.

The detailed JSON contains that information, and the list of packages broken by other packages will be much shorter, ~15 to 20 packages.

The type is declared this way:

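```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}

import qualified Data.Aeson as Aeson
import Data.Text (Text)
import GHC.Generics (Generic)

-- ElmPackage is defined elsewhere in the scanner (not shown); it needs
-- Eq, Show and ToJSON instances for the derivations below.

data FailedCompilation = FailedCompilation
  { package :: ElmPackage,
    reason :: FailureReason
  }
  deriving stock (Eq, Show, Generic)
  deriving anyclass (Aeson.ToJSON)

-- Why a package could not be installed.
data FailureReason
  = CorruptPackageData ElmPackage
  | ProblemDownloadingPackage
  | InstallationFailed Text
  | OtherReason Text
  deriving stock (Eq, Show, Generic)
  deriving anyclass (Aeson.ToJSON)
```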
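Given that type, splitting the scan results into the two lists could look roughly like this. It is a self-contained sketch with simplified stand-ins (ElmPackage as a plain name, String instead of Text, no Aeson), and it assumes that the package inside CorruptPackageData is the dependency at fault; that reading of the constructor is a guess, not something stated in the post. `causedByOtherPackage` and `splitFailures` are hypothetical names.

```haskell
import Data.List (partition)

-- Simplified stand-ins for the scanner's types: ElmPackage is reduced to a
-- plain name, and the Aeson/Generic deriving clauses are omitted.
type ElmPackage = String

data FailureReason
  = CorruptPackageData ElmPackage
  | ProblemDownloadingPackage
  | InstallationFailed String
  | OtherReason String
  deriving (Eq, Show)

data FailedCompilation = FailedCompilation
  { package :: ElmPackage,
    reason :: FailureReason
  }
  deriving (Eq, Show)

-- Assumption: CorruptPackageData carries the package whose data is corrupt,
-- so a failure is "caused by another package" when that field names a
-- package other than the one being installed.
causedByOtherPackage :: FailedCompilation -> Bool
causedByOtherPackage (FailedCompilation pkg (CorruptPackageData culprit)) = culprit /= pkg
causedByOtherPackage _ = False

-- Split scan results into (direct, indirect) breakage.
splitFailures :: [FailedCompilation] -> ([FailedCompilation], [FailedCompilation])
splitFailures fs =
  let (indirect, direct) = partition causedByOtherPackage fs
   in (direct, indirect)
```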

Initially I thought I would need to get some data from GitHub as well, and that this would become a webapp indicating problems and causes based on regular scans (backed by GH Actions and GH Pages), but now I don't think there are that many hard-to-understand problems.

Mostly it's just a bunch of guys who stopped maintaining their own stuff and broke it by renaming the package or their username, or by moving the tags. It doesn't look worth making a webapp for them. A CLI would be enough. :sweat_smile:


But it kind of feels sad that 9% are broken, regardless of the reason. That’s why I posted it.

Even if the Elm package registry is completely backed by GitHub, maybe it should periodically drop packages that users can no longer install. There's no reason to keep them listed for 5 years. It's not hard to do, so why keep the unusable trash?


From what I can tell, most of these do get re-published, either by the author under their new name, or by others (sometimes multiple times, which is a different problem), so it’s not like all those became unusable. Those that still haven’t been republished I would imagine were not used by anyone anyway.

I do agree that it's not nice to keep seeing these broken packages listed. If I understood things correctly, the reason for not removing them is that the registry of packages is an append-only “database” (quotes because I think it's just a JSON file under the hood), which lets tools cache the list of packages and fetch only the new ones.

Here, you can get the list of all package releases (for 0.19 and 0.18 if I’m not mistaken) in order:
https://package.elm-lang.org/all-packages/since/0

If you compute the length of that and discover it's, say, 15500 publications, then next time you hit the registry you can ask only for the publications since then by querying https://package.elm-lang.org/all-packages/since/15500.
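The client side of that scheme fits in a few lines of Haskell. This is a sketch, assuming each entry is a string like "author/project@version"; `Cache`, `sinceUrl`, `applyUpdate` and `parseEntry` are hypothetical names, and no HTTP request is made here. The only state a tool has to keep between runs is how many publications it has already seen.

```haskell
-- A client-side cache of the registry's publication log.
data Cache = Cache
  { seenCount :: Int,    -- how many publications have been fetched so far
    releases :: [String] -- entries such as "author/project@1.0.0"
  }
  deriving (Eq, Show)

-- URL asking only for publications newer than the cached ones.
sinceUrl :: Cache -> String
sinceUrl cache =
  "https://package.elm-lang.org/all-packages/since/" ++ show (seenCount cache)

-- Append the newly returned entries and advance the counter; this sketch
-- does not rely on the order the server returns them in.
applyUpdate :: [String] -> Cache -> Cache
applyUpdate new (Cache n old) = Cache (n + length new) (old ++ new)

-- Split an "author/project@version" entry into its name and version.
parseEntry :: String -> Maybe (String, String)
parseEntry entry =
  case break (== '@') entry of
    (name, '@' : version) | not (null name), not (null version) -> Just (name, version)
    _ -> Nothing
```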

If you remove some of these, the count becomes wrong and tools could end up missing some packages. That would lead them to use .../since/0 instead, which is more work for the server (which costs money, so the less work, the better).
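To see why deletions hurt, here is a toy model in Haskell. It assumes the server keeps publications oldest-first and answers since/n by skipping the first n entries; both are assumptions for illustration, and `since` and `missed` are hypothetical names. A client that cached three entries asks for since/3, so if one old entry is deleted and a new one is published, the log is still three entries long and the new release is never returned.

```haskell
-- Server-side publication log, oldest first (an assumption of this model).
type Log = [String]

-- Model of /all-packages/since/n: everything after the first n entries.
since :: Int -> Log -> Log
since = drop

-- Entries on the server that a client holding `cached` does not have and
-- will not receive when it asks for everything since its cached count.
missed :: Log -> Log -> Log
missed cached server =
  [ e
  | e <- server
  , e `notElem` cached
  , e `notElem` since (length cached) server
  ]
```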

I can imagine that new types of entries could be introduced in this list to say that package X was removed or deprecated, but that’s more work (that existing tools might not handle well) including more design work.
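One hedged sketch of such a new entry type, in Haskell: `Removed` is a hypothetical tombstone, not something the registry serves today, and `liveReleases` is likewise a made-up name. Removals are appended instead of deleting lines, so the log only grows and since/n counting keeps working; clients replay the log to get the releases that are still listed.

```haskell
import Data.List (foldl')

data Entry
  = Published String -- "author/project@version"
  | Removed String   -- hypothetical tombstone for a withdrawn release
  deriving (Eq, Show)

-- Replay the append-only log to get the releases that are still listed,
-- oldest first. Later entries win, so a tombstone hides an earlier
-- publication and a re-publication brings it back.
liveReleases :: [Entry] -> [String]
liveReleases = reverse . foldl' step []
  where
    step acc (Published r) = r : filter (/= r) acc
    step acc (Removed r) = filter (/= r) acc
```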


it’s not like all those became unusable

Old ones do become unusable. The criterion is simple: there is a published package, and you can't install it.

I agree that it's not a significant problem; it's just that 9% of the registry is trash. Not a big deal.

But honestly I don’t buy the argument that there are technical limitations. Come on, it’s 2024, and we’re talking about “deleting dead packages is expensive, we may just keep them forever instead”.

Anyway, I’m just a passenger on this train and I make no decisions. And I’m grateful that it ever works. :man_shrugging:t2:


Very valuable! (Thank you.) I see that I have two packages to fix.


@jxxcarlson
Yeah, elm-stat is what made me dig into this problem in the first place.

https://github.com/jxxcarlson/elm-stat/blob/6.0.2/elm.json
depends on “gampleman/elm-visualization”: “2.0.0 <= v < 3.0.0”,

https://github.com/gampleman/elm-visualization/blob/2.1.1/elm.json
depends on “ryannhg/date-format”: “2.0.0 <= v < 3.0.0”

https://github.com/gampleman/elm-visualization/blob/2.4.1/elm.json
depends on “ryan-haskell/date-format”: “1.0.0 <= v < 2.0.0”,
but gampleman/elm-visualization 2.4.1 isn't installed with jxxcarlson/elm-stat 6.0.2.
Instead, gampleman/elm-visualization 2.1.1 is installed.
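Elm's elm.json constraints are half-open ranges. A small Haskell sketch of checking them (the parser and the names `Version`, `Constraint`, `parseConstraint` and `satisfies` are written for this illustration, not taken from the compiler) shows that both 2.1.1 and 2.4.1 satisfy elm-stat's `2.0.0 <= v < 3.0.0`. So the range alone does not force the older release; presumably another dependency in the solution rules 2.4.1 out.

```haskell
-- An Elm version such as 2.4.1, compared component by component.
newtype Version = Version (Int, Int, Int)
  deriving (Eq, Ord, Show)

-- A constraint of the form "lower <= v < upper", as written in elm.json.
data Constraint = Constraint Version Version
  deriving (Show)

-- Split "2.4.1" into ["2", "4", "1"].
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

parseVersion :: String -> Maybe Version
parseVersion s =
  case splitDots s of
    [a, b, c] | all isDigits [a, b, c] -> Just (Version (read a, read b, read c))
    _ -> Nothing
  where
    isDigits w = not (null w) && all (`elem` ['0' .. '9']) w

parseConstraint :: String -> Maybe Constraint
parseConstraint s =
  case words s of
    [lower, "<=", "v", "<", upper] -> Constraint <$> parseVersion lower <*> parseVersion upper
    _ -> Nothing

-- Lower bound inclusive, upper bound exclusive.
satisfies :: Constraint -> Version -> Bool
satisfies (Constraint lower upper) v = lower <= v && v < upper
```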

I tried upgrading elm-stat's dependency on gampleman/elm-visualization, hoping to make a PR, but some other dependencies became incompatible, so I sadly gave up, uninstalled the library, and switched to counting problematic packages and collecting failure reasons.


It seems that most of the issues come from renamed user accounts, either directly (the package author's own account) or indirectly (the account behind a dependency). Are there other reasons?

Here are the reasons for failure I encountered when building all the packages:

  • Account deleted/renamed
  • Version tag deleted, or recreated on a different hash from the original publish point
  • Kernel code in package (published with a hacked compiler)
  • Windows-only build on a case-insensitive file system (MODULE.ELM won't build as Module.elm on a case-sensitive file system)
  • Invalid elm.json
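The case-sensitivity failure can be caught before publishing. A hypothetical Haskell helper, assuming the usual mapping of a module name like `Foo.Bar` to the file `Foo/Bar.elm` relative to the source directory (`expectedPath` and `caseOnlyMismatch` are made-up names):

```haskell
import Data.Char (toLower)

-- Path the compiler expects for a module, e.g. "Foo.Bar" -> "Foo/Bar.elm".
expectedPath :: String -> FilePath
expectedPath moduleName = map dotToSlash moduleName ++ ".elm"
  where
    dotToSlash '.' = '/'
    dotToSlash c = c

-- True when a file matches its module name only on a case-insensitive
-- file system (the MODULE.ELM vs Module.elm situation).
caseOnlyMismatch :: String -> FilePath -> Bool
caseOnlyMismatch moduleName actual =
  actual /= expected && map toLower actual == map toLower expected
  where
    expected = expectedPath moduleName
```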


I don't think it's a technical limitation. It seems more likely to me that it's an available-effort limitation or a cost limitation: this package system was just what Evan could get done to support the compiler, and I think it is a good first effort, but also obviously just that, a first pass.

Deleting dead packages would require some kind of backend process that monitors the packages for breakages. To confirm a breakage, it would likely run the compiler on a package to test it. So the system would also need some compute on the backend to achieve this. The current system looks like a webapp backed by a simple database.

To give an example, when you publish a package you only run elm make locally. There is no server-side check at all. Hence it is possible to build a package on a case-insensitive file system that will not build on other systems. You could have locally modified packages in ELM_HOME that let your build succeed but fail for other people. And so on.

Anyway, that is what this unfinished project of mine was all about: GitHub - eco-pack/eco-server: Alternate Elm package server

A better way would be to never delete packages, by making a system that works as an append only log. So things can never be removed and are only inserted into the log when they have been built in a controlled CI environment that confirms they will work for everyone when compiling against the current log history.
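A minimal Haskell sketch of that admission rule. The CI build is passed in as a function, because what it would actually run (presumably the compiler in a clean environment) is outside this model; `Release`, `Log` and `publish` are hypothetical names.

```haskell
-- A release proposed for publication.
data Release = Release
  { name :: String,   -- "author/project"
    version :: String -- "1.0.0"
  }
  deriving (Eq, Show)

-- Append-only log: releases are only ever added, never rewritten.
newtype Log = Log [Release]
  deriving (Eq, Show)

-- Accept a release only if it was never published before and the
-- (hypothetical) CI build check passes against the log as it stands.
publish :: (Log -> Release -> Bool) -> Log -> Release -> Either String Log
publish buildsAgainst logSoFar@(Log rs) r
  | r `elem` rs = Left "already published; releases are immutable"
  | not (buildsAgainst logSoFar r) = Left "build failed in CI; not added to the log"
  | otherwise = Right (Log (rs ++ [r]))
```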


It seems that @VladimirLogachev has done the majority of the work here, and it could be a purely manual process rather than an automated system: once a year (or however often), identify the invalid packages and delete them from the index. I've done this a few times for Gleam and it took less than five minutes, so I'll probably never automate it.


I don’t think that’s better. In any human system you’re going to need to delete stuff eventually. For instance the moment someone creates a package containing illegal content, etc.


Initially, I was going to compose the following setup:

  • scan runs on cron schedule on GH Actions (free)
  • after each scan, a web app is re-deployed on GH Pages (also free)
    So that everyone could see up-to-date info about broken packages and dependencies in their browsers.

I can hack this setup together in a day.
But it would only make sense if these stats were actually used.
From May 17 to 19, 4 packages were already fixed. Thanks! :pray:

But there are more packages to unpublish than to fix.


Ok, there might be situations where you really have to, but I don’t think they would happen very often. Maybe someone publishes code that infringes a patent or copyright for example. Has that ever happened yet to any published Elm package?

I am trying to think of ways that a package system could be made that guarantees repeatable builds and is never affected by account deletions/renames, GitHub threatening to change the hashes, etc.

Even the TodoMVC app has a “Delete” action.
https://evancz.github.io/elm-todomvc/

I usually recall this video when thinking of my own… hmm… technology choices, but this time I recalled it from this discussion :blush: I brought it here because it’s so fun to see, like a giant mirror, with enough space for most of us.

I probably wouldn’t bother seeing as it only takes a couple minutes to delete some records from a database. You’d need to do that a lot of times to make back the time you spent automating it, and this is something where you likely want a human to check over carefully before taking action so it’d still be manual.

The BEAM’s Hex does quite well here. They’d be good to talk to about this.
