Towards more reliable CI

Context

It is possible to make CI much more reliable by caching the ~/.elm directory. The directory contains:

  1. Immutable files. All package content is hashed on publication, and a download must match that hash exactly, so the files are identical for everyone.
  2. Deterministic artifacts. Certain build artifacts can be pre-compiled for each package, so those are also stored in ~/.elm. They are derived deterministically from the immutable files above.

Caching these files means that you basically never make HTTP requests to package.elm-lang.org or to github.com, so your builds will be faster and more reliable.
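
As a rough sketch, a CI job only needs to cache the directory that ELM_HOME points to, keyed on elm.json. Here cache_restore and cache_save (and the paths) are hypothetical stand-ins for whatever mechanism your CI service provides:

export ELM_HOME="$HOME/.elm"                       # where Elm keeps packages and build artifacts
KEY="elm-home-$(sha256sum elm.json | cut -c1-16)"  # key changes only when dependencies change
cache_restore "$KEY" "$ELM_HOME"                   # hypothetical CI helper
elm make src/Main.elm --output=elm.js              # downloads packages only on a cache miss
cache_save "$KEY" "$ELM_HOME"                      # hypothetical CI helper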

Help with Instructions

This repo is an effort to centralize instructions on how to set up ELM_HOME caching with different CI services.

If you have expertise with a CI system that is not covered, please share a draft recommendation for that system!

Future Plans

It seems that it is fairly easy to detect if elm is running on CI (like here and here) so it may make sense to provide a BIG WARNING in CI when packages are getting downloaded. (This should only happen when new packages are added, so very very rarely.) So it makes sense to try to get the setup instructions in good order before taking more concrete steps in this direction.

24 Likes

Great effort, thanks a lot.

Anyway, I think there might be a problem with some of the recommendations. I’ve opened an issue in the repository where we can clarify this.

I think it’s fair to give a heads up here in case people have already applied some of these settings to their projects: if you see weird behavior, go look at the issues in the repo.

1 Like

I think it’s worth mentioning Nix in this context as well. Nix builds run in a sandbox without any network access. So when you use Nix to build your Elm code, avoiding network access at build time becomes a necessity rather than a “nice to have” for added stability and performance.

The announcement post for elm2nix makes some excellent points about what Elm could do better to enable Nix and other “build planning” software to work with the Elm compiler more easily:

[I]t would be ideal if there would be [an] elm.lock file or similar with all dependencies pinned including their hashes

Ideally instead of committing [versions.dat] to [the] git repository, one would be able to point to an url that would present [this] binary file pinned at some specific time - allowing it to always be verifiable with [an] upfront known hash.

I think Elm’s built-in package management provides a great beginner experience already! So with limited resources, I would say time is better spent on exposing information, documenting file formats, etc. This way, dedicated third-party build systems (like Nix or more conventional CI pipelines) can plug in easily once a project is ready to “graduate” from the exploratory phase, and the Elm core team can concentrate on other things.

How I use Nix in my Elm projects and Enabling pure Nix builds including elm-review & elm-codegen have more discussion of building Elm with Nix if folks are curious!

3 Likes

What would elm.lock have that isn’t covered by elm.json? The point of "direct" and "indirect" is that the whole transitive dependency DAG has explicit versions. Is it like, “when elm tries to verify the deps for an application, it may try to download something anyway?”

Situation

Usually the primary purpose of a *.lock file is to record which versions the tooling (the package manager) resolved when it installed the dependencies. Since resolution can be time-dependent (it may resolve different versions at different points in time, depending on the state of the registry), it makes sense to allow for some reproducibility. But if resolution is not time-dependent, a lock file largely loses its utility.

When you have both files, the dependency declaration and the lock file, you get some reproducibility. If the lock file is used, you are supposed to get exactly the same thing next time as you did last time. If you remove it (or ignore it), you let the tooling resolve new versions based on the current state of the registry (an update).

With Elm the situation is a bit different, since Elm distinguishes between a package and an application. A package declares dependency ranges, while an application pins dependencies to exact versions. Semantic versioning is also enforced through the integration between type checking and packaging.
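
For illustration, here is roughly how the two shapes differ (abbreviated elm.json fragments; the versions are just examples).

A package declares ranges:

  "dependencies": { "elm/core": "1.0.0 <= v < 2.0.0" }

An application pins every dependency, direct and indirect, to an exact version:

  "dependencies": {
      "direct": { "elm/core": "1.0.5" },
      "indirect": { "elm/json": "1.1.3" }
  }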

Elm does verify hashes (the kind usually stored in a lock file); it just doesn’t store them in a lock file itself (though I believe they are part of registry.dat?). It comes with the promise that the registry itself is immutable, so a version is all you need to identify exactly what you expect to get. The hash can live in the registry snapshot. And to confirm that you did indeed get the right bits from GitHub, you just compare the hash from the immutable registry snapshot with the zip you got from GitHub.


My Thoughts

To me it seems that adding a lock file doesn’t really solve any real problem for Elm, other than perhaps making it simpler to write tools that understand which versions the compiler resolved. It would already be possible to write elm2nix so that it doesn’t need the registry.dat binary file committed; it would just need to read and understand the data in that file and then synthesize it during the build. There are other limitations in elm2nix as it exists now; it doesn’t support multiple source directories, for instance. It was simply done on a good-enough basis rather than comprehensively implementing the best thing that does what Elm does.

The philosophy Elm takes is to look at the big-picture problem and then do what seems best, in the best way possible, with the information it has, even if that means doing its own thing that no other tool does. This avoids problems from occurring, but it also means that, when building tooling, Elm seems to “break the principle of least surprise”. Tooling that tries to replace it in part (elm2nix in this case) has to rediscover the things specific to what Elm could do and thus does. Whether it’s better for Elm to make tool building simpler, or whether tool makers should just do the right thing (what Elm does) if they’re going to attempt to replace it (in part), is a real question. Should an upstream project be solving problems for downstream projects? The answer is almost never a definitive yes or no; it always seems to be “well, sometimes yes and sometimes no”.

P.S. I tried to avoid using “the right thing” and other “worse is better” terminology but failed. But it seems somewhat related to that in my head.

Is it like, “when elm tries to verify the deps for an application, it may try to download something anyway?”

That is a concern, but I think it’s secondary.

I think the key benefit of a .lock file is that it tightens the “circle of trust” or “radius of reproducibility” around an application’s build.

As a new engineer or a CI system, I have the same basic task: pull down the code for a project and build it. I want that process to be as deterministic & reliable as possible, and also to require as little trust in outside systems as possible.

As it stands now, Elm code bases typically only include elm.json. I then have to trust the underlying machinery you described at the top of this thread – including both my local elm binary and remote servers run by elm-lang.org and github.com – to resolve the same code for the specified versions of all the project’s dependencies that the project’s authors expected.

If there were a .lock file with checksums in it, then I only have to trust that my local elm binary will enforce those checksums – any upstream corruption / compromise / innocent movement of tags / etc. etc. at either elm-lang.org or github.com will be detected.

And then if the format of that .lock file and the .elm directory is documented, it also lets tools like CI systems and Nix pre-fetch, verify, and cache the dependencies.

(Of course it doesn’t have to be a separate .lock file, it could be a new section inline in elm.json or various other approaches. The important thing is that it’s documented and in a language-independent format like JSON or YAML. And the format of the .elm directory could remain opaque if there were an elm cache add command or similar, in the spirit of npm cache add.)

In my project, I keep a copy of registry.dat and an elm-srcs.nix in my repo and use the elmPackages.fetchElmDeps Nix function. Together these achieve the same result I could get with an “official” lock file. But I have to make guesses and assumptions about what registry.dat is, and after reading this thread, I’m guessing elm-srcs.nix is probably duplicating information that’s already present in registry.dat, but in an opaque form.

Just to be completely clear, I no longer remember exactly where the hashes are stored; it’s been a while since I looked into that code. But they are stored for sure: this is why package installation broke when GitHub changed how it archives zips.

There is also a completely different way Elm could make life simpler for external tooling. It could keep doing what it already does, but provide some API (even just a CLI option) to expose it. For instance, there could be an elm packages --list command which would spit out all the helpful information to stdout. This would simplify support for libraries in elm2nix a lot, for instance. Do you think that would work for you?


P.S.

So it’s not within registry.dat. There is an API for getting the “locked” zip (hash + URL to download from).

It would have hashes. See this elm.lock for an example.

Why do we need hashes?

Nix derivations are supposed to be pure and reproducible. As such, all inputs to a Nix derivation must be traced back to the Nix store. In particular, we can’t use the global Elm cache as an input to our Nix derivation. So what we do instead is figure out ahead of time all dependencies our application is going to need and build a cache containing just those dependencies.

To build the cache we need to fetch the dependencies during our build. The problem is that Nix doesn’t give you access to the network during a build. However, if you use a fetcher which returns a fixed-output derivation, then Nix will allow you to access the network.

All the fetchers require you to know the hash ahead of time. For example, this is where the hash is used in a fetchzip call.

How do we currently get the hashes?

We use a tool called nix-prefetch-url to get the hash ahead of time.

Here’s an example call:

nix-prefetch-url https://github.com/elm/json/archive/1.1.4.tar.gz --type sha256 --unpack

Here’s what happens at a high level.

We read the dependencies from elm.json (dependencies.direct, dependencies.indirect, test-dependencies.direct, and test-dependencies.indirect) and collect them into a set. Then we loop over the set and call nix-prefetch-url on each one to get its hash.
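
A minimal sketch of that loop, assuming jq and nix-prefetch-url are available and that every dependency lives at the usual github.com archive URL:

jq -r '(.dependencies.direct + .dependencies.indirect
        + .["test-dependencies"].direct + .["test-dependencies"].indirect)
       | to_entries[] | "\(.key)@\(.value)"' elm.json |
  sort -u |
  while IFS=@ read -r pkg version; do
    # print the sha256 the fixed-output fetcher will expect for this package
    echo "$pkg@$version $(nix-prefetch-url "https://github.com/$pkg/archive/$version.tar.gz" --type sha256 --unpack)"
  done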

Do we need support from the Elm compiler?

I don’t think so.

Firstly, when I rewrote elm2nix I decided to use fetchzip, because we need the decompressed contents anyway, so why not decompress ahead of time. However, cachix/elm2nix uses fetchurl. The observable difference is that we end up needing different hashes, because I hash the contents while they hash the .tar.gz file. So the question becomes: which hash should the Elm compiler provide?

Secondly, nix-prefetch-url and fetchurl work together in tandem, as do nix-prefetch-url --unpack and fetchzip. If Elm is going to produce a hash for us to use in Nix land, then it has to reuse the algorithm used by one of those pairings.

I’d have to play it out in some more detail, but my gut instinct is that could be made to work, yes.

If there were a pair of elm packages --list and elm cache add commands, I think we could solve both of these questions by designing them to work together. For example, we could choose a standard hash algorithm (most likely sha256 or sha512), have packages --list output a URL and the hash of that URL’s content in the chosen algorithm, and have cache add accept that URL’s content as input.
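
To make the pairing concrete, a CI script could then look something like this (both commands are hypothetical, as proposed above, and the “one URL plus hash per line” output format is an assumption):

elm packages --list |                               # hypothetical: "<url> <sha256>" per dependency
  while read -r url hash; do
    curl -sSL "$url" -o package.zip
    echo "$hash  package.zip" | sha256sum --check - || exit 1   # refuse anything that does not match
    elm cache add package.zip                       # hypothetical: verify and unpack into ELM_HOME
  done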

Whenever a package is published, I capture the hash of the source code at the time of publication. That information is available from a URL like this:

https://package.elm-lang.org/packages/elm/core/1.0.5/endpoint.json

For any package, you can swap in the author, name, and version. You get the following info:

{
  "url":"https://github.com/elm/core/zipball/1.0.5/",
  "hash":"9288a7574b778b4ebc6557d504a0b16c09daab43"
}

I check the hash on download, but I do not believe it is stored after that. In 0.19.1, if it finds the expected cached files for elm/core 1.0.5 in ELM_HOME it continues happily along.
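
For scripting, that lookup works for any published package by substituting the three path segments:

author=elm; name=core; version=1.0.5     # any published package and version
curl -s "https://package.elm-lang.org/packages/$author/$name/$version/endpoint.json"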

As for the CLI commands you are discussing, (1) the full list of packages and versions needed to build an application is already listed explicitly in elm.json, and (2) the way to put something in the cache is to place files in ELM_HOME.

Both should be doable without any modifications to elm, although these are definitely “internal details” at this time. I do not make any guarantees that the details of build artifacts will be stable from version to version, even in patch versions. So even though I think what you describe is doable at the moment, I can see a case for getting together an implementation pathway that has better properties when it comes to version upgrades.

Really, the only use case such a flag would have is to expose the version resolution Elm does internally for libraries, in a format other tools can just consume.

But even inside nixpkgs we don’t get that much value from packaging an Elm library as a Nix derivation. Usually people care about packaging applications written in Elm; even search.nixos.org is an Elm application. And these have versions pinned down to the patch, and we know the registry is immutable, so that should be all the input we need. So overall it might be a very low-value proposition.

The truth is that we could already be doing much better things even with elm2nix, and we don’t really need anything in the compiler to change to do so. The only reason we don’t is a combination of people not having time and a hesitation to depend on too many internal details which, as you say, might change.

My feeling is that it would be unfair to demand that the compiler add things for us at this point. We should really first do more on our end and let things crystallize.

P.S. When I say elm2nix I mean the original cachix project. It’s quite possible that @dwayne’s fork addresses most of the limitations.

Thanks, that detail is helpful to have written down!

In 0.19.1, if it finds the expected cached files for elm/core 1.0.5 in ELM_HOME it continues happily along.

I have noticed that I also need to have a current copy of registry.dat; otherwise a build inside the nix sandbox fails with:

-- TROUBLE VERIFYING DEPENDENCIES ------------------------------------- elm.json

I could not connect to https://package.elm-lang.org to get the latest list of
packages, and I was unable to verify your dependencies with the information I
have cached locally.

Are you able to connect to the internet? These dependencies may work once you
get access to the registry!

Note: If you changed your dependencies by hand, try to change them back! It is
much more reliable to add dependencies with elm install.
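
A workaround sketch, assuming the 0.19.1 cache layout (an internal detail that may change between compiler versions): place a pre-fetched registry snapshot next to the cached packages before building.

mkdir -p "$ELM_HOME/0.19.1/packages"
cp ./registry.dat "$ELM_HOME/0.19.1/packages/registry.dat"   # a pre-fetched copy of the registry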

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.