Should be back as of about 8 hours ago, around when @Janiczek posted. I shared the following info on a GitHub issue about the 502.
This has happened once before, but at a better time of day. That time, I spent the next few days setting up monit so that the server could restart itself.
In this case, it looks like the server went down at the worst possible time for a US-based admin. We had monit set up to restart the server if this ever happened, but it ran into trouble. It seems that restarting needed the HOME environment variable to be defined, but it was not defined in whatever context monit was using to run the restart script.
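To illustrate that failure mode, here is a minimal Python sketch of a restart wrapper that fills in HOME when the calling context has not defined it. The function name `restart_env` and the fallback path are hypothetical, made up for this example; this is not the script the server actually runs.

```python
import os


def restart_env(base_env, fallback_home):
    """Return a copy of base_env that is guaranteed to define HOME."""
    env = dict(base_env)
    # A supervisor may launch commands with a reduced environment,
    # so only fill in HOME when it is absent or empty.
    if not env.get("HOME"):
        env["HOME"] = fallback_home
    return env


if __name__ == "__main__":
    # Hypothetical fallback path. A real wrapper would pass this env on,
    # for example via subprocess.run(command, env=env).
    env = restart_env(os.environ, "/home/example")
    print(env["HOME"])
```

An existing HOME is left untouched, so the same wrapper behaves identically when run by hand from a normal shell.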
Anyway, once US work hours started again, we got the server back up, then figured out what was preventing monit from working. So the issue reported here should be resolved as of US morning.
Additionally, I have added a new admin for the site who lives in Europe. Between these two things, I think the window of possible downtime should be a lot smaller. There will be an Elm event in Japan relatively soon, so perhaps we will be able to find someone there to add, further limiting the amount of time that is not covered by people at work.
Making CI more resilient
For folks who had trouble with CI, it is possible with most systems to cache the ~/.elm directory such that you do not need to talk to package.elm-lang.org or github.com on normal builds, only when a new package dependency is added.
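One common way to get this behavior is to key the cache on the contents of elm.json, so the saved ~/.elm is reused until that file changes. The Python sketch below shows the idea only; the function name `elm_cache_key` and the `elm-home` prefix are hypothetical, and most CI systems offer a built-in equivalent that should be preferred.

```python
import hashlib


def elm_cache_key(elm_json_bytes, prefix="elm-home"):
    """Build a cache key that changes whenever elm.json changes."""
    digest = hashlib.sha256(elm_json_bytes).hexdigest()
    # A shortened digest keeps the key readable in CI logs.
    return f"{prefix}-{digest[:16]}"


if __name__ == "__main__":
    # Stand-in content; a real build would read the project's elm.json.
    sample = b'{"type": "application", "dependencies": {}}'
    print(elm_cache_key(sample))
```

Any edit to elm.json produces a new key, not just a dependency change, so this errs on the side of refreshing the cache slightly more often than strictly needed.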
People who had this configured did not experience any downtime on CI. I am asking community members to blog about this and make sure any default configurations out there are doing this.
We are hopeful the monit fix and the European admin will help with downtime, but configuring CI to save ~/.elm/ is a great way to be resilient to downtime elsewhere and to reduce build times in general.