Core library fixes and improvements: Part 2

So I said I would do some analysis on bugs on the compiler and core packages in order to help inform disucssion around how we can move forward with this. I was thinking I would have time over the xmas holidays, turns out I had even less free time then vs when I am working! Anyway, I now have a spreadsheet organized in a way that enables me to communicate my findings:

Pull Requests

I analysed anything people asked me to, plus all the outstanding PRs on core packages and the compiler. It made sense to focus on the PRs since there are less of them than the open Issues, they consist of proposed solutions so I can be precise in understanding how they would affect the overall system, and since someone bothered to make a PR the issue is obviously important to someone as well as possible to fix in a relatively small amount of code.

But… The main reason to focus on PRs is that is where the process gets stuck. You can discuss issues, get help with workarounds, raise github issues, develop solutions. Once something is submitted as a PR it mostly seems to get stuck there - there is no working process to get your contribution into Elm.

How bad is the problem though? How much unmerged good stuff is out there?

Categorisation of PRs

I categorised the PRs from the most trivial ones (doc fixes), which have zero affect on the overall system, to the most invasive ones, which affect the semantics of the Elm language itself in ways that break compatability with 0.19. The categories in order of increasing impact are:

  1. Doc Fixes
  2. Performance Improvements
  3. Runtime Bug Fixes
  4. API Changes to Implementation
  5. API New Functions
  6. API Changes
  7. Language Semantics

It is entirely possible that we could fix things in levels 1 - 4 as a community release branch of Elm without breaking compatability with the official 0.19 release.

Level 5 does not break compatabilty, but new code relying on it would not be able to revert to the official release unaltered.

Levels 6 - 7 break compatability and are heading more in the direction of a fork, as opposed to a release branch.

Counts of number of issues in the sheet given below.

Doc Fixes - 68

Its great to improve the docs and there is plenty good stuff in here. However, fixing the docs is not really what we have in mind when we think of making progress with the issues that are causing us problems.

Performance Improvements - 13

Nice to have perf improvements too.

Runtime Bug Fixes - 11
API Changes to Implementation - 26

These 2 categories are where I thought I would find the most interesting stuff. These are the real bugs in the implementation and they are fixable without breaking things. Generally these fixes are very localised and deal with the rough edges that people have found in the wild.

There are only 37 items in these 2 categories which reflects on the very high quality of Elm 0.19. It is also likely there are more issues that could be fixed in these categories but no PRs have ever been submitted because potential contributors have been put off by the lack of a functioning process.

API New Functions - 19

Many of these are nice-to-haves rather than essential fixes. Almost all can be worked around in the usual ways by using a port or other javascript tricks.

Worth mentioning the app.kill one, which is a frustrating issue for some people. It might be a bit misclassified here, as its a JS API rather than an Elm one.

API Changes - 2

Language Semantics - 4

Compiler Stuff - 24

I didn’t include compiler PRs in the other categories, since compile time is different to runtime, and preserving compatability of the runtime and language is the primary focus of my analysis.

=========

Original thread:

29 Likes

My proposal is simple: I will create GitHub forks of all core packages at their current published versions and accept patches at levels 1 to 4. Consuming these alternative packages will require you to manually install them (GitHub - pdamoc/elm-pkg-patch: Simple example on how to patch elm packages).

This is the minimal process for getting beyond where we are currently stuck, without breaking anything.

12 Likes

Wow! You have made an awesome work! One question? How is the review process going to work for the patches? In other words, how can we assess the quality of the PR to merge? Should there be an open review/discuss/improve/vote process or something equivalent? I cannot stop thanking you for taking your time with this situation!

2 Likes

I will try and put some thought into that. I don’t plan to rush into this, but I think I will just review (already done it really) and merge in a few PRs myself to get a feel for how it will go. Maybe then write up a short guide that explains my way of working. After that anyone who wants to help review and merge patches would be welcome to join in. It would be up to whoever gets involved to determine how much discussion is needed - maybe an occasional online call plus keeping a conversation going on the Discord server.

I think limiting the work to level 1 - 4 will take the heat out of any discussions; its usually pretty obvious where things fit in my classification.

1 Like

I think it is a very good approach. Thanks again

Thanks for the effort you’ve put into this @rupert !

I think keeping the high level of overview clarity both helps move things forward constructively, and also results in high quality input for Evan to more easily ingest and consider these items in future when he cycles through these areas again.

I look forward to jumping in and assisting where it makes sense once you’re ready for that! :slight_smile:

The initial list of PRs are already on the various core packages (obviously!), so I think GitHub is already providing a nice overview of items to consider merging.

Its actually something I am wondering about though, because I will need to keep a very clear log of what I merge. Also I think its possible that PRs and Issues will get created directly on the forks, in which case I want to be able to communicate that to the upstream projects. I haven’t figured out yet exactly how this will work, which is partly why I will just have a go at working on a few issues myself. I feel like I need to explore how GitHub issue tracking is going to help or hinder me. But the intention is to:

  1. Merge only clean patches (some patches have had elm-format run creating large whitespace diffs with the originals). Each patch should be applied in a single commit.
  2. Make sure its easy to understand/find what has been merged.
  3. Upstream all patches.
  4. Comply with license/copyright/etc.
1 Like

Oh I had totally forgot that I created a PR myself (a doc fix). Thanks for doing this work, hopefully it will help having them merged.

Two comments:

I would advocate creating a GH organization in which to host these forks. (Whether the name of that organization may contain the string elm, I don’t know. If not, plenty of fun name ideas here.) I think it will contribute to the community’s confidence to see a semi-official organization rather than having these patches aggregated under an individual user’s GH account.

The branches that you merge should all be built directly off the tip of the last official release in the main official repo. This way you can ensure that everything can be upstreamed easily – and in particular, can be upstreamed independently. If PRs end up being built instead from the ever-growing tip of the forks, then git machinations like cherry-pick become necessary, making it that much less likely that the fixes will ever make it upstream. (In rare cases it may be technically necessary to build one fix on top of another, of course.)

Thanks for continuing to work towards improving the process of maintaining Elm’s core libraries!

1 Like

Good one. I like “frontier”, because the trigger of the fork is policy.

Yes, that is how I am intending to do it. Also to ensure each PR gets put into a single git commit, so each applies or fails atomically.

The idea would be that if a newer official release comes out (say elm/core 1.0.5 → 1.0.6), I would re-apply the set of patches starting from the tip of 1.0.6. Hopefully that would show up as conflicts anything that has been independantly fixed on 1.0.6, and that patch can be retired. It probably won’t be as simple as that in every case and a careful reading of the patch notes on both sides would be needed anyway.

To ensure 1 git commit per patch I would use git merge --squash. Issue with that is that it loses the original authors identity, so is there a better way? I guess not. If the original author made 5 commits and I have to squash them into 1, that 1 squash commit is going to come from me, so git will record me as the author. Probably nothing I can do about that, other than asking the original author to do the squashing.

Also, tests! I notice that there have been some contributions towards getting tests running on the core packages. Not sure what the current status is, but it would be worthwhile trying to get them running and some sort of CI build to check for regressions.

At least GitHub ui allows you to squash while preserving authorship info, so it’s presumably somehow possible in git.

1 Like

Looked into it, and there is a way. You just list them as co-authors in the commit message like this:

Co-authored-by: John Smith <john.smith@example.com>

https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors

I haven’t tested it, but I think this might work too:

git merge --squash --no-commit
git commit --author "John Smith <john.smith@example.com>"

The idea is to merge without committing and then commit changing the author.

You can also amend the last commit before pushing:

git commit --amend --author="John Doe <john@doe.org>"
1 Like

Squashing is going to lead to problems down the road. You’re better off just doing an actual merge. That way upstream can merge the exact same commit and git will be much better able to figure out how to handle future merges. Here is the kind of structure I would encourage you to create:

$ git log --graph --format="%h %s" upstream fork --date-order
* 4b1289f upstream 3.1.0
| *   b2eb7dc Merge branch 'upstream' into fork
| |\  
| |/  
|/|   
* |   39304b0 Merge branch 'fix-c' into upstream
|\ \  
| | *   99fc899 Merge branch 'upstream' into fork
| | |\  
| |_|/  
|/| |   
* | |   fd2ccb0 Merge branch 'fix-b' into upstream
|\ \ \  
* | | | ce94266 upsteam 3.0.0
| | | *   ce3a46f Merge branch 'fix-c' into fork
| | | |\  
| | | |/  
| | |/|   
| | | *   09a1d9b Merge branch 'fix-b' into fork
| | | |\  
| | |_|/  
| |/| |   
| | | * 820d8f3 Merge branch 'fix-a' into fork
| |_|/| 
|/| | | 
| | * | 74245cc fix c, part 2
| | * | 2770bf9 fix c, part 1
| |/ /  
|/| |   
| * | 9b87b84 fix b
|/ /  
| * 26b11da fix a
|/  
* 21bb3c8 upstream 2.0.0
* de20f8c upstream 1.1.0
* b203269 upstream 1.0.0

In this example, fixes A, B, and C were built off of upstream 2.0.0 (21bb). Upstream ignored them while building 3.0.0 (ce94), but then decided to accept only fixes B and C; then 3.1.0 (4b12) was built on top of that.

Meanwhile the fork accepted all three fixes immediately. But it was able to keep current with upstream’s 3.0.0 release via simple merges (99fc and b2eb); the fork hasn’t yet absorbed upstream’s 3.1.0 in this example.

This history may look at first like it’s too tangled for a human to sort through, but in my experience it turns out to be much easier than having the same logical change in multiple squashed commits across forks. And git will help – for example it can show you very succinctly how upstream and fork have diverged:

$ git log --graph --format="%h %s" upstream...fork --date-order --boundary --no-merges
* 4b1289f upstream 3.1.0
| * 26b11da fix a
o | 39304b0 Merge branch 'fix-c' into upstream
 /  
o 21bb3c8 upstream 2.0.0

Or you can get the same info with a bit more context:

$ git log --graph --format="%h %s" upstream...fork --date-order --boundary
* 4b1289f upstream 3.1.0
| *   b2eb7dc Merge branch 'upstream' into fork
| |\  
| |/  
|/|   
| *   99fc899 Merge branch 'upstream' into fork
| |\  
| * \   ce3a46f Merge branch 'fix-c' into fork
| |\ \  
| * \ \   09a1d9b Merge branch 'fix-b' into fork
| |\ \ \  
| * \ \ \   820d8f3 Merge branch 'fix-a' into fork
| |\ \ \ \  
| | * | | | 26b11da fix a
| |/ / / /  
o | | | |   39304b0 Merge branch 'fix-c' into upstream
|\ \ \ \ \  
| | |_|/ /  
| |/| | /   
| |_|_|/    
|/| | |     
o | | | fd2ccb0 Merge branch 'fix-b' into upstream
| |_|/  
|/| |   
| o | 74245cc fix c, part 2
|  /  
o / 9b87b84 fix b
|/  
o 21bb3c8 upstream 2.0.0

(GitHub’s UI hides a lot of the power of the git command line from you. That makes things simpler in the simple case, but maintaining long-lived forks is a more complex use-case, and IMO something that’s often better done from the git command line.)

2 Likes

Yeah, I prefer the command line too. Thanks for this write up, I didn’t quite grasp it on the first read, but I will make my own example project and try it out. I’m definitely a learn-by-doing type.

1 Like

Ok, I get it now.

git merge --squash does not do merge tracking, so its harder to figure out what has been merged or not.

You approach does not squash PRs with multiple commits into a single one. I like the idea of doing that, because when doing a rebase any conflict will fail the entire patch, so avoid partially applying something. But I can use git rebase -i to squash commits on the working branch, and then merge and get the best of both?

My high-level philosophy is that we should strive to create a git commit history that’s helpful for future maintainers: it should produce useful, coherent results for git blame and git bisect (and git log, of course).

That means you want each commit to be a logically separate, self-contained change.

But to me, that only means that most PRs will usually contain a single commit. I don’t think it means that every PR must contain exactly one commit.

One example is a PR that fixes a bug that wasn’t previously covered by test code. I like to break such things into two commits: one that adds a test that captures the current, buggy behaviour of the code; and a second that fixes the bug and changes the test to expect the new, correct behaviour. (This PR is a more complex example where I think it was worthwhile to keep the commits separate.)

If it is necessary to rebase -i to squash things, I think it’s better to ask the PR author to do that in their fork before merging into the main fork. This way, that PR author can still submit the exact same branch to the upstream Elm project in a different PR, and git’s merge tracking will be able to follow what happened more easily.

Ok, I will follow that advice in the case of more complex PRs with multiple commits. Most are just a single commit and tidy enough to merge in. And as you say, if something is bigger than that but has a logical sequence of commits that is better to preserve, will do that.

A few have run elm-format so introduce a large amount of whitespace diffing - generally happy to restore the formatting on those and rebase -i if its an anotherwise good patch. Some PRs have been sitting round for a while and the original authors might have moved on, so we may have to do a little work sometimes to get something accross the line. But would generally try and set the bar and push back on authors to reach it for new ones.

I was thinking single commit makes it easier to identify and reverse a whole patch if the need arises. Say I merge something in and it later turns out to not have been a good idea. But as I always will have the base (core 1.0.5 say), and the PR in another branch, comparing those will yield the entire patch for the purpose of reversing it.

Similary for rebasing the patch set against new upstream releases. Can git merge pr-123 --no-ff --no-commit to check a whole PR for conflicts.

I was going to use my eco-pro github org to do this work in. But as that is linked to my personal idea of an offline version of the compiler and alternative package system, I think it will make more sense to create a new GitHub org just for this, so it is on completely neutral ground.

elm-janitor or elm-maintenance might be good names for it.

2 Likes