Roadmap for internal packages?

Does anyone know where private packages fit in the current roadmap for Elm? We’ve been trying to use Elm at our company, and while the code is very nice and probably the safest part of our app, we’ve had pretty big deployment issues trying to share reusable code as private packages. We used elm-github-install at first, but it’s not being upgraded for 0.19 (https://github.com/gdotdesign/elm-github-install/issues/62). I understand and agree with the reasoning that was provided, but it still leaves us with no option I can see for sharing code other than git submodules.

Submodules are usable of course, but there are a lot of potential pitfalls (mainly the need for tooling and company-wide education about how git submodules work), and we can’t control package dependencies directly or maintain an official package API, since the package is just treated as source code.
It feels like I’d have to create special tooling to handle internal Elm packages, without benefiting from the actual packaging system.

So I get why elm-github-install is getting deprecated, but it feels like there’s not a strong alternative.

To state the problem more succinctly:

  1. My company wants to use Elm and reuse private code (and it needs to be private).
  2. I want to hook into the normal Elm ecosystem and treat this as the normal package that it is without using hacks.
  3. With elm-github-install no longer a viable option, submodules seem to be the only choice, but submodules include their own issues with safe company adoption and they don’t encourage well-designed packages the way that the normal Elm package system does.

What’s the expected roadmap with regard to this issue? Is it something that we could help with in any way?

These are what I see as the legitimate use cases for private packages that elm-github-install served:

  • Private packages. Some of us are working for corporates and the code we write is not ours to give away under an open source licence. We want to practice good software development lifecycles on shared code within the organization, by managing it as packages.

  • Overriding kernel packages on the local filesystem, for experimental work on them or simply to educate myself about how they work (sometimes the deeper details become important), and to be able to fix bugs in them and submit patches.

  • Overriding other packages for the same purposes as above.

I’m not interested in using elm-github-install to work around kernel code sharing restrictions. I tried that, and the problem with it is that you end up with good work that you cannot share as a proper part of Elm.

I should add that there is a positive side to not having private packages. If you extract some code that you want to re-use there is a big incentive to open source it and share it in order to do so.

There is also a downside to that. It is tempting to open source and share code that is of absolutely no use to anyone else, simply to be able to share it within your own org. My the-sett/auth-elm package is an example of where I did that; that code only works against my auth service.

Yes, that second case is what we’re trying to avoid. Bringing the code up to proper package standards isn’t our problem; it’s more that the code is centralized and frequently reused, but literally serves no purpose beyond our own business needs.

But also, bringing it up to proper package standards is beneficial to your business, as then you have a nicely documented package with semantic versioning that you share within your org. It’s a good way for the wider Elm community to share and work with code, and your business can benefit from that model too.

Yes, that’s what I meant. It’s not a problem as in I’m on board with that.

I’m going to present an idea that I think is in line with the current community goal of Sustainability. Why not allow private Elm packages to be published by users willing to pay some nominal (annual) fee? It’s a way for corporate users to feel comfortable publishing packages in a manner they are used to, and it lets those who choose to do so support the community and the development of the language.

Those funds could be used to pay for hosting, financial support for conference attendees or other such programs, or even funding contributor or support people’s labor.

Maybe this is against the ethos of the language or the community, but as far as I can see nobody loses. Thoughts?

I’ve talked with other companies that are similarly not into git submodules.

I recently moved to a “monorepo” and I know that is what they’ve been doing at Google for many years as well. I’d like to see discussion about why that is or is not a possible path.

I’m of two minds about the monorepo approach.

Pro

At NoRedInk, our main application is basically a monorepo—and if it’s not, it’s at least very central to the majority of our engineering staff’s work.

When working on that, I’ve had a good experience developing modules internally with the rest of our code before packaging and releasing them. This is how List.Selection happened, among others. We figured out the minimal API, documented it (our top level modules have their docs enforced by CI anyway), and eventually published it as a package. It’s a very compelling use case to me, because it tells a nice story about software being developed against a concrete use case before being generalized and released.

Analysis

This would have been much harder to do if we had jumped straight to a package model, and I’m concerned that by making private packages available, we may discard this nice feedback loop. At least, we may make it harder for people to have the nice experience I’ve had.

Contrast the above with “well, we need :wave: some package :wave: to do this, better go make one.” (not a straw man—I’ve literally done this in other contexts, much to my chagrin.) By doing this too early, you throw up barriers to quickly converging on a solution: it’s harder to release new versions and bump them in your target repo than it is to make the changes directly inline. Worse, you’ve now committed resources to a thing that you maybe find out shouldn’t exist later! That sounds like a worse experience subjectively, and in my experience leads organizations to double down on the thing they created, rather than cutting their losses.

This is not exactly a rosy picture, but I think it’s a plausible failure case. I hope I don’t seem too overwrought about it. I’ve just experienced it several times in places where private repos were available! :joy: / :cry:

Con

We also have noredink-ui. It’s public because we’re fine with making it public, but I don’t think it’s optimal. In this situation, we chose to publish in this way because we previously chose to not use the monorepo strategy for two very different projects. They have different tech stacks, deployment patterns, CI, etc. The short story is that the second project, which needs to share this code, is using several experimental strategies which we want to shield the rest of the organization from. If they work out, maybe we’ll bring them into the main project! Who knows!

Anyway, this would be the right time, in my mind, to have a private package. This is code that we’re OK with, but not code that we necessarily planned to ever publish. Some areas were rough, some abstractions didn’t make sense, but we needed to share the code because of decisions made earlier in our lives, and found that this approach was the least bad among those available.

Analysis

I have seen similar things happen when consulting for organizations that were bitten by the microservices bug. Now all those little critters are living in their own repos. Whoops! I’d love to hear from someone who is facing this specific problem when using Elm, because I know it exists… but what’s the actual lived experience of it in this case?

Separate Note

I’d also like to point out that frequently when people find that their API boundaries are messy, it’s because they have not designed their modules around a single data structure. Wanting to publish a package to encapsulate some messy API boundary is just going to throw up more of those barriers around making it better, so maybe there are better solutions to this particular affliction.

In short, it’s about flexibility.

Where I work we have one Elm application per api-endpoint/url. Some of those Elm applications share some common types and helper functions, all of them share the same view-components and logging functionality. But they’re still different enough that having them as separate applications makes sense. They also have their own deployment pipelines, and are deployed several times per day to both testing and production environments.

Some of the applications are in active development, while some will probably not need much in the foreseeable future except for the occasional bugfix.

Now, I could be misunderstanding this whole monorepo thing, but if it means that there’s one elm.json file and that all our applications have to be built using the same compiler and the same dependencies, then that’s bad for us, as we lose the possibility to update our applications incrementally.

I realize one could probably get around this by using branches and stuff. But in that case it still seems cleaner and easier to just utilize different repos.

Just on the topic of ‘How do I do private packages?’: where I work we put Elm modules in a private npm package, and then added node_modules/@company-name/... as a source directory in our ‘elm-package.json’ file.

It seemed to work pretty well. I can’t remember any problems arising from this approach.

That’s basically the same as using git subtrees.

The problem is that you now have to manually keep the elm.json files in sync, and that elm bump and elm diff don’t work.

The reason I prefer not to use a monorepo approach is that I regard the encapsulation offered by Elm packages as a valuable way of enforcing (one of) Parnas’ principles. That is, a package only exports what it wants consumers of the package to see. There can be internal modules that are not visible from the outside. In a monorepo approach, by just including the relevant src directory, client code can end up depending on things that it should not.

Elm is interesting in that it has two levels of encapsulation: at the module level and at the package level. This is great for different scales of software engineering.
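
To make the package-level half of that concrete, here is a rough sketch of what a package’s elm.json could look like in 0.19. The package name, summary, and module names are all made up for illustration, and the version bounds are just plausible defaults. The point is only that consumers can import the modules listed under exposed-modules and nothing else.

```json
{
    "type": "package",
    "name": "my-company/app-ui",
    "summary": "Hypothetical shared UI components (illustration only)",
    "license": "BSD-3-Clause",
    "version": "1.0.0",
    "exposed-modules": [
        "Ui.Button",
        "Ui.Form"
    ],
    "elm-version": "0.19.0 <= v < 0.20.0",
    "dependencies": {
        "elm/core": "1.0.0 <= v < 2.0.0",
        "elm/html": "1.0.0 <= v < 2.0.0"
    },
    "test-dependencies": {}
}
```

Under this sketch, a helper module such as Ui.Internal.Style (again, a hypothetical name) could live in the package’s src directory and be used by Ui.Button, yet stay invisible to anyone who installs the package. Adding that same src directory to an application’s source-directories makes every module in it importable.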

Thanks to the folks working with Elm at work for sharing perspectives!

@robin.heggelund, having one elm.json is not a requirement. It just means having all the code in one repository. You can have it all in one src/ directory, or spread across ten directories. You could have one elm.json that lists all those directories, and compile with elm make App1.elm and elm make App2.elm as needed. You can also have an elm-package.json that has an extra legacy/ source directory that gives compatibility for certain packages that are still getting updated.
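
As a rough sketch of the “one elm.json that lists all those directories” option, a 0.19 application file might look something like this. The directory names (apps, shared/ui, shared/logging) are hypothetical, and the dependency versions are only illustrative.

```json
{
    "type": "application",
    "source-directories": [
        "apps",
        "shared/ui",
        "shared/logging"
    ],
    "elm-version": "0.19.0",
    "dependencies": {
        "direct": {
            "elm/browser": "1.0.0",
            "elm/core": "1.0.0",
            "elm/html": "1.0.0"
        },
        "indirect": {
            "elm/json": "1.0.0",
            "elm/time": "1.0.0",
            "elm/url": "1.0.0",
            "elm/virtual-dom": "1.0.0"
        }
    },
    "test-dependencies": {
        "direct": {},
        "indirect": {}
    }
}
```

Assuming apps/App1.elm and apps/App2.elm exist, each app would then be built on its own with something like elm make apps/App1.elm --output=app1.js. Note that with a single file like this, every app shares the same elm-version and dependency versions.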

So I do not understand your concern about separate deploys or different versions. All of that seems possible. The only question is whether you have the directories in different repos or in the same repo. If you are testing App1.elm on a certain commit, all the stuff not relevant to App1.elm will not be relevant in the tests and deploy either. At least, that’s how it seems to me.

Are you still skeptical that frequent and different deploys are possible? What details am I missing if so?

At my place of work, we have considered a monorepo approach, but opted against it for reasons that are more organizational than technical. We’re a big company, and have grown through new projects and offices spinning up, and we’ve been through multiple acquisitions (on both sides). Additionally, we have products that are shipped to customers on-premise in fixed releases as well as cloud software offerings that deliver continuous updates. Within any of those products some teams work on critical-path items that require a high SLA, and others work on things with lower SLAs where feature delivery has more impact. All of this is to say that we have a very diverse set of development cultures.

Adopting a monorepo approach would mean getting all of those groups on board with it for at least some of their code, and figuring out and adopting tooling to make sure that the right teams are included on code reviews that impact their area of ownership. It means teams who are used to having a lot of autonomy to control their deployment and branching strategies might not be able to do so as much any more.

Also, our different product lines (on prem vs cloud) tend to have different approaches to adopting dependency updates, and it seems to me like it would be more difficult for those products to be on different versions of shared dependencies in the monorepo model (although if there is a solution to this, I would be very interested to hear about it).

That’s not to say that it would be impossible for us to move to a monorepo. We’re certainly not as large an organization as Google, and since we didn’t actually move forward with that approach, I don’t have first-hand experience of how it would work for us. That said, I feel confident in saying that it would be a big effort and require a lot of time spent convincing folks that it’s worth it and getting buy-in across the organization. So if I suggest that we should use more Elm, but it comes with the caveat that we need to use a monorepo for private code, Elm just became a much harder sell.

With all of that said, I don’t feel like we’d need a full-on private repository to be successful (although I would surely miss semver enforcement). Even just having tooling to point builds at a local filesystem path as a dependency would solve many issues dealing with private shared libraries, IMO.

I guess a monorepo would be completely possible, but it seems more confusing than our current model.

Let’s pretend I have 6 apps: App1 through App6.
They all have the same design, and log errors the same way, so they also rely on: app-ui and app-log.

App1 and App2 are in production. The rest of the applications aren’t yet, but they’re deployed to a testing environment for our QA, UX and customer.

Monorepo

All the apps, app-ui and app-log are directories. Building an application is a simple elm make AppN.elm, and the deploy scripts know how to work with this. Life is good.

App1 and App2 are in production and are working great. We don’t want to touch those if at all possible. However, in a world where App3 through App6 exist, App1 and App2 need modifications that actually send the users to these apps. So I guess we set up a prod branch and a test branch to reflect this.

Time passes, and Elm Conf has just released the talks on YouTube. We watch a talk by some Richard Feldman guy and realize that if we make certain modifications to our app-ui library, we can turn certain UI bugs into compile errors. It’s a breaking change, but one worth making. However, App1 through App4 are pretty much done UI-wise, so there’s no reason to convert those to our new UI API if we don’t have to. Another branch, I guess?

A day later 0.20 is released, and it has some really nice properties that would be great to have in App6. We don’t have any issues with App1 through App5, so we really only want to update App6 at this point and maybe update the rest of them in a month or so when things have quieted down. A 0.20 upgrade of App6 requires 0.20-compatible versions of app-ui and app-log, though. Lucky for us, branches are lightweight.

Multirepo

There are several ways to deal with private dependencies when making use of a multirepo setup. Both git submodules and git subtrees essentially put a copy of a repo into a sub-folder of yours (a submodule records a pointer to a specific commit, while a subtree commits the files themselves to your history). I prefer the latter method, as it’s harder to mess up. In either case, we have a copy of some repo inside our repo, and we connect the dots using source-directories.

Are they really that different?

I don’t think so? It feels cleaner to me to separate everything into repositories, with a separate wiki and issue tracker per repo, than to have one huge thing and a bunch of branches. But that could just be because I’m used to it from my open-source work, and I just want to do the same thing at work, except with the internal packages kept private.

At Gizra, the difficulty we would have with a mono-repo is that we have Elm projects for many different clients, and it is important to our corporate culture to give clients visibility into their repos. But, we wouldn’t be able to give clients visibility into a mono-repo that covered multiple clients.

Comments

@matt.cheely, Google definitely had a lot of internal stuff to help with their code structure. I believe their code review tool (which was really nice!) was created by the creator of Python when he worked there. So I can confirm that it has that cost.

@rgrempel, that seems like a strong example. Does that mean clients have access to certain private repos? If you have three clients that all have projects that depend on some shared-thing package, do they all get access to that private repo as well? Isn’t that the same problem you are talking about still? What if one client needs to evolve it in a different way? (I’m not sure if these should be rhetorical questions in the interest of keeping the thread manageable.)

I think the pattern here is about code that gets “left behind” on old versions of things. I do not actually know how that works at Google. It may have been a “never remove functions” kind of policy where if you wanted it to work differently, it was a new function or a new directory. I do not know though, and I’d be very curious to hear how they handle it.

Questions

First, what does Google do about the problems raised in this thread? Can someone find out and start a new thread describing the techniques?

Second, I am curious about the particular nature of the private code that is shared across projects. I have a certain thing in mind, but I’d like to verify. Can @matt.cheely, @robin.heggelund, @rgrempel, @brian, and others describe what ends up shared like this in each of your respective companies? Specific examples ideally! Not sure what patterns may mean here, but I think it’s worth checking.

Third, are people putting applications together based on specific commits? Like App1 is the combination of 3ab1c2 and c4de56 and 1e42a9? Or something else?

At NoRedInk we have noredink-ui, which I mentioned above. It’s licensed as open source, but doesn’t make sense for anyone but us to use. This means we have to share private code in the following ways:

  • We require package consumers to provide asset URLs, but the package manages asset names. This requires using a shared scheme for assets internally.
  • We never issue a breaking/major change for noredink-ui. If we need to change something, we add a new module called Nri.Whatever.V2 and publish a feature/minor version. It would be a little easier for us to manage this as multiple packages, but it would mean creating more packages the majority of Elm users will never care about.
  • If something really needs to stay private (like the icon URLs, but also UI stuff or data structures which should not be OSS), we will just copy module sources between repos.

We also have to make sure not to share things which should remain proprietary in that repo, which is often difficult. There have been situations where we’ve seen things later and said “well, it’s probably fine that that’s public now but we wish we’d have caught it in review.” To be fair, this is not an Elm-specific problem. I’ve done the same in some of our Ruby repos!

What goes in private packages

We don’t really have a rule for what goes and what doesn’t go in a private package. Anything that is useful to multiple apps is a candidate to go in a private package. However, we’re OK with code duplication. Only when we’re absolutely sure that the code would work identically across all relevant apps do we bother packaging it.

In practice, these are the sort of packages we have:

  • ui components (all apps share the same graphical design)
  • common types with helpers (like User and UserSession)
  • logging

How we’re bundling applications

App1 is in its own repo. Ideally, whatever is on master can be published into production (in practice we might have a testing branch as well for our test environment). Whenever we deploy an app into production we tag the commit with a version number, so if the deploy fails or we discover a bug, we can easily revert to an earlier deploy and re-deploy that.

For private packages, they’re essentially cloned into the repo at a specific tag. So, App1 might have v1.2.3 of app-ui checked out under lib/app-ui.
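
For anyone wanting to picture that layout, App1’s elm.json might look roughly like the sketch below. This is a guess at the shape rather than our actual file: the lib/app-ui/src path assumes the vendored package keeps its modules under src, and the dependency versions are illustrative.

```json
{
    "type": "application",
    "source-directories": [
        "src",
        "lib/app-ui/src"
    ],
    "elm-version": "0.19.0",
    "dependencies": {
        "direct": {
            "elm/browser": "1.0.0",
            "elm/core": "1.0.0",
            "elm/html": "1.0.0"
        },
        "indirect": {
            "elm/json": "1.0.0",
            "elm/time": "1.0.0",
            "elm/url": "1.0.0",
            "elm/virtual-dom": "1.0.0"
        }
    },
    "test-dependencies": {
        "direct": {},
        "indirect": {}
    }
}
```

Because the compiler treats lib/app-ui as plain source, any package that app-ui itself depends on has to be listed in the application’s own dependencies. Upgrading app-ui then means checking out a newer tag under lib/app-ui and adjusting this file by hand if its dependencies changed.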