Roadmap for internal packages?

At my place of work, we have considered a monorepo approach, but opted against it for reasons that are more organizational than technical. We’re a big company, and have grown through new projects and offices spinning up, and we’ve been through multiple acquisitions (on both sides). Additionally, we have products that are shipped to customers on-premise in fixed releases as well as cloud software offerings that deliver continuous updates. Within any of those products some teams work on critical-path items that require a high SLA, and others work on things with lower SLAs where feature delivery has more impact. All of this is to say that we have a very diverse set of development cultures.

Adopting a monorepo approach would mean getting all of those groups on board with it for at least some of their code, and figuring out and adopting tooling to make sure that the right teams are included on code reviews that impact their area of ownership. It also means teams who are used to having a lot of autonomy over their deployment and branching strategies might not have as much anymore.

Also, our different product lines (on prem vs cloud) tend to have different approaches to adopting dependency updates, and it seems to me like it would be more difficult for those products to be on different versions of shared dependencies in the monorepo model (although if there is a solution to this, I would be very interested to hear about it).

That’s not to say that it would be impossible for us to move to a monorepo. We’re certainly not as large an organization as Google, and since we didn’t actually move forward with that approach, I don’t have first-hand experience of how it would work for us. That said, I feel confident in saying that it would be a big effort and require a lot of time spent convincing folks that it’s worth it and getting buy-in across the organization. So if I suggest that we should use more Elm, but it comes with the caveat that we need to use a monorepo for private code, Elm just became a much harder sell.

With all of that said, I don’t feel like we’d need a full-on private package repository to be successful (although I would surely miss semver enforcement). Even just having tooling to point builds at a local filesystem path as a dependency would solve many of the issues with private shared libraries, IMO.
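The closest workaround I know of today is to skip the dependency mechanism entirely and add the shared library’s source to the application’s elm.json. A minimal sketch, assuming a hypothetical our-shared-lib checked out as a sibling directory (other fields omitted):

```json
{
    "type": "application",
    "source-directories": [
        "src",
        "../our-shared-lib/src"
    ]
}
```

It works, but you lose versioning entirely and have to carry the library’s dependencies over by hand.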


I guess a monorepo would be completely possible, but it seems more confusing than our current model.

Let’s pretend I have 6 apps: App1 through App6.
They all have the same design and log errors the same way, so they all rely on two internal packages: app-ui and app-log.

App1 and App2 are in production. The rest of the applications aren’t yet, but they’re deployed to a testing environment for our QA, UX and customer.

Monorepo

All the apps, plus app-ui and app-log, are directories. Building an application is a simple elm make AppN.elm, and the deploy scripts know how to work with this. Life is good.
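Roughly, the layout I have in mind looks like this (the exact file names are an assumption on my part):

```text
monorepo/
  elm.json          -- one application elm.json; source-directories lists every folder below
  App1/ ... App6/   -- one directory per app, each with an AppN.elm entry point
  app-ui/
  app-log/

-- building one app is then, e.g.
-- elm make App3/App3.elm --output=app3.js
```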

App1 and App2 are in production and working great. We don’t want to touch those if at all possible. However, in a world where App3 through App6 exist, App1 and App2 need modifications that actually send users to these new apps. So I guess we set up a prod branch and a test branch to reflect this.

Time passes, and Elm Conf has just released its talks on YouTube. We watch a talk by some Richard Feldman guy and realize that if we make certain modifications to our app-ui library, we can turn certain UI bugs into compile errors. It’s a breaking change, but one worth making. However, App1 through App4 are pretty much done UI-wise, so there’s no reason to convert those to the new UI API if we don’t have to. Another branch, I guess?

A day later, Elm 0.20 is released, and it has some really nice properties that would be great to have in App6. We don’t have any issues with App1 through App5, so we really only want to update App6 at this point, and maybe update the rest in a month or so when things have quieted down. A 0.20 upgrade of App6 requires 0.20-compatible versions of app-ui and app-log, though. Lucky for us, branches are lightweight.

Multirepo

There are several ways to deal with private dependencies in a multirepo setup. Both git submodules and git subtrees essentially give you a copy of another repo in a sub-folder; a subtree actually commits that copy to your history, while a submodule only records a reference to a commit. I prefer subtrees, as they’re harder to mess up. In either case, we have a copy of some repo inside our repo, and we connect the dots using source-directories.
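For the subtree approach, the commands look roughly like this (the repo URL and tags are made up for the example):

```sh
# Vendor app-ui into lib/app-ui at a known tag, squashed into a single commit
git subtree add --prefix=lib/app-ui git@github.com:our-org/app-ui.git v1.2.3 --squash

# Later, move to a newer tag the same way
git subtree pull --prefix=lib/app-ui git@github.com:our-org/app-ui.git v1.3.0 --squash
```

The application’s elm.json then just lists lib/app-ui/src (or wherever the sources live) in source-directories.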

Are they really that different?

I don’t think so? It feels cleaner to me to separate everything as repositories, with a separate wiki and issue tracker per repo, than to have one huge thing and a bunch of branches. But that could just be because I’m used to it from my open-source work, and I just want to do the same thing at work that I do with public packages, only with internal ones.

At Gizra, the difficulty we would have with a mono-repo is that we have Elm projects for many different clients, and it is important to our corporate culture to give clients visibility into their repos. But, we wouldn’t be able to give clients visibility into a mono-repo that covered multiple clients.


Comments

@matt.cheely, Google definitely had a lot of internal stuff to help with their code structure. I believe their code review tool (which was really nice!) was created by the creator of Python when he worked there. So I can confirm that it has that cost.

@rgrempel, that seems like a strong example. Does that mean clients have access to certain private repos? If you have three clients that all have projects depending on some shared-thing package, do they all get access to that private repo as well? Isn’t that still the same problem you are talking about? What if one client needs to evolve it in a different way? (I’m not sure if these should be rhetorical questions in the interest of keeping the thread manageable.)

I think the pattern here is about code that gets “left behind” on old versions of things. I do not actually know how that works at Google. It may have been a “never remove functions” kind of policy, where if you wanted something to work differently, it became a new function or a new directory. I do not know though, and I’d be very curious to hear how they handle it.

Questions

First, what does Google do about the problems raised in this thread? Can someone find out and start a new thread describing the techniques?

Second, I am curious about the particular nature of the private code that is shared across projects. I have a certain thing in mind, but I’d like to verify. Can @matt.cheely, @robin.heggelund, @rgrempel, @brian, and others describe what ends up shared like this in each of your respective companies? Specific examples, ideally! Not sure what patterns may emerge here, but I think it’s worth checking.

Third, are people putting applications together based on specific commits? Like App1 is the combination of 3ab1c2 and c4de56 and 1e42a9? Or something else?


At NoRedInk we have noredink-ui, which I mentioned above. It’s licensed as open source, but doesn’t make sense for anyone but us to use. This means we have to share private code in the following ways:

  • We require package consumers to provide asset URLs, but the package manages asset names. This requires using a shared scheme for assets internally.
  • We never issue a breaking/major change for noredink-ui. If we need to change something, we add a new module called Nri.Whatever.V2 and publish a feature/minor version (see the sketch after this list). It would be a little easier for us to manage this as multiple packages, but it would mean creating more packages that the majority of Elm users will never care about.
  • If something really needs to stay private (like the icon URLs, but also UI stuff or data structures which should not be OSS), we will just copy module sources between repos.
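To make the versioned-module idea concrete, here is a minimal sketch; the module name and API are hypothetical, not the actual noredink-ui code. The old module stays untouched, and the change ships as a brand new module in a feature release:

```elm
module Nri.Button.V2 exposing (Label, label, view)

{-| Hypothetical second version of a button module. Nri.Button.V1 keeps its
old API; consumers opt in by switching their import, and the package only
needs a minor version bump because nothing that already existed changed.
-}

import Html exposing (Html)
import Html.Events exposing (onClick)


type Label
    = Label String


label : String -> Label
label =
    Label


view : Label -> msg -> Html msg
view (Label labelText) clickMsg =
    Html.button [ onClick clickMsg ] [ Html.text labelText ]
```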

We also have to make sure not to share things in that repo which should remain proprietary, which is often difficult. There have been situations where we’ve seen things later and said “well, it’s probably fine that that’s public now, but we wish we’d caught it in review.” To be fair, this is not an Elm-specific problem. I’ve done the same in some of our Ruby repos!

What goes in private packages

We don’t really have a rule for what does and doesn’t go in a private package. Anything that is useful to multiple apps is a candidate. However, we’re OK with code duplication: only when we’re absolutely sure that the code would work identically across all relevant apps do we bother packaging it.

In practice, these are the sort of packages we have:

  • UI components (all apps share the same graphical design)
  • common types with helpers (like User and UserSession)
  • logging

How we’re bundling applications

App1 is in its own repo. Ideally, whatever is on master can be deployed to production (in practice we might also have a testing branch for our test environment). Whenever we deploy an app to production we tag the commit with a version number, so if the deploy fails or we discover a bug, we can easily revert to an earlier deploy and re-deploy that.
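Concretely, the flow looks roughly like this (the tag names and deploy script are made up for the example):

```sh
# Deploy whatever is on master, then tag the exact commit that went out
git tag -a app1-v42 -m "Production deploy 42 of App1"
git push origin app1-v42

# Rolling back is just re-deploying an earlier tag
git checkout app1-v41
./deploy.sh
```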

Private packages are essentially cloned into the repo at a specific tag. So, App1 might have v1.2.3 of app-ui checked out under lib/app-ui.

Answers, from this talk: https://www.youtube.com/watch?v=tISy7EJQPzI

  • stuff that isn’t tested deserves breaking
  • “live at head” (i.e. code that gets left behind is abandoned). The principle is that code is built and tested frequently. Google has the means to store external stuff in a vendor folder and maintain it there if it becomes a problem.

Here is an example from our experience where it seems like a private package mechanism would have been better.

We were building a page with a datepicker. As a small company we don’t have the resources (time) to build this from scratch ourselves (especially as we are still learning Elm), so we wanted to use a package.

This was great; however, we weren’t able to do everything we wanted with the package API. There were a few modifications we had to make because we were using the datepicker in a slightly non-standard way.

Therefore we forked said package and submitted our changes as a PR upstream, but we couldn’t wait around for them to be merged (and it may not even have been right for them to be merged), so we had the dilemma of how to distribute the altered package within our codebase.

The easiest solution (and the one we went for) was to publish our forked version of the repo as a separate package. This solved our issue, but it felt bad because we were essentially cluttering up the Elm package list just so we could easily distribute our code.


We use private packages to distribute application-independent concepts like shared API types and UI components. We have multiple teams in different parts of the organization, so things that rarely change (e.g. UI components) are shared across the board, while API types can be local to just a few apps.

Some apps and packages live together in multirepos while others are independent (different things play into this: dev preferences, history, and bureaucracy). Versioning is necessary to prevent slow-moving apps from holding back progress elsewhere (like an app wanting to upgrade to 0.19 but being blocked by a shared dependency), so we tag versions for release and distribute them privately with npm.

So I’m a consultant, which means I can sometimes influence how the client organises their repos, but mostly I’m limited to giving advice. I’m quite often in a position to suggest Elm, though. Very often clients have a Java/JVM stack, with Maven repos for sharing internal jars and private npm repos for sharing JS-related stuff. In addition, many companies use proxies (local caches) for repos, using tools like Nexus or Artifactory.

Given that clients instinctively want to be able to create shared code that they do not publish to the global Elm package repo, I think it’s a hard sell to convince them to use Elm and at the same time tell them they need to reorganise their repos into a monorepo.


Have you attempted sharing Elm code using npm? What issues are you noticing? We have started using a private npm package holding shared UI code which seems to work fine.

Transitive dependencies can be a pain. In the eyes of the compiler you are just including additional source dirs, so there isn’t really an elm.json for the shared package to speak of. You can create one for your own sake while developing the package, but you have to make sure your applications also have all the direct and indirect dependencies of the private packages they use.
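A minimal sketch of what that means in practice, assuming a hypothetical private package @acme/elm-ui published to an internal npm registry whose Elm code happens to use elm/http: the consuming application points source-directories at the installed copy and has to list elm/http itself, because the shared code contributes no elm.json of its own to the build (indirect dependencies trimmed here for brevity):

```json
{
    "type": "application",
    "source-directories": [
        "src",
        "node_modules/@acme/elm-ui/src"
    ],
    "elm-version": "0.19.0 <= v < 0.20.0",
    "dependencies": {
        "direct": {
            "elm/browser": "1.0.1",
            "elm/core": "1.0.2",
            "elm/html": "1.0.0",
            "elm/http": "2.0.0"
        },
        "indirect": {}
    },
    "test-dependencies": {
        "direct": {},
        "indirect": {}
    }
}
```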

You also have to do manual versioning. That is less of a pain, but automatic semver would still be a nice-to-have.

Yes, a local proxy can be very important to cover the situation where the public package repo goes down and you have 20 developers sitting around unable to get on with their work. Ideally we would create plugins for Nexus and Artifactory, plus a modified toolchain that can pull from them.


Someone showed me this link that gets into why large companies use monorepos. I encourage folks to check it out.

Separately, I made a plan about how you can do a multi-repo setup here, but I still encourage you to check out both links. They offer counterpoints :wink:

I like the usage of monorepos as well, but wanted to point out for the “store everything in one repo” crowd that Google has their own DVCS and lots of tooling around it. They can (in fact, have no other choice than to) check out slices, since the full repo is too large for a single machine, and the data is stored in their big-data infrastructure. They also have user permissions for subtrees, which Git doesn’t have by design, as well as tooling for sweeping changes across large parts of the repo, rollbacks, and so on. I’ve read that their deployment pipelines always point to the latest master of their libraries, so everybody is pretty careful not to check in broken builds whose tests are green :slight_smile: . Git, or your Git repo host, may also have problems with huge repos in the long run, since it wasn’t designed for that use case: Bitbucket seems to struggle with it at least, GitHub has a soft limit of 1 GB, and Facebook also ran into trouble.

I’m also wondering if something fairly simple could be set up using squid to proxy the package server and github onto my local network:

http://www.squid-cache.org/

For example, I already set up a VM running squid-deb-proxy, so every time I commission a new box it is really fast, and I can do it without an internet connection too.

When? I think you are writing this as a warning to people going down this path, but I wanted to point out that they ran into this stuff once they reached a certain size. That threshold may be quite high. Does it happen at 200 engineers? Or 1000? Or 100? At any of those sizes having some people do this work doesn’t seem like that big a deal.

Relative to what? I got to watch some companies go through the “grow from 50 to 200 employees” transition, and the amount of time and energy spent on getting microservices and multi-repos working was really high. There was a dedicated team, lots of projects had to integrate with it, and the integrations weren’t friendly for all the different languages. Point is, a serious amount of work exists on this path as well.

In summary: It seems like companies go from monorepo to multi-repo back to monorepo as they grow, and I’m not convinced that means it is the fastest, cheapest, or easiest path.

Point taken. Like I said, the monorepo feels like the best option to me as well, and it’s good to think about the questions you posed. That’s very useful information, by the way :slight_smile:

Google wrote their own distributed VCS at around 50,000 full time employees.

Here’s a pretty good talk from Google about their use of a monorepo.

