Should I prefer big Elm files?

My question is almost entirely philosophical in nature, and it stems from this quote from the official Elm documentation:

“Prefer shorter files.” In JavaScript, the longer your file is, the more likely you have some sneaky mutation that will cause a really difficult bug. But in Elm, that is not possible! Your file can be 2000 lines long and that still cannot happen.

I don’t come from JavaScript, so I’m not sure whether this suggestion applies to me as well. Even in imperative languages I try to keep a “pure” approach, never relying on global mutable state; that being said, I would still frown at a 2000-line file out of concern for code readability, editor navigation inside the file and separation of responsibilities. This is not to say that I would split it by any means necessary, but I would frequently consider the idea.

From the tone of the documentation, it almost seems like Elm files should preferably be longer in general. Is this the case, or is it simplistic advice given specifically to help developers coming from JavaScript?

I’ve seen Evan’s talk The Life of a File, but it didn’t really clarify my doubts.

Ooh, fun question!

So, personally, I don’t really agree with that part of the docs:

  • I’ve seen code bases with tiny files which were still vulnerable to the types of mutation bugs described there. And, in these cases, I also had a harder time coming up to speed on the codebase because I had to jump around between files so much!
  • I’ve also worked in codebases with God objects, and despite being in very long files I would not say they were easier to work with, or had more or less mutation bugs.

If you can get similar mutation bugs regardless of file length, file length is not the problem; mutation is. And besides, the kinds of mutation bugs described in this paragraph are not possible in Elm because Elm does not allow mutation, not because Elm encourages a particular file length.

So to answer your question directly, no, I wouldn’t say that Elm files should be longer. They may be longer as a consequence of the things I’m gonna say below, but that’s a consequence only—not a goal!

Now, with that out of the way, we can get to the more interesting question: what is the right file length? I think to answer that you’ve gotta answer “what is a file for?” The basic answer is encapsulation: we can choose to expose or hide things in modules, put different functions in different modules, et cetera. The second answer is namespacing—we can have a toString in both Foo and Bar—but we can’t have namespacing without encapsulation so that’s secondary.

I think that means the right file length is the one where your encapsulation works to prevent bugs, either by preventing encapsulation-violating behavior or making your code easy enough to understand that you can verify it’s doing the right thing. That doesn’t mean long files or short files, but correct-for-the-situation files.

I realize that this basically boils down to “it depends”, which I know is a bit unsatisfying. Sorry! Maybe it will help if I share some assorted things I use to know if a module needs to be broken up (or not):

  • Yes, split if: there’s a function that should not be able to access internals of a data structure, but which can because it lives in the same module. (Of course, the opposite is true too… if there’s a function that needs to be able to access internals but can’t, the module boundary may be in the wrong place!)
  • Yes, split if: there are two independent data structures in the same module. One smell here is if I have to write fooToString for the main one and barToString as well, and foo and bar are independently valuable, it may be time to split them into their own modules.
  • No, don’t split if: there is just “too much code” in a file. I’ve split apps into Model.elm, Update.elm, View.elm before and almost always regretted it. Even if it’s a little tricky to navigate a long file, it’s way better than having to perform module gymnastics to prevent import loops etc.
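To make the first two bullets concrete, here is a minimal sketch of a module that hides the internals of one data structure. The `Email` module and its functions are hypothetical names made up for illustration, and the validation is deliberately naive.

```elm
module Email exposing (Email, fromString, toString)

{-| Hypothetical module: `Email` is exposed without its constructor,
so code outside this file cannot build or unwrap one directly.
-}


type Email
    = Email String


{-| The only way to create an `Email` from outside this module.
The check is deliberately naive; it only illustrates the boundary.
-}
fromString : String -> Maybe Email
fromString raw =
    let
        trimmed =
            String.trim raw
    in
    if String.contains "@" trimmed then
        Just (Email trimmed)

    else
        Nothing


{-| Unwrap the address for display or encoding.
-}
toString : Email -> String
toString (Email address) =
    address
```

Any function that should be able to pattern match on the wrapped `String` has to live in this file; everything else goes through `fromString` and `toString`. If a second, independently useful type started growing its own `toString` next to this one, that would be the smell described in the second bullet.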

All that said: I would gently encourage people to write long files and then break them up instead of making tiny files before it’s actually necessary. I’ve been using Elm for something like half a decade now and the only reliable way I’ve found to put module boundaries in the right places is to wait and see what the right place is. I suppose that means I am, in effect, advocating for long files over short ones but again: that’s a consequence, not a goal!

Finally, you mention The Life of a File. Good talk, and I’d recommend also watching Make Data Structures by Richard Feldman.

No, this is cool and I think I understand what you are trying to say; plus, I agree that file length is not a problem per se.

This clarifies everything. Thanks!

Defining a separate function for each case is helpful, but it makes the file longer. OK, no problem with longer files, plus your code can be debugged easily.

One thing worth mentioning is that effective Elm programming involves combining simple functions together.

In my experience, splitting into files has the additional advantage of helping differentiate when some functionality belongs to scope A or scope B. Especially when people are learning and might get overwhelmed by learning Elm concepts + learning HTML/CSS concepts + learning the company business logic, forcing code splitting into different files, even if they are small, helps developers realize what logic is misplaced.

I recently wrestled with this problem in my project. I don’t really like the idea of having massive files with various concerns mixed in. I also don’t really agree that separating concerns in a single file by having comments as delimiters / separators (as has been suggested by the community before) is a suitable alternative to having separate files.

Anyways, the problem I was facing was that my Main module was beginning to get a little unwieldy since my update function was a huge case expression dealing with 15 different Msg values. Here is a snapshot of the Main.elm file before my refactor. Note that it’s not even a thousand lines long, but it was already beginning to be too big for my own liking.

In this commit (and a follow-up commit) you can see I moved my update logic into a dedicated Model.elm file. This file also includes the definition of the Msg type.

It has turned out to be a really nice set up for me. But this is just my preference.

I prefer big files and flat model structures, and I have no problems navigating them. In one of my projects the Update file contains 6900 lines, and the View file 5800 lines :grin:
I feel in complete control and have a nice structure of separate files for API endpoints and data structures/types with encoders and decoders ++
(total size of my app is about 80k lines)

The biggest file we have at work is 4674 lines. I don’t mind the length much when navigating between functions, but the case ... of in the update is a little bit unwieldy: I don’t know of any good way of jumping between branches, and Elm prints the entire thing any time there’s an error in it, which results in miles of scrolling in the terminal.

IntelliJ gets noticeably sluggish in that file. It’s not unbearable, but enough to make me want to split the file up a little.

I think these kinds of concrete issues are a better metric for file length than some kind of philosophical rule. The pragmatic thing to do in our case is to split the file a little; the super ambitious thing is to improve tooling so the file length doesn’t matter anymore.

I take it as more of a game: how long can you go without having to split? Sounds like that’s close to what you did here, and you’re happy with the result. The point is to avoid splitting too early and getting module boundaries in awkward places.

Cool thing here: https://www.unisonweb.org/ One big idea is to see what happens if the primary form of program storage is compiler binary instead of strings of source code.

I wouldn’t mind having big files if I knew how to navigate properly between the main functions!

How do you folks manage, for instance, to go from the Model to the Update and then to the correct branch in the Update without scrolling? I’m using vscode.

I kind of minimized the pain using ASCII decorators along with the minimap, but it feels hackish and I still have to scroll the minimap. It also means I have to be consistent when I create a new branch or function. And I’m not, so it quickly becomes a mess. And my screen is too small :roll_eyes:

Do you know of a tool that would generate a tree view of all the functions in the file, ordered alphabetically? If not, how complex would such a project be? It would be cool to be able to just click the function or branch and start editing!

I recently switched to vscode from sublime and much prefer it. For navigating around the page you can use [command t] on OSX, it will also take you to function definitions in other files. For keyboard shortcuts for other OS’s check the [Go -> Go to symbol in workspace] menu option.

Don’t know about getting to specific branches in update though.

You might also find [shift command O] useful; [Go -> Go to symbol in editor]. It provides a list of all the symbols in the editor, and you can click or navigate up and down through the menu to jump to that part of the page.

Also, just in case you’re unaware, command clicking a function will take you to its definition.

As for the main question, I’d have to say ‘it depends’. I’d agree with others that the focus shouldn’t be on small or large files, but what makes sense/feels right/works for the particular situation and individual/team. I don’t think there is a right or wrong answer - other than don’t split for the sake of it (as already mentioned).

When I want to go to a specific branch in update, it’s usually because I’m looking at onClick MyMsg or something. Then I use “go to definition” on MyMsg, which takes me to it in type Msg =. There, I use “show usages” on MyMsg and pick the one that ends with an arrow: MyMsg ->. That usually takes me to where I want in update.

If I wasn’t looking at onClick MyMsg, I sometimes use ctrl+shift+o to bring up the Outline menu where you can type part of the definition you’d like to go to. I type “Msg” to go to Msg. There I scroll to MyMsg and do the “show usages” thing.

Just to make sure I’m not missing something, there is no solution to the update function becoming too long, correct? Since TEA only has one point to manage all update messages, that function will inevitably become bloated if there are a lot of events, and I don’t think structuring the model to split up into sub-updates is a good solution (i.e. I wouldn’t want to change data structures because a file is becoming too long).

While an editor function to jump between cases might be useful, it’s a very specific use case, and I don’t think I’ve ever heard of anything like it.

Isn’t this (splitting model-view-update in different files/modules) considered an anti-pattern for The Elm Architecture?

You are not forced to have a huge update function. Having one file is a recommendation, not a requirement.

There’s no obvious “just do this” answer. When I started at work the file was ~800 lines. Then it slowly grew for 1.5 years, and it’s only recently that I have started noticing it being a little bit unwieldy, so I haven’t yet looked into how we could improve it. But I’m confident we’ll find a nice way. It’s important to remember that it’s possible to split things up in other ways than doing a whole “sub-TEA” in every file. It should also be noted that the page this file is describing contains a lot of features. So it’s also a bit of a UX problem.

Isn’t this (splitting model-view-update in different files/modules) considered an anti-pattern for The Elm Architecture?

Maybe. But I subjectively find it to be a well-organized structure that works for me (for the time being at least).

Usually, an Elm file should be treated similarly to a class in OOP.

If my application needs to handle 1000 message types my update function will be at minimum 2000 lines long, right? One line for the case and another to call a subfunction that handles it. I’m aware the example is extreme, just making sure I’m not missing something obvious. I can move the handling of single cases in different files (and that has always been enough in my experience), but the update function itself cannot be split.
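As a small illustration of the “one line for the case and another to call a subfunction” shape described above, here is a hedged sketch. The `Model`, the three `Msg` values and the helper names are all invented for the example; in a larger app the helpers could live in other modules while `update` itself stays a flat dispatch table.

```elm
module Main exposing (main)

import Browser
import Html exposing (Html, button, div, text)
import Html.Events exposing (onClick)


type alias Model =
    { count : Int }


type Msg
    = Increment
    | Decrement
    | Reset


{-| Two lines per message: the branch and a call to a helper.
-}
update : Msg -> Model -> Model
update msg model =
    case msg of
        Increment ->
            increment model

        Decrement ->
            decrement model

        Reset ->
            reset model



-- Hypothetical helpers; these are the parts that can move to other files.


increment : Model -> Model
increment model =
    { model | count = model.count + 1 }


decrement : Model -> Model
decrement model =
    { model | count = model.count - 1 }


reset : Model -> Model
reset model =
    { model | count = 0 }


view : Model -> Html Msg
view model =
    div []
        [ button [ onClick Decrement ] [ text "-" ]
        , text (String.fromInt model.count)
        , button [ onClick Increment ] [ text "+" ]
        , button [ onClick Reset ] [ text "reset" ]
        ]


main : Program () Model Msg
main =
    Browser.sandbox { init = { count = 0 }, update = update, view = view }
```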

If the complexity of a page warrants it, you can split it up into “components”, with each component in its own module with its own Model, Msg, update and view. The main idea is to reach for this option only when all other options have run out.
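A rough sketch of what that wiring can look like, assuming a made-up counter component. To keep the example in a single compilable file, the child pieces are prefixed with `Counter`/`counter`; in a real project they would sit in their own module and expose `Model`, `Msg`, `update` and `view`.

```elm
module Main exposing (main)

import Browser
import Html exposing (Html, button, div, span, text)
import Html.Events exposing (onClick)



-- CHILD: would normally be its own module (say, Counter.elm)


type alias CounterModel =
    Int


type CounterMsg
    = Increment
    | Decrement


counterUpdate : CounterMsg -> CounterModel -> CounterModel
counterUpdate msg count =
    case msg of
        Increment ->
            count + 1

        Decrement ->
            count - 1


counterView : CounterModel -> Html CounterMsg
counterView count =
    span []
        [ button [ onClick Decrement ] [ text "-" ]
        , text (String.fromInt count)
        , button [ onClick Increment ] [ text "+" ]
        ]



-- PARENT: wraps the child's messages and forwards them


type alias Model =
    { left : CounterModel
    , right : CounterModel
    }


type Msg
    = LeftMsg CounterMsg
    | RightMsg CounterMsg


update : Msg -> Model -> Model
update msg model =
    case msg of
        LeftMsg subMsg ->
            { model | left = counterUpdate subMsg model.left }

        RightMsg subMsg ->
            { model | right = counterUpdate subMsg model.right }


view : Model -> Html Msg
view model =
    div []
        [ Html.map LeftMsg (counterView model.left)
        , Html.map RightMsg (counterView model.right)
        ]


main : Program () Model Msg
main =
    Browser.sandbox { init = { left = 0, right = 0 }, update = update, view = view }
```

The cost is visible even at this size: every child message needs a wrapper constructor and an `Html.map`, which is one reason to treat this as a last resort.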

The way I handle complexity is like this: Usually, the context is a Page where something happens to some data. I move all the business data into its own module. In that module I handle all the serialization and the various state changes and conversions/transformations for that specific data. Then, all the backend calls related to the page are implemented in an Api.elm module. This is sufficient for most simple interactive pages, and it keeps the file below 1k LOC, which I find highly manageable for page modules.
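For illustration only, a business-data module along those lines might look like the sketch below. The `Invoice` type, its fields and the JSON shape are assumptions invented for the example, not taken from any real project; the `Json.Decode` and `Json.Encode` functions come from the `elm/json` package.

```elm
module Invoice exposing (Invoice, Status(..), decoder, encode, markPaid)

import Json.Decode as Decode exposing (Decoder)
import Json.Encode as Encode


type Status
    = Draft
    | Paid


type alias Invoice =
    { id : Int
    , customer : String
    , status : Status
    }



-- STATE CHANGES


markPaid : Invoice -> Invoice
markPaid invoice =
    { invoice | status = Paid }



-- SERIALIZATION (assumed JSON shape: {"id": 1, "customer": "...", "status": "draft"})


decoder : Decoder Invoice
decoder =
    Decode.map3 Invoice
        (Decode.field "id" Decode.int)
        (Decode.field "customer" Decode.string)
        (Decode.field "status" statusDecoder)


statusDecoder : Decoder Status
statusDecoder =
    Decode.string
        |> Decode.andThen
            (\raw ->
                case raw of
                    "draft" ->
                        Decode.succeed Draft

                    "paid" ->
                        Decode.succeed Paid

                    _ ->
                        Decode.fail ("Unknown status: " ++ raw)
            )


encode : Invoice -> Encode.Value
encode invoice =
    Encode.object
        [ ( "id", Encode.int invoice.id )
        , ( "customer", Encode.string invoice.customer )
        , ( "status", Encode.string (statusToString invoice.status) )
        ]


statusToString : Status -> String
statusToString status =
    case status of
        Draft ->
            "draft"

        Paid ->
            "paid"
```

The page module then only imports `Invoice` and never needs to know how the JSON is shaped.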

As the complexity of the code rises, I find that I sometimes move some of the UI view functions into a separate UI-only module. Usually they are view bits that can take parts of the page data and maybe a couple of message handlers and render some simple view. These UI modules can easily grow beyond 1k with very little concern. Most of these functions are short and can be easily understood without much context. The exposing part of these modules acts as a TOC, and CMD-D in Sublime takes me to the next instance of the text, which is usually the implementation (if I need to debug anything about it).
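A tiny sketch of such a UI-only module, with hypothetical function names and CSS classes. Each function takes a piece of data and, where needed, a message to send, and stays polymorphic in `msg` so any page can use it.

```elm
module Ui exposing (badge, primaryButton)

import Html exposing (Html, button, span, text)
import Html.Attributes exposing (class)
import Html.Events exposing (onClick)


{-| A button that sends whatever message the caller passes in.
-}
primaryButton : msg -> String -> Html msg
primaryButton onPress label =
    button [ class "btn btn-primary", onClick onPress ] [ text label ]


{-| A small counter badge; it produces no messages at all.
-}
badge : Int -> Html msg
badge count =
    span [ class "badge" ] [ text (String.fromInt count) ]
```

The `exposing` line at the top is the table of contents mentioned above: it lists every view helper the module offers.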

As the complexity continues to rise, some of the helper functions that might have general usefulness find a new home in some helper module. These modules are similar to the UI modules above, just a long list of simple and easy to understand functions. I don’t mind if they go above 1k.

Sometimes I do some simple refactoring with these modules and group the functions under various namespaces like Ui.Layout or Ui.Icons.

As the last resort, when all of the above strategies have been deployed and the page code is still very large, I identify and split a section of the UI into its own module.

By the time I get to this point, the complexity of the code in that single page far exceeds the complexity of the entire package.elm-lang.org or elm-spa-example apps.

I much prefer smaller files even in Elm. Files longer than a few hundred lines stress me out (they still happen, but I’m grumpy about them), and I love it when I can keep them under 100 (though for some types of files, Update in particular, I rarely if ever manage under 100). I find it much easier to navigate around and keep track of what depends on what when stuff is divided into a lot of smaller files vs a few huge files.