Tailwind and Elm = Love?

Hi there!

I’m currently working on a library that brings the Tailwind framework to Elm. I wanted a type-safe library that does not make regular Elm code heavier. I also wanted a Tailwind class placed later in the list to override earlier conflicting classes.

To explain what I want, here is a small example:

{- A default implementation -}
a : List (Attribute msg) -> List (Html msg) -> Html msg
a attributes =
    Html.a
        (attributes
            |> (::) (class "m6")
        )

{- A usage in a view -}
div []
    [ a  [ class "m4" ] [ text "Test" ]
    ]

The idea is to drop the default “m6” because “m4” has been appended at the end of the list.

Anyway, I did something that works. Not great, but it works. Currently, the ability to filter the classes based on conflicts is not implemented yet. It is available over here:

As you can see in the GitHub Actions run, the first compilation during elm-test is very, very, very long. Too long. Really. I’m talking about 30 minutes of compilation.

Do you think the library is simply too complex? Creating a custom Class type may be overkill: I could remove it, along with the Translations module, pass the Tailwind class directly as a string, and build the filter feature by analysing that string.

Cheers guys :slight_smile:


Wow, 170 000 lines of Elm code! That’s more than our entire code base at work (110 000 lines).

It’s interesting that elm make takes so long to run! At work it compiles all our code in seconds. Maybe your code is hitting some edge case that causes especially bad compile times.

Most compile times are pretty quick these days, but I have a theory that explains the outliers in compile times I have heard about so far.

EDIT: MY THEORY IS NOT CORRECT AT ALL IN THIS CASE!

It turned out later in this thread that this project has a case with 19k+ branches. It seems like that makes exhaustiveness checking slow :sweat_smile:

Anyway, here’s what I wrote before we found that all out!

If the theory is correct, I would expect @NicolasGuilloux’s project to have two or three of these characteristics:

  1. .elmi files bigger than 5 MB in elm-stuff/
  2. heavy use of records or extensible records in the functions and types available through the various module ... exposing (..) in the project
  3. very long user name, project name, module names, and/or function names

Is that the case?

Your scenario could be good evidence that this theory is valid, and help me know if some of the ideas I have will actually help. Very curious to hear!

The Theory

Since a type alias is just a name for another type, it is stored in interface files as (the name + the aliased type) so that all the information needed for type inference is available. So saying Record -> Record in Elm could map onto a much bigger type in the interface files. And when those files are read into memory, they take up a lot of space because there is a copy of the underlying aliased type for each usage. So compilation ends up being slow mostly due to file IO and frequent GC since the program keeps overrunning the heap size.

If people mostly use opaque types and strong boundaries, they tend to use type, which does not have this issue. I suspect @lydell has a project like that. But there are some cases (especially when integrating with some OO kind of system) where it makes sense to use a large type alias more often, and that seems to be the predictor for having these outlier compile times.

Temporary Work Arounds

First, calling elm with additional heap space can make GC less frequent, which can help a lot in some cases. Elm normally asks for 128 MB of heap space, but the following call would ask for 1 GB instead:

elm make src/Main.elm +RTS -H1024m

This can help a lot, especially if you are on a computer that has a bunch of RAM and you can make the heap bigger than needed. This is an easy change that can help anyone running into this.

Elm 0.19.1 was built with GHC 8.6.3, so you can tweak this more with the flags listed here.

Second, it may be possible to identify particular modules that trigger these issues and change them around. The steps are:

  • Look through elm-stuff/ for elmi files that are quite large (like 20 MB)
  • Identify any type alias that is large and commonly used in the corresponding Elm file
  • Temporarily swap it to a type

If you try this and it makes a difference, I would be very curious to hear! This would be good evidence in favor of the type alias theory.
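For anyone trying the last step, here is a minimal sketch of what swapping a type alias for a type could look like. The Settings module, its fields, and the helper functions are all made up for illustration and do not come from any project in this thread. Whether the swap actually shrinks the .elmi files is exactly the open question of the theory.

```elm
module Settings exposing (Settings, init, rename)

-- Hypothetical example: names and fields are invented for illustration.
--
-- Before: a plain alias. According to the theory above, each signature that
-- mentions `Settings` would carry a copy of the full record type in the
-- interface file.
--
--     type alias Settings =
--         { name : String, retries : Int, verbose : Bool }
--
-- After: a custom type with a single variant wrapping the same record.
-- Signatures now refer to `Settings` by name only.


type Settings
    = Settings { name : String, retries : Int, verbose : Bool }


init : Settings
init =
    Settings { name = "default", retries = 3, verbose = False }


{-| Record update still works after unwrapping the single variant. -}
rename : String -> Settings -> Settings
rename newName (Settings fields) =
    Settings { fields | name = newName }
```

The cost is that every function touching the record has to unwrap the variant first, which is why this is suggested only as a temporary experiment.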

I would only recommend trying this second path in certain cases though. If it makes your code better, then 100% go for it! If it does not help with compile times in practice, let me know and revert it. If it does help with compile times, let me know as well! I can only really recommend on a case-by-case basis whether it’s worth it in that last case.

I suspect this approach may not be viable for a library that wants to have large records as a central feature of its API design though, so this probably works better for applications and companies.

Ideas for a Compiler Fix

I explored a revamp of elmi files that would ensure that each type alias is only stored once per elmi file. This would make those outlier files much smaller (helping with file IO) and would mean that every usage of a type alias would point to the same underlying type in the heap (helping with GC). I also explored storing type names as UInt32 using some tricky scheme. That would cut down the size of elmi files even more.

I suspect these two changes would help with the performance issue, but they are quite tricky to implement. The elmi idea requires some surprising topological sorting and an alternative to Haskell’s Data.Binary library to permit ideal sharing in the heap. The UInt32 idea is particularly disruptive because it requires changes in type inference and error messages. So while it appears to be quite valuable for perf, it ends up requiring changes across the whole compiler.

Anyway, this is an offshoot of the exploratory compiler work I’ve been doing. I know enough to know they seem possible, and that there are a couple avenues to explore for elmo files as well. So I think the best I can do on timeline is what I wrote here. While I am comfortable saying that I really want to get these ideas into the next release of Elm, I would not want anyone to (1) imagine that it is just around the corner or (2) think I have hard data that these changes will definitely resolve their outlier compile times. It’s still a theory. Furthermore, large infrastructure projects like this take significant time to do well (and require coordination with tools like elm-test that peek at interface files) and I am trying to balance these ideas with more ambitious explorations.

Hopes

I hope the information in this post is useful to @NicolasGuilloux or anyone else with outlier compile times. I also hope that sharing my ideas on how to improve the compiler does not create animosity towards me or the project. I am working as fast as I can, but sometimes a thing that is easy to describe can take a long time to implement.

Finally, I hope that people will not take this detail into account in API design unless they are experiencing very extreme compile times. I think it would be a shame to have API design based on behavior of the compiler that may be resolvable through compiler infrastructure changes, but I appreciate that some balance must be struck on a case-by-case basis in the meantime.


Yeah, there are a lot of lines. Sadly I can’t reduce the number of entries, because this number is actually the number of Tailwind classes available. But I think maybe I can change the way I structured my data, especially the type Class. At first I thought it was a good idea to have each class represented as a type, but it is not really useful in the end.

This is especially true once I start the filter feature I mentioned above: with the custom type I would need to explicitly describe each conflict between classes, whereas with a string I could simply work with some regex.

Still, it is a good experience to push the Elm compiler and my CPU to its limit :slight_smile: By the way, I developed it with PHPStorm and it was a pain. Every time I generate the code, PHPStorm freezes because it tries to parse the code.

By the way, I should mention that I used postcss-elm-tailwind, but I faced the same issue with the generated file: every time it recompiles the code, it regenerates this big file, which makes my IDE struggle. :stuck_out_tongue:


Hi! Hey, first post here and the creator took his time for a really complete answer. Thanks a lot :smiley: I’ll do my best to investigate!

I looked at the .elmi files. The biggest one is the one storing all the functions that prefix the type Class with the Tailwind type (I’m not sure about the terms here, sorry). The second biggest is the one that stores the type Class. They are ~1.6MB and ~750kB respectively. I don’t know if that is big or not, but for comparison the Html proxy module is ~40kB.

| File | Size | Description |
| --- | --- | --- |
| Tailwind.elmi | 1.6MB | A method for each value of Class prefixed with Tailwind |
| Tailwind-Classes.elmi | 749.1kB | Stores each Tailwind class as a type |
| Tailwind-Html.elmi | 39.9kB | A proxy module of the Html module |
| Tailwind-Attributes.elmi | 8.7kB | A proxy module of the Html.Attributes module |
| Tailwind-Translations.elmi | 156B | Contains only one method to convert a Class into a String |

I can’t finish this post tonight, I will edit it to provide more information about some tweak I’ll try. I have to go right now :frowning:

Thanks again for the complete answer :slight_smile:

1.6 MB is not so big as to be a concern. The case I looked into deeply had 20 MB elmi files and a decent bit more lines than you, but they were still having faster compiles than you.

Also, a type should be no problem. It’d only be a type alias that can make things larger than you’d expect.

It sort of sounds like something different may be going on in your case. What are the specs of your computer? RAM? Free memory? CPU?

Also, is there anything weird about the files you are generating? Very long files? Very long individual lines? I guess generated code is a case where there are more things to worry about in terms of outlier behavior. Anything weird there?

The specs of my computer are not an issue: I have 32GB of RAM and an 8-core 4GHz CPU. I am currently comparing performance with various values of the heap space.

Nothing weird in the file I generated except they are very very long. The lines are pretty short, and I use elm-format after generation to make sure everything looks sweet.

| File | Size | Number of lines |
| --- | --- | --- |
| Tailwind.elm | 1.9MB | 114595 |
| Tailwind.Classes | 405.4kB | 19103 |
| Tailwind.Translations | 1.1MB | 57298 |

I took some time to present the file and their content here: https://github.com/NicolasGuilloux/elm-tailwind/blob/type_implementation/Docs/FILE_CONTENT.md

After running elm make on each file separately, I noticed that the one giving me trouble is Tailwind.Translations. The others compile pretty fast (<6s to compile every module except Translations). Replacing the whole case ... of with a fixed return value fixes the very long compilation time.

Do we have a better approach than a case ... of to compare values?

EDIT: About the benchmark, increasing the heap space does not impact the compilation time. I built the Tailwind.Html module since it requires all of the other modules. I removed elm-stuff between each benchmark.

| Command arguments | Time |
| --- | --- |
| none | 565s |
| +RTS | 549s |
| +RTS -H1024m | 565s |
| +RTS -H2048m | 553s |

Weird! I guess it could be pattern match exhaustiveness checking.

Can you share what those case ... of expressions look like? How many variants are in your custom types? Are you pattern matching on pairs or triples of values?

Yep! Everything is explained here: https://github.com/NicolasGuilloux/elm-tailwind/blob/type_implementation/Docs/FILE_CONTENT.md#tailwindtranslations

There are 57299 lines with 19095 cases, and the case is on a single value only. I am really starting to think that I should not use a custom type but rather a type that stores a string value.

toString : Class -> String
toString class =
    case class of
        Container ->
            "container"

        SpaceY0 ->
            "space-y-0"

        SpaceX0 ->
            "space-x-0"

        -- ...

Interesting! Can you make a single file without package dependencies that exhibits this behavior? One that’d be easy to download and build.

This could be a good example to profile and see if there are any changes that could help. I suspect 19095 cases is not very typical, but still could be interesting :sweat_smile:

I concatenated the classes and the translations into a single file, and it has the same horrible compile time. You can download it at the link below. Simply execute the build.sh script to build it and print the execution time.


So I put the initial implementation which creates very long compilation time into the type_implementation branch. For now, I’ll focus on the filter feature :wink:
I can continue the benchmark if anybody finds a solution that may improve the compilation time! Cheers guys :wink:

I think pattern matching on 50k variants would also have quite suboptimal runtime performance compared to storing strings as values, as it would have to loop linearly through all the variants on every usage.

Also, storing them as separate values means the unused ones will be cleaned up by dead code elimination, quite a big deal in this case!
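To make the suggestion concrete, here is a rough sketch of what "strings as separate values" could look like. The module layout and the names container, spaceX0 and spaceY0 are assumptions made for illustration, not the actual API of the library discussed here.

```elm
module Tailwind exposing (Class, container, spaceX0, spaceY0, toString)

-- Hypothetical layout: one opaque wrapper around the class name, plus one
-- top-level value per Tailwind class. No 19k-branch `case` is needed, and
-- values the application never references can be dropped by the compiler's
-- top-level dead code elimination.


type Class
    = Class String


container : Class
container =
    Class "container"


spaceX0 : Class
spaceX0 =
    Class "space-x-0"


spaceY0 : Class
spaceY0 =
    Class "space-y-0"


{-| Unwrapping replaces the big translation function. -}
toString : Class -> String
toString (Class name) =
    name
```

Because the Class constructor is not exposed, users still cannot pass an arbitrary string, so most of the type safety of the original design is kept.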


That’s true, I forgot about dead code elimination. Maybe the compiler tries to clean up the unused cases and therefore takes a lot of time every time I use a function from the library again.

I must admit that maybe I over-engineered this: simply storing the class as a String is much more efficient than creating a type that lists every class plus a module to translate it into a string. I will be able to do the same filtering on a string, maybe even a little faster, since I can simply check whether the beginning of the string conflicts with another class.
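A minimal sketch of that string-based filtering, under a loud assumption: two classes are treated as conflicting when they are identical once all digits are removed (so "m6" and "m4" both fall in group "m"). Real Tailwind conflicts are more subtle than that, and the module and function names here are invented for illustration.

```elm
module Conflicts exposing (resolve)

{-| Crude grouping rule (an assumption, not how Tailwind defines conflicts):
strip every digit, so "m6" and "m4" share the group "m".
-}
group : String -> String
group name =
    String.filter (not << Char.isDigit) name


{-| Keep only the last class of each group, preserving the original order. -}
resolve : List String -> List String
resolve classes =
    List.foldr keepIfNew ( [], [] ) classes
        |> Tuple.first


{-| `foldr` visits the list from the right, so the first class seen for a
group is the one written last, and earlier ones are skipped.
-}
keepIfNew : String -> ( List String, List String ) -> ( List String, List String )
keepIfNew name ( kept, seenGroups ) =
    let
        nameGroup =
            group name
    in
    if List.member nameGroup seenGroups then
        ( kept, seenGroups )

    else
        ( name :: kept, nameGroup :: seenGroups )
```

With this rule, resolve [ "m6", "p2", "m4" ] gives [ "p2", "m4" ]: the default "m6" is dropped because "m4" comes later in the list.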

Anyway, it is a good study subject for the compiler performance, even if I will not keep this implementation :stuck_out_tongue:

I don’t know the details of Tailwind, but there seems to be a hierarchy in the names. So why not keep the hierarchy?

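toString : Class -> String
toString class =
    case class of
        Container ->
            "container"

        Space xy ->
            "space-"
                ++ (case xy of
                        -- X is a variant of the nested axis type;
                        -- stringFromNumber is a helper left to define
                        X a ->
                            "x-" ++ stringFromNumber a

                        -- ...
                   )

        -- ...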

As user-defined types are not comparable, a Dict is not an option.

That could be a way to improve the library. For now, doing this requires understanding how each class is actually constructed, and it may be a little harder to generate.

The compiler has (toplevel) function-level dead code elimination. So it only removes unused toplevel functions – not stuff within functions such as unused cases of case-ofs. Unless I’m mistaken.


I assume, unused cases are removed by minification: https://guide.elm-lang.org/optimization/asset_size.html


I’ve never heard of a minifier that knows that it’s safe to remove case "whatever": from a switch. There’s nothing in that link saying anything about this? This can be verified by creating an example, compiling it, minifying it and checking the output but I don’t feel like it. :man_shrugging:


Just go to https://xem.github.io/terser-online/ and enter

switch (0) {
  case 0:
    day = "Sunday";
    break;
  case 1:
    day = "Monday";
    break;
  case 2:
    day = "Tuesday";
    break;
  case 3:
    day = "Wednesday";
    break;
  case 4:
    day = "Thursday";
    break;
  case 5:
    day = "Friday";
    break;
  case 6:
    day = "Saturday";
}

and see the result. (Example modified from https://www.w3schools.com/js/js_switch.asp)

Terser is the successor of the mentioned uglifyjs.