Language proposal - Introduce "or" patterns


#1

Or patterns

I’d like to suggest introducing or patterns. I’ll explain the simplest case of this and try to give some justification for inclusion in the language, then work up to more complicated examples. The simplest case is to simply allow multiple cases before the -> in a case expression. Something like this:

isEmpty : Maybe String -> Bool
isEmpty mString =
    case mString of
        Nothing
        Just "" ->
            True
        Just _ ->
            False

Ocaml is an example language that includes this functionality.

Motivation

Allowing the use of these patterns has the following advantages:

  1. It’s a bit easier to see that two (or more) cases produce the same answer.
  2. Some code duplication removal. If multiple cases have the same answer, then currently you have to copy that answer. If the answer is a complicated expression you can use a let (or just another definition within an existing let), however doing so produces names at a scope larger than you would really want, and may involve unnecessary computation (I don’t know if the compiler is smart enough to move such expressions around). In particular the name that is introduced must be available in all cases, even though you are only using it in a subset of them.
  3. Speculative: I would hope it would reduce the instances of _ -> because it’s easier to list the remaining cases, thus meaning that updates to a custom type are more likely to produce a compiler error where appropriate.
  4. More speculative: It’s possible that allowing this can improve the compilation of pattern matching. See for example: http://pauillac.inria.fr/~maranget/papers/opat/

The two patterns in the or pattern above do not introduce any new variables. But they could, and when that happens of course both patterns must introduce the same names at the same types.

In theory or patterns could be restricted to the top level but they could also be sub-patterns, Such as:

containsEmptyName : Tree -> Bool
containsEmptyName tree =
    case tree of
        Node (Nothing | Just "") _ _ ->
            True
        Node (Just _) left right ->
            case containsEmptyName left of
                True ->
                    True
                False ->
                    containsEmptyName right
        Leaf _ ->
            False

Here the or pattern is nested within a constructor pattern. I’ve had to invent syntax for it and used | to separate the two patterns. I’m not suggesting that this is indeed the appropriate syntax.

Small example, consider testing an Http.Error for whether or not it is probably network related:

type Error
    = BadUrl String
    | Timeout
    | NetworkError
    | BadStatus Int
    | BadBody String

Currently you have to write this as:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout ->
            True
        NetworkError ->
            True
        BadUrl _ ->
            False
        BadStatus _ ->
            False
        BadBody _ ->
            False

The lazy might do this following, meaning you won’t get warned if a new constructor is added to the Error type:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout ->
            True
        NetworkError ->
            True
        _ ->
            False

Under the proposal to add or-patterns you can do this:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout
        NetworkError ->
            True
        BadUrl _
        BadStatus _
        BadBody _ ->
            False

It’s arguable, but to my mind the latter is clearer than the lazy version even if we forget about being warned when a constructor is added to the Error type.

Disadvantages

  1. It increases the complexity of the language, I would say fairly modestly.
  2. This would likely lead to developers asking for ‘guards’ on cases a la Haskell, or ‘not’ patterns.
  3. It might be easier to mistakenly include a pattern in a group? Well it almost certainly would be easier, I’m not sure whether it represents a major problem.
  4. Speculative: Might produce error messages that are difficult to understand.

Not a disadvantage but a counter-argument to advantage number 2: This doesn’t solve all instances of this problem because sometimes multiple cases share an intermediate value but not the entire answer.

Other considerations

A potential corner case is related with Elm’s special type-variables

type Number = MyInt Int | MyFloat Float

...
    case n of
        MyInt x
        MyFloat x ->
            x + 1

I think in this case the compiler should reject the or-pattern as defining the same variable with different types, the fact that the expression is valid for both is neither here-nor-there as you could, for example get the same thing by using identity or always when the two different types don’t even share a class. Note that in any case the result of the expression is a different type anyway, though it’s isn’t hard to come up with an expression that would have exhibit this problem but produce the same type.

Extensible record types may also represent something of a corner-case. Consider the same example but instead of Int and Float you have some record type and some extension of the same record type.

Relevant link

  1. Ghc proposal to add Or-patterns to Haskell: https://github.com/ghc-proposals/ghc-proposals/pull/43

#2

Worth giving an example of how we deal with this currently:

case msg of
   Click _ ->
      some long and complex expression
   PageLoad _ ->
      some long and complex expression
   _ ->

Becomes:

let
   expr =  some long and complex expression
in
case msg of
   Click _ ->
      expr
   PageLoad _ ->
      expr
   _ ->

Which isn’t so bad. But it can be a hassle to pull out that let expression, and sometimes you might decide after all that the two events should not be handled the same and you end up copying it back in again.


#3

Right, and with or-patterns:

case msg of
     Click _
     PageLoad _ ->
           some long and complex expression
    _ ->
           doesn't have expr needless calculated nor in-scope.

Of course the ‘calculated’ is probably a bit mis-leading here since you will usually be using some of the argument parts of Click and PageLoad and hence what you probably do is:

let
      f a =
           some long and complicated expression
in
case msg of
       Click a ->
            f a
       PageLoad a -> 
            f a
       _ ->
            ...

With the or-patterns you don’t need the defined function at all. It’s still the same as above because a is in scope for the “long and complex expression”.


#4

I’m really surprised that this is such a recent addition to Haskell. I think it’s been in OCaml since, approximately, forever. And having used it extensively there, I have not noticed any downsides with it and found its workings outright obvious. I think I learned it by just trying the obvious and finding that it just works. Or perhaps I saw it used in existing code and immediately understood what it did. In any case, there was no hurdle to understanding the concept, for me.

OCaml does already have guards however, and I can see people asking about that (though I’ve also already come across several threads asking for it in Elm). I don’t see this as a slippery slope towards guards, however, as there are several strong arguments against it that don’t apply to or-patterns, such as breaking exhaustiveness checking and compile-time optimization, as well as being both far less obvious and less useful (although still very useful I think, and I would personally love having them).

I have never seen anyone ask for “not-patterns”, and don’t think that will really be an issue. If you have so many patterns that you can’t bother to list them all, you likely have bigger problems with your code (though I’m sure there are a few exceptions, there always are, but not enough to justify language-level support I would think).

An alternative syntax could be to include the -> after each item:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout ->
        NetworkError ->
            True

        BadUrl _ ->
        BadStatus _ ->
        BadBody _ ->
            False

#5

Thanks, your experience in O’Caml matches mine, but it was more than a decade ago (which also gives a minimum for how long they have been there).


#6

I miss this feature too, but I would prefer separating by comma. Example:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout,
        NetworkError ->
            True

        BadUrl _,
        BadStatus _,
        BadBody _ ->
            False

#7

I think if or-patterns were to be introduced we would want them also to be nested, which means that they could come within parentheses and hence using a comma to separate them would clash with tuple-patterns.
Consider:

    case expr of
          Just ("", "0") ->
                 ....

That could be either a Maybe String pattern where the argument is an or-pattern or a Maybe (String, String) pattern.


#8

I don’t see a big need for this since you can easily bind the functionality to a let and do

let common x = ...
in case ... of
   One x -> common x
   Two x -> common x

However, one feature that I do miss is pattern guards, the ability to guard a case with an expression of the matched variables, like

case ... of
   One x | x > 10 -> something
   One x -> something else

I realize that this would make completeness analysis harder, but it would be great since the alternative is to have nested cases and duplication.


#9

I see this as a negative development.

It is very easy to make a helper function, or put the expression in a let like @rupert suggested. If you can do something with helper functions then you should. They always help and they never hurt.

Im not really seeing the plusses. One less function definition? A new syntax that better shows that two cases lead to the same result? Those dont seem like anyones bottlenecks. The quantity of stuff to type isnt anyones bottle neck, and I cant recall ever getting in big trouble because I couldnt recognize fast enough that two cases go to the same result.

On the other hand, here is a story from my own career:

Some years ago, I was working for a client who had a React/Redux project. My friend and I were on the job; him being a much older and more experienced developer than I was. We came across the redux reducer in he project (the equivalent of the Elm update function), and it was written as a switch statement. This was the code:

switch(action.type) {
     case "action1":
     case "action2":
        return behavior(state)

Neither of us had any clue what was going on, and for my friend, this was despite his decades of experience of JavaScript and Java. I had go to through the motions of thinking this was broken code, or that "action1" was supposed to lead to some kind of “do nothing” state. Of course, it wasnt either of those things, it was just a bizarre and unexpected language feature that I couldnt figure out just by reading the code.

Before that experience, we both had a fairly simple intuition that “one case leads to one result”, that in all likelihood no one had to teach us, that was in our own practice true 100% of the time up until then. In my view thats a very good place to be as a developer.


#10

This is an interesting experience report, thanks for the feedback.

If I’m honest, I would never have believed that seeing the switch statement you showed, anyone would not have “any clue what is going on”. To me it seems like there is exactly one possible thing that could mean. Of course, as I said, it’s been over a decade since I first encountered or-patterns.

When I started reading the above I thought you were going to document a case where someone mistakenly had the same action for two cases because they had simply forgotten to add the first case’s behaviour. I certainly think that’s a possible negative point for or-patterns.


#11

Hi,

First of all you’re right there isn’t a big need for this. But I guess you’re implying that the negative of increasing the complexity of the language is not worth the advantages here.

Yes, you can always define a lambda to get around two patterns having the same ‘complicated’ expression. The two disadvantages are that you are forced to think of a name for the lambda, and secondly, the lambda may have to be defined at some distance from its use-cases. Often the cases you wish to lump together are the “uninteresting” ones, and often you want to have the “interesting” cases first, which means you end up with:

let
   common x =
        ...
in
case ... of
    Interesting ->
        ... 
    AnotherInteresting ->
       ...
   ... maybe more interesting cases ...
   Uninteresting x ->
       common x
   AnotherUninteresing x ->
       common x

Add to this that although Interesting and AnotherInteresting are different, they may share some of their associated expressions so you have to factor out that common expression as well.

let
   common x =
        ...
   interestingSubExp a =
        ....
in
case ... of
    Interesting ->
        ... interestingSubExp a ...
    AnotherInteresting ->
       ...  ... interestingSubExp a ... ...
   ...
   Uninteresting x ->
       common x
   AnotherUninteresing x ->
       common x

So now, you have to decide the order of your let declarations, either interestingSubExp is separated from its uses by the common declaration or common is even further away from its use cases.

Why do I care so much about definition/use distance? There is at least some theorising (some might even say evidence) that this distance increases the odds of bugs being introduced. See for example:
Software Metrics: Measuring Haskell by Chris Ryder and Simon Thompson

Software Metics: Measuring Haskell
In all but the most trivial program there will be several declarations which will interact.
The interactions between declarations are often described by def-use pairs
[RW85]. For instance, the def-use pair (a,b) indicates that b uses the declaration
a. Metrics that use def-use pairs are most often concerned with the testability of
programs, because def-use pairs indicate interactions that might require testing.
When one considers a def-use pair, there will inevitably be a distance between
the location of the use and the declaration in the source code. One hypothesis is
that the larger the distance, the greater the probability that an error will occur in
the way that declaration is used. Distance may be measured in a number of ways:

(do note the paper is at least a decade old, but interesting nonetheless).
You can argue with their methodology somewhat, but the rough idea is to check whether code changes over time and if it does then you surmise that it needed to be changed, either because it contained a bug, or because the person updating it figured their update was an improvement. The authors define 4 distance metrics:

  1. Number of new scopes
  2. Number of declarations brought into scope
  3. Number of source lines
  4. Number of parse tree nodes

Software Metics: Measuring Haskell
None of the distance measures were significant at the 5% level for the Peg
Solitaire program, however they were all significant for the Refactoring program.
Most of the measures resulted in correlations between 0.4 (0.16) and 0.55 (0.303),
but the greatest correlation was provided by the Distance by the sum of the number
of scopes metric, with a correlation of 0.632 (0.399). These results seem to
confirm that the greater the distance between where something is used and where
it is declared, the greater the probability of an error occurring in how it is used

A couple of things:

  1. I don’t find these results all that convincing or reliable.
  2. However, my intuition is that these results understate the importance of distance between def/use pairs. But I don’t have much concrete logic to explain why that’s my intuition, and of course your intuition might be the exact opposite.

As a final final point, I recognise that often defining a name for something is of value. With an or-pattern though, you perhaps aren’t forced to think of a name for something unnaturally and if you do decide it is better with the name, then the definition of the name is right next to the use of it.


#12

I have a habit of clearly documenting such cases, in any language I use them in, so that it’s clear that there is no mistake or some code forgotten:

switch(action.type) {
     case "action1": -- FALL THROUGH
     case "action2":
        return behavior(state)

#13

Well, the fact that I don’t see it does not mean it’s not there. However, without a real use case, the question or negatives vs. advantages is not possible to answer.

Well, either the code is trivial, like True or a simple function call, and then you don’t need to make a name for it, or it’s complex and then there is every reason to think of a good name for it. So in this case, being forced to do it may actually be a good thing. But again, without a proper use case, it’s hard to say. It may also be that if you handle two different cases in the same complicated way, the data type itself should be merged.

That said, pattern matching could certainly be improved, it’s just a very fine balance between making it easy to use and understand and making it powerful. I’m used to the pattern matching in Elixir, which is extremely powerful, but also hard to understand sometimes.

I think you should come up with a better (more complex) use case, I find the ones you have perfectly fine.


#15

A lot of the discussion so far has been about the long and complex case, and it has been suggested that this can/should actually be a named let-binding. However, I find the “trivial” case to be just as important. At present, in Elm, there is no good way to deal with multiple cases that have the same trivial outcome. @allanderek posted this snippet which demonstrates this:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout ->
            True
        NetworkError ->
            True
        BadUrl _ ->
            False
        BadStatus _ ->
            False
        BadBody _ ->
            False

With or-patterns, this would become

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout | NetworkError ->
            True
        BadUrl _ | BadStatus _ | BadBody _ ->
            False

The benefits that I see here are:

  1. Pain reduction. Elm currently doesn’t have any good way to pattern-match with larger discriminated unions. Without an or-pattern, we add 2 lines for each trivial case, and I can’t see any advantage to doing so. Separating out more detailed cases as individual discriminated union cases can become very annoying because of this pain. Imagine a 12-case union, with 24-line case statements everywhere! The only way out of the pain is by specifying the _ case, which forces you to give up some of the safety of explicit matching.
  2. Understandability. Unlike the switch/case fallthrough which has been discussed, the | here makes it quite clear that we are collecting all the cases together. We currently force the reader to go through the thought process “Timeout, that’s true … NetworkError, that’s true … BadUrl, that’s false …” and so on. Instead, the thought process should be “Timeout or NetworkError, that’s true … BadUrl or BadStatus or BadBody, that’s false”.
  3. Reduced complexity. An or-pattern is likely much easier to understand than already-allowed patterns such as ( Test (Here _), x::Help p::_). I say this based on some experience: I teach a functional programming module using F#, and my students tend to be much more confused by list patterns than by or-patterns, which they accept quite “naturally”. So I respectfully suggest that this is very unlikely to trip up anyone who is starting with Elm. In fact, due to (2) above, I think that or-patterns will actually reduce the cognitive burden on users, and thus reduce the perceived complexity of the language.

#16

And it’s actually 3 lines per trivial case, 36 lines for 12 cases, when you use elm-format:

isNetworkRelated : Error -> Bool
isNetworkRelated error =
    case error of
        Timeout ->
            True

        NetworkError ->
            True

        BadUrl _ ->
            False

        BadStatus _ ->
            False

        BadBody _ ->
            False

#17

Just some other cases for comparison taken from actual code rather than imagined instances.
In all of these the original code is perfectly fine, I just find the or-pattern version to be an improvement, but for all of these they are not objectively better, it’s going to be an opinion so I’m mostly presenting these for the reader to judge for themselves (and this thread has made me less bullish on or-patterns generally).

Most of these are from Richard Feldman’s real world SPA example.

An exact example of the Just ""/Nothing case that I mentioned in the original post:

src : Avatar -> Attribute msg
src (Avatar maybeUrl) =
   case maybeUrl of
        Nothing | Just "" ->
            Asset.src Asset.defaultAvatar

        Just url ->
            Html.Attributes.src url

One where all the cases are the same, you’re just extracting something out of an otherwise richer datatype.

profile : Author -> Profile
profile author =
    case author of
        IsViewer _ val
        IsFollowing (FollowedAuthor _ val)
        IsNotFollowing (UnfollowedAuthor _ val) ->
            val

Great example where _ -> was used to avoid writing out all of the cases:

The full original:

isActive : Page -> Route -> Bool
isActive page route =
    case ( page, route ) of
        ( Home, Route.Home ) ->
            True

        ( Login, Route.Login ) ->
            True

        ( Register, Route.Register ) ->
            True

        ( Settings, Route.Settings ) ->
            True

        ( Profile pageUsername, Route.Profile routeUsername ) ->
            pageUsername == routeUsername

        ( NewArticle, Route.NewArticle ) ->
            True

        _ ->
            False

I see this as much nicer as the following:

isActive : Page -> Route -> Bool
isActive page route =
    case ( page, route ) of
        ( Profile pageUsername, Route.Profile routeUsername ) ->
            pageUsername == routeUsername

        ( Home, Route.Home )
        ( Login, Route.Login )
        ( Register, Route.Register )
        ( Settings, Route.Settings )
        ( NewArticle, Route.NewArticle )
            True

        ( Profile pageUsername, _ )
        ( Home, _ )
        ( Login, _ )
        ( Register, _ )
        ( Settings, _ )
        ( NewArticle, _ ) ->
            False

I particularly like that the three cases are visually separated by the blank lines. With the added bonus that if you change the the Page type you will be forced to update this.

A mild example where picking out the interesting case is harder because of the lack of or-patterns:

can be re-written as:

currentUsername : Model -> Username
currentUsername model =
    case model.author of
        Loaded author ->
            Author.username author

        Loading username
        LoadingSlowly username
        Failed username ->
            username

Here is a subtle one:

viewProblem : Problem -> Html msg
viewProblem problem =
    let
        errorMessage =
            case problem of
                InvalidEntry _ str ->
                    str

                ServerError str ->
                    str
    in
    li [] [ text errorMessage ]

With or-patterns you could re-write this as:

viewProblem : Problem -> Html msg
viewProblem problem =
    case problem of
        InvalidEntry _ errorMessage
        ServerError errorMessage ->
            li [] [ text errorMessage ]

This is very subtle, note that in the original version the author is forced to use a generic str name, because they are already defining the name errorMessage. Put another way because they are introducing a name they have to name the same “thing” in two different ways. I suspect that some people will prefer the first version, or at least prefer defining the name, of course you could still do that and I’d argue that with the or-pattern you would have a bit less noise (I certainly prefer that the result is a bit more prominent in the original).

Here is one from elm/browser:

The full original:

close : Msg -> State -> Maybe Encode.Value
close msg state =
  case state of
    None ->
      Nothing

    BadMetadata _ ->
      Nothing

    BadImport _ ->
      Nothing

    RiskyImport _ rawHistory ->
      case msg of
        Cancel ->
          Nothing

        Proceed ->
          Just rawHistory

I prefer to put the more interesting case first:

close : Msg -> State -> Maybe Encode.Value
close msg state =
  case state of
    RiskyImport _ rawHistory ->
      case msg of
        Cancel ->
          Nothing

        Proceed ->
          Just rawHistory

    None 
    BadMetadata _
    BadImport _ ->
      Nothing

If you’re okay with tuple patterns you might even do this:

close : Msg -> State -> Maybe Encode.Value
close msg state =
  case (state, msg) of
    (RiskyImport _ rawHistory, Proceed) ->
        Just rawHistory

    (None, _)
    (BadMetadata, _)
    (BadImport _)
    (RiskyImport _ _, Cancel) ->
        Nothing

I’m torn between the two, I like that this version has exactly one case for the exactly one set of circumstances under which you do not return Nothing. It’s quite close to the natural language you would use to describe what to do here, something like “If you have a RiskyImport and the message is a Proceed, then return the raw history and otherwise return Nothing.”


#18

Whatever the rights and wrongs of the argument, I think F# syntax does this best:

isNetworkRelated error =
    match error with
        | Timeout | NetworkError -> true
        | BadUrl _| BadStatus _ | BadBody _ -> false

But then again, F# has the fabled Active Patterns, a whole new way of looking at this …


#19

If this feature were added, how would you handle situations where there are variables that may or may not be available depending on which case has matched?

For example, if something matched, say, both Case1 someString someBool and Case2 someString, are you disallowed from using someBool in that branch, since Case2 doesn’t have it? Do you get someBool converted to a Maybe Bool? Something else?


#20

All individual cases of combined “or”-case could be required to have exact same variables.


#21

This is exactly the way that F# (and, I believe, OCaml) do it. There is also another benefit if such an approach is taken. Existing verbose code which deconstructs a value, such as the code that @allanderek quoted from elm-spa, becomes more understandable and easier to read. To reiterate,

profile : Author -> Profile
profile author =
case author of
    IsViewer _ val ->
        val

    IsFollowing (FollowedAuthor _ val) ->
        val

    IsNotFollowing (UnfollowedAuthor _ val) ->
        val

might be written as

profile : Author -> Profile
profile author =
case author of
    IsViewer _ val
    | IsFollowing (FollowedAuthor _ val)
    | IsNotFollowing (UnfollowedAuthor _ val) ->
        val

There is also a lovely expressive alternative (viewProblem) that @allanderek points out, which is presently impossible in Elm. I don’t count it as a benefit because I don’t know how many opportunities there will really be for such refactorings … but it would be nice to have the option! Or, of course, if an author does not wish to use |, there is no reason that they can’t use the more verbose version.