API suggestion for dict/set in core; custom compare functions

opvasger · March 7, 2018, 6:38am

I’ve been reading through a lot of the discussion of comparison in Elm; My particular problem with comparison is that I often want to use unions as keys in sets & dicts.

An approach I haven’t read about so far would be to change the signature of the functions that construct Sets and Dicts in core. For example:

Dict.empty : ( k → k → Basics.Order ) → Dict k v

By simply providing a comparison function instead of using Basics.compare. There’s a lot of flexibility to this, and you can have the old behavior by simply passing Basics.compare as the first argument to these constructing functions.

The API changes to do this are obvious: all constructing functions for the collections would need to take a comparison function as it’s first input, and quite a few functions would have to exchange their comparable input signatures for a type variable

Even if the current signatures are considered more friendly, one could easily expose them in their current form by partially applying Basics.compare.

opvasger · March 7, 2018, 8:35am

~~The comparison functions for union with a lot of constructors can be tedious, but perhaps there’s some clever ways around that?~~

Here’s an example of how the API could work:

module DictAndSetAPISuggestion exposing (..)

import Dict exposing (Dict)
import Set exposing (Set)

type WeekDay
    = Mon
    | Tue
    | Wed
    | Thu
    | Fri
    | Sat
    | Sun


type alias Todo =
    String


basicSetFromList : List comparable -> Set comparable
basicSetFromList =
    Set.fromList Basics.compare


weeklyTodos : Dict WeekDay (Set Todo)
weeklyTodos =
    Dict.fromList compareWeekDays
        [ ( Mon, basicSetFromList [ "Do Stuff" ] )
        , ( Fri, basicSetFromList [ "Go Out" ] )
        ]


compareWeekDays : WeekDay -> WeekDay -> Basics.Order
compareWeekDays a b =
    case a of
        Mon ->
            case b of
                Mon ->
                    EQ

                _ ->
                    GT

        Tue ->
            case b of
                Mon ->
                    LT

                Tue ->
                    EQ

                _ ->
                    GT

        Wed ->
            case b of
                Mon ->
                    LT

                Tue ->
                    LT

                Wed ->
                    EQ

                _ ->
                    GT

        Thu ->
            case b of
                Mon ->
                    LT

                Tue ->
                    LT

                Wed ->
                    LT

                Thu ->
                    EQ

                _ ->
                    GT

        Fri ->
            case b of
                Sun ->
                    GT

                Sat ->
                    GT

                Fri ->
                    EQ

                _ ->
                    LT

        Sat ->
            case b of
                Sun ->
                    GT

                Sat ->
                    EQ

                _ ->
                    LT

        Sun ->
            case b of
                Sun ->
                    EQ

                _ ->
                    LT

utkarshkukreti · March 7, 2018, 8:42am

You can implement the comparison function like this:

compareWeekDays : WeekDay -> WeekDay -> Basics.Order
compareWeekDays a b =
    let
        toInteger weekday =
            case weekday of
                Mon ->
                    0

                Tue ->
                    1

                Wed ->
                    2

                Thu ->
                    3

                Fri ->
                    4

                Sat ->
                    5

                Sun ->
                    6
    in
    compare (toInteger a) (toInteger b)

Still a lot of code but only O(n) lines long, not O(n^2) like your implementation.

IMO a better fix would be to make unions comparable in Elm core. Most of the times we don’t really care about their order. We just want any order so that these values can be used as keys in Dict and values in Set.

opvasger · March 7, 2018, 9:07am

hehe, I appreciate your input, but the goal here isn’t to optimize the implementation; it is to explain the concept and illustrate how this might work

Have you used Elm’s Dict and Set API?
Have you found yourself writing convoluted code to get a comparable for your Dict-keys or Set-values?
Could this API improve that experience?

These are the kinds of questions that could contribute to the discussion

opvasger · March 7, 2018, 9:16am

I’d say defining the semantics of comparability for unions is more complicated than extracting the Basics.compare function from the data-structures and make it an input of it’s constructors; Also, when people can supply their own comparison semantics, you get the added flexibility.

What would the semantics be of having comparable unions?
Should other types also implement a new set of semantics to be comparable?

I agree that comparable unions doesn’t seem crazy at all. I’d assume you say that any given constructor is equal to itself, bigger than the constructors defined before it and smaller than constructors defined after?

utkarshkukreti · March 7, 2018, 9:32am

But do you really care about the order your unions are stored internally by Dict or Set? While a custom compare function will add flexibility, what purpose does it solve?

I’d assume you say that any given constructor is equal to itself, bigger than the constructors defined before it and smaller than constructors defined after?

Yes, that’s how deriving Ord traits in Rust and Haskell work. We just need a well defined function that holds this rule:

if compare a b == LT then compare b a == GT
if compare a b == EQ then compare b a == EQ
if compare a b == GT then compare b a == LT

I’m not suggesting making these types comparable though. This could be an internal implementation detail of Dict and Set. With this, the API of Dict and Set will allow any keys instead comparable.

opvasger · March 7, 2018, 9:42am

so, assuming the type of Set.singleton : a -> Set a , what if I pass a function as the value to Set.empty?

With my example, you’d have to implement a comparison function for functions of that type.
With comparable unions, functions wouldn’t be allowed.

utkarshkukreti · March 7, 2018, 9:49am

That’s a good question. They should be handled the same as == is handled. Dict/Set should accept anything that == does. Another constraint I forgot to mention:

if a == b then compare a b == EQ

With my example, you’d have to implement a comparison function for functions of that type.

My problem with this is that the function you’ll declare has no purpose other than to make it work with Dict/Set. Will you be using it for some other purpose? Also consider how complex implementing that for this type would be:

-- A DOM Node.
type Node
  = Element String (List (String, String)) (List Node)
  | Text String
  | Comment String

It’s fine in Rust and Haskell because you can derive the comparable implementations with a single line of code. In Elm that’ll be 20-30 lines that have no other use.

opvasger · March 7, 2018, 9:58am

I think we can agree that function comparison is silly, and that it is more reasonable to consider the two other options we’ve discussed.

Your points about the uselessness and tediousness of defining comparsion functions seem reasonable, but does that mean the most “reasonable” approach would be comparable unions according to the semantics we’ve discussed?

I think I would try to explore use-cases for comparsion functions outside of this problem-domain to prove myself wrong. But I won’t do that now, at work

I’d say that there’s a lot of silly things you can decide to implement in Elm - simply not doing it is the right answer. Also, this is suggested as an extension of the API of Dict/Set (the “comparsion function” suggestion). Dict.empty (as well as the other constructing functions) could stay exactly the same, as long as another constructor function allowed you to get the flexibility. This would also not require any major changes to anything except for the API of the libraries, which is think is neat.

utkarshkukreti · March 7, 2018, 10:04am

I think we can agree that function comparison is silly

Definitely.

but does that mean the most “reasonable” approach would be comparable unions according to the semantics we’ve discussed?

I believe it’s a good approach but to know if it’s the “most reasonable” one we need inputs from others.

This would also not require any major changes to anything except for the API of the libraries, which is think is neat.

Unless I missed something, my suggestion will not require any changes to the API. All the comparable constraints in Dict and Set will change to “equatable” (that is, everything except functions, like ==, which doesn’t have a name in Elm yet?).

opvasger · March 7, 2018, 11:32am

I do see how maintaining ordering using the compare function will get too expensive, and how an equatable type makes more sense for performance. I’m not aware how important the ordering property of Dicts and Sets are - probably an implementation detail inherited from the underlying tree-structure?

As a digression, do you have to technical knowledge to describe for me why comparable is a type-variable and not a union?

MarkHamburg · March 7, 2018, 3:25pm

An extra comparison function passed as a parameter to every operation on the collection type is problematic because passing different comparison functions over the same key type would lead to erratic behavior and probably corrupted data structures (in that no invariant would reliably hold for the structures). This means that the function should either be supplied at initial construction — e.g., Dict.empty, Dict.singleton, etc — and be stored in the data structure or the comparison function should be defined in some way on the type to be compared. Storing the function in the data structure is more flexible but does result in making these types now runtime exceptions waiting to happen if you compare them for equality.

That said, I definitely feel the pain around this. I use tagged strings for things like account IDs and so to make a dictionary keyed by account ID, I can’t just use a normal dictionary but rather have to wrap it with code that understands how to unwrap an account ID and that either knows how to re-wrap strings into account IDs or that stores the data as Dict String ( AccountID, v ).

Mark

dta · March 7, 2018, 4:05pm

There is a package with this very API:
http://package.elm-lang.org/packages/robertjlooby/elm-generic-dict/latest

opvasger · March 7, 2018, 4:09pm

Only pass the comparison function to the functions that create the data-structure - don’t pass it every time. store it as part of the (in the case of Dict) red/black-tree implementation that underlies the structure.

Yes, the comparison function is passed only for the few functions that has to initialize a new Dict or Set. No, there is no problem at all with storing the function inside the structure, as long as it is used the same way the libraries are currently using Basics.compare to do exactly the same thing; the notable change is that the key won’t have to be comparable. And even if that feels to flexible, just pass in Basics.compare and you’ve got exactly what you have right now. There is no risk of runtime exceptions at all.

It’s good to hear you’re feeling the some pain too I’d really like for some healthy discussion around this, so it can maybe improve!

opvasger · March 7, 2018, 4:12pm

Thanks - it looks neat

I’ve been using Noah’s excellent “Elm-All-Dict” and Gizra’s very nice “Elm-All-Set”.

They’re both highly recommended!

utkarshkukreti · March 7, 2018, 4:14pm

How would that be handled with Dict.union or other functions that take two Dicts and return one if the two Dicts use different comparison functions (but have the same type)?

opvasger · March 7, 2018, 4:39pm

I’m glad you ask, as I haven’t thought of this Dict.merge is in my opinion the crux of any potential problem with my suggestion.

What do you think about this?

{-
The most general way of combining two dictionaries.

You provide three accumulators for when a given key appears:
- Only in the left dictionary.
- In both dictionaries.
- Only in the right dictionary.

You then traverse all the keys from lowest to highest,
building up whatever you want.
-}

Dict.merge
    :  (k1 → v1 → result → result)
    -> (k1 → k2 -> v1 → v2 → result → result)
    -> (k2 → v2 → result → result)
    -> Dict k1 v1
    -> Dict k2 v2
    -> result
    -> result

Given Dict.union and Dict.intersect give preference to values of the 1st Dict, I’d say the comparison-function of the first Dict is preferred too, for consistency. I don’t see this as a problem in any way: You’re getting all the values, mainly based on the 1st Dict.

The weakness of this approach with Dict.merge is that the comparison-function that established uniqueness in the 2nd Dict won’t be the one that establishes uniqueness for Dict.merge, which is definitely an inconsistency.
How bad is it though?

And as we discussed, the problem is only relevant for the case of providing a custom comparison-function, which you already debunked for it’s performance implications

I’m assuming none of this is a problem if union-types are comparable, which also seems interesting to explore!

opvasger · March 7, 2018, 10:18pm

I looked into the use-case for a → a → Order functions, and it pretty much boils down to List.sortBy and the use-case we’ve already discussed. Lists are pretty essential to Elm, and I still think it could be pretty useful to be able to reorder the items of a list in the view using this kind of function.

I also found Matthewsj’s awesome “Elm-Ordering”, which is based on the concept of an Ordering a such that:

type alias Ordering a = a → a → Basics.Order

There is multiple helpers for constructing such functions pretty cool.

for example, this is how the library defines the ordering I waned to do with days in my example:

type Day = Mon | Tue | Wed | Thu | Fri | Sat | Sun

dayOrdering : Ordering Day
dayOrdering =
    Ordering.explicit
        [Mon, Tue, Wed, Thu, Fri, Sat, Sun]

mcordova47 · March 8, 2018, 3:22am

Another library that shares similar ideas is elm-constraint, which aims to provide a solution for other typeclasses like Monoid, Semigroup, and Eq as well as Ord.

MarkHamburg · March 8, 2018, 3:27am

Sorry I missed the fact that you had tied specification of the comparison function to construction. I’ll blame insufficient caffeine.

But you do then have a runtime exception issue when comparing dictionaries or sets for equality because those will now contain a reference to the comparison function and comparing two non-identical functions in Elm generates a runtime exception. In other words: Dict.empty comparison1 == Dict.empty comparison2 could throw a runtime exception.

Mark

Topic		Replies	Views
New package to use custom-types as keys in sets/dicts Show and Tell	12	1233	September 19, 2022
Just for fun: What if there were no type classes at all? Request Feedback	19	2289	May 25, 2020
Why must Set members be comparable? Learn	18	1500	March 11, 2019
Consequences of functions in the model with 0.19? Learn	6	2197	November 10, 2018
New Dict implementation for Elm, now done Show and Tell	2	1278	December 5, 2017

API suggestion for dict/set in core; custom compare functions

Related topics