How to compare dictionary elements to another dictionary?

Hello Everyone!

Recently, I have discussed this issue on Slack as well, but I could not find a solution.
I am trying to iterate through a dictionary elements and compare them to another dictionary and check if one of them or all or nothing exist in the second dictionary, and then count all the occurrences.
for example Dict.fromList [("and",1),("fox",1),("stars",1),("the",2)] when compared to another dictionary of Dict.fromList [("and",1),("fox",1),("now",1),("way",2)] then the answer should be 2 , since there are only two same elements.
What I have tried is the member function of Dict and is giving me the result as string just to know if that string is a member of that dicitonary.

module Main exposing (main)
import Dict exposing (..)
import Html exposing (..)

main =
    Html.text checkIf

checkIf =
    if Dict.member "fox" <| Dict.fromList [("fox",1),("bee",1)] then
    "Yes"
    else
    "No"

This is Simply checking for the mentioned element to check if it is in the dictionary. there is no counting yet. I want to see if it is possible to use all the elements of one dictionary instead of only "fox" and compare the elements of this dictionary to another dictionary.
I will be here if you need more details.

Thank You.

If you want to just count identical keys you can do something like this:

module Main exposing (main)
import Dict exposing (..)
import Html exposing (..)

firstDict = 
  Dict.fromList [("and",1),("fox",1),("stars",1),("the",2)]

otherDict =
  Dict.fromList [("and",1),("fox",1),("now",1),("way",2)]

main =
    noOfIdenticalKeys firstDict otherDict
      |> String.fromInt
      |> Html.text

noOfIdenticalKeys : Dict String Int -> Dict String Int -> Int
noOfIdenticalKeys dict1 dict2 =
  let 
    firstKeys = Dict.keys dict1
    secondKeys = Dict.keys dict2
  in 
    List.foldl (counterFunction secondKeys) 0 firstKeys

counterFunction : List String -> String -> Int -> Int
counterFunction secondKeys keyFromFirst accumulator =
  if List.member keyFromFirst secondKeys then 
    accumulator + 1
  else 
    accumulator

Or you could just do it directly with Dict.foldl instead of list.
The most powerful way is to use Dict.merge function. With that one you know what is only in the first, only in the second, and in both… And you can return all you need from a single function… (like AllMatches, 2 matches, or nothing matches type of result)

1 Like

Hi @hanifhefaz, how about turning the keys into 2 sets and then finding the intersection?

dict1 = Dict.fromList [("and",1),("fox",1),("stars",1),("the",2)]
dict2 = Dict.fromList [("and",1),("fox",1),("now",1),("way",2)]

sharedKeys =
    Set.intersect
        (dict1
            |> Dict.keys
            |> Set.fromList
        )
        (dict2
            |> Dict.keys
            |> Set.fromList
        )

which gives:

Set.fromList ["and","fox"]

Dictionary keys are already a set, as you can only have a single occurrence of each key. They should be easy to count or find members etc after that. Here’s example on Ellie: https://ellie-app.com/9NS3rzczqjja1

1 Like

Like so:

Dict.merge
    (\_ _ acc -> acc)
    (\_ first second acc -> if first == second then acc + 1 else acc)
    (\_ _ acc -> acc)
    firstDict
    secondDict
    0

This only counts occurrences where the key and value match. If you only want to count the shared keys, you can replace the second lambda by (\_ _ _ acc -> acc + 1).

2 Likes

This is a great solution I think. and it works fine for my dictionaries, but I have forgotten to say, that I want to compare my first dictionary to multiple dictionaries, count identical keys, just as you did and then return the one with most occurrences. something like finding maximum count from those.

Hello Chris,
That is a great idea too. but what if i compare the first dictionary to some multiple dictionaries? and then return the one as result, where the identical words are the most?

I found the simplest way to count matches in both Dicts is this:

Dict.size ( Dict.intersect dict1 dict2 )

Finding the Dict with the most similar keys: You need to choose what happens if you give the function an empty list of dicts to check…

  1. Return the first Dict (checkingDict)
  2. Return Dict.empty
  3. Return Nothing (Maybe Dict)

Here is a version with Maybe Dict:
(There is probably a better way, this is just the first thing that came to mind :slight_smile:

dictWithMostMatches : Dict String Int -> List (Dict String Int) -> Maybe (Dict String Int)
dictWithMostMatches dict1 listOfDicts =
    List.foldl (folder dict1) ( 0, Nothing ) listOfDicts
        |> Tuple.second


folder dict1 currentDict ( bestCount, maybeBestDictSoFar ) =
    let
        noOfIdenticalKeys =
            Dict.intersect dict1 currentDict
                |> Dict.size
    in
    case maybeBestDictSoFar of
        Nothing ->
            -- On first pass, just set the first one as best
            ( noOfIdenticalKeys, Just currentDict )

        Just bestDictSoFar ->
            if noOfIdenticalKeys > bestCount then
                -- Set currentDict as best
                ( noOfIdenticalKeys, Just currentDict )

            else
                -- Keep old
                ( bestCount, Just bestDictSoFar )
1 Like

Good solutions above, so I don’t have one to add to that. What I wanted to say is that if you really are counting words and you are scanning large texts, it could make sense to use a Trie implementation instead of a Dict. I don’t know if that is what you are really doing? - perhaps you just tried to illustrate your real problem by giving the word count example above.

A trie shares space between strings with a common prefix, so in [ ("and", 1), ("andrew", 1) ] the “and” part would only exist in memory once.

There is a Trie implemenation with String keys here: StringTrie - elm-tries 1.0.1

The API is identical to Dict, so it should just be a case of swapping it in directly, and the above algorithms should work the same. I haven’t done much performance testing on it, but it is only likely to show an improvement if the word list you are indexing is pretty big.

1 Like

Thank you for the suggestion. but I don’t have a large amount of data in my text, so I think dict will be fine.

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.