100x faster parsing. Performant parsers for Lists of Strings and Dict

lazamar · June 4, 2019, 7:50pm

A package to create blazingly fast parsers

I had a list of strings and needed a parser to succeed with the longest matching occurrence.
A prefix tree is a clever data structure that can help with accomplishing this in as few steps as possible.
The package explains what else I tried and how this approach compares to the alternatives.

rupert · June 4, 2019, 9:31pm

By coincidence I have spent the last month writing a package for Tries and will be publishing it soon:

Its a more full-featured implementation and has an API that is compatible with Dict over either List comparable or String keys.

Seems a few of us were needing Tries for various projects - text search in my case. Certainly a data structure that finds many uses in various efficient string algorithms.

lazamar · June 4, 2019, 9:44pm

Nice. At the time I looked into the rluiten/trie package but had to write my own implementation because I needed to use a custom traversal algorithm and that package doesn’t (and it shouldn’t) expose the value constructor of the Trie type.

rupert · June 5, 2019, 9:05am

Yes, I also started out with rluiten/trie buts its API is kind of weird like:

add : ( String, a ) -> String -> Trie a -> Trie a

Rather than the more obvious:

add : String -> a -> Trie a -> Trie a

So it seems to have been built to cater for a specific use case in rluitens other text search packages.

Also, as you say there is no custom traversal. This was a problem for me too as I wanted to implement a case insensitive search, and have more fuzzy searches in mind too.

rupert · June 5, 2019, 9:17am

The elm-scotland/tries package does not currently expose the Trie type. The reason for this is that I may want to change the implementation to a ternary tree with balancing heuristics, paper here:

But it does expose a variation on foldl that allows a traversal of the trie to be carried out with fine-grained control over how the traversal proceeds, as described by the Match type. This includes the ability to Break and terminate a traversal early. The signature for this is a bit wacky:

match :
    (Maybe comparable -> Maybe a -> context -> b -> ( b, context, Match comparable ))
    -> b
    -> context
    -> Trie comparable a
    -> b

There is a fairly extensive comment to explain how this works. If you get a chance would appreciate hearing if you think this provides enough flexibility to cover your use case - I will also take a look at your code myself as I am interested in creating a package that meets real needs.

Great work on the Parsers.

harrysarson · June 5, 2019, 3:38pm

Possibly tangential question: what is a trie in relation to a tree?

rupert · June 5, 2019, 5:33pm

Probably the quickest way to explain is to refer you to the wikipedia page - it has a picture.

system · June 15, 2019, 5:47pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Suffix Trees for text indexing Learn	1	490	March 23, 2019
New Dict implementation for Elm, now done Show and Tell	2	1320	December 5, 2017
New package to use custom-types as keys in sets/dicts Show and Tell	12	1335	September 19, 2022
API suggestion for dict/set in core; custom compare functions Request Feedback	21	2785	March 8, 2018
Introducing assoc-list, a Dict substitute with custom keys Request Feedback	6	2177	January 19, 2019

100x faster parsing. Performant parsers for Lists of Strings and Dict

Related topics