How to get the most relevant text when searching?

Hello all,
I am trying to implement text suggestion in Elm, where my data is a List of Strings. I have tried several approaches to find the best match in the data, for example pattern matching, Jaro-Winkler and Levenshtein distances, and a bag-of-words model.
With the bag-of-words model, I can retrieve the most related sentence from the data when I enter some text in the search box. I split both the search text and each sentence in the data into words, then look for the sentence that contains the most words from the search text, and simply return it.
The problem is: if several sentences in the data share the same number of words with the search text, which one should I return?

EXAMPLE:

SEARCH TEXT: " HELLO THERE JONNY"
LIST 1 IN DATA: " THERE ARE JONNY AND MIKE"
LIST 2 IN DATA: " THERE YOU GO JONNY"

Both entries share 2 words with the search text (THERE and JONNY), so which one should I return? Is there an algorithm for this purpose? Can someone give me an idea?
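
For reference, the tie can be reproduced with a minimal sketch of the word-overlap scoring described above. This is not the code from the Ellie; the module name `WordOverlap` and the functions `toWords` and `overlapScore` are hypothetical names chosen for illustration, and the sketch assumes that case and repeated words should be ignored.

```elm
module WordOverlap exposing (overlapScore, toWords)

import Set exposing (Set)


-- Split a phrase into a set of upper-cased words.
-- String.words splits on whitespace; empty strings are filtered out
-- so that an empty phrase yields an empty set.
toWords : String -> Set String
toWords phrase =
    phrase
        |> String.toUpper
        |> String.words
        |> List.filter (\word -> not (String.isEmpty word))
        |> Set.fromList


-- Number of distinct words that the query and the candidate share.
overlapScore : String -> String -> Int
overlapScore query candidate =
    Set.intersect (toWords query) (toWords candidate)
        |> Set.size
```

With the example data, `overlapScore " HELLO THERE JONNY" " THERE ARE JONNY AND MIKE"` and `overlapScore " HELLO THERE JONNY" " THERE YOU GO JONNY"` both evaluate to 2, which is exactly the tie in question.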

Here is the Ellie: https://ellie-app.com/bnQNZmFJ59Va1

It almost sounds like you, as a person, could not decide which one is the “best” match. If you cannot describe what you expect, this doesn’t sound like an algorithm problem but a specification one.

I would consider at least these 2 options:

  • return the shortest phrase. It seems less convoluted to suggest a shorter text than a longer one.
  • if your use case allows for it, return multiple matches if they weigh the same. Let the user decide which is more relevant.
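
Both options can be sketched in a few lines of Elm. This is only an illustration under assumptions: the names `TieBreak`, `score`, `bestMatches` and `shortestBest` are made up here, "shortest" is taken to mean the fewest words, and the score is the plain count of shared words.

```elm
module TieBreak exposing (bestMatches, shortestBest)

import Set exposing (Set)


-- Upper-cased, de-duplicated words of a phrase.
wordSet : String -> Set String
wordSet phrase =
    phrase
        |> String.toUpper
        |> String.words
        |> List.filter (\word -> not (String.isEmpty word))
        |> Set.fromList


-- Number of distinct words shared by query and candidate.
score : String -> String -> Int
score query candidate =
    Set.size (Set.intersect (wordSet query) (wordSet candidate))


-- Option 2: return every candidate that reaches the top score.
-- Returns an empty list when nothing shares a word with the query.
bestMatches : String -> List String -> List String
bestMatches query candidates =
    let
        scored =
            List.map (\candidate -> ( score query candidate, candidate )) candidates

        top =
            scored
                |> List.map Tuple.first
                |> List.maximum
                |> Maybe.withDefault 0
    in
    if top == 0 then
        []

    else
        scored
            |> List.filter (\( s, _ ) -> s == top)
            |> List.map Tuple.second


-- Option 1: among the top-scoring candidates, prefer the one with the fewest words.
shortestBest : String -> List String -> Maybe String
shortestBest query candidates =
    bestMatches query candidates
        |> List.sortBy (\candidate -> List.length (String.words candidate))
        |> List.head
```

For the example in the question, `bestMatches` returns both phrases, and `shortestBest` returns `Just " THERE YOU GO JONNY"` because it has 4 words against 5.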

Thank you for the reply.
I gave the example only to illustrate the problem. Is it possible in Elm to get the semantic meaning of the phrases, so that I can rank the equal matches by how related they are to the search text?

I also have to return exactly one match, because I want to use it for a conversation with the user: the user enters some text, and the program looks for the best match in the data and returns the most related phrase. That is why I asked for an algorithm. If you know a way to specify this case, please tell me.

I am not sure whether something like TF-IDF would help me. I hope I will be able to find a way.
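
To make the TF-IDF idea concrete, here is a rough sketch of an IDF-weighted word overlap. It is a simplified relative of TF-IDF, not the full scheme: it ignores how often a word occurs inside a sentence and only weighs each shared word by how rare it is across the data. The names `IdfScore`, `idf` and `weightedScore` and the smoothing formula `log ((1 + N) / (1 + df))` are assumptions chosen for illustration.

```elm
module IdfScore exposing (idf, weightedScore)

import Set exposing (Set)


-- Upper-cased, de-duplicated words of a phrase.
wordSet : String -> Set String
wordSet phrase =
    phrase
        |> String.toUpper
        |> String.words
        |> List.filter (\word -> not (String.isEmpty word))
        |> Set.fromList


-- Smoothed inverse document frequency of an upper-cased word.
-- A word found in every sentence gets weight 0; rarer words get more.
idf : List String -> String -> Float
idf corpus word =
    let
        total =
            List.length corpus

        containing =
            corpus
                |> List.filter (\doc -> Set.member word (wordSet doc))
                |> List.length
    in
    logBase e (toFloat (1 + total) / toFloat (1 + containing))


-- Sum of the IDF weights of the words shared by query and candidate.
weightedScore : List String -> String -> String -> Float
weightedScore corpus query candidate =
    Set.intersect (wordSet query) (wordSet candidate)
        |> Set.toList
        |> List.map (idf corpus)
        |> List.sum
```

On the two-sentence example the tie remains, since THERE and JONNY appear in both sentences and therefore carry the same weight. The weighting can only separate candidates on larger data, where they share different words with the search text.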

You could use the Levenshtein distance as a measure of similarity between strings.
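
As a rough illustration, the distance can be computed row by row with the standard dynamic-programming recurrence. The sketch below is hand-written and does not use any published package; the names `Levenshtein`, `distance` and `closest` are hypothetical, and it compares whole strings character by character, which may or may not suit phrase matching.

```elm
module Levenshtein exposing (closest, distance)

-- Edit distance between two strings (insertions, deletions, substitutions).


distance : String -> String -> Int
distance a b =
    let
        bChars =
            String.toList b

        -- Distances from the empty prefix of `a` to every prefix of `b`.
        firstRow =
            List.range 0 (List.length bChars)

        -- Compute the next row of the table for one character of `a`.
        step ca prevRow =
            case prevRow of
                [] ->
                    []

                p0 :: rest ->
                    let
                        start =
                            { diag = p0, left = p0 + 1, acc = [ p0 + 1 ] }

                        cell ( cb, up ) state =
                            let
                                cost =
                                    if ca == cb then
                                        0

                                    else
                                        1

                                value =
                                    min (up + 1) (min (state.left + 1) (state.diag + cost))
                            in
                            { diag = up, left = value, acc = value :: state.acc }
                    in
                    List.map2 Tuple.pair bChars rest
                        |> List.foldl cell start
                        |> .acc
                        |> List.reverse
    in
    String.toList a
        |> List.foldl step firstRow
        |> List.reverse
        |> List.head
        |> Maybe.withDefault 0


-- Candidate with the smallest distance to the query, if there is one.
closest : String -> List String -> Maybe String
closest query candidates =
    candidates
        |> List.sortBy (distance query)
        |> List.head
```

For instance, `distance "kitten" "sitting"` is 3. One possible use is as a tie-breaker: apply `closest` only to the candidates that already share the same number of words with the search text.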


That algorithm is very clever. And it’s awesome that it is already implemented in Elm!
