How to get the most relevant text when searching?

ahmadsherzai · October 28, 2020, 5:10pm

Hello all,
I am trying to implement text suggestion in Elm, where my data is a List of Strings. I have tried several approaches to find the best match in the data, for example pattern matching, Jaro-Winkler and Levenshtein distances, and a bag-of-words model.
With the bag-of-words model, I can retrieve the most related sentence from the data when I enter some text in the search box. I do this by splitting both the search text and the data into words and then looking for the entry in the data that contains the most words from the search text. I simply return that entry.
The problem is: if several sentences in the data share the same number of words with the search text, which one should I return?

EXAMPLE:

SEARCH TEXT: " HELLO THERE JONNY"
LIST 1 IN DATA: " THERE ARE JONNY AND MIKE"
LIST 2 IN DATA: " THERE YOU GO JONNY"

Both have 2 words in common with the search text (THERE and JONNY), so which one should I return? Is there an algorithm for this purpose? Can someone give me an idea?

Here is the Ellie: https://ellie-app.com/bnQNZmFJ59Va1
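For reference, the word-overlap scoring described in this post can be sketched roughly as follows. This is a hypothetical Python re-implementation (not the code from the Ellie), assuming whitespace tokenization and case-insensitive comparison; `words`, `overlap` and `best_matches` are illustrative names. Running it on the example data reproduces the tie.

```python
def words(text: str) -> set[str]:
    """Split text on whitespace into a set of upper-cased words (a bag of words, ignoring counts)."""
    return set(text.upper().split())


def overlap(query: str, sentence: str) -> int:
    """Number of distinct query words that also occur in the sentence."""
    return len(words(query) & words(sentence))


def best_matches(query: str, data: list[str]) -> list[str]:
    """Every sentence that shares the maximal number of words with the query."""
    if not data:
        return []
    scores = [overlap(query, s) for s in data]
    top = max(scores)
    return [s for s, score in zip(data, scores) if score == top]


data = [" THERE ARE JONNY AND MIKE", " THERE YOU GO JONNY"]
# Both entries score 2 (THERE and JONNY), so both are returned: this is the tie.
print(best_matches(" HELLO THERE JONNY", data))
```

The same logic maps onto Elm's `Set` module (`Set.fromList` on the split words, then `Set.intersect` and `Set.size`).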

tgelu · October 29, 2020, 8:21am

It almost sounds like you, as a person, could not decide which one is the “best” match. If you cannot describe what you expect, it doesn’t sound like an algorithm problem but a specification one.

I would consider at least these 2 options:

- Return the shortest phrase. It seems less convoluted to suggest a shorter text than a longer one.
- If your use case allows it, return multiple matches when they weigh the same, and let the user decide which is more relevant.
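The first option amounts to a secondary sort key. A minimal Python sketch, assuming "shortest" means fewest words and reusing a simple word-overlap score (the names `overlap` and `pick_one` are hypothetical):

```python
from typing import Optional


def overlap(query: str, sentence: str) -> int:
    """Number of distinct (case-insensitive) words the query and sentence share."""
    return len(set(query.upper().split()) & set(sentence.upper().split()))


def pick_one(query: str, data: list[str]) -> Optional[str]:
    """Return a single best match, or None if data is empty.

    Ranking key, compared left to right:
      1. more shared words is better;
      2. among ties, fewer words is better (the "shortest phrase" rule).
    max() returns the first of several equal maxima, so any remaining tie
    goes to the entry that appears earliest in data.
    """
    if not data:
        return None
    return max(data, key=lambda s: (overlap(query, s), -len(s.split())))


data = [" THERE ARE JONNY AND MIKE", " THERE YOU GO JONNY"]
print(pick_one(" HELLO THERE JONNY", data))  # prints " THERE YOU GO JONNY" (4 words beats 5)
```

The second option corresponds to keeping every entry whose key equals the maximum instead of calling `max`, and handing that list to the user.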

ahmadsherzai · October 29, 2020, 9:13am

Thank you for the reply.
I gave the example just to illustrate the problem. Is it possible in Elm to get the semantic meaning of the phrases? (To rank the equally matching phrases by relevance.)

I also need to return a single match, because I want to use it in a conversation with the user: the user enters some text, and the program looks for the best match in the data and returns the most related phrase. That is why I asked for an algorithm. If you know of a specification-level solution for this case, please tell me.

I am not sure whether something like TF-IDF would help me. I hope I will be able to find a way.
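TF-IDF can act as a tie-breaker by weighting words: words that occur in many entries count for little, rare words count for a lot. Below is a rough Python sketch of TF-IDF vectors compared with cosine similarity, assuming each data entry is one "document" and using a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one of several common definitions. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization, case-insensitive."""
    return text.upper().split()


def idf_table(data: list[str]) -> dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1, df = number of entries containing the word."""
    n = len(data)
    df = Counter(w for s in data for w in set(tokenize(s)))
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def tfidf(text: str, idf: dict[str, float], n: int) -> dict[str, float]:
    """Term frequency times IDF; words never seen in the data get the df = 0 weight."""
    unseen = math.log(1 + n) + 1
    return {w: count * idf.get(w, unseen) for w, count in Counter(tokenize(text)).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as word -> weight dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, data: list[str]) -> list[str]:
    """Entries sorted from most to least similar to the query."""
    idf = idf_table(data)
    n = len(data)
    q = tfidf(query, idf, n)
    return sorted(data, key=lambda s: cosine(q, tfidf(s, idf, n)), reverse=True)


data = [" THERE ARE JONNY AND MIKE", " THERE YOU GO JONNY"]
print(rank(" HELLO THERE JONNY", data)[0])  # prints " THERE YOU GO JONNY"
```

In this two-sentence example THERE and JONNY occur in both entries, so IDF alone cannot separate them; the shorter entry ranks first because cosine similarity divides by vector length. With a larger data set, matches on rare words would dominate the score.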

RickBradford · November 4, 2020, 5:30am

You could use the Levenshtein distance as a measure of similarity between strings.
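Levenshtein distance counts the insertions, deletions and substitutions needed to turn one sequence into another. A standard dynamic-programming sketch in Python follows. It accepts any sequences, so it can compare characters or whole words; applying it to word lists, as `closest` does here, is an assumption, since the reply does not say which granularity to use. The names `levenshtein` and `closest` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any sequences: strings compare characters, lists of words compare words.
    Uses the classic dynamic-programming table, keeping only one previous row.
    """
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, x in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = cur
    return prev[-1]


def closest(query: str, data: list[str]) -> str:
    """Entry with the smallest word-level edit distance to the query.

    data must be non-empty; on equal distances the earliest entry wins.
    """
    q = query.upper().split()
    return min(data, key=lambda s: levenshtein(q, s.upper().split()))


print(levenshtein("kitten", "sitting"))  # 3
print(closest(" HELLO THERE JONNY", [" THERE ARE JONNY AND MIKE", " THERE YOU GO JONNY"]))
```

On the example, " HELLO THERE JONNY" is 3 word edits away from " THERE YOU GO JONNY" and 4 from " THERE ARE JONNY AND MIKE", so the second entry is returned.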

mauro · November 5, 2020, 1:35am

That algorithm is very clever. And it’s awesome that it is already implemented in Elm!

system · November 18, 2020, 5:45pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
