I am trying to search text that is stored in a list, where each sentence is a separate element of the list. I want to search this list and return the closest match to the search string. The search string is whatever the user types into a text box.
Is it possible to search for sentences based on cosine similarity and return only the sentence with the best cosine similarity match?
If your goal is to search by keywords through a set of sentences, though, this may not be the best approach. For instance, for the query “world”, the first result doesn’t actually include that word, because Levenshtein distance counts the number of single-character edits needed to turn one string into the other. So even if two strings share the same words, the edit distance can be very large because of the other words in the string.
The edit distance between “word” and “world” (1) is much smaller than the distance between “world” and “our world” (4), even though “our world” should probably be ranked higher when the query is “world”.
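To make that comparison concrete, here is a minimal sketch of Levenshtein distance in plain Python. The function name `levenshtein` is just an illustrative choice, not something from a particular library, and it is the textbook dynamic-programming version rather than an optimized one.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("world", "word"))       # 1
print(levenshtein("world", "our world"))  # 4
```

Ranking by this distance puts “word” ahead of “our world” for the query “world”, which is the opposite of what a keyword search should do.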
The field that studies how to rank documents against a query is Information Retrieval. A simple metric that behaves more like a search engine is bag of words: it just counts how many words of the query appear in each sentence and picks the sentence with the highest count. Maybe that’s fun to experiment with?
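A rough sketch of that bag-of-words idea could look like the following. The helper names (`tokenize`, `bag_of_words_score`, `best_match`) and the sample sentences are made up for illustration, and the tokenizer is a deliberately naive assumption: lowercase the text and split on runs of word characters.

```python
import re


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def bag_of_words_score(query, sentence):
    # Number of distinct query words that also occur in the sentence.
    sentence_words = set(tokenize(sentence))
    return sum(1 for word in set(tokenize(query)) if word in sentence_words)


def best_match(query, sentences):
    # Sentence with the highest overlap; ties go to the earliest sentence.
    return max(sentences, key=lambda s: bag_of_words_score(query, s))


sentences = ["a word of advice", "our world is large", "hello there"]
print(best_match("world", sentences))  # our world is large
```

Note that `best_match` raises `ValueError` on an empty list, and if no sentence shares a word with the query it still returns the first sentence, so in practice you would probably check that the best score is greater than zero before showing a result.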