Best practical algorithm for sentence similarity

I have two sentences, S1 and S2, both of which (usually) have a word count below 15.

What are the most practically useful and successful (machine learning) algorithms for measuring their similarity, ideally ones that are easy to implement? (A neural network is fine, unless the architecture is as complicated as Google's Inception, etc.)

I am looking for an algorithm that will work well without putting too much time into it. Are there any algorithms you've found successful and easy to use?

This can, but does not have to, fall into the category of clustering. My background is in machine learning, so any suggestions are welcome :)

Tags: nlp, clustering, word2vec, similarity






asked Nov 23 '17 at 14:40 by DaveTheAl











What did you implement? I am also facing the same problem and have to come up with a solution for finding the 'k' related articles in a corpus that keeps updating. – Dileepa, Aug 15 '18 at 3:07
4 Answers
10 votes

Cosine similarity for vector space models could be your answer: http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/



Or you could calculate the eigenvector of each sentence. But the problem is: what is similarity?



"This is a tree",
"This is not a tree"



If you want to check the semantic meaning of the sentences, you will need a word-vector dataset. With a word-vector dataset you will be able to check the relationships between words. Example: (King - Man + Woman = Queen)



Siraj Raval has a good Python notebook for creating word-vector datasets:
https://github.com/llSourcell/word_vectors_game_of_thrones-LIVE
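
For illustration, a minimal sketch of the vector-space idea using scikit-learn's TfidfVectorizer and cosine_similarity (my choice of tooling, not something this answer prescribes):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    s1 = "This is a tree"
    s2 = "This is not a tree"

    # Build a shared TF-IDF vocabulary over both sentences and vectorize them.
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform([s1, s2])

    # Cosine similarity between the two sentence vectors (here in [0, 1]).
    score = cosine_similarity(vectors[0], vectors[1])[0, 0]
    print(score)

As the answer notes, a surface-level measure like this will score "This is a tree" and "This is not a tree" as highly similar even though their meanings differ, which is why word vectors are suggested for semantic similarity.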






answered Nov 23 '17 at 15:09 by Christian Frei
8 votes

One approach you could try is averaging the word vectors generated by word-embedding algorithms (word2vec, GloVe, etc.). These algorithms create a vector for each word, and the cosine similarity between those vectors represents the semantic similarity between the words; averaging the word vectors of each sentence gives you a sentence vector, and the cosine similarity between those averages can serve as a sentence-level similarity. A good starting point for learning more about these methods is the paper How Well Sentence Embeddings Capture Meaning, which discusses several sentence-embedding methods. I also suggest you look into Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features; the authors claim their approach beats state-of-the-art methods, and they provide the code and some usage instructions in this GitHub repo.
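
A rough sketch of the averaging idea; here word_vecs stands for a pre-trained word2vec/GloVe lookup (e.g. a dict mapping words to numpy vectors), and its name and loading step are placeholders rather than part of the cited papers:

    import numpy as np

    def sentence_vector(sentence, word_vecs, dim=300):
        # Average the embeddings of the words we know; zero vector if none are known.
        vecs = [word_vecs[w] for w in sentence.lower().split() if w in word_vecs]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def cosine(u, v):
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(np.dot(u, v) / denom) if denom else 0.0

    # similarity = cosine(sentence_vector(s1, word_vecs),
    #                     sentence_vector(s2, word_vecs))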






answered Nov 23 '17 at 15:15 by feynman410
0 votes

bert-as-service (https://github.com/hanxiao/bert-as-service#building-a-qa-semantic-search-engine-in-3-minutes) offers exactly this kind of solution.



To answer your question, implementing it yourself from scratch would be quite hard, as BERT is not a trivial neural network, but with this solution you can simply plug it into whatever algorithm of yours uses sentence similarity.
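
Roughly, usage looks like the snippet below (a sketch based on the linked README; it assumes a bert-as-service server has already been started with a downloaded BERT model):

    import numpy as np
    from bert_serving.client import BertClient

    # Assumes a server is already running, e.g.:
    #   bert-serving-start -model_dir /path/to/bert_model -num_worker=1
    bc = BertClient()
    vecs = bc.encode(["This is a tree", "This is not a tree"])  # one vector per sentence

    # Cosine similarity between the two sentence embeddings.
    sim = np.dot(vecs[0], vecs[1]) / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))
    print(sim)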






answered Mar 24 at 6:12 by Andres Suarez
0 votes

You should check out https://github.com/seatgeek/fuzzywuzzy#usage. fuzzywuzzy is an awesome library for string/text matching that gives a number between 0 and 100 based on how similar two sentences are. It uses Levenshtein distance to calculate the differences between sequences in a simple-to-use package. Also, check out this blog post (written by the fuzzywuzzy author) for a detailed explanation of how fuzzywuzzy does the job.
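
Basic usage from the linked README looks like this (worth noting it measures string/edit-distance similarity rather than semantic similarity):

    from fuzzywuzzy import fuzz

    s1 = "This is a tree"
    s2 = "This is not a tree"

    # Plain edit-distance ratio, 0-100.
    print(fuzz.ratio(s1, s2))
    # Token-based variant that ignores word order, often better for sentences.
    print(fuzz.token_sort_ratio(s1, s2))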






answered Mar 24 at 10:20 by karthikeyan mg