Doc2vec most similar document to a query string
I'm working on a project in which I created doc2vec representations of different academics, built from their patents, publications, and so on. For each publication and patent I have information such as the title and abstract. Now I want to search across all of the professors and find which one is most similar to a query string, such as "deep learning" or "computer networking". I tried using infer_vector() to create a doc2vec representation of the query string with the already trained model, and then calculated the cosine similarity between the vectors. But I got terrible results. For example, when I search for "computer networking", the top result is a professor from History.
Are there any recommendations for how to find the most similar document to a query string?
word2vec similarity natural-language-process information-retrieval similar-documents
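For readers following along, here is a minimal sketch of the ranking step the question describes. It is not the asker's code: the professor names and three-dimensional vectors are hypothetical toy values, and in the real setup the query vector would come from infer_vector() and the professor vectors from the trained doc2vec model. One thing that may be worth checking in that setup is that gensim's infer_vector() expects a list of tokens, preprocessed the same way as the training documents, rather than a raw string.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for the per-professor doc2vec vectors.
professor_vecs = {
    "prof_networks": [0.9, 0.1, 0.0],
    "prof_history": [0.0, 0.2, 0.9],
    "prof_ml": [0.6, 0.7, 0.1],
}

# Hypothetical stand-in for the vector inferred from the query "computer networking".
query_vec = [1.0, 0.2, 0.0]

ranking = rank_by_cosine(query_vec, professor_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values the networking professor ranks first and the history professor last. If the real pipeline produces the opposite, the ranking code is unlikely to be the culprit, and the inferred query vector is the more probable place to look.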
Welcome to this site! An easy-to-check alternative would be to use Euclidean distance instead of cosine similarity.
– Esmailian
Mar 30 at 19:57
Hm, I tried the Euclidean distance, but it gave me results similar to those from the cosine similarity methods I tried. Is it possible that my query string is too short to give good results?
– qiqi
Mar 30 at 23:15
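A side note on why the two measures can agree: if the vectors are scaled to unit length (an assumption here, the thread does not say whether the asker normalized them), squared Euclidean distance equals 2 - 2 * cosine similarity, so both measures produce exactly the same ranking. The sketch below checks this on hypothetical toy vectors; all names and values are illustrative only.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy vectors, normalized to unit length.
unit_vectors = {
    "prof_networks": normalize([0.9, 0.1, 0.0]),
    "prof_history": normalize([0.0, 0.2, 0.9]),
    "prof_ml": normalize([0.6, 0.7, 0.1]),
}
unit_query = normalize([1.0, 0.2, 0.0])

# Highest cosine first versus smallest distance first.
by_cosine = sorted(unit_vectors, key=lambda n: dot(unit_query, unit_vectors[n]), reverse=True)
by_euclidean = sorted(unit_vectors, key=lambda n: euclidean(unit_query, unit_vectors[n]))

print(by_cosine == by_euclidean)  # the two orderings coincide for unit vectors
```

On unnormalized vectors the two rankings can differ, so similar results from both measures do not by themselves say much about whether the query is too short.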
You can use a more lenient evaluation to see how far off the model is: check whether a good result shows up among the top 3, 5, or 10 closest matches. Also, for queries like yours, a paper's keywords matter far more than its abstract, so place special emphasis on them.
– Esmailian
Mar 31 at 7:58
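The top-k check suggested in the last comment can be sketched roughly as follows. This assumes you can hand-label, for a few test queries, which professors count as a good match; the queries, rankings, and labels below are hypothetical toy data, and `hit_at_k` and `hit_rate_at_k` are illustrative helper names rather than functions from any library.

```python
def hit_at_k(ranked_names, relevant_names, k):
    """True if at least one relevant name appears among the first k ranked names."""
    return any(name in relevant_names for name in ranked_names[:k])


def hit_rate_at_k(rankings, relevant, k):
    """Fraction of queries that have at least one relevant result in their top k."""
    hits = sum(hit_at_k(rankings[query], relevant[query], k) for query in rankings)
    return hits / len(rankings)


# Hypothetical model output: for each query, professors ordered from most to least similar.
rankings = {
    "computer networking": ["prof_history", "prof_networks", "prof_ml"],
    "deep learning": ["prof_ml", "prof_networks", "prof_history"],
}

# Hypothetical hand-made labels: which professors count as a good match per query.
relevant = {
    "computer networking": {"prof_networks"},
    "deep learning": {"prof_ml"},
}

for k in (1, 3):
    print(f"hit rate @ {k}: {hit_rate_at_k(rankings, relevant, k):.2f}")
```

In this toy data the rate is 0.50 at k = 1 and 1.00 at k = 3. A low rate at k = 1 that rises quickly with k would suggest the model is roughly on target but noisy, while a rate that stays low even at k = 10 would point to a deeper problem with the vectors.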
asked Mar 30 at 19:42
qiqi