Doc2vec most similar document to a query string

I'm working on a project in which I created doc2vec representations of different academics, based on their patents, publications, etc. For each publication and patent I have information such as the title and abstract. Now I want to search across all of the professors and find which professor is most similar to a query string, such as "deep learning" or "computer networking". I tried using infer_vector() to create a doc2vec representation of the query string with the already-trained model, and then calculating the cosine similarity between that vector and each professor's vector. But I got terrible results. For example, when I search for "computer networking", it returns a professor from History.

Is there any recommendation for how to find the most similar document to a query string?
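For reference, the retrieval step described in the question can be sketched in plain Python. This is a minimal sketch, assuming you already have one vector per professor (here a dict from a hypothetical professor ID to a list of floats) and a query vector, for example the output of gensim's `model.infer_vector(query_tokens)`, where `query_tokens` should be tokenized the same way as the training text. The toy 3-dimensional vectors are made up purely for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query_vec: Sequence[float],
    prof_vecs: Dict[str, Sequence[float]],
    topn: int = 10,
) -> List[Tuple[str, float]]:
    """Return the topn (professor_id, similarity) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in prof_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for per-professor doc2vec outputs.
    profs = {
        "prof_networking": [0.9, 0.1, 0.0],
        "prof_history": [0.0, 0.2, 0.95],
        "prof_ml": [0.5, 0.8, 0.1],
    }
    query = [0.85, 0.2, 0.05]  # stand-in for an inferred query vector
    for pid, score in rank_by_cosine(query, profs, topn=2):
        print(f"{pid}: {score:.3f}")
```

In gensim 4.x the same lookup should also be possible with `model.dv.most_similar([query_vec], topn=10)`; the explicit version just makes it easy to inspect the scores or swap in a different metric.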











  • Welcome to this site! An easy-to-check alternative would be to use Euclidean distance instead of cosine similarity.
    – Esmailian
    Mar 30 at 19:57











  • Hm, I tried Euclidean distance, but it gave results similar to the cosine similarity methods I tried. Is it possible that my query string is too short to give good results?
    – qiqi
    Mar 30 at 23:15










  • You could start with an easier-to-pass evaluation to see how far off the model is: check whether a good result shows up among the top 3, 5, or 10 closest matches. Also, given your queries, the keywords of a paper are far more important than its abstract, so place special emphasis on them.
    – Esmailian
    Mar 31 at 7:58
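The first and third suggestions in these comments can be checked with a few lines of plain Python. This is a minimal sketch on made-up 2-D vectors, and all function names are hypothetical. It illustrates one possible reason Euclidean distance gave similar results: for vectors scaled to unit length, ||a - b||^2 = 2 - 2 cos(a, b), so ranking by smallest Euclidean distance gives exactly the same order as ranking by largest cosine similarity. Raw Euclidean distance can reorder results only when vector lengths differ a lot. The sketch also computes a hit rate at k: the fraction of hand-labelled test queries for which at least one professor you consider relevant appears among the top k matches, which is one simple way to run the top-3/5/10 check.

```python
import math
from typing import Dict, List, Sequence, Set


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of the normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain L2 distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query: Sequence[float], vecs: Dict[str, Sequence[float]], metric: str) -> List[str]:
    """IDs ordered best-first: largest cosine, or smallest Euclidean distance."""
    if metric == "cosine":
        return sorted(vecs, key=lambda pid: cosine(query, vecs[pid]), reverse=True)
    if metric == "euclidean":
        return sorted(vecs, key=lambda pid: euclidean(query, vecs[pid]))
    raise ValueError(f"unknown metric: {metric}")


def hit_rate_at_k(rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Fraction of queries whose top-k results contain at least one relevant ID."""
    hits = sum(1 for q, ranked in rankings.items() if set(ranked[:k]) & relevant[q])
    return hits / len(rankings)


if __name__ == "__main__":
    query = [1.0, 0.0]
    vecs = {"a": [10.0, 1.0], "b": [0.5, 0.5]}  # toy vectors with very different lengths
    print(rank(query, vecs, "cosine"))     # ['a', 'b']
    print(rank(query, vecs, "euclidean"))  # ['b', 'a']: raw lengths dominate

    unit = {pid: normalize(v) for pid, v in vecs.items()}
    print(rank(normalize(query), unit, "euclidean"))  # ['a', 'b']: same order as cosine

    # Hypothetical hand-labelled test queries: query -> ranked IDs, query -> relevant IDs.
    rankings = {"q1": ["a", "b", "c"], "q2": ["c", "a", "b"]}
    relevant = {"q1": {"b"}, "q2": {"b"}}
    for k in (1, 2, 3):
        print(k, hit_rate_at_k(rankings, relevant, k))  # 0.0, 0.5, 1.0
```

Applied to real data, `rankings` would come from ranking each test query against all professor vectors, and `relevant` from your own judgement of which professors should match each query.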















word2vec similarity natural-language-process information-retrieval similar-documents
















asked Mar 30 at 19:42









qiqi
