Doc2vec most similar document to a query string



I'm working on a project in which I created doc2vec representations of different academics, built from their patents, publications, and so on. For each publication and patent I have information such as the title and abstract. Now I want to search across all of the professors and find which one is most similar to a query string, such as "deep learning" or "computer networking". I tried using infer_vector() to create a doc2vec representation of the query string with the already-trained model, and then calculated the cosine similarity between the vectors. But I got terrible results. For example, when I search for "computer networking", the top result is a professor from History.

Is there any recommendation on how to find the most similar document to a query string?
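For reference, the ranking step described in the question can be sketched as follows. This is a minimal illustration and not the asker's actual code: the professor names and the 3-dimensional toy vectors are hypothetical stand-ins for real doc2vec vectors, and the query vector would in practice come from `infer_vector()`. One thing worth checking (an assumption about the setup, since the question does not show the call) is that gensim's `infer_vector()` expects a list of tokens, preprocessed the same way as the training data, rather than a raw string.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy vectors standing in for the trained professor vectors.
professor_vecs = {
    "prof_networks": [0.9, 0.1, 0.0],
    "prof_history": [0.0, 0.2, 0.9],
    "prof_ml": [0.6, 0.7, 0.1],
}

# In the real setup this would be something like (assumed, not from the question):
#   query_vec = model.infer_vector("computer networking".split())
query_vec = [1.0, 0.2, 0.0]

ranking = rank_by_cosine(query_vec, professor_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

If a ranking like this looks wrong on real data, printing the full sorted list rather than only the top entry makes it easier to see whether the expected professor is close to the top or nowhere near it.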











  • Welcome to this site! An easy-to-check alternative would be to use Euclidean distance instead of cosine similarity.
    – Esmailian
    Mar 30 at 19:57











  • Hm, I tried the Euclidean distance, but it gave me results similar to those of the cosine similarity methods I tried. Is it possible that my query string is too short to give good results?
    – qiqi
    Mar 30 at 23:15










  • You can use an easier-to-pass evaluation to see how far off the model is. To this end, check whether a good result shows up among the top 3, 5, or 10 closest matches. Also, for queries like yours, the keywords of a paper are far more important than the abstract, so place special emphasis on them.
    – Esmailian
    Mar 31 at 7:58
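The two suggestions in the comments, trying Euclidean distance and checking whether a good result appears in the top k matches, can be sketched together. This is only an illustration under stated assumptions: the names `top_k` and `hit_at_k`, the professor labels, and the 2-dimensional toy vectors are all hypothetical, and the set of "relevant" professors for a query has to be supplied by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k, metric="cosine"):
    """Names of the k closest documents under the chosen metric."""
    if metric == "cosine":
        # Higher similarity means closer, so sort in descending order.
        ordered = sorted(
            doc_vecs,
            key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
            reverse=True,
        )
    elif metric == "euclidean":
        # Smaller distance means closer, so sort in ascending order.
        ordered = sorted(
            doc_vecs,
            key=lambda name: euclidean_distance(query_vec, doc_vecs[name]),
        )
    else:
        raise ValueError(f"unknown metric: {metric}")
    return ordered[:k]


def hit_at_k(query_vec, doc_vecs, relevant, k, metric="cosine"):
    """True if any hand-labelled relevant document is among the top k."""
    return any(name in relevant for name in top_k(query_vec, doc_vecs, k, metric))


# Hypothetical toy vectors; real ones would come from the doc2vec model.
doc_vecs = {
    "prof_history": [0.9, 0.3],
    "prof_networks": [0.7, 0.6],
    "prof_ml": [0.5, 0.8],
    "prof_biology": [-0.8, 0.1],
}
query_vec = [1.0, 0.2]
relevant = {"prof_networks"}  # hand-picked expected answer for this query

for k in (1, 3):
    for metric in ("cosine", "euclidean"):
        print(k, metric, hit_at_k(query_vec, doc_vecs, relevant, k, metric))
```

In this toy data the expected professor is missed at k = 1 but found at k = 3 under both metrics. A pattern like that on real data would suggest the model is roughly right but noisy, whereas a miss even at k = 10 would point to a deeper problem with training or preprocessing.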















word2vec similarity natural-language-process information-retrieval similar-documents
















asked Mar 30 at 19:42 by qiqi




















