How to cluster text-based software requirements
I'm a beginner in deep learning, and I'd like to cluster text-based software requirements by theme (word similarity and word frequency) using neural networks. Is there an example, tutorial, or GitHub repository of an unsupervised neural network that groups texts by theme and word similarity?
Thank you very much for your answers!
neural-network clustering unsupervised-learning natural-language-process
asked Apr 9 at 16:40 by Takwa
1 Answer
I recommend using word2vec vectors as word features and an LSTM autoencoder to encode each sentence (or text). Once you have a vector per sentence, you can cluster the sentences with a variety of techniques, such as k-means or DBSCAN, and visualize the result with t-SNE or UMAP. Start here:
https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93
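A minimal sketch of this pipeline, with two simplifications called out explicitly: the tiny hand-written 2-d vectors below stand in for real pretrained word2vec embeddings, and plain averaging of word vectors stands in for the LSTM autoencoder's sentence encoding. The requirement strings are invented for illustration.

```python
# Sketch: embed each requirement, then cluster the sentence vectors with k-means.
# The toy 2-d "w2v" table stands in for pretrained word2vec embeddings, and
# averaging word vectors stands in for the LSTM autoencoder's encoder.
import numpy as np
from sklearn.cluster import KMeans

w2v = {
    "login": [1.0, 0.1], "password": [0.9, 0.0], "user": [1.1, 0.2],
    "report": [0.0, 1.0], "export": [0.1, 0.9], "pdf": [0.2, 1.1],
}

def embed(text):
    """Average the vectors of known words; zero vector if none are known."""
    vecs = [w2v[w] for w in text.lower().split() if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

requirements = [
    "user login password",   # authentication theme
    "login user",
    "export report pdf",     # reporting theme
    "report export",
]
X = np.stack([embed(r) for r in requirements])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# With these clearly separated toy vectors, the two authentication texts land
# in one cluster and the two reporting texts in the other (the numeric cluster
# ids themselves are arbitrary).
```

The same `labels` array can then be fed to t-SNE or UMAP for a 2-d visualization of larger corpora.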
Thank you for your answer! Regarding sentence encoding: sklearn already provides a TF-IDF implementation, and there is a tutorial using it for k-means text clustering (pythonprogramminglanguage.com/kmeans-text-clustering). So why are encoding techniques such as word2vec and LSTM recommended? Could you explain their advantages over the approach implemented in sklearn, for instance?
– Takwa
Apr 19 at 14:12
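For reference, the sklearn TF-IDF + k-means baseline mentioned in this comment fits in a few lines. The requirement strings below are invented for illustration.

```python
# Sketch of the TF-IDF + k-means pipeline available directly in sklearn.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

requirements = [
    "the system shall let the user log in with a password",
    "the user shall reset a forgotten password",
    "the system shall export reports as pdf",
    "monthly reports shall be exported automatically",
]
# fit_transform returns a sparse document-term matrix, which KMeans accepts.
X = TfidfVectorizer(stop_words="english").fit_transform(requirements)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# On these toy texts the two password-related requirements typically form one
# cluster and the two report-related requirements the other.
```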
You're welcome. The first advantage of word2vec over TF-IDF is that word2vec captures contextual information, which TF-IDF does not. The second is that word2vec is pre-trained on a large corpus, so it models the language better than a TF-IDF fit on your data alone. Third, TF-IDF vectors grow as the vocabulary grows, whereas pre-trained word2vec vectors have a fixed size regardless of vocabulary size.
– pythinker
Apr 19 at 18:45
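The third point (dimensionality) is easy to demonstrate: a TF-IDF representation gains one dimension per new vocabulary word, whereas pretrained word2vec vectors keep a fixed size (e.g. 300 dimensions for the Google News vectors). A small sketch with invented requirement snippets:

```python
# TF-IDF dimensionality grows with the vocabulary; fixed-size embeddings would not.
from sklearn.feature_extraction.text import TfidfVectorizer

small = ["user login", "export report"]
large = small + ["database backup schedule", "email notification alert"]

dim_small = TfidfVectorizer().fit_transform(small).shape[1]  # 4 distinct words
dim_large = TfidfVectorizer().fit_transform(large).shape[1]  # 10 distinct words
```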
Thank you for the explanation, @pythinker!
– Takwa
Apr 25 at 8:21
@Takwa You're welcome.
– pythinker
Apr 25 at 9:19
answered Apr 9 at 17:06 by pythinker
Thanks for contributing an answer to Data Science Stack Exchange!