How to cluster text-based software requirements


I'm a beginner in deep learning, and I'd like to cluster text-based software requirements by theme (word similarity / word frequency) using neural networks. Is there any example, tutorial, or GitHub code for an unsupervised neural network that groups texts by theme and word similarity?



Thank you very much for your answers!










Tags: neural-network, clustering, unsupervised-learning, natural-language-process
















      asked Apr 9 at 16:40









Takwa (62)




















1 Answer


















I recommend using word2vec as the feature vectors for words and an LSTM autoencoder to encode each sentence (or text). Once you have a vector for each sentence, you can cluster the sentences with a variety of techniques, such as k-means or DBSCAN, and visualize the result with t-SNE or UMAP. Start from here:
https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93
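As a rough sketch of the clustering stage this answer describes: the deep-learning encoder itself is out of scope here, so the vectors below are synthetic stand-ins for LSTM-autoencoder sentence encodings, clustered with scikit-learn's KMeans (one of the suggested options):

```python
# Minimal sketch of the clustering stage. Assumes each requirement has
# already been encoded into a fixed-length vector (e.g. by an LSTM
# autoencoder); the vectors here are synthetic stand-ins drawn around
# two different centers to imitate two "themes".
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
theme_a = rng.normal(loc=0.0, scale=0.1, size=(5, 16))  # 5 docs, theme A
theme_b = rng.normal(loc=1.0, scale=0.1, size=(5, 16))  # 5 docs, theme B
sentence_vectors = np.vstack([theme_a, theme_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(sentence_vectors)  # one cluster id per document
```

The same `labels` array can then be fed to t-SNE or UMAP coloring for inspection; swapping in `sklearn.cluster.DBSCAN` requires no other changes to the pipeline.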


















• Takwa (Apr 19 at 14:12): Thank you for your answer! Regarding sentence encoding: sklearn already ships an implementation of the TF-IDF algorithm; here is a tutorial (pythonprogramminglanguage.com/kmeans-text-clustering). So I am wondering why encoding techniques such as word2vec and LSTM are recommended. Can you explain their advantages over the approach implemented in sklearn, for instance?
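For reference, the sklearn baseline this comment refers to looks roughly like the following sketch (the requirement strings are made up for illustration):

```python
# TF-IDF + k-means baseline, in the spirit of the tutorial linked above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

requirements = [
    "The system shall encrypt stored user passwords.",
    "User passwords must be hashed before storage.",
    "The UI shall display an error message on invalid input.",
    "Invalid form input must trigger a validation error in the UI.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(requirements)  # sparse (n_docs, vocab_size)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)  # expect the two password docs in one cluster
```

Note the feature matrix `X` has one column per vocabulary term, which is the size issue raised in the reply below this comment.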










• pythinker (Apr 19 at 18:45): You're welcome. The first advantage of word2vec over tf-idf is that word2vec captures contextual information, while tf-idf does not. The second is that word2vec is pre-trained on a large corpus, so it models the language better than tf-idf can. The third is that a tf-idf vector grows as the vocabulary grows, whereas pre-trained word2vec vectors have a fixed size regardless of vocabulary size.
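The third point can be illustrated with a tiny sketch: the toy 4-dimensional vectors below stand in for real pre-trained word2vec weights, and a sentence vector built by averaging them keeps the embedding dimension no matter how large the vocabulary gets.

```python
import numpy as np

# Toy 4-d "embeddings" standing in for pre-trained word2vec vectors.
embeddings = {
    "encrypt":  np.array([0.9, 0.1, 0.0, 0.0]),
    "password": np.array([0.8, 0.2, 0.1, 0.0]),
    "display":  np.array([0.0, 0.1, 0.9, 0.2]),
    "error":    np.array([0.1, 0.0, 0.8, 0.3]),
}

def sentence_vector(tokens, emb, dim=4):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

v = sentence_vector("encrypt the password at rest".split(), embeddings)
# v.shape is always (4,), however many words the vocabulary contains;
# a tf-idf vector over the same corpus would grow with the vocabulary.
```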










• Takwa (Apr 25 at 8:21): Thank you for the explanation @pythinker!










• pythinker (Apr 25 at 9:19): @Takwa You're welcome.










