How to cluster text-based software requirements


I'm a beginner in deep learning, and I'd like to cluster text-based software requirements by theme (word similarity / word frequency) using neural networks. Is there any example, tutorial, or GitHub code for an unsupervised neural network that groups texts by theme and word similarity?



Thank you very much for your answers!










Tags: neural-network, clustering, unsupervised-learning, natural-language-process
















      asked Apr 9 at 16:40









Takwa (62)




















1 Answer


















I recommend using word2vec as the feature vectors for words and an LSTM autoencoder to encode each sentence (or text). Once you have a vector for each sentence, you can cluster the sentences with a variety of techniques, such as k-means or DBSCAN, and visualize the result with t-SNE or UMAP. Start from here:
https://blog.myyellowroad.com/unsupervised-sentence-representation-with-deep-learning-104b90079a93
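As a rough sketch of the clustering stage this answer describes: the deep-learning encoder itself is out of scope here, so the vectors below are synthetic stand-ins for LSTM-autoencoder sentence encodings, clustered with scikit-learn's KMeans (one of the suggested options):

```python
# Minimal sketch of the clustering stage. Assumes each requirement has
# already been encoded into a fixed-length vector (e.g. by an LSTM
# autoencoder); the vectors here are synthetic stand-ins drawn around
# two different centers to imitate two "themes".
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
theme_a = rng.normal(loc=0.0, scale=0.1, size=(5, 16))  # 5 docs, theme A
theme_b = rng.normal(loc=1.0, scale=0.1, size=(5, 16))  # 5 docs, theme B
sentence_vectors = np.vstack([theme_a, theme_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(sentence_vectors)  # one cluster id per document
```

The same `labels` array can then be fed to t-SNE or UMAP coloring for inspection; swapping in `sklearn.cluster.DBSCAN` requires no other changes to the pipeline.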


















• Takwa (Apr 19 at 14:12): Thank you for your answer! Regarding sentence encoding: sklearn already ships an implementation of the TF-IDF algorithm; here is a tutorial (pythonprogramminglanguage.com/kmeans-text-clustering). So I am wondering why encoding techniques such as word2vec and LSTM are recommended. Can you explain their advantages over the approach implemented in sklearn, for instance?
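For reference, the sklearn baseline this comment refers to looks roughly like the following sketch (the requirement strings are made up for illustration):

```python
# TF-IDF + k-means baseline, in the spirit of the tutorial linked above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

requirements = [
    "The system shall encrypt stored user passwords.",
    "User passwords must be hashed before storage.",
    "The UI shall display an error message on invalid input.",
    "Invalid form input must trigger a validation error in the UI.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(requirements)  # sparse (n_docs, vocab_size)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)  # expect the two password docs in one cluster
```

Note the feature matrix `X` has one column per vocabulary term, which is the size issue raised in the reply below this comment.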










• pythinker (Apr 19 at 18:45): You're welcome. The first advantage of word2vec over tf-idf is that word2vec captures contextual information, while tf-idf does not. The second is that word2vec is pre-trained on a large corpus, so it models the language better than tf-idf can. The third is that a tf-idf vector grows as the vocabulary grows, whereas pre-trained word2vec vectors have a fixed size regardless of vocabulary size.
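The third point can be illustrated with a tiny sketch: the toy 4-dimensional vectors below stand in for real pre-trained word2vec weights, and a sentence vector built by averaging them keeps the embedding dimension no matter how large the vocabulary gets.

```python
import numpy as np

# Toy 4-d "embeddings" standing in for pre-trained word2vec vectors.
embeddings = {
    "encrypt":  np.array([0.9, 0.1, 0.0, 0.0]),
    "password": np.array([0.8, 0.2, 0.1, 0.0]),
    "display":  np.array([0.0, 0.1, 0.9, 0.2]),
    "error":    np.array([0.1, 0.0, 0.8, 0.3]),
}

def sentence_vector(tokens, emb, dim=4):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

v = sentence_vector("encrypt the password at rest".split(), embeddings)
# v.shape is always (4,), however many words the vocabulary contains;
# a tf-idf vector over the same corpus would grow with the vocabulary.
```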










• Takwa (Apr 25 at 8:21): Thank you for the explanation @pythinker!










• pythinker (Apr 25 at 9:19): @Takwa You're welcome.










