Working ofLSTM with multiple Units - NER Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsThe model of LSTM with more than one unitQuestions about LSTM cells, units and inputshow does minibatch for LSTM look like?How can a GRU perform as well as an LSTM?How to define the shape of hidden and meory state in Numpy, keras?How many RNN units are needed for tasks involving sequences?Structure of a multilayered LSTM neural network?How data are being feed into LSTM cellUnderstanding LSTM structure

How do I deal with an erroneously large refund?

Im stuck and having trouble with ¬P ∨ Q Prove: P → Q

/bin/ls sorts differently than just ls

What were wait-states, and why was it only an issue for PCs?

Why did Bronn offer to be Tyrion Lannister's champion in trial by combat?

How to produce a PS1 prompt in bash or ksh93 similar to tcsh

Can gravitational waves pass through a black hole?

Protagonist's race is hidden - should I reveal it?

Sorting the characters in a utf-16 string in java

Why does BitLocker not use RSA?

Why aren't road bike wheels tiny?

Is there a verb for listening stealthily?

Proving inequality for positive definite matrix

Is Vivien of the Wilds + Wilderness Reclamation a competitive combo?

What is the definining line between a helicopter and a drone a person can ride in?

Can a Wizard take the Magic Initiate feat and select spells from the Wizard list?

What is the difference between 准时 and 按时?

2 sample t test for sample sizes - 30,000 and 150,000

What came first? Venom as the movie or as the song?

"Destructive force" carried by a B-52?

Putting Ant-Man on house arrest

Will the Antimagic Field spell cause elementals not summoned by magic to dissipate?

How to make an animal which can only breed for a certain number of generations?

When speaking, how do you change your mind mid-sentence?



Working ofLSTM with multiple Units - NER



Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)
2019 Moderator Election Q&A - Questionnaire
2019 Community Moderator Election ResultsThe model of LSTM with more than one unitQuestions about LSTM cells, units and inputshow does minibatch for LSTM look like?How can a GRU perform as well as an LSTM?How to define the shape of hidden and meory state in Numpy, keras?How many RNN units are needed for tasks involving sequences?Structure of a multilayered LSTM neural network?How data are being feed into LSTM cellUnderstanding LSTM structure










1












$begingroup$


I am trying to understand working of LSTM networks and kind of not clear about how different neurons in a cell interact each other. I had a look at a similar question, but still not clear about few things.



Assume an NER task where each time step get embedding on one word as input and there are 10 time steps. I defined the LSTM cell with 100 units, As per my understanding each units will have 3 separate gates and memory cell as shown in the second figure (total 100 memory cells and 150 gates). From the theory of LSTM, each of these memory states stores different information, like one might store about singularity, other might store information about gender and so on (please correct me if I am wrong). With this set up, I have few doubts about the working of LSTM and the interaction between each units/neurons



1) In the figure, each cell has multiple neuron, lets say 100. Does the first input X1 (first word vector) will be input to the first neuron of the cell along with previous cell's activation function (say h0) and its output will go to the second neuron and so on..? Or same input X1 and h0 will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step (not sure whether any aggregation happens) ?



2) How different units would be able to store different features (singularity, gender etc..)? is it because they are initialized with random and different weights?



3) If neurons are working parallel, Is there any interaction happens between them to share information? will it be possible that multiple cell state store about same information (say, gender)



4) If LSTM cell 1 has 100 neurons/units and input X1, output h1 will also have 100 values (one for each neuron) ?



enter image description hereenter image description here










share|improve this question











$endgroup$
















    1












    $begingroup$


    I am trying to understand working of LSTM networks and kind of not clear about how different neurons in a cell interact each other. I had a look at a similar question, but still not clear about few things.



    Assume an NER task where each time step get embedding on one word as input and there are 10 time steps. I defined the LSTM cell with 100 units, As per my understanding each units will have 3 separate gates and memory cell as shown in the second figure (total 100 memory cells and 150 gates). From the theory of LSTM, each of these memory states stores different information, like one might store about singularity, other might store information about gender and so on (please correct me if I am wrong). With this set up, I have few doubts about the working of LSTM and the interaction between each units/neurons



    1) In the figure, each cell has multiple neuron, lets say 100. Does the first input X1 (first word vector) will be input to the first neuron of the cell along with previous cell's activation function (say h0) and its output will go to the second neuron and so on..? Or same input X1 and h0 will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step (not sure whether any aggregation happens) ?



    2) How different units would be able to store different features (singularity, gender etc..)? is it because they are initialized with random and different weights?



    3) If neurons are working parallel, Is there any interaction happens between them to share information? will it be possible that multiple cell state store about same information (say, gender)



    4) If LSTM cell 1 has 100 neurons/units and input X1, output h1 will also have 100 values (one for each neuron) ?



    enter image description hereenter image description here










    share|improve this question











    $endgroup$














      1












      1








      1





      $begingroup$


      I am trying to understand working of LSTM networks and kind of not clear about how different neurons in a cell interact each other. I had a look at a similar question, but still not clear about few things.



      Assume an NER task where each time step get embedding on one word as input and there are 10 time steps. I defined the LSTM cell with 100 units, As per my understanding each units will have 3 separate gates and memory cell as shown in the second figure (total 100 memory cells and 150 gates). From the theory of LSTM, each of these memory states stores different information, like one might store about singularity, other might store information about gender and so on (please correct me if I am wrong). With this set up, I have few doubts about the working of LSTM and the interaction between each units/neurons



      1) In the figure, each cell has multiple neuron, lets say 100. Does the first input X1 (first word vector) will be input to the first neuron of the cell along with previous cell's activation function (say h0) and its output will go to the second neuron and so on..? Or same input X1 and h0 will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step (not sure whether any aggregation happens) ?



      2) How different units would be able to store different features (singularity, gender etc..)? is it because they are initialized with random and different weights?



      3) If neurons are working parallel, Is there any interaction happens between them to share information? will it be possible that multiple cell state store about same information (say, gender)



      4) If LSTM cell 1 has 100 neurons/units and input X1, output h1 will also have 100 values (one for each neuron) ?



      enter image description hereenter image description here










      share|improve this question











      $endgroup$




      I am trying to understand working of LSTM networks and kind of not clear about how different neurons in a cell interact each other. I had a look at a similar question, but still not clear about few things.



      Assume an NER task where each time step get embedding on one word as input and there are 10 time steps. I defined the LSTM cell with 100 units, As per my understanding each units will have 3 separate gates and memory cell as shown in the second figure (total 100 memory cells and 150 gates). From the theory of LSTM, each of these memory states stores different information, like one might store about singularity, other might store information about gender and so on (please correct me if I am wrong). With this set up, I have few doubts about the working of LSTM and the interaction between each units/neurons



      1) In the figure, each cell has multiple neuron, lets say 100. Does the first input X1 (first word vector) will be input to the first neuron of the cell along with previous cell's activation function (say h0) and its output will go to the second neuron and so on..? Or same input X1 and h0 will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step (not sure whether any aggregation happens) ?



      2) How different units would be able to store different features (singularity, gender etc..)? is it because they are initialized with random and different weights?



      3) If neurons are working parallel, Is there any interaction happens between them to share information? will it be possible that multiple cell state store about same information (say, gender)



      4) If LSTM cell 1 has 100 neurons/units and input X1, output h1 will also have 100 values (one for each neuron) ?



      enter image description hereenter image description here







      deep-learning lstm recurrent-neural-net named-entity-recognition sequence-to-sequence






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Apr 5 at 4:24







      B.K

















      asked Apr 4 at 23:47









      B.KB.K

      83




      83




















          1 Answer
          1






          active

          oldest

          votes


















          0












          $begingroup$

          Welcome to StackExchange DataScience!



          1) The same input $x_1$ and $h_0$ will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step. Notice that there are two 100-dimensional vector going to the next time step: $c_1$ and $h_1$, which are denoted by the arrows on the right side in your second figure.



          2) This is a complicated question. Firstly, there are usually no one-to-one mapping between a neural and its meaning like neural 1 -> gender.



          3) Those neurons do share information by the matrix multiplication operations. For example, the cell value at time step $t$ is usually calculated as
          $$c_t = f_t c_t-1 + i_t tanh(W_xcx_t+W_hch_t-1+b_c)$$
          Notice that the hidden value in the previous time step $h_t-1$ is multiplied by a matrix $W_hc$, therefore each value in $W_hch_t-1$ is a linear combination of all the values in $h_t-1$.



          4) The number of neurons in the LSTM cell, the dimension of hidden vector $h_t$, the dimension of the cell vector $c_t$, and the dimension of the output vector $o_t$, are all the same (in your case, 100). This is known as the "hidden size", "output size" or "number of units/neurons". This is not the same as the dimension of input vector $x_t$, which is usually known as the "input size".



          If you have further questions about the dimension of the LSTM variables, I suggest you to look at the first answer here where the author listed the dimension of all variables and parameters.






          share|improve this answer









          $endgroup$













            Your Answer








            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "557"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48640%2fworking-oflstm-with-multiple-units-ner%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0












            $begingroup$

            Welcome to StackExchange DataScience!



            1) The same input $x_1$ and $h_0$ will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step. Notice that there are two 100-dimensional vector going to the next time step: $c_1$ and $h_1$, which are denoted by the arrows on the right side in your second figure.



            2) This is a complicated question. Firstly, there are usually no one-to-one mapping between a neural and its meaning like neural 1 -> gender.



            3) Those neurons do share information by the matrix multiplication operations. For example, the cell value at time step $t$ is usually calculated as
            $$c_t = f_t c_t-1 + i_t tanh(W_xcx_t+W_hch_t-1+b_c)$$
            Notice that the hidden value in the previous time step $h_t-1$ is multiplied by a matrix $W_hc$, therefore each value in $W_hch_t-1$ is a linear combination of all the values in $h_t-1$.



            4) The number of neurons in the LSTM cell, the dimension of hidden vector $h_t$, the dimension of the cell vector $c_t$, and the dimension of the output vector $o_t$, are all the same (in your case, 100). This is known as the "hidden size", "output size" or "number of units/neurons". This is not the same as the dimension of input vector $x_t$, which is usually known as the "input size".



            If you have further questions about the dimension of the LSTM variables, I suggest you to look at the first answer here where the author listed the dimension of all variables and parameters.






            share|improve this answer









            $endgroup$

















              0












              $begingroup$

              Welcome to StackExchange DataScience!



              1) The same input $x_1$ and $h_0$ will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step. Notice that there are two 100-dimensional vector going to the next time step: $c_1$ and $h_1$, which are denoted by the arrows on the right side in your second figure.



              2) This is a complicated question. Firstly, there are usually no one-to-one mapping between a neural and its meaning like neural 1 -> gender.



              3) Those neurons do share information by the matrix multiplication operations. For example, the cell value at time step $t$ is usually calculated as
              $$c_t = f_t c_t-1 + i_t tanh(W_xcx_t+W_hch_t-1+b_c)$$
              Notice that the hidden value in the previous time step $h_t-1$ is multiplied by a matrix $W_hc$, therefore each value in $W_hch_t-1$ is a linear combination of all the values in $h_t-1$.



              4) The number of neurons in the LSTM cell, the dimension of hidden vector $h_t$, the dimension of the cell vector $c_t$, and the dimension of the output vector $o_t$, are all the same (in your case, 100). This is known as the "hidden size", "output size" or "number of units/neurons". This is not the same as the dimension of input vector $x_t$, which is usually known as the "input size".



              If you have further questions about the dimension of the LSTM variables, I suggest you to look at the first answer here where the author listed the dimension of all variables and parameters.






              share|improve this answer









              $endgroup$















                0












                0








                0





                $begingroup$

                Welcome to StackExchange DataScience!



                1) The same input $x_1$ and $h_0$ will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step. Notice that there are two 100-dimensional vector going to the next time step: $c_1$ and $h_1$, which are denoted by the arrows on the right side in your second figure.



                2) This is a complicated question. Firstly, there are usually no one-to-one mapping between a neural and its meaning like neural 1 -> gender.



                3) Those neurons do share information by the matrix multiplication operations. For example, the cell value at time step $t$ is usually calculated as
                $$c_t = f_t c_t-1 + i_t tanh(W_xcx_t+W_hch_t-1+b_c)$$
                Notice that the hidden value in the previous time step $h_t-1$ is multiplied by a matrix $W_hc$, therefore each value in $W_hch_t-1$ is a linear combination of all the values in $h_t-1$.



                4) The number of neurons in the LSTM cell, the dimension of hidden vector $h_t$, the dimension of the cell vector $c_t$, and the dimension of the output vector $o_t$, are all the same (in your case, 100). This is known as the "hidden size", "output size" or "number of units/neurons". This is not the same as the dimension of input vector $x_t$, which is usually known as the "input size".



                If you have further questions about the dimension of the LSTM variables, I suggest you to look at the first answer here where the author listed the dimension of all variables and parameters.






                share|improve this answer









                $endgroup$



                Welcome to StackExchange DataScience!



                1) The same input $x_1$ and $h_0$ will be fed to each 100 neurons parallel (independent) and respective outputs will go to next time step. Notice that there are two 100-dimensional vector going to the next time step: $c_1$ and $h_1$, which are denoted by the arrows on the right side in your second figure.



                2) This is a complicated question. Firstly, there are usually no one-to-one mapping between a neural and its meaning like neural 1 -> gender.



                3) Those neurons do share information by the matrix multiplication operations. For example, the cell value at time step $t$ is usually calculated as
                $$c_t = f_t c_t-1 + i_t tanh(W_xcx_t+W_hch_t-1+b_c)$$
                Notice that the hidden value in the previous time step $h_t-1$ is multiplied by a matrix $W_hc$, therefore each value in $W_hch_t-1$ is a linear combination of all the values in $h_t-1$.



                4) The number of neurons in the LSTM cell, the dimension of hidden vector $h_t$, the dimension of the cell vector $c_t$, and the dimension of the output vector $o_t$, are all the same (in your case, 100). This is known as the "hidden size", "output size" or "number of units/neurons". This is not the same as the dimension of input vector $x_t$, which is usually known as the "input size".



                If you have further questions about the dimension of the LSTM variables, I suggest you to look at the first answer here where the author listed the dimension of all variables and parameters.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Apr 5 at 6:02









                user12075user12075

                1,351616




                1,351616



























                    draft saved

                    draft discarded
















































                    Thanks for contributing an answer to Data Science Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48640%2fworking-oflstm-with-multiple-units-ner%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Adding axes to figuresAdding axes labels to LaTeX figuresLaTeX equivalent of ConTeXt buffersRotate a node but not its content: the case of the ellipse decorationHow to define the default vertical distance between nodes?TikZ scaling graphic and adjust node position and keep font sizeNumerical conditional within tikz keys?adding axes to shapesAlign axes across subfiguresAdding figures with a certain orderLine up nested tikz enviroments or how to get rid of themAdding axes labels to LaTeX figures

                    Tähtien Talli Jäsenet | Lähteet | NavigointivalikkoSuomen Hippos – Tähtien Talli

                    Do these cracks on my tires look bad? The Next CEO of Stack OverflowDry rot tire should I replace?Having to replace tiresFishtailed so easily? Bad tires? ABS?Filling the tires with something other than air, to avoid puncture hassles?Used Michelin tires safe to install?Do these tyre cracks necessitate replacement?Rumbling noise: tires or mechanicalIs it possible to fix noisy feathered tires?Are bad winter tires still better than summer tires in winter?Torque converter failure - Related to replacing only 2 tires?Why use snow tires on all 4 wheels on 2-wheel-drive cars?