Working of LSTM with multiple Units - NER
I am trying to understand the working of LSTM networks, and I am not clear about how the different neurons in a cell interact with each other. I looked at a similar question, but a few things are still unclear to me.
Assume an NER task where each time step receives the embedding of one word as input, and there are 10 time steps. I defined the LSTM cell with 100 units. As per my understanding, each unit has 3 separate gates and a memory cell, as shown in the second figure (100 memory cells and 300 gates in total). From the theory of LSTMs, each of these memory states stores different information: one might store information about singularity (singular vs. plural), another might store information about gender, and so on (please correct me if I am wrong). With this setup, I have a few doubts about the working of the LSTM and the interaction between the units/neurons.
1) In the figure, each cell has multiple neurons, say 100. Is the first input X1 (the first word vector) fed to the first neuron of the cell along with the previous cell's activation (say h0), with its output then going to the second neuron, and so on? Or are the same X1 and h0 fed to all 100 neurons in parallel (independently), with the respective outputs going to the next time step (I am not sure whether any aggregation happens)?
2) How are different units able to store different features (singularity, gender, etc.)? Is it because they are initialized with different random weights?
3) If the neurons work in parallel, is there any interaction between them to share information? Is it possible that multiple cell states store the same information (say, gender)?
4) If LSTM cell 1 has 100 neurons/units and input X1, will the output h1 also have 100 values (one per neuron)?
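The parallel-units picture asked about in question 1 (and the output shape in question 4) can be sketched with plain NumPy. The sizes below (embedding size 50, 100 units) are illustrative assumptions, not values taken from any particular framework:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 50, 100   # assumed sizes for illustration

rng = np.random.default_rng(0)
# One weight matrix and one bias vector per gate/candidate transform.
# Every unit sees the same concatenated [x_t, h_{t-1}] input.
W = {g: rng.standard_normal((hidden_size, input_size + hidden_size)) * 0.1
     for g in ("i", "f", "o", "g")}
b = {g: np.zeros(hidden_size) for g in ("i", "f", "o", "g")}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])     # same input fed to all 100 units
    i = sigmoid(W["i"] @ z + b["i"])      # input gate,  shape (100,)
    f = sigmoid(W["f"] @ z + b["f"])      # forget gate, shape (100,)
    o = sigmoid(W["o"] @ z + b["o"])      # output gate, shape (100,)
    g = np.tanh(W["g"] @ z + b["g"])      # candidate,   shape (100,)
    c_t = f * c_prev + i * g              # element-wise: one value per unit
    h_t = o * np.tanh(c_t)
    return h_t, c_t

x1 = rng.standard_normal(input_size)
h0 = np.zeros(hidden_size)
c0 = np.zeros(hidden_size)
h1, c1 = lstm_step(x1, h0, c0)
print(h1.shape, c1.shape)   # (100,) (100,) -- one value per neuron/unit
```

All 100 units receive the same `[x1, h0]` in one matrix multiplication; there is no chaining of unit 1 into unit 2 within a time step, and both `h1` and `c1` come out with 100 values.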
deep-learning lstm recurrent-neural-net named-entity-recognition sequence-to-sequence
asked Apr 4 at 23:47 by B.K (83), edited Apr 5 at 4:24
1 Answer
Welcome to Data Science Stack Exchange!
1) The same input $x_1$ and $h_0$ are fed to each of the 100 neurons in parallel (independently), and the respective outputs go to the next time step. Notice that there are two 100-dimensional vectors going to the next time step: $c_1$ and $h_1$, which are denoted by the arrows on the right side of your second figure.
2) This is a complicated question. Firstly, there is usually no one-to-one mapping between a neuron and a meaning, like neuron 1 -> gender.
3) Those neurons do share information, through the matrix multiplication operations. For example, the cell state at time step $t$ is usually calculated as
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
Notice that the hidden value from the previous time step, $h_{t-1}$, is multiplied by a matrix $W_{hc}$; therefore each value in $W_{hc} h_{t-1}$ is a linear combination of all the values in $h_{t-1}$.
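To make this mixing concrete: perturbing a single component of $h_{t-1}$ changes every component of $W_{hc} h_{t-1}$. A minimal NumPy sketch (hidden size 5 chosen arbitrarily for readability):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden = 5                        # small size so the effect is easy to see
W_hc = rng.standard_normal((hidden, hidden))
h_prev = rng.standard_normal(hidden)

mixed = W_hc @ h_prev             # each entry mixes ALL entries of h_prev

# Perturb a single unit of h_prev: every entry of the product moves,
# and the change equals the corresponding column of W_hc.
h_perturbed = h_prev.copy()
h_perturbed[0] += 1.0
delta = W_hc @ h_perturbed - mixed
print(np.count_nonzero(delta))    # 5 -- all outputs affected by one unit
```

This is the channel through which otherwise-independent units exchange information from one time step to the next.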
4) The number of neurons in the LSTM cell, the dimension of the hidden vector $h_t$, the dimension of the cell vector $c_t$, and the dimension of the output vector $o_t$ are all the same (in your case, 100). This is known as the "hidden size", "output size", or "number of units/neurons". It is not the same as the dimension of the input vector $x_t$, which is usually known as the "input size".
If you have further questions about the dimensions of the LSTM variables, I suggest looking at the first answer here, where the author lists the dimensions of all the variables and parameters.
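As a sanity check on these sizes, the parameter count of a single LSTM layer follows directly from the dimensions above. The embedding size of 50 here is an assumed example value:

```python
# Assumed sizes: word-embedding (input) size 50, hidden size 100.
input_size, hidden_size = 50, 100

# Each of the 4 transforms (3 gates + candidate) has an input weight
# matrix (hidden x input), a recurrent weight matrix (hidden x hidden),
# and a bias vector (hidden,).
per_transform = hidden_size * input_size + hidden_size * hidden_size + hidden_size
total_params = 4 * per_transform
print(total_params)  # 60400
```

For comparison, a `keras.layers.LSTM(100)` applied to 50-dimensional inputs should report the same count via `model.count_params()`, since Keras uses this same 4-transform parameterization.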
answered Apr 5 at 6:02 by user12075 (1,351)