RL: Collecting States (training data) in real-life. Must use fixed timestep?
I am using a reinforcement learning agent to play a 3D game, but I have trouble collecting the "current state, next state" pairs.
To decide what action to perform, the network must perform a forward pass.
It starts the forward pass at time $t$, but by the time the pass finishes, the game may already have advanced 10 frames or more (a varying amount).
The situation gets worse if I run, say, 100 games at once on the same computer.
I can't pause the game at each frame to do forward propagation, and pausing wouldn't be possible anyway if I were training, say, a real-life robot to walk.
Question:
Should I stick to a 'fixed timestep' approach, asking the network for an action only every 0.1 seconds? While it computes the next action, I could pretend the network keeps outputting the most recent action for all the skipped frames. Is that a good idea?
If that's the only option, should I avoid at all costs situations where forward propagation takes longer than the fixed timestep (more than 0.1 s in my case)? In that case it would be better to choose, say, 0.2 seconds just to be safe.
This seems quite unreliable. Is there a better way to do it?
Is there a paper that explores the alternatives? (I guess such a paper would be about real-life robot training.)
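To make the proposed scheme concrete, here is a minimal Python simulation of it. Everything in it is hypothetical: states are stand-in frame indices, `policy` and `frame_reward` are placeholder callables, and `latencies` lists how many frames each forward pass takes. The agent requests an action every `period` frames, keeps applying the previous action until inference finishes, and records one (state, action, reward, next state) transition per decision point.

```python
from typing import Callable, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: int        # frame index where the decision was requested (stand-in for an observation)
    action: int       # action the policy computed for that state
    reward: float     # reward summed over the frames until the next decision point
    next_state: int   # frame index of the next decision point


def collect_fixed_timestep(
    policy: Callable[[int], int],
    frame_reward: Callable[[int, int], float],
    period: int,
    latencies: Sequence[int],
    initial_action: int = 0,
) -> List[Transition]:
    """Simulate a game that never pauses while the agent asks for an action every `period` frames.

    latencies[i] is the (hypothetical) number of frames the i-th forward pass takes;
    until it finishes, the previously chosen action keeps being applied.
    """
    transitions: List[Transition] = []
    current_action = initial_action
    for i, latency in enumerate(latencies):
        if latency > period:
            # The forward pass overran the fixed timestep: the next decision point arrives
            # before this action exists, so the schedule can no longer be kept.
            raise RuntimeError(f"decision {i}: forward pass took {latency} frames > period {period}")
        start = i * period
        new_action = policy(start)
        reward = 0.0
        for offset in range(period):
            # Hold the old action until inference finishes, then switch to the new one.
            applied = new_action if offset >= latency else current_action
            reward += frame_reward(start + offset, applied)
        current_action = new_action  # inference has finished by the end of the period
        transitions.append(Transition(start, new_action, reward, start + period))
    return transitions
```

With `period=6` and `latencies=[2, 3, 6]`, each recorded reward mixes a few frames of the previous action with frames of the new one, which is exactly the ambiguity described above; a forward pass longer than `period` raises an error instead of silently shifting the schedule.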
reinforcement-learning
This question has an open bounty worth +50 reputation from Kari, ending at 2019-03-26 04:59:07Z (in 6 days).
The question is widely applicable to a large audience. A detailed canonical answer is required to address all the concerns.
If possible, please provide a paper that explores the timing techniques, and/or help outline the possible techniques for collecting such data.
asked Mar 17 at 4:26 by Kari (edited 2 days ago)
1 Answer
Your "fixed timestep" idea is actually very similar to a common technique called frame skipping. Instead of waiting a fixed amount of time, agents wait a fixed number of frames $k$ before choosing a new action. In the meantime, they repeat their most recently chosen action.
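A minimal frame-skipping wrapper might look like the following Python sketch. The `step(action) -> (observation, reward, done)` interface and the `CounterEnv` toy environment are assumptions made for illustration (they are not the ALE or Gym API), and summing the rewards of the skipped frames is a common convention rather than the only option.

```python
from typing import Any, Protocol, Tuple


class Env(Protocol):
    """Minimal environment interface assumed here (hypothetical, Gym-like)."""

    def step(self, action: int) -> Tuple[Any, float, bool]: ...


class FrameSkip:
    """Repeat each chosen action for k frames and sum the rewards of those frames."""

    def __init__(self, env: Env, k: int) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.env = env
        self.k = k

    def step(self, action: int) -> Tuple[Any, float, bool]:
        total_reward = 0.0
        obs: Any = None
        done = False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)  # same action on every skipped frame
            total_reward += reward
            if done:  # stop early if the episode ends in the middle of a skip
                break
        # Only this last observation is shown to the agent, so it decides once per k frames.
        return obs, total_reward, done


class CounterEnv:
    """Toy environment: the observation is the frame count, the reward equals the action."""

    def __init__(self, length: int) -> None:
        self.t = 0
        self.length = length

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        return self.t, float(action), self.t >= self.length
```

For example, `FrameSkip(CounterEnv(10), k=4)` returns observations 4, 8 and then 10 (with `done=True`), so the agent is queried three times for ten frames.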
Frame skipping was included as part of the Atari 2600 Arcade Learning Environment. It was also used in the foundational DQN paper. Common values of $k$ are 3, 4, and 5. The value chosen depends on the game being played, since important events happen at different time resolutions in different games. In these papers, frame skipping let training run roughly $k$ times faster, because the network only has to select an action on every $k$th frame while the emulator simply repeats the last one. So this is definitely a valid technique to try.
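The arithmetic behind the speedup, and behind the link to the 0.1 s timestep in the question, is short. Here $f$ is the game's frame rate and $N$ the number of frames in an episode; $f = 60$ Hz matches the Atari 2600, while your 3D game's rate is an assumption (and may not even be constant), and $N = 18\,000$ (five minutes at 60 Hz) is chosen purely for illustration:

```latex
\begin{aligned}
\Delta t_{\text{decision}} &= \frac{k}{f}, &
  k = 4,\ f = 60\,\text{Hz} &\;\Rightarrow\; \Delta t_{\text{decision}} = \tfrac{4}{60}\,\text{s} \approx 66.7\,\text{ms},\\
n_{\text{forward}} &= \left\lceil \frac{N}{k} \right\rceil, &
  N = 18\,000,\ k = 4 &\;\Rightarrow\; n_{\text{forward}} = 4\,500 \quad (\text{vs. } 18\,000 \text{ without skipping}),\\
k &= f \, \Delta t_{\text{decision}}, &
  f = 60\,\text{Hz},\ \Delta t_{\text{decision}} = 0.1\,\text{s} &\;\Rightarrow\; k = 6.
\end{aligned}
```

The last line is the sense in which a fixed timestep and a fixed frame skip coincide: at a constant frame rate, they describe the same schedule.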
I think timing would generally be less of a concern in the robotics application, because forward propagation usually happens much more quickly than the timescales on which real-world events unfold. For example, Stanford famously applied RL to fly small autonomous helicopters, a task that requires considerable precision.
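If you want to check this for your own network rather than take it on faith, a rough timing harness like the Python sketch below can help. `forward` is whatever function runs your network on one observation; the 50% `safety_margin` is an arbitrary assumption, and wall-clock measurements on a development machine are not a real-time guarantee (garbage collection, OS scheduling, and other processes such as your 100 parallel games all add jitter).

```python
import statistics
import time
from typing import Any, Callable, Tuple


def check_inference_budget(
    forward: Callable[[Any], Any],
    observation: Any,
    control_period_s: float,
    n_trials: int = 100,
    safety_margin: float = 0.5,
) -> Tuple[bool, float, float]:
    """Time `forward(observation)` n_trials times.

    Returns (fits, worst_s, median_s), where `fits` is True if the slowest
    measured pass used at most `safety_margin` of the control period.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be >= 1")
    timings = []
    for _ in range(n_trials):
        start = time.perf_counter()
        forward(observation)
        timings.append(time.perf_counter() - start)
    worst = max(timings)  # budget against the worst case, not the average
    return worst <= safety_margin * control_period_s, worst, statistics.median(timings)
```

Comparing the worst case rather than the median against the period is deliberate: a control loop misses its deadline on its slowest step, not its typical one.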
Finally, if your forward propagation really is taking too long, consider a faster architecture. The simplest approach is to make your neural net smaller; policy distillation can compress a large, trained network into a smaller one. Also make sure you're not using a comparatively slow activation function such as sigmoid or tanh, both of which require evaluating an exponential. ReLU is the common choice if you don't need a bounded output for a given neuron. If you do, I recommend softsign.
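Two of these suggestions can be sketched in plain Python: softsign next to ReLU, and a KL-divergence distillation loss in which a small student network is trained to match the teacher's temperature-sharpened Q-values for a state. The function names and the default temperature `tau` are illustrative assumptions; in practice you would write the same thing in your deep learning framework and average the loss over a batch of states.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def softsign(x: float) -> float:
    # Bounded in (-1, 1) like tanh, but uses only abs, add and divide (no exponential).
    return x / (1.0 + abs(x))


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    teacher_q: Sequence[float], student_logits: Sequence[float], tau: float = 0.01
) -> float:
    """KL(softmax(teacher_q / tau) || softmax(student_logits)) for one state.

    A small tau sharpens the teacher's Q-values into a near-greedy target
    distribution; the default value is illustrative, not tuned.
    """
    log_p = log_softmax(teacher_q, tau)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero exactly when the student's action distribution matches the sharpened teacher's, and the student can be far smaller than the teacher because it only has to reproduce the teacher's action preferences, not its value estimates.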
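If your time bottleneck is actually in action selection, because you have a large action space and use a value network, you should seriously consider switching to a policy-based method (e.g. actor-critic). A value-based method has to compute $\max_a Q(s, a)$ over every action, which becomes expensive for very large action spaces and needs a separate optimization for continuous ones, whereas a policy network can output the parameters of a distribution over actions (for example, the mean and standard deviation of a Gaussian) that you sample from directly, which would potentially be much faster. You can read more about this in Section 13.7 of Sutton and Barto's RL book.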
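The difference can be sketched in Python for a one-dimensional continuous action. Both functions are hypothetical illustrations: in a real agent the Gaussian's `mean` and `log_std` would be outputs of the policy network, and `q` would be the value network. The value-based version has to evaluate Q once per candidate on a discretized action grid, while the policy-based version draws a single sample.

```python
import math
import random
from typing import Callable, Sequence


def sample_gaussian_action(mean: float, log_std: float, rng: random.Random) -> float:
    """Policy-based selection: one draw from N(mean, exp(log_std)^2), a constant-cost operation."""
    return rng.gauss(mean, math.exp(log_std))


def greedy_action(
    q: Callable[[float, float], float], state: float, candidates: Sequence[float]
) -> float:
    """Value-based selection: evaluate Q(s, a) for every candidate action and keep the best."""
    return max(candidates, key=lambda a: q(state, a))


def toy_q(s: float, a: float) -> float:
    """Hypothetical Q-function, peaked at a == s."""
    return -(a - s) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 100 for i in range(-100, 101)]  # discretized continuous action in [-1, 1]
    print(greedy_action(toy_q, 0.3, grid))  # needs 201 Q evaluations
    print(sample_gaussian_action(0.3, math.log(0.05), rng))  # needs one draw
```

With the 201-point grid, the greedy choice costs 201 Q evaluations per decision, and a finer grid (or a multi-dimensional action) makes this grow quickly; the Gaussian draw costs the same regardless.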
answered 6 hours ago by Philip Raeisghasem (edited 6 hours ago)
On a related note, frame skipping can be preferable even when you have no time constraints. The paper "Frame Skip Is a Powerful Parameter for Learning to Play Atari" shows that a larger frame skip helps in learning strategies over long time scales.
– Philip Raeisghasem, 6 hours ago