In calculating policy gradients, wouldn't longer trajectories have more weight according to the policy gradient formula?


In Sergey Levine's lecture on policy gradients (Berkeley Deep RL course), he shows that the policy gradient can be estimated with the formula

$$\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\nabla_\theta \log \pi_\theta(a_{i,t}\mid s_{i,t})\right)\left(\sum_{t=1}^{T} r(s_{i,t},a_{i,t})\right)$$



In this formula, wouldn't longer trajectories get more weight (in finite-horizon settings), since the middle term, the sum of $\nabla_\theta \log \pi_\theta$ terms, contains more summands for longer trajectories? Why would it work like that?



The specific example I have in mind is Pac-Man: longer trajectories would contribute more to the gradient. Should it work like that?
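For concreteness, here is a minimal NumPy sketch of the estimator above; the trajectory format, the `reinforce_gradient` name, and the numbers are illustrative assumptions, not code from the lecture.

```python
import numpy as np

def reinforce_gradient(trajectories):
    """Monte Carlo policy-gradient estimate:
    (1/N) * sum_i [ (sum_t grad log pi(a_t|s_t)) * (sum_t r_t) ]."""
    total = 0.0
    for traj in trajectories:
        grad_log_probs = np.array([g for g, _ in traj])  # shape (T, dim_theta)
        rewards = np.array([r for _, r in traj])         # shape (T,)
        # A longer trajectory contributes more grad-log-prob terms to the inner sum,
        # but their signs can differ, so more terms does not imply a larger sum.
        total = total + grad_log_probs.sum(axis=0) * rewards.sum()
    return total / len(trajectories)

# A short trajectory and a longer one, with a 1-D parameter theta.
short_traj = [(np.array([0.5]), 1.0), (np.array([0.4]), 1.0)]
long_traj = [(np.array([0.5]), 1.0), (np.array([-0.6]), 0.0),
             (np.array([0.4]), 1.0), (np.array([-0.3]), 0.0)]
print(reinforce_gradient([short_traj, long_traj]))  # the long trajectory's terms largely cancel
```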










      reinforcement-learning policy-gradients






asked Mar 19 at 3:50 by liyuan (edited Mar 19 at 4:46)
          1 Answer






> wouldn't longer trajectories get more weight?




Not necessarily. Each per-step gradient $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ can be positive or negative (think of a 1-D analogy), so a larger number of gradient terms can still sum to a smaller weight, which makes sense: a consistent short trajectory is more informative (carries more weight) than an inconsistent long trajectory whose sign-alternating policy gradients largely cancel.




> Why would it work like that?




If we compare two consistent trajectories, where most per-step gradients point in the same direction, the formula again makes sense: a long consistent trajectory contains more useful information (more steps that confirm each other) than a short one. As a real-life analogy, compare how informative a successful week is versus a successful year for learning your own policy.
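To make this concrete, here is a tiny numeric sketch; the per-step gradient values are made up purely for illustration.

```python
import numpy as np

# Hypothetical 1-D per-step gradients of log pi along three trajectories.
short_consistent = np.array([0.5, 0.6, 0.4])                    # 3 steps, same sign
long_alternating = np.array([0.5, -0.6, 0.4, -0.5, 0.6, -0.4])  # 6 steps, signs cancel
long_consistent = np.array([0.5, 0.6, 0.4, 0.5, 0.6, 0.4])      # 6 steps, same sign

print(short_consistent.sum())  # ~1.5: sizeable weight despite few steps
print(long_alternating.sum())  # ~0.0: near-zero weight despite many steps
print(long_consistent.sum())   # ~3.0: the extra consistent steps do add weight
```

So a trajectory's weight is driven by how consistently its per-step gradients point in one direction, not by its length alone.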






answered Mar 19 at 8:19 by Esmailian (edited Mar 19 at 19:12)