Game theory in Reinforcement Learning
In one of the recent blog posts by DeepMind, they used game theory in the AlphaStar algorithm.
From the DeepMind AlphaStar blog post:
Mastering this problem requires breakthroughs in several AI research challenges including:
Game theory: StarCraft is a game where, just like rock-paper-scissors, there is no single best strategy. As such, an AI training process needs to continually explore and expand the frontiers of strategic knowledge.
Where is game theory applied when it comes to reinforcement learning?
deep-learning reinforcement-learning deepmind
edited Mar 25 at 10:38 by Neil Slater
asked Mar 25 at 6:59 by Karthik Rajkumar
1 Answer
Where is game theory applied when it comes to reinforcement learning?
It is not used directly in this case, and AlphaStar makes no breakthroughs in game theory. The blog's wording here is not super precise.
The point of the quote was to explain the extra challenge, which occurs in many games where opponents can react to each other's choices and there is often a counter-strategy to any given policy. Rock-paper-scissors is the simplest game with this property, but it is common in many strategy games: game designers typically don't want a single best strategy to dominate, and often go to some lengths to balance options so that more of the game content gets used and the player community retains a level of uncertainty and excitement.
The actual breakthroughs, with regard to the quote in your question, are in finding ways to perform the kinds of long-term exploration that allow for different high-level strategies. Many RL algorithms perform relatively local exploration, which would be too weak to keep track of entirely different strategies and decide when to use them.
The way the DeepMind team approached this is explained in their blog:
To encourage diversity in the league, each agent has its own learning objective: for example, which competitors should this agent aim to beat, and any additional internal motivations that bias how the agent plays. One agent may have an objective to beat one specific competitor, while another agent may have to beat a whole distribution of competitors [ . . . ]
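As a rough illustration of that idea (my own sketch, not DeepMind's actual implementation), a league can be thought of as a pool of agents where each agent samples opponents according to its own objective. The class and method names below are hypothetical, and the win-rate bookkeeping is a simplifying assumption:

```python
import random
from collections import defaultdict

class LeagueAgent:
    """One member of a training league, with its own learning objective."""
    def __init__(self, name, objective):
        self.name = name
        self.objective = objective          # e.g. "beat_everyone" or "exploit:agent_0"
        self.win_rate = defaultdict(float)  # opponent name -> estimated win rate

    def pick_opponent(self, league):
        others = [a for a in league if a is not self]
        if self.objective.startswith("exploit:"):
            # Exploiter-style objective: always target one specific agent.
            target = self.objective.split(":", 1)[1]
            return next(a for a in others if a.name == target)
        # "Beat everyone" objective: prefer opponents we currently lose to,
        # so training covers a whole distribution of strategies.
        weights = [1.0 - self.win_rate[a.name] + 0.1 for a in others]
        return random.choices(others, weights=weights, k=1)[0]

league = [LeagueAgent("agent_0", "beat_everyone"),
          LeagueAgent("agent_1", "exploit:agent_0"),
          LeagueAgent("agent_2", "beat_everyone")]
for agent in league:
    print(agent.name, "trains against", agent.pick_opponent(league).name)
```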
So DeepMind have not resolved any of this at a theoretical level, and have not used game theory in any direct sense. However, they have identified the kind of game-theoretic scenario that applies and used that in the design, making steps in an engineering sense towards practical solutions.
Other solutions in RL might also apply, such as hierarchical RL for capturing high-level actions as strategies that inform lower-level decisions, or using slowly-changing noise functions to drive exploration (as opposed to something that changes faster, such as epsilon-greedy).
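For a concrete (if simplified) picture of that contrast, here is a minimal sketch, not taken from the AlphaStar work: epsilon-greedy re-randomises independently at every step, while an Ornstein-Uhlenbeck-style noise process drifts slowly, so exploratory behaviour stays consistent over many steps:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Fast-changing exploration: an independent coin flip at every step."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

class OUNoise:
    """Slowly-varying exploration noise (Ornstein-Uhlenbeck process):
    successive samples are correlated, so exploration persists over time."""
    def __init__(self, theta=0.05, sigma=0.2):
        self.theta, self.sigma, self.x = theta, sigma, 0.0

    def sample(self):
        self.x += -self.theta * self.x + self.sigma * random.gauss(0.0, 1.0)
        return self.x

noise = OUNoise()
print([round(noise.sample(), 2) for _ in range(5)])        # drifts gradually
print([epsilon_greedy([0.1, 0.5, 0.2]) for _ in range(5)]) # mostly action 1, occasional random pick
```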
In general, game theory is related to reinforcement learning, in that both construct a formal view of optimising utility:
Game theory is useful for analysing multi-agent scenarios, but generally analyses optimal policies for relatively simple single-step or repeated games.
Reinforcement learning is well-described for single agents, and deals well with sequential decision making, but does not have quite as much material for dealing with competitive and co-operative multi-agent environments - typically treating other agents as "part of the environment".
There is enough cross-over between the two theories that they can be used to inform each other in an intuitive way, as Deep Mind have done here.
In more tractable game environments, game theory is able to determine stable and effective policies - for instance in rock-paper-scissors, the Nash equilibrium policy (one from which neither player can gain by deviating unilaterally) is to select each action at random with 1/3 probability. Note this is not necessarily the optimal policy - that depends on the opponent's behaviour - but it is the expected stable outcome for two rational and capable opponents to arrive at.
If you develop a rock-paper-scissors learning bot using RL, and it learns this strategy through self-play, then you can be relatively happy that your learning algorithm worked. That would be one way of using RL and game theory together.
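As a toy version of that check, here is a minimal self-play sketch for rock-paper-scissors. It uses regret matching rather than a full RL algorithm (a simpler stand-in for a learning bot, which is my own choice here), and under that assumption the players' average strategies should approach the uniform 1/3 Nash equilibrium:

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def strategy_from_regrets(regrets):
    """Play actions in proportion to their positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def self_play(iterations=20000):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        acts = [random.choices(range(ACTIONS), weights=s, k=1)[0] for s in strats]
        for p in range(2):
            opp = acts[1 - p]
            actual = payoff(acts[p], opp)
            for a in range(ACTIONS):
                regrets[p][a] += payoff(a, opp) - actual  # regret for not playing a
                strategy_sum[p][a] += strats[p][a]
    # The average strategy over all iterations approximates the Nash equilibrium.
    return [[s / iterations for s in strategy_sum[p]] for p in range(2)]

print(self_play())  # both players end up near [0.33, 0.33, 0.33]
```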
DeepMind don't know the Nash equilibrium of StarCraft strategies, and in fact the strategies are only loosely defined in terms of low-level actions, so it is not clear whether finding one is even possible. The analysis of strategies given in the blog (e.g. a "rushing" strategy) is based on observations of the game plus a human narrative to help understand what is going on. In practice, it is the sampling of opponents, each preferring a different strategy or pursuing a particular goal in the game, that trains a single neural-network-based bot with experience of countering multiple strategies; it can then express actions that beat any strategy it has learned to counter in self-play when it observes an opponent using it.
answered Mar 25 at 13:44 by Neil Slater