
Game theory in Reinforcement Learning



In one of their recent blog posts, DeepMind describe using game theory in the AlphaStar algorithm. From the DeepMind AlphaStar blog:

    Mastering this problem requires breakthroughs in several AI research challenges including:

    • Game theory: StarCraft is a game where, just like rock-paper-scissors, there is no single best strategy. As such, an AI training process needs to continually explore and expand the frontiers of strategic knowledge.

Where is game theory applied when it comes to reinforcement learning?

deep-learning reinforcement-learning deepmind

asked Mar 25 at 6:59 by Karthik Rajkumar; edited Mar 25 at 10:38 by Neil Slater

          1 Answer

    Where is game theory applied when it comes to reinforcement learning?

It is not used directly in this case, and AlphaStar makes no breakthroughs in game theory. The blog's wording here is not especially precise.



The point of the quote was to explain an extra challenge that occurs in many games where opponents can react to each other's choices, so there is often a counter-strategy to any given policy. Rock-paper-scissors is the simplest game with this property, but it is common in strategy games in general: designers typically don't want a single best strategy to dominate, and often go to some lengths to balance options so that more of the game's content is used and the playing community retains a level of uncertainty and excitement.



The actual breakthroughs, with regard to the quote in your question, are in finding ways to perform the kind of long-term exploration that allows for different high-level strategies. Many RL algorithms perform relatively local exploration, which would be too weak to keep track of entirely different strategies and decide when to use them.



The way the DeepMind team approached this is explained in their blog:




    To encourage diversity in the league, each agent has its own learning objective: for example, which competitors should this agent aim to beat, and any additional internal motivations that bias how the agent plays. One agent may have an objective to beat one specific competitor, while another agent may have to beat a whole distribution of competitors [ . . . ]




So DeepMind have not resolved any of this at a theoretical level, and have not used game theory in any direct sense. However, they have identified the kind of game-theoretic scenario that applies and used it in the design, taking engineering steps towards practical solutions.
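
As a rough illustration of the league idea, the sketch below wires up agents that each carry their own opponent-selection objective. It is a minimal sketch, not DeepMind's actual code: Agent, play_match and update_policy are hypothetical stand-ins for the real StarCraft environment and RL update.

    import random

    class Agent:
        """Hypothetical league member carrying its own learning objective."""
        def __init__(self, name, pick_opponent):
            self.name = name
            self.pick_opponent = pick_opponent  # objective: whom to train against

    def uniform_objective(league, me):
        # Objective: learn to beat a whole distribution of competitors.
        return random.choice([a for a in league if a is not me])

    def exploiter_objective(target):
        # Objective: learn to beat one specific competitor.
        return lambda league, me: target

    main = Agent("main", uniform_objective)
    league = [main, Agent("exploiter", exploiter_objective(main))]

    for step in range(1000):
        learner = random.choice(league)
        opponent = learner.pick_opponent(league, learner)
        # trajectory = play_match(learner, opponent)  # hypothetical env rollout
        # update_policy(learner, trajectory)          # hypothetical RL update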



Other solutions in RL might also apply, such as hierarchical RL for capturing high-level actions as strategies to inform lower-level decisions, or using slow-changing noise functions to drive exploration (as opposed to something which changes faster, such as epsilon-greedy).
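
For instance, a slow-changing exploration signal can come from an Ornstein-Uhlenbeck process, as popularised for continuous-control RL by DDPG. The sketch below contrasts it with per-step epsilon-greedy randomness; it is a generic illustration, not taken from the AlphaStar work.

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        # Fast-changing exploration: every step randomises independently.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])

    class OUNoise:
        """Ornstein-Uhlenbeck process: temporally correlated noise that
        drifts slowly, so exploratory behaviour persists across many steps."""
        def __init__(self, theta=0.05, sigma=0.2):
            self.theta, self.sigma, self.x = theta, sigma, 0.0

        def sample(self):
            self.x += -self.theta * self.x + self.sigma * random.gauss(0.0, 1.0)
            return self.x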



          In general, game theory is related to reinforcement learning, in that both construct a formal view of optimising utility:



          • Game theory is useful for analysing multi-agent scenarios, but generally analyses optimal policies for relatively simple single-step or repeated games.


• Reinforcement learning is well-described for single agents, and deals well with sequential decision making, but does not have quite as much material for dealing with competitive and co-operative multi-agent environments - typically treating other agents as "part of the environment".


There is enough cross-over between the two theories that they can be used to inform each other in an intuitive way, as DeepMind have done here.



In more tractable game environments, game theory is able to determine stable and effective policies - for instance, in rock-paper-scissors the Nash equilibrium policy (one that players are punished for unilaterally moving away from) is to select each action at random with probability 1/3. Note this is not necessarily the optimal policy - that depends on the opponent's behaviour - but it is the expected stable outcome for two rational and capable opponents to arrive at.



If you develop a rock-paper-scissors learning bot using RL and it learns this strategy through self-play, then you can be reasonably happy that your learning algorithm worked. That would be one way of using RL and game theory together.
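
One self-contained way to run that experiment is regret matching, a standard game-theoretic learning rule whose time-averaged strategy converges to the Nash equilibrium in two-player zero-sum self-play. This is a sketch of one possible check, not the only formulation:

    import random

    # Payoff to the player choosing the row action, against the column action.
    # Action order: rock, paper, scissors.
    PAYOFF = [[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]]

    def strategy(regrets):
        # Play in proportion to positive cumulative regret; uniform if none yet.
        pos = [max(r, 0.0) for r in regrets]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [1 / 3] * 3

    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sum = [[0.0] * 3, [0.0] * 3]

    for t in range(100_000):
        strats = [strategy(regrets[p]) for p in range(2)]
        acts = [random.choices(range(3), weights=s)[0] for s in strats]
        for p in range(2):
            me, opp = acts[p], acts[1 - p]
            for a in range(3):
                # Regret: how much better action a would have scored this round.
                regrets[p][a] += PAYOFF[a][opp] - PAYOFF[me][opp]
            for a in range(3):
                strategy_sum[p][a] += strats[p][a]

    total = sum(strategy_sum[0])
    print([round(s / total, 3) for s in strategy_sum[0]])  # ~[0.333, 0.333, 0.333]

Note that the per-iteration strategies keep cycling; it is the time average that settles at the equilibrium.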



DeepMind don't know the Nash equilibrium of StarCraft strategies, and in fact the strategies are only loosely defined in terms of low-level actions, so it is not clear whether finding one is even possible. The analysis of strategies offered in the blog (e.g. a "rushing" strategy) is based on observing the game and adding a human narrative to help understand what is going on. In practice, it is the sampling of opponents, each preferring a different strategy or set a particular goal in the game, that trains a single neural-network-based bot with experience of countering multiple strategies, one that can express actions to beat any strategy that matches patterns it learned in self-play and observes an opponent using.






answered Mar 25 at 13:44 by Neil Slater
