K.gradients gives type error where both arguments are tensors



On lines 142 and 143 of: https://github.com/nyck33/openai_spinup_my_implements/blob/master/continuous/mountaincar/my_ddpg_ac.py



I have:



self.get_action_gradients = K.function(inputs=[self.model.input[0], self.model.input[1],
    K.learning_phase()], outputs=[action_gradients])


Which tells me:
line 143, in build_model
K.learning_phase()], outputs=[action_gradients])



TypeError: Can not convert a list into a Tensor or Operation.



action_gradients are calculated on line 140 via:



action_gradients = K.gradients(Q_value, actions)


so I did not think that was the problem, but when I take the brackets off the outputs argument of K.function, like so:



self.get_action_gradients = K.function(inputs=[*self.model.input,
    K.learning_phase()], outputs=action_gradients)


Now I get a slightly different error, mentioning NoneType rather than list:



rning-copied/_1my_imps/continuous/mountaincar/my_ddpg_ac.py", line 143, in build_model

TypeError: Can not convert a NoneType into a Tensor or Operation.


Printing out the Q_values and actions shows that they are tensors:



Q_values Tensor("q_values/BiasAdd:0", shape=(?, 1), dtype=float32) actions Tensor("actions:0", shape=(?, 1), dtype=float32)


But printing out the action_gradients and type(action_gradients) just confuses me more:



action_gradients [None]
action_gradients type <class 'list'>


Calling K.gradients() on two tensors should work, shouldn't it?



This DDPG code is originally from here: https://github.com/nyck33/autonomous_quadcopter



and I am trying to adapt it for MountainCarContinuous-v0.
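
For reference, the pattern I expect to work looks like this minimal, self-contained sketch (TensorFlow 1.x Keras backend; the layer sizes and names are made up for illustration, not taken from my repo):

from keras import layers, models
from keras import backend as K

# A tiny critic whose Q-value genuinely depends on both inputs.
states = layers.Input(shape=(2,), name="states")
actions = layers.Input(shape=(1,), name="actions")
net = layers.Concatenate()([states, actions])
net = layers.Dense(32, activation="relu")(net)
Q_value = layers.Dense(1, name="q_values")(net)
model = models.Model(inputs=[states, actions], outputs=Q_value)

# K.gradients returns a list with one tensor per variable in the second
# argument; it is [None] only if Q_value does not depend on actions.
action_gradients = K.gradients(Q_value, actions)

# Backend function: model inputs plus the learning phase in, gradients out.
get_action_gradients = K.function(
    inputs=[*model.input, K.learning_phase()],
    outputs=action_gradients)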










keras

asked Mar 29 at 17:30, edited Apr 4 at 12:39
mLstudent33 (507)
1 Answer






I am very sorry, but this was a careless error. In the critic network I had separate pathways for the state and the action, joining them right before the Q-value output layer, but...



I actually had the state as the input to both pathways, rather than the action for the action pathway.



So of course, if the action input is not connected anywhere in the network, there is no path for the chain rule to compute dQ(s,a)/da, the gradient of the output w.r.t. the actions, and K.gradients returns [None].
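
The failure mode is easy to reproduce in isolation (a minimal sketch, assuming the TensorFlow 1.x Keras backend; the names are illustrative):

from keras import layers
from keras import backend as K

states = layers.Input(shape=(2,), name="states")
actions = layers.Input(shape=(1,), name="actions")

# Q built only from the state pathway -- the bug I had:
q_wrong = layers.Dense(1)(layers.Dense(32, activation="relu")(states))
print(K.gradients(q_wrong, actions))   # [None]: no path from actions to Q

# Q built from a pathway that actually consumes the actions:
q_right = layers.Dense(1)(layers.Dense(32, activation="relu")(actions))
print(K.gradients(q_right, actions))   # [<tf.Tensor ...>]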



Here is the code for clarification:



def build_model(self):
    #lrelu = LeakyReLU(alpha=0.1)
    # Define input layers
    states = layers.Input(shape=(self.state_size,), name="states")
    actions = layers.Input(shape=(self.action_size,), name="actions")

    # Add hidden layers for the state pathway
    net_states = layers.Dense(units=32, use_bias=False)(states)
    net_states = layers.BatchNormalization()(net_states)
    net_states = layers.LeakyReLU(alpha=0.1)(net_states)

    net_states = layers.Dense(units=64)(net_states)
    net_states = layers.BatchNormalization()(net_states)
    net_states = layers.LeakyReLU(alpha=0.1)(net_states)

    # Hidden layers for the action pathway
    net_actions = layers.Dense(units=32)(actions)  # had (states) here instead -- this was the bug
    net_actions = layers.BatchNormalization()(net_actions)
    net_actions = layers.LeakyReLU(alpha=0.1)(net_actions)

    net_actions = layers.Dense(units=64)(net_actions)
    net_actions = layers.BatchNormalization()(net_actions)
    net_actions = layers.LeakyReLU(alpha=0.1)(net_actions)


and the console output shows that the action gradient is no longer None:



action_gradients [<tf.Tensor 'gradients_3/dense_13/MatMul_grad/MatMul:0' shape=(?, 1) dtype=float32>]
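
For completeness, the rest of build_model merges the two pathways into the Q-value output and builds the gradient function roughly like this (a sketch, not line-for-line the code in the repo linked below; it assumes `from keras import layers, models` and `from keras import backend as K`):

    # continuing build_model: merge the state and action pathways
    net = layers.Add()([net_states, net_actions])
    net = layers.Activation('relu')(net)
    Q_values = layers.Dense(units=1, name='q_values')(net)

    self.model = models.Model(inputs=[states, actions], outputs=Q_values)

    # gradient of Q w.r.t. the action input -- now a real tensor, not [None]
    action_gradients = K.gradients(Q_values, actions)

    # backend function used by the actor update
    self.get_action_gradients = K.function(
        inputs=[*self.model.input, K.learning_phase()],
        outputs=action_gradients)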


Edit: the code works for the MountainCarContinuous-v0 environment of OpenAI Gym with a modified reward function (incremental, based on car position, because the terminal reward is so sparse); a mean score above 90 is reached in about 11 episodes. I rendered the environment, so I actually saw the car make it to the top on the attempts below where the score is over 100 (an illustrative sketch of this kind of shaping follows the results).
Here are the results:



episode: 0 score: -41.59034754863541 mean: -41.59 std: 0.0
episode: 1 score: -75.87242108451743 mean: -58.73 std: 17.14
episode: 2 score: 32.01844640263133 mean: -28.48 std: 45.01
episode: 3 score: 132.904319261567 mean: 11.86 std: 80.02
episode: 4 score: 125.81198290946529 mean: 34.65 std: 84.85
episode: 5 score: 84.26163480413017 mean: 42.92 std: 79.63
episode: 6 score: 126.89684490110164 mean: 54.92 std: 79.37
episode: 7 score: 139.18190524840517 mean: 65.45 std: 79.3
episode: 8 score: 100.24481691450521 mean: 69.32 std: 75.56
episode: 9 score: 165.80286734425076 mean: 78.97 std: 77.31
episode: 10 score: 109.29507292352991 mean: 94.05 std: 66.24
episode: 11 score: 209.07900825070152 mean: 122.55 std: 44.84
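
The reward shaping mentioned above is an incremental bonus for progress toward the flag, added on top of the environment's own sparse terminal reward. Purely as an illustration (not the exact function or constants used in the repo):

def shaped_reward(env_reward, position, prev_position):
    # reward rightward progress toward the flag; the environment's own
    # sparse terminal reward (roughly +100 at the goal) is kept as-is
    progress = position - prev_position
    return env_reward + 10.0 * max(progress, 0.0)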


Source code here: https://github.com/nyck33/openai_my_implements/tree/master/continuous/mountaincar






answered Apr 5 at 5:37, edited Apr 5 at 6:56
mLstudent33 (507)