How to penalize for empty fields in a DataFrame?



I need to measure how consistent racing car drivers are over a whole season. My DataFrame has 10 columns, one per circuit, and each column holds the standard deviation of the lap times the driver posted at that circuit, in other words how consistent the driver was from lap to lap. For races the driver did not finish, the field is blank.

So far I have calculated each driver's season consistency by averaging all 10 columns. However, not finishing a race should count against a driver's consistency, and I do not know how to implement that.







Tags: pandas, data
















asked Mar 31 at 14:02 by jatrp5




















          1 Answer

This depends heavily on domain knowledge. A general approach is to fill each blank field with a penalized value, chosen as one of the following:

1. A multiple of the worst or average consistency at each circuit $c$, i.e. $(1 + m)\,\text{max}(\sigma_c)$ or $(1 + m)\,\text{avg}(\sigma_c)$ respectively, for the null values at that circuit, or

2. A multiple of the worst or average consistency of each driver $d$, i.e. $(1 + m)\,\text{max}(\sigma_d)$ or $(1 + m)\,\text{avg}(\sigma_d)$ respectively, for that driver's unfinished races, or

3. A multiple of the mean of the driver's and the circuit's average consistencies, i.e. $(1 + m)\,[\text{avg}(\sigma_d) + \text{avg}(\sigma_c)]/2$, for an unfinished race of driver $d$ at circuit $c$, or some other combination.
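The three options above could be sketched in pandas roughly as follows. This is only an illustration under assumptions that are not in the question: the toy data, the circuit and driver names, and the helper `penalized_fill` with its `basis` and `stat` parameters are all hypothetical, and blank fields are assumed to be stored as `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are drivers, columns are circuits, values are
# lap-time standard deviations; NaN marks a race the driver did not finish.
sigma = pd.DataFrame(
    {
        "Monza": [0.40, 0.60, np.nan],
        "Spa": [0.50, np.nan, 0.90],
        "Suzuka": [0.30, 0.70, 0.80],
    },
    index=["driver_a", "driver_b", "driver_c"],
)


def penalized_fill(sigma, m=0.1, basis="circuit", stat="max"):
    """Replace each NaN with (1 + m) times a reference consistency."""
    values = sigma.to_numpy(dtype=float)
    reduce = np.nanmax if stat == "max" else np.nanmean
    if basis == "circuit":
        # Option 1: worst or average sigma at each circuit (per column).
        ref = np.broadcast_to(reduce(values, axis=0)[None, :], values.shape)
    elif basis == "driver":
        # Option 2: worst or average sigma of each driver (per row).
        ref = np.broadcast_to(reduce(values, axis=1)[:, None], values.shape)
    elif basis == "both":
        # Option 3: mean of the driver's average and the circuit's average.
        ref = (np.nanmean(values, axis=1)[:, None]
               + np.nanmean(values, axis=0)[None, :]) / 2
    else:
        raise ValueError(f"unknown basis: {basis!r}")
    penalty = pd.DataFrame((1 + m) * ref,
                           index=sigma.index, columns=sigma.columns)
    # Keep recorded values; use the penalty only where the field is blank.
    return sigma.where(sigma.notna(), penalty)


filled = penalized_fill(sigma, m=0.1, basis="circuit", stat="max")
season_consistency = filled.mean(axis=1)  # lower means more consistent
print(season_consistency.sort_values())
```

Because the statistics are computed with NaN-aware reductions, the reference values come only from finished races, so the penalty itself is not inflated by other drivers' blanks.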


Whichever approach you choose, the coefficient $m$ affects the final ranking. It could be determined either

1. Subjectively, by looking at the rankings from an expert's point of view and selecting the value that makes the most sense, or

2. By trying a range of values such as $m \in \{-0.2, -0.1, 0, 0.1, 0.2, \ldots, 0.5\}$ and averaging the consistencies $\sigma_d$ or rankings $R_d$ for each driver $d$. An advantage of this approach is that when a driver's rank has low variance across the values of $m$, the rank is insensitive to the choice of $m$, i.e. it is less controversial; when the rank changes a lot with $m$, the average rank is more controversial.














answered Mar 31 at 15:29 by Esmailian (edited Mar 31 at 18:51)


























