
Histogram is extremely skewed to the left














$begingroup$


Hello everyone,
I hope you are all okay.



I am not very good at visualizations in Python.
I am working on a movie dataset and want to understand how the values of the variable 'vote_count' are distributed across the movies in the dataset.
I am using this very simple code:



import matplotlib.pyplot as plt

plt.hist(df_movies.vote_count, bins=7)
plt.show()


The output looks like this:

[Figure: histogram of vote_count with 7 bins]



As is easy to see, almost all of the values are piled up at the left edge of the plot.

I suspect I should be adjusting plt.xticks(), but after experimenting with it the histogram still looks awful.



Could you please give me some advice on how to make the plot easier to read and more useful for getting insights?



Thanks a lot in advance!

















$endgroup$
















Tags: python, visualization, matplotlib














asked Mar 19 at 10:46 by user641597, edited Mar 19 at 21:45





















          1 Answer












          $begingroup$

I think there are a few "easy wins" here.



1. Add more bins. You are already using the bins setting, so just set it to something high, like 100 or even 1000, to get a first feel for the data.

2. Define the range of your bins. For example, you could pass a pair of values such as range=[0, 5000] as an additional parameter.

3. Consider not showing all of the data. Maybe movies with only a few ratings, or movies with a great many, are not what you want to focus on. You can use the range argument from point 2 to limit the plot to the region that interests you.

4. Try a cumulative, normalized plot with cumulative=True, density=True (density is the current name for the older normed argument, which recent Matplotlib versions no longer accept). This can be very useful for getting a better estimate for statements like "80% of movies have fewer than 100 ratings". For cumulative plots I recommend a very large number of bins.
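The four suggestions above could be combined into something like the following. This is only a minimal sketch: the helper name plot_vote_count_views and its defaults (upper=5000, bins=100) are made up for illustration and are not part of Matplotlib or of the question's code, and sensible values will depend on the actual data.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_vote_count_views(values, upper=5000, bins=100):
    """Draw three views of a variable whose values pile up near zero.

    Hypothetical helper: `upper` and `bins` are illustrative defaults,
    not recommendations.
    """
    values = np.asarray(values, dtype=float)
    fig, (ax_all, ax_zoom, ax_cum) = plt.subplots(1, 3, figsize=(15, 4))

    # Point 1: many more bins over the full range of the data.
    ax_all.hist(values, bins=bins)
    ax_all.set_title("All values, %d bins" % bins)

    # Points 2 and 3: restrict the bins to a region of interest.
    # hist() ignores values that fall outside `range`.
    ax_zoom.hist(values, bins=bins, range=(0, upper))
    ax_zoom.set_title("Values in [0, %d]" % upper)

    # Point 4: cumulative, normalized histogram with a large number of bins.
    # density=True replaces the older normed=True; the last bin equals 1.
    heights, edges, _ = ax_cum.hist(
        values, bins=1000, cumulative=True, density=True, histtype="step"
    )
    ax_cum.set_title("Cumulative share of movies")

    for ax in (ax_all, ax_zoom, ax_cum):
        ax.set_xlabel("vote_count")
    fig.tight_layout()
    return fig, heights, edges
```

With the data from the question this might be called as fig, heights, edges = plot_vote_count_views(df_movies.vote_count), followed by plt.show(). Each heights[i] then approximates the share of movies with at most edges[i + 1] votes, which is the kind of "80% of movies have fewer than N ratings" statement mentioned in point 4.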

I hope this helps. Let me know if anything is unclear.















          $endgroup$












answered Mar 19 at 12:44 by El Burro
