Histogram is extremely skewed to the left
Hi everyone,
I hope you are all okay.
I am pretty bad at visualizations in Python.
I am working on a movie dataset, and I want to understand how the values of the variable 'vote_count' are distributed across the movies in the dataset.
So I am using very simple code:
import matplotlib.pyplot as plt
plt.hist(df_movies.vote_count, bins=7)
plt.show()
As is easy to see in the output, the distribution is bunched up heavily toward the left side of the plot.
I realize that I should probably focus on plt.xticks(), but I played around with it and the histogram still looked awful.
Could you please give me some advice on how to make the plot more understandable and more useful for getting insights?
Thanks a lot in advance!
python visualization matplotlib
asked Mar 19 at 10:46 by user641597 (edited Mar 19 at 21:45)
1 Answer
I think there are a few "easy wins" here.
- You might add more bins. You are already using the bins argument, so just set it to something high like 100 or even 1000 to get a first feeling for the data.
- You can define the range of your bins. For example, you could pass a list of two entries, range=[0, 5000], as an additional parameter.
- You can consider not showing all the data. Maybe movies with only a few ratings, or movies with a great many, are not what you want to focus on. You can use the range argument from point two to limit the plot to the region that interests you.
- You can try a cumulative and normalized plot with cumulative=True, density=True (density is the current name for the older normed argument, which newer Matplotlib releases no longer accept). This can be very useful for backing up statements like "80% of movies have fewer than 100 ratings". For cumulative plots I recommend a very large number of bins.
I hope this helps. Let me know if something is unclear.
answered Mar 19 at 12:44 by El Burro
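The suggestions in the answer can be tried side by side. The following is a minimal sketch, not the answerer's own code. Since the asker's df_movies is not available, it assumes a synthetic, heavy-tailed stand-in named vote_count (hypothetical data drawn from a log-normal distribution), and it uses density=True, the current Matplotlib spelling of the normed option.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for df_movies.vote_count: non-negative integer
# counts with a long right tail (an assumption about the real data).
rng = np.random.default_rng(0)
vote_count = rng.lognormal(mean=4.0, sigma=1.5, size=5000).astype(int)

fig, (ax_all, ax_zoom, ax_cum) = plt.subplots(1, 3, figsize=(15, 4))

# 1. Many more bins than the original 7, over the full data range.
ax_all.hist(vote_count, bins=100)
ax_all.set_title("100 bins, full range")
ax_all.set_xlabel("vote_count")

# 2. Restrict the bins to a region of interest with the range argument;
#    values outside (0, 5000) are simply left out of this panel.
counts_zoom, edges_zoom, _ = ax_zoom.hist(vote_count, bins=100, range=(0, 5000))
ax_zoom.set_title("100 bins, range=(0, 5000)")
ax_zoom.set_xlabel("vote_count")

# 3. Cumulative, normalized histogram with a large number of bins:
#    the bar height at x approximates the share of movies with at most x votes.
cum, cum_edges, _ = ax_cum.hist(vote_count, bins=1000, cumulative=True, density=True)
ax_cum.set_title("cumulative, density=True")
ax_cum.set_xlabel("vote_count")
ax_cum.set_ylabel("share of movies")

fig.tight_layout()

# The exact share behind a statement such as
# "x% of movies have fewer than 100 ratings" can be computed directly.
share_below_100 = np.mean(vote_count < 100)
print(f"Share of movies with fewer than 100 votes: {share_below_100:.1%}")
```

Call plt.show() (or fig.savefig with a file name of your choice) afterwards to view the figure. With real data, replace the synthetic vote_count with df_movies.vote_count and adjust the range to the region you care about.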