Is a distribution that is normal, but highly skewed, considered Gaussian?








12












I have this question: What do you think the distribution of time spent per day on YouTube looks like?

My answer is that it is probably normally distributed and highly left skewed. I expect there is one mode where most users spend around some average time and then a long right tail since some users are overwhelming power users.

Is that a fair answer? Is there a better word for that distribution?







  • 4
    As some answers mention but do not emphasise, skewness is named informally for the longer tail if there is one, so right-skewed if a longer right tail. Left and right as used in this context both presuppose a display following a convention that magnitude is shown on the horizontal axis. If that sounds too obvious, consider displays in the Earth and environmental sciences in which the magnitude is height or depth and shown vertically. Small print: some measures of skewness can be zero even if a distribution is skewed geometrically.
    – Nick Cox
    Mar 31 at 6:42

  • 1
    Total time per day for all users, or time per day per person? If the latter, then surely there's a moderately big spike at 0, in which case you probably need a 'spike and slab' style distribution with a Dirac delta at 0.
    – innisfree
    Apr 1 at 7:08

  • 5
    "Normal" is synonymous with "Gaussian", and Gaussian distributions, also called normal distributions, are not skewed.
    – Michael Hardy
    Apr 2 at 1:47

  • I find the question in the title much different from the question in the body text. Or at least the title is very confusing. No distribution is 'normal but highly skewed'; that's a contradiction. Also, the Gaussian distribution is very well defined, $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$, and not at all like the distribution of time spent per day on YouTube. So the answer to the question in the title is a big no.
    – Martijn Weterings
    Apr 2 at 12:39

  • 2
    Also, the question at the end, 'is there a better word for that distribution?', is very vague or broad. The information seems to be only 'one mode' and 'a long right tail' (the part 'probably normally distributed' makes no sense). There can be many distributions that satisfy these conditions. It is amazing that this question attracts more than ten answers and at least as many proposals for the alternative distribution before we actually try to clarify the question (there isn't even data).
    – Martijn Weterings
    Apr 2 at 12:53


















distributions normal-distribution skewness skew-normal

asked Mar 30 at 19:14 by Cauder
edited Mar 31 at 6:46 by Nick Cox







11 Answers


















13












A fraction per day is certainly not negative. This rules out the normal distribution, which has probability mass over the entire real axis, in particular over the negative half.

Power law distributions are often used to model things like income distributions, sizes of cities etc. They are nonnegative and typically highly skewed. These would be the first I would try in modeling time spent watching YouTube. (Or monitoring CrossValidated questions.)

More information on power laws can be found here or here, or in our power-law tag.








  • 14
    You're completely correct that normal distributions have support on the real line. And yet... they're not an awful model for some strictly positive quantities, like adults' height or weight, where the mean and variance are such that the negative values are very unlikely under the model.
    – Matt Krause
    Mar 30 at 22:26

  • 2
    @MattKrause That's actually a great question: is there the same probability that I will be '10 cm above or below the mean height' or '10 percent above or below the mean height'? Only the first case could warrant a normal distribution.
    – Tomáš Kafka
    Apr 1 at 12:26

  • @MattKrause: I completely agree, in a general sense. Yet, the present question is about the proportion of daily time spent watching YouTube. We don't have any data, but I would be extremely surprised if the distribution was even remotely symmetric.
    – Stephan Kolassa
    Apr 1 at 15:28


















43












A distribution that is normal is not highly skewed. That is a contradiction. Normally distributed variables have skew = 0.
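To make the "skew = 0" statement concrete, here is a minimal sketch that computes the moment-based sample skewness for simulated data. Everything in it is an assumption for illustration: the helper name `sample_skewness`, the seed, and the made-up parameters (a normal population with mean 60 and standard deviation 15, and a log-normal stand-in for right-skewed watch times). It uses only the Python standard library.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Large sample from a normal population: skewness should be close to 0.
normal_sample = [rng.gauss(60, 15) for _ in range(100_000)]

# Small sample from the same population: skewness can be noticeably non-zero.
small_sample = [rng.gauss(60, 15) for _ in range(15)]

# Hypothetical right-skewed watch times (minutes), drawn from a log-normal.
skewed_sample = [rng.lognormvariate(3.0, 1.0) for _ in range(100_000)]

print(f"normal, n=100000: {sample_skewness(normal_sample):+.3f}")
print(f"normal, n=15:     {sample_skewness(small_sample):+.3f}")
print(f"log-normal:       {sample_skewness(skewed_sample):+.3f}")
```

The zero-skew property belongs to the normal population; a small sample drawn from it can still show some skew purely through sampling variation, which is why the sketch prints both a large and a small normal sample.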








  • 1
    What is a better way to describe the distribution? Is there a word for that type of distribution where it centers around a mode and then has a long tail?
    – Cauder
    Mar 30 at 19:21

  • 13
    Unimodal and skewed is as close as I can come...
    – jbowman
    Mar 30 at 19:27

  • 9
    As an aside, it's just really incredible that people give their time to help other people get better at this stuff. I know it goes without saying, but it's so cool what you both do!
    – Cauder
    Mar 30 at 19:30

  • 6
    Yes, but it's worth clarifying that that statement pertains to the normally distributed population. A sample drawn from that population can be very skewed.
    – gung
    Mar 31 at 2:14

  • When the skew value is small ("small" being decided by the people dealing with the stats in question), you can still treat the population as normal, albeit with minor error as a result.
    – Carl Witthoft
    Apr 1 at 18:03


















19












If it has a long right tail, then it's right skewed.

It can't be a normal distribution since skew != 0; it's perhaps a unimodal skew normal distribution:

https://en.wikipedia.org/wiki/Skew_normal_distribution
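One thing worth checking before settling on a skew normal is how much skewness it can actually express. The sketch below evaluates the standard closed-form skewness of a skew-normal distribution as a function of its shape parameter; the function name `skew_normal_skewness` and the chosen shape values are mine, for illustration only.

```python
import math


def skew_normal_skewness(alpha):
    """Theoretical skewness of a skew-normal distribution with shape alpha."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    b = delta * math.sqrt(2.0 / math.pi)  # mean of the standardised variable
    return (4.0 - math.pi) / 2.0 * b ** 3 / (1.0 - b * b) ** 1.5


for alpha in (0, 1, 5, 50, 1e6):
    print(f"alpha={alpha:>9}: skewness={skew_normal_skewness(alpha):.4f}")
```

The skewness stays below 1 in absolute value however large the shape parameter gets, so if the watch-time data turn out to be very strongly skewed, a skew normal may be too restrictive and one of the heavier-tailed candidates in the other answers may fit better.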




















    13












It could be a log-normal distribution. As mentioned here:

"Users' dwell time on online articles (jokes, news etc.) follows a log-normal distribution."

The reference given is: Yin, Peifeng; Luo, Ping; Lee, Wang-Chien; Wang, Min (2013). Silence is also evidence: interpreting dwell time for recommendation from psychological perspective. ACM International Conference on KDD.
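As a rough illustration of what a log-normal implies, the sketch below simulates hypothetical watch times and compares sample statistics with the textbook formulas (median $e^{\mu}$, mean $e^{\mu + \sigma^2/2}$). The parameters `mu = 3.0` and `sigma = 1.0` and the seed are invented, not estimated from any data.

```python
import math
import random
import statistics


def lognormal_median(mu, sigma):
    """Median of a log-normal; it does not depend on sigma."""
    return math.exp(mu)


def lognormal_mean(mu, sigma):
    """Mean of a log-normal; always above the median when sigma > 0."""
    return math.exp(mu + sigma ** 2 / 2.0)


mu, sigma = 3.0, 1.0  # hypothetical parameters of log(minutes per day)
rng = random.Random(1)
minutes = [rng.lognormvariate(mu, sigma) for _ in range(50_000)]
log_minutes = [math.log(m) for m in minutes]

print("theoretical median:", round(lognormal_median(mu, sigma), 1))
print("sample median:     ", round(statistics.median(minutes), 1))
print("theoretical mean:  ", round(lognormal_mean(mu, sigma), 1))
print("sample mean:       ", round(statistics.fmean(minutes), 1))
# On the log scale the data look normal: mean near mu, std dev near sigma.
print("log-scale mean, sd:", round(statistics.fmean(log_minutes), 2),
      round(statistics.stdev(log_minutes), 2))
```

The mean sitting well above the median is the long right tail showing up in the summary statistics, and taking logs brings the data back to a symmetric, normal-looking shape.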




















      6












The gamma distribution could be a good candidate to describe this kind of distribution over nonnegative, right-skewed data. See the green line in the image here:
https://en.m.wikipedia.org/wiki/Gamma_distribution
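If you want to try a gamma model, a quick first pass is a method-of-moments fit: shape = mean² / variance and scale = variance / mean. The sketch below does this on simulated data; the helper names, the seed, and the "true" parameters (shape 2, scale 0.75 hours) are assumptions for illustration, and a maximum-likelihood fit would normally be preferred for real data.

```python
import math
import random
import statistics


def gamma_skewness(shape):
    """Skewness of a gamma distribution; it depends only on the shape parameter."""
    return 2.0 / math.sqrt(shape)


def fit_gamma_moments(values):
    """Method-of-moments estimates (shape, scale) from the mean and variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return mean * mean / var, var / mean


rng = random.Random(2)
# Hypothetical watch times in hours: shape 2, scale 0.75 (mean 1.5 hours).
hours = [rng.gammavariate(2.0, 0.75) for _ in range(20_000)]
shape_hat, scale_hat = fit_gamma_moments(hours)
print(f"estimated shape={shape_hat:.2f}, scale={scale_hat:.2f}")
print(f"implied skewness={gamma_skewness(shape_hat):.2f}")
```

Smaller shape values give stronger right skew, so the fitted shape is a direct readout of how skewed the model thinks the data are.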




















        6












"Is there a better word for that distribution?"

There's a worthwhile distinction here between using words to describe the properties of the distribution, versus trying to find a "name" for the distribution so that you can identify it as (approximately) an instance of a particular standard distribution: one for which a formula or statistical tables might exist for its distribution function, and for which you could estimate its parameters. In this latter case, you are likely using the named distribution, e.g. "normal/Gaussian" (the two terms are generally synonymous), as a model that captures some of the key features of your data, rather than claiming the population your data is drawn from exactly follows that theoretical distribution. To slightly misquote George Box, all models are "wrong", but some are useful. If you are thinking about the modelling approach, it is worth considering what features you want to incorporate and how complicated or parsimonious you want your model to be.

Being positively skewed is an example of describing a property that the distribution has, but doesn't come close to specifying which off-the-shelf distribution is "the" appropriate model. It does rule out some candidates, for example the Gaussian (i.e. normal) distribution has zero skew so will not be appropriate to model your data if the skew is an important feature. There may be other properties of the data that are important to you too, e.g. that it's unimodal (has just one peak) or that it is bounded between 0 and 24 hours (or between 0 and 1, if you are writing it as a fraction of the day), or that there is a probability mass concentrated at zero (since there are people who do not watch YouTube at all on a given day). You may also be interested in other properties like the kurtosis. And it is worth bearing in mind that even if your distribution had a "hump" or "bell-curve" shape and had zero or near-zero skew, it doesn't automatically follow that the normal distribution is "correct" for it!

On the other hand, even if the population your data is drawn from actually did follow a particular distribution precisely, due to sampling error your dataset may not quite resemble it. Small data sets are likely to be "noisy", and it may be unclear whether certain features you can see, e.g. additional small humps or asymmetric tails, are properties of the underlying population the data was drawn from (and perhaps therefore ought to be incorporated in your model) or whether they are just artefacts from your particular sample (and for modelling purposes should be ignored). If you have a small data set and the skew is close to zero, then it is even plausible the underlying distribution is actually symmetric. The larger your data set and the larger the skewness, the less plausible this becomes; but while you could perform a significance test to see how convincing is the evidence your data provides for skewness in the population it was drawn from, this may be missing the point as to whether a normal (or other zero skew) distribution is appropriate as a model ...

Which properties of the data really matter for the purposes you are intending to model it? Note that if the skew is reasonably small and you do not care very much about it, even if the underlying population is genuinely skewed, then you might still find the normal distribution a useful model to approximate this true distribution of watching times. But you should check that this doesn't end up making silly predictions. Because a normal distribution has no highest or lowest possible value, although extremely high or low values become increasingly unlikely, you will always find that your model predicts there is some probability of watching for a negative number of hours per day, or more than 24 hours. This gets more problematic for you if the predicted probability of such impossible events becomes high. A symmetric distribution like the normal will predict that as many people will watch for lengths of time more than e.g. 50% above the mean, as watch for less than 50% below the mean. If watching times are very skewed, then this kind of prediction may also be so implausible as to be silly, and give you misleading results if you are taking the results of your model and using them as inputs for some other purpose (for instance, you're running a simulation of watching times in order to calculate optimal advertisement scheduling).

If the skewness is so noteworthy you want to capture it as part of your model, then the skew normal distribution may be more appropriate. If you want to capture both skewness and kurtosis, then consider the skewed t. If you want to incorporate the physically possible upper and lower bounds, then consider using the truncated versions of these distributions. Many other probability distributions exist that can be skewed and unimodal (for appropriate parameter choices) such as the F or gamma distributions, and again you can truncate these so they do not predict impossibly high watching times. A beta distribution may be a good choice if you are modelling the fraction of the day spent watching, as this is always bounded between 0 and 1 without further truncation being necessary. If you want to incorporate the concentration of probability at exactly zero due to non-watchers, then consider building in a hurdle model.

But at the point you are trying to throw in every feature you can identify from your data, and build an ever more sophisticated model, perhaps you should ask yourself why you are doing this? Would there be an advantage to a simpler model, for example it being easier to work with mathematically or having fewer parameters to estimate? If you are concerned that such simplification will leave you unable to capture all of the properties of interest to you, it may well be that no "off-the-shelf" distribution does quite what you want. However, we are not restricted to working with named distributions whose mathematical properties have been elucidated previously. Instead, consider using your data to construct an empirical distribution function. This will capture all the behaviour that was present in your data, but you can no longer give it a name like "normal" or "gamma", nor can you apply mathematical properties that pertain only to a particular distribution. For instance, the "95% of the data lies within 1.96 standard deviations of the mean" rule is for normally distributed data and may not apply to your distribution; though note that some rules apply to all distributions, e.g. Chebyshev's inequality guarantees at least 75% of your data must lie within two standard deviations of the mean, regardless of the skew. Unfortunately the empirical distribution will also inherit all those properties of your data set arising purely by sampling error, not just those possessed by the underlying population, so you may find a histogram of your empirical distribution has some humps and dips that the population itself does not. You may want to investigate smoothed empirical distribution functions, or better yet, increasing your sample size.

In summary: although the normal distribution has zero skew, the fact your data are skewed doesn't rule out the normal distribution as a useful model, though it does suggest some other distribution may be more appropriate. You should consider other properties of the data when choosing your model, besides the skew, and consider too the purposes you are going to use the model for. It's safe to say that your true population of watching times does not exactly follow some famous, named distribution, but this does not mean such a distribution is doomed to be useless as a model. However, for some purposes you may prefer to just use the empirical distribution itself, rather than try fitting a standard distribution to it.
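The empirical distribution function and the Chebyshev point above can be sketched in a few lines. This is only an illustration under invented assumptions: the helper names `make_ecdf` and `fraction_within`, the seed, and the simulated "watch times" (a log-normal capped at 24 hours) are all hypothetical stand-ins for real data.

```python
import bisect
import random
import statistics


def make_ecdf(values):
    """Return F(x) = fraction of observations <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def fraction_within(values, k):
    """Fraction of observations within k (population) standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


rng = random.Random(3)
# Hypothetical skewed watch times in hours, capped at 24.
hours = [min(rng.lognormvariate(-0.5, 1.0), 24.0) for _ in range(10_000)]

ecdf = make_ecdf(hours)
print("P(time <= 1 hour) ~", ecdf(1.0))
print("within 1.96 sd:", fraction_within(hours, 1.96))  # need not be 0.95
print("within 2 sd:   ", fraction_within(hours, 2.0))   # Chebyshev: at least 0.75
```

The empirical CDF needs no distributional name at all, and the two-standard-deviation fraction is guaranteed to be at least 0.75 for any data set, whereas the 1.96 figure is tied to normality and carries no such guarantee for skewed data.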




















          4












          $begingroup$

          "Normal" and "Gaussian" mean exactly the same thing. As other answers explain, the distribution you're talking about is not normal/Gaussian, because that distribution assigns probabilities to every value on the real line, whereas your distribution only exists between $0$ and $24$.















          $endgroup$




















            3












            $begingroup$

            In the case at hand, since the time spent per day is bounded between $0$ and $1$ (if quantified as a fraction of the day), distributions that are unbounded above (e.g. Pareto, skew-normal, Gamma, log-normal) won't work, but Beta would.
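A minimal sketch of what fitting a Beta distribution to such fractions could look like, using the method of moments. The sample is simulated from a Beta(1.5, 20) purely for illustration, and `fit_beta_moments` is a hypothetical helper name; a maximum-likelihood fit would be a reasonable alternative.

```python
import random
import statistics


def fit_beta_moments(fractions):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1)."""
    m = statistics.fmean(fractions)
    v = statistics.variance(fractions)
    common = m * (1 - m) / v - 1
    if common <= 0:
        raise ValueError("variance too large for a Beta distribution")
    return m * common, (1 - m) * common


random.seed(1)
# Hypothetical data: fraction of the day spent watching, right-skewed.
sample = [random.betavariate(1.5, 20.0) for _ in range(20000)]

alpha_hat, beta_hat = fit_beta_moments(sample)
print("estimated alpha, beta:", alpha_hat, beta_hat)
print("all values inside [0, 1]:", all(0.0 <= x <= 1.0 for x in sample))
```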















            $endgroup$




















              2












              $begingroup$

              How about a hurdle model?



              A hurdle model has two parts. The first is a Bernoulli experiment that determines whether you use YouTube at all. If you don't, then your usage time is obviously zero and you're done. If you do, you "pass that hurdle", and the usage time then comes from some other strictly positive distribution.



              Zero-inflated models are a closely related concept. These are meant to deal with a situation where we observe a bunch of zeros, but can't distinguish between always-zeros and sometimes-zeros. For example, consider the number of cigarettes that a person smokes each day. For non-smokers, that number is always zero, but some smokers may not smoke on a given day (out of cigarettes? on a long flight?). Unlike the hurdle model, the "smoker" distribution here should include zero, but these counts are 'inflated' by the non-smokers' contribution too.
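The difference between the two models can be sketched with a small simulation. Everything here is an assumption made for illustration: the 70% usage probability, the log-normal choice for positive watching times, and the crude binomial cigarette count are not taken from any data.

```python
import random


def simulate_hurdle(n, p_use, rng):
    """Hurdle model: zeros come only from failing the Bernoulli hurdle."""
    out = []
    for _ in range(n):
        if rng.random() < p_use:
            # Strictly positive part (log-normal hours, an assumption).
            out.append(rng.lognormvariate(0.0, 1.0))
        else:
            out.append(0.0)
    return out


def simulate_zero_inflated(n, p_smoker, rng):
    """Zero-inflated counts: smokers can also produce a zero."""
    out = []
    for _ in range(n):
        if rng.random() < p_smoker:
            # Count that may itself be zero: 20 chances, each taken w.p. 0.1.
            out.append(sum(rng.random() < 0.1 for _ in range(20)))
        else:
            out.append(0)
    return out


rng = random.Random(2)
times = simulate_hurdle(10000, 0.7, rng)
counts = simulate_zero_inflated(10000, 0.7, rng)

hurdle_zero_share = sum(t == 0.0 for t in times) / len(times)
zi_zero_share = sum(c == 0 for c in counts) / len(counts)
print("hurdle zero share:", hurdle_zero_share)      # close to 1 - 0.7
print("zero-inflated zero share:", zi_zero_share)   # larger than 1 - 0.7
```

In the hurdle simulation the share of zeros estimates the probability of not using the site at all; in the zero-inflated one it overstates the share of non-smokers, because some smokers also record a zero.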















              $endgroup$




















                0












                $begingroup$

                If the distribution is indeed a 'subset' of the normal distribution, you should consider a truncated model. The family of Tobit models is widely used in this context.

                They essentially suggest a pdf with a (positive) probability mass at 0 and then a 'cut-off part of the normal distribution' for positive values.

                I will refrain from typing the formula here and rather refer you to the Wikipedia article: https://en.wikipedia.org/wiki/Tobit_model
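To make the "point mass at 0 plus a piece of the normal" picture concrete, here is a small sketch of the standard (type I) Tobit setup, in which a latent normal variable is censored at zero. The parameter values `mu` and `sigma` are hypothetical and chosen only for illustration; no regression part is included.

```python
import random
from statistics import NormalDist

mu, sigma = 1.0, 2.0  # hypothetical latent mean and standard deviation
latent = NormalDist(mu, sigma)

# Probability mass at zero: P(latent <= 0) = Phi(-mu / sigma).
p_zero_theory = latent.cdf(0.0)

rng = random.Random(3)
# Observed value is the latent draw if positive, otherwise exactly 0.
observed = [max(0.0, rng.gauss(mu, sigma)) for _ in range(20000)]
p_zero_sim = sum(y == 0.0 for y in observed) / len(observed)

print("theoretical mass at 0:", p_zero_theory)
print("simulated share of zeros:", p_zero_sim)
```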















                $endgroup$




















                  -3












                  $begingroup$

                  Normal distributions are by definition non-skewed, so you can't have both things. If the distribution is left-skewed, then it cannot be Gaussian. You'll have to pick a different one! The closest thing to your request I can think of is this:



                  https://en.wikipedia.org/wiki/Skew_normal_distribution















                  $endgroup$








                  • 3




                    $begingroup$
                    I agree except that the OP is confusing left and right skewness, as already pointed out. And @behold has already suggested the skew-normal in an answer. So, I can't see that this adds to existing answers.
                    $endgroup$
                    – Nick Cox
                    Apr 2 at 9:48










                  • $begingroup$
                    It summarizes many of them in a straight-forward three-line response
                    $endgroup$
                    – David
                    Apr 2 at 11:46






                  • 3




                    $begingroup$
                    Sorry, but that's still repetition.
                    $endgroup$
                    – Nick Cox
                    Apr 2 at 12:52










                  • $begingroup$
                    OK... who cares?
                    $endgroup$
                    – David
                    Apr 2 at 14:06






                  • 2




                    $begingroup$
                    Well, I do; and whoever added +1 to my comments (clearly not me) and whoever downvoted your answer (not me, as it happens). This thread is already long and repetitive; yet more redundant comments don't improve it for future readers.
                    $endgroup$
                    – Nick Cox
                    Apr 2 at 14:24










                  protected by gung Apr 2 at 13:16

















                  11 Answers









                  13












                  $begingroup$

                  A fraction per day is certainly not negative. This rules out the normal distribution, which has probability mass over the entire real axis - in particular over the negative half.
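How much this matters depends on where the mean sits relative to zero. The sketch below uses made-up parameters (a watch fraction with mean 0.05 and SD 0.08, and adult height with mean 170 cm and SD 10 cm) to compare the probability a fitted normal would put on impossible negative values.

```python
from statistics import NormalDist

# Hypothetical normal fit to "fraction of the day spent watching".
watch = NormalDist(mu=0.05, sigma=0.08)
p_neg_watch = watch.cdf(0.0)

# Hypothetical normal fit to adult height in centimetres.
height = NormalDist(mu=170.0, sigma=10.0)
p_neg_height = height.cdf(0.0)

print("P(watch fraction < 0):", p_neg_watch)
print("P(height < 0):", p_neg_height)
```

With numbers like these, the normal model assigns a substantial probability to negative watching fractions but a negligible one to negative heights, which is why it can still be a serviceable approximation in the second case.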



                  Power law distributions are often used to model things like income distributions, sizes of cities etc. They are nonnegative and typically highly skewed. These would be the first I would try in modeling time spent watching YouTube. (Or monitoring CrossValidated questions.)



                  More information on power laws can be found here or here, or in our power-law tag.















                  $endgroup$








                  • 14




                    $begingroup$
                    You're completely correct that normal distributions have support on the real line. And yet...they're not an awful model for some strictly positive quantities, like adults' height or weight, where the mean and variance are such that the negative values are very unlikely under the model.
                    $endgroup$
                    – Matt Krause
                    Mar 30 at 22:26






                  • 2




                    $begingroup$
                    @MattKrause That's actually a great question - is there the same probability that I will be '10 cm above or below the mean height' or '10 percent above or below the mean height'? Only the first case could warrant a normal distribution.
                    $endgroup$
                    – Tomáš Kafka
                    Apr 1 at 12:26










                  • $begingroup$
                    @MattKrause: I completely agree, in a general sense. Yet, the present question is about the proportion of daily time spent watching YouTube. We don't have any data, but I would be extremely surprised if the distribution was even remotely symmetric.
                    $endgroup$
                    – Stephan Kolassa
                    Apr 1 at 15:28















                  answered Mar 30 at 19:35 by Stephan Kolassa




















                  43












                  $begingroup$

                  A distribution that is normal is not highly skewed. That is a contradiction. Normally distributed variables have skew = 0.
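The zero-skew statement is about the normal population; a finite sample drawn from it will usually show some skew. A quick simulation sketch (the sample sizes and the helper name `sample_skewness` are arbitrary choices for this example):

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(4)

# One large sample from a standard normal: skewness should be near 0.
big = [rng.gauss(0, 1) for _ in range(100000)]
skew_big = sample_skewness(big)

# Many small samples from the same population: some look quite skewed.
small_samples = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(1000)]
max_abs_skew_small = max(abs(sample_skewness(s)) for s in small_samples)

print("skewness of the large sample:", skew_big)
print("largest |skewness| among samples of size 10:", max_abs_skew_small)
```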















                  $endgroup$








                  • 1




                    $begingroup$
                    What is a better way to describe the distribution? Is there a word for that type of distribution where it centers around a mode and then has a long tail?
                    $endgroup$
                    – Cauder
                    Mar 30 at 19:21






                  • 13




                    $begingroup$
                    Unimodal and skewed is as close as I can come...
                    $endgroup$
                    – jbowman
                    Mar 30 at 19:27






                  • 9




                    $begingroup$
                    As an aside, it's just really incredible that people give their time to help other people get better at this stuff. I know it goes without saying, but it's so cool what you both do!
                    $endgroup$
                    – Cauder
                    Mar 30 at 19:30






                  • 6




                    $begingroup$
                    Yes, but it's worth clarifying that that statement pertains to the normally distributed population. A sample drawn from that population can be very skewed.
                    $endgroup$
                    – gung
                    Mar 31 at 2:14










                  • $begingroup$
                    When the skew value is small ("small" being decided by the people dealing with the stats in question), you can still treat the population as normal, albeit with minor error as a result.
                    $endgroup$
                    – Carl Witthoft
                    Apr 1 at 18:03















                  answered Mar 30 at 19:15 by Peter Flom


















                  19












                  $begingroup$

                  If it has a long right tail, then it's right-skewed.



                  It can't be a normal distribution, since the skew is not 0. It is perhaps a unimodal skew-normal distribution:



                  https://en.wikipedia.org/wiki/Skew_normal_distribution
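For a feel of how much skew the skew-normal family can actually produce, here is a small sketch that samples from it using the usual two-normal representation and compares the sample skewness with the closed-form value. The shape value `alpha = 5` is an arbitrary choice and the function names are made up for this example. One thing the formula suggests is that the skewness of a skew-normal stays below 1 in absolute value, so it may be too mild for very long-tailed data.

```python
import math
import random


def skew_normal_draw(alpha, rng):
    """One draw from the standard skew-normal with shape alpha."""
    delta = alpha / math.sqrt(1 + alpha ** 2)
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    return delta * abs(z0) + math.sqrt(1 - delta ** 2) * z1


def skew_normal_skewness(alpha):
    """Closed-form skewness of the skew-normal with shape alpha."""
    delta = alpha / math.sqrt(1 + alpha ** 2)
    m = delta * math.sqrt(2 / math.pi)  # mean of the standard skew-normal
    return (4 - math.pi) / 2 * m ** 3 / (1 - m ** 2) ** 1.5


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(6)
alpha = 5.0  # arbitrary positive shape, giving right skew
draws = [skew_normal_draw(alpha, rng) for _ in range(50000)]

print("sample skewness:", sample_skewness(draws))
print("theoretical skewness:", skew_normal_skewness(alpha))
print("skewness for a huge alpha:", skew_normal_skewness(1e6))
```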















                  $endgroup$



























                      answered Mar 30 at 19:31 by behold





















                          13












                          $begingroup$

                          It could be a log-normal distribution. As mentioned here:




                          Users' dwell time on online articles (jokes, news etc.) follows a log-normal distribution.




                          The reference given is: Yin, Peifeng; Luo, Ping; Lee, Wang-Chien; Wang, Min (2013). Silence is also evidence: interpreting dwell time for recommendation from psychological perspective. ACM International Conference on KDD.
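One practical consequence of a log-normal model is that the logarithm of the data should look roughly normal. A simulated sketch (the parameters 3.0 and 1.0 are hypothetical, not taken from the cited paper):

```python
import math
import random


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(5)
# Hypothetical dwell times in seconds, drawn from a log-normal.
dwell = [rng.lognormvariate(3.0, 1.0) for _ in range(50000)]
log_dwell = [math.log(t) for t in dwell]

skew_raw = sample_skewness(dwell)
skew_log = sample_skewness(log_dwell)
print("skewness of raw dwell times:", skew_raw)   # strongly positive
print("skewness of log dwell times:", skew_log)   # close to zero
```

With real data, a histogram or normal Q-Q plot of the logged values would be the analogous check.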















                          $endgroup$



























                              answered Mar 31 at 1:01









                              Count Iblis





















                                  6












                                  $begingroup$

                                  The gamma distribution could be a good candidate to describe this kind of distribution over nonnegative, right-skewed data. See the green line in the image here:
                                  https://en.m.wikipedia.org/wiki/Gamma_distribution
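As a rough sketch of how a gamma model could be tried, the snippet below fits a gamma distribution by maximum likelihood with `scipy.stats.gamma.fit`, fixing the location at zero so the support is the nonnegative reals. The data are synthetic and the shape and scale values are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic daily watching times in hours (assumed, illustration only).
hours = rng.gamma(shape=2.0, scale=1.5, size=5000)

# Fit shape and scale; floc=0 pins the lower bound of the support at zero.
shape_hat, loc_hat, scale_hat = stats.gamma.fit(hours, floc=0)

# A gamma distribution with shape k has skewness 2 / sqrt(k), always positive.
model_skew = 2.0 / np.sqrt(shape_hat)

print(f"shape={shape_hat:.2f}, scale={scale_hat:.2f}, implied skew={model_skew:.2f}")
```

The fitted shape parameter controls how skewed the model is: small shapes give a long right tail, while large shapes look increasingly symmetric.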















                                  $endgroup$



























                                      answered Mar 31 at 6:00









                                      maurice





















                                          6












                                          $begingroup$

                                          "Is there a better word for that distribution?"



                                          There's a worthwhile distinction here between using words to describe the properties of the distribution, versus trying to find a "name" for the distribution so that you can identify it as (approximately) an instance of a particular standard distribution: one for which a formula or statistical tables might exist for its distribution function, and for which you could estimate its parameters. In this latter case, you are likely using the named distribution, e.g. "normal/Gaussian" (the two terms are generally synonymous), as a model that captures some of the key features of your data, rather than claiming the population your data is drawn from exactly follows that theoretical distribution. To slightly misquote George Box, all models are "wrong", but some are useful. If you are thinking about the modelling approach, it is worth considering what features you want to incorporate and how complicated or parsimonious you want your model to be.



                                          Being positively skewed is a property of the distribution, but it doesn't come close to specifying which off-the-shelf distribution is "the" appropriate model. It does rule out some candidates: for example, the Gaussian (i.e. normal) distribution has zero skew, so it will not be appropriate to model your data if the skew is an important feature. There may be other properties of the data that are important to you too, e.g. that it's unimodal (has just one peak), or that it is bounded between 0 and 24 hours (or between 0 and 1, if you are writing it as a fraction of the day), or that there is a probability mass concentrated at zero (since there are people who do not watch YouTube at all on a given day). You may also be interested in other properties like the kurtosis. And it is worth bearing in mind that even if your distribution had a "hump" or "bell-curve" shape and had zero or near-zero skew, it doesn't automatically follow that the normal distribution is "correct" for it!

                                          On the other hand, even if the population your data is drawn from actually did follow a particular distribution precisely, due to sampling error your dataset may not quite resemble it. Small data sets are likely to be "noisy", and it may be unclear whether certain features you can see, e.g. additional small humps or asymmetric tails, are properties of the underlying population the data was drawn from (and perhaps therefore ought to be incorporated in your model) or whether they are just artefacts of your particular sample (and for modelling purposes should be ignored). If you have a small data set and the skew is close to zero, then it is even plausible the underlying distribution is actually symmetric. The larger your data set and the larger the skewness, the less plausible this becomes. You could perform a significance test to see how convincing the evidence for skewness in the underlying population is, but this may miss the point as to whether a normal (or other zero-skew) distribution is appropriate as a model ...
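To illustrate how sample size affects the evidence for skewness, here is a small sketch using `scipy.stats.skewtest`, which tests the null hypothesis that the sample comes from a population with the skewness of a normal distribution. Both samples are synthetic draws from the same skewed population; the distribution and sample sizes are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Same skewed population (gamma, shape 2), two very different sample sizes.
small_sample = rng.gamma(shape=2.0, scale=1.0, size=15)
large_sample = rng.gamma(shape=2.0, scale=1.0, size=2000)

# skewtest needs at least 8 observations.
res_small = stats.skewtest(small_sample)
res_large = stats.skewtest(large_sample)

print(f"n=15:   sample skew={stats.skew(small_sample):.2f}, p={res_small.pvalue:.3g}")
print(f"n=2000: sample skew={stats.skew(large_sample):.2f}, p={res_large.pvalue:.3g}")
```

With only 15 observations the test may or may not detect the skew, whereas with 2000 the evidence is overwhelming. Neither outcome says whether a zero-skew model is good enough for your purpose.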



                                          Which properties of the data really matter for the purposes you are intending to model it? Note that if the skew is reasonably small and you do not care very much about it, even if the underlying population is genuinely skewed, then you might still find the normal distribution a useful model to approximate this true distribution of watching times. But you should check that this doesn't end up making silly predictions. Because a normal distribution has no highest or lowest possible value, your model will always predict some probability of watching for a negative number of hours per day, or for more than 24 hours, even though extremely high or low values become increasingly unlikely. This gets more problematic if the predicted probability of such impossible events becomes high. A symmetric distribution like the normal will also predict that as many people watch for more than, e.g., 50% above the mean as watch for more than 50% below it. If watching times are very skewed, then this kind of prediction may also be so implausible as to be silly, and give you misleading results if you are using the output of your model as input for some other purpose (for instance, you're running a simulation of watching times in order to calculate optimal advertisement scheduling).

                                          If the skewness is so noteworthy you want to capture it as part of your model, then the skew normal distribution may be more appropriate. If you want to capture both skewness and kurtosis, then consider the skewed t. If you want to incorporate the physically possible upper and lower bounds, then consider using the truncated versions of these distributions. Many other probability distributions exist that can be skewed and unimodal (for appropriate parameter choices), such as the F or gamma distributions, and again you can truncate these so they do not predict impossibly high watching times. A beta distribution may be a good choice if you are modelling the fraction of the day spent watching, as this is always bounded between 0 and 1 without further truncation being necessary. If you want to incorporate the concentration of probability at exactly zero due to non-watchers, then consider building in a hurdle model.
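The "silly predictions" check can be made concrete. The sketch below fits a normal distribution to synthetic, right-skewed watching times and asks how much probability the fitted model places on impossible values (below 0 hours or above 24 hours). The data and parameter values are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic, right-skewed daily watching times in hours (assumed).
hours = rng.gamma(shape=1.5, scale=1.0, size=5000)

# Normal model matched to the sample mean and standard deviation.
mu_hat = hours.mean()
sd_hat = hours.std(ddof=1)

# Probability the normal model assigns to impossible outcomes.
p_negative = stats.norm.cdf(0.0, loc=mu_hat, scale=sd_hat)
p_over_24 = stats.norm.sf(24.0, loc=mu_hat, scale=sd_hat)

# In the data itself, no one watches for a negative amount of time.
observed_negative = np.mean(hours < 0)

print(f"model P(hours < 0)  = {p_negative:.3f}")
print(f"model P(hours > 24) = {p_over_24:.3g}")
print(f"observed fraction below zero = {observed_negative:.3f}")
```

Here the normal model puts roughly a tenth of its probability on negative watching times, which is the kind of result that would argue for a bounded or skewed alternative.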



                                          But at the point you are trying to throw in every feature you can identify from your data, and build an ever more sophisticated model, perhaps you should ask yourself why you are doing this? Would there be an advantage to a simpler model, for example it being easier to work with mathematically or having fewer parameters to estimate? If you are concerned that such simplification will leave you unable to capture all of the properties of interest to you, it may well be that no "off-the-shelf" distribution does quite what you want. However, we are not restricted to working with named distributions whose mathematical properties have been elucidated previously. Instead, consider using your data to construct an empirical distribution function. This will capture all the behaviour that was present in your data, but you can no longer give it a name like "normal" or "gamma", nor can you apply mathematical properties that pertain only to a particular distribution. For instance, the "95% of the data lies within 1.96 standard deviations of the mean" rule is for normally distributed data and may not apply to your distribution; though note that some rules apply to all distributions, e.g. Chebyshev's inequality guarantees at least 75% of your data must lie within two standard deviations of the mean, regardless of the skew. Unfortunately the empirical distribution will also inherit all those properties of your data set arising purely by sampling error, not just those possessed by the underlying population, so you may find a histogram of your empirical distribution has some humps and dips that the population itself does not. You may want to investigate smoothed empirical distribution functions, or better yet, increasing your sample size.
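A minimal sketch of the empirical approach follows: it builds an empirical distribution function from a synthetic sample and checks the Chebyshev bound on it. The helper name `ecdf` and the data are assumptions for illustration. The standard deviation is computed with NumPy's default `ddof=0`, for which Chebyshev's inequality holds exactly on the sample.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic, right-skewed daily watching times in hours (assumed).
hours = rng.gamma(shape=1.5, scale=1.0, size=2000)


def ecdf(sample, x):
    """Empirical distribution function: fraction of sample values <= x."""
    ordered = np.sort(sample)
    return np.searchsorted(ordered, x, side="right") / ordered.size


mean = hours.mean()
sd = hours.std()  # ddof=0: the standard deviation of the empirical distribution

# Fraction of the data within k standard deviations of the mean.
within_2sd = np.mean(np.abs(hours - mean) <= 2.0 * sd)
within_196sd = np.mean(np.abs(hours - mean) <= 1.96 * sd)

print(f"ECDF at 1 hour: {ecdf(hours, 1.0):.3f}")
print(f"within 2 sd:    {within_2sd:.3f} (Chebyshev guarantees >= 0.75)")
print(f"within 1.96 sd: {within_196sd:.3f} (0.95 is only guaranteed for normal data)")
```

The 75% bound holds for any sample, skewed or not, while the fraction within 1.96 standard deviations depends on the shape of the data.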



                                          In summary: although the normal distribution has zero skew, the fact your data are skewed doesn't rule out the normal distribution as a useful model, though it does suggest some other distribution may be more appropriate. You should consider other properties of the data when choosing your model, besides the skew, and consider too the purposes you are going to use the model for. It's safe to say that your true population of watching times does not exactly follow some famous, named distribution, but this does not mean such a distribution is doomed to be useless as a model. However, for some purposes you may prefer to just use the empirical distribution itself, rather than try fitting a standard distribution to it.

















                                          $endgroup$




















                                              "Is there a better word for that distribution?"



                                              There's a worthwhile distinction here between using words to describe the properties of the distribution, versus trying to find a "name" for the distribution so that you can identify it as (approximately) an instance of a particular standard distribution: one for which a formula or statistical tables might exist for its distribution function, and for which you could estimate its parameters. In this latter case, you are likely using the named distribution, e.g. "normal/Gaussian" (the two terms are generally synonymous), as a model that captures some of the key features of your data, rather than claiming the population your data is drawn from exactly follows that theoretical distribution. To slightly misquote George Box, all models are "wrong", but some are useful. If you are thinking about the modelling approach, it is worth considering what features you want to incorporate and how complicated or parsimonious you want your model to be.



                                              Being positively skewed is an example of describing a property that the distribution has, but doesn't come close to specifying which off-the-shelf distribution is "the" appropriate model. It does rule out some candidates, for example the Gaussian (i.e. normal) distribution has zero skew so will not be appropriate to model your data if the skew is an important feature. There may be other properties of the data that are important to you too, e.g. that it's unimodal (has just one peak) or that it is bounded between 0 and 24 hours (or between 0 and 1, if you are writing it as a fraction of the day), or that there is a probability mass concentrated at zero (since there are people who do not watch youtube at all on a given day). You may also be interested in other properties like the kurtosis. And it is worth bearing in mind that even if your distribution had a "hump" or "bell-curve" shape and had zero or near-zero skew, it doesn't automatically follow that the normal distribution is "correct" for it! On the other hand, even if the population your data is drawn from actually did follow a particular distribution precisely, due to sampling error your dataset may not quite resemble it. Small data sets are likely to be "noisy", and it may be unclear whether certain features you can see, e.g. additional small humps or asymmetric tails, are properties of the underlying population the data was drawn from (and perhaps therefore ought to be incorporated in your model) or whether they are just artefacts from your particular sample (and for modelling purposes should be ignored). If you have a small data set and the skew is close to zero, then it is even plausible the underlying distribution is actually symmetric. 
The larger your data set and the larger the skewness, the less plausible this becomes — but while you could perform a significance test to see how convincing is the evidence your data provides for skewness in the population it was drawn from, this may be missing the point as to whether a normal (or other zero skew) distribution is appropriate as a model ...



Which properties of the data really matter for the purposes for which you intend to model it? Note that if the skew is reasonably small and you do not care very much about it, even if the underlying population is genuinely skewed, then you might still find the normal distribution a useful model to approximate this true distribution of watching times. But you should check that this doesn't end up making silly predictions.

Because a normal distribution has no highest or lowest possible value, although extremely high or low values become increasingly unlikely, your model will always predict some probability of watching for a negative number of hours per day, or for more than 24 hours. This becomes more problematic if the predicted probability of such impossible events is high. A symmetric distribution like the normal will also predict that as many people watch for more than, e.g., 50% above the mean as watch for less than 50% below the mean. If watching times are very skewed, then this kind of prediction may be so implausible as to be silly, and give you misleading results if you are using the output of your model as input for some other purpose (for instance, you're running a simulation of watching times in order to calculate optimal advertisement scheduling).

If the skewness is so noteworthy that you want to capture it as part of your model, then the skew normal distribution may be more appropriate. If you want to capture both skewness and kurtosis, then consider the skewed t. If you want to incorporate the physically possible upper and lower bounds, then consider using the truncated versions of these distributions. Many other probability distributions exist that can be skewed and unimodal (for appropriate parameter choices), such as the F or gamma distributions, and again you can truncate these so they do not predict impossibly high watching times. A beta distribution may be a good choice if you are modelling the fraction of the day spent watching, as this is always bounded between 0 and 1 without further truncation being necessary. If you want to incorporate the concentration of probability at exactly zero due to non-watchers, then consider building in a hurdle model.
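One way to check for "silly predictions" is to compute how much probability a fitted normal model places on impossible watching times. The following is a minimal sketch using only the Python standard library; the function name `impossible_mass` and the example means and standard deviations are made up for illustration and are not taken from any real data.

```python
from statistics import NormalDist


def impossible_mass(mean_hours: float, sd_hours: float) -> float:
    """Probability a normal model assigns to < 0 or > 24 hours per day."""
    model = NormalDist(mu=mean_hours, sigma=sd_hours)
    below_zero = model.cdf(0.0)          # P(hours < 0)
    above_day = 1.0 - model.cdf(24.0)    # P(hours > 24)
    return below_zero + above_day


# Hypothetical parameter choices, for illustration only.
for mean, sd in [(3.0, 1.0), (1.0, 1.5)]:
    share = impossible_mass(mean, sd)
    print(f"mean={mean}, sd={sd}: P(impossible value) = {share:.4f}")
```

If the mean is several standard deviations above zero, the impossible mass is negligible; if the mean is within about one standard deviation of zero, a substantial share of the model's probability sits on negative watching times.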



But at the point where you are trying to throw in every feature you can identify from your data, and build an ever more sophisticated model, perhaps you should ask yourself why you are doing this. Would there be an advantage to a simpler model, for example it being easier to work with mathematically or having fewer parameters to estimate? If you are concerned that such simplification will leave you unable to capture all of the properties of interest to you, it may well be that no "off-the-shelf" distribution does quite what you want.

However, we are not restricted to working with named distributions whose mathematical properties have been elucidated previously. Instead, consider using your data to construct an empirical distribution function. This will capture all the behaviour that was present in your data, but you can no longer give it a name like "normal" or "gamma", nor can you apply mathematical properties that pertain only to a particular distribution. For instance, the "95% of the data lies within 1.96 standard deviations of the mean" rule is for normally distributed data and may not apply to your distribution; though note that some rules apply to all distributions, e.g. Chebyshev's inequality guarantees at least 75% of your data must lie within two standard deviations of the mean, regardless of the skew.

Unfortunately the empirical distribution will also inherit all those properties of your data set arising purely by sampling error, not just those possessed by the underlying population, so you may find a histogram of your empirical distribution has some humps and dips that the population itself does not. You may want to investigate smoothed empirical distribution functions, or better yet, increasing your sample size.
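As a rough sketch of these two ideas, the code below builds an empirical distribution function from a sample and checks the Chebyshev bound on it. The data are simulated (a gamma draw capped at 24 hours, chosen arbitrarily as a stand-in for skewed watching times), and the helper names `make_ecdf` and `fraction_within` are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import fmean, pstdev


def make_ecdf(sample):
    """Return F(x) = fraction of observations less than or equal to x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        return bisect_right(ordered, x) / n

    return ecdf


def fraction_within(sample, k):
    """Fraction of observations within k standard deviations of the mean."""
    centre = fmean(sample)
    spread = pstdev(sample)  # standard deviation of the sample itself
    return sum(abs(x - centre) <= k * spread for x in sample) / len(sample)


rng = random.Random(0)
# Made-up right-skewed "hours watched" data, capped at 24 hours.
hours = [min(rng.gammavariate(1.5, 1.2), 24.0) for _ in range(5000)]

ecdf = make_ecdf(hours)
print("Empirical P(hours <= 2):", ecdf(2.0))
print("Fraction within 2 sd:   ", fraction_within(hours, 2))
print("Fraction within 1.96 sd:", fraction_within(hours, 1.96))
```

The fraction within two standard deviations can never fall below 0.75 for any sample, whereas the fraction within 1.96 standard deviations is only guaranteed to be close to 0.95 when the data are approximately normal.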



                                              In summary: although the normal distribution has zero skew, the fact your data are skewed doesn't rule out the normal distribution as a useful model, though it does suggest some other distribution may be more appropriate. You should consider other properties of the data when choosing your model, besides the skew, and consider too the purposes you are going to use the model for. It's safe to say that your true population of watching times does not exactly follow some famous, named distribution, but this does not mean such a distribution is doomed to be useless as a model. However, for some purposes you may prefer to just use the empirical distribution itself, rather than try fitting a standard distribution to it.















                                              edited Apr 2 at 22:15

























                                              answered Apr 2 at 1:19









Silverfish





















                                                  4












"Normal" and "Gaussian" mean exactly the same thing. As other answers explain, the distribution you're talking about is not normal/Gaussian, because that distribution assigns probabilities to every value on the real line, whereas your distribution only exists between $0$ and $24$.



























                                                      answered Apr 1 at 14:08









David Richerby





















                                                          3












In the case at hand, since the time spent per day is bounded between $0$ and $1$ (if quantified as a fraction of the day), distributions that are unbounded above (e.g. Pareto, skew-normal, Gamma, log-normal) won't work, but Beta would.
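A minimal sketch of fitting a Beta distribution to fractions of the day, using the method of moments rather than maximum likelihood to keep it short. The data are simulated from Beta(2, 14) purely as a stand-in for real observations, and `fit_beta_moments` is a hypothetical helper name.

```python
import random
from statistics import fmean, pvariance


def fit_beta_moments(fractions):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1)."""
    m = fmean(fractions)
    v = pvariance(fractions)
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("variance is incompatible with a Beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


rng = random.Random(1)
# Simulated fractions of the day spent watching (assumed, not real data).
fractions = [rng.betavariate(2.0, 14.0) for _ in range(20000)]

alpha_hat, beta_hat = fit_beta_moments(fractions)
mean_hours = 24 * alpha_hat / (alpha_hat + beta_hat)
print(f"alpha ~ {alpha_hat:.2f}, beta ~ {beta_hat:.2f}")
print(f"implied mean hours per day ~ {mean_hours:.2f}")
```

Because the fitted distribution lives on the interval from 0 to 1, multiplying by 24 can never give a negative watching time or one longer than a day.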



























                                                              answered Apr 1 at 10:36









J.G.





















                                                                  2












How about a hurdle model?

A hurdle model has two parts. The first is a Bernoulli experiment that determines whether you use YouTube at all. If you don't, then your usage time is obviously zero and you're done. If you do, you "pass that hurdle", and the usage time then comes from some other strictly positive distribution.

A closely related concept is the zero-inflated model. These models are meant to deal with a situation where we observe a bunch of zeros, but can't distinguish between always-zeros and sometimes-zeros. For example, consider the number of cigarettes that a person smokes each day. For non-smokers, that number is always zero, but some smokers may not smoke on a given day (out of cigarettes? on a long flight?). Unlike the hurdle model, the "smoker" distribution here should include zero, but these counts are 'inflated' by the non-smokers' contribution too.
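The difference between the two constructions can be sketched by simulation. In this illustration the positive part of the hurdle model is a gamma distribution and the count part of the zero-inflated model is Poisson; both choices, along with every parameter value and function name, are assumptions made for the example.

```python
import math
import random


def sample_hurdle(rng, p_user, shape, scale):
    """Hurdle model: a Bernoulli gate, then a positive gamma draw."""
    if rng.random() >= p_user:
        return 0.0                         # did not pass the hurdle
    return rng.gammavariate(shape, scale)  # usage time for actual users


def sample_poisson(rng, mean_count):
    """Poisson draw by multiplying uniforms (adequate for small means)."""
    threshold = math.exp(-mean_count)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_zero_inflated(rng, p_sometimes, mean_count):
    """Zero-inflated Poisson: an always-zero group mixed with Poisson counts."""
    if rng.random() >= p_sometimes:
        return 0                             # always-zero group (non-smokers)
    return sample_poisson(rng, mean_count)   # smokers, who may still record 0


rng = random.Random(2)
n = 20000
hurdle_draws = [sample_hurdle(rng, 0.6, 1.5, 1.2) for _ in range(n)]
zi_draws = [sample_zero_inflated(rng, 0.3, 2.0) for _ in range(n)]

hurdle_zero_share = sum(x == 0.0 for x in hurdle_draws) / n
zi_zero_share = sum(k == 0 for k in zi_draws) / n
expected_zi_zero_share = (1 - 0.3) + 0.3 * math.exp(-2.0)

print("hurdle: share of zeros       ", hurdle_zero_share)
print("zero-inflated: share of zeros", zi_zero_share)
print("zero-inflated: expected share", expected_zi_zero_share)
```

In the hurdle model every zero comes from the gate, so the share of zeros estimates one minus the probability of being a user. In the zero-inflated model the zeros come from two sources, which is why the share of zeros exceeds the size of the always-zero group.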



























                                                                      answered Apr 1 at 13:58









Matt Krause





















                                                                          0












If the distribution is indeed a 'subset' of the normal distribution, you should consider a truncated or censored model. Widely used in this context is the family of Tobit models.

They essentially suggest a pdf with a (positive) probability mass at 0 and then a 'cut-off part of the normal distribution' for positive values.

I will refrain from typing the formula here and rather refer you to the Wikipedia article: https://en.wikipedia.org/wiki/Tobit_model
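To make the "probability mass at 0" concrete, here is a small simulation of the simplest case: a latent normal variable with no covariates, recorded as zero whenever it would be negative. A full Tobit model is a regression, so this is only an intercept-only sketch, and the latent mean and standard deviation are hypothetical values.

```python
import random
from statistics import NormalDist


def censored_normal_draw(rng, mu, sigma):
    """Latent normal value, recorded as 0 whenever it falls below 0."""
    return max(0.0, rng.gauss(mu, sigma))


def mass_at_zero(mu, sigma):
    """P(latent value <= 0): the point mass the model places at 0."""
    return NormalDist(mu, sigma).cdf(0.0)


rng = random.Random(3)
mu, sigma = 1.0, 1.5  # hypothetical latent mean and sd, in hours
draws = [censored_normal_draw(rng, mu, sigma) for _ in range(20000)]

observed_zero_share = sum(x == 0.0 for x in draws) / len(draws)
print("simulated share of zeros:", observed_zero_share)
print("theoretical mass at zero:", mass_at_zero(mu, sigma))
```

The positive observations follow the upper part of the normal density, while all the probability that the latent normal puts below zero is collected into a single spike at zero.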



























                                                                              answered Apr 2 at 12:25









Lucas





















                                                                                  -3












Normal distributions are by definition non-skewed, so you can't have both things. If the distribution is left-skewed, then it cannot be Gaussian. You'll have to pick a different one! The closest thing to your request I can think of is this:

https://en.wikipedia.org/wiki/Skew_normal_distribution
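For a feel of what the skew normal looks like, the sketch below draws from it using a standard stochastic representation: with $\delta = \alpha / \sqrt{1 + \alpha^2}$ and independent standard normals $Z_0, Z_1$, the variable $\delta |Z_0| + \sqrt{1 - \delta^2}\, Z_1$ is skew normal with shape $\alpha$. The shape values and function names are chosen only for illustration.

```python
import math
import random
from statistics import fmean


def skew_normal_draw(rng, shape):
    """One draw from a standard skew normal with the given shape parameter."""
    delta = shape / math.sqrt(1.0 + shape * shape)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])
    m3 = fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5


rng = random.Random(4)
right_skewed = [skew_normal_draw(rng, 5.0) for _ in range(20000)]
symmetric = [skew_normal_draw(rng, 0.0) for _ in range(20000)]
left_skewed = [skew_normal_draw(rng, -5.0) for _ in range(20000)]

print("shape  5:", sample_skewness(right_skewed))
print("shape  0:", sample_skewness(symmetric))
print("shape -5:", sample_skewness(left_skewed))
```

A shape of zero recovers the ordinary normal distribution, a positive shape gives right skew and a negative shape gives left skew.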


































                                                                                  answered Apr 2 at 7:59









David







• 3 I agree except that the OP is confusing left and right skewness, as already pointed out. And @behold has already suggested the skew-normal in an answer. So, I can't see that this adds to existing answers. – Nick Cox, Apr 2 at 9:48

• It summarizes many of them in a straightforward three-line response. – David, Apr 2 at 11:46

• 3 Sorry, but that's still repetition. – Nick Cox, Apr 2 at 12:52

• OK... who cares? – David, Apr 2 at 14:06

• 2 Well, I do; and whoever added +1 to my comments (clearly not me) and whoever downvoted your answer (not me, as it happens). This thread is already long and repetitive; yet more redundant comments don't improve it for future readers. – Nick Cox, Apr 2 at 14:24



















                                                                                  protected by gung Apr 2 at 13:16



                                                                                  Thank you for your interest in this question.
                                                                                  Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).


