Negative binomial distribution

Different texts (and even different parts of this article) adopt slightly different definitions for the negative binomial distribution. They can be distinguished by whether the support starts at k = 0 or at k = r, whether p denotes the probability of a success or of a failure, and whether r counts successes or failures,[1] so identifying the specific parametrization used is crucial in any given text.
Probability mass function

The orange line represents the mean, which is equal to 10 in each of these plots; the green line shows the standard deviation.
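Notation: NB(r, p)
Parameters: r > 0, the number of successes until the experiment is stopped (integer, but the definition can also be extended to reals)
p ∈ [0,1], the success probability in each experiment (real)
Support: k ∈ { 0, 1, 2, 3, … }, the number of failures
PMF: k ↦ C(k + r − 1, k) · (1 − p)^k · p^r, involving a binomial coefficient C(n, k)
CDF: k ↦ I_p(r, k + 1), the regularized incomplete beta function
Mean: r(1 − p)/p
Mode: ⌊(r − 1)(1 − p)/p⌋ if r > 1; 0 if r ≤ 1
Variance: r(1 − p)/p²
Skewness: (2 − p)/√((1 − p) r)
Excess kurtosis: 6/r + p²/((1 − p) r)
MGF: (p/(1 − (1 − p)e^t))^r for t < −ln(1 − p)
CF: (p/(1 − (1 − p)e^(it)))^r with t ∈ ℝ
PGF: (p/(1 − (1 − p)z))^r for |z| < 1/(1 − p)
Fisher information: r/(p²(1 − p)), with respect to p for known r
Method of moments: r = E[X]²/(V[X] − E[X]), p = E[X]/V[X]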

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes (denoted r) occurs.[2] For example, we can define rolling a 6 on a die as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success (r = 3). In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.
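The die example can be checked numerically. The following is a minimal sketch, assuming a fair die (p = 1/6), r = 3, and the failures-before-the-r-th-success parametrization described above; the function name nb_pmf and the truncation point of 2000 terms are illustrative choices, not part of the source.

```python
from math import comb

def nb_pmf(k: int, r: int, p: float) -> float:
    """P(exactly k failures occur before the r-th success), integer r."""
    return comb(k + r - 1, k) * (1 - p) ** k * p ** r

# Assumed setup: success = rolling a six on a fair die, stop at the third six.
r, p = 3, 1 / 6

# Truncate the infinite support; the tail beyond 2000 failures is negligible here.
probs = [nb_pmf(k, r, p) for k in range(2000)]
total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))

print(f"P(no failures before the third six) = {probs[0]:.6f}")
print(f"total probability mass (truncated)  = {total:.6f}")
print(f"mean number of failures             = {mean:.4f}")
```

The truncated mean should agree with the closed form r(1 − p)/p, which equals 15 for these values.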

An alternative formulation is to model the total number of trials (instead of the number of failures). For a specified (non-random) number of successes (r), the number of failures (n − r) is random because the total number of trials (n) is random. For example, we could use the negative binomial distribution to model the number of days n (random) a certain machine works (specified by r) before it breaks down.
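The two formulations differ only by a shift of r, since n = k + r. The sketch below illustrates this, assuming integer r and the standard form C(n − 1, r − 1) p^r (1 − p)^(n − r) for the probability that the r-th success falls on trial n; the function names and the values r = 3, p = 0.25 are hypothetical.

```python
from math import comb

def failures_pmf(k: int, r: int, p: float) -> float:
    """P(k failures before the r-th success), support k = 0, 1, 2, ..."""
    return comb(k + r - 1, k) * (1 - p) ** k * p ** r

def trials_pmf(n: int, r: int, p: float) -> float:
    """P(the r-th success occurs on trial n), support n = r, r + 1, ..."""
    if n < r:
        return 0.0
    # Trial n must be a success; the other r - 1 successes fall among the first n - 1 trials.
    return comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)

r, p = 3, 0.25  # illustrative values

# The trial-count PMF is the failure-count PMF shifted by r.
max_gap = max(abs(trials_pmf(k + r, r, p) - failures_pmf(k, r, p)) for k in range(200))

# Mean total trials = mean failures + r = r(1 - p)/p + r = r/p.
mean_trials = sum(n * trials_pmf(n, r, p) for n in range(r, 2000))

print(f"largest difference between the two PMFs = {max_gap:.2e}")
print(f"mean number of trials = {mean_trials:.4f} (closed form r/p = {r / p:.4f})")
```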

The Pascal distribution (after Blaise Pascal) and Polya distribution (for George Pólya) are special cases of the negative binomial distribution. A convention among engineers, climatologists, and others is to use "negative binomial" or "Pascal" for the case of an integer-valued stopping-time parameter (r) and use "Polya" for the real-valued case.

For occurrences of associated discrete events, like tornado outbreaks, the Polya distributions can be used to give more accurate models than the Poisson distribution by allowing the mean and variance to be different, unlike the Poisson. The negative binomial distribution has a variance μ/p, with the distribution becoming identical to Poisson in the limit p → 1 for a given mean μ (i.e. when the failures are increasingly rare). This can make the distribution a useful overdispersed alternative to the Poisson distribution, for example for a robust modification of Poisson regression. In epidemiology, it has been used to model disease transmission for infectious diseases where the likely number of onward infections may vary considerably from individual to individual and from setting to setting.[3] More generally, it may be appropriate where events have positively correlated occurrences causing a larger variance than if the occurrences were independent, due to a positive covariance term.
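The Poisson limit can be illustrated numerically. Under the parametrization used here, the mean is μ = r(1 − p)/p and the variance is r(1 − p)/p² = μ/p, so holding μ fixed while p approaches 1 forces r = μp/(1 − p) to grow. The sketch below assumes μ = 10 and a real-valued r handled through log-gamma functions; the function names and the chosen values of p are illustrative.

```python
from math import exp, lgamma, log

def nb_pmf_real(k: int, r: float, p: float) -> float:
    """Negative binomial PMF for real r > 0, computed in log space."""
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + k * log(1 - p) + r * log(p))

def poisson_pmf(k: int, mu: float) -> float:
    """Poisson PMF with mean mu, computed in log space."""
    return exp(k * log(mu) - mu - lgamma(k + 1))

mu = 10.0  # fixed mean (assumed for illustration)
gaps = []
for p in (0.5, 0.9, 0.99, 0.999):
    r = mu * p / (1 - p)   # chosen so that r(1 - p)/p equals mu
    variance = mu / p      # exceeds mu for p < 1 (overdispersion)
    gap = max(abs(nb_pmf_real(k, r, p) - poisson_pmf(k, mu)) for k in range(100))
    gaps.append(gap)
    print(f"p = {p:<6} r = {r:10.2f}  variance = {variance:7.3f}  max PMF gap = {gap:.2e}")
```

The largest pointwise gap between the two PMFs should shrink as p approaches 1, while the variance μ/p approaches the Poisson value μ.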

The term "negative binomial" is likely due to the fact that a certain binomial coefficient that appears in the formula for the probability mass function of the distribution can be written more simply with negative numbers.[4]
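One common reading of this remark is the identity C(k + r − 1, k) = (−1)^k · C(−r, k), where C(−r, k) is a binomial coefficient extended to a negative upper argument via a(a − 1)⋯(a − k + 1)/k!. The sketch below checks that identity for small integers; the helper generalized_binom is a hypothetical name introduced here, and the identity is offered as an illustration rather than as the definitive origin of the term.

```python
from math import comb, factorial

def generalized_binom(a: int, k: int) -> int:
    """a(a-1)...(a-k+1) / k! for any integer a, including negative a."""
    numerator = 1
    for i in range(k):
        numerator *= a - i
    # Exact division: k! divides any product of k consecutive integers.
    return numerator // factorial(k)

# Check C(k + r - 1, k) == (-1)^k * C(-r, k) on a small grid.
for r in range(1, 6):
    for k in range(8):
        assert comb(k + r - 1, k) == (-1) ** k * generalized_binom(-r, k)

print("identity holds for r = 1..5 and k = 0..7")
```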

  1. DeGroot, Morris H. (1986). Probability and Statistics (Second ed.). Addison-Wesley. pp. 258–259. ISBN 0-201-11366-X. LCCN 84006269. OCLC 10605205.
  2. Weisstein, Eric. "Negative Binomial Distribution". Wolfram MathWorld. Wolfram Research. Retrieved 11 October 2020.
  3. e.g. Lloyd-Smith, J. O.; Schreiber, S. J.; Kopp, P. E.; Getz, W. M. (2005). "Superspreading and the effect of individual variation on disease emergence". Nature. 438 (7066): 355–359. doi:10.1038/nature04153. PMC 7094981.
    The overdispersion parameter is usually denoted by the letter k in epidemiology, rather than r as here.
  4. Casella, George; Berger, Roger L. (2002). Statistical inference (2nd ed.). Thomson Learning. p. 95. ISBN 0-534-24312-6.
