# 39. Statistics – Distribution

As the previous calibration chapter described, a statistical distribution gives the probability that certain events will occur. There are many statistical distributions, and the right one depends on the nature of the events and the conditions. This section breaks down the types of distributions and when to use each one.

Statistical distributions fall into two major groups, depending on whether the random variable is discrete or continuous. The key quantities that characterize any distribution are its mean (average) and its variance. The following table illustrates the differences between discrete and continuous distributions.

*Distribution Comparison*

The following discrete distributions apply when the random variable can take only distinct, countable values, such as a number of successes.

The Bernoulli distribution describes a single trial that either succeeds or fails. The random variable takes the value 1 if the event succeeds (with probability p) and 0 if it fails (with probability 1 - p).
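As a minimal illustration, the Python sketch below encodes the Bernoulli probability mass function together with its mean p and variance p(1 - p). The function names `bernoulli_pmf` and `bernoulli_mean_var` are hypothetical and chosen only for this example.

```python
def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for one Bernoulli trial: k = 1 is success, k = 0 is failure."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k == 1:
        return p
    if k == 0:
        return 1.0 - p
    return 0.0  # any other outcome is impossible


def bernoulli_mean_var(p: float) -> tuple[float, float]:
    """Mean and variance of a Bernoulli(p) variable."""
    return p, p * (1.0 - p)


# A trial that succeeds 30% of the time
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))
print(bernoulli_mean_var(0.3))
```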

The binomial distribution applies when each of n trials is an independent Bernoulli trial with success probability p, and X is the number of successful trials out of the n total trials. In Excel, the binomial distribution is BINOMDIST(x, n, p, FALSE), where FALSE returns the single probability P(X = x) and TRUE returns the cumulative probability P(X ≤ x).
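The sketch below reproduces the binomial calculation in Python so the FALSE/TRUE switch is easy to see. The function `binomdist` is a hypothetical stand-in that only mirrors the argument order of Excel's BINOMDIST; it is not Excel's implementation.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomdist(x: int, n: int, p: float, cumulative: bool) -> float:
    """Mirrors the argument order of Excel's BINOMDIST(x, n, p, cumulative)."""
    if cumulative:
        # P(X <= x): add up the single probabilities from 0 to x
        return sum(binom_pmf(k, n, p) for k in range(x + 1))
    return binom_pmf(x, n, p)


# Exactly 5 heads in 10 fair coin flips, then at most 5 heads
print(binomdist(5, 10, 0.5, False))  # 0.24609375
print(binomdist(5, 10, 0.5, True))   # 0.623046875
```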

If each trial succeeds with probability p, the number of failures X before the rth success follows a negative binomial distribution. The Excel function for the negative binomial distribution is NEGBINOMDIST(x, r, p).
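A minimal Python sketch of the same probability follows, assuming the standard form C(x + r - 1, r - 1) p^r (1 - p)^x. The name `negbinomdist` is hypothetical and only mirrors the argument order of Excel's NEGBINOMDIST.

```python
from math import comb


def negbinomdist(x: int, r: int, p: float) -> float:
    """P(exactly x failures before the r-th success), success probability p.

    Mirrors the argument order of Excel's NEGBINOMDIST(x, r, p).
    """
    # The last trial must be the r-th success; the earlier x + r - 1 trials
    # contain r - 1 successes arranged in any order.
    return comb(x + r - 1, r - 1) * p**r * (1 - p) ** x


# Probability of exactly 2 failures before the 3rd success with p = 0.5
print(negbinomdist(2, 3, 0.5))  # 0.1875
```

With r = 1 the formula reduces to the probability of x failures before the first success.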

The hypergeometric distribution applies to a finite population of size N that contains M successes. If a sample of size n is drawn from the population without replacement, the number of successes X in the sample follows a hypergeometric distribution.

To illustrate, let's calculate the probability of drawing exactly one diamond when randomly selecting 6 cards from a standard 52-card deck.

The Excel function is HYPGEOMDIST(X, n, M, N); for this example it is HYPGEOMDIST(1, 6, 13, 52), which gives roughly 37%.
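The card example can be checked with a short Python sketch that counts combinations directly. The function `hypgeomdist` is a hypothetical name that only mirrors the argument order of Excel's HYPGEOMDIST.

```python
from math import comb


def hypgeomdist(x: int, n: int, M: int, N: int) -> float:
    """P(X = x) successes in a sample of n drawn without replacement
    from a population of N items that contains M successes.

    Mirrors the argument order of Excel's HYPGEOMDIST(x, n, M, N).
    """
    # ways to pick x successes * ways to pick n - x failures / all samples
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Exactly one diamond (13 in the deck) among 6 cards drawn from 52
print(round(hypgeomdist(1, 6, 13, 52), 4))  # 0.3677
```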

As the population size N grows toward infinity (with the proportion of successes M/N held fixed), the hypergeometric distribution approaches the binomial distribution with p = M/N.
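To see this limit numerically, the sketch below keeps the success fraction at 1/4 (as with diamonds in a deck) and grows the population, comparing the hypergeometric probability of one success in 6 draws with the binomial one. The population sizes are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from math import comb


def hypergeom_pmf(x, n, M, N):
    """Sampling without replacement from N items containing M successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """Sampling with replacement (independent trials) with success rate p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Keep the success fraction M/N = 1/4 and grow the population N.
n, x, p = 6, 1, 0.25
for N in (52, 520, 5200, 52000):
    M = N // 4
    gap = abs(hypergeom_pmf(x, n, M, N) - binom_pmf(x, n, p))
    print(f"N={N:>6}: hypergeometric - binomial gap = {gap:.6f}")
```

The gap shrinks roughly tenfold each time N grows tenfold, because removing one item barely changes the remaining success fraction in a large population.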

If the average number of successes (λ) within a time frame is known and is proportional to the length of that time frame, and two successes cannot occur at exactly the same instant, then the number of successes X follows a Poisson distribution.

For example, suppose an average of 2 vehicles pass a toll booth per minute. Over a 5-minute window, what is the probability that exactly 6 vehicles pass the toll booth? What is the probability that 6 or more vehicles pass?

Use the Excel function POISSON(X, λ, FALSE), where FALSE returns the single probability. Here X = 6 vehicles and λ = 10 (2 vehicles per minute × 5 minutes), so POISSON(6, 10, FALSE) gives a probability of about 6% for exactly 6 cars.

With TRUE as the last argument and X = 5, POISSON(5, 10, TRUE) returns the cumulative probability of 0 to 5 cars. Subtracting that from 1 gives the probability of 6 or more cars, about 93.3%.
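Both toll-booth answers can be reproduced with a few lines of Python. The function `poisson` below is a hypothetical helper that mirrors the argument order of Excel's POISSON(x, mean, cumulative); it is a sketch, not Excel's implementation.

```python
from math import exp, factorial


def poisson(x: int, lam: float, cumulative: bool) -> float:
    """Mirrors the argument order of Excel's POISSON(x, mean, cumulative)."""
    def pmf(k: int) -> float:
        return exp(-lam) * lam**k / factorial(k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))  # P(X <= x)
    return pmf(x)  # P(X = x)


lam = 2 * 5  # 2 vehicles per minute over a 5-minute window
print(round(poisson(6, lam, False), 3))     # exactly 6 vehicles -> 0.063
print(round(1 - poisson(5, lam, True), 3))  # 6 or more vehicles -> 0.933
```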

If every value of a discrete random variable has the same probability, the variable follows a discrete uniform distribution. Rolling a single fair die is a classic example: each face from 1 to 6 has probability 1/6.
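As a small worked check, the sketch below computes the mean and variance of a fair die with exact fractions, using the definitions E[X] = Σ x·p(x) and Var(X) = E[X²] - (E[X])².

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # every face of a fair die is equally likely

mean = sum(k * p for k in faces)
variance = sum(k * k * p for k in faces) - mean**2

print(mean, variance)  # 7/2 35/12
```

The result matches the general discrete-uniform formulas for values 1 to n: mean (n + 1)/2 and variance (n² - 1)/12.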

*Discrete Distribution Equation Setup*

Continuous distributions include the following types, each with its associated probability density, mean, and variance.

The continuous uniform distribution applies when a continuous variable lies between a and b and every point in that interval is equally likely.

For example, suppose the interview time is uniformly distributed between 5 and 15 minutes. What is the probability that an interview is completed in 9 to 11 minutes?

The calculation is (11-9) * (1 / (15-5)) = 0.2, or exactly 20%.
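A minimal sketch of the same calculation: the probability of landing in an interval is the difference of the uniform CDF at its two ends. The helper names are hypothetical.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for X uniform on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def uniform_between(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi): difference of the CDF at the two ends."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)


# Interview length uniform between 5 and 15 minutes; finish in 9 to 11 minutes
print(round(uniform_between(9, 11, 5, 15), 4))  # 0.2
```

For reference, the mean of this distribution is (a + b)/2 = 10 minutes and the variance is (b - a)²/12.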

The normal distribution is the most common distribution in statistics. As n grows large, the binomial distribution approaches a normal distribution. The approximation works best when p is close to 0.5, and a common rule of thumb is that both np and n(1-p) should be greater than 5.

When the Poisson mean is large (λ > 10), the Poisson distribution can likewise be approximated by a normal distribution with mean λ and variance λ.
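The two approximations above can be checked numerically. The sketch compares exact binomial and Poisson cumulative probabilities with a normal CDF built from `math.erf`. The specific parameters (n = 100, p = 0.5, λ = 20) are illustrative assumptions, and the +0.5 continuity correction is a common refinement that this section does not itself mention.

```python
from math import comb, erf, exp, factorial, sqrt


def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def binom_cdf(x, n, p):
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def poisson_cdf(x, lam):
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


# Binomial with n = 100, p = 0.5: np = n(1 - p) = 50, well above 5
n, p = 100, 0.5
exact_b = binom_cdf(55, n, p)
approx_b = normal_cdf(55.5, n * p, sqrt(n * p * (1 - p)))  # +0.5 continuity correction

# Poisson with lam = 20 (> 10): mean and variance are both lam
lam = 20
exact_p = poisson_cdf(25, lam)
approx_p = normal_cdf(25.5, lam, sqrt(lam))

print(f"binomial P(X<=55): exact {exact_b:.4f}, normal {approx_b:.4f}")
print(f"Poisson  P(X<=25): exact {exact_p:.4f}, normal {approx_p:.4f}")
```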

The standard normal distribution is the normal distribution with mean 0 and variance 1. It is also called the Z distribution. Any normal variable can be converted to it by subtracting the mean (μ) and dividing by the standard deviation (σ), which gives the Z score Z = (X - μ) / σ.
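As a short illustration, the sketch below standardizes a value and looks up its cumulative probability under the standard normal distribution using `math.erf`. The exam-score numbers (mean 70, standard deviation 10) are hypothetical.

```python
from math import erf, sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal distribution (mean 0, variance 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Hypothetical exam scores: mean 70, standard deviation 10
z = z_score(85, 70, 10)
print(z)                            # 1.5
print(round(std_normal_cdf(z), 4))  # 0.9332
```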

The gamma distribution is generally used to model waiting times. If successes occur as a Poisson process with rate λ, the average waiting time until the first success is β = 1/λ.

If you instead wait until the nth success, set α = n. The waiting time then follows a gamma distribution with shape α and scale β.

In Excel, use GAMMADIST(x, α, β, cumulative). With cumulative set to TRUE it returns the cumulative probability P(X ≤ x); with FALSE it returns the probability density of the gamma distribution at x.
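The sketch below is a Python counterpart of GAMMADIST, assuming β is the scale (mean time per event), as in the text. To stay within the standard library, its cumulative branch handles only whole-number α, using the fact that the nth event arrives after time x exactly when fewer than n Poisson events occur in [0, x]. The function name and the vehicle example are hypothetical.

```python
from math import exp, factorial, gamma


def gammadist(x: float, alpha: float, beta: float, cumulative: bool) -> float:
    """Mirrors the argument order of Excel's GAMMADIST(x, alpha, beta, cumulative).

    beta is the scale (mean waiting time per event, 1/lambda).
    The cumulative branch of this sketch only handles whole-number alpha.
    """
    if x < 0:
        return 0.0
    if not cumulative:
        # density f(x) = x^(alpha-1) e^(-x/beta) / (Gamma(alpha) beta^alpha)
        return x ** (alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)
    if alpha != int(alpha):
        raise ValueError("this sketch's CDF needs a whole-number alpha")
    # Waiting longer than x for the n-th event means fewer than n Poisson
    # events occurred in [0, x]; that Poisson count has mean x / beta.
    m = x / beta
    return 1 - sum(exp(-m) * m**k / factorial(k) for k in range(int(alpha)))


# 2 vehicles per minute (beta = 0.5 min): chance the 3rd arrives within 2 minutes
print(round(gammadist(2, 3, 0.5, True), 4))  # 0.7619
```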

The exponential distribution is the special case of the gamma distribution with α = 1 and β = 1/λ. It models the waiting time until the first success, whereas the gamma distribution models the waiting time until the nth success.

In Excel, use EXPONDIST(x, λ, cumulative). With cumulative set to TRUE it returns the cumulative probability P(X ≤ x); with FALSE it returns the probability density of the exponential distribution at x.
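A minimal Python sketch of the exponential case follows. The name `expondist` is hypothetical and only mirrors the argument order of Excel's EXPONDIST; the vehicle rate is an illustrative assumption.

```python
from math import exp


def expondist(x: float, lam: float, cumulative: bool) -> float:
    """Mirrors the argument order of Excel's EXPONDIST(x, lambda, cumulative)."""
    if x < 0:
        return 0.0
    if cumulative:
        return 1 - exp(-lam * x)  # P(first event happens by time x)
    return lam * exp(-lam * x)    # probability density at x


# 2 vehicles per minute: chance the first vehicle arrives within 1 minute
print(round(expondist(1, 2, True), 4))  # 0.8647
```

The density λe^(-λx) is exactly the gamma density with α = 1 and β = 1/λ, and the mean waiting time is 1/λ.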

The chi-square distribution is another common special case of the gamma distribution, with α = ν/2 and β = 2, where ν is the degrees of freedom. It is used to measure how well observed data match the expected behavior (goodness of fit).

In Excel, CHISQ.DIST(x, ν, cumulative) returns the chi-square probability: the left-tailed cumulative probability when cumulative is TRUE, or the density when FALSE. CHISQ.INV(p, ν) returns the inverse, the value x whose left-tailed probability is p.
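The sketch below mirrors these two functions using only the Python standard library. It treats chi-square as Gamma(α = ν/2, β = 2), evaluates the cumulative probability with the power series of the regularized lower incomplete gamma function, and inverts it by bisection. These numerical choices are assumptions for illustration, not how Excel computes its results, and all function names are hypothetical.

```python
from math import exp, lgamma, log


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * exp(a * log(x) - x - lgamma(a))


def chisq_dist(x: float, nu: int, cumulative: bool) -> float:
    """Chi-square as Gamma(alpha = nu/2, beta = 2); mirrors CHISQ.DIST(x, nu, cumulative)."""
    a = nu / 2
    if cumulative:
        return reg_lower_gamma(a, x / 2)
    if x <= 0:
        return 0.0  # this sketch only evaluates the density for x > 0
    return exp((a - 1) * log(x) - x / 2 - a * log(2) - lgamma(a))


def chisq_inv(p: float, nu: int) -> float:
    """Value x whose left-tailed probability is p, found by bisection."""
    lo, hi = 0.0, max(1.0, float(nu))
    while chisq_dist(hi, nu, True) < p:
        hi *= 2  # widen the bracket until it contains the answer
    for _ in range(200):
        mid = (lo + hi) / 2
        if chisq_dist(mid, nu, True) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Critical value at the 95% level for 1 degree of freedom
print(round(chisq_inv(0.95, 1), 3))  # 3.841
```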

*Continuous Distribution Equation Setup*

Lastly, the expected value of a discrete random variable is described below.