









The Poisson approximation to the binomial distribution
In this Section we introduce a probability model which can be used when the outcome of an experiment is a random variable taking non-negative integer values and where the only information available is a measurement of its average value. This has widespread applications, for example in analysing traffic flow, in fault prediction on electric cables and in the prediction of randomly occurring accidents. We shall look at the Poisson distribution in two distinct ways. Firstly, as a distribution in its own right, which will enable us to apply statistical methods to a set of problems that cannot be solved using the binomial distribution. Secondly, as an approximation to the binomial distribution X ∼ B(n, p) in the case where n is large and p is small. You will find that this approximation can often save much tedious arithmetic.
Before starting this Section you should...
On completion you should be able to...
HELM (2008): Section 37.3: The Poisson Distribution
The probability of the outcome X = r of a set of Bernoulli trials can always be calculated by using the formula
$P(X = r) = {}^{n}C_{r}\, q^{n-r} p^{r}$
given above. Clearly, for very large values of n the calculation can be rather tedious, particularly when very small values of p are also present. In the situation when n is large and p is small and the product np is constant, we can take a different approach to the problem of calculating the probability that X = r. In the table below the values of P(X = r) have been calculated for various combinations of n and p under the constraint that np = 1. You should try some of the calculations for yourself, using the formula given above, for some of the smaller values of n.
Probability of X successes n p X = 0 X = 1 X = 2 X = 3 X = 4 X = 5 X = 6
Each of the binomial distributions given has a mean given by np = 1. Notice that the probabilities that X = 0, 1, 2, 3, 4, ... approach the values 0.368, 0.368, 0.184, ... as n increases.
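The convergence described here can be checked numerically. The following is a minimal sketch in Python; the particular values of n are an assumption chosen for illustration (they are not necessarily the rows of the original table), with p = 1/n so that np = 1 throughout. The helper names `binomial_pmf` and `poisson_pmf` are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * (1 - p) ** (n - r) * p ** r

def poisson_pmf(r, lam):
    """P(X = r) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** r / factorial(r)

# Assumed, illustrative values of n; p is chosen so that np = 1.
for n in (4, 10, 100, 1000, 10000):
    p = 1 / n
    row = [round(binomial_pmf(r, n, p), 3) for r in range(4)]
    print(n, p, row)

# Limiting values given by the Poisson formula with lambda = 1.
print("Poisson", [round(poisson_pmf(r, 1), 3) for r in range(4)])
```

As n grows, the printed binomial rows settle towards the Poisson row.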
If we have to determine the probabilities of success when large values of n and small values of p are involved it would be very convenient if we could do so without having to construct tables. In fact we can do such calculations by using the Poisson distribution which, under certain constraints, may be considered as an approximation to the binomial distribution.
By considering simplifications applied to the binomial distribution subject to the conditions
1. n is large
2. p is small
3. np = λ (a constant)
we can derive the formula
$P(X = r) = e^{-\lambda} \frac{\lambda^{r}}{r!}$
as an approximation to $P(X = r) = {}^{n}C_{r}\, q^{n-r} p^{r}$.
This is the Poisson distribution given previously. We now show how this is done. We know that the binomial distribution is given by
$(q + p)^{n} = q^{n} + n q^{n-1} p + \frac{n(n-1)}{2!} q^{n-2} p^{2} + \cdots + \frac{n(n-1)\cdots(n-r+1)}{r!} q^{n-r} p^{r} + \cdots + p^{n}$
Condition (2) tells us that since p is small, q = 1 − p is approximately equal to 1. Applying this to the terms of the binomial expansion above we see that the right-hand side becomes
$1 + np + \frac{n(n-1)}{2!} p^{2} + \cdots + \frac{n(n-1)\cdots(n-r+1)}{r!} p^{r} + \cdots + p^{n}$
Workbook 37: Discrete Probability Distributions
We introduced the binomial distribution by considering the following scenario. A worn machine is known to produce 10% defective components. If the random variable X is the number of defective components produced in a run of 3 components, find the probabilities that X takes the values 0 to 3. Suppose now that a similar machine which is known to produce 1% defective components is used for a production run of 40 components. We wish to calculate the probability that two defective items are produced. Essentially we are assuming that X ∼ B(40, 0.01) and are asking for P(X = 2). We use both the binomial distribution and its Poisson approximation for comparison.
Solution
Using the binomial distribution we have the solution
$P(X = 2) = {}^{40}C_{2}\, (0.99)^{40-2} (0.01)^{2} \approx 0.0532$
Note that the arithmetic involved is unwieldy. Using the Poisson approximation with λ = np = 40 × 0.01 = 0.4 we have the solution
$P(X = 2) = e^{-0.4} \frac{(0.4)^{2}}{2!} \approx 0.0536$
Note that the arithmetic involved is simpler and the approximation is reasonable.
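The comparison in this example can be reproduced with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative and λ = np is taken as the Poisson parameter.

```python
from math import comb, exp, factorial

n, p, r = 40, 0.01, 2
lam = n * p  # Poisson parameter, lambda = np = 0.4

# Exact binomial probability P(X = 2) for X ~ B(40, 0.01)
binomial = comb(n, r) * (1 - p) ** (n - r) * p ** r

# Poisson approximation with the same mean
poisson = exp(-lam) * lam ** r / factorial(r)

print(f"binomial: {binomial:.4f}")  # binomial: 0.0532
print(f"Poisson:  {poisson:.4f}")   # Poisson:  0.0536
```

The two values differ only in the fourth decimal place, even though n = 40 is well below the rule-of-thumb size usually quoted for the approximation.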
In practice, we can use the Poisson distribution to very closely approximate the binomial distribution provided that the product np is constant with
n ≥ 100 and p ≤ 0.05
Note that this is not a hard-and-fast rule and we simply say that
‘the larger n is the better and the smaller p is the better provided that np is a sensible size.’
The approximation remains good provided that np < 5 for values of n as low as 20.
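One way to see how the quality of the approximation depends on n and p is to compute the largest gap between the binomial and Poisson probabilities over all r. The sketch below is illustrative only: the function name `max_abs_error` and the (n, p) pairs are assumptions, and "largest absolute difference" is just one possible measure of closeness.

```python
from math import exp

def max_abs_error(n, p):
    """Largest |binomial - Poisson| over r = 0..n, using lambda = np."""
    lam = n * p
    binomial = (1 - p) ** n  # P(X = 0) for B(n, p)
    poisson = exp(-lam)      # P(X = 0) for the Poisson model
    worst = abs(binomial - poisson)
    for r in range(n):
        # Step both distributions from r to r + 1 via their recurrences.
        binomial *= (n - r) / (r + 1) * p / (1 - p)
        poisson *= lam / (r + 1)
        worst = max(worst, abs(binomial - poisson))
    return worst

# Assumed example pairs, all with np = 1.
for n, p in ((20, 0.05), (100, 0.01), (1000, 0.001)):
    print(n, p, max_abs_error(n, p))
```

With np held fixed, the printed error shrinks as n increases and p decreases, in line with the guidance above.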
Task Mass-produced needles are packed in boxes of 1000. It is believed that 1 needle in 2000 on average is substandard. What is the probability that a box contains 2 or more defectives? The correct model is the binomial distribution with n = 1000, p = 1/2000 (and q = 1999/2000).
(a) Using the binomial distribution calculate P(X = 0), P(X = 1) and hence P(X ≥ 2):
Your solution
Answer
∴ P(X = 0) + P(X = 1) = 0.60645 + 0.30338 = 0.90983 ≈ 0.9098 (4 d.p.)
Hence P(2 or more defectives) ≈ 1 − 0.9098 = 0.0902.
(b) Now choose a suitable value for λ in order to use a Poisson model to approximate the probabilities:
Your solution λ =
Answer λ = np = 1000 × 1/2000 = 1/2
Now recalculate the probability that there are 2 or more defectives using the Poisson distribution with λ = 1/2:
Your solution P(X = 0) =
∴ P(2 or more defectives)=
Answer
$P(X = 0) = e^{-1/2}, \quad P(X = 1) = \tfrac{1}{2} e^{-1/2}$
$\therefore P(X = 0) + P(X = 1) = \tfrac{3}{2} e^{-1/2} = 0.9098$ (4 d.p.)
Hence P(2 or more defectives) ≈ 1 − 0.9098 = 0.0902.
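Both parts of this Task can be checked with a short Python sketch. It assumes the model stated in the Task (n = 1000, p = 1/2000) and takes λ = np for the Poisson version; the variable names are illustrative.

```python
from math import exp

n, p = 1000, 1 / 2000
lam = n * p  # lambda = np = 0.5

# (a) Binomial model: P(X >= 2) = 1 - P(X = 0) - P(X = 1)
b0 = (1 - p) ** n
b1 = n * p * (1 - p) ** (n - 1)
binomial_tail = 1 - b0 - b1

# (b) Poisson model with the same mean
q0 = exp(-lam)
q1 = lam * exp(-lam)
poisson_tail = 1 - q0 - q1

print(f"binomial: {binomial_tail:.4f}")  # binomial: 0.0902
print(f"Poisson:  {poisson_tail:.4f}")   # Poisson:  0.0902
```

To four decimal places the two models agree, which is what the rule of thumb for large n and small p would lead us to expect.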
The Poisson distribution is a probability model which can be used to find the probability of a single event occurring a given number of times in an interval of (usually) time. The occurrence of these events must be determined by chance alone which implies that information about the occurrence of any one event cannot be used to predict the occurrence of any other event. It is worth noting that only the occurrence of an event can be counted; the non-occurrence of an event cannot be counted. This contrasts with Bernoulli trials where we know the number of trials, the number of events occurring and therefore the number of events not occurring.
The Poisson distribution has widespread applications in areas such as analysing traffic flow, fault prediction in electric cables, defects occurring in manufactured objects such as castings, email messages arriving at a computer and in the prediction of randomly occurring events or accidents. One well known series of accidental events concerns Prussian cavalry who were killed by horse kicks. Although not discussed here (death by horse kick is hardly an engineering application of statistics!) you will find accounts in many statistical texts. One example of the use of a Poisson distribution where the events are not necessarily time related is in the prediction of fault occurrence along a long weld: faults may occur anywhere along the length of the weld. A similar argument applies when scanning castings for faults, where we are looking for faults occurring in a volume of material, not over an interval of time.
The following definition gives a theoretical underpinning to the Poisson distribution.
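Suppose that events occur at random throughout an interval. Suppose further that the interval can be divided into subintervals which are so small that:
1. the probability of more than one event occurring in a subinterval is zero
2. the probability of one event occurring in a subinterval is proportional to the length of the subinterval
3. an event occurring in any given subinterval is independent of events in any other subinterval
then the random experiment is known as a Poisson process.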
The word ‘process’ is used to suggest that the experiment takes place over time, which is the usual case. If the average number of events occurring in the interval (not subinterval) is λ (> 0) then the random variable X representing the actual number of events occurring in the interval is said to have a Poisson distribution and it can be shown (we omit the derivation) that
$P(X = r) = e^{-\lambda} \frac{\lambda^{r}}{r!}, \quad r = 0, 1, 2, 3, \ldots$
The following Key Point provides a summary.
The Poisson Probabilities
If X is the random variable ‘number of occurrences in a given interval’
for which the average rate of occurrence is λ then, according to the Poisson model, the probability of r occurrences in that interval is given by
$P(X = r) = e^{-\lambda} \frac{\lambda^{r}}{r!}, \quad r = 0, 1, 2, 3, \ldots$
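The Key Point formula translates directly into code. The following Python sketch is one possible implementation; the function name `poisson_pmf` and the value λ = 2.5 are hypothetical choices made for illustration.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) = e^(-lam) * lam^r / r! for r = 0, 1, 2, ..."""
    if r < 0:
        raise ValueError("r must be a non-negative integer")
    return exp(-lam) * lam ** r / factorial(r)

lam = 2.5  # assumed average rate, for illustration only
probs = [poisson_pmf(r, lam) for r in range(50)]

for r in range(5):
    print(r, round(probs[r], 4))

# The probabilities over r = 0, 1, 2, ... should total 1;
# truncating at r = 49 loses a negligible amount for this lambda.
print(sum(probs))
```

Note that r = 0 is a permitted value, so the function relies on 0! = 1 and λ^0 = 1.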
Task Using the Poisson distribution $P(X = r) = e^{-\lambda} \frac{\lambda^{r}}{r!}$ write down the formulae for P(X = 0), P(X = 1), P(X = 2) and P(X = 6), noting that 0! = 1.
Your solution P(X = 0) =
Answer
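$P(X = 0) = e^{-\lambda} \times \frac{\lambda^{0}}{0!} = e^{-\lambda} \times \frac{1}{1} \equiv e^{-\lambda}$
$P(X = 1) = e^{-\lambda} \times \frac{\lambda}{1!} = \lambda e^{-\lambda}$
$P(X = 2) = e^{-\lambda} \times \frac{\lambda^{2}}{2!} = \frac{\lambda^{2}}{2} e^{-\lambda}$
$P(X = 6) = e^{-\lambda} \times \frac{\lambda^{6}}{6!} = \frac{\lambda^{6}}{720} e^{-\lambda}$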
Calculate the value for P(X = 6) to extend the Table in the previous Task using the recurrence relation and the value for P(X = 5).
Solution The recurrence relation gives the formula
$P(X = 6) = \frac{\lambda}{6} P(X = 5)$
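The recurrence relation itself is not reproduced in this extract. Dividing the Poisson formula for P(X = r) by the one for P(X = r − 1) gives P(X = r) = (λ/r) P(X = r − 1), and the Python sketch below assumes that form. The value λ = 2 is a hypothetical choice for illustration and is not taken from the Task referred to above.

```python
from math import exp, factorial

lam = 2.0  # assumed value, for illustration only

# Start from P(X = 0) = e^(-lam) and apply
# P(X = r) = (lam / r) * P(X = r - 1) repeatedly.
probs = [exp(-lam)]
for r in range(1, 7):
    probs.append(probs[-1] * lam / r)

# Compare with the direct formula e^(-lam) * lam^r / r!.
for r, value in enumerate(probs):
    direct = exp(-lam) * lam ** r / factorial(r)
    print(r, round(value, 4), round(direct, 4))
```

Each step needs only one multiplication and one division, which is why the recurrence is convenient when a table of successive probabilities is being built by hand.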
We now look further at the Poisson distribution by considering an example based on traffic flow.
Suppose it has been observed that, on average, 180 cars per hour pass a specified point on a particular road in the morning rush hour. Due to impending roadworks it is estimated that congestion will occur closer to the city centre if more than 5 cars pass the point in any one minute. What is the probability of congestion occurring?
Solution We note that we cannot use the binomial model since we have no values of n and p. Essentially we are saying that there is no fixed number (n) of cars passing the specified point and that we have no way of estimating p. The only information available is the average rate at which cars pass the specified point. Let X be the random variable X = number of cars arriving in any minute. We need to calculate the probability that more than 5 cars arrive in any one minute. Note that in order to do this we need to convert the information given on the average rate (cars arriving per hour) into a value for λ (cars arriving per minute). This gives the value λ = 180/60 = 3.
Using λ = 3 to calculate the required probabilities gives:
r          0        1         2        3        4         5        Sum
P(X = r)   0.04979  0.149361  0.22404  0.22404  0.168031  0.10082  0.91608
To calculate the required probability we note that P(more than 5 cars arrive in one minute) = 1 − P(5 cars or less arrive in one minute) Thus
P(X > 5) = 1 − P(X ≤ 5) = 1 − P(X = 0) − P(X = 1) − P(X = 2) − P(X = 3) − P(X = 4) − P(X = 5)
Then P(more than 5) = 1 − 0.91608 = 0.08392 = 0.0839 (4 d.p.).
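The traffic calculation can be reproduced as follows. This is a sketch in Python using only the figures given in the example (180 cars per hour, congestion if more than 5 cars pass in a minute); the variable names are illustrative.

```python
from math import exp, factorial

cars_per_hour = 180
lam = cars_per_hour / 60  # average cars per minute, lambda = 3

# P(X = r) for r = 0..5
probs = [exp(-lam) * lam ** r / factorial(r) for r in range(6)]
p_at_most_5 = sum(probs)

# Congestion occurs if more than 5 cars pass in one minute.
p_congestion = 1 - p_at_most_5

print([round(q, 5) for q in probs])
print(f"P(X <= 5) = {p_at_most_5:.5f}")  # P(X <= 5) = 0.91608
print(f"P(X > 5)  = {p_congestion:.4f}")  # P(X > 5)  = 0.0839
```

Working with the complement keeps the sum finite: the event X > 5 covers infinitely many values of r, whereas X ≤ 5 covers only six.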
The mean number of bacteria per millilitre of a liquid is known to be 6. Find the probability that in 1 ml of the liquid, there will be: (a) 0, (b) 1, (c) 2, (d) 3, (e) less than 4, (f) 6 bacteria.
Solution Here we have an average rate of occurrences but no estimate of the probability so it looks as though we have a Poisson distribution with λ = 6. Using the formula in Key Point 7 we have:
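(a) $P(X = 0) = e^{-6} = 0.00248$
That is, the probability of having no bacteria in 1 ml of liquid is 0.00248.
(b) $P(X = 1) = \frac{\lambda}{1} \times P(X = 0) = 6 \times e^{-6} = 0.01487$
That is, the probability of having 1 bacterium in 1 ml of liquid is 0.01487.
(c) $P(X = 2) = \frac{\lambda}{2} \times P(X = 1) = \frac{6}{2} \times 6e^{-6} = 0.04462$
That is, the probability of having 2 bacteria in 1 ml of liquid is 0.04462.
(d) $P(X = 3) = \frac{\lambda}{3} \times P(X = 2) = \frac{6}{3} \times 18e^{-6} = 0.08924$
That is, the probability of having 3 bacteria in 1 ml of liquid is 0.08924.
(e) P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.1512
(f) $P(X = 6) = e^{-6} \times \frac{6^{6}}{6!} = 0.1606$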
Note that in working out the first 6 answers, which link together, all the digits were kept in the calculator to ensure accuracy. Answers were rounded off only when written down.
Never copy down answers correct to, say, 4 decimal places and then use those rounded figures to calculate the next figure, as rounding-off errors will become greater at each stage. If you did so here you would get answers 0.0025, 0.0150, 0.0450, 0.0900 and P(X < 4) = 0.1525. The difference is not great but could be significant.
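The effect of premature rounding can be demonstrated directly. The Python sketch below assumes each probability is obtained from the previous one by multiplying by λ/r, once with full precision carried forward and once with rounding to 4 decimal places after every step.

```python
from math import exp

lam = 6  # mean number of bacteria per millilitre

# Full precision kept between steps.
exact = [exp(-lam)]
for r in range(1, 4):
    exact.append(exact[-1] * lam / r)

# Rounding to 4 d.p. after every step (the practice warned against).
rounded = [round(exp(-lam), 4)]
for r in range(1, 4):
    rounded.append(round(rounded[-1] * lam / r, 4))

print(rounded)                 # [0.0025, 0.015, 0.045, 0.09]
print(round(sum(exact), 4))    # 0.1512
print(round(sum(rounded), 4))  # 0.1525
```

The initial rounding of e^(−6) from 0.00248 to 0.0025 is carried through every later multiplication, which is how the error in P(X < 4) accumulates.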
The expectation and variance of the Poisson distribution can be derived directly from the definitions which apply to any discrete probability distribution. However, the algebra involved is a little lengthy. Instead we derive them from the binomial distribution from which the Poisson distribution is derived.
Intuitive Explanation
One way of deriving the mean and variance of the Poisson distribution is to consider the behaviour of the binomial distribution under the following conditions:
1. n is large
2. p is small
3. np = λ (a constant)
Recalling that the expectation and variance of the binomial distribution are given by the results
E(X) = np and V(X) = np(1 − p) = npq
it is reasonable to assert that condition (2) implies, since q = 1 − p, that q is approximately 1 and so the expectation and variance are given by
E(X) = np and V(X) = npq ≈ np
The algebraic derivation of the expectation and variance of the Poisson distribution shows that these results are in fact exact.
Note that the expectation and the variance are equal.
The Poisson Distribution
If X is the random variable {number of occurrences in a given interval} for which the average rate of occurrences is λ and X can assume the values 0, 1, 2, 3, ... and the probability of r occurrences in that interval is given by
$P(X = r) = e^{-\lambda} \frac{\lambda^{r}}{r!}$
then the expectation and variance of the distribution are given by the formulae
E(X) = λ and V(X) = λ
For a Poisson distribution the Expectation and Variance are equal.
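The equality of expectation and variance can be checked numerically from the general definitions E(X) = Σ r P(X = r) and V(X) = Σ r² P(X = r) − [E(X)]². The Python sketch below is a check rather than a derivation; the function name `poisson_moments`, the truncation at 200 terms and the test value λ = 3 are all assumptions made for illustration.

```python
from math import exp

def poisson_moments(lam, terms=200):
    """Approximate E(X) and V(X) by summing the first `terms` values of r."""
    p = exp(-lam)  # P(X = 0)
    mean = 0.0
    second_moment = 0.0
    for r in range(terms):
        mean += r * p
        second_moment += r * r * p
        p *= lam / (r + 1)  # step to P(X = r + 1)
    return mean, second_moment - mean ** 2

mean, variance = poisson_moments(3.0)
print(mean, variance)
```

For moderate λ the neglected tail beyond 200 terms is negligible, so both printed values should agree with λ to within floating-point error.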
What is the probability that a sheet 5 m × 8 m will have at most one fault?
(a) Find the probability that there are 4 defective transistors in a batch of 2000. (b) What is the largest number, N , of transistors that can be put in a box so that the probability of no defectives is at least 1/2?
(a) the expected total number of failures in the factory in a year? (b) the probability that there are fewer than two failures in the factory in a year?
Answers
P(4 or fewer defectives in sample of 100)
= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
$= e^{-2} + 2e^{-2} + \frac{2^{2}}{2!} e^{-2} + \frac{2^{3}}{3!} e^{-2} + \frac{2^{4}}{4!} e^{-2} = 0.947347$
Inspection costs
Cost c      75         75 × 50
P(X = c)    0.947347   0.052653
E(Cost) = 75(0.947347) + 75 × 50(0.052653) = 268.5 p
(a) E(X) = 10 × 0.4 = 4.
(b) $P(X < 2) = P(X = 0) + P(X = 1) = e^{-4} + 4e^{-4} = 5e^{-4} = 0.0916$.
(a) $P(X = 3) = \frac{e^{-5}\, 5^{3}}{3!}$
(b) Let R be the number of replacements made.
and
P(X ≥ 5) = 1 − [P(X = 0) + · · · + P(X = 4)]
so E(R) = 5 − 5 × P(X = 0) − · · · − 1 × P(X = 4)
= 5 − e−^5