In the 19th century, cavalries were still an important part of the European military complex. While horses have many wonderful qualities, they can be dangerous beasts, especially if poorly treated. The Prussian army kept track of the number of fatalities caused by horse kicks to members of 10 of their cavalry regiments over a 20-year time span. If these fatalities occurred independently and with equal probability for each regiment, then the number of deaths by horse kick per regiment per year should follow a Poisson distribution. On the other hand, if some regiments during some years consisted of particularly bad horsemen, then the events would not occur with equal probability, in which case we would expect a frequency distribution different from the Poisson distribution. The following table shows the data, expressed as the number of fatalities per regiment-year (Bortkiewicz 1898).
Number of Deaths (x) | Number of regiment-years |
0 | 109 |
1 | 65 |
2 | 22 |
3 | 3 |
4 | 1 |
>4 | 0 |
Total | 200 |
a. What is the mean number of deaths from horse kicks per
regiment-year?
b. Test whether a Poisson distribution fits these data.
(please break down each step of this as much as possible-- thank
you so much!)
Solution
Let X = Number of deaths from horse kicks per regiment-year and f(x) = Number of regiment-years with x number of deaths from horse kicks.
Back-up Theory
Mean of X = $\sum x\, f(x) / \sum f(x)$, summed over all possible values of X.   (1)
If a random variable X ~ Poisson(λ), i.e., X has a Poisson distribution with parameter λ, then the
probability mass function (pmf) of X is given by $P(X = x) = e^{-\lambda} \lambda^{x} / x!$   (2)
where x = 0, 1, 2, ...
Values of P(X = x) for various values of λ and x can be obtained by using the Excel function POISSON(x, Mean, Cumulative), with Cumulative set to FALSE to get the pmf.   (3)
Mean = λ   (4)
Variance = λ   (5)
Now to work out the solution,
Part (a)
The given data and the preparatory calculations are tabulated below:
x | f(x) | x.f(x) |
0 | 109 | 0 |
1 | 65 | 65 |
2 | 22 | 44 |
3 | 3 | 9 |
4 | 1 | 4 |
> 4 | 0 | 0 |
Total | 200 | 122 |
Mean | 122/200 = 0.61 |
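The Part (a) arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of the original solution; the names `observed` and `sample_mean` are illustrative, and the empty "> 4" row is left out because it contributes nothing to either sum.

```python
# Observed data from the table: deaths per regiment-year -> number of regiment-years.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def sample_mean(freq):
    """Weighted mean: sum of x * f(x) divided by the total frequency."""
    total = sum(freq.values())
    weighted = sum(x * f for x, f in freq.items())
    return weighted / total


total_deaths = sum(x * f for x, f in observed.items())
mean_deaths = sample_mean(observed)

print(f"total regiment-years = {sum(observed.values())}")  # 200
print(f"total deaths = {total_deaths}")                    # 122
print(f"mean deaths per regiment-year = {mean_deaths:.2f}")  # 0.61
```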
Part (b)
Fitting Poisson Distribution
Step 1: Estimate the mean from the given data – Already done under Part (a)
Step 2: Obtain the probabilities using (2) or (3)
Step 3: Multiply each probability under Step 2 by total frequency (n) to get expected frequencies.
Calculations using (3)
x | Expected probability | Expected frequency |
0 | 0.543351 | 108.670174 |
1 | 0.331444 | 66.288806 |
2 | 0.10109 | 20.2180858 |
3 | 0.020555 | 4.11101079 |
4 | 0.003135 | 0.62692915 |
> 4 | 0.000425 | 0.085 |
Total | 1 | 200.000006 |
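Steps 2 and 3 can also be reproduced outside Excel. The sketch below is one possible Python version, assuming λ = 0.61 from Part (a) and n = 200; the helper name `poisson_pmf` is illustrative, and the "> 4" probability is taken as the complement of the first five so that the probabilities sum to 1.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability P(X = x) = exp(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 0.61  # estimated mean from Part (a)
n = 200     # total number of regiment-years

# Probabilities for x = 0..4, then the tail P(X > 4) as the complement.
probs = [poisson_pmf(x, lam) for x in range(5)]
probs.append(1.0 - sum(probs))

# Expected frequency = probability * total frequency.
expected = [n * p for p in probs]

labels = ["0", "1", "2", "3", "4", "> 4"]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>3} | {p:.6f} | {e:.4f}")
```

Small differences from the table in the last decimal places come from rounding the tail probability to 0.000425 before multiplying by 200.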
Now to test,
Null Hypothesis: H0: Poisson distribution fits the given data vs. Alternative: HA: H0 is false.
Test statistic: $\chi^2 = \sum (O - E)^2 / E$, where O and E are respectively the observed and expected frequencies and the sum is over all the given cells.
= 0.7904 as shown below:
x | O | E | (O - E)^2/E |
0 | 109 | 108.670174 | 0.00100106 |
1 | 65 | 66.288806 | 0.02505734 |
2 | 22 | 20.2180858 | 0.1570484 |
3 | 3 | 4.11101079 | 0.3002534 |
4 | 1 | 0.62692915 | 0.22200573 |
> 4 | 0 | 0.085 | 0.085 |
Total | 200 | 200.000006 | 0.79036593 |
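The cell-by-cell contributions in the table can be recomputed directly. A minimal Python sketch, with the observed and expected frequencies copied from the table above (the variable names are illustrative):

```python
# Observed and expected frequencies for x = 0, 1, 2, 3, 4, > 4.
observed = [109, 65, 22, 3, 1, 0]
expected = [108.670174, 66.288806, 20.2180858, 4.11101079, 0.62692915, 0.085]

# Each cell contributes (O - E)^2 / E to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for o, e, c in zip(observed, expected, contributions):
    print(f"O = {o:>3}, E = {e:>11.6f}, (O - E)^2/E = {c:.6f}")
print(f"chi-square = {chi_square:.4f}")  # 0.7904
```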
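Under H0, $\chi^2$ approximately follows a chi-square distribution with 6 − 1 − 1 = 4 degrees of freedom: 6 cells, minus 1, minus 1 more because λ was estimated from the data.
So, p-value = $P(\chi^2_4 > 0.7904)$ = 0.9397 [using Excel Function of Chi-square Distribution]
Decision: Since the p-value is very high, the null hypothesis is not rejected.
Conclusion: There is no evidence that the data depart from a Poisson distribution, so the data are consistent with horse-kick deaths occurring independently and with equal probability across regiment-years. Note that three of the expected frequencies are below 5, so the chi-square approximation is rough here; pooling the sparse cells is the usual remedy.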
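The p-value can also be checked without Excel. The sketch below assumes the usual goodness-of-fit count of degrees of freedom (number of cells − 1 − number of estimated parameters = 6 − 1 − 1 = 4) and uses the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom. The function name `chi2_sf_even_df` is illustrative, not a library call.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square_df > x), valid only for even df.

    For even df the tail is exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    series = sum(half**k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


chi_square = 0.79036593  # test statistic from the table
df = 6 - 1 - 1           # cells - 1 - estimated parameters (lambda)

p_value = chi2_sf_even_df(chi_square, df)
print(f"df = {df}, p-value = {p_value:.4f}")
```

A p-value this far above any conventional significance level means the test gives no grounds to reject the Poisson model.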