1. A general insurance company is debating introducing a new screening programme to reduce the claim amounts that it needs to pay out. The programme consists of a much more detailed application form that takes longer for the new client department to process. The screening is applied to a test group of clients as a trial whilst other clients continue to fill in the old application form. It can be assumed that claim payments follow a normal distribution. The claim payments data for samples of the two groups of clients are (in K$00 per year):
Without screening 24.5 21.7 35.2 15.9 23.7 34.2 29.3 21.1 23.5 28.3
With screening 22.4 21.2 36.3 15.7 21.5 7.3 12.8 21.2 23.9 18.4
(i) (a) Calculate a 95% confidence interval for the difference between the mean claim amounts.
(b) Comment on your answer.
(ii) (a) Calculate a 95% confidence interval for the ratio of the population variances.
(b) Hence, comment on the assumption of equal variances required in part (i).
(iii) Assume that the sample sizes taken from the clients with and without screening are always equal to keep processing easy. Calculate the minimum sample size so that the width of a 95% confidence interval for the difference between mean claim amounts is less than 10, assuming that the samples have the same variances as in part (i).
(iv) Test the hypothesis that the new screening programme reduces the mean claim amount.
(v) Formally test the assumption of equal variances required in part (iv).
Answer 1 i a
Let X1 denote the claim payments of the population without screening and X2 those of the population with screening.
Let the means of the two populations (without and with screening) be µ1 and µ2, and let their standard deviations be σ1 and σ2.
We want a 95% confidence interval for µ1 − µ2. This corresponds to a two-sided test of H0: µ1 − µ2 = 0 against H1: µ1 − µ2 ≠ 0 at the 5% level, since H0 is rejected exactly when 0 lies outside the interval.
There are two formulas for calculating a confidence interval for the difference between two population means. The different formulas are based on whether the standard deviations are assumed to be equal or unequal.
Case 1 – Standard Deviations Assumed Equal
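When σ1 = σ2 = σ is unknown, the appropriate two-sided 100(1 − α)% confidence interval for µ1 − µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where x̄1 and x̄2 are the sample means, n1 and n2 are the sample sizes, and sp is the pooled estimate of the common standard deviation.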
Now from the data (n1 = n2 = 10) we have the sample means and the sample variances (divisor n − 1):
 | Mean | Variance |
X1 | 25.74 | 36.19 |
X2 | 20.07 | 58.44 |
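As a check on the summary statistics, the following minimal Python sketch recomputes the sample means and sample variances from the data given in the question. The names `without`, `with_scr` and `summarise` are illustrative and not part of the original answer; `statistics.variance` uses the n − 1 divisor.

```python
from statistics import mean, variance

# Claim payments from the question.
without = [24.5, 21.7, 35.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_scr = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]


def summarise(sample):
    """Return (n, sample mean, sample variance with divisor n - 1)."""
    return len(sample), mean(sample), variance(sample)


n1, xbar1, s1_sq = summarise(without)
n2, xbar2, s2_sq = summarise(with_scr)

print(f"X1: n={n1}, mean={xbar1:.2f}, variance={s1_sq:.2f}")
print(f"X2: n={n2}, mean={xbar2:.2f}, variance={s2_sq:.2f}")
```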
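Calculation of Pooled Variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9 \times 36.19 + 9 \times 58.44}{18} = 47.31$$
so sp = 6.88, and from the t distribution table it is observed that t(0.025, 18) = 2.101.
Hence the 95% confidence interval is (25.74 − 20.07) ± 2.101 × 6.88 × √(1/10 + 1/10) = 5.67 ± 6.46, that is (−0.79, 12.13).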
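The pooled-variance interval can be reproduced with a short Python sketch. It assumes the critical value t(0.025, 18) = 2.101 is read from standard t tables and hard-codes it as `T_CRIT_18`; the function name `pooled_ci` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

without = [24.5, 21.7, 35.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_scr = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]

# Assumption: upper 2.5% point of t with 18 degrees of freedom, from tables.
T_CRIT_18 = 2.101


def pooled_ci(x, y, t_crit):
    """Equal-variance CI for mean(x) - mean(y); returns (lower, upper, sp_sq)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    diff = mean(x) - mean(y)
    half_width = t_crit * sqrt(sp_sq * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width, sp_sq


lower, upper, sp_sq = pooled_ci(without, with_scr, T_CRIT_18)
print(f"pooled variance = {sp_sq:.2f}")
print(f"95% CI for mu1 - mu2 = ({lower:.2f}, {upper:.2f})")
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; a statistics library could supply the quantile instead.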
Case 2 – Standard Deviations Assumed Unequal
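When σ1 ≠ σ2 are unknown, the appropriate two-sided confidence interval for µ1 − µ2 is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,v} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad v = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
With s1² = 36.19, s2² = 58.44 and n1 = n2 = 10, the expression for the degrees of freedom gives v ≈ 17.06, which is rounded down to 17. Using t(0.025, 17) = 2.110, the interval is 5.67 ± 2.110 × 3.076 = 5.67 ± 6.49, that is (−0.82, 12.16).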
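The degrees of freedom for the unequal-variance case can be checked numerically. This is a minimal sketch of the Welch–Satterthwaite approximation, assuming the table value t(0.025, 17) = 2.110; the names `welch_df` and `welch_ci` are illustrative.

```python
from math import floor, sqrt
from statistics import mean, variance

without = [24.5, 21.7, 35.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_scr = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]

# Assumption: upper 2.5% point of t with 17 degrees of freedom, from tables.
T_CRIT_17 = 2.110


def welch_df(x, y):
    """Welch-Satterthwaite approximate degrees of freedom."""
    a = variance(x) / len(x)
    b = variance(y) / len(y)
    return (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))


def welch_ci(x, y, t_crit):
    """Unequal-variance CI for mean(x) - mean(y)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    diff = mean(x) - mean(y)
    return diff - t_crit * se, diff + t_crit * se


v = welch_df(without, with_scr)
lower, upper = welch_ci(without, with_scr, T_CRIT_17)
print(f"v = {v:.2f}, rounded down to {floor(v)}")
print(f"95% CI for mu1 - mu2 = ({lower:.2f}, {upper:.2f})")
```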
Answer 1 i b
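Discussion: The value of t to use for a 95% confidence interval with 18 degrees of freedom is 2.10.
We interpret this interval as follows: the difference between the two population means is estimated to be 25.74 − 20.07 = 5.67, and we are 95% confident that the true difference lies between −0.79 and 12.13. Because the interval contains 0, the data do not show, at the 5% level on a two-sided basis, that screening changes the mean claim amount.
The unequal-variance calculation gives v ≈ 17 degrees of freedom and almost the same interval, (−0.82, 12.16), so the conclusion does not depend on which case is used.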
Answer 1 ii a
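For a ratio of two variances σ1²/σ2² from normal distributions, a two-sided 100(1 − α)% confidence interval is calculated by
$$\left( \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2;\,n_1-1,\,n_2-1}},\;\; \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2;\,n_2-1,\,n_1-1} \right)$$
where F(α/2; a, b) denotes the upper α/2 point of the F distribution with (a, b) degrees of freedom.
Now from the F table we have F(0.025; 9, 9) = 4.026, and the ratio of the sample variances is s1²/s2² = 36.19/58.44 = 0.62.
Hence the confidence interval is (0.62/4.026, 0.62 × 4.026) = (0.15, 2.49).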
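The variance-ratio interval can be verified with the sketch below. It assumes the table value F(0.025; 9, 9) = 4.026 and hard-codes it as `F_CRIT_9_9`; because both samples have 9 degrees of freedom, the same critical value serves both ends of the interval. The function name `variance_ratio_ci` is hypothetical.

```python
from statistics import variance

without = [24.5, 21.7, 35.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_scr = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]

# Assumption: upper 2.5% point of F with (9, 9) degrees of freedom, from tables.
F_CRIT_9_9 = 4.026


def variance_ratio_ci(x, y, f_upper_xy, f_upper_yx):
    """CI for var(x)/var(y); f_upper_xy is the upper point of F(nx-1, ny-1),
    f_upper_yx the upper point of F(ny-1, nx-1)."""
    ratio = variance(x) / variance(y)
    return ratio / f_upper_xy, ratio * f_upper_yx, ratio


lower, upper, ratio = variance_ratio_ci(without, with_scr, F_CRIT_9_9, F_CRIT_9_9)
print(f"s1^2 / s2^2 = {ratio:.2f}")
print(f"95% CI for the variance ratio = ({lower:.2f}, {upper:.2f})")
print("1 inside interval:", lower < 1 < upper)
```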
Answer 1 ii b
We interpret this interval as follows: the ratio of the two population variances is estimated to be 0.62, and we are 95% confident that the true ratio lies between 0.15 and 2.49.
If the population variances are equal, the ratio equals 1.
Since 1 lies well inside the interval, the data are consistent with equal population variances.
Hence the assumption of equal variances required in part (i) is reasonable.
Answer 1 iii
In studies where the plan is to estimate the difference in means between two independent populations, the formula for determining the sample size required in each comparison group is given below:
$$n_i = 2\left(\frac{Z\,\sigma}{E}\right)^2$$
where ni is the sample size required in each group (i = 1, 2), Z is the value from the standard normal distribution reflecting the confidence level (Z = 1.96 for 95%), E is the desired margin of error (half the width of the interval), and σ is the standard deviation of the outcome variable. As in the confidence interval for the difference in means in part (i), σ is estimated by Sp, the pooled estimate of the common standard deviation, where Sp is computed as follows:
$$S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
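The width of the interval must be less than 10, so the margin of error (half-width) is E = 5. With Z = 1.96 and Sp² = 47.31:
Hence ni = 2 × 47.31 × (1.96/5)² = 14.54, which is rounded up to 15 (a sample size must be a whole number, and rounding down would give a wider interval).
Under the normal approximation, samples of size n1 = 15 and n2 = 15 give a 95% confidence interval for the difference between mean claim amounts with width less than 10. Because the variance is estimated, the interval in part (i) uses a t quantile instead of Z; with t(0.025, 28) = 2.048 the width at n = 15 is about 10.29, and with t(0.025, 30) = 2.042 the width at n = 16 is about 9.93, so n1 = n2 = 16 is the smallest size that meets the requirement on that basis.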
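The sample-size formula can be evaluated with the following sketch, which uses the normal quantile from Python's standard library and the pooled variance from the data. It covers only the normal-approximation version of the calculation; the function name `min_n_per_group` is illustrative.

```python
from math import ceil
from statistics import NormalDist, variance

without = [24.5, 21.7, 35.2, 15.9, 23.7, 34.2, 29.3, 21.1, 23.5, 28.3]
with_scr = [22.4, 21.2, 36.3, 15.7, 21.5, 7.3, 12.8, 21.2, 23.9, 18.4]


def min_n_per_group(sp_sq, max_width, conf=0.95):
    """Evaluate n = 2 * sp_sq * (z / E)^2 with E = max_width / 2.

    Returns (raw value, value rounded up to a whole number).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    half_width = max_width / 2
    raw = 2 * sp_sq * (z / half_width) ** 2
    return raw, ceil(raw)


# With equal sample sizes the pooled variance is the average of the two
# sample variances.
sp_sq = (variance(without) + variance(with_scr)) / 2
raw_n, n = min_n_per_group(sp_sq, max_width=10)
print(f"raw n = {raw_n:.2f}, so n = {n} per group")
```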