Question one
A researcher in a large supermarket wishes to study sickness absences among its employees. The organisation has branches in all the provinces, and each branch keeps full records of sickness leave. A random sample of ten such branches produced the following data showing the number of days of sickness per branch in the year 2017.
18 23 26 30 32 35 39 45 48 54
Required:
Using the above data:
a) Calculate (manually and using computer software such as Excel, SPSS, etc.) a 95% confidence interval for the mean number of sickness days per branch.
b) Estimate the number of branches that should be included in a simple random sample so that a 95% confidence interval for the mean number of days of sickness has a width no greater than 4 days. 5 marks
c) After the sample was collected, it became apparent that the branches fell into three natural groups in terms of sales: small, medium and large. From the data on all of the branches in the provinces, the researcher found that of 210 randomly selected staff, 90 worked in small branches, 36 in medium-sized branches, and the rest worked in large branches. In total, 96 of the selected staff had no days off for sickness, of which 52 worked in small branches and 29 worked in large branches.
i) Form a table showing the information clearly. 5 marks
ii) Carry out an appropriate statistical test to investigate whether the size of branch influences the occurrence of sickness absence. Interpret your results clearly.
a) Calculate (manually and using computer software such as Excel, SPSS, etc.) a 95% confidence interval for the mean number of sickness days per branch.
The required confidence interval for the population mean, computed in Excel as sample mean ± t value × standard error of the mean, is given below:
| Confidence Interval Estimate for the Mean | |
|---|---|
| Data | |
| Sample Standard Deviation | 11.5181017 |
| Sample Mean | 35 |
| Sample Size | 10 |
| Confidence Level | 95% |
| Intermediate Calculations | |
| Standard Error of the Mean | 3.642343568 |
| Degrees of Freedom | 9 |
| t Value | 2.2622 |
| Interval Half Width | 8.2396 |
| Confidence Interval | |
| Interval Lower Limit | 26.76 |
| Interval Upper Limit | 43.24 |
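The manual calculation can be checked with a short script. This is only a sketch using the Python standard library; the critical t value 2.2622 (9 degrees of freedom, two-sided 95%) is copied from the table above rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Days of sickness per branch for the ten sampled branches (from the question).
days = [18, 23, 26, 30, 32, 35, 39, 45, 48, 54]

n = len(days)
mean = statistics.mean(days)      # sample mean
sd = statistics.stdev(days)       # sample standard deviation (divisor n - 1)
se = sd / math.sqrt(n)            # standard error of the mean

# Two-sided 95% critical value of Student's t with n - 1 = 9 degrees of
# freedom, taken from the table above (not computed here).
T_CRIT = 2.2622

half_width = T_CRIT * se
lower = mean - half_width
upper = mean + half_width

print(f"mean = {mean}, sd = {sd:.4f}, se = {se:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, the limits should agree with the Excel output of 26.76 and 43.24.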
b) Estimate the number of branches that should be included in a simple random sample so that a 95% confidence interval for the mean number of days of sickness has a width no greater than 4 days.
We are given:
Maximum interval width = 4 days, so margin of error = E = 4/2 = 2
Estimate for standard deviation = σ = 11.5181017 (the sample standard deviation from part a)
Confidence level = 95%
Critical Z value = 1.96
(by using z-table)
Sample size formula is given as below:
n = (Z*σ/E)^2
n = (1.96*11.5181017/2)^2
n = 127.41307
Required sample size = 128 (rounding up to the next whole branch)
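The formula n = (Z*σ/E)^2 can be evaluated as follows. In this sketch the margin of error E is taken as half of the permitted interval width, since a confidence interval extends E on each side of the mean, so a width of 4 days corresponds to E = 2. The function name `required_sample_size` is hypothetical, chosen for illustration.

```python
import math


def required_sample_size(sigma, max_width, z=1.96):
    """Smallest n for which a z-based interval is no wider than max_width.

    sigma is the assumed population standard deviation, here estimated
    by the sample standard deviation from part a.
    """
    margin = max_width / 2                 # margin of error E (half-width)
    n_exact = (z * sigma / margin) ** 2    # n = (Z * sigma / E)^2
    return math.ceil(n_exact)              # always round up


n_required = required_sample_size(sigma=11.5181017, max_width=4)
print(n_required)
```

Rounding up rather than to the nearest integer ensures the width requirement is actually met.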
c) After the sample was collected, it became apparent that the branches fell into three natural groups in terms of sales: small, medium and large. From the data on all of the branches in the provinces, the researcher found that of 210 randomly selected staff, 90 worked in small branches, 36 in medium-sized branches, and the rest worked in large branches. In total, 96 of the selected staff had no days off for sickness, of which 52 worked in small branches and 29 worked in large branches.
i) Form a table showing the information clearly.
The required table is given as below:
Columns: branch size by sales

| | Small | Medium | Large | Total |
|---|---|---|---|---|
| No days off | 52 | 15 | 29 | 96 |
| Off | 38 | 21 | 55 | 114 |
| Total | 90 | 36 | 84 | 210 |
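Only some of the cells are stated in the question; the rest follow by subtraction from the totals. A small sketch of that bookkeeping, with illustrative variable names:

```python
sizes = ["Small", "Medium", "Large"]

# Staff per branch size: small and medium are given, large is "the rest".
total_staff = 210
staff = {"Small": 90, "Medium": 36}
staff["Large"] = total_staff - staff["Small"] - staff["Medium"]

# Staff with no days off: total, small and large are given.
total_no_days_off = 96
no_days_off = {"Small": 52, "Large": 29}
no_days_off["Medium"] = (
    total_no_days_off - no_days_off["Small"] - no_days_off["Large"]
)

# Staff with at least one day off: column total minus the "no days off" cell.
off = {size: staff[size] - no_days_off[size] for size in sizes}

for label, row in (("No days off", no_days_off), ("Off", off), ("Total", staff)):
    cells = [row[size] for size in sizes]
    print(label, cells, sum(cells))
```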
ii) Carry out an appropriate statistical test to investigate whether the size of branch influences the occurrence of sickness absence, interpret your results clearly.
Here, we use the chi-square test of independence.
Null hypothesis: H0: The size of branch does not influence the occurrence of sickness absence (the two variables are independent).
Alternative hypothesis: Ha: The size of branch influences the occurrence of sickness absence.
We assume level of significance = α = 0.05
Test statistic formula is given as below:
Chi square = ∑[(O – E)^2/E]
where O is the observed frequency and E is the expected frequency in each cell, and the sum runs over all six cells.
E = row total * column total / Grand total
We are given
Number of rows = r = 2
Number of columns = c = 3
Degrees of freedom = df = (r – 1)*(c – 1) = 1*2 = 2
α = 0.05
Critical value = 5.991465
(by using Chi square table or excel)
Calculation tables for test statistic are given as below:
Observed Frequencies

| | Small | Medium | Large | Total |
|---|---|---|---|---|
| No days off | 52 | 15 | 29 | 96 |
| Off | 38 | 21 | 55 | 114 |
| Total | 90 | 36 | 84 | 210 |

Expected Frequencies

| | Small | Medium | Large | Total |
|---|---|---|---|---|
| No days off | 41.14286 | 16.45714 | 38.4 | 96 |
| Off | 48.85714 | 19.54286 | 45.6 | 114 |
| Total | 90 | 36 | 84 | 210 |
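The expected frequencies follow from E = row total * column total / Grand total applied to every cell. A minimal sketch, with the observed counts entered as a list of rows:

```python
# Observed counts: rows are "No days off" and "Off",
# columns are Small, Medium, Large.
observed = [
    [52, 15, 29],
    [38, 21, 55],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for each cell.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(e, 5) for e in row])
```

Every expected count is well above 5, so the usual rule of thumb for the chi-square approximation is satisfied.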
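Calculations

(O - E)

| | Small | Medium | Large |
|---|---|---|---|
| No days off | 10.85714 | -1.45714 | -9.4 |
| Off | -10.85714 | 1.45714 | 9.4 |

(O - E)^2/E

| | Small | Medium | Large |
|---|---|---|---|
| No days off | 2.865079 | 0.129018 | 2.301042 |
| Off | 2.412698 | 0.108647 | 1.937719 |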
Chi square = ∑[(O – E)^2/E] = 9.754203
P-value = 0.007619
(By using Chi square table or excel)
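P-value = 0.007619 < α = 0.05 (equivalently, the test statistic 9.754203 exceeds the critical value 5.991465)
So, we reject the null hypothesis.
At the 5% level of significance, there is sufficient evidence to conclude that the size of branch is associated with the occurrence of sickness absence. In the sample, about 58% of staff in small branches had no days off for sickness, compared with about 42% in medium branches and 35% in large branches.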
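The whole test can be reproduced with the standard library alone. The sketch below relies on a shortcut that holds only for 2 degrees of freedom: the chi-square distribution is then exponential with mean 2, so the upper-tail probability is exp(-x/2) and the critical value is -2*ln(α). For any other table size a statistics package or a chi-square table would be needed instead.

```python
import math

# Observed counts: rows are "No days off" and "Off",
# columns are Small, Medium, Large.
observed = [
    [52, 15, 29],
    [38, 21, 55],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi square = sum of (O - E)^2 / E over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
if df != 2:
    raise ValueError("the closed forms below are valid only for df = 2")

alpha = 0.05
critical_value = -2 * math.log(alpha)   # upper 5% point for df = 2
p_value = math.exp(-chi_square / 2)     # P(chi-square with 2 df > statistic)
reject_h0 = p_value < alpha

print(f"chi-square = {chi_square:.6f}, df = {df}")
print(f"critical value = {critical_value:.6f}, p-value = {p_value:.6f}")
print("reject H0" if reject_h0 else "do not reject H0")
```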