A study is conducted to compare salaries of managers of a certain industry employed
in two areas of the country, the eastern and northern regions. Independent random
samples of 300 plant managers are selected for each of the two regions. These
managers were asked for their annual salaries. The results showed an average salary
x1 = $102300 and standard deviation s1 = $5700 for the eastern region and an average
salary x2 = $98500 and standard deviation s2 = $3800 for the northern region.
(a) Assuming normality for the distributions of annual salaries and the equality of
variances, construct a 99% confidence interval for µ1 - µ2, the difference in the
mean salaries.
(b) Is the assumption of normality necessary? Why or why not?
(c) Is the assumption of equality of variances reasonable? Assume they are unequal
and obtain a 95% confidence interval for the ratio of the two variances.
(d) If your answers to the previous questions are negative, compute a new 99%
confidence interval for µ1 - µ2 with the correct assumptions. Compare with the
result in (a).
(e) Let us assume that the data have not been collected yet. Let us also assume
that previous knowledge suggests that σ1 = σ2 = $4000. Are the sample sizes
of 300 sufficient to produce a 95% confidence interval on µ1 - µ2 having a width
of only $1000?
a.
TRADITIONAL METHOD
given that,
sample mean, x1 = 102300
standard deviation, s1 = 5700
sample size, n1 = 300
sample mean, x2 = 98500
standard deviation, s2 = 3800
sample size, n2 = 300
I.
calculate pooled variance s^2 = ((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2)
s^2 = (299*32490000 + 299*14440000) / (600 - 2)
s^2 = 23465000
II.
standard error = sqrt(s^2 * (1/n1 + 1/n2))
= sqrt(23465000 * (1/300 + 1/300))
= 395.517
III.
margin of error = t a/2 * (standard error)
where,
t a/2 = t-table value
level of significance, α = 0.01
from the t table, the two-tailed critical value t α/2 with
(n1+n2-2) i.e. 598 d.f. is 2.584
margin of error = 2.584 * 395.517
= 1022.015
IV.
CI = (x1-x2) ± margin of error
confidence interval = [ (102300-98500) ± 1022.015 ]
= [2777.985 , 4822.015]
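The steps I to IV above can be checked with a short script. This is only a sketch: the function name pooled_t_interval is made up for illustration, and the critical value 2.584 is taken from the t table as in step III rather than computed.

```python
import math

def pooled_t_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Pooled-variance interval for mu1 - mu2.

    t_crit is the two-tailed t-table value, supplied by the caller
    (hypothetical helper, not a library function).
    """
    # Step I: pooled variance
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Step II: standard error of the difference in sample means
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Steps III and IV: margin of error and interval
    margin = t_crit * std_err
    diff = mean1 - mean2
    return pooled_var, std_err, (diff - margin, diff + margin)

pooled_var, pooled_se, (pooled_low, pooled_high) = pooled_t_interval(
    102300, 5700, 300, 98500, 3800, 300, t_crit=2.584
)
print(pooled_var, round(pooled_se, 3), round(pooled_low, 3), round(pooled_high, 3))
```

Passing the critical value in as an argument keeps the script free of any statistics library; a package with a t quantile function could replace the table lookup.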
-----------------------------------------------------------------------------------------------
DIRECT METHOD
given that,
sample mean, x1 = 102300
standard deviation, s1 = 5700
sample size, n1 = 300
sample mean, x2 = 98500
standard deviation, s2 = 3800
sample size, n2 = 300
CI = (x1 - x2) ± t a/2 * sqrt( s^2 * (1/n1 + 1/n2) )
where,
x1, x2 = sample means
s^2 = pooled variance
n1, n2 = sample sizes
a = 1 - (confidence level/100)
t a/2 = t-table value
CI = confidence interval
CI = [ (102300-98500) ± t a/2 * sqrt( 23465000 * (1/300 + 1/300) ) ]
= [ (3800) ± 1022.015 ]
= [2777.985 , 4822.015]
-----------------------------------------------------------------------------------------------
interpretations:
1. we are 99% confident that the interval [2777.985 , 4822.015] contains
the true difference in population means µ1 - µ2
2. If a large number of samples are collected, and a confidence
interval is created
for each sample, about 99% of these intervals will contain the true
difference µ1 - µ2
b.
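Normality. The normality assumption is one of the most
misunderstood in all of statistics.
No, it is not necessary for the interval on µ1 - µ2. When the sample
sizes are large (here n = 300 per region), the Central Limit Theorem
ensures that the sampling distribution of x1 - x2 is approximately
normal whatever the shape of the salary distributions.
The interval for the ratio of variances asked for in (c), by contrast,
does rely on normality.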
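A small simulation can illustrate the Central Limit Theorem argument. This is a sketch under a loud assumption: the population is taken to be exponential with mean 1 (strongly right-skewed, skewness 2), which is a hypothetical stand-in and not the salary data. For n = 300 the skewness of the sample mean should be near 2/sqrt(300), about 0.12.

```python
import random

def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5

def simulate_sample_means(n=300, reps=2000, seed=0):
    """Means of `reps` samples of size n from an Exp(1) population (assumed)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

sample_means = simulate_sample_means()
mean_skew = skewness(sample_means)
# The individual draws have skewness 2; the sample means are far more symmetric.
print(round(mean_skew, 2))
```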
c.
CONFIDENCE INTERVAL FOR THE RATIO OF TWO VARIANCES
ci = (s1^2/s2^2) / F a/2 (v1, v2) < σ1^2/σ2^2 < (s1^2/s2^2) * F a/2 (v2, v1)
where,
s1^2, s2^2 = sample variances
v1 = n1 - 1, v2 = n2 - 1 = degrees of freedom
F a/2 (v1, v2) = upper a/2 critical value of the F distribution
since alpha = 0.05, a/2 = 0.025
ratio of sample variances, s1^2/s2^2 = 32490000/14440000 = 2.25
v1 = v2 = 299
the critical value F 0.025 (299, 299) is approximately 1.255
confidence interval = [ 2.25/1.255 < σ1^2/σ2^2 < 2.25 * 1.255 ]
= [ 1.793 , 2.824 ] (approximately)
95% confidence interval for the ratio of the two variances = [
1.793 , 2.824 ]
Since the interval does not contain 1, the assumption of equal
variances is not reasonable.
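An interval for the ratio σ1^2/σ2^2 is normally built from the F distribution with (n1-1, n2-1) degrees of freedom. The sketch below avoids an F table by using a large-sample approximation, stated here as an assumption: ln F is treated as roughly normal with mean 0 and variance 2/v1 + 2/v2, which is adequate when both degrees of freedom are large and similar. The function name variance_ratio_interval is illustrative only.

```python
import math
from statistics import NormalDist

def variance_ratio_interval(sd1, n1, sd2, n2, confidence=0.95):
    """Approximate CI for sigma1^2 / sigma2^2 (large-sample sketch)."""
    ratio = sd1**2 / sd2**2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Assumption: ln F ~ Normal(0, 2/v1 + 2/v2) for large degrees of freedom
    log_sd = math.sqrt(2 / (n1 - 1) + 2 / (n2 - 1))
    f_crit = math.exp(z * log_sd)  # approximate upper critical value of F
    return ratio, f_crit, (ratio / f_crit, ratio * f_crit)

var_ratio, f_crit, (ratio_low, ratio_high) = variance_ratio_interval(5700, 300, 3800, 300)
print(var_ratio, round(f_crit, 3), round(ratio_low, 2), round(ratio_high, 2))
```

If the lower limit is above 1, the data do not support equal variances. An exact F quantile from a statistics package would be preferable when one is available.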
d.
TRADITIONAL METHOD
given that,
sample mean, x1 = 102300
standard deviation, s1 = 5700
sample size, n1 = 300
sample mean, x2 = 98500
standard deviation, s2 = 3800
sample size, n2 = 300
I.
standard error = sqrt((s1^2/n1) + (s2^2/n2))
where,
s1, s2 = sample standard deviations
n1, n2 = sample sizes
standard error = sqrt((32490000/300) + (14440000/300))
= 395.517
II.
margin of error = t a/2 * (standard error)
where,
t a/2 = t-table value
level of significance, α = 0.01
from the t table, two-tailed, the
value of t α/2 with min(n1-1, n2-1) i.e. 299 d.f. (a conservative choice) is 2.592
margin of error = 2.592 * 395.517
= 1025.179
III.
CI = (x1-x2) ± margin of error
confidence interval = [ (102300-98500) ± 1025.179 ]
= [2774.821 , 4825.179]
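The unequal-variance calculation can be sketched in a few lines. The solution above uses the conservative min(n1-1, n2-1) = 299 degrees of freedom; the script also reports the Welch-Satterthwaite degrees of freedom as an alternative that is not part of the original working. The name welch_summary is hypothetical, and the critical value 2.592 is the table value from step II.

```python
import math

def welch_summary(sd1, n1, sd2, n2):
    """Standard error and Welch-Satterthwaite d.f. for unequal variances."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    std_err = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return std_err, df

welch_se, welch_df = welch_summary(5700, 300, 3800, 300)
welch_margin = 2.592 * welch_se  # t-table value for 299 d.f., as in step II
diff = 102300 - 98500
print(round(welch_se, 3), round(welch_df, 1),
      round(diff - welch_margin, 3), round(diff + welch_margin, 3))
```

With equal sample sizes the standard error matches the pooled one, so only the degrees of freedom, and hence the critical value, differ from part (a).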
-----------------------------------------------------------------------------------------------
DIRECT METHOD
given that,
sample mean, x1 = 102300
standard deviation, s1 = 5700
sample size, n1 = 300
sample mean, x2 = 98500
standard deviation, s2 = 3800
sample size, n2 = 300
CI = (x1 - x2) ± t a/2 * sqrt( s1^2/n1 + s2^2/n2 )
where,
x1, x2 = sample means
s1, s2 = sample standard deviations
n1, n2 = sample sizes
a = 1 - (confidence level/100)
t a/2 = t-table value
CI = confidence interval
CI = [ (102300-98500) ± t a/2 *
sqrt((32490000/300) + (14440000/300)) ]
= [ (3800) ± 2.592 * 395.517 ]
= [2774.821 , 4825.179]
-----------------------------------------------------------------------------------------------
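interpretations:
1. we are 99% confident that the interval [2774.821 , 4825.179] contains
the true difference in population means µ1 - µ2
2. If a large number of samples are collected, and a confidence
interval is created
for each sample, about 99% of these intervals will contain the true
difference µ1 - µ2
3. Compared with (a): because n1 = n2, the standard error is the same
(395.517) with or without pooling. Only the degrees of freedom change,
so the interval is just slightly wider than [2777.985 , 4822.015]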
e.
TRADITIONAL METHOD
given that,
mean, x1 = 102300
standard deviation, σ1 = 4000
sample size, n1 = 300
mean, x2 = 98500
standard deviation, σ2 = 4000
sample size, n2 = 300
I.
standard error = sqrt((σ1^2/n1) + (σ2^2/n2))
where,
σ1, σ2 = population standard deviations
n1, n2 = sample sizes
standard error = sqrt((16000000/300) + (16000000/300))
= 326.5986
II.
margin of error = Z a/2 * (standard error)
where,
Z a/2 = Z-table value
level of significance, α = 0.05
from the standard normal table, two-tailed, z α/2 = 1.96
margin of error = 1.96 * 326.5986
= 640.1333
III.
CI = (x1-x2) ± margin of error
confidence interval = [ (102300-98500) ± 640.1333 ]
= [3159.8667 , 4440.1333]
-----------------------------------------------------------------------------------------------
DIRECT METHOD
given that,
mean, x1 = 102300
standard deviation, σ1 = 4000
sample size, n1 = 300
mean, x2 = 98500
standard deviation, σ2 = 4000
sample size, n2 = 300
CI = (x1 - x2) ± Z a/2 * sqrt( σ1^2/n1 + σ2^2/n2 )
where,
x1, x2 = sample means
σ1, σ2 = population standard deviations
n1, n2 = sample sizes
a = 1 - (confidence level/100)
Z a/2 = Z-table value
CI = confidence interval
CI = [ (102300-98500) ± Z a/2 * sqrt(
16000000/300 + 16000000/300 ) ]
= [ (3800) ± Z a/2 * sqrt( 106666.6667 ) ]
= [ (3800) ± 1.96 * sqrt( 106666.6667 ) ]
= [3159.8667 , 4440.1333]
-----------------------------------------------------------------------------------------------
interpretations:
1. we are 95% confident that the interval [3159.8667 , 4440.1333]
contains the difference between
the true population means µ1 - µ2
(the centre 3800 is used only for illustration, since the data have
not been collected yet; only the width matters here)
2. If a large number of samples are collected, and a confidence
interval is created
for each sample, about 95% of these intervals will contain the
difference between
the true population means µ1 - µ2
3. The width of the interval is 2 * 640.1333 = 1280.2667, which is
larger than the required $1000
Answer:
No. With sample sizes of 300 the 95% confidence interval on µ1 - µ2
has a width of about $1280, not $1000.
For a width of $1000 we need 2 * 1.96 * sqrt(2 * 4000^2 / n) <= 1000,
i.e. n >= 491.7, so about 492 managers per region.
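The width check and the sample size needed for a $1000 interval can be sketched as below. The helper names ci_width and required_n are illustrative, equal sample sizes in both regions are assumed, and the z value comes from Python's statistics.NormalDist rather than the rounded table value 1.96, so results may differ slightly in the last digits.

```python
import math
from statistics import NormalDist

def ci_width(sigma1, sigma2, n, confidence=0.95):
    """Full width of the z interval for mu1 - mu2 with n per group."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * math.sqrt(sigma1**2 / n + sigma2**2 / n)

def required_n(sigma1, sigma2, width, confidence=0.95):
    """Smallest equal per-group n giving an interval no wider than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((2 * z / width) ** 2 * (sigma1**2 + sigma2**2))

width_300 = ci_width(4000, 4000, 300)
n_needed = required_n(4000, 4000, 1000)
print(round(width_300, 1), n_needed)
```

The required n follows from solving 2 * z * sqrt((σ1^2 + σ2^2)/n) <= width for n and rounding up.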