The Transactional Records Access Clearinghouse at Syracuse University reported data showing the odds of an Internal Revenue Service audit. The following table shows the average adjusted gross income reported (in dollars) and the percent of the returns that were audited for 20 selected IRS districts.
District | Adjusted Gross Income ($) | Percent Audited |
---|---|---|
Los Angeles | 36,664 | 1.3 |
Sacramento | 38,845 | 1.1 |
Atlanta | 34,886 | 1.1 |
Boise | 32,512 | 1.1 |
Dallas | 34,531 | 1.0 |
Providence | 35,995 | 1.0 |
San Jose | 37,799 | 0.9 |
Cheyenne | 33,876 | 0.9 |
Fargo | 30,513 | 0.9 |
New Orleans | 30,174 | 0.9 |
Oklahoma City | 30,060 | 0.8 |
Houston | 37,153 | 0.8 |
Portland | 34,918 | 0.7 |
Phoenix | 33,291 | 0.7 |
Augusta | 31,504 | 0.7 |
Albuquerque | 29,199 | 0.6 |
Greensboro | 33,072 | 0.6 |
Columbia | 30,859 | 0.5 |
Nashville | 32,566 | 0.5 |
Buffalo | 34,296 | 0.5 |
(a)
Develop the estimated regression equation that could be used to predict the percent audited given the average adjusted gross income reported (in dollars). (Round your value for the y-intercept to three decimal places and your value for the slope to six decimal places.)
ŷ =
(b)
At the 0.05 level of significance, determine whether the adjusted gross income (in dollars) and the percent audited are related. (Use the F test.)
State the null and alternative hypotheses.
H0: β1 ≠ 0
Ha: β1 = 0
H0: β1 = 0
Ha: β1 ≠ 0
H0: β0 ≠ 0
Ha: β0 = 0
H0: β1 ≥ 0
Ha: β1 < 0
H0: β0 = 0
Ha: β0 ≠ 0
Find the value of the test statistic. (Round your answer to two decimal places.)
Find the p-value. (Round your answer to three decimal places.)
p-value =
State your conclusion.
Do not reject H0. We cannot conclude that the relationship between the adjusted gross income (in dollars) and the percent audited is significant.
Do not reject H0. We conclude that the relationship between the adjusted gross income (in dollars) and the percent audited is significant.
Reject H0. We conclude that the relationship between the adjusted gross income (in dollars) and the percent audited is significant.
Reject H0. We cannot conclude that the relationship between the adjusted gross income (in dollars) and the percent audited is significant.
(c)
Did the estimated regression equation provide a good fit? Explain. (Round your answer to three decimal places.)
Since r2 = ___ is (less than 0.55 / at least 0.55), the estimated regression equation (provided / did not provide) a good fit.
(d)
Use the estimated regression equation developed in part (a) to calculate a 95% confidence interval for the expected percent audited for districts with an average adjusted gross income of $37,000. (Round your answers to two decimal places.)
% to %
Let the regression model be of the form
\[ y = \beta_0 + \beta_1 x + e, \]
where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(e\) is the error term, with
y: percent audited
x: average adjusted gross income (in dollars)
a.
We have to develop the estimated regression equation that could be used to predict the percent audited given the average adjusted gross income reported (in dollars).
Let the ith observation be
\[ y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, 2, \dots, 20. \]
The least squares estimates of \(\beta_0\) and \(\beta_1\) are obtained by minimizing
\[ \sum_{i=1}^{20} e_i^2 = \sum_{i=1}^{20} (y_i - \beta_0 - \beta_1 x_i)^2. \]
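Solving the normal equations gives
\[ \hat\beta_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, \]
where
\[ S_{xx} = \frac{1}{n-1} \sum_i (x_i - \bar x)^2 \quad \text{(variance of x)}, \]
\[ S_{yy} = \frac{1}{n-1} \sum_i (y_i - \bar y)^2 \quad \text{(variance of y)}, \]
\[ S_{xy} = \frac{1}{n-1} \sum_i (x_i - \bar x)(y_i - \bar y) \quad \text{(covariance between x and y)}. \]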
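The analysis was done in MS Excel.
The required estimated regression equation is
\[ \hat y = -0.504 + 0.000040\,x, \]
where \(\hat y\) is the percent audited and \(x\) is the average adjusted gross income (unrounded Excel coefficients: intercept -0.503614543, slope 3.96913E-05).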
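The least squares formulas can be checked with a few lines of plain Python. This is a minimal sketch, not the Excel procedure itself; the helper name `fit_line` is hypothetical, and the income values are assumed to be the ones entered in the answer's worksheet (observation 13 is 34198 there).

```python
# Data as entered in the answer's worksheet (assumption: observation 13 is 34198).
income = [36664, 38845, 34886, 32512, 34531, 35995, 37799, 33876, 30513, 30174,
          30060, 37153, 34198, 33291, 31504, 29199, 33072, 30859, 32566, 34296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]


def fit_line(x, y):
    """Return (b0, b1), the least squares intercept and slope (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # The 1/(n-1) factors in S_xy and S_xx cancel in the ratio, so raw sums suffice.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(income, audited)
print(f"y_hat = {b0:.3f} + {b1:.6f} x")  # y_hat = -0.504 + 0.000040 x
```

The slope is tiny because income is measured in dollars; each additional $10,000 of average income corresponds to roughly 0.4 percentage points more audits under this fit.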
b.
Hypotheses to be tested:
\[ H_0: \beta_1 = 0 \quad \text{against} \quad H_a: \beta_1 \neq 0 \]
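Test statistic:
The test statistic for testing \(H_0\) is \(F = \mathrm{MSR}/\mathrm{MSE}\), which follows an F distribution with \((k-1,\ n-k)\) degrees of freedom under \(H_0\).
Here, \(k = 2\) and \(n = 20\), so the degrees of freedom are (1, 18).
\(\mathrm{MSR} = \mathrm{SSR}/(k-1)\), where SSR is the sum of squares due to regression,
\[ \mathrm{SSR} = \sum_{i=1}^{20} (\hat y_i - \bar y)^2, \]
and \(\mathrm{MSE} = \mathrm{SSE}/(n-k)\), where SSE is the sum of squares due to residuals,
\[ \mathrm{SSE} = \sum_{i=1}^{20} (y_i - \hat y_i)^2. \]
Critical region:
We reject \(H_0\) if \(F > F_{\alpha;\,k-1,\,n-k}\).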
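Conclusion:
Let \(\alpha = 0.05\). From the Excel output, \(F = 5.27\) with p-value = 0.034.
Since the p-value is less than 0.05 (equivalently, \(F\) exceeds the critical value \(F_{0.05;\,1,\,18} = 4.41\)), we reject \(H_0\) at the 5% level of significance and conclude on the basis of the given data that percent audited and average adjusted gross income are related.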
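As a rough check of the F test, the sketch below rebuilds the ANOVA quantities in plain Python. It assumes the income values entered in the answer's worksheet (observation 13 is 34198). Because the numerator has 1 degree of freedom, the p-value is obtained from the identity P(F > f) = P(|T| > sqrt(f)) with T following a t distribution with 18 degrees of freedom, integrated numerically with Simpson's rule; the helper names are hypothetical.

```python
import math

# Data as entered in the answer's worksheet (assumption: observation 13 is 34198).
income = [36664, 38845, 34886, 32512, 34531, 35995, 37799, 33876, 30513, 30174,
          30060, 37153, 34198, 33291, 31504, 29199, 33072, 30859, 32566, 34296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(audited) / n
sxx = sum((x - x_bar) ** 2 for x in income)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, audited))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in income]

ssr = sum((f - y_bar) ** 2 for f in fitted)              # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(audited, fitted))  # residual sum of squares
msr = ssr / 1          # k - 1 = 1 degree of freedom
mse = sse / (n - 2)    # n - k = 18 degrees of freedom
f_stat = msr / mse


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_t_pvalue(t, df, steps=2000):
    """P(|T| > |t|) using Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


p_value = two_sided_t_pvalue(math.sqrt(f_stat), n - 2)
print(round(f_stat, 2), round(p_value, 3))  # 5.27 0.034
print("reject H0" if p_value < 0.05 else "do not reject H0")
```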
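Hypotheses to be tested for the significance of \(\beta_0\) and \(\beta_1\):
We want to test
\[ H_{01}: \beta_0 = 0 \quad \text{against} \quad H_{11}: \beta_0 \neq 0 \]
\[ H_{02}: \beta_1 = 0 \quad \text{against} \quad H_{12}: \beta_1 \neq 0 \]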
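Test statistic:
The test statistic for testing \(H_{01}\) is
\[ t_1 = \frac{\hat\beta_0 - 0}{SE(\hat\beta_0)}, \]
which follows a t distribution with \(n - 2 = 18\) degrees of freedom under the null, where
\[ SE(\hat\beta_0) = \sqrt{\frac{\mathrm{MSE} \sum_{i=1}^{n} X_i^2}{n \sum_i (X_i - \bar X)^2}}, \quad n = 20. \]
The test statistic for testing \(H_{02}\) is
\[ t_2 = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)}, \]
which also follows a t distribution with \(n - 2 = 18\) degrees of freedom under the null, where
\[ SE(\hat\beta_1) = \sqrt{\frac{\mathrm{MSE}}{\sum_i (X_i - \bar X)^2}}. \]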
Critical region:
Reject \(H_{01}\) if \(|t_1| > t_{0.025;\,18}\) and reject \(H_{02}\) if \(|t_2| > t_{0.025;\,18}\), at the 5% level of significance (two-sided tests).
Conclusion:
Here, \(t_1 = -0.8641\), \(t_2 = 2.2955\) and \(t_{0.025;\,18} = 2.101\).
Since \(|t_2| > 2.101\) and \(|t_1| < 2.101\), we reject \(H_{02}\) and do not reject \(H_{01}\) at the 5% level of significance. On the basis of the given sample, gross adjusted income has a significant effect on percent audited, while the intercept is not significantly different from zero.
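The standard errors and t statistics can be reproduced as follows. This is a sketch under two assumptions: the income values are those entered in the answer's worksheet (observation 13 is 34198), and the two-sided critical value for 18 degrees of freedom, 2.101, is taken from a t table rather than computed.

```python
import math

# Data as entered in the answer's worksheet (assumption: observation 13 is 34198).
income = [36664, 38845, 34886, 32512, 34531, 35995, 37799, 33876, 30513, 30174,
          30060, 37153, 34198, 33291, 31504, 29199, 33072, 30859, 32566, 34296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(audited) / n
sxx = sum((x - x_bar) ** 2 for x in income)   # sum of squared deviations of x
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, audited))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(income, audited))
mse = sse / (n - 2)

# Standard errors of the intercept and slope.
se_b0 = math.sqrt(mse * sum(x * x for x in income) / (n * sxx))
se_b1 = math.sqrt(mse / sxx)

t1 = b0 / se_b0   # tests beta_0 = 0
t2 = b1 / se_b1   # tests beta_1 = 0

T_CRIT = 2.101    # t_{0.025; 18}, table value (assumption: not computed here)
print(f"{t1:.4f} {t2:.4f}")  # -0.8641 2.2955
print("intercept significant:", abs(t1) > T_CRIT)
print("slope significant:", abs(t2) > T_CRIT)
```

In simple linear regression the slope's t statistic squared equals the F statistic (2.2955² ≈ 5.27), so the two tests always agree.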
c.
Here, \(r^2 = 0.226 < 0.55\), i.e., only 22.6% of the total variability in percent audited is explained by the regression equation. Hence the model does not provide a good fit.
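The coefficient of determination follows directly from the sums of squares. The short sketch below assumes the income values entered in the answer's worksheet (observation 13 is 34198) and uses the problem's 0.55 cutoff for a good fit.

```python
# Data as entered in the answer's worksheet (assumption: observation 13 is 34198).
income = [36664, 38845, 34886, 32512, 34531, 35995, 37799, 33876, 30513, 30174,
          30060, 37153, 34198, 33291, 31504, 29199, 33072, 30859, 32566, 34296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(audited) / n
sxx = sum((x - x_bar) ** 2 for x in income)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, audited))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in audited)                           # total
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(income, audited))   # residual
r2 = 1 - sse / sst                                                     # equals SSR / SST

print(round(r2, 3), "good fit" if r2 >= 0.55 else "not a good fit")
# 0.226 not a good fit
```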
d.
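We have to calculate a 95% confidence interval for the expected percent audited for districts with an average adjusted gross income of $37,000.
The confidence interval for the mean response at \(x^*\) is given by
\[ \hat y^* \pm t_{0.025;\,18}\; s \sqrt{\frac{1}{n} + \frac{(x^* - \bar X)^2}{\sum_i (X_i - \bar X)^2}}, \quad s = \sqrt{\mathrm{MSE}}. \]
For \(x^* = 37{,}000\), \(\hat y^* = -0.503614543 + 0.0000396913(37{,}000) = 0.965\).
With \(s = 0.2075\), \(n = 20\), \(\bar X = 33{,}599.65\), \(\sum_i (X_i - \bar X)^2 = 144{,}029{,}214\) (from the income values in the Excel worksheet) and \(t_{0.025;\,18} = 2.101\), the margin of error is
\[ 2.101 \times 0.2075 \times \sqrt{0.05 + 0.0803} = 0.157. \]
The 95% CI is \(0.965 \pm 0.157\), i.e., 0.81% to 1.12%.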
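A confidence interval for the mean response at a given income can be sketched with the standard formula ŷ* ± t · s · sqrt(1/n + (x* − x̄)² / Σ(xᵢ − x̄)²). The code below assumes the income values entered in the answer's worksheet (observation 13 is 34198) and a table value of 2.101 for the two-sided 95% t critical value with 18 degrees of freedom; `mean_response_ci` is a hypothetical helper name.

```python
import math

# Data as entered in the answer's worksheet (assumption: observation 13 is 34198).
income = [36664, 38845, 34886, 32512, 34531, 35995, 37799, 33876, 30513, 30174,
          30060, 37153, 34198, 33291, 31504, 29199, 33072, 30859, 32566, 34296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

T_CRIT = 2.101  # t_{0.025; 18}, table value (assumption: not computed here)


def mean_response_ci(x, y, x_star, t_crit):
    """Return (y_hat, lower, upper) for the expected response at x_star."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    y_hat = b0 + b1 * x_star
    margin = t_crit * s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat, y_hat - margin, y_hat + margin


y_hat, lower, upper = mean_response_ci(income, audited, 37000, T_CRIT)
print(f"{y_hat:.3f}")                    # 0.965
print(f"{lower:.2f}% to {upper:.2f}%")   # 0.81% to 1.12%
```

The interval widens as x* moves away from the mean income, which is why the (x* − x̄)² term appears under the square root.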
SUMMARY OUTPUT

Regression Statistics | Value |
---|---|
Multiple R | 0.475868552 |
R Square | 0.226450878 |
Adjusted R Square | 0.183475927 |
Standard Error | 0.207511207 |
Observations | 20 |

ANOVA

Source | df | SS | MS | F | Significance F |
---|---|---|---|---|---|
Regression | 1 | 0.22690378 | 0.226904 | 5.269369 | 0.03393498 |
Residual | 18 | 0.77509622 | 0.043061 | | |
Total | 19 | 1.002 | | | |

Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
---|---|---|---|---|---|---|
Intercept | -0.503614543 | 0.582816438 | -0.8641 | 0.398899 | -1.7280664 | 0.72083736 |
Gross Adjusted Income (x) | 3.96913E-05 | 1.72908E-05 | 2.295511 | 0.033935 | 3.3646E-06 | 7.6018E-05 |

RESIDUAL OUTPUT

Observation | Gross Adjusted Income (x) | Percent Audited (y) | Predicted Percent Audited | Residuals | Standard Residuals |
---|---|---|---|---|---|
1 | 36664 | 1.3 | 0.951628104 | 0.348371896 | 1.724813 |
2 | 38845 | 1.1 | 1.038194878 | 0.061805122 | 0.306001 |
3 | 34886 | 1.1 | 0.881056933 | 0.218943067 | 1.084002 |
4 | 32512 | 1.1 | 0.786829733 | 0.313170267 | 1.550528 |
5 | 34531 | 1 | 0.866966513 | 0.133033487 | 0.658658 |
6 | 35995 | 1 | 0.925074609 | 0.074925391 | 0.370961 |
7 | 37799 | 0.9 | 0.996677755 | -0.096677755 | -0.47866 |
8 | 33876 | 0.9 | 0.840968697 | 0.059031303 | 0.292268 |
9 | 30513 | 0.9 | 0.707486779 | 0.192513221 | 0.953146 |
10 | 30174 | 0.9 | 0.694031421 | 0.205968579 | 1.019765 |
11 | 30060 | 0.8 | 0.68950661 | 0.11049339 | 0.54706 |
12 | 37153 | 0.8 | 0.971037161 | -0.171037161 | -0.84682 |
13 | 34198 | 0.7 | 0.853749303 | -0.153749303 | -0.76122 |
14 | 33291 | 0.7 | 0.817749273 | -0.117749273 | -0.58298 |
15 | 31504 | 0.7 | 0.74682088 | -0.04682088 | -0.23181 |
16 | 29199 | 0.6 | 0.655332382 | -0.055332382 | -0.27395 |
17 | 33072 | 0.6 | 0.809056874 | -0.209056874 | -1.03506 |
18 | 30859 | 0.5 | 0.721219977 | -0.221219977 | -1.09528 |
19 | 32566 | 0.5 | 0.788973065 | -0.288973065 | -1.43073 |
20 | 34296 | 0.5 | 0.857639052 | -0.357639052 | -1.7707 |

PROBABILITY OUTPUT

Percentile | Percent Audited (y) |
---|---|
2.5 | 0.5 |
7.5 | 0.5 |
12.5 | 0.5 |
17.5 | 0.6 |
22.5 | 0.6 |
27.5 | 0.7 |
32.5 | 0.7 |
37.5 | 0.7 |
42.5 | 0.8 |
47.5 | 0.8 |
52.5 | 0.9 |
57.5 | 0.9 |
62.5 | 0.9 |
67.5 | 0.9 |
72.5 | 1 |
77.5 | 1 |
82.5 | 1.1 |
87.5 | 1.1 |
92.5 | 1.1 |
97.5 | 1.3 |