4. The Transactional Records Access Clearinghouse at Syracuse University reported data showing the chance of an Internal Revenue Service (IRS) audit. The data in file IRSAudit.xlsx show the average adjusted gross income reported (in $ thousands) and the percent of the returns that were audited for 20 selected IRS districts.
District | Adjusted Gross Income ($ thousands) | Percent Audited (%) |
Los Angeles | 36.664 | 1.3 |
Sacramento | 38.845 | 1.1 |
Atlanta | 34.886 | 1.1 |
Boise | 32.512 | 1.1 |
Dallas | 34.531 | 1.0 |
Providence | 35.995 | 1.0 |
San Jose | 37.799 | 0.9 |
Cheyenne | 33.876 | 0.9 |
Fargo | 30.513 | 0.9 |
New Orleans | 30.174 | 0.9 |
Oklahoma City | 30.060 | 0.8 |
Houston | 37.153 | 0.8 |
Portland | 34.918 | 0.7 |
Phoenix | 33.291 | 0.7 |
Augusta | 31.504 | 0.7 |
Albuquerque | 29.199 | 0.6 |
Greensboro | 33.072 | 0.6 |
Columbia | 30.859 | 0.5 |
Nashville | 32.566 | 0.5 |
Buffalo | 34.296 | 0.5 |
Hint: Select Visualizing data > Scatter plots, select cells B1:B21 for X and select cells C1:C21 for Y. Click “Options” and check “Regression lines.”
Hint: Select Modeling data > Linear regression, select cells C1:C21 for “Y / Dependent variables: Quantitative” and select cells B1:B21 for “X / Explanatory variables: Quantitative.”
Hint: You can use either a t-test or an F-test to answer this question. In your answer state the hypotheses, test statistic, p-value, decision, and conclusion.
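Ans a) The scatter plot has average adjusted gross income ($ thousands) on the horizontal axis and percent of returns audited on the vertical axis.
The scatter plot indicates a positive relationship between the average adjusted gross income reported and the percent of returns audited: districts with higher average income tend to have a higher audit percentage.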
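As a rough numeric check on the visual impression, the sample correlation coefficient can be computed directly from the data table. The following is a minimal sketch in plain Python; the variable names (`income`, `audited`) are illustrative and not part of the original spreadsheet workflow.

```python
import math

# Data from the table: adjusted gross income ($ thousands) and percent audited
income = [36.664, 38.845, 34.886, 32.512, 34.531, 35.995, 37.799, 33.876,
          30.513, 30.174, 30.060, 37.153, 34.918, 33.291, 31.504, 29.199,
          33.072, 30.859, 32.566, 34.296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(audited) / n

# Sums of squares and cross-products about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, audited))
sxx = sum((x - mean_x) ** 2 for x in income)
syy = sum((y - mean_y) ** 2 for y in audited)

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.4f}")
```

A positive value of r agrees with the upward trend in the plot, and it should be close to the Multiple R value (0.4659) in the regression output.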
b) The regression output is:
Simple Linear Regression Analysis
Regression Statistics
Multiple R | 0.4659 |
R Square | 0.2171 |
Adjusted R Square | 0.1736 |
Standard Error | 0.2088 |
Observations | 20 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 1 | 0.2175 | 0.2175 | 4.9901 | 0.0384 |
Residual | 18 | 0.7845 | 0.0436 |
Total | 19 | 1.0020 |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | -0.4710 | 0.5842 | -0.8061 | 0.4307 | -1.6984 | 0.7565 |
Adjusted Gross Income | 0.0387 | 0.0173 | 2.2339 | 0.0384 | 0.0023 | 0.0751 |
The estimated regression equation is
Percent Audited = -0.4710 + 0.0387 * Adjusted Gross Income (in $ thousands)
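The coefficients in this equation can be reproduced from the data with the usual least squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below uses plain Python with illustrative variable names; it is a check on the spreadsheet output, not the method the original answer used.

```python
# Data from the table: adjusted gross income ($ thousands) and percent audited
income = [36.664, 38.845, 34.886, 32.512, 34.531, 35.995, 37.799, 33.876,
          30.513, 30.174, 30.060, 37.153, 34.918, 33.291, 31.504, 29.199,
          33.072, 30.859, 32.566, 34.296]
audited = [1.3, 1.1, 1.1, 1.1, 1.0, 1.0, 0.9, 0.9, 0.9, 0.9,
           0.8, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6, 0.5, 0.5, 0.5]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(audited) / n

# Cross-product and sum of squares about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, audited))
sxx = sum((x - mean_x) ** 2 for x in income)

# Least squares estimates
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Share of the variation in percent audited explained by income (R Square)
sst = sum((y - mean_y) ** 2 for y in audited)
r_square = slope * sxy / sst

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {r_square:.4f}")
```

The printed values should agree with the Coefficients column and the R Square entry of the regression output up to rounding.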
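c) The null and alternative hypotheses are
H0: β1 = 0 (there is no linear relationship between adjusted gross income and percent audited)
Ha: β1 ≠ 0 (there is a linear relationship between adjusted gross income and percent audited)
Test statistic: F = 4.9901 with 1 and 18 degrees of freedom (equivalently, t = 2.2339 with 18 degrees of freedom).
p-value = 0.0384
Decision: since the p-value (0.0384) is less than 0.05, we reject H0.
Conclusion: there is a significant linear relationship between the two variables at the 0.05 level of significance.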
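To see where the p-value comes from, the two-sided tail area of the t distribution can be approximated numerically. The sketch below takes t = 2.2339 and 18 degrees of freedom from the regression output and integrates the Student t density with Simpson's rule; the helper names `t_pdf` and `two_sided_p` are hypothetical, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus twice the area under the density on [0, |t|].

    The area is approximated with Simpson's rule (steps must be even).
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

t_stat = 2.2339      # t Stat for Adjusted Gross Income in the regression output
df = 18              # residual degrees of freedom (n - 2)
f_stat = t_stat ** 2 # in simple linear regression, F equals t squared

p_value = two_sided_p(t_stat, df)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Because F equals t squared when there is a single explanatory variable, the t-test and the F-test give the same p-value and therefore the same decision.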
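d) For an adjusted gross income of $35,000, use x = 35 because income is measured in $ thousands.
Percent Audited = -0.4710 + 0.0387 * 35 = 0.8835, so about 0.88% of returns would be expected to be audited in a district with that average income.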
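The point estimate can be checked with a few lines of Python. This sketch plugs the rounded coefficients from the regression output into the estimated equation; the function name `predict_percent_audited` is illustrative only.

```python
# Rounded coefficients taken from the regression output
INTERCEPT = -0.4710
SLOPE = 0.0387

def predict_percent_audited(income_thousands):
    """Estimated percent of returns audited for a given average income ($ thousands)."""
    return INTERCEPT + SLOPE * income_thousands

# $35,000 corresponds to 35 in the units used by the data
estimate = predict_percent_audited(35)
print(f"{estimate:.4f}")
```

Note that the input is 35, not 35000; passing the dollar amount directly would give a meaningless estimate because the model was fitted to income in thousands.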