A statistical program is recommended.
A study investigated the relationship between audit delay (Delay), the length of time from a company's fiscal year-end to the date of the auditor's report, and variables that describe the client and the auditor. Some of the independent variables that were included in this study follow.
Variable | Description |
---|---|
Industry | A dummy variable coded 1 if the firm was an industrial company or 0 if the firm was a bank, savings and loan, or insurance company. |
Public | A dummy variable coded 1 if the company was traded on an organized exchange or over the counter; otherwise coded 0. |
Quality | A measure of overall quality of internal controls, as judged by the auditor, on a five-point scale ranging from "virtually none" (1) to "excellent" (5). |
Finished | A measure ranging from 1 to 4, as judged by the auditor, where 1 indicates "all work performed subsequent to year-end" and 4 indicates "most work performed prior to year-end." |
A sample of 40 companies provided the following data.
Delay | Industry | Public | Quality | Finished |
---|---|---|---|---|
62 | 0 | 0 | 3 | 1 |
45 | 0 | 1 | 3 | 3 |
54 | 0 | 0 | 2 | 2 |
71 | 0 | 1 | 1 | 2 |
91 | 0 | 0 | 1 | 1 |
62 | 0 | 0 | 4 | 4 |
61 | 0 | 0 | 3 | 2 |
69 | 0 | 1 | 5 | 2 |
80 | 0 | 0 | 1 | 1 |
52 | 0 | 0 | 5 | 3 |
47 | 0 | 0 | 3 | 2 |
65 | 0 | 1 | 2 | 3 |
60 | 0 | 0 | 1 | 3 |
81 | 1 | 0 | 1 | 2 |
73 | 1 | 0 | 2 | 2 |
89 | 1 | 0 | 2 | 1 |
71 | 1 | 0 | 5 | 4 |
76 | 1 | 0 | 2 | 2 |
68 | 1 | 0 | 1 | 2 |
68 | 1 | 0 | 5 | 2 |
86 | 1 | 0 | 2 | 2 |
76 | 1 | 1 | 3 | 1 |
67 | 1 | 0 | 2 | 3 |
57 | 1 | 0 | 4 | 2 |
55 | 1 | 1 | 3 | 2 |
54 | 1 | 0 | 5 | 2 |
69 | 1 | 0 | 3 | 3 |
82 | 1 | 0 | 5 | 1 |
94 | 1 | 0 | 1 | 1 |
74 | 1 | 1 | 5 | 2 |
75 | 1 | 1 | 4 | 3 |
69 | 1 | 0 | 2 | 2 |
71 | 1 | 0 | 4 | 4 |
79 | 1 | 0 | 5 | 2 |
80 | 1 | 0 | 1 | 4 |
91 | 1 | 0 | 4 | 1 |
92 | 1 | 0 | 1 | 4 |
46 | 1 | 1 | 4 | 3 |
72 | 1 | 0 | 5 | 2 |
85 | 1 | 0 | 5 | 1 |
(a) Develop the estimated regression equation using all of the independent variables. Use x1 for Industry, x2 for Public, x3 for Quality, and x4 for Finished. (Round your numerical values to two decimal places.)
ŷ =
(b) Did the estimated regression equation developed in part (a) provide a good fit? Explain. (Use α = 0.05. For purposes of this exercise, consider an adjusted coefficient of determination value high if it is at least 50%.)
No, testing for significance shows that all independent variables except Public are not significant.
Yes, testing for significance shows that the overall model is significant and all the individual independent variables are significant.
Yes, the low p-value and high value of the adjusted coefficient of determination indicate a good fit.
No, the low value of the adjusted coefficient of determination does not indicate a good fit.
(c) Develop a scatter diagram showing Delay as a function of Finished.
What does this scatter diagram indicate about the relationship between Delay and Finished?
The scatter diagram suggests a linear relationship between these two variables.
The scatter diagram suggests no relationship between these two variables.
The scatter diagram suggests a curvilinear relationship between these two variables.
(d) On the basis of your observations about the relationship between Delay and Finished, use best-subsets regression to develop an alternative estimated regression equation to the one developed in (a) to explain as much of the variability in Delay as possible. Use x1 for Industry, x2 for Public, x3 for Quality, and x4 for Finished. (Round your numerical values to two decimal places.)
ŷ =
(a)
R CODE:
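# Response: audit delay for the 40 sampled companies
Delay <- c(62,45,54,71,91,62,61,69,80,52,47,65,60,81,73,89,71,76,68,68,
           86,76,67,57,55,54,69,82,94,74,75,69,71,79,80,91,92,46,72,85)
# Industry follows the data table: 13 firms coded 0, then 27 coded 1.
# The output printed below was generated with a 12/28 split, so a rerun
# may give different estimates.
Industry <- c(rep(0, times = 13), rep(1, times = 27))
Public <- c(0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,
            0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0)
Quality <- c(3,3,2,1,1,4,3,5,1,5,3,2,1,1,2,2,5,2,1,5,
             2,3,2,4,3,5,3,5,1,5,4,2,4,5,1,4,1,4,5,5)
Finished <- c(1,3,2,2,1,4,2,2,1,3,2,3,3,2,2,1,4,2,2,2,
              2,1,3,2,2,2,3,1,1,2,3,2,4,2,4,1,4,3,2,1)
# Fit Delay on all four independent variables and print the summary
summary(lm(Delay ~ Industry + Public + Quality + Finished))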
R OUTPUT:
Call:
lm(formula = Delay ~ Industry + Public + Quality + Finished)
Residuals:
Min 1Q Median 3Q Max
-18.3562 -7.7477 0.7713 7.9524 20.2849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 80.589 6.165 13.072 4.99e-15 ***
Industry 10.775 3.978 2.709 0.0104 *
Public -4.745 4.375 -1.084 0.2856
Quality -2.316 1.205 -1.922 0.0628 .
Finished -4.333 1.909 -2.270 0.0295 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 11.25 on 35 degrees of freedom
Multiple R-squared: 0.3453, Adjusted R-squared: 0.2705
F-statistic: 4.615 on 4 and 35 DF, p-value: 0.004248
ANSWER:
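The estimated regression equation, with coefficients as printed in the output above (round to two decimal places for the answer field), is
ŷ = 80.589 + 10.775x1 - 4.745x2 - 2.316x3 - 4.333x4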
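(b) The answer follows from the R OUTPUT of part (a). The overall F-test p-value is 0.004248, which is below α = 0.05, so the model as a whole is significant. However, the adjusted R-squared is only 0.2705, well below the 50% threshold.
No, the low value of the adjusted coefficient of determination does not indicate a good fit.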
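As a quick check on the figures read from the output, the adjusted coefficient of determination and the F statistic can be recomputed by hand from the printed R-squared of 0.3453, with n = 40 observations and p = 4 independent variables (these symbols are introduced here for the calculation only):

$$
R_a^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} = 1 - (1 - 0.3453)\,\frac{39}{35} \approx 0.2705
$$

$$
F = \frac{R^2 / p}{(1 - R^2)/(n-p-1)} = \frac{0.3453/4}{0.6547/35} \approx 4.615
$$

Both values agree with the printed output, so the conclusion rests on the 27% adjusted value falling short of the 50% threshold rather than on any lack of overall significance.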
(c)
R CODE:
plot(Finished,Delay)
R OUTPUT: (scatter plot of Delay on the vertical axis against Finished on the horizontal axis)
The scatter diagram suggests a curvilinear relationship between these two variables.
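One way to back up the visual impression is to compare the mean Delay at each level of Finished. The following is a minimal sketch, not part of the original answer; the data vectors are repeated so that it runs on its own, and `group_means` is a name introduced here for illustration.

```r
# Data copied from the table in the question
Delay <- c(62,45,54,71,91,62,61,69,80,52,47,65,60,81,73,89,71,76,68,68,
           86,76,67,57,55,54,69,82,94,74,75,69,71,79,80,91,92,46,72,85)
Finished <- c(1,3,2,2,1,4,2,2,1,3,2,3,3,2,2,1,4,2,2,2,
              2,1,3,2,2,2,3,1,1,2,3,2,4,2,4,1,4,3,2,1)

# Mean audit delay at each level of Finished (1 to 4)
group_means <- tapply(Delay, Finished, mean)
print(round(group_means, 1))

# Scatter plot with the group means overlaid as connected points
plot(Finished, Delay, xlab = "Finished", ylab = "Delay")
lines(as.numeric(names(group_means)), group_means, type = "b", pch = 19)
```

If the means fall across the lower levels and rise again at the highest level, that pattern points to a curvilinear rather than a linear relationship.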
(d)
A model that includes all interaction terms among the four independent variables may be considered.
R CODE:
summary(lm(Delay~Industry*Public*Quality*Finished))
R OUTPUT:
lm(formula = Delay ~ Industry * Public * Quality * Finished)
Residuals:
Min 1Q Median 3Q Max
-18.576 -4.810 0.000 5.833 16.884
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 129.352 23.998 5.390 1.55e-05 ***
Industry -45.166 26.962 -1.675 0.1069
Public -124.852 92.919 -1.344 0.1916
Quality -22.480 8.355 -2.691 0.0128 *
Finished -30.081 16.441 -1.830 0.0798 .
Industry:Public 211.166 172.887 1.221 0.2338
Industry:Quality 22.250 9.030 2.464 0.0213 *
Public:Quality 60.980 34.255 1.780 0.0877 .
Industry:Finished 28.626 17.092 1.675 0.1070
Public:Finished 63.581 45.157 1.408 0.1720
Quality:Finished 8.654 4.226 2.048 0.0517 .
Industry:Public:Quality -85.250 54.211 -1.573 0.1289
Industry:Public:Finished -134.126 92.038 -1.457 0.1580
Industry:Quality:Finished -9.410 4.443 -2.118 0.0447 *
Public:Quality:Finished -28.154 16.579 -1.698 0.1024
Industry:Public:Quality:Finished 45.910 27.578 1.665 0.1090
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 11 on 24 degrees of freedom
Multiple R-squared: 0.5709, Adjusted R-squared: 0.3026
F-statistic: 2.128 on 15 and 24 DF, p-value: 0.04785
ANSWER:
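The estimated regression equation is obtained by reading each coefficient from the Estimate column of the output above.
Observe that the adjusted coefficient of determination rises from 0.2705 in (a) to 0.3026, so this model fits somewhat better than the one formulated in (a), although it remains below the 50% threshold used in (b).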
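Part (d) asks for best-subsets regression informed by the curvilinear pattern in (c). A rough sketch of that idea follows, under several assumptions that are not in the original answer: a squared term, here given the hypothetical name `FinishedSq`, stands in for the curvature; every subset is enumerated in base R rather than with a dedicated package; subsets are ranked by adjusted R-squared; and Industry is coded from the data table (13 zeros, then 27 ones).

```r
# Data copied from the table in the question
Delay <- c(62,45,54,71,91,62,61,69,80,52,47,65,60,81,73,89,71,76,68,68,
           86,76,67,57,55,54,69,82,94,74,75,69,71,79,80,91,92,46,72,85)
Industry <- c(rep(0, 13), rep(1, 27))
Public <- c(0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,
            0,1,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0)
Quality <- c(3,3,2,1,1,4,3,5,1,5,3,2,1,1,2,2,5,2,1,5,
             2,3,2,4,3,5,3,5,1,5,4,2,4,5,1,4,1,4,5,5)
Finished <- c(1,3,2,2,1,4,2,2,1,3,2,3,3,2,2,1,4,2,2,2,
              2,1,3,2,2,2,3,1,1,2,3,2,4,2,4,1,4,3,2,1)

# FinishedSq is an assumed extra predictor capturing curvature in Finished
audit <- data.frame(Delay, Industry, Public, Quality, Finished,
                    FinishedSq = Finished^2)
candidates <- c("Industry", "Public", "Quality", "Finished", "FinishedSq")

# Fit every non-empty subset of the candidates and record adjusted R-squared
results <- do.call(rbind, lapply(seq_along(candidates), function(k) {
  subsets <- combn(candidates, k, simplify = FALSE)
  do.call(rbind, lapply(subsets, function(vars) {
    fit <- lm(reformulate(vars, response = "Delay"), data = audit)
    data.frame(size = k,
               terms = paste(vars, collapse = " + "),
               adj_r2 = summary(fit)$adj.r.squared)
  }))
}))

# Show the five subsets with the highest adjusted R-squared
print(head(results[order(-results$adj_r2), ], 5))
```

Calling `summary()` on the top-ranked subset then gives the coefficients for the alternative equation. Compared with the full interaction model, this keeps the number of estimated terms small relative to the 40 observations.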