A study investigated the relationship between audit delay (Delay), the length of time from a company's fiscal year-end to the date of the auditor's report, and variables that describe the client and the auditor. The independent variables are as follows.
Industry: A dummy variable coded 1 if the firm was an industrial company or 0 if the firm was a bank, savings and loan, or insurance company.
Public: A dummy variable coded 1 if the company was traded on an organized exchange or over the counter; otherwise coded 0.
Quality: A measure of overall quality of internal controls, as judged by the auditor, on a five-point scale ranging from "virtually none" (1) to "excellent" (5).
Finished: A measure ranging from 1 to 4, as judged by the auditor, where 1 indicates "all work performed subsequent to year-end" and 4 indicates "most work performed prior to year-end."
A sample of 40 companies provided the following data.
Delay | Industry | Public | Quality | Finished |
62 | 0 | 0 | 3 | 1 |
45 | 0 | 1 | 3 | 3 |
54 | 0 | 0 | 2 | 2 |
71 | 0 | 1 | 1 | 2 |
91 | 0 | 0 | 1 | 1 |
62 | 0 | 0 | 4 | 4 |
61 | 0 | 0 | 3 | 2 |
69 | 0 | 1 | 5 | 2 |
80 | 0 | 0 | 1 | 1 |
52 | 0 | 0 | 5 | 3 |
47 | 0 | 0 | 3 | 2 |
65 | 0 | 1 | 2 | 3 |
60 | 0 | 0 | 1 | 3 |
81 | 1 | 0 | 1 | 2 |
73 | 1 | 0 | 2 | 2 |
89 | 1 | 0 | 2 | 1 |
71 | 1 | 0 | 5 | 4 |
76 | 1 | 0 | 2 | 2 |
68 | 1 | 0 | 1 | 2 |
68 | 1 | 0 | 5 | 2 |
86 | 1 | 0 | 2 | 2 |
76 | 1 | 1 | 3 | 1 |
67 | 1 | 0 | 2 | 3 |
57 | 1 | 0 | 4 | 2 |
55 | 1 | 1 | 3 | 2 |
54 | 1 | 0 | 5 | 2 |
69 | 1 | 0 | 3 | 3 |
82 | 1 | 0 | 5 | 1 |
94 | 1 | 0 | 1 | 1 |
74 | 1 | 1 | 5 | 2 |
75 | 1 | 1 | 4 | 3 |
69 | 1 | 0 | 2 | 2 |
71 | 1 | 0 | 4 | 4 |
79 | 1 | 0 | 5 | 2 |
80 | 1 | 0 | 1 | 4 |
91 | 1 | 0 | 4 | 1 |
92 | 1 | 0 | 1 | 4 |
46 | 1 | 1 | 4 | 3 |
72 | 1 | 0 | 5 | 2 |
85 | 1 | 0 | 5 | 1 |
Enter negative values as negative, if necessary.
a. Develop the estimated regression equation using all four independent variables (to 3 decimals, if necessary).
Delay = ____ + ____ Industry + ____ Public + ____ Quality + ____ Finished
b. What is the value of the coefficient of determination (to 3 decimals)? Note: report R² between 0 and 1.
Did the estimated regression equation in part (a) provide a good fit?
c. Which of the following is a scatter diagram for showing Delay as a function of Finished? What does this scatter diagram indicate about the relationship between Delay and Finished?
The scatter diagram of Delay and Finished suggests that ________ exists between these two variables. Add Finished-Squared as a fifth independent variable. Use the best subsets regression procedure to answer the following questions.
Which independent variables provide the best regression model if two independent variables are in the model?
Which independent variables provide the best regression model if three independent variables are in the model?
d. Using the best subsets regression procedure, how many independent variables are in the model with the highest adjusted R²?
What is the value of R²(adj) (to 1 decimal)? Note: report R²(adj) as a percentage.
------------%
a) The model to estimate is: Delay = β0 + β1 Industry + β2 Public + β3 Quality + β4 Finished + ε
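b) Fit the model in R: copy the data (including the header row) from Excel, then run the following commands.
data <- read.table("clipboard", header = TRUE)  # "clipboard" works on Windows; on macOS use pipe("pbpaste")
data
model <- lm(Delay ~ ., data = data)  # regress Delay on all other columns
model
summary(model)  # coefficients, R-squared and adjusted R-squared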
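Because `read.table("clipboard")` depends on the operating system, here is a minimal self-contained alternative that types the 40 rows of the table directly into a data frame and fits the same four-variable model. The data frame name `audit` is illustrative, not from the original; the rows are in the same order as the table, so the printed coefficients and R² can be checked against the values reported below.

```r
# Sketch: enter the 40 observations directly (table row order),
# so no clipboard is needed. `audit` is an illustrative name.
audit <- data.frame(
  Delay    = c(62, 45, 54, 71, 91, 62, 61, 69, 80, 52, 47, 65, 60,
               81, 73, 89, 71, 76, 68, 68, 86, 76, 67, 57, 55, 54, 69,
               82, 94, 74, 75, 69, 71, 79, 80, 91, 92, 46, 72, 85),
  Industry = rep(c(0, 1), times = c(13, 27)),  # first 13 rows are 0
  Public   = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
               0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  Quality  = c(3, 3, 2, 1, 1, 4, 3, 5, 1, 5, 3, 2, 1,
               1, 2, 2, 5, 2, 1, 5, 2, 3, 2, 4, 3, 5, 3,
               5, 1, 5, 4, 2, 4, 5, 1, 4, 1, 4, 5, 5),
  Finished = c(1, 3, 2, 2, 1, 4, 2, 2, 1, 3, 2, 3, 3,
               2, 2, 1, 4, 2, 2, 2, 2, 1, 3, 2, 2, 2, 3,
               1, 1, 2, 3, 2, 4, 2, 4, 1, 4, 3, 2, 1)
)

# Four-variable model from part (a)
fit <- lm(Delay ~ Industry + Public + Quality + Finished, data = audit)

# Coefficients to 3 decimals (part a) and R-squared (part b)
print(round(coef(fit), 3))
r2 <- summary(fit)$r.squared
cat("R-squared:", round(r2, 3), "\n")
```

Typing the data in avoids copy/paste differences between platforms; `summary(fit)` still gives the full output with significance stars.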
The output gives the estimated regression equation for part (a):
Delay = 80.429 + 11.944 Industry - 4.816 Public - 2.624 Quality - 4.073 Finished
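The coefficient of determination is R² = 0.3826, or 0.383 to 3 decimals.
The estimated regression equation does not provide a good fit: the four independent variables explain only about 38% of the variability in Delay.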
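c) Scatter diagram of Delay (vertical axis) against Finished (horizontal axis):
with(data, scatter.smooth(Finished, Delay, xlab = "Finished", ylab = "Delay"))  # x first, then y; adds a smooth curve
The relationship is not a simple linear one: mean Delay falls from about 83.3 at Finished = 1 to 67.4 at Finished = 2 and 59.9 at Finished = 3, then rises to 75.2 at Finished = 4. This suggests a curvilinear relationship between Delay and Finished, which is why Finished-Squared is added as a fifth independent variable.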
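To check the shape numerically rather than by eye, the sketch below types in only the Delay and Finished columns from the table (in row order) and computes the mean Delay at each level of Finished. The vector names `delay`, `finished` and `means` are illustrative. A dip followed by a rise in these means is the curvature that Finished-Squared is meant to capture.

```r
# Delay and Finished columns, copied from the data table in row order
delay <- c(62, 45, 54, 71, 91, 62, 61, 69, 80, 52, 47, 65, 60,
           81, 73, 89, 71, 76, 68, 68, 86, 76, 67, 57, 55, 54, 69,
           82, 94, 74, 75, 69, 71, 79, 80, 91, 92, 46, 72, 85)
finished <- c(1, 3, 2, 2, 1, 4, 2, 2, 1, 3, 2, 3, 3,
              2, 2, 1, 4, 2, 2, 2, 2, 1, 3, 2, 2, 2, 3,
              1, 1, 2, 3, 2, 4, 2, 4, 1, 4, 3, 2, 1)

# Mean Delay at each level of Finished (names are "1" to "4")
means <- sapply(split(delay, finished), mean)
print(round(means, 1))

# Scatter diagram with the level means joined by a line
plot(finished, delay, xlab = "Finished", ylab = "Delay")
lines(as.numeric(names(means)), means, type = "b", pch = 19)
```

The printed means are 83.3, 67.4, 59.9 and 75.2 for Finished = 1 to 4.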
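Which independent variables provide the best regression model if two independent variables are in the model?
=> Dropping Public and Finished, the variables with the fewest significance stars in the full model, gives the following model in R:
model1 <- lm(Delay ~ Industry + Quality, data = data)
model1
summary(model1)
R² = 0.2689, adjusted R² = 0.2293
This adjusted R² is lower than that of the four-variable model. Note that this checks only one two-variable model; the best subsets procedure compares every two-variable subset, including those containing Finished-Squared.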
Which independent variables provide the best regression model if three independent variables are in the model?
=> Dropping only Public, the one variable with no significance star, gives:
model2 <- lm(Delay ~ Industry + Quality + Finished, data = data)
model2
summary(model2)
R² = 0.3597, adjusted R² = 0.3063
This is also lower than the adjusted R² of the four-variable model.
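The comparisons above say the reduced models fall short of the four-variable model but never state that model's adjusted R². A minimal sketch computes it from the reported R² values with the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n = 40 observations and p is the number of independent variables. The helper name `adj_r2` is illustrative, not from the original.

```r
# Adjusted R-squared from R-squared, sample size n and p predictors
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 40
reported <- data.frame(
  model = c("Industry + Public + Quality + Finished",
            "Industry + Quality + Finished",
            "Industry + Quality"),
  r2 = c(0.3826, 0.3597, 0.2689),  # R-squared values reported above
  p  = c(4, 3, 2)
)
reported$adj_r2 <- round(adj_r2(reported$r2, n, reported$p), 4)
print(reported)
```

This gives about 0.3120 for the four-variable model, 0.3063 for the three-variable model and 0.2294 for the two-variable model (the 0.2293 in the output differs only because R² was rounded before applying the formula).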
For models with a single independent variable, the adjusted R² values are:
model <- lm(Delay ~ Industry, data = data)
model
summary(model)
adjusted R² = 0.137
model <- lm(Delay ~ Quality, data = data)
model
summary(model)
adjusted R² = 0.04
model <- lm(Delay ~ Finished, data = data)
model
summary(model)
adjusted R² = 0.07
Among these single-variable models, Industry has the highest adjusted R², 13.7%.
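The models above are hand-picked and none includes Finished-Squared, so they do not actually carry out best subsets regression. Below is a minimal base-R sketch of that procedure, written as an exhaustive search with `combn()` and `lm()` rather than the output of a dedicated package such as leaps. The names `audit`, `FinishedSq`, `best_of_size`, `best` and `top` are illustrative. For each size k the code fits every k-variable model and keeps the one with the largest R². Within a single size, n and k are fixed, so the model with the largest R² also has the largest adjusted R².

```r
# The 40 observations in table row order (same data as the table above)
audit <- data.frame(
  Delay    = c(62, 45, 54, 71, 91, 62, 61, 69, 80, 52, 47, 65, 60,
               81, 73, 89, 71, 76, 68, 68, 86, 76, 67, 57, 55, 54, 69,
               82, 94, 74, 75, 69, 71, 79, 80, 91, 92, 46, 72, 85),
  Industry = rep(c(0, 1), times = c(13, 27)),
  Public   = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
               0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  Quality  = c(3, 3, 2, 1, 1, 4, 3, 5, 1, 5, 3, 2, 1,
               1, 2, 2, 5, 2, 1, 5, 2, 3, 2, 4, 3, 5, 3,
               5, 1, 5, 4, 2, 4, 5, 1, 4, 1, 4, 5, 5),
  Finished = c(1, 3, 2, 2, 1, 4, 2, 2, 1, 3, 2, 3, 3,
               2, 2, 1, 4, 2, 2, 2, 2, 1, 3, 2, 2, 2, 3,
               1, 1, 2, 3, 2, 4, 2, 4, 1, 4, 3, 2, 1)
)
audit$FinishedSq <- audit$Finished^2  # fifth independent variable

predictors <- c("Industry", "Public", "Quality", "Finished", "FinishedSq")

# Fit every k-variable model and keep the one with the largest R-squared
best_of_size <- function(k) {
  fits <- lapply(combn(predictors, k, simplify = FALSE), function(vars) {
    s <- summary(lm(reformulate(vars, response = "Delay"), data = audit))
    data.frame(k = k, vars = paste(vars, collapse = " + "),
               r2 = s$r.squared, adj_r2 = s$adj.r.squared)
  })
  fits <- do.call(rbind, fits)
  fits[which.max(fits$r2), ]
}

# One row per model size, k = 1 to 5
best <- do.call(rbind, lapply(seq_along(predictors), best_of_size))
print(best, row.names = FALSE)

# Model size with the highest adjusted R-squared, as a percentage (part d)
top <- best[which.max(best$adj_r2), ]
cat("Highest adjusted R-squared:", top$k, "variables,",
    round(100 * top$adj_r2, 1), "%\n")
```

The rows for k = 2 and k = 3 answer the two- and three-variable questions in part (c), and the last line answers part (d). No results are quoted here; run the code to read them off.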