Question

Structurally deficient highway bridges. Data on structurally deficient highway bridges is compiled by the Federal Highway Administration (FHWA) and reported in the National Bridge Inventory (NBI). For each state, the NBI lists the number of structurally deficient bridges and the total area (thousands of square feet) of the deficient bridges. The data for the 50 states (plus the District of Columbia and Puerto Rico) are listed below. For future planning and budgeting, the FHWA wants to estimate the total area of structurally deficient bridges in a state based on the number of deficient bridges.

NumberSD    SDArea (thousands of sq. ft.)
1899        432.71
155         60.92
181         110.57
997         347.35
3140        5177.97
580         316.92
358         387.78
20          9.05
24          59.34
302         412.92
1028        344.86
142         39.8
349         135.43
2501        1192.43
2030        688.19
5153        1069.71
2991        527.47
1362        458.37
1780        1453.26
349         131.13
388         236.18
585         521.83
1584        804.15
1156        325.9
3002        692.75
4433        1187.42
473         90.94
2382        335.75
47          20.08
383         127.66
750         752.43
404         196.67
2128        1427.73
2272        1034.61
743         101.42
2862        965.16
5793        1423.25
514         393.96
5802        2404.61
164         237.96
1260        626.38
1216        209.33
1325        481.31
2186        1031.45
233         102.56
500         153.8
1208        483.68
400         502.03
1058        331.59
1302        399.8
389         143.46
241         195.43

a) Based on the scatterplot, can you use linear regression to predict SDArea based on NumberSD? Explain.

b) Develop a simple linear regression equation to predict SDArea based on NumberSD.

c) Is the model you found in (b) a good fit? Why or why not?

d) Predict the SDArea when the NumberSD is 1260 bridges. Find the corresponding residual.

e) Build a 90% confidence interval (CI) for the coefficient of NumberSD (b1).

f) Repeat (e) with a 95% CI. What is the difference between your answers in (e) and (f)?

Solutions

Expert Solution

We will use R to draw the scatterplot and to fit a regression model.

First we will import data into R

> NumberSD=scan("clipboard")
Read 52 items
> SDArea=scan("clipboard")
Read 52 items

> head(data.frame(NumberSD,SDArea),10)          # to print first 10 observations
   NumberSD  SDArea
1      1899  432.71
2       155   60.92
3       181  110.57
4       997  347.35
5      3140 5177.97
6       580  316.92
7       358  387.78
8        20    9.05
9        24   59.34
10      302  412.92
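
Reading from the clipboard depends on the platform and on whatever was copied last, so the import step is hard to reproduce. As a minimal sketch of an alternative, the same 52 pairs can be typed in directly. The data frame name `bridges` is an assumption made for this sketch and is not part of the original solution.

```r
# The 52 (NumberSD, SDArea) pairs from the question, entered in order.
NumberSD <- c(1899, 155, 181, 997, 3140, 580, 358, 20, 24, 302,
              1028, 142, 349, 2501, 2030, 5153, 2991, 1362, 1780, 349,
              388, 585, 1584, 1156, 3002, 4433, 473, 2382, 47, 383,
              750, 404, 2128, 2272, 743, 2862, 5793, 514, 5802, 164,
              1260, 1216, 1325, 2186, 233, 500, 1208, 400, 1058, 1302,
              389, 241)
SDArea <- c(432.71, 60.92, 110.57, 347.35, 5177.97, 316.92, 387.78, 9.05,
            59.34, 412.92, 344.86, 39.8, 135.43, 1192.43, 688.19, 1069.71,
            527.47, 458.37, 1453.26, 131.13, 236.18, 521.83, 804.15, 325.9,
            692.75, 1187.42, 90.94, 335.75, 20.08, 127.66, 752.43, 196.67,
            1427.73, 1034.61, 101.42, 965.16, 1423.25, 393.96, 2404.61, 237.96,
            626.38, 209.33, 481.31, 1031.45, 102.56, 153.8, 483.68, 502.03,
            331.59, 399.8, 143.46, 195.43)

# `bridges` is a hypothetical name chosen for this sketch.
bridges <- data.frame(NumberSD, SDArea)

# Guard against a transcription slip: 50 states + DC + Puerto Rico.
stopifnot(nrow(bridges) == 52)

head(bridges, 10)   # same first 10 observations as shown above
```

With the vectors defined this way, the later commands (`plot`, `lm`, `summary`) run unchanged.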

a) Based on the scatterplot, can you use linear regression to predict SDArea based on NumberSD? Explain.

> plot(NumberSD,SDArea,main="Scatter Plot",col=12,pch=19)

The scatterplot shows a positive (increasing) trend, which implies a positive correlation between the two variables. There is one clear outlier (NumberSD = 3140, SDArea = 5177.97), but the overall pattern is roughly linear, so linear regression can be used to predict SDArea based on NumberSD.

b) Develop a simple linear regression equation to predict SDArea based on NumberSD.

R code and output

> fit=lm(SDArea~NumberSD)       # to find regression equation


> summary(fit)                           # to print summary

Call:
lm(formula = SDArea ~ NumberSD)

Residuals:
    Min      1Q  Median      3Q     Max
 -831.0  -146.3  -107.2   104.7  3972.9

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 119.86392  123.02005   0.974    0.335
NumberSD      0.34560    0.06158   5.613 8.69e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 635.2 on 50 degrees of freedom
Multiple R-squared: 0.3865,    Adjusted R-squared: 0.3743
F-statistic: 31.5 on 1 and 50 DF, p-value: 8.695e-07

> anova(fit)                                  # to print ANOVA
Analysis of Variance Table

Response: SDArea
          Df   Sum Sq  Mean Sq F value    Pr(>F)
NumberSD   1 12710096 12710096  31.503 8.695e-07 ***
Residuals 50 20173064   403461
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Simple linear regression equation to predict SDArea based on NumberSD is

SDArea (y) = 119.8639 + 0.3456 * NumberSD

c) Is the model you found in (b) a good fit? Why or why not?

The coefficient of determination is

R-squared: 0.3865

This implies that only 38.65% of the variability in SDArea is explained by the independent variable NumberSD, which is too low to consider the fitted model a good fit.
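
As a quick cross-check, the fit statistics can be recomputed from the sums of squares printed by `anova(fit)`. This is only a sketch of the arithmetic; the variable names are chosen here for illustration.

```r
# Sums of squares copied from the anova(fit) output above.
ss_reg <- 12710096   # NumberSD (regression) sum of squares, 1 df
ss_res <- 20173064   # residual sum of squares, 50 df

r_squared <- ss_reg / (ss_reg + ss_res)     # share of total variation explained
f_stat    <- (ss_reg / 1) / (ss_res / 50)   # ratio of the two mean squares
rse       <- sqrt(ss_res / 50)              # residual standard error

round(c(r_squared = r_squared, f_stat = f_stat, rse = rse), 4)
```

The three values should agree with the R-squared, F-statistic and residual standard error reported by `summary(fit)` to the digits shown there.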

We will check the residual plot

> Residual=resid(fit)                    # residuals from the fitted model
> plot(NumberSD,Residual,main="Residual Plot",col=2,pch=19)
> abline(h=0)                            # horizontal reference line at zero

The residual plot also shows an outlier, which suggests that a transformation should be used. Hence the model found in (b) is not a good fit. Since the two variables are positively correlated, a simple regression model can still be used to predict SDArea, but a transformation such as the square root or the logarithm should be applied first.
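
One possible way to try the suggested transformation is sketched below. The log-log specification is an assumption made for illustration, since the solution does not name a specific transformed model, and no results from it are claimed here. The data vectors are repeated so the sketch runs on its own.

```r
# The 52 (NumberSD, SDArea) pairs from the question, entered in order.
NumberSD <- c(1899, 155, 181, 997, 3140, 580, 358, 20, 24, 302,
              1028, 142, 349, 2501, 2030, 5153, 2991, 1362, 1780, 349,
              388, 585, 1584, 1156, 3002, 4433, 473, 2382, 47, 383,
              750, 404, 2128, 2272, 743, 2862, 5793, 514, 5802, 164,
              1260, 1216, 1325, 2186, 233, 500, 1208, 400, 1058, 1302,
              389, 241)
SDArea <- c(432.71, 60.92, 110.57, 347.35, 5177.97, 316.92, 387.78, 9.05,
            59.34, 412.92, 344.86, 39.8, 135.43, 1192.43, 688.19, 1069.71,
            527.47, 458.37, 1453.26, 131.13, 236.18, 521.83, 804.15, 325.9,
            692.75, 1187.42, 90.94, 335.75, 20.08, 127.66, 752.43, 196.67,
            1427.73, 1034.61, 101.42, 965.16, 1423.25, 393.96, 2404.61, 237.96,
            626.38, 209.33, 481.31, 1031.45, 102.56, 153.8, 483.68, 502.03,
            331.59, 399.8, 143.46, 195.43)

fit     <- lm(SDArea ~ NumberSD)              # original model from part (b)
fit_log <- lm(log(SDArea) ~ log(NumberSD))    # assumed log-log alternative

# Position of the observation with the largest absolute residual
# in the original fit (the outlier visible in both plots).
which.max(abs(resid(fit)))

# R-squared of each model, each measured on its own response scale.
c(linear = summary(fit)$r.squared, loglog = summary(fit_log)$r.squared)
```

Because the two models have different response variables (SDArea versus log SDArea), their R-squared values are not directly comparable; a residual plot of `fit_log` is the more informative check.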

d) Predict the SDArea when the NumberSD is 1260 bridges. Find the corresponding residuals.

Given NumberSD = 1260

The regression equation to predict SDArea based on NumberSD is

SDArea (y) = 119.8639 + 0.3456 * NumberSD

Thus SDArea (y) = 119.8639 + 0.3456 * 1260

      SDArea (y) = 555.3199

Hence the predicted SDArea is 555.3199

The actual value of SDArea corresponding to NumberSD = 1260 is 626.38 (observation 41 in the data):

> data.frame(NumberSD[40:45],SDArea[40:45])
  NumberSD.40.45. SDArea.40.45.
1             164        237.96
2            1260        626.38
3            1216        209.33
4            1325        481.31
5            2186       1031.45
6             233        102.56

Hence the predicted value of SDArea is 555.3199 {at NumberSD = 1260}

and the actual value of SDArea is 626.38 {at NumberSD = 1260}

Thus Residual = Actual value - Predicted value = 626.38 - 555.3199 = 71.0601

The corresponding residual is 71.0601 (thousand square feet), so the model underpredicts the area for this state.
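
The same arithmetic can be written as a small helper. This is a sketch only: `predict_sdarea` is a hypothetical function name, and the coefficients are the rounded values copied from the `summary(fit)` output.

```r
# Coefficients copied from summary(fit).
b0 <- 119.86392   # intercept
b1 <- 0.34560     # slope for NumberSD

# Hypothetical helper: fitted SDArea for a given number of deficient bridges.
predict_sdarea <- function(number_sd) b0 + b1 * number_sd

predicted <- predict_sdarea(1260)
observed  <- 626.38                 # SDArea recorded at NumberSD = 1260
residual  <- observed - predicted   # positive means the model underpredicts

c(predicted = predicted, residual = residual)
```

With the fitted model object available, `predict(fit, data.frame(NumberSD = 1260))` and `resid(fit)[41]` give the same quantities from the unrounded coefficients.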

e) Build a 90% confidence interval (CI) for the coefficient of NumberSD (b1).

The 90% confidence interval for b1 is given by

CI = { b1 - t(α/2, n-2) * SE(b1) , b1 + t(α/2, n-2) * SE(b1) }

Here b1 = 0.34560    and    SE(b1) = 0.06158

The critical value t(α/2, n-2) comes from the t-distribution with n-2 = 52-2 = 50 degrees of freedom, with α = 0.10 for a 90% CI.

It can be read from a statistical table or, more accurately, computed with software such as R or Excel.

From R

> qt(1-.1/2,50)
[1] 1.675905

Thus t(0.05, 50) = 1.675905

Hence the 90% confidence interval for b1 is given by

CI = { b1 - t(0.05, 50) * SE(b1) , b1 + t(0.05, 50) * SE(b1) }

    = { 0.34560 - 1.675905 * 0.06158 , 0.34560 + 1.675905 * 0.06158 }

    = { 0.2423978 , 0.4488022 }

The 90% confidence interval for the coefficient of NumberSD (b1) is { 0.24240 , 0.44880 }

f) Repeat (e) with a 95% CI. What is the difference between your answer in (e) and (f)?

The 95% confidence interval for b1 is given by

CI = { b1 - t(α/2, n-2) * SE(b1) , b1 + t(α/2, n-2) * SE(b1) }

with b1 = 0.34560 and SE(b1) = 0.06158. The critical value again comes from the t-distribution with 50 degrees of freedom, but now α = 0.05 for a 95% CI.

From R

> qt(1-.05/2,50)
[1] 2.008559

Thus t(0.025, 50) = 2.008559

Hence the 95% confidence interval for b1 is given by

CI = { b1 - t(0.025, 50) * SE(b1) , b1 + t(0.025, 50) * SE(b1) }

    = { 0.34560 - 2.008559 * 0.06158 , 0.34560 + 2.008559 * 0.06158 }

    = { 0.2219129 , 0.4692871 }

The 95% confidence interval for the coefficient of NumberSD (b1) is { 0.22191 , 0.46929 }

The difference between (e) and (f) is that the 95% confidence interval { 0.22191 , 0.46929 } is wider than the 90% confidence interval { 0.24240 , 0.44880 }, because a higher confidence level requires a larger critical value (2.008559 versus 1.675905). Equivalently, the 90% confidence interval for b1 is narrower. Both intervals exclude 0.
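
Both intervals follow from the same formula, so they can be computed with one function. The sketch below assumes the estimate, standard error and degrees of freedom reported by `summary(fit)`; `slope_ci` is a hypothetical helper name.

```r
# Hypothetical helper: two-sided t-based confidence interval for a coefficient.
slope_ci <- function(estimate, se, df, level) {
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df)   # upper alpha/2 critical value
  c(lower = estimate - t_crit * se,
    upper = estimate + t_crit * se)
}

# Estimate, standard error and residual df copied from summary(fit).
ci90 <- slope_ci(0.34560, 0.06158, 50, 0.90)
ci95 <- slope_ci(0.34560, 0.06158, 50, 0.95)

rbind(ci90, ci95)

# Interval widths: the higher confidence level gives the wider interval.
unname(diff(ci95) > diff(ci90))
```

With the fitted model object available, `confint(fit, "NumberSD", level = 0.90)` and `confint(fit, "NumberSD", level = 0.95)` return the same intervals based on the unrounded estimates.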

