Question

In: Statistics and Probability

Structurally deficient highway bridges. Data on structurally deficient highway bridges is compiled by the Federal Highway Administration (FHWA) and reported in the National Bridge Inventory (NBI). For each state, the NBI lists the number of structurally deficient bridges and the total area (thousands of square feet) of the deficient bridges. The data for the 50 states (plus the District of Columbia and Puerto Rico) are listed below. For future planning and budgeting, the FHWA wants to estimate the total area of structurally deficient bridges in a state based on the number of deficient bridges.

NumberSD	SDArea
1899	432.71
155	60.92
181	110.57
997	347.35
3140	5177.97
580	316.92
358	387.78
20	9.05
24	59.34
302	412.92
1028	344.86
142	39.8
349	135.43
2501	1192.43
2030	688.19
5153	1069.71
2991	527.47
1362	458.37
1780	1453.26
349	131.13
388	236.18
585	521.83
1584	804.15
1156	325.9
3002	692.75
4433	1187.42
473	90.94
2382	335.75
47	20.08
383	127.66
750	752.43
404	196.67
2128	1427.73
2272	1034.61
743	101.42
2862	965.16
5793	1423.25
514	393.96
5802	2404.61
164	237.96
1260	626.38
1216	209.33
1325	481.31
2186	1031.45
233	102.56
500	153.8
1208	483.68
400	502.03
1058	331.59
1302	399.8
389	143.46
241	195.43

a) Based on a scatterplot, can you use linear regression to predict SDArea based on NumberSD? Explain.

b) Develop a simple linear regression equation to predict SDArea based on NumberSD.

c) Is the model you found in (b) a good fit? Why or why not?

d) Predict the SDArea when the NumberSD is 1260 bridges. Find the corresponding residual.

e) Build a 90% confidence interval (CI) for the coefficient of NumberSD (b₁).

f) Repeat (e) with a 95% CI. What is the difference between your answers in (e) and (f)?

Expert Solution

We will use R to make the scatterplot and to fit a regression model.

First, we import the data into R. Each column is copied to the clipboard before the corresponding scan() call.

> NumberSD=scan("clipboard")
Read 52 items
> SDArea=scan("clipboard")
Read 52 items

> head(data.frame(NumberSD,SDArea),10)          # to print first 10 observations
   NumberSD  SDArea
1      1899  432.71
2       155   60.92
3       181  110.57
4       997  347.35
5      3140 5177.97
6       580  316.92
7       358  387.78
8        20    9.05
9        24   59.34
10      302  412.92
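
Reading from the clipboard depends on the platform and on what was copied last, so the import above is hard to reproduce. A minimal sketch of an alternative, assuming the table is pasted into the script as text. Only the first five rows are shown, and the names `bridges_text` and `bridges` are illustrative choices, not part of the original solution:

```r
# Paste the two-column table as text; only the first five rows are shown here.
bridges_text <- "NumberSD SDArea
1899 432.71
155 60.92
181 110.57
997 347.35
3140 5177.97"

# read.table() parses whitespace-separated columns;
# header = TRUE takes the column names from the first line.
bridges <- read.table(text = bridges_text, header = TRUE)
str(bridges)
```

The remaining rows would be pasted the same way, after which the model could be fitted with lm(SDArea ~ NumberSD, data = bridges).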

a) Based on a scatterplot, can you use linear regression to predict SDArea based on NumberSD? Explain.

> plot(NumberSD,SDArea,main="Scatter Plot",col=12,pch=19)

The scatterplot shows a positive (increasing) trend, which implies a positive correlation between the two variables. Although there is one outlier, we can use linear regression to predict SDArea based on NumberSD.

b) Develop a simple linear regression equation to predict SDArea based on NumberSD.

R code and output

> fit=lm(SDArea~NumberSD) # to find regression equation

> summary(fit) # to print summary

Call:
lm(formula = SDArea ~ NumberSD)

Residuals:
   Min     1Q Median     3Q    Max
-831.0 -146.3 -107.2  104.7 3972.9

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 119.86392  123.02005   0.974    0.335
NumberSD      0.34560    0.06158   5.613 8.69e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 635.2 on 50 degrees of freedom
Multiple R-squared: 0.3865, Adjusted R-squared: 0.3743
F-statistic: 31.5 on 1 and 50 DF, p-value: 8.695e-07

> anova(fit) # to print ANOVA
Analysis of Variance Table

Response: SDArea
          Df   Sum Sq  Mean Sq F value    Pr(>F)
NumberSD   1 12710096 12710096  31.503 8.695e-07 ***
Residuals 50 20173064   403461
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The simple linear regression equation to predict SDArea based on NumberSD is

SDArea (y) = 119.8639 + 0.3456 * NumberSD

c) Is the model you found in (b) a good fit? Why or why not?

The coefficient of determination is

R-squared: 0.3865

This implies that only 38.65% of the variability in SDArea is explained by the independent variable NumberSD, which is too low to consider the fitted model a good fit.

We will check the residual plot.

> Residual=resid(fit)      # residuals of the fitted model
> plot(NumberSD,Residual,main="Residual Plot",col=2,pch=19)
> abline(h=0)              # horizontal reference line at zero

The residual plot also shows an outlier, which suggests that a transformation should be used. Hence the model found in (b) is not a good fit. Because the two variables are positively correlated, simple linear regression can still be used to predict SDArea, but a transformation such as a square root or log transformation should be applied first.
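
The suggested transformation can be tried directly. The following is a minimal sketch, not part of the original solution: it refits the model on the raw scale, identifies the row with the largest residual, and fits a log-log model as one possible transformation. The object names `fit_raw` and `fit_log` are illustrative, and taking logs of both variables is an assumption (the text above only says "square root or log").

```r
# Full data set from the question (52 rows).
NumberSD <- c(1899, 155, 181, 997, 3140, 580, 358, 20, 24, 302, 1028, 142, 349,
              2501, 2030, 5153, 2991, 1362, 1780, 349, 388, 585, 1584, 1156, 3002, 4433,
              473, 2382, 47, 383, 750, 404, 2128, 2272, 743, 2862, 5793, 514, 5802,
              164, 1260, 1216, 1325, 2186, 233, 500, 1208, 400, 1058, 1302, 389, 241)
SDArea <- c(432.71, 60.92, 110.57, 347.35, 5177.97, 316.92, 387.78, 9.05, 59.34,
            412.92, 344.86, 39.8, 135.43, 1192.43, 688.19, 1069.71, 527.47, 458.37,
            1453.26, 131.13, 236.18, 521.83, 804.15, 325.9, 692.75, 1187.42,
            90.94, 335.75, 20.08, 127.66, 752.43, 196.67, 1427.73, 1034.61, 101.42,
            965.16, 1423.25, 393.96, 2404.61, 237.96, 626.38, 209.33, 481.31,
            1031.45, 102.56, 153.8, 483.68, 502.03, 331.59, 399.8, 143.46, 195.43)

# Model on the original scale, as in part (b).
fit_raw <- lm(SDArea ~ NumberSD)

# Row with the largest absolute residual (the outlier seen in the plots).
worst <- which.max(abs(resid(fit_raw)))
data.frame(row = worst, NumberSD = NumberSD[worst], SDArea = SDArea[worst])

# One possible transformation: logs of both variables (all values are positive).
fit_log <- lm(log(SDArea) ~ log(NumberSD))
summary(fit_log)$coefficients
```

The R-squared values of the two fits are on different response scales, so they are not directly comparable. A residual plot of fit_log, for example plot(fitted(fit_log), resid(fit_log)), is the more informative check of whether the transformation helped.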

d) Predict the SDArea when the NumberSD is 1260 bridges. Find the corresponding residual.

Given NumberSD = 1260

The regression equation to predict SDArea based on NumberSD is

SDArea (y) = 119.8639 + 0.3456 * NumberSD

thus SDArea (y) = 119.8639 + 0.3456 * 1260

SDArea (y) = 555.3199

Hence the predicted SDArea is 555.3199 (thousands of square feet).

The actual value of SDArea corresponding to NumberSD = 1260 is 626.38, as row 41 of the data shows:

> data.frame(NumberSD[40:45],SDArea[40:45])
  NumberSD.40.45. SDArea.40.45.
1             164        237.96
2            1260        626.38
3            1216        209.33
4            1325        481.31
5            2186       1031.45
6             233        102.56

Hence the predicted value of SDArea at NumberSD = 1260 is 555.3199,

and the actual value of SDArea at NumberSD = 1260 is 626.38.

Thus Residual = Actual value - Predicted value = 626.38 - 555.3199 = 71.0601

The corresponding residual is 71.0601.
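
The same arithmetic can be checked in R. This is a small sketch that uses the coefficients as printed in the summary output, so the variable names are illustrative and the result is subject to the rounding of those coefficients:

```r
# Coefficients as reported in the summary(fit) output.
b0 <- 119.86392
b1 <- 0.34560

x0 <- 1260        # number of structurally deficient bridges
y_obs <- 626.38   # observed SDArea for that row of the data

y_hat <- b0 + b1 * x0   # predicted SDArea
res <- y_obs - y_hat    # residual = observed - predicted

round(c(predicted = y_hat, residual = res), 4)
```

With the fitted object available, predict(fit, newdata = data.frame(NumberSD = 1260)) should give essentially the same prediction, differing only through the rounding of the printed coefficients.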

e) Build a 90% confidence interval (CI) for the coefficient of NumberSD (b1).

The 90% confidence interval for b1 is given by

CI = { b1 - t(α/2, n-2) * SE(b1) , b1 + t(α/2, n-2) * SE(b1) }

Here b1 = 0.34560 and SE(b1) = 0.06158.

t(α/2, n-2) is the upper α/2 critical value of the t-distribution with n-2 = 52-2 = 50 degrees of freedom, with α = 0.10 for a 90% CI.

It can be obtained from a statistical table or, more accurately, from software such as R or Excel.

From R

> qt(1-.1/2,50)
[1] 1.675905

Thus t(0.05, 50) = 1.675905

Hence the 90% confidence interval for b1 is given by

CI = { b1 - t(0.05, 50) * SE(b1) , b1 + t(0.05, 50) * SE(b1) }

= { 0.34560 - 1.675905 * 0.06158 , 0.34560 + 1.675905 * 0.06158 }

= { 0.2423978 , 0.4488022 }

The 90% confidence interval for the coefficient of NumberSD (b1) is { 0.24240 , 0.44880 }

f) Repeat (e) with a 95% CI. What is the difference between your answers in (e) and (f)?

The 95% confidence interval for b1 is given by

CI = { b1 - t(α/2, n-2) * SE(b1) , b1 + t(α/2, n-2) * SE(b1) }

The critical value again comes from the t-distribution with 50 degrees of freedom, but now with α = 0.05 for a 95% CI.

From R

> qt(1-.05/2,50)
[1] 2.008559

Thus t(0.025, 50) = 2.008559

Hence the 95% confidence interval for b1 is given by

CI = { b1 - t(0.025, 50) * SE(b1) , b1 + t(0.025, 50) * SE(b1) }

= { 0.34560 - 2.008559 * 0.06158 , 0.34560 + 2.008559 * 0.06158 }

= { 0.2219129 , 0.4692871 }

The 95% confidence interval for the coefficient of NumberSD (b1) is { 0.22191 , 0.46929 }

The difference between (e) and (f) is the width of the interval: the 95% confidence interval { 0.22191 , 0.46929 } is wider than the 90% confidence interval { 0.24240 , 0.44880 }, because a higher confidence level requires a larger critical value. Equivalently, the 90% confidence interval for b1 is narrower.
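
Both intervals can be reproduced with one small helper. This is a sketch under the assumption that only the printed estimate, standard error and degrees of freedom are available; `slope_ci` is a hypothetical helper name, not a built-in R function:

```r
# Two-sided t interval for a regression slope from its estimate and standard error.
slope_ci <- function(b1, se, df, level) {
  alpha <- 1 - level
  t_crit <- qt(1 - alpha / 2, df)   # upper alpha/2 critical value
  c(lower = b1 - t_crit * se, upper = b1 + t_crit * se)
}

# Values taken from the summary(fit) output.
b1 <- 0.34560
se <- 0.06158
df <- 50

ci90 <- slope_ci(b1, se, df, 0.90)
ci95 <- slope_ci(b1, se, df, 0.95)
round(rbind(ci90, ci95), 5)

# Interval widths: the 95% interval is the wider one.
c(width90 = unname(diff(ci90)), width95 = unname(diff(ci95)))
```

When the fitted model object is available, confint(fit, level = 0.90) and confint(fit, level = 0.95) return the same kind of intervals directly from the unrounded estimates.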

orchestra answered 1 year ago
