Question

Regression

Is there a relationship between the number of stories a building has and its height? Some statisticians compiled data on a set of n = 60 buildings reported in the World Almanac. You will use the data set to decide whether height (in feet) can be predicted from the number of stories.

(a) Load the data from buildings.txt.
(Note that this is a text file, so use the appropriate instruction. If you are having trouble uploading the data, open it to see its contents and type the data in: one vector for heights and one vector for stories. Ignore the year data.)

buildings.txt

YEAR   Height   Stories
1990   770   54
1980   677   47
1990   428   28
1989   410   38
1966   371   29
1976   504   38
1974   1136   80
1991   695   52
1982   551   45
1986   550   40
1931   568   49
1979   504   33
1988   560   50
1973   512   40
1981   448   31
1983   538   40
1968   410   27
1927   409   31
1969   504   35
1988   777   57
1987   496   31
1960   386   26
1984   530   39
1976   360   25
1920   355   23
1931   1250   102
1989   802   72
1907   741   57
1988   739   54
1990   650   56
1973   592   45
1983   577   42
1971   500   36
1969   469   30
1971   320   22
1988   441   31
1989   845   52
1973   435   29
1987   435   34
1931   375   20
1931   364   33
1924   340   18
1931   375   23
1991   450   30
1973   529   38
1976   412   31
1990   722   62
1983   574   48
1984   498   29
1986   493   40
1986   379   30
1992   579   42
1973   458   36
1988   454   33
1979   952   72
1972   784   57
1930   476   34
1978   453   46
1978   440   30
1977   428   21

(b) Draw a scatterplot with stories on the x-axis and height on the y-axis. Describe the trend, strength and shape of the relationship between stories and height.

(c) Find the linear correlation coefficient between these variables. How does it support the description you gave in (b)?

(d) Obtain the linear model and summary. Write down the regression equation that relates height with stories. Add the line to the scatterplot.

(e) Test for significance of the regression at α = 0.05. State the null and alternative hypotheses. Can the model be used for predictions? Justify your conclusion using the summary in (d).

(f) State the coefficient of determination. What percentage of variation in height is explained by the number of stories?

(g) Draw diagnostic plots (a plot of stories vs. residuals, and a normal probability plot for the residuals). Do assumptions appear to be satisfied?

(h) Obtain a 95% confidence interval for the true value of the slope. How does the interval support your conclusion in (e)?

(i) What is the estimated height of a building that is 45 stories high? Write a concluding sentence supported by your results above.

Solutions

Expert Solution

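(a) The data extracted from buildings.txt are given below: one vector H for the heights and one vector S for the stories, each with all 60 values in the same row order as the file.

H <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560, 512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250, 802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435, 375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579, 458, 454, 952, 784, 476, 453, 440, 428)

S <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50, 40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102, 72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34, 20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42, 36, 33, 72, 57, 34, 46, 30, 21)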
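If buildings.txt is available, the two vectors need not be typed by hand. The following is a minimal sketch in R, assuming the file is whitespace-separated with the header row shown in the question. The inline string `sample_txt` and the names `buildings_head`, `H_head` and `S_head` are hypothetical; the string holds only the first five rows so that the snippet runs without the file.

```r
# Hypothetical stand-in for buildings.txt: the header plus the first five rows.
sample_txt <- "YEAR Height Stories
1990 770 54
1980 677 47
1990 428 28
1989 410 38
1966 371 29"

# read.table() splits on whitespace by default; header = TRUE uses row 1 as names.
buildings_head <- read.table(text = sample_txt, header = TRUE)

# With the real file in the working directory, the call would instead be:
#   buildings <- read.table("buildings.txt", header = TRUE)

# One vector for heights and one for stories; the YEAR column is ignored.
H_head <- buildings_head$Height
S_head <- buildings_head$Stories
str(buildings_head)
```

Reading the file directly avoids transcription errors such as dropped rows, which would leave the two vectors with different lengths.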

(b) SCATTER PLOT:

Scatterplots are useful for interpreting trends in statistical data. The points show an uphill pattern from left to right, which indicates a positive relationship between Stories and Height: as the value of "Stories" increases (moving right), the value of "Height" tends to increase (moving up). The pattern is also roughly linear, so there is a positive linear relationship between Stories and Height.

(c) LINEAR CORRELATION COEFFICIENT:

The correlation coefficient between Height and Stories is positive and close to 1, and it is significant because its p-value is less than the significance level α = 0.05. This indicates a strong positive linear relationship between Stories and Height, which agrees with the pattern described in (b).
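Parts (b) and (c) could be reproduced along the following lines in R. This is a sketch rather than the original author's code: the vectors repeat the Height and Stories columns listed in the question, and `r` and `ct` are illustrative names.

```r
# Height (feet) and Stories for the 60 buildings, in file order.
H <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560, 512, 448,
       538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
       592, 577, 500, 469, 320, 441, 845, 435, 435, 375, 364, 340, 375, 450, 529,
       412, 722, 574, 498, 493, 379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
S <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50, 40, 31,
       40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
       45, 42, 36, 30, 22, 31, 52, 29, 34, 20, 33, 18, 23, 30, 38,
       31, 62, 48, 29, 40, 30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

# (b) Scatterplot with Stories on the x-axis and Height on the y-axis.
plot(S, H, xlab = "Stories", ylab = "Height (feet)", main = "Height vs. Stories")

# (c) Pearson correlation coefficient and the test of H0: rho = 0.
r  <- cor(S, H)
ct <- cor.test(S, H)   # method = "pearson" is the default
print(r)
print(ct$p.value)
```

`cor.test()` reports the same coefficient as `cor()` together with a t statistic, a p-value and a confidence interval for the population correlation.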

(d) SIMPLE LINEAR REGRESSION MODEL:

ESTIMATED REGRESSION EQUATION:

From the regression output, the estimated regression equation has the form

$\hat{Y} = b_0 + b_1 X$

where $\hat{Y}$ is the predicted value of the dependent variable "Height",

$b_0$ is the intercept,

$b_1$ is the slope coefficient of the variable "Stories", and

$X$ is the independent variable "Stories".
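As a sketch of how the model, its summary and the trend line might be obtained in R (the names `fit` and `b` are illustrative, and the vectors repeat the data listed in the question):

```r
# Height (feet) and Stories for the 60 buildings, in file order.
H <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560, 512, 448,
       538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
       592, 577, 500, 469, 320, 441, 845, 435, 435, 375, 364, 340, 375, 450, 529,
       412, 722, 574, 498, 493, 379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
S <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50, 40, 31,
       40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
       45, 42, 36, 30, 22, 31, 52, 29, 34, 20, 33, 18, 23, 30, 38,
       31, 62, 48, 29, 40, 30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

# Least-squares fit of Height on Stories.
fit <- lm(H ~ S)
print(summary(fit))

# b[1] is the intercept and b[2] the slope of the fitted line.
b <- coef(fit)
print(b)

# Scatterplot with the fitted regression line added.
plot(S, H, xlab = "Stories", ylab = "Height (feet)")
abline(fit, col = "red")
```

The two entries of `coef(fit)` are the intercept and slope that go into the regression equation above.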

SCATTER PLOT WITH TREND LINE:

(e) SIGNIFICANCE OF INDIVIDUAL PREDICTOR:

We use a t-test to test the significance of an individual predictor.

HYPOTHESIS:

The hypotheses for the t-test are

$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$,

where $\beta_1$ is the population slope coefficient of "Stories".

From the regression output, the t-test p-value for the slope coefficient of "Stories" is less than the significance level α = 0.05, so we reject $H_0$ and conclude that "Stories" is a significant predictor of Height. The intercept term is also significant, since its p-value is less than α = 0.05.

INTERCEPT:

The intercept is the estimated mean value of Height when the number of Stories is 0; its value is the intercept estimate in the regression output.

SLOPE COEFFICIENT:

The slope coefficient can be interpreted as follows: for each additional story, the mean value of Height increases by the value of the slope estimate, in feet.

TEST FOR OVERALL SIGNIFICANCE OF MODEL:

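We use the F-test to determine the overall significance of the model.

The hypotheses are

$H_0: \beta_1 = 0$ (the model explains none of the variation in Height) versus $H_1: \beta_1 \neq 0$.

From the regression output, the F-test p-value is less than the significance level α = 0.05, so we reject $H_0$ and conclude that the overall model is significant. With a single predictor, the F-test is equivalent to the t-test for the slope.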

(f) COEFFICIENT OF DETERMINATION:

The coefficient of determination $R^2$ is approximately 0.91; the adjusted $R^2$ is reported next to it in the regression summary.

The adjusted R-squared is a modified version of R-squared that accounts for the number of predictors in the model. It increases only if a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected by chance. R-squared, by contrast, never decreases when a predictor is added. For that reason the adjusted R-squared is usually preferred when comparing models with different numbers of predictors.

The coefficient of determination is the proportion of the total variability in Y that is explained by the independent variable X.

Thus 91% of total variability in the dependent variable "Height" is explained by the independent variable "Stories".
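The p-values and R-squared quantities discussed in (e) and (f) can be read directly from the model summary. The sketch below shows one way to extract them in R; all object names (`fit`, `sm`, `coefs`, `p_slope`, `p_F`, `r2`, `adj_r2`) are illustrative, and the vectors repeat the data listed in the question.

```r
# Height (feet) and Stories for the 60 buildings, in file order.
H <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560, 512, 448,
       538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
       592, 577, 500, 469, 320, 441, 845, 435, 435, 375, 364, 340, 375, 450, 529,
       412, 722, 574, 498, 493, 379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
S <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50, 40, 31,
       40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
       45, 42, 36, 30, 22, 31, 52, 29, 34, 20, 33, 18, 23, 30, 38,
       31, 62, 48, 29, 40, 30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

fit <- lm(H ~ S)
sm  <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs   <- coef(sm)
p_slope <- coefs["S", "Pr(>|t|)"]

# Overall F-test: fstatistic holds the value and its two degrees of freedom.
fstat <- sm$fstatistic
p_F   <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]], lower.tail = FALSE)

# Coefficient of determination and its adjusted version.
r2     <- sm$r.squared
adj_r2 <- sm$adj.r.squared

print(c(p_slope = p_slope, p_F = p_F, r2 = r2, adj_r2 = adj_r2))
```

In simple linear regression the F statistic is the square of the slope's t statistic, and R-squared is the square of the correlation coefficient, so these quantities should agree with parts (c) and (e).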

(g) DIAGNOSTIC PLOTS:

RESIDUAL PLOT:

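In the residual plot, the standardized residuals appear on the y-axis and the fitted values appear on the x-axis. Because the fitted values are a linear function of Stories, this plot shows the same pattern as a plot of the residuals against Stories.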

From the above plot, we can see that

  • The residuals "bounce randomly" around the 0 line. This suggests that the assumption that the relationship is linear is reasonable.
  • The residuals roughly form a "horizontal band" around the 0 line. This suggests that the variances of the error terms are equal.
  • Three residuals "stand out" from the basic random pattern of residuals. This suggests that there are three outliers in the given data.

NORMAL PROBABILITY PLOT:

A normal probability plot of the residuals is a scatter plot with the theoretical percentiles of the normal distribution on the x axis and the sample percentiles of the residuals on the y axis. We can see that the relationship between the theoretical percentiles and the sample percentiles is approximately linear. Therefore, the normal probability plot of the residuals suggests that the error terms are normally distributed.
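The two diagnostic plots described above could be drawn as follows. This is a minimal sketch in R, with illustrative names `fit` and `res`, using the data listed in the question.

```r
# Height (feet) and Stories for the 60 buildings, in file order.
H <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560, 512, 448,
       538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
       592, 577, 500, 469, 320, 441, 845, 435, 435, 375, 364, 340, 375, 450, 529,
       412, 722, 574, 498, 493, 379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
S <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50, 40, 31,
       40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
       45, 42, 36, 30, 22, 31, 52, 29, 34, 20, 33, 18, 23, 30, 38,
       31, 62, 48, 29, 40, 30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

fit <- lm(H ~ S)
res <- rstandard(fit)   # standardized residuals

# Residual plot: standardized residuals against fitted values, with a 0 line.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Normal probability plot: theoretical quantiles on x, sample quantiles on y.
qqnorm(res)
qqline(res)
```

Replacing `fitted(fit)` with `S` in the first plot gives the stories-versus-residuals plot requested in part (g).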

 

