Question

Is there a relationship between the number of stories a building has and its height? Some statisticians compiled data on a set of n = 52 buildings reported in the 1994 World Almanac. You will use the data set to decide whether height can be predicted from the number of stories.

(a) Load the data from buildings.txt. (Note that this is a text file, so use the appropriate instruction. If you are having trouble uploading the data, open it to see its contents and type the data in: one vector for heights and one vector for stories. Ignore the year data.)

(b) Draw a scatterplot with stories on the x-axis and height on the y-axis. Does there seem to be a linear relationship between the two variables?

(c) Find the linear correlation coefficient between these variables. What does it tell you about the linear relationship?

(d) Obtain the linear model and summary. Write down the regression equation that relates height with stories. Add the line to the scatterplot.

(e) Test for significance of the regression at α = 0.05. State the null and alternative hypotheses. Can the model be used for predictions? Justify your conclusion using the summary in (d).

(f) State the coefficient of determination. What percentage of variation in height is explained by the number of stories?

(g) Draw diagnostic plots (a plot of stories vs. residuals, and a normal probability plot for the residuals). Do assumptions appear to be satisfied?

YEAR    Height  Stories
1990    770     54
1980    677     47
1990    428     28
1989    410     38
1966    371     29
1976    504     38
1974    1136    80
1991    695     52
1982    551     45
1986    550     40
1931    568     49
1979    504     33
1988    560     50
1973    512     40
1981    448     31
1983    538     40
1968    410     27
1927    409     31
1969    504     35
1988    777     57
1987    496     31
1960    386     26
1984    530     39
1976    360     25
1920    355     23
1931    1250    102
1989    802     72
1907    741     57
1988    739     54
1990    650     56
1973    592     45
1983    577     42
1971    500     36
1969    469     30
1971    320     22
1988    441     31
1989    845     52
1973    435     29
1987    435     34
1931    375     20
1931    364     33
1924    340     18
1931    375     23
1991    450     30
1973    529     38
1976    412     31
1990    722     62
1983    574     48
1984    498     29
1986    493     40
1986    379     30
1992    579     42

*********************************

Need R console code

Solutions

Expert Solution
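
(a) LOADING THE DATA:

The posted solution starts at part (b), so here is a minimal sketch for part (a). It assumes that buildings.txt, if present in the working directory, is whitespace-delimited with the header row YEAR Height Stories shown in the question; otherwise it falls back to typing the data in as two vectors, as the question suggests. The object names `buildings`, `height` and `stories` are illustrative choices, not part of the original answer.

```r
# Part (a): load the data (file layout is an assumption, see the note above).
if (file.exists("buildings.txt")) {
  buildings <- read.table("buildings.txt", header = TRUE)
} else {
  # Fallback: data typed in from the listing in the question (year ignored).
  height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
              512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
              802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
              375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
  stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
               40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
               72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
               20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)
  buildings <- data.frame(Height = height, Stories = stories)
}

# Quick check of what was loaded: 52 rows are expected.
str(buildings)
```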

(b) SCATTER PLOT:

Scatterplots are useful for interpreting trends in statistical data. The data show an uphill pattern as we move from left to right, which indicates a positive relationship between Stories and Height. That is, as the value of the variable "Stories" increases (moves right), the value of "Height" tends to increase (moves up). The points also follow a roughly linear pattern, so there is a positive linear relationship between Stories and Height.
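
The plot itself did not survive on this page. A minimal sketch that would redraw it, assuming the data are typed in as two vectors named `height` and `stories` (names chosen here for illustration):

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

# Part (b): stories on the x-axis, height on the y-axis.
plot(stories, height,
     xlab = "Stories", ylab = "Height",
     main = "Height vs. number of stories", pch = 19)
```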

(c) LINEAR CORRELATION COEFFICIENT:

The correlation coefficient between Height and Stories is about r = 0.95 (consistent with the R-squared of about 0.91 reported in (f)), and it is significant since the p-value is less than the significance level α = 0.05. Because r is positive and close to 1, there is a strong positive linear relationship between Stories and Height.
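
The R output for this step is missing from the page. One way it could be reproduced, as a sketch, is with `cor()` for the coefficient and `cor.test()` for its p-value; the vector names are illustrative.

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

# Part (c): Pearson correlation and the test of H0: rho = 0.
r  <- cor(stories, height)
ct <- cor.test(stories, height)

print(r)           # sample correlation coefficient
print(ct$p.value)  # compare with alpha = 0.05
```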

(d) SIMPLE LINEAR REGRESSION MODEL:

ESTIMATED REGRESSION EQUATION:

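From the regression output, the estimated regression equation has the form

Ŷ = b0 + b1 X

where

Ŷ is the predicted value of the dependent variable "Height",

b0 is the intercept,

b1 is the slope coefficient of the variable "Stories",

X is the independent variable "Stories".

The numeric values of b0 and b1 are read from the "Estimate" column of the model summary.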

SCATTER PLOT WITH TREND LINE:
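
Neither the regression output nor the plot survived on this page. A minimal sketch for part (d), assuming the data are held in vectors named `height` and `stories` and the fitted model is stored in `fit` (all illustrative names):

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

# Part (d): fit Height = b0 + b1 * Stories by least squares.
fit <- lm(height ~ stories)
print(summary(fit))

# Write out the estimated regression equation from the coefficients.
b <- coef(fit)
cat(sprintf("Predicted height = %.2f + %.2f * stories\n", b[1], b[2]))

# Scatterplot with the fitted line added.
plot(stories, height, xlab = "Stories", ylab = "Height", pch = 19)
abline(fit, col = "red", lwd = 2)
```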

(e) SIGNIFICANCE OF INDIVIDUAL PREDICTOR:

We use t-test to test for significance of individual predictors.

HYPOTHESES:

The hypotheses for the t-test on the slope are

H0: β1 = 0 (Stories has no linear effect on Height)

H1: β1 ≠ 0 (Stories has a linear effect on Height)

From the regression output, the t-test p-value for the slope coefficient of the variable "Stories" is less than the significance level α = 0.05, so we reject H0 and conclude that "Stories" is a significant predictor. The intercept term is also significant since its p-value is less than the significance level α = 0.05.

INTERCEPT:

The estimated intercept is the predicted mean height when the number of stories is 0. No building in the data has fewer than 18 stories, so this value is an extrapolation and should not be read literally.

SLOPE COEFFICIENT:

Let b1 denote the estimated slope coefficient. It can be interpreted as follows: as the number of stories increases by 1, the mean height increases by b1 units.

TEST FOR OVERALL SIGNIFICANCE OF MODEL:

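We use the F-test to determine the overall significance of the model.

The hypotheses are

H0: β1 = 0 (the model explains no variation in Height)

H1: β1 ≠ 0 (the model explains some variation in Height)

From the regression output, the F-test p-value is less than the significance level α = 0.05, so we reject H0 and conclude that the overall model is significant. The model can therefore be used to predict height, within the range of stories observed in the data.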
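
The p-values referred to in part (e) can be pulled out of the model summary directly. The sketch below assumes the same illustrative names as before (`height`, `stories`, `fit`) and also checks a standard fact: with a single predictor, the F statistic equals the square of the slope's t statistic, so both tests lead to the same decision.

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

fit   <- lm(height ~ stories)
s     <- summary(fit)
alpha <- 0.05

# t-test for the slope (H0: beta1 = 0).
coefs   <- s$coefficients
t_slope <- coefs["stories", "t value"]
p_slope <- coefs["stories", "Pr(>|t|)"]

# Overall F-test: statistic plus numerator and denominator degrees of freedom.
f   <- s$fstatistic
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

cat("Reject H0 (t-test on slope):", p_slope < alpha, "\n")
cat("Reject H0 (overall F-test): ", p_f < alpha, "\n")
cat("F equals t^2:", isTRUE(all.equal(f[["value"]], t_slope^2)), "\n")
```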

(f) COEFFICIENT OF DETERMINATION:

The coefficient of determination is R-squared ≈ 0.91. The adjusted R-squared, reported next to it in the regression summary, is only slightly lower because the model has a single predictor.

The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. It increases only if a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected by chance. R-squared, in contrast, never decreases when predictors are added. For this reason the adjusted R-squared is usually preferred when comparing models with different numbers of predictors.

The coefficient of determination is the proportion of the total variability in Y that is explained by the independent variable X.

Thus 91% of total variability in the dependent variable "Height" is explained by the independent variable "Stories".
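
As a sketch of where these numbers come from, R-squared can be read from the summary or computed by hand as 1 − SSE/SST; in simple linear regression it also equals the squared correlation coefficient. The names `height`, `stories` and `fit` are illustrative.

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

fit <- lm(height ~ stories)
s   <- summary(fit)

# Part (f): R-squared and adjusted R-squared from the summary.
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared

# The same R-squared by hand: 1 - (residual SS) / (total SS).
sse       <- sum(resid(fit)^2)
sst       <- sum((height - mean(height))^2)
r2_manual <- 1 - sse / sst

cat(sprintf("R-squared = %.3f, adjusted R-squared = %.3f\n", r2, adj_r2))
cat(sprintf("Percent of variation explained: %.1f%%\n", 100 * r2))
```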

(g) DIAGNOSTIC PLOTS:

RESIDUAL PLOT:

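In the residual plot, the standardized residuals appear on the y-axis and the fitted values appear on the x-axis. Because the model has a single predictor with a positive slope, this plot shows the same pattern as a plot of residuals against stories.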

From the above plot, we can see that

  • The residuals "bounce randomly" around the 0 line. This suggests that the assumption that the relationship is linear is reasonable.
  • The residuals roughly form a "horizontal band" around the 0 line. This suggests that the variances of the error terms are equal.
  • Three residuals "stand out" from the basic random pattern of residuals. This suggests that there may be three outliers in the given data.
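
The residual plot is missing from this page. A minimal sketch that would draw the plot requested in the question (stories vs. residuals) follows; the names are illustrative, and the cutoff |standardized residual| > 2 used to flag points is a common rule of thumb assumed here, not something stated in the original answer.

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

fit     <- lm(height ~ stories)
std_res <- rstandard(fit)  # standardized residuals

# Part (g): residuals against the predictor, with a reference line at 0.
plot(stories, std_res,
     xlab = "Stories", ylab = "Standardized residuals",
     main = "Residuals vs. stories", pch = 19)
abline(h = 0, lty = 2)

# Flag candidate outliers (rule-of-thumb cutoff, an assumption).
flagged <- which(abs(std_res) > 2)
print(data.frame(stories = stories[flagged], height = height[flagged],
                 std_res = round(std_res[flagged], 2)))
```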

NORMAL PROBABILITY PLOT:

A normal probability plot of the residuals is a scatter plot with the theoretical percentiles of the normal distribution on the x axis and the sample percentiles of the residuals on the y axis. We can see that the relationship between the theoretical percentiles and the sample percentiles is approximately linear. Therefore, the normal probability plot of the residuals suggests that the error terms are normally distributed.
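
The normal probability plot is also missing from the page. A sketch using `qqnorm()` and `qqline()` on the residuals (names are illustrative):

```r
# Data typed in from the listing in the question (year ignored).
height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550, 568, 504, 560,
            512, 448, 538, 410, 409, 504, 777, 496, 386, 530, 360, 355, 1250,
            802, 741, 739, 650, 592, 577, 500, 469, 320, 441, 845, 435, 435,
            375, 364, 340, 375, 450, 529, 412, 722, 574, 498, 493, 379, 579)
stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40, 49, 33, 50,
             40, 31, 40, 27, 31, 35, 57, 31, 26, 39, 25, 23, 102,
             72, 57, 54, 56, 45, 42, 36, 30, 22, 31, 52, 29, 34,
             20, 33, 18, 23, 30, 38, 31, 62, 48, 29, 40, 30, 42)

fit <- lm(height ~ stories)
res <- resid(fit)

# Part (g): theoretical normal quantiles (x) vs. sample residual quantiles (y).
qq <- qqnorm(res, main = "Normal probability plot of residuals", pch = 19)
qqline(res, col = "red")  # reference line through the quartiles
```

Points lying close to the reference line support the normality assumption; systematic curvature or a few points far from the line at either end would argue against it.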

