Regression
Is there a relationship between the number of stories a building has and its height? Some statisticians compiled data on a set of n = 60 buildings reported in the World Almanac. You will use the data set to decide whether height (in feet) can be predicted from the number of stories.
Load the data from buildings.txt.
(Note that this is a text file, so use the appropriate instruction. If you are having trouble uploading the data, open it to see its contents and type the data in: one vector for heights and one vector for stories. Ignore the year data.)
buildings.txt
YEAR Height Stories
1990 770 54
1980 677 47
1990 428 28
1989 410 38
1966 371 29
1976 504 38
1974 1136 80
1991 695 52
1982 551 45
1986 550 40
1931 568 49
1979 504 33
1988 560 50
1973 512 40
1981 448 31
1983 538 40
1968 410 27
1927 409 31
1969 504 35
1988 777 57
1987 496 31
1960 386 26
1984 530 39
1976 360 25
1920 355 23
1931 1250 102
1989 802 72
1907 741 57
1988 739 54
1990 650 56
1973 592 45
1983 577 42
1971 500 36
1969 469 30
1971 320 22
1988 441 31
1989 845 52
1973 435 29
1987 435 34
1931 375 20
1931 364 33
1924 340 18
1931 375 23
1991 450 30
1973 529 38
1976 412 31
1990 722 62
1983 574 48
1984 498 29
1986 493 40
1986 379 30
1992 579 42
1973 458 36
1988 454 33
1979 952 72
1972 784 57
1930 476 34
1978 453 46
1978 440 30
1977 428 21
Draw a scatterplot with stories on the x-axis and height on the y-axis. Describe the trend, strength, and shape of the relationship between stories and height.
Find the linear correlation coefficient between these variables. How does it support the description you gave in (b)?
Draw diagnostic plots (a plot of stories vs. residuals, and a normal probability plot for the residuals). Do assumptions appear to be satisfied?
Obtain a 95% confidence interval for the true value of the slope. How does the interval support your conclusion in (e)?
What is the estimated height of a building that is 45 stories high? Write a concluding sentence supported by your results above.
(Please display all R code.)
> Height = c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550,
+            568, 504, 560, 512, 448, 538, 410, 409, 504, 777,
+            496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
+            592, 577, 500, 469, 320, 441, 845, 435, 435, 375,
+            364, 340, 375, 450, 529, 412, 722, 574, 498, 493,
+            379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
>
> Stories = c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40,
+             49, 33, 50, 40, 31, 40, 27, 31, 35, 57,
+             31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
+             45, 42, 36, 30, 22, 31, 52, 29, 34, 20,
+             33, 18, 23, 30, 38, 31, 62, 48, 29, 40,
+             30, 42, 36, 33, 72, 57, 34, 46, 30, 21)
>
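As an alternative to typing the two vectors by hand, the file could be read directly with read.table. The sketch below is only an illustration: it assumes the file is whitespace-separated with the header row YEAR, Height, Stories shown in the question, and it reads a three-row inline excerpt (the name excerpt is made up for this example) so that it runs without the file being present.

```r
# Assumption: buildings.txt is whitespace-separated with a header row,
# as listed in the question. The inline excerpt (first three rows of the
# listing) stands in for the file so this sketch runs on its own.
excerpt <- "YEAR Height Stories
1990 770 54
1980 677 47
1990 428 28"

buildings <- read.table(text = excerpt, header = TRUE)

# With the real file in the working directory, the call would be:
# buildings <- read.table("buildings.txt", header = TRUE)

# Pull out the two columns used in the analysis; YEAR is ignored.
Height  <- buildings$Height
Stories <- buildings$Stories

str(buildings)
```

Reading from the file avoids transcription mistakes in the 60 values, which is the main reason to prefer it when the upload works.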
> ## 1) Scatterplot: Stories on the x-axis, Height on the y-axis
> plot(Height ~ Stories, xlab = "Stories", ylab = "Height (feet)")
Comment: The scatterplot shows a positive trend: as the number of stories increases, height increases as well. The points fall close to a straight line, so the relationship is strong and its shape is linear.
# the linear correlation coefficient between these variables is
> cor(Height, Stories)
[1] 0.9505549
Comment: The estimated correlation is r = 0.9505549. It is positive and close to 1, so the two variables have a strong positive linear association. This agrees with the description of the scatterplot.
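To make the correlation value less of a black box, it can be recomputed from the definition r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)). The following is a minimal sketch using the same 60 observations; the names dx, dy and r_manual are introduced here for illustration only. The hand-computed value should agree with cor(Height, Stories).

```r
# Data as listed in the question (60 buildings).
Height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550,
            568, 504, 560, 512, 448, 538, 410, 409, 504, 777,
            496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
            592, 577, 500, 469, 320, 441, 845, 435, 435, 375,
            364, 340, 375, 450, 529, 412, 722, 574, 498, 493,
            379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
Stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40,
             49, 33, 50, 40, 31, 40, 27, 31, 35, 57,
             31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
             45, 42, 36, 30, 22, 31, 52, 29, 34, 20,
             33, 18, 23, 30, 38, 31, 62, 48, 29, 40,
             30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

# Deviations of each variable from its mean.
dx <- Stories - mean(Stories)
dy <- Height - mean(Height)

# Pearson correlation from its definition.
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual              # hand-computed value
cor(Height, Stories)  # built-in value, for comparison
r_manual^2            # share of the variation in height explained by stories
```

The squared correlation is a convenient companion figure because, in simple linear regression, it equals the R-squared of the fitted line.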
> ### Linear model
> model = lm(Height ~ Stories)
> res = residuals(model)
>
> ## Residuals (y-axis) against Stories (x-axis)
> plot(Stories, res, xlab = "Stories", ylab = "Residuals")
Comment: In the plot of the residuals against stories, about four points stand apart from the rest; without them, the remaining residuals show a slight downward drift. Taken as a whole, however, the residuals are scattered randomly with no clear pattern, so the assumption that the residuals are independent of stories appears to be satisfied.
> ## Normal probability plot
> qqnorm(res)
> qqline(res)
Comment: In the normal probability plot, the residuals at both tails depart from the straight line. The normality assumption for the residuals may therefore not be satisfied.
> # 95% confidence intervals for the regression coefficients
> confint(model)
               2.5 %    97.5 %
(Intercept) 48.34928 132.26993
Stories     10.32267  12.26208
The 95% confidence interval for the true value of the slope is (10.32267, 12.26208). In other words, each additional story is associated with an average increase in height of between about 10.3 and 12.3 feet.
The interval does not include zero. Hence, we can conclude that the number of stories has a significant effect on height at the 0.05 level of significance.
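The interval reported by confint can be checked by hand with the usual formula, estimate ± t critical value × standard error, using n - 2 = 58 residual degrees of freedom. The sketch below is an illustration under that assumption; the names coefs, b1, se_b1, t_crit and ci_manual are made up for this example.

```r
# Data as listed in the question (60 buildings).
Height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550,
            568, 504, 560, 512, 448, 538, 410, 409, 504, 777,
            496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
            592, 577, 500, 469, 320, 441, 845, 435, 435, 375,
            364, 340, 375, 450, 529, 412, 722, 574, 498, 493,
            379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
Stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40,
             49, 33, 50, 40, 31, 40, 27, 31, 35, 57,
             31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
             45, 42, 36, 30, 22, 31, 52, 29, 34, 20,
             33, 18, 23, 30, 38, 31, 62, 48, 29, 40,
             30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

model <- lm(Height ~ Stories)

# Slope estimate and its standard error from the coefficient table.
coefs <- summary(model)$coefficients
b1    <- coefs["Stories", "Estimate"]
se_b1 <- coefs["Stories", "Std. Error"]

# Two-sided 95% critical value with n - 2 residual degrees of freedom.
t_crit <- qt(0.975, df = df.residual(model))

# Interval built by hand: estimate +/- t * SE.
ci_manual <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)

ci_manual
confint(model, "Stories", level = 0.95)  # built-in interval, for comparison
```

Because zero lies outside this interval, the matching two-sided t-test of a zero slope would reject at the 0.05 level, which is the link between the interval and the significance statement.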
> ## Prediction
> new.Stories <- data.frame(Stories = c(45))
>
> predict(model, newdata = new.Stories, interval = "confidence")
       fit      lwr      upr
1 598.4665 582.7431 614.1899
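The estimated height of a building that is 45 stories high is 598.4665 feet, with a 95% confidence interval for the mean height of (582.7431, 614.1899) feet.
Conclusion: the number of stories is a strong, statistically significant linear predictor of building height, although the normality of the residuals is questionable.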
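The fitted value is simply intercept + slope × 45, and the interval printed above is a confidence interval for the mean height of all 45-story buildings. If the question is read as asking about one particular building, a prediction interval (interval = "prediction") may be the more appropriate choice; that reading is an assumption, not something the question states. A minimal sketch comparing the two, with the illustrative names new_building, fit_manual, ci_mean and pi_single:

```r
# Data as listed in the question (60 buildings).
Height <- c(770, 677, 428, 410, 371, 504, 1136, 695, 551, 550,
            568, 504, 560, 512, 448, 538, 410, 409, 504, 777,
            496, 386, 530, 360, 355, 1250, 802, 741, 739, 650,
            592, 577, 500, 469, 320, 441, 845, 435, 435, 375,
            364, 340, 375, 450, 529, 412, 722, 574, 498, 493,
            379, 579, 458, 454, 952, 784, 476, 453, 440, 428)
Stories <- c(54, 47, 28, 38, 29, 38, 80, 52, 45, 40,
             49, 33, 50, 40, 31, 40, 27, 31, 35, 57,
             31, 26, 39, 25, 23, 102, 72, 57, 54, 56,
             45, 42, 36, 30, 22, 31, 52, 29, 34, 20,
             33, 18, 23, 30, 38, 31, 62, 48, 29, 40,
             30, 42, 36, 33, 72, 57, 34, 46, 30, 21)

model <- lm(Height ~ Stories)
new_building <- data.frame(Stories = 45)

# Point estimate by hand: intercept + slope * 45.
fit_manual <- unname(coef(model)["(Intercept)"] + coef(model)["Stories"] * 45)

# Interval for the MEAN height of 45-story buildings.
ci_mean <- predict(model, newdata = new_building, interval = "confidence")

# Interval for the height of ONE new 45-story building (always wider).
pi_single <- predict(model, newdata = new_building, interval = "prediction")

fit_manual
ci_mean
pi_single
```

Both intervals share the same center; the prediction interval is wider because it also accounts for the scatter of individual buildings around the regression line.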