Linear functions of the form y = ax + b are used to describe many real-world situations. Research any relationship between two variables that occurs in your field of study or career and then:
Describe the two variables. Which is the input (x) variable and which is the output, or y, variable?
Show an example of this relationship with real-world data from one of the following sources: magazine, internet, advertisement, newspaper, professional journal, college professor, or professional in the field. Create a scatterplot to show the data. Does the relationship appear to be linear? Why or why not? Explain.
Use Excel to find the regression equation, r²-value, and r-value for this data set. Show the equation and the r-value on your scatterplot. Using Table A-6 in the text, determine whether your r-value is significant. Explain.
Here we need a real-life example of a linear relationship of the form y = ax + b. In many situations, a linear regression equation is used to predict one variable from one or more other variables. Consider the relationship between the size of a house and its price. It is commonly observed that house size and house price have a strong positive linear association, and we check this with real data collected from the internet. For convenience, we collect data for only 20 houses and record only two variables, house size and house price; no other variables are considered. Many websites publish house size and price data, so such data are easy to collect. House size (in sq. ft.) is the input variable x, and house price (in $) is the output variable y. The collected data are given below:
| House Size (in sq.ft.) (X) | House Price (in $) (Y) |
|---|---|
| 1500 | 278028 |
| 1013 | 218062 |
| 1702 | 342077 |
| 926 | 201107 |
| 1489 | 273995 |
| 1139 | 255494 |
| 1715 | 342467 |
| 1050 | 244453 |
| 1565 | 296478 |
| 1536 | 291101 |
| 952 | 211094 |
| 1537 | 281247 |
| 720 | 183944 |
| 786 | 173887 |
| 1598 | 307292 |
| 1526 | 278920 |
| 1196 | 244930 |
| 1779 | 341873 |
| 1418 | 263763 |
| 1486 | 308360 |
Suppose we want to estimate the price of a house from its size. For this purpose, we construct a regression model that predicts the dependent (response) variable, house price in US$, from the independent (explanatory) variable, house size in square feet. First, we check whether the two variables are related by drawing a scatterplot of house price against house size:
The scatterplot shows a very strong positive linear relationship between house size and house price: larger houses tend to have higher prices, and the points fall close to a straight line rather than along a curve, so a linear model is appropriate.
Next, we find the regression equation for predicting the dependent variable, house price, from the independent variable, house size. The Excel regression output is given below:
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.964499232 |
| R Square | 0.930258769 |
| Adjusted R Square | 0.926384257 |
| Standard Error | 13751.36527 |
| Observations | 20 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 45402347459 | 45402347459 | 240.0969657 | 7.47765E-12 |
| Residual | 18 | 3403800843 | 189100046.9 | | |
| Total | 19 | 48806148303 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 67906.98928 | 13207.13275 | 5.14169052 | 6.8391E-05 | 40159.83303 | 95654.14553 |
| House Size (in sq.ft.) (X) | 149.4548948 | 9.645323706 | 15.49506262 | 7.47765E-12 | 129.1908217 | 169.718968 |
From the regression output above, the estimated regression equation for predicting house price is:
House Price = 67906.99 + 149.45*House size
The y-intercept of this regression equation is 67906.99 and the slope is 149.45, so each additional square foot of house size is associated with an increase of about $149.45 in the predicted price.
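The fitted equation can be used directly for prediction. This is a minimal sketch with a hypothetical helper `predict_price` (not part of the original analysis) that uses the rounded coefficients from the equation. The 1,500 sq. ft. input is only an illustrative value chosen inside the range of observed sizes (720 to 1,779 sq. ft.); predictions far outside that range would be extrapolations.

```python
def predict_price(size_sqft: float) -> float:
    """Hypothetical helper: predicted price (US$) from
    House Price = 67906.99 + 149.45 * House Size."""
    intercept = 67906.99   # y-intercept from the regression equation
    slope = 149.45         # dollars per additional square foot
    return intercept + slope * size_sqft

# Illustrative input inside the observed 720 to 1,779 sq. ft. range
price = predict_price(1500)
print(f"Predicted price for 1,500 sq. ft.: ${price:,.2f}")
```

For a 1,500 sq. ft. house this gives about $292,082, and each extra 100 sq. ft. adds 100 × 149.45 = $14,945 to the prediction.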
The correlation coefficient between the two variables is r = 0.9645 (Excel's Multiple R; r is positive because the slope is positive), which indicates a strong positive linear association between house size and house price.
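To test whether r is significant, we have sample size n = 20, so df = n - 2 = 20 - 2 = 18.
Level of significance: α = 0.05.
Critical value of r (Table A-6, n = 20, α = 0.05): 0.444.
Since r = 0.9645 > 0.444, we conclude that the correlation coefficient is statistically significant: there is sufficient evidence of a linear correlation between house size and house price. This agrees with the P-value of 7.47765E-12 for house size in the Excel output, which is far below 0.05.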
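The tabled critical value is tied to the t distribution by r_crit = t_crit / sqrt(t_crit² + df), and the equivalent test statistic is t = r·sqrt(n - 2) / sqrt(1 - r²). This is a minimal sketch, assuming the two-tailed critical value t = 2.101 for α = 0.05 and df = 18 is read from a standard t table (Python's standard library has no t quantile function).

```python
import math

n = 20
df = n - 2
r = 0.9645        # sample correlation from the Excel output
t_crit = 2.101    # assumed: two-tailed t critical value, alpha = 0.05, df = 18 (t table)

# Critical value of r implied by t_crit (should match Table A-6 to rounding)
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

# t statistic for testing H0: rho = 0
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

significant = abs(r) > r_crit
print(f"critical r = {r_crit:.3f}, t = {t_stat:.2f}, significant = {significant}")
```

The computed t agrees with the t Stat for house size in the Excel output (about 15.50), as expected: in simple linear regression, the t-test for the slope and the t-test for r are the same test.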
The coefficient of determination (R Square) is 0.9303, which means that about 93.03% of the variation in the dependent variable, house price, is explained by its linear relationship with the independent variable, house size.
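The Regression Statistics in the Excel output all follow from the ANOVA sums of squares: R² = SSR/SST, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), Standard Error = sqrt(MSE), and F = MSR/MSE, with k = 1 predictor here. This short sketch recomputes them from the ANOVA table as a consistency check; the variable names are illustrative.

```python
import math

# Sums of squares from the Excel ANOVA table
ss_regression = 45402347459
ss_residual = 3403800843
ss_total = 48806148303
n, k = 20, 1  # observations, number of predictors

r_squared = ss_regression / ss_total                         # coefficient of determination
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)  # Adjusted R Square
ms_residual = ss_residual / (n - k - 1)                      # MSE (residual mean square)
std_error = math.sqrt(ms_residual)                           # Standard Error of the estimate
f_stat = (ss_regression / k) / ms_residual                   # F = MSR / MSE
multiple_r = math.sqrt(r_squared)                            # equals |r| in simple regression

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"Standard Error = {std_error:.2f}, F = {f_stat:.2f}, Multiple R = {multiple_r:.4f}")
```

Because R² = r² in simple linear regression, 0.9645² ≈ 0.9303 gives the same 93.03% figure.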