In: Statistics and Probability
Identify two important operating standards that are routinely measured in your organization. These measures must be “paired observations” and must also meet the criteria for a quantitative measure (i.e. either interval or ratio level of measurement). Obtain 20 paired observations.
Example: Standard measures that fit these requirements might be total electricity demand measured in kilowatts (ratio level), and outside temperature measured in degrees Fahrenheit (interval level). The hotter outside temperature the higher the electricity demands will be.
1) Identify which of the two variables is the logical choice for the dependent variable (Y), and which is the independent variable (X). In the example above the electricity demand is a logical choice for the dependent variable (Y), and outside temperature is a logical choice for independent variable (X).
2) Using the same variables, perform a simple regression on EXCEL, and determine the regression equation.
3) Identify the Correlation coefficient, and coefficient of determination from the Excel output.
4) Interpret your correlation and regression coefficients. Is simple regression a good prediction model for the example you have provided?
Data Set:
Price of diamond based on carats: | ||
Test | Price (thousands) | Carat |
1 | 8 | 0.86 |
2 | 11.5 | 1.13 |
3 | 16 | 1.85 |
4 | 6 | 0.67 |
5 | 22 | 2.25 |
6 | 10 | 1.01 |
7 | 10 | 0.9 |
8 | 34 | 3.1 |
9 | 28 | 2.8 |
10 | 9 | 0.75 |
11 | 19 | 1.81 |
12 | 12 | 1.32 |
13 | 14 | 1.03 |
14 | 8 | 0.97 |
15 | 21 | 2.3 |
16 | 8 | 0.56 |
17 | 10 | 0.75 |
18 | 14 | 1.15 |
19 | 14 | 1.33 |
20 | 24 | 2.01 |
1) Carat is the independent(X) variable and Price(in 1000) is the dependent(Y) variable as the price of diamonds will be based on the carats.
2) the excel output is:
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 0.010274 | 0.096522 | 0.106445 | 0.916407 | -0.19251 | 0.213059 | -0.19251 | 0.213059 |
X Variable 1 | 0.094956 | 0.005804 | 16.36047 | 2.99E-12 | 0.082763 | 0.10715 | 0.082763 | 0.10715 |
thus the equation is: y= 0.010274 + (0.094956 * x)
3) the excel output is:
Regression Statistics | |
Multiple R | 0.967982 |
R Square | 0.936989 |
Adjusted R Square | 0.933489 |
Standard Error | 0.190398 |
Observations | 20 |
the correlation coefficient ,r = 0.967982
coefficient of determination , R2 = 0.936989
4) r = 0.967982 is high and close to 1. thus there is correlation between X and Y
intercept = 0.010274 and its pvalue= 0.916407 (> 0.05), thus it is not significant.
slope = 0.094956 and its pvalue = 2.99E-12 (< 0.05), thus it is significant.
the excel output for the significance of the model is :
df | SS | MS | F | Significance F | |
Regression | 1 | 9.703249 | 9.703249 | 267.665 | 2.99E-12 |
Residual | 18 | 0.652526 | 0.036251 | ||
Total | 19 | 10.35578 |
here, pvalue of the model = 2.99E-12(< 0.05), thus the model is significant and simple regression equation is a good model for prediction.