1. Understand, explain, and apply regression theory.
2. Choose or collect a data file and use regression theory to set up a model: calculate the parameter values, obtain the regression model, and analyze the meaning of the model, including R square, the F-test, and the t-tests. Explain the relationship between the dependent variable and the independent variables. The analysis should present the charts, the correlations, and the regression output table. A multiple regression model is preferred, ideally one that includes dummy variables.
3. Also send the Excel file showing the operations performed in Excel.
Answers:
Ans 1)
Multiple Linear Regression Theory:
Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables of interest. While there are many types of regression analysis, at their core they all examine the influence of one or more independent variables on a dependent variable.
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target, or criterion variable).
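In symbols, the model described above can be written as follows. The notation (β for coefficients, ε for the error term, X for the matrix of independent variables with a leading column of ones) is standard textbook notation and is an assumption here, since the answer itself does not introduce symbols.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n

\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

The second line is the ordinary least squares estimate, the coefficient vector that minimizes the sum of squared residuals.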
Ans 2)
Collected data: 20 observations, used to compare smokers and non-smokers
Dependent variable = Risk
Independent variables = Age, Pressure, Smoker
Categorical (dummy) variable = Smoker, coded Yes = 1 and No = 0 with the Excel formula:
=IF(E4="Yes",1,0)
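For readers working outside Excel, the same Yes/No coding can be sketched in Python. The function name `encode_smoker` is hypothetical, and the comparison is made case-insensitive on the assumption that it should behave like Excel's `=` operator on text.

```python
def encode_smoker(label: str) -> int:
    """Return 1 for "Yes" and 0 otherwise, like =IF(E4="Yes",1,0)."""
    # Excel compares text case-insensitively, so lower-case before comparing.
    return 1 if label.lower() == "yes" else 0


# Smoker column for the first five rows of the data table.
smoker_labels = ["No", "No", "No", "Yes", "No"]
smoker_dummy = [encode_smoker(label) for label in smoker_labels]
print(smoker_dummy)  # [0, 0, 0, 1, 0]
```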
Risk | Age | Pressure | Smoker | Risk(Y) | Age (X1) | Pressure(X2) | Smoker (X3)
12 | 57 | 152 | No | 12 | 57 | 152 | 0
24 | 67 | 163 | No | 24 | 67 | 163 | 0
13 | 58 | 155 | No | 13 | 58 | 155 | 0
56 | 86 | 177 | Yes | 56 | 86 | 177 | 1
28 | 59 | 196 | No | 28 | 59 | 196 | 0
51 | 76 | 189 | Yes | 51 | 76 | 189 | 1
18 | 56 | 155 | Yes | 18 | 56 | 155 | 1
31 | 78 | 120 | No | 31 | 78 | 120 | 0
37 | 80 | 135 | Yes | 37 | 80 | 135 | 1
15 | 78 | 98 | No | 15 | 78 | 98 | 0
22 | 71 | 152 | No | 22 | 71 | 152 | 0
36 | 70 | 173 | Yes | 36 | 70 | 173 | 1
15 | 67 | 135 | Yes | 15 | 67 | 135 | 1
48 | 77 | 209 | Yes | 48 | 77 | 209 | 1
15 | 60 | 199 | No | 15 | 60 | 199 | 0
36 | 82 | 119 | Yes | 36 | 82 | 119 | 1
8 | 66 | 166 | No | 8 | 66 | 166 | 0
34 | 80 | 125 | Yes | 34 | 80 | 125 | 1
3 | 62 | 117 | No | 3 | 62 | 117 | 0
37 | 59 | 207 | Yes | 37 | 59 | 207 | 1
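The pairwise correlations requested in the question can be computed directly from the coded columns of the table. The following is a minimal sketch using only the Python standard library; the helper name `pearson` is hypothetical. In Excel, the CORREL function or the Analysis ToolPak's Correlation tool would serve the same purpose.

```python
import math

# Coded columns from the data table: Risk (Y), Age (X1), Pressure (X2), Smoker (X3).
risk = [12, 24, 13, 56, 28, 51, 18, 31, 37, 15, 22, 36, 15, 48, 15, 36, 8, 34, 3, 37]
age = [57, 67, 58, 86, 59, 76, 56, 78, 80, 78, 71, 70, 67, 77, 60, 82, 66, 80, 62, 59]
pressure = [152, 163, 155, 177, 196, 189, 155, 120, 135, 98,
            152, 173, 135, 209, 199, 119, 166, 125, 117, 207]
smoker = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


variables = {"Risk": risk, "Age": age, "Pressure": pressure, "Smoker": smoker}
names = list(variables)
corr = {(a, b): pearson(variables[a], variables[b]) for a in names for b in names}

# Print the correlation matrix row by row.
print(" " * 9 + "".join(f"{name:>10}" for name in names))
for a in names:
    print(f"{a:<9}" + "".join(f"{corr[(a, b)]:10.3f}" for b in names))
```

A correlation matrix like this is also a quick check for strongly related independent variables before fitting the regression.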
Regression Output:
SUMMARY OUTPUT
Regression Statistics
Statistic | Value
Multiple R | 0.934605
R Square | 0.873487
Adjusted R Square | 0.849766
Standard Error | 5.756575
Observations | 20
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 3 | 3660.74 | 1220.247 | 36.82301 | 2.06E-07
Residual | 16 | 530.2104 | 33.13815 | |
Total | 19 | 4190.95 | | |
Coefficients
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -91.7595 | 15.22276 | -6.02778 | 1.76E-05 | -124.03 | -59.4887
Age (X1) | 1.076741 | 0.165964 | 6.487814 | 7.49E-06 | 0.724914 | 1.428568
Pressure(X2) | 0.251813 | 0.045226 | 5.567951 | 4.24E-05 | 0.15594 | 0.347687
Smoker (X3) | 8.739871 | 3.000815 | 2.912499 | 0.010174 | 2.378427 | 15.10132
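As a cross-check on the Excel output, the coefficients, standard errors, and t statistics can be recomputed from the data by solving the normal equations. This is a minimal sketch that uses only the Python standard library; the helper `invert` is hypothetical, and in practice a numerical library would normally be used instead of hand-written elimination.

```python
import math

# Coded columns from the data table: Risk (Y), Age (X1), Pressure (X2), Smoker (X3).
risk = [12, 24, 13, 56, 28, 51, 18, 31, 37, 15, 22, 36, 15, 48, 15, 36, 8, 34, 3, 37]
age = [57, 67, 58, 86, 59, 76, 56, 78, 80, 78, 71, 70, 67, 77, 60, 82, 66, 80, 62, 59]
pressure = [152, 163, 155, 177, 196, 189, 155, 120, 135, 98,
            152, 173, 135, 209, 199, 119, 166, 125, 117, 207]
smoker = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


# Design matrix with a leading column of ones for the intercept.
X = [[1.0, a, bp, s] for a, bp, s in zip(age, pressure, smoker)]
n, m = len(X), len(X[0])  # m = number of estimated coefficients (4)

# Normal equations: beta = (X'X)^-1 X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
xty = [sum(row[i] * y for row, y in zip(X, risk)) for i in range(m)]
xtx_inv = invert(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(m)) for i in range(m)]

# Residual variance, then standard errors and t statistics for each coefficient.
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
ss_res = sum((y - f) ** 2 for y, f in zip(risk, fitted))
mse = ss_res / (n - m)
std_err = [math.sqrt(mse * xtx_inv[i][i]) for i in range(m)]
t_stats = [b / se for b, se in zip(beta, std_err)]

for name, b, se, t in zip(["Intercept", "Age", "Pressure", "Smoker"], beta, std_err, t_stats):
    print(f"{name:<10} coef={b:10.4f}  se={se:8.4f}  t={t:7.3f}")
```

The printed values should agree with the Coefficients, Standard Error, and t Stat columns of the Excel output up to rounding.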
RESIDUAL OUTPUT (columns 1 to 4) and PROBABILITY OUTPUT (columns 5 and 6, sorted by Risk(Y))
Observation | Predicted Risk(Y) | Residuals | Standard Residuals | Percentile | Risk(Y)
1 | 7.89039 | 4.10961 | 0.777953 | 2.5 | 3
2 | 21.42775 | 2.572252 | 0.48693 | 7.5 | 8
3 | 9.722571 | 3.277429 | 0.62042 | 12.5 | 12
4 | 54.15109 | 1.848912 | 0.350001 | 17.5 | 13
5 | 21.12366 | 6.876335 | 1.301696 | 22.5 | 15
6 | 46.40544 | 4.594561 | 0.869754 | 27.5 | 15
7 | 16.30896 | 1.69104 | 0.320115 | 32.5 | 15
8 | 22.44392 | 8.556079 | 1.619674 | 37.5 | 18
9 | 37.11448 | -0.11448 | -0.02167 | 42.5 | 22
10 | 16.90402 | -1.90402 | -0.36043 | 47.5 | 24
11 | 22.96476 | -0.96476 | -0.18263 | 52.5 | 28
12 | 35.91598 | 0.084023 | 0.015906 | 57.5 | 31
13 | 23.11684 | -8.11684 | -1.53653 | 62.5 | 34
14 | 52.51845 | -4.51845 | -0.85535 | 67.5 | 36
15 | 22.95585 | -7.95585 | -1.50605 | 72.5 | 36
16 | 35.23894 | 0.761058 | 0.144069 | 77.5 | 37
17 | 21.10645 | -13.1064 | -2.48106 | 82.5 | 37
18 | 34.59634 | -0.59634 | -0.11289 | 87.5 | 48
19 | 4.460623 | -1.46062 | -0.2765 | 92.5 | 51
20 | 32.63348 | 4.366516 | 0.826585 | 97.5 | 56
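The Percentile and Standard Residuals columns can be reproduced with a few lines of Python. Two assumptions are made here, both inferred from the numbers in the table rather than stated in the answer: the percentile of the i-th smallest Risk value is 100 × (i − 0.5) / n, and each standard residual is the residual divided by the sample standard deviation of the residuals (divisor n − 1).

```python
import statistics

# Risk(Y) in observation order, and the Residuals column from the residual output.
risk = [12, 24, 13, 56, 28, 51, 18, 31, 37, 15, 22, 36, 15, 48, 15, 36, 8, 34, 3, 37]
residuals = [4.10961, 2.572252, 3.277429, 1.848912, 6.876335, 4.594561, 1.69104,
             8.556079, -0.11448, -1.90402, -0.96476, 0.084023, -8.11684, -4.51845,
             -7.95585, 0.761058, -13.1064, -0.59634, -1.46062, 4.366516]

n = len(risk)

# Probability output: pair the i-th smallest Risk value with its sample percentile.
percentiles = [100 * (i - 0.5) / n for i in range(1, n + 1)]
probability_output = list(zip(percentiles, sorted(risk)))

# Standard residuals under the assumed scaling: residual / sample standard deviation.
resid_sd = statistics.stdev(residuals)
standard_residuals = [r / resid_sd for r in residuals]

print(probability_output[:3])
print([round(z, 3) for z in standard_residuals[:3]])
```

Observation 17 has the largest standard residual in absolute value (about -2.48), which makes it the first point to inspect in the residual plots.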
Regression Model:
Predicted risk = -91.760 + 1.077 × Age + 0.252 × Pressure + 8.740 × Smoker
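To show how the fitted equation is used, the sketch below evaluates it for given inputs, using the unrounded coefficients from the regression output. The names `COEF` and `predict_risk` are hypothetical.

```python
# Coefficients copied from the Excel regression output.
COEF = {"intercept": -91.7595, "age": 1.076741, "pressure": 0.251813, "smoker": 8.739871}


def predict_risk(age: float, pressure: float, smoker: int) -> float:
    """Predicted risk from the fitted model; smoker is 1 for Yes and 0 for No."""
    return (COEF["intercept"]
            + COEF["age"] * age
            + COEF["pressure"] * pressure
            + COEF["smoker"] * smoker)


# First observation in the data: age 57, pressure 152, non-smoker.
print(round(predict_risk(57, 152, 0), 2))  # 7.89

# Smoker versus non-smoker at the same age and pressure.
gap = predict_risk(60, 150, 1) - predict_risk(60, 150, 0)
print(round(gap, 2))  # 8.74
```

The gap between a smoker and a non-smoker with identical age and pressure always equals the Smoker coefficient, which is how a dummy variable shifts the regression intercept.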
Interpretations:
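1) The overall F-test is significant: F = 36.82 with Significance F = 2.06E-07, far below 0.05, so the three independent variables jointly explain a significant share of the variation in risk.
2) All three independent variables (age, pressure, and smoker) are individually statistically significant because each t-test has a p-value below 0.05 (7.49E-06, 4.24E-05, and 0.0102, respectively). Holding the other variables constant, predicted risk rises by about 1.08 per additional year of age and by about 0.25 per unit of pressure, and smokers have a predicted risk about 8.74 higher than non-smokers.
3) The correlation r always lies between -1 and 1 inclusive. R-squared, denoted R², is the square of the multiple correlation and measures the proportion of variation in the dependent variable that is explained by the independent variables. Here Multiple R = 0.93 and R-squared = 0.87, so the model explains about 87% of the variation in risk, which indicates a strong positive linear association. Adjusted R-squared, which accounts for the number of independent variables, is 0.85.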
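The summary statistics discussed above follow arithmetically from the ANOVA table. The sketch below recomputes them from the sums of squares reported by Excel; the variable names are hypothetical.

```python
import math

# Values from the ANOVA table in the Excel output.
n, k = 20, 3                      # observations, independent variables
ss_reg, ss_res, ss_tot = 3660.74, 530.2104, 4190.95

r_squared = ss_reg / ss_tot                          # share of variation explained
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_reg = ss_reg / k                                  # mean square, regression
ms_res = ss_res / (n - k - 1)                        # mean square, residual
f_stat = ms_reg / ms_res
standard_error = math.sqrt(ms_res)                   # standard error of the regression

print(f"R Square          = {r_squared:.4f}")
print(f"Multiple R        = {multiple_r:.4f}")
print(f"Adjusted R Square = {adj_r_squared:.4f}")
print(f"F                 = {f_stat:.2f}")
print(f"Standard Error    = {standard_error:.4f}")
```

Each value should match the corresponding entry under Regression Statistics and ANOVA up to rounding.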
Graphs:
1) Residual Plots:
2) Line Fit Plot:
3) Normal Probability Plot:
Ans 3).
[Figure: Age (X1) Residual Plot. x-axis: Age (X1); y-axis: Residuals]
[Figure: Pressure(X2) Residual Plot. x-axis: Pressure(X2); y-axis: Residuals]
[Figure: Smoker (X3) Residual Plot. x-axis: Smoker (X3); y-axis: Residuals]
[Figure: Age (X1) Line Fit Plot. x-axis: Age (X1); y-axis: Risk(Y); series: Risk(Y), Predicted Risk(Y)]
[Figure: image not transcribed]
[Figure: Smoker (X3) Line Fit Plot. x-axis: Smoker (X3); y-axis: Risk(Y); series: Risk(Y), Predicted Risk(Y)]
[Figure: Normal Probability Plot. x-axis: Sample Percentile; y-axis: Risk(Y)]
[Figure: image not transcribed]