Question

In: Statistics and Probability


1. Understand, explain, and apply regression theory.

2. Choose or collect a data file and use regression theory to set up a model: calculate the parameter values, obtain the regression model, and analyze what the model means, including R-squared, the F-test, and the t-tests. Explain the relationship between the dependent variable and the independent variables. The analysis should include charts, correlations, and the regression output table. A multiple regression model is preferred, ideally one that includes dummy variables.

3. Also provide the Excel file showing the steps carried out in Excel.

Solutions

Expert Solution

Answers:

Ans 1)

Multiple Linear Regression Theory:

Regression analysis is a statistical method for examining the relationship between two or more variables of interest. There are many types of regression analysis, but at their core they all examine the influence of one or more independent variables on a dependent variable.

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target, or criterion variable).
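In symbols, the model described above is usually written as follows. The notation (k independent variables, n observations, an error term for each observation, and the least-squares estimator in matrix form) is standard textbook notation assumed here; it does not appear in the original answer.

```latex
% Multiple linear regression model with k independent variables
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n

% Ordinary least squares estimate of the coefficients,
% where X is the n x (k+1) matrix whose first column is all ones
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```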

Ans 2)

Collected data: a sample used to compare smokers and non-smokers

Dependent variable = Risk

Independent variables = Age, Pressure, Smoker

Categorical (dummy) variable = Smoker, coded Yes = 1 and No = 0 with the Excel formula:

=IF(E4="Yes",1,0)
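For readers working outside Excel, the same dummy coding can be sketched in Python. The function name `smoker_to_dummy` is a hypothetical helper introduced only for illustration. The comparison is lower-cased because Excel's `=` text comparison ignores case.

```python
def smoker_to_dummy(label):
    """Return 1 for "Yes" (any letter case) and 0 otherwise, mirroring =IF(E4="Yes",1,0)."""
    return 1 if label.lower() == "yes" else 0

# Smoker labels from the first five rows of the data table.
smoker_labels = ["No", "No", "No", "Yes", "No"]
smoker_dummy = [smoker_to_dummy(label) for label in smoker_labels]
print(smoker_dummy)
```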

Risk  Age  Pressure  Smoker  Risk(Y)  Age (X1)  Pressure(X2)  Smoker (X3)
12    57   152       No      12       57        152           0
24    67   163       No      24       67        163           0
13    58   155       No      13       58        155           0
56    86   177       Yes     56       86        177           1
28    59   196       No      28       59        196           0
51    76   189       Yes     51       76        189           1
18    56   155       Yes     18       56        155           1
31    78   120       No      31       78        120           0
37    80   135       Yes     37       80        135           1
15    78   98        No      15       78        98            0
22    71   152       No      22       71        152           0
36    70   173       Yes     36       70        173           1
15    67   135       Yes     15       67        135           1
48    77   209       Yes     48       77        209           1
15    60   199       No      15       60        199           0
36    82   119       Yes     36       82        119           1
8     66   166       No      8        66        166           0
34    80   125       Yes     34       80        125           1
3     62   117       No      3        62        117           0
37    59   207       Yes     37       59        207           1
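The answer fits the model with Excel's Regression tool. As a cross-check outside the spreadsheet, the following is a minimal sketch in plain Python that fits the same model by ordinary least squares through the normal equations. The helper names `solve` and `fit_ols` are illustrative assumptions, not part of the original answer, and the normal-equations approach is chosen for brevity rather than numerical robustness.

```python
# Data copied from the table above: (Risk, Age, Pressure, Smoker dummy).
rows = [
    (12, 57, 152, 0), (24, 67, 163, 0), (13, 58, 155, 0), (56, 86, 177, 1),
    (28, 59, 196, 0), (51, 76, 189, 1), (18, 56, 155, 1), (31, 78, 120, 0),
    (37, 80, 135, 1), (15, 78, 98, 0), (22, 71, 152, 0), (36, 70, 173, 1),
    (15, 67, 135, 1), (48, 77, 209, 1), (15, 60, 199, 0), (36, 82, 119, 1),
    (8, 66, 166, 0), (34, 80, 125, 1), (3, 62, 117, 0), (37, 59, 207, 1),
]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(data):
    """Return [intercept, b_age, b_pressure, b_smoker] from the normal equations."""
    y = [float(r[0]) for r in data]
    x = [[1.0, float(r[1]), float(r[2]), float(r[3])] for r in data]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

coefficients = fit_ols(rows)
for name, value in zip(["Intercept", "Age", "Pressure", "Smoker"], coefficients):
    print(f"{name:<10}{value:12.6f}")
```

If the data are transcribed correctly, the printed values should agree with the Coefficients column of the Excel output (roughly -91.76, 1.077, 0.252, and 8.740).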

Regression Output:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.934605
R Square            0.873487
Adjusted R Square   0.849766
Standard Error      5.756575
Observations        20

ANOVA
            df  SS        MS        F         Significance F
Regression  3   3660.74   1220.247  36.82301  2.06E-07
Residual    16  530.2104  33.13815
Total       19  4190.95

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     -91.7595      15.22276        -6.02778  1.76E-05  -124.03    -59.4887
Age (X1)      1.076741      0.165964        6.487814  7.49E-06  0.724914   1.428568
Pressure(X2)  0.251813      0.045226        5.567951  4.24E-05  0.15594    0.347687
Smoker (X3)   8.739871      3.000815        2.912499  0.010174  2.378427   15.10132

RESIDUAL OUTPUT
Observation  Predicted Risk(Y)  Residuals  Standard Residuals
1            7.89039            4.10961    0.777953
2            21.42775           2.572252   0.48693
3            9.722571           3.277429   0.62042
4            54.15109           1.848912   0.350001
5            21.12366           6.876335   1.301696
6            46.40544           4.594561   0.869754
7            16.30896           1.69104    0.320115
8            22.44392           8.556079   1.619674
9            37.11448           -0.11448   -0.02167
10           16.90402           -1.90402   -0.36043
11           22.96476           -0.96476   -0.18263
12           35.91598           0.084023   0.015906
13           23.11684           -8.11684   -1.53653
14           52.51845           -4.51845   -0.85535
15           22.95585           -7.95585   -1.50605
16           35.23894           0.761058   0.144069
17           21.10645           -13.1064   -2.48106
18           34.59634           -0.59634   -0.11289
19           4.460623           -1.46062   -0.2765
20           32.63348           4.366516   0.826585

PROBABILITY OUTPUT
Percentile  Risk(Y)
2.5         3
7.5         8
12.5        12
17.5        13
22.5        15
27.5        15
32.5        15
37.5        18
42.5        22
47.5        24
52.5        28
57.5        31
62.5        34
67.5        36
72.5        36
77.5        37
82.5        37
87.5        48
92.5        51
97.5        56
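Several entries in the summary output follow from the ANOVA sums of squares by simple arithmetic. The sketch below recomputes them from the reported SS values, as a check on how R Square, Adjusted R Square, F, and the Standard Error relate to one another. The variable names are illustrative.

```python
import math

n = 20                    # Observations
k = 3                     # independent variables: Age, Pressure, Smoker
ss_regression = 3660.74   # SS for Regression in the ANOVA table
ss_residual = 530.2104    # SS for Residual in the ANOVA table
ss_total = ss_regression + ss_residual

# Share of the total variation explained by the model.
r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)
# Adjusted R Square penalizes for the number of independent variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Mean squares and the overall F statistic.
ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_statistic = ms_regression / ms_residual

# "Standard Error" in the summary is the square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"R Square          {r_squared:.6f}")
print(f"Multiple R        {multiple_r:.6f}")
print(f"Adjusted R Square {adjusted_r_squared:.6f}")
print(f"F                 {f_statistic:.4f}")
print(f"Standard Error    {standard_error:.6f}")
```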

Regression Model:

Predicted Risk = -91.760 + 1.077 x Age + 0.252 x Pressure + 8.740 x Smoker
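To show how the fitted equation is used, here is a small sketch that evaluates it with the unrounded coefficients from the regression output table. The function name `predict_risk` is a hypothetical helper, not something from the original answer.

```python
def predict_risk(age, pressure, smoker):
    """Predicted risk from the fitted equation; smoker is 1 for Yes and 0 for No."""
    return -91.7595 + 1.076741 * age + 0.251813 * pressure + 8.739871 * smoker

# First observation in the data: age 57, pressure 152, non-smoker.
first_prediction = predict_risk(57, 152, 0)
print(round(first_prediction, 2))

# Holding age and pressure fixed, the smoker/non-smoker gap equals the Smoker coefficient.
smoker_gap = predict_risk(57, 152, 1) - predict_risk(57, 152, 0)
print(round(smoker_gap, 3))
```

The first value should be close to the Predicted Risk(Y) reported for observation 1 in the residual output. The second shows how to read the dummy coefficient: at any given age and pressure, a smoker's predicted risk is about 8.74 points higher than a non-smoker's.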

Interpretations:

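1) The overall F-test is significant: F = 36.82 with Significance F = 2.06E-07, far below 0.05, so the three independent variables jointly explain a significant share of the variation in Risk.

2) All three independent variables (Age, Pressure, and Smoker) are statistically significant, because each has a t-test p-value below 0.05.

3) The correlation coefficient r always lies between -1 and 1 inclusive. R-squared, written R^2, is the square of the correlation. It measures the proportion of variation in the dependent variable that can be attributed to the independent variables. Here Multiple R = 0.93 and R-squared = 0.87, so the model explains about 87% of the variation in Risk, which indicates a strong positive linear association.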
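The t statistics and 95% confidence intervals behind these interpretations can be recomputed from the coefficients and standard errors in the output table. The sketch below does so. The critical value 2.1199 (two-sided 5% level, Student's t with 16 residual degrees of freedom) is taken from standard t tables rather than from the original answer, and the names `estimates` and `summarize` are illustrative.

```python
# (coefficient, standard error) pairs copied from the regression output table.
estimates = {
    "Intercept": (-91.7595, 15.22276),
    "Age (X1)": (1.076741, 0.165964),
    "Pressure(X2)": (0.251813, 0.045226),
    "Smoker (X3)": (8.739871, 3.000815),
}

# Assumption: two-sided 5% critical value of Student's t with 20 - 3 - 1 = 16 df.
T_CRITICAL = 2.1199

def summarize(coef, se, t_crit=T_CRITICAL):
    """Return (t statistic, lower 95% bound, upper 95% bound, significant at 5%?)."""
    t_stat = coef / se
    margin = t_crit * se
    return t_stat, coef - margin, coef + margin, abs(t_stat) > t_crit

results = {name: summarize(coef, se) for name, (coef, se) in estimates.items()}
for name, (t_stat, low, high, significant) in results.items():
    print(f"{name:<13} t = {t_stat:8.4f}  95% CI = [{low:10.4f}, {high:10.4f}]  significant: {significant}")
```

A coefficient is significant at the 5% level when its absolute t statistic exceeds the critical value, which is equivalent to its 95% confidence interval excluding zero.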

Graphs:

1) Residual Plots:

2) Line Fit Plot:

3) Normal Probability Plot:

Ans 3).

[Figure: Age (X1) Residual Plot. x-axis: Age (X1); y-axis: Residuals]

[Figure: Pressure(X2) Residual Plot. x-axis: Pressure(X2); y-axis: Residuals]

[Figure: Smoker (X3) Residual Plot. x-axis: Smoker (X3); y-axis: Residuals]

[Figure: Age (X1) Line Fit Plot. x-axis: Age (X1); y-axis: Risk(Y); series: Risk(Y), Predicted Risk(Y)]

[Figure: Smoker (X3) Line Fit Plot. x-axis: Smoker (X3); y-axis: Risk(Y); series: Risk(Y), Predicted Risk(Y)]

[Figure: Normal Probability Plot. x-axis: Sample Percentile; y-axis: Risk(Y)]

