Question

In: Statistics and Probability

We consider the multiple linear regression with LIFE (y) as the response variable, and MALE, BIRTH, DIVO , BEDS, EDUC, and INCO, as predictors.

QUESTION: Plot the standardized residuals against the fitted values. Are there any notable points? In particular, look for points with large residuals or points that may be influential.

# Please include a screenshot of the R code for the plot.

# The data are as follows:

"STATE" "MALE" "BIRTH" "DIVO" "BEDS" "EDUC" "INCO" "LIFE"
AK 119.1 24.8 5.6 603.3 14.1 4638 69.31
AL 93.3 19.4 4.4 840.9 7.8 2892 69.05
AR 94.1 18.5 4.8 569.6 6.7 2791 70.66
AZ 96.8 21.2 7.2 536.0 12.6 3614 70.55
CA 96.8 18.2 5.7 649.5 13.4 4423 71.71
CO 97.5 18.8 4.7 717.7 14.9 3838 72.06
CT 94.2 16.7 1.9 791.6 13.7 4871 72.48
DC 86.8 20.1 3.0 1859.4 17.8 4644 65.71
DE 95.2 19.2 3.2 926.8 13.1 4468 70.06
FL 93.2 16.9 5.5 668.2 10.3 3698 70.66
GA 94.6 21.1 4.1 705.4 9.2 3300 68.54
HW 108.1 21.3 3.4 794.3 14.0 4599 73.60
IA 94.6 17.1 2.5 773.9 9.1 3643 72.56
ID 99.7 20.3 5.1 541.5 10.0 3243 71.87
IL 94.2 18.5 3.3 871.0 10.3 4446 70.14
IN 95.1 19.1 2.9 736.1 8.3 3709 70.88
KS 96.2 17.0 3.9 854.6 11.4 3725 72.58
KY 96.3 18.7 3.3 661.9 7.2 3076 70.10
LA 94.7 20.4 1.4 724.0 9.0 3023 68.76
MA 91.6 16.6 1.9 1103.8 12.6 4276 71.83
MD 95.5 17.5 2.4 841.3 13.9 4267 70.22
ME 94.8 17.9 3.9 919.5 8.4 3250 70.93
MI 96.1 19.4 3.4 754.7 9.4 4041 70.63
MN 96.0 18.0 2.2 905.4 11.1 3819 72.96
MO 93.2 17.3 3.8 801.6 9.0 3654 70.69
MS 94.0 22.1 3.7 763.1 8.1 2547 68.09
MT 99.9 18.2 4.4 668.7 11.0 3395 70.56
NC 95.9 19.3 2.7 658.8 8.5 3200 69.21
ND 101.8 17.6 1.6 959.9 8.4 3077 72.79
NE 95.4 17.3 2.5 866.1 9.6 3657 72.60
NH 95.7 17.9 3.3 878.2 10.9 3720 71.23
NJ 93.7 16.8 1.5 713.1 11.8 4684 70.93
NM 97.2 21.7 4.3 560.9 12.7 3045 70.32
NV 102.8 19.6 18.7 560.7 10.8 4583 69.03
NY 91.5 17.4 1.4 1056.2 11.9 4605 70.55
OH 94.1 18.7 3.7 751.0 9.3 3949 70.82
OK 94.9 17.5 6.6 664.6 10.0 3341 71.42
OR 95.9 16.8 4.6 607.1 11.8 3677 72.13
PA 92.4 16.3 1.9 948.9 8.7 3879 70.43
RI 96.2 16.5 1.8 960.5 9.4 3878 71.90
SC 96.5 20.1 2.2 739.9 9.0 2951 67.96
SD 98.4 17.6 2.0 984.7 8.6 3108 72.08
TN 93.7 18.4 4.2 831.6 7.9 3079 70.11
TX 95.9 20.6 4.6 674.0 10.9 3507 70.90
UT 97.6 25.5 3.7 470.5 14.0 3169 72.90
VA 97.7 18.6 2.6 835.8 12.3 3677 70.08
VT 95.6 18.8 2.3 1026.1 11.5 3447 71.64
WA 98.7 17.8 5.2 556.4 12.7 3997 71.72
WI 96.3 17.6 2.0 814.7 9.8 3712 72.48
WV 93.9 17.8 3.2 950.4 6.8 3038 69.48
WY 100.7 19.6 5.4 925.9 11.8 3672 70.29

Solutions

Expert Solution

# Read the data (adjust read.csv/read.table to match the file's delimiter)
data <- read.csv("life.csv", header = TRUE)
head(data)

# Fit the multiple linear regression of LIFE on the six predictors
model <- lm(LIFE ~ MALE + BIRTH + DIVO + BEDS + EDUC + INCO, data = data)
summary(model)

# Standardized residuals and fitted values
standardRes <- rstandard(model)
yhat <- fitted(model)

# Plot standardized residuals against fitted values, with reference lines at +/-2
plot(yhat, standardRes,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = c(-2, 0, 2), lty = 2)

> summary(model)

Call:
lm(formula = LIFE ~ MALE + BIRTH + DIVO + BEDS + EDUC + INCO, 
    data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5563 -0.6629  0.0755  0.6983  3.3215 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 70.5577813  4.2897471  16.448  < 2e-16 ***
MALE         0.1261019  0.0472318   2.670  0.01059 *  
BIRTH       -0.5160558  0.1172775  -4.400 6.78e-05 ***
DIVO        -0.1965375  0.0739533  -2.658  0.01093 *  
BEDS        -0.0033392  0.0009795  -3.409  0.00141 ** 
EDUC         0.2368223  0.1110225   2.133  0.03853 *  
INCO        -0.0003612  0.0004598  -0.786  0.43633    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.176 on 44 degrees of freedom
Multiple R-squared:  0.4685,    Adjusted R-squared:  0.396 
F-statistic: 6.464 on 6 and 44 DF,  p-value: 6.112e-05

There are some notable points. One observation has a standardized residual greater than 3, and a few have standardized residuals less than -2. These large-residual points may also be influential.
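To flag these points programmatically, a minimal base R sketch is shown below. The cutoffs |r| > 2 for standardized residuals, 2p/n for leverage, and 4/n for Cook's distance are common rules of thumb added here as assumptions; they are not part of the original solution.

# Observations with large standardized residuals (|r| > 2)
notable <- which(abs(rstandard(model)) > 2)
data$STATE[notable]

# Influence diagnostics: leverage (hat values) and Cook's distance
p <- length(coef(model))   # number of estimated coefficients
n <- nrow(data)            # number of observations
high_leverage <- which(hatvalues(model) > 2 * p / n)
high_cooks    <- which(cooks.distance(model) > 4 / n)
data$STATE[high_leverage]
data$STATE[high_cooks]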

