Using R or other statistical software:
(A) From the summary, which variables seem useful for predicting the dependent (response) variable?
(B) For the purpose of variable selection, does the ANOVA table provide any useful information not already in the summary?
Sol:
(A) From the summary, which variables seem useful for predicting the dependent (response) variable?
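Look at the p-value (the Pr(>|t|) column) of each coefficient in the model summary, and choose a significance level alpha = 0.05.
If p < 0.05, the variable is significant and is useful for predicting the dependent variable.
If p > 0.05, the variable is not significant, and we can consider excluding it.
Because each p-value is computed with all the other variables in the model, exclude non-significant variables one at a time and refit the model after each removal.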
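The p-value rule above can be applied directly to the coefficient table that summary() returns. A minimal R sketch, assuming the iris model from the example below (refit here so the snippet runs on its own); the names alpha, pvals, significant and not_significant are illustrative choices, not part of the original solution:

```r
# Sketch: flag coefficients whose p-value is below alpha.
# Assumes the iris model used in the example (Sepal.Length on all other columns).
regmod <- lm(Sepal.Length ~ ., data = iris)

alpha <- 0.05
coefs <- summary(regmod)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
pvals <- coefs[-1, "Pr(>|t|)"]          # drop the intercept row

significant     <- names(pvals)[pvals < alpha]
not_significant <- names(pvals)[pvals >= alpha]   # candidates for exclusion

print(significant)
print(not_significant)
```

For this model every predictor ends up in significant, which matches the reading of the summary output further down.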
(B) For the purpose of variable selection, does the ANOVA table provide any useful information not already in the summary?
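Not much. The overall F-statistic and its p-value, which tell us whether the model as a whole is significant, are already printed on the last line of the summary; if p < 0.05, the model is significant and can be used to predict the dependent variable.
The ANOVA table from anova() instead reports sequential F-tests: each term is tested after the terms listed before it, so the results (except for the last term) depend on the order of the variables in the formula.
The one thing it adds for variable selection is a single test for a categorical predictor with several levels (such as Species) as a whole, whereas the summary tests each of its dummy variables separately.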
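To see where the overall model test comes from: summary() stores the F-statistic and its two degrees of freedom in its fstatistic component, and pf() converts them into the p-value printed on the summary's last line. A minimal sketch, assuming the same iris model as the example that follows:

```r
# Sketch: overall F-test of the regression, as reported on the last line of summary().
regmod <- lm(Sepal.Length ~ ., data = iris)

fstat <- summary(regmod)$fstatistic     # named vector: value, numdf, dendf
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)     # upper-tail probability of the F distribution

print(fstat)
print(p_overall)
print(p_overall < 0.05)                 # TRUE means the model as a whole is significant
```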
Example: using the built-in dataset iris, we build a regression model with Sepal.Length as the dependent variable and all the remaining variables as independent variables.
R code:
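regmod <- lm(Sepal.Length ~ ., data = iris)  # "." = all columns of iris except the response
summary(regmod)  # coefficient t-tests and overall F-test
anova(regmod)    # sequential ANOVA table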
Output from summary(regmod):
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        2.17127    0.27979   7.760 1.43e-12 ***
Sepal.Width        0.49589    0.08607   5.761 4.87e-08 ***
Petal.Length       0.82924    0.06853  12.101  < 2e-16 ***
Petal.Width       -0.31516    0.15120  -2.084  0.03889 *
Speciesversicolor -0.72356    0.24017  -3.013  0.00306 **
Speciesvirginica  -1.02350    0.33373  -3.067  0.00258 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3068 on 144 degrees of freedom
Multiple R-squared: 0.8673, Adjusted R-squared: 0.8627
F-statistic: 188.3 on 5 and 144 DF, p-value: < 2.2e-16
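Each Pr(>|t|) value above is a two-sided p-value from a t distribution with the residual degrees of freedom (144 here). A short sketch that recomputes that column with pt() and compares it with the summary table, assuming the same iris model:

```r
# Sketch: recompute the Pr(>|t|) column from the t values.
regmod <- lm(Sepal.Length ~ ., data = iris)
coefs  <- summary(regmod)$coefficients

t_vals   <- coefs[, "t value"]
res_df   <- df.residual(regmod)                      # 150 observations - 6 coefficients = 144
p_manual <- 2 * pt(abs(t_vals), df = res_df, lower.tail = FALSE)   # two-sided p-value

print(cbind(t = t_vals, p_summary = coefs[, "Pr(>|t|)"], p_manual = p_manual))
```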
Here every predictor has p < 0.05:
Sepal.Width: p = 4.87e-08 < 0.05, so Sepal.Width is significant.
Petal.Length: p < 2e-16 < 0.05, so Petal.Length is significant and can be used for prediction.
Petal.Width: p = 0.03889 < 0.05, so Petal.Width is significant and can be used for predicting sepal length.
Speciesversicolor: p = 0.00306 < 0.05, so it is significant.
Speciesvirginica: p = 0.00258 < 0.05, so it is significant.
(Both Species dummies are measured relative to the baseline level, setosa.)
So all the variables seem useful for predicting Sepal.Length, and none needs to be excluded.
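Since the significant variables are meant to be used for prediction, here is a minimal sketch with predict(). As an assumption for illustration only, it predicts the first three rows of iris rather than new flowers, so the point predictions can be checked against fitted(); new_flowers is a hypothetical name:

```r
# Sketch: predict Sepal.Length from the fitted model.
regmod <- lm(Sepal.Length ~ ., data = iris)

# Hypothetical "new" flowers: for illustration, the predictor values of the first three rows.
new_flowers <- iris[1:3, c("Sepal.Width", "Petal.Length", "Petal.Width", "Species")]

# Point predictions plus 95% prediction intervals (columns fit, lwr, upr).
pred <- predict(regmod, newdata = new_flowers, interval = "prediction", level = 0.95)
print(pred)
```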
Output from anova(regmod):
Analysis of Variance Table
Response: iris$Sepal.Length
              Df Sum Sq Mean Sq  F value    Pr(>F)
Sepal.Width    1  1.412   1.412  15.0011 0.0001625 ***
Petal.Length   1 84.427  84.427 896.8059 < 2.2e-16 ***
Petal.Width    1  1.883   1.883  20.0055 1.556e-05 ***
Species        2  0.889   0.444   4.7212 0.0103288 *
Residuals    144 13.556   0.094
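Every term in this table has p < 0.05, so each one significantly reduces the residual sum of squares when added after the terms listed above it. The Species row is a single 2-df test of the factor as a whole (F = 4.7212, p = 0.0103), which the summary does not report. The significance of the model as a whole comes from the summary's F-statistic (F = 188.3 on 5 and 144 DF, p < 2.2e-16), so the model is significant.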
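The joint test for Species can also be obtained by comparing nested models, which makes the meaning of the 2-df row explicit. A minimal sketch: the reduced model (a name chosen here for illustration) drops Species, and anova(reduced, full) performs the partial F-test; drop1() from the stats package gives the analogous test for every term given all the others.

```r
# Sketch: joint F-test for the Species factor via nested models.
full    <- lm(Sepal.Length ~ ., data = iris)
reduced <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)  # no Species

cmp <- anova(reduced, full)    # partial F-test on the 2 Species dummies together
print(cmp)

# drop1(): for each term, the F-test of removing it while keeping all the others,
# with one multi-df test per factor instead of one test per dummy.
d1 <- drop1(full, test = "F")
print(d1)
```

Because Species is the last term in the formula, its F value here equals the 4.7212 in the sequential table. For the other terms, drop1() gives an order-independent test, which for a single-df term is the square of its t value in the summary.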