The openintro package contains a data set called bdims, which consists of the body dimensions of 507 physically active individuals. Complete a full multivariate regression analysis, predicting the variable wgt (weight) using all significant elements. You should do a stepwise variable selection procedure, and explore the data. Provide R code.
Solution:
Given that the openintro package contains a data set called bdims,
which consists of the body dimensions of 507 physically active individuals.
# Install the package
install.packages("openintro")
library(openintro)
# Data heading
head(bdims)
# Full model with all the covariates/independent variables
model <- lm(wgt ~ ., data = bdims)
summary(model)
# From the output we see that many variables are non-significant,
# so refit using only the significant ones
model2 <- lm(wgt ~ che.de + sho.gi + che.gi + wai.gi + hip.gi +
               thi.gi + for.gi + kne.gi + cal.gi + age + hgt + sex,
             data = bdims)
summary(model2)
# Histograms of the response and the retained body measurements
hist(bdims$wgt)
hist(bdims$che.de)
hist(bdims$sho.gi)
hist(bdims$che.gi)
hist(bdims$wai.gi)
hist(bdims$hip.gi)
hist(bdims$thi.gi)
hist(bdims$for.gi)
hist(bdims$kne.gi)
hist(bdims$cal.gi)
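The histograms above look at one variable at a time. As a rough complement for the "explore the data" part, the sketch below computes the correlation of each numeric predictor kept in model2 with wgt. The names vars and cors are introduced here for illustration and are not part of the original answer.

```r
library(openintro)

# Numeric predictors retained in model2 (sex is left out because it is a 0/1 indicator)
vars <- c("che.de", "sho.gi", "che.gi", "wai.gi", "hip.gi",
          "thi.gi", "for.gi", "kne.gi", "cal.gi", "age", "hgt")

# Pearson correlation of each predictor with weight
cors <- sapply(bdims[vars], function(x) cor(x, bdims$wgt))

# Strongest linear associations first
print(sort(cors, decreasing = TRUE))
```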
First Model:
We have used all predictor variables. From the output we see that
many variables are insignificant, as their p-values are > 0.05,
so for those variables we fail to reject the corresponding H0.
H0: βi = 0 vs H1: βi != 0, here i = 1, 2, ..., 24 (one coefficient per predictor)
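As a minimal sketch of how the per-coefficient tests can be read off programmatically instead of by eye, the snippet below pulls the p-values out of the summary table and lists the predictors significant at the 5% level. The names full_model, coef_table, p_values and significant are illustrative and do not appear in the original answer.

```r
library(openintro)

# Refit the full model so this snippet runs on its own
full_model <- lm(wgt ~ ., data = bdims)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- summary(full_model)$coefficients

# Drop the intercept row and keep only the p-values
p_values <- coef_table[-1, "Pr(>|t|)"]

# Predictors whose t-test rejects H0: beta_i = 0 at the 5% level
significant <- names(p_values)[p_values < 0.05]
print(significant)
```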
The usual stepwise regression removes the most insignificant
variable one at a time and observes the R^2 value after each removal.
But here we will remove all the insignificant variables in one go.
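Since the question asks for a stepwise procedure, here is a hedged sketch of one way to run it with base R's step() function. Note that step() selects terms by AIC rather than by individual p-values, so the variables it keeps may differ from the ones chosen above. The names full_model and step_model are illustrative.

```r
library(openintro)

# Start from the model with every predictor
full_model <- lm(wgt ~ ., data = bdims)

# Stepwise search from the full model: at each step one term is
# dropped or re-added, keeping the change that lowers AIC the most.
# trace = 0 suppresses the step-by-step log.
step_model <- step(full_model, direction = "both", trace = 0)

# Inspect the selected model
print(formula(step_model))
print(summary(step_model))
```

Setting trace = 1 instead prints each step, which makes it easier to see the order in which variables leave the model.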
From output2, observe that although we have removed so many
variables, our R^2 value is still 97.48%, which is excellent.
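One way to check that dropping the variables cost little is to compare the two fits directly. The sketch below is an addition, not part of the original answer: it uses adjusted R^2, because plain R^2 never decreases when predictors are added, together with a partial F-test via anova(). The names full_model and reduced_model are illustrative.

```r
library(openintro)

full_model    <- lm(wgt ~ ., data = bdims)
reduced_model <- lm(wgt ~ che.de + sho.gi + che.gi + wai.gi + hip.gi +
                      thi.gi + for.gi + kne.gi + cal.gi + age + hgt + sex,
                    data = bdims)

# Adjusted R^2 penalizes model size, so it is a fairer comparison
print(c(full    = summary(full_model)$adj.r.squared,
        reduced = summary(reduced_model)$adj.r.squared))

# Partial F-test: H0 says all the dropped coefficients are zero
print(anova(reduced_model, full_model))
```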
The fitted regression equation would be
wgt = -120.35 + 0.2462(che.de) + 0.09(sho.gi) + 0.16(che.gi) + 0.36(wai.gi) + 0.24(hip.gi) + 0.24(thi.gi)
      + 0.62(for.gi) + 0.27(kne.gi) + 0.39(cal.gi) - 0.04(age) + 0.32(hgt) - 1.09(sex)
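The coefficients in the equation can be read from the fitted object rather than copied by hand. This is a small sketch, with reduced_model and preds as illustrative names, that prints the rounded coefficients and compares fitted weights with observed weights for the first few individuals.

```r
library(openintro)

reduced_model <- lm(wgt ~ che.de + sho.gi + che.gi + wai.gi + hip.gi +
                      thi.gi + for.gi + kne.gi + cal.gi + age + hgt + sex,
                    data = bdims)

# Intercept and slopes, rounded for comparison with the written equation
print(round(coef(reduced_model), 2))

# Observed vs. fitted weight for the first six individuals
preds <- data.frame(observed  = head(bdims$wgt),
                    predicted = head(fitted(reduced_model)))
print(preds)
```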