Question


The openintro package contains a data set called bdims, which consists of the body dimensions of 507 physically active individuals. Complete a full multivariate regression analysis, predicting the variable wgt (weight) using all significant elements. You should do a stepwise variable selection procedure, and explore the data. Include R code.

Solutions

Expert Solution

Solution:

Given: the openintro package contains a data set called bdims, which consists of the body dimensions of 507 physically active individuals.

# Install the package
install.packages("openintro")
library(openintro)

# Data heading
head(bdims)

# Full model with all the covariates (independent variables)
model <- lm(wgt ~ ., data = bdims)
summary(model)

# From the output we see that many variables are non-significant,
# so we refit the model using only the significant predictors

model2 <- lm(wgt ~ che.de + sho.gi + che.gi + wai.gi + hip.gi + thi.gi + for.gi +
             kne.gi + cal.gi + age + hgt + sex, data = bdims)
summary(model2)

# Histogram
hist(bdims$wgt)

hist(bdims$che.de)

hist(bdims$sho.gi)

hist(bdims$che.gi)
hist(bdims$wai.gi)
hist(bdims$hip.gi)
hist(bdims$thi.gi)
hist(bdims$for.gi)
hist(bdims$kne.gi)
hist(bdims$cal.gi)
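
As a complement to the histograms, a quick numeric look at the data can show which measurements move most closely with weight. The following is a minimal sketch, assuming the openintro package is installed; the names num_vars and cor_with_wgt are illustrative and not part of the original solution.

```r
library(openintro)

# Keep only the numeric columns so cor() can be applied
# (sex may be stored as a factor in some versions of the package)
num_vars <- bdims[sapply(bdims, is.numeric)]

# Correlation of every numeric variable with wgt, strongest first
cor_with_wgt <- sort(cor(num_vars)[, "wgt"], decreasing = TRUE)
round(cor_with_wgt, 2)

# Five-number summary and mean of the response
summary(bdims$wgt)
```

Variables with a high correlation with wgt are natural candidates for the model, although a predictor can still become insignificant once the others are included.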

First Model:
We used all predictor variables. The output shows that many of them are
insignificant: their p-values are greater than 0.05,
so we fail to reject the corresponding H0.

H0: βi = 0 vs H1: βi ≠ 0, for i = 1, 2, ..., 24 (one slope for each of the 24 predictors; wgt is the 25th variable)

The usual stepwise procedure removes the most insignificant variable one at
a time and tracks the R^2 value after each step.
Here, however, we remove all insignificant variables in one go.
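
The one-at-a-time procedure can also be automated. The sketch below uses step() from base R, which adds or drops one term per step; note that step() selects by AIC rather than by p-values, so it is a swapped-in criterion and may keep a slightly different set of variables than the manual selection. The names dat, full_model and step_model are illustrative.

```r
library(openintro)

# Keep complete rows so every candidate model is fit to the same data
dat <- na.omit(bdims)

# Start from the full model and let step() drop or re-add one term at a time
full_model <- lm(wgt ~ ., data = dat)
step_model <- step(full_model, direction = "both", trace = 0)

# Terms retained by the AIC-based search, and the final fit
formula(step_model)
summary(step_model)
```

Setting trace = 1 instead prints each step, which makes it easier to see the order in which variables leave the model.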

From the second output (model2), we observe that although many variables
were removed, the R^2 value is still 97.48%, which is excellent.
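
Because R^2 can only fall when predictors are removed, it helps to check the reduction more formally. A minimal sketch, assuming the same two models as in the code above, compares adjusted R^2 and runs a partial F-test; the names full_model and reduced_model are illustrative.

```r
library(openintro)

full_model <- lm(wgt ~ ., data = bdims)
reduced_model <- lm(wgt ~ che.de + sho.gi + che.gi + wai.gi + hip.gi + thi.gi +
                      for.gi + kne.gi + cal.gi + age + hgt + sex, data = bdims)

# Adjusted R^2 penalises extra predictors, so it is a fairer comparison than raw R^2
c(full = summary(full_model)$adj.r.squared,
  reduced = summary(reduced_model)$adj.r.squared)

# Partial F-test: H0 says the dropped predictors jointly add nothing
model_comparison <- anova(reduced_model, full_model)
model_comparison
```

A large p-value in the anova() table would support dropping the variables together; a small one suggests that at least one removed variable still carries information.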

The fitted regression line is

wgt = -120.35 + 0.2462(che.de) + 0.09(sho.gi) + 0.16(che.gi) + 0.36(wai.gi) + 0.24(hip.gi) + 0.24(thi.gi)
+ 0.62(for.gi) + 0.27(kne.gi) + 0.39(cal.gi) - 0.04(age) + 0.32(hgt) - 1.09(sex)
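
To see the fitted equation in use, it can be applied to a single set of measurements. The sketch below is an illustration only: it takes the first individual in bdims as the input row (an arbitrary choice) and uses predict() to obtain the fitted weight with a 95% prediction interval; the names person and pred are hypothetical.

```r
library(openintro)

model2 <- lm(wgt ~ che.de + sho.gi + che.gi + wai.gi + hip.gi + thi.gi +
               for.gi + kne.gi + cal.gi + age + hgt + sex, data = bdims)

# Use the first individual in the data as an illustrative input row
person <- bdims[1, ]

# Point prediction with a 95% prediction interval for that individual's weight
pred <- predict(model2, newdata = person, interval = "prediction", level = 0.95)

# Compare the prediction with the recorded weight
cbind(observed = person$wgt, pred)
```

The coefficients printed by coef(model2) are the unrounded versions of the numbers in the equation, so they are the ones to use for any hand calculation.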

