Question

  1. This problem uses the data set Heights from the alr4 package, which contains the heights of n = 1375 pairs of mothers (mheight) and daughters (dheight) in inches. (Solve this problem in R.)

    (a) Compute the regression of dheight on mheight, and report the estimates, their standard errors, the value of the coefficient of determination, and the estimate of variance. Write a sentence or two that summarizes the results of these computations.

    (b) Obtain a 99% confidence interval for β1 from the data.

    (c) Obtain a predicted value and 90% prediction interval for a daughter whose mother is 58 inches tall.

Solutions

Expert Solution

a) R Code with comments

#load the library alr4
library(alr4)
#print some records from the dataset
head(Heights)

#part a)
#regression
fit<-lm(dheight~mheight,data=Heights)
#get the summary information
fit.s<-summary(fit)
#print the results
sprintf('Estimate of the intercept is %.4f',fit.s$coef[1,1])
sprintf('The standard error of the intercept estimate is %.4f',fit.s$coef[1,2])
sprintf('Estimate of the slope is %.4f',fit.s$coef[2,1])
sprintf('The standard error of the slope estimate is %.4f',fit.s$coef[2,2])
sprintf('The value of the coefficient of determination is %.4f',fit.s$r.squared)
sprintf('The value of the estimate of variance is %.4f',fit.s$sigma^2)

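The estimated regression equation has the form

predicted dheight = b0 + b1 * mheight

where b0 and b1 are the intercept and slope estimates printed by the code above (b1 = 0.5417).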

ans: The positive slope of 0.5417 indicates that mothers' and daughters' heights move in the same direction: for each 1-inch increase in the mother's height, the predicted height of the daughter increases by 0.5417 inches. The coefficient of determination of 0.2408 indicates that 24.08% of the variation in daughters' heights is explained by the mothers' heights.
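To make clear what the quantities reported in part a) are, here is a minimal sketch that computes them from the closed-form simple linear regression formulas and sets them beside the output of lm(). The data frame toy and the helper slr_by_hand are hypothetical names introduced only for illustration, and the toy values are made up (they are not the Heights data), so the block runs without alr4 installed.

```r
# Hypothetical toy data (NOT the Heights data), used only so the sketch runs
# without the alr4 package.
toy <- data.frame(
  mheight = c(57, 60, 61, 62, 63, 64, 66, 68),
  dheight = c(60, 62, 62, 63, 64, 63, 66, 67)
)

# Hypothetical helper: simple linear regression summaries from the
# closed-form formulas.
slr_by_hand <- function(x, y) {
  n   <- length(x)
  sxx <- sum((x - mean(x))^2)                # sum of squares of x about its mean
  sxy <- sum((x - mean(x)) * (y - mean(y)))  # cross-product sum
  syy <- sum((y - mean(y))^2)                # total sum of squares of y
  b1  <- sxy / sxx                           # slope estimate
  b0  <- mean(y) - b1 * mean(x)              # intercept estimate
  rss <- sum((y - b0 - b1 * x)^2)            # residual sum of squares
  sigma2 <- rss / (n - 2)                    # estimate of the error variance
  list(
    b0 = b0,
    b1 = b1,
    se_b0 = sqrt(sigma2 * (1 / n + mean(x)^2 / sxx)),
    se_b1 = sqrt(sigma2 / sxx),
    r2 = 1 - rss / syy,                      # coefficient of determination
    sigma2 = sigma2
  )
}

hand <- slr_by_hand(toy$mheight, toy$dheight)
ref  <- summary(lm(dheight ~ mheight, data = toy))

# Hand-computed values next to the lm() values
print(unlist(hand))
print(coef(ref))
```

With alr4 loaded, the same helper could be called as slr_by_hand(Heights$mheight, Heights$dheight) and should agree with fit.s above.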

b) 99% confidence interval for slope

R code

#part b
ci<-confint(fit,'mheight',level=0.99)
sprintf('A 99%% confidence interval for the slope is [%.4f,%.4f]',ci[1],ci[2])
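For reference, the interval returned by confint() for the slope is the estimate plus or minus a t quantile (with n - 2 degrees of freedom) times the standard error. The sketch below reproduces that by hand on a small made-up data frame; toy and fit_toy are hypothetical names, and the values are not the Heights data.

```r
# Hypothetical toy data (NOT the Heights data)
toy <- data.frame(
  mheight = c(57, 60, 61, 62, 63, 64, 66, 68),
  dheight = c(60, 62, 62, 63, 64, 63, 66, 67)
)
fit_toy <- lm(dheight ~ mheight, data = toy)

# Slope estimate and its standard error from the coefficient table
coef_tab <- coef(summary(fit_toy))
est <- coef_tab["mheight", "Estimate"]
se  <- coef_tab["mheight", "Std. Error"]

# Two-sided 99% interval: estimate +/- t quantile * standard error
level <- 0.99
tcrit <- qt(1 - (1 - level) / 2, df = df.residual(fit_toy))
ci_hand <- c(est - tcrit * se, est + tcrit * se)

# Same interval from confint(), for comparison
ci_ref <- confint(fit_toy, "mheight", level = level)
print(ci_hand)
print(ci_ref)
```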

c) 90% prediction interval for mheight=58 inches

R code

#part c)
p.i<-predict(fit,newdata=data.frame(mheight=58),interval="prediction",level=.90)
sprintf('The predicted value of the height of a daughter %.4f inches',p.i[1])
sprintf('The 90%% prediction interval for a daughter is [%.4f,%.4f]',p.i[2],p.i[3])
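A prediction interval is wider than a confidence interval for the mean response because it also accounts for the variability of a single new observation. The sketch below, under the usual simple linear regression assumptions, computes the 90% prediction interval by hand and compares it with predict(). The names toy, fit_toy and x_new are hypothetical and the data are made up, not the Heights data.

```r
# Hypothetical toy data (NOT the Heights data)
toy <- data.frame(
  mheight = c(57, 60, 61, 62, 63, 64, 66, 68),
  dheight = c(60, 62, 62, 63, 64, 63, 66, 67)
)
fit_toy <- lm(dheight ~ mheight, data = toy)

x_new <- 58                                  # mother's height to predict at
n     <- nrow(toy)
xbar  <- mean(toy$mheight)
sxx   <- sum((toy$mheight - xbar)^2)
sigma_hat <- summary(fit_toy)$sigma          # residual standard error

# Point prediction: b0 + b1 * x_new
y_hat <- sum(coef(fit_toy) * c(1, x_new))

# Standard error of prediction; the leading "1 +" is the extra term for a
# single new observation
se_pred <- sigma_hat * sqrt(1 + 1 / n + (x_new - xbar)^2 / sxx)

# 90% interval uses the 0.95 quantile of t with n - 2 degrees of freedom
tcrit <- qt(0.95, df = n - 2)
pi_hand <- c(fit = y_hat,
             lwr = y_hat - tcrit * se_pred,
             upr = y_hat + tcrit * se_pred)

# Same interval from predict(), for comparison
pi_ref <- predict(fit_toy, newdata = data.frame(mheight = x_new),
                  interval = "prediction", level = 0.90)
print(pi_hand)
print(pi_ref)
```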

All code together

------------

#install the package for the first time
#install.packages('alr4')

#load the library alr4
library(alr4)
#print some records from the dataset
head(Heights)

#part a)
#regression
fit<-lm(dheight~mheight,data=Heights)
#get the summary information
fit.s<-summary(fit)
#print the results
sprintf('Estimate of the intercept is %.4f',fit.s$coef[1,1])
sprintf('The standard error of the intercept estimate is %.4f',fit.s$coef[1,2])
sprintf('Estimate of the slope is %.4f',fit.s$coef[2,1])
sprintf('The standard error of the slope estimate is %.4f',fit.s$coef[2,2])
sprintf('The value of the coefficient of determination is %.4f',fit.s$r.squared)
sprintf('The value of the estimate of variance is %.4f',fit.s$sigma^2)

#part b
ci<-confint(fit,'mheight',level=0.99)
sprintf('A 99%% confidence interval for the slope is [%.4f,%.4f]',ci[1],ci[2])

#part c)
p.i<-predict(fit,newdata=data.frame(mheight=58),interval="prediction",level=.90)
sprintf('The predicted value of the height of a daughter %.4f inches',p.i[1])
sprintf('The 90%% prediction interval for a daughter is [%.4f,%.4f]',p.i[2],p.i[3])

-----------

