A doctor wanted to determine whether there is a relation between a male's age and his HDL (so-called good) cholesterol. The doctor randomly selected 17 of his patients and determined their HDL cholesterol. The data obtained by the doctor are shown in the data table below. Complete parts (a) through (f) below.
Age, x | HDL Cholesterol, y | Age, x | HDL Cholesterol, y |
38 | 56 | 39 | 46 |
44 | 56 | 65 | 64 |
45 | 32 | 28 | 52 |
33 | 57 | 53 | 37 |
53 | 37 | 29 | 44 |
54 | 40 | 50 | 38 |
62 | 40 | 50 | 57 |
60 | 40 | 40 | 27 |
28 | 45 | | |
b) Determine the least-squares regression equation from the sample data.
y = __ x + __
c) Are there any outliers or influential observations?
d) Assuming the residuals are normally distributed, test whether a linear relation exists between age and HDL cholesterol levels at the α = 0.01 level of significance.
e) Assuming the residuals are normally distributed, construct a 95% confidence interval about the slope of the true least-squares regression line.
f) For a 42-year-old male patient who visits the doctor's office, would using the least-squares regression line obtained in part (b) to predict the HDL cholesterol of this patient be recommended?
We perform a linear regression analysis with HDL cholesterol as the dependent variable and age in years as the independent variable; the results are given below.
b) The regression equation is ŷ = -0.0797x + 48.79.
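The coefficients in part (b) follow from the usual least-squares formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄. The sketch below recomputes them from the table using only the standard library; the names `least_squares`, `ages` and `hdl` are illustrative, not part of the original answer.

```python
# Sample data from the table: age (x) and HDL cholesterol (y) for the 17 patients
ages = [38, 44, 45, 33, 53, 54, 62, 60, 28, 39, 65, 28, 53, 29, 50, 50, 40]
hdl = [56, 56, 32, 57, 37, 40, 40, 40, 45, 46, 64, 52, 37, 44, 38, 57, 27]


def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y-hat = b1*x + b0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy and Sxx: sums of cross-products and squared deviations about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b1, b0


b1, b0 = least_squares(ages, hdl)
print(f"y-hat = {b1:.4f}x + {b0:.2f}")  # prints: y-hat = -0.0797x + 48.79
```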
c) Outliers are values below the first quartile minus 1.5 × IQR or above the third quartile plus 1.5 × IQR, i.e., values outside the range (14.8, 75.8). No values in the data fall outside this range, so there are no outliers.
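In a regression, the same 1.5 × IQR rule is commonly applied to the residuals rather than to the raw values. The sketch below does that under stated assumptions: it uses the rounded coefficients from part (b), and `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, so fences from other software may differ slightly.

```python
import statistics

# Data from the table (age x, HDL cholesterol y)
ages = [38, 44, 45, 33, 53, 54, 62, 60, 28, 39, 65, 28, 53, 29, 50, 50, 40]
hdl = [56, 56, 32, 57, 37, 40, 40, 40, 45, 46, 64, 52, 37, 44, 38, 57, 27]

# Rounded least-squares coefficients from part (b)
b1, b0 = -0.0797, 48.79

# Residual = observed y minus predicted y-hat
residuals = [y - (b1 * x + b0) for x, y in zip(ages, hdl)]

# Quartiles of the residuals (default "exclusive" method) and the 1.5*IQR fences
q1, _, q3 = statistics.quantiles(residuals, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Any (x, y, residual) triple whose residual lies outside the fences
outliers = [(x, y, round(r, 2)) for x, y, r in zip(ages, hdl, residuals)
            if not lower <= r <= upper]
print(f"residual fences: ({lower:.2f}, {upper:.2f})")
print("outliers:", outliers)
```

With this data no residual lies outside the fences, which agrees with the conclusion above.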
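d) The hypotheses for testing whether a linear relation exists are
$H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$
The test statistic for these hypotheses is
$t = \dfrac{b_1}{s_{b_1}} \approx -0.36$, with $n - 2 = 15$ degrees of freedom.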
The critical values for a two-tailed test at α = 0.01 with 15 degrees of freedom are ±2.947. Since the test statistic, t ≈ -0.36, satisfies |t| < 2.947 (equivalently, the two-tailed P-value, roughly 0.72, is far greater than α = 0.01), we fail to reject the null hypothesis. This is equivalent to failing to reject that the population correlation coefficient is 0. There is not sufficient evidence of a linear relation, so predictions based on this model should not be relied on. The R² value of 0.0085 indicates that only about 0.85% of the variation in HDL cholesterol is explained by age.
The relationship between age and HDL cholesterol is not statistically significant at the 1% level of significance, so the data provide no evidence of a linear relation between them.
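The test in part (d) can be reproduced with the standard library. This sketch computes the slope, its standard error s_b1 = s_e/√Sxx, the t statistic and R² from the sums of squares. The critical value 2.947 is hard-coded from a t table (an assumption of the sketch, since Python's standard library has no t distribution).

```python
import math

# Data from the table (age x, HDL cholesterol y)
ages = [38, 44, 45, 33, 53, 54, 62, 60, 28, 39, 65, 28, 53, 29, 50, 50, 40]
hdl = [56, 56, 32, 57, 37, 40, 40, 40, 45, 46, 64, 52, 37, 44, 38, 57, 27]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hdl) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in hdl)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hdl))

b1 = sxy / sxx
sse = syy - b1 * sxy            # residual (error) sum of squares
s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
s_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
t0 = b1 / s_b1                  # test statistic for H0: beta1 = 0
r_squared = sxy ** 2 / (sxx * syy)

# Two-tailed critical value t_{0.005} with 15 degrees of freedom, from a t table
T_CRIT = 2.947

print(f"t0 = {t0:.3f}, R^2 = {r_squared:.4f}")
print("reject H0" if abs(t0) > T_CRIT else "fail to reject H0")
```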
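f) No. Substituting x = 42 into the regression equation gives ŷ = -0.0797(42) + 48.79 ≈ 45.44, but since the test in part (d) found no significant linear relation between age and HDL cholesterol, using the regression line to predict this patient's HDL cholesterol is not recommended. A better estimate is the sample mean HDL cholesterol, ȳ = 768/17 ≈ 45.18.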
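As a quick arithmetic check for part (f), this sketch evaluates the rounded part (b) line at age 42 and compares it with the sample mean of HDL cholesterol. Because the slope is close to 0, the two values are nearly the same.

```python
# HDL cholesterol values from the table
hdl = [56, 56, 32, 57, 37, 40, 40, 40, 45, 46, 64, 52, 37, 44, 38, 57, 27]

# Rounded least-squares coefficients from part (b)
b1, b0 = -0.0797, 48.79

y_hat_42 = b1 * 42 + b0          # prediction from the regression line
y_bar = sum(hdl) / len(hdl)      # sample mean, the suggested estimate instead
print(f"y-hat(42) = {y_hat_42:.2f}, sample mean = {y_bar:.2f}")
```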
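e) The standard error of the slope is
$s_{b_1} = \dfrac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = \dfrac{10.558}{\sqrt{2259.88}} \approx 0.2221$
Margin of error: $E = t_{0.025} \cdot s_{b_1} = 2.131 \times 0.2221 \approx 0.4733$, where 2.131 is the critical t value for 15 degrees of freedom and 95% confidence.
Therefore the 95% confidence interval for the slope is $(b_1 - E,\ b_1 + E) = (-0.553, 0.394)$. The interval contains 0, which agrees with the conclusion in part (d).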
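The interval in part (e) can be checked with a short standard-library sketch. It recomputes s_b1 from the data and uses the table value t_0.025 = 2.131 for 15 degrees of freedom, which is hard-coded rather than computed (an assumption of this sketch).

```python
import math

# Data from the table (age x, HDL cholesterol y)
ages = [38, 44, 45, 33, 53, 54, 62, 60, 28, 39, 65, 28, 53, 29, 50, 50, 40]
hdl = [56, 56, 32, 57, 37, 40, 40, 40, 45, 46, 64, 52, 37, 44, 38, 57, 27]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(hdl) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in hdl)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, hdl))

b1 = sxy / sxx
s_e = math.sqrt((syy - b1 * sxy) / (n - 2))  # standard error of the estimate
s_b1 = s_e / math.sqrt(sxx)                  # standard error of the slope

# Critical value t_{0.025} with 15 degrees of freedom, from a t table
T_025 = 2.131
margin = T_025 * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"s_b1 = {s_b1:.4f}, E = {margin:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```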