The data below are for 30 people. The independent variable is
“age” and the dependent variable is “systolic blood pressure.”
Both variables are given as vectors that can be pasted directly
into R.
age=c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)
systolic.BP=c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)
a) Using R, draw a scatterplot of systolic blood pressure
(dependent variable) against age (independent variable), and
calculate the correlation between the two variables.
b) Assume that these data are “straight enough” to model with a
linear regression line. Fit that model, write it out in the terms
of the problem, and plot the line that best fits the data.
c) Plot the residuals and comment on what they show about how
appropriate the model is.
d) Using a boxplot, determine whether there are any outliers in
systolic blood pressure. If there are, identify which points they
are.
e) Assuming there is at least one outlier in systolic blood
pressure, remove the outlier(s) and redo parts a) through c) using
the remaining data. State and comment on this second model.
f) For your second model, explain in the context of age and
systolic blood pressure what the slope of the fitted line means.
Also calculate R² (the coefficient of determination) for the
second model and explain what it means in that context.
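
The tasks above can be worked through in base R roughly as follows. This is a minimal sketch, not a model answer. It assumes Pearson correlation, and it assumes that "outlier" means a point beyond the boxplot whiskers (the default 1.5 × IQR rule used by `boxplot()`). The names `bp`, `fit1`, `bp2`, `fit2`, and the other helper variables are illustrative and do not come from the problem statement.

```r
# Data exactly as printed in the problem statement.
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,
         36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,
                 128,130,135,114,116,124,136,142,120,120,160,158,144,130,
                 125,175)
bp <- data.frame(age = age, systolic.BP = systolic.BP)  # illustrative name

# Scatterplot and (Pearson) correlation for the full data.
plot(systolic.BP ~ age, data = bp,
     xlab = "Age", ylab = "Systolic blood pressure",
     main = "All observations")
r1 <- cor(bp$age, bp$systolic.BP)
print(r1)

# Least-squares line: systolic.BP = b0 + b1 * age.
fit1 <- lm(systolic.BP ~ age, data = bp)
print(coef(fit1))   # b0 = "(Intercept)", b1 = "age"
abline(fit1)        # adds the fitted line to the scatterplot above

# Residuals against fitted values; look for curvature, funnels,
# or isolated points far from zero.
plot(fitted(fit1), resid(fit1),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals, first model")
abline(h = 0, lty = 2)

# Boxplot of systolic blood pressure; boxplot() returns the values it
# draws beyond the whiskers in $out (assumed outlier definition).
bx <- boxplot(bp$systolic.BP, ylab = "Systolic blood pressure")
out.values <- bx$out
out.rows <- which(bp$systolic.BP %in% out.values)
print(bp[out.rows, ])

# Drop the flagged rows (if any) and repeat the first three steps.
bp2 <- if (length(out.rows) > 0) bp[-out.rows, ] else bp
plot(systolic.BP ~ age, data = bp2,
     xlab = "Age", ylab = "Systolic blood pressure",
     main = "Flagged outlier(s) removed")
r2 <- cor(bp2$age, bp2$systolic.BP)
fit2 <- lm(systolic.BP ~ age, data = bp2)
abline(fit2)
plot(fitted(fit2), resid(fit2),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals, second model")
abline(h = 0, lty = 2)

# Slope and coefficient of determination for the second model.
slope2 <- coef(fit2)[["age"]]
r.squared2 <- summary(fit2)$r.squared
print(c(slope = slope2, r.squared = r.squared2))
```

In this sketch the second model would be written as systolic.BP = b0 + b1 × age, with b0 and b1 read from `coef(fit2)`. The slope b1 is the estimated change in systolic blood pressure for a one-unit increase in age. For a simple linear regression like this one, `r.squared2` equals the squared correlation `r2^2`, that is, the fraction of the variation in systolic blood pressure that the line accounts for.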