The data below are for 30 people. The independent variable is “age” and the dependent variable is “systolic blood pressure.” Also, note that the variables are presented in the form of vectors that can be used in R.
age=c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)
systolic.BP=c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)
> age=c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69);age
 [1] 39 47 45 47 65 46 67 42 67 56 64 56 59 34 42 48 45 17 20 19 36 50 39 21 44
[26] 53 63 29 25 69
> systolic.BP=c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175);systolic.BP
 [1] 144  20 138 145 162 142 170 124 158 154 162 150 140 110 128 130 135 114 116
[20] 124 136 142 120 120 160 158 144 130 125 175
#a) Using R, develop and show a scatterplot of systolic blood pressure (dependent variable) by age (independent variable), and calculate the correlation between these two variables.
Ans:
> plot(age,systolic.BP)
> cor(age,systolic.BP)
[1] 0.5032292
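As a cross-check on cor(), the Pearson correlation can also be computed directly from its definition. The following is a minimal sketch using the same two vectors; the names dx, dy, and r.manual are introduced here for illustration only and are not part of the original answer.

```r
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

# Deviations of each variable from its own mean
dx <- age - mean(age)
dy <- systolic.BP - mean(systolic.BP)

# Pearson r: sum of cross-products divided by the square root of
# the product of the two sums of squared deviations
r.manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The two values should agree up to floating-point error
print(r.manual)
print(cor(age, systolic.BP))
```

Squaring this value gives the Multiple R-squared that summary() reports for a simple linear regression of systolic.BP on age.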
#b) Assume that these data are “straight enough” to model using a linear regression line. Develop and show that model (write out the model in the terms of the problem), and also show in a plot the line that best fits these data.
Ans:
> mod1=lm(systolic.BP~age);mod1
Call:
lm(formula = systolic.BP ~ age)
Coefficients:
(Intercept)          age
    94.5320       0.9158
> summary(mod1)
Call:
lm(formula = systolic.BP ~ age)
Residuals:
     Min       1Q   Median       3Q      Max
-117.576   -3.934    4.760    8.765   25.171
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  94.5320    14.1390   6.686 2.95e-07 ***
age           0.9158     0.2972   3.081  0.00459 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 24.48 on 28 degrees of freedom
Multiple R-squared: 0.2532, Adjusted R-squared: 0.2266
F-statistic: 9.495 on 1 and 28 DF, p-value: 0.004588
> abline(lm(systolic.BP~age))
# Fitted model: predicted systolic blood pressure = 94.5320 + 0.9158 * age
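Because abline() only draws on a plot that is already open, the scatterplot and the fitted line have to be produced together. A minimal sketch, assuming the same data; the names b0 and b1 and the axis labels are illustrative choices, not part of the original answer.

```r
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

mod1 <- lm(systolic.BP ~ age)

# Pull the two estimates out of the fitted model by name
b0 <- coef(mod1)[["(Intercept)"]]
b1 <- coef(mod1)[["age"]]

# Write the model out in the terms of the problem
cat(sprintf("predicted systolic BP = %.4f + %.4f * age\n", b0, b1))

# Draw the scatterplot first, then overlay the least-squares line
plot(age, systolic.BP, xlab = "age", ylab = "systolic blood pressure")
abline(mod1)
```

Passing the fitted object to abline() avoids refitting the model a second time.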
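#c) Plot the residuals and comment on what you see as to how appropriate the model is.
Ans:
> residual=resid(mod1)
> plot(age,residual,ylab="residual",xlab="age")
# One residual (about -117.6, the observation with age 47 and systolic.BP 20) lies far below all the others; the largest positive residual is only about 25.2. That single point likely has a strong effect on the fit, so the linear model should be judged with that observation in mind.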
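A horizontal reference line at zero can make the residual plot easier to read, and which.min() locates the observation behind the most negative residual. This is a sketch under the same data; the names res and worst are hypothetical.

```r
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

mod1 <- lm(systolic.BP ~ age)
res <- resid(mod1)

# Residuals against age, with a dashed line marking residual = 0
plot(age, res, xlab = "age", ylab = "residual")
abline(h = 0, lty = 2)

# Position of the most negative residual in the data
worst <- unname(which.min(res))
print(data.frame(index = worst,
                 age = age[worst],
                 systolic.BP = systolic.BP[worst],
                 residual = unname(res[worst])))
```

With these data the flagged observation is the second one (age 47, systolic.BP 20), which corresponds to the minimum residual of -117.576 in the summary output.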
#d) Using a boxplot, determine if there are any outliers in systolic blood pressure. If so, point out which points are outliers.
Ans:
> outlier.values=boxplot(systolic.BP)$out;outlier.values
[1] 20
> boxplot(systolic.BP)
>
# The value 20 (the second observation, age 47) is the only outlier in systolic.BP.
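To report which observation the boxplot flags, and not only its value, the outlier values can be matched back to their positions in the vector. The sketch below uses boxplot.stats(), which applies the same 1.5 * IQR whisker rule as boxplot() without drawing anything; out.values and out.idx are illustrative names.

```r
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

# Values lying beyond the whiskers (default coef = 1.5)
out.values <- boxplot.stats(systolic.BP)$out

# Positions of those values in the original vector
out.idx <- which(systolic.BP %in% out.values)

# Show the flagged observations together with their ages
print(data.frame(index = out.idx,
                 age = age[out.idx],
                 systolic.BP = systolic.BP[out.idx]))
```

Matching by value would also return any other observation that shares an outlying value, which is usually the desired behavior when listing outliers.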