Question

The data below are for 30 people. The independent variable is “age” and the dependent variable is “systolic blood pressure.” Also, note that the variables are presented in the form of vectors that can be used in R.

age=c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69)

systolic.BP=c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

  a) Using R, develop and show a scatterplot of systolic blood pressure (dependent variable) by age (independent variable), and calculate the correlation between these two variables.
  b) Assume that these data are “straight enough” to model using a linear regression line. Develop and show that model (write out the model in the terms of the problem), and also show in a plot the line that best fits these data.
  c) Plot the residuals and comment on what you see as to how appropriate the model is.
  d) Using a boxplot, determine if there are any outliers in systolic blood pressure. If so, point out which points are outliers.
  e) Assuming there is at least one outlier in systolic blood pressure, remove that outlier and redo parts a) through c) using the remaining data without the outlier(s). State and comment on this second model.
  f) In your second model, explain in the context of age and systolic blood pressure what the slope of your fitted line means. Also, for your second model, calculate R^2 (the coefficient of determination), and explain what that means in the context of your second model.

Solutions

Expert Solution

> age=c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,36,50,39,21,44,53,63,29,25,69) ;age
[1] 39 47 45 47 65 46 67 42 67 56 64 56 59 34 42 48 45 17 20 19 36 50 39 21 44
[26] 53 63 29 25 69
> systolic.BP=c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,130,135,114,116,124,136,142,120,120,160,158,144,130,125,175);systolic.BP
[1] 144 20 138 145 162 142 170 124 158 154 162 150 140 110 128 130 135 114 116
[20] 124 136 142 120 120 160 158 144 130 125 175

#a) Using R, develop and show a scatterplot of systolic blood pressure (dependent variable) by age (independent variable), and calculate the correlation between these two variables

Ans
> plot(age,systolic.BP)

> cor(age,systolic.BP)
[1] 0.5032292

#b) Assume that these data are “straight enough” to model using a linear regression line. Develop and show that model (write out the model in the terms of the problem), and also show in a plot the line that best fits these data

Ans:
> mod1=lm(systolic.BP~age);mod1

Call:
lm(formula = systolic.BP ~ age)

Coefficients:
(Intercept)          age
    94.5320       0.9158

> summary(mod1)

Call:
lm(formula = systolic.BP ~ age)

Residuals:
     Min       1Q   Median       3Q      Max
-117.576   -3.934    4.760    8.765   25.171

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  94.5320    14.1390   6.686 2.95e-07 ***
age           0.9158     0.2972   3.081  0.00459 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.48 on 28 degrees of freedom
Multiple R-squared: 0.2532, Adjusted R-squared: 0.2266
F-statistic: 9.495 on 1 and 28 DF, p-value: 0.004588

> abline(lm(systolic.BP~age))
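Part b) also asks for the model written out in the terms of the problem, which the transcript above does not do explicitly. The following is a minimal sketch under the assumption that the same two vectors are used; the names `b` and `model.text` are illustrative and not part of the original answer. It redraws the scatterplot before calling `abline()`, because `abline()` only adds a line to a plot that is already open.

```r
# Data as given in the question
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,
         36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,
                 130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

# Fit the simple linear regression of systolic BP on age
mod1 <- lm(systolic.BP ~ age)

# Write the fitted model in the terms of the problem
b <- unname(coef(mod1))   # b[1] = intercept, b[2] = slope for age
model.text <- sprintf("predicted systolic BP = %.4f + %.4f * age", b[1], b[2])
print(model.text)

# Scatterplot first, then the fitted line drawn on top of it
plot(age, systolic.BP, xlab = "age", ylab = "systolic.BP")
abline(mod1)
```

With the coefficients reported in the output above, the written-out model is: predicted systolic BP = 94.5320 + 0.9158 * age.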


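#c) Plot the residuals and comment on what you see as to how appropriate the model is

Ans
> residual=resid(mod1)

> plot(age,residual,ylab="residual",xlab="age")
> # One residual is far below all the others: about -117.6, for the person aged 47 with a recorded systolic BP of 20.
> # That single point dominates the plot, so the fit should not be trusted until that observation is examined (see part d).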
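To back up a comment on the residual plot, it can help to add a zero reference line and look up which observation has the largest residual in absolute value. This is a minimal sketch, assuming the same data and model as above; `worst` and `worst.info` are illustrative names, not part of the original answer.

```r
# Data as given in the question
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,
         36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,
                 130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

mod1 <- lm(systolic.BP ~ age)
residual <- resid(mod1)

# Residuals against age, with a dashed horizontal line at zero for reference
plot(age, residual, xlab = "age", ylab = "residual")
abline(h = 0, lty = 2)

# Position of the observation with the largest absolute residual
worst <- unname(which.max(abs(residual)))
worst.info <- c(index = worst,
                age = age[worst],
                systolic.BP = systolic.BP[worst],
                residual = unname(residual[worst]))
print(worst.info)
```

In a well-behaved fit the points scatter evenly around the zero line with no single point far from the rest; an isolated extreme residual is a reason to inspect that observation before relying on the model.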

#d) Using a boxplot, determine if there are any outliers in systolic blood pressure. If so, point out which points are outliers, if any.

Ans
> outlier.values=boxplot(systolic.BP)$out;outlier.values
[1] 20
> boxplot(systolic.BP)
> # 20 is the only outlier in systolic.BP (observation 2, age 47)
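The last two parts of the question (removing the outlier and refitting, then interpreting the slope and R-squared of the second model) are not worked in the answer above. The following is a minimal sketch of one way to do it, assuming the outlier is defined by the default boxplot rule; the names `keep`, `age2`, `systolic.BP2`, `mod2`, `slope2` and `r.squared2` are illustrative and not from the original answer.

```r
# Data as given in the question
age <- c(39,47,45,47,65,46,67,42,67,56,64,56,59,34,42,48,45,17,20,19,
         36,50,39,21,44,53,63,29,25,69)
systolic.BP <- c(144,20,138,145,162,142,170,124,158,154,162,150,140,110,128,
                 130,135,114,116,124,136,142,120,120,160,158,144,130,125,175)

# Values flagged by the boxplot rule (same rule boxplot() uses, without drawing)
out.values <- boxplot.stats(systolic.BP)$out

# Drop the flagged observation(s) from both vectors so the pairs stay aligned
keep <- !(systolic.BP %in% out.values)
age2 <- age[keep]
systolic.BP2 <- systolic.BP[keep]

# Redo a): scatterplot and correlation
plot(age2, systolic.BP2, xlab = "age", ylab = "systolic.BP")
cor2 <- cor(age2, systolic.BP2)
print(cor2)

# Redo b): fit the second model and add the line to the scatterplot
mod2 <- lm(systolic.BP2 ~ age2)
abline(mod2)
print(summary(mod2))

# Redo c): residual plot with a zero reference line
plot(age2, resid(mod2), xlab = "age", ylab = "residual")
abline(h = 0, lty = 2)

# f): slope and coefficient of determination of the second model
slope2 <- unname(coef(mod2)["age2"])
r.squared2 <- summary(mod2)$r.squared
print(c(slope = slope2, r.squared = r.squared2))
```

In this setup `slope2` is the estimated change in mean systolic blood pressure for each additional year of age, and `r.squared2` is the proportion of the variation in systolic blood pressure among the remaining 29 people that is accounted for by the linear relationship with age. For a simple linear regression it equals the square of the correlation `cor2`.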

