Write R code:
Here are the first six observations from the prostate data set in the faraway package. Use help(prostate) to describe the data set and its variables.
obs | lcavol | lweight | age | lbph | svi | lcp | gleason | pgg45 | lpsa
1 | -0.579819 | 2.7695 | 50 | -1.38629 | 0 | -1.38629 | 6 | 0 | -0.43078
2 | -0.994252 | 3.3196 | 58 | -1.38629 | 0 | -1.38629 | 6 | 0 | -0.16252
3 | -0.510826 | 2.6912 | 74 | -1.38629 | 0 | -1.38629 | 7 | 20 | -0.16252
4 | -1.203973 | 3.2828 | 58 | -1.38629 | 0 | -1.38629 | 6 | 0 | -0.16252
5 | 0.7514161 | 3.4324 | 62 | -1.38629 | 0 | -1.38629 | 6 | 0 | 0.37156
6 | -1.049822 | 3.2288 | 50 | -1.38629 | 0 | -1.38629 | 6 | 0 | 0.76547
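Besides help(prostate), which opens the documentation page, a quick structural check in the console confirms the size of the data set and the type of each variable. A minimal sketch, assuming the faraway package is already installed:

```r
library(faraway)
data(prostate)

dim(prostate)    # number of rows (men in the study) and columns (variables)
str(prostate)    # type and first few values of each variable
head(prostate)   # the six observations listed above
```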
Perform a simple linear regression with lpsa as the response and lcavol as the predictor. Show the ANOVA table and provide a histogram of the residuals.
Hint: If your linear model is named lmod, then
> residuals(lmod)   # prints the residuals
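For reference, the model being requested and the quantities that anova() reports can be written in standard least-squares notation (the symbols below are introduced here for illustration and are not part of the assignment):

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad y_i = \text{lpsa}_i,\; x_i = \text{lcavol}_i,\\
\hat y_i &= \hat\beta_0 + \hat\beta_1 x_i, \qquad e_i = y_i - \hat y_i,\\
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar y)^2
  = \underbrace{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}_{\text{SSR}}
  + \underbrace{\sum_{i=1}^{n} e_i^2}_{\text{SSE}},\\
F &= \frac{\text{SSR}/1}{\text{SSE}/(n-2)}, \qquad R^2 = \frac{\text{SSR}}{\text{SST}}.
\end{aligned}
```

In the anova() output, the lcavol row holds SSR with 1 degree of freedom and the Residuals row holds SSE with n - 2 degrees of freedom; residuals(lmod) returns the e_i.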
R code
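# install.packages("faraway")   # run once if the package is not installed
library(faraway)
data("prostate")
help(prostate)                  # describes the data set and its variables
head(prostate)                  # first six observations

# Simple linear regression: lpsa (response) on lcavol (predictor)
model <- lm(lpsa ~ lcavol, data = prostate)
summary(model)

# Residuals and their histogram
head(residuals(model))
hist(residuals(model), main = "Histogram of Residuals", xlab = "Residuals")

# ANOVA table for the fitted model
anova(model)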
Output
> library(faraway)
> data("prostate")
> head(prostate)
lcavol lweight age lbph svi lcp gleason pgg45 lpsa
1 -0.5798185 2.7695 50 -1.386294 0 -1.38629 6 0 -0.43078
2 -0.9942523 3.3196 58 -1.386294 0 -1.38629 6 0 -0.16252
3 -0.5108256 2.6912 74 -1.386294 0 -1.38629 7 20 -0.16252
4 -1.2039728 3.2828 58 -1.386294 0 -1.38629 6 0 -0.16252
5 0.7514161 3.4324 62 -1.386294 0 -1.38629 6 0 0.37156
6 -1.0498221 3.2288 50 -1.386294 0 -1.38629 6 0 0.76547
> model = lm(lpsa~lcavol,data=prostate)
> summary(model)
Call:
lm(formula = lpsa ~ lcavol, data = prostate)
Residuals:
Min 1Q Median 3Q Max
-1.67625 -0.41648 0.09859 0.50709 1.89673
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.50730 0.12194 12.36 <2e-16 ***
lcavol 0.71932 0.06819 10.55 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7875 on 95 degrees of freedom
Multiple R-squared: 0.5394, Adjusted R-squared: 0.5346
F-statistic: 111.3 on 1 and 95 DF, p-value: < 2.2e-16
> head(residuals(model))
1 2 3 4 5 6
-1.52100281 -0.95463223 -1.30237079 -0.80377605 -1.67624667 0.01333025
> hist(residuals(model),main="Histogram of Residuals")
> anova(model)
Analysis of Variance Table
Response: lpsa
Df Sum Sq Mean Sq F value Pr(>F)
lcavol 1 69.003 69.003 111.27 < 2.2e-16 ***
Residuals 95 58.915 0.620
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
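As a cross-check, the ANOVA table can be rebuilt from the fitted model's pieces. With the printed values, F = 69.003 / (58.915 / 95) ≈ 111.27 on 1 and 95 degrees of freedom, and R-squared = 69.003 / (69.003 + 58.915) ≈ 0.5394, matching summary(model). A minimal sketch, assuming the faraway package is installed; the variable names (ss_total, ss_resid, and so on) are illustrative, not part of any package:

```r
library(faraway)
data(prostate)

# Refit the simple linear regression of lpsa on lcavol
model <- lm(lpsa ~ lcavol, data = prostate)

y <- prostate$lpsa
n <- length(y)

# Sums of squares: total, residual (error), and regression
ss_total <- sum((y - mean(y))^2)
ss_resid <- sum(residuals(model)^2)
ss_reg   <- ss_total - ss_resid   # valid because the model has an intercept

# Degrees of freedom and mean squares
df_reg   <- 1
df_resid <- n - 2
ms_reg   <- ss_reg / df_reg
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value
f_stat  <- ms_reg / ms_resid
p_value <- pf(f_stat, df_reg, df_resid, lower.tail = FALSE)

# Proportion of variance in lpsa explained by lcavol
r_squared <- ss_reg / ss_total

round(c(SSR = ss_reg, SSE = ss_resid, MSE = ms_resid, F = f_stat, R2 = r_squared), 4)
```

Observation 1 shows where the residuals come from: its fitted value is 1.50730 + 0.71932 × (-0.5798185) ≈ 1.0902, so its residual is -0.43078 - 1.0902 ≈ -1.5210, the first entry printed by head(residuals(model)).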