Write R code:
Here are the first six observations from the prostate data set found in the faraway library. Use help(prostate) to describe the data set and its variables.
| obs | lcavol | lweight | age | lbph | svi | lcp | gleason | pgg45 | lpsa |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -0.579819 | 2.7695 | 50 | -1.38629 | 0 | -1.38629 | 6 | 0 | -0.43078 |
| 2 | -0.994252 | 3.3196 | 58 | -1.38629 | 0 | -1.38629 | 6 | 0 | -0.16252 |
| 3 | -0.510826 | 2.6912 | 74 | -1.38629 | 0 | -1.38629 | 7 | 20 | -0.16252 |
| 4 | -1.203973 | 3.2828 | 58 | -1.38629 | 0 | -1.38629 | 6 | 0 | -0.16252 |
| 5 | 0.7514161 | 3.4324 | 62 | -1.38629 | 0 | -1.38629 | 6 | 0 | 0.37156 |
| 6 | -1.049822 | 3.2288 | 50 | -1.38629 | 0 | -1.38629 | 6 | 0 | 0.76547 |
Perform a simple linear regression with lpsa as the response and lcavol as the predictor. Show the ANOVA table and provide a histogram of the residuals.
Hint: If your linear model is named “lmod”, then
> residuals(lmod) #prints out the residuals
R code
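# Install faraway only if it is not already available, then load it
if (!requireNamespace("faraway", quietly = TRUE)) install.packages("faraway")
library(faraway)
data("prostate")
head(prostate)  # first six observations
# Simple linear regression: lpsa is the response, lcavol the predictor
model = lm(lpsa~lcavol,data=prostate)
summary(model)
head(residuals(model))  # first six residuals
hist(residuals(model),main="Histogram of Residuals")
anova(model)  # ANOVA table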
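The question also asks for a description of the data set, which the code above does not produce. A minimal sketch of one way to inspect it, assuming the faraway package is installed (the guard and the choice of `dim`, `str`, and `summary` are illustrative, not part of the original answer):

```r
# Sketch: inspect the prostate data before modelling.
# Assumes the faraway package is installed; otherwise nothing is run.
if (requireNamespace("faraway", quietly = TRUE)) {
  data("prostate", package = "faraway")  # load the data frame

  print(dim(prostate))      # number of rows and columns
  str(prostate)             # name and type of every variable
  print(summary(prostate))  # quartiles and mean of every variable

  # The help page describes each variable; open it in an interactive session
  if (interactive()) help("prostate", package = "faraway")
}
```

The dimensions should be consistent with the output below: nine columns in head(prostate), and 97 rows implied by the 95 residual degrees of freedom of a two-parameter model.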
Output
> library(faraway)
Warning message:
package ‘faraway’ was built under R version 3.6.2
> data("prostate")
> head(prostate)
      lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
> model = lm(lpsa~lcavol,data=prostate)
> summary(model)

Call:
lm(formula = lpsa ~ lcavol, data = prostate)

Residuals:
     Min       1Q   Median       3Q      Max
-1.67625 -0.41648  0.09859  0.50709  1.89673

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.50730    0.12194   12.36   <2e-16 ***
lcavol       0.71932    0.06819   10.55   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7875 on 95 degrees of freedom
Multiple R-squared:  0.5394, Adjusted R-squared:  0.5346
F-statistic: 111.3 on 1 and 95 DF,  p-value: < 2.2e-16

> head(residuals(model))
          1           2           3           4           5           6
-1.52100281 -0.95463223 -1.30237079 -0.80377605 -1.67624667  0.01333025
> hist(residuals(model))
> hist(residuals(model),main="Histogram of Residuals")
> anova(model)
Analysis of Variance Table

Response: lpsa
          Df Sum Sq Mean Sq F value    Pr(>F)
lcavol     1 69.003  69.003  111.27 < 2.2e-16 ***
Residuals 95 58.915   0.620
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
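
The printed results can be cross-checked by hand. The sketch below uses only numbers copied from the output above (the rounded coefficient estimates, the six rows from head(prostate), and the sums of squares from the ANOVA table) to recompute the first six residuals, the F statistic, R-squared, and the residual standard error. The variable names are illustrative, and because the inputs are rounded, agreement with the printed values is only expected up to rounding.

```r
# Rounded estimates from summary(model)
b0 <- 1.50730  # intercept
b1 <- 0.71932  # slope for lcavol

# First six observations, copied from head(prostate)
lcavol <- c(-0.5798185, -0.9942523, -0.5108256, -1.2039728, 0.7514161, -1.0498221)
lpsa   <- c(-0.43078, -0.16252, -0.16252, -0.16252, 0.37156, 0.76547)

# A residual is the observed response minus the fitted value
fitted_six <- b0 + b1 * lcavol
resid_six  <- lpsa - fitted_six
print(round(resid_six, 4))  # compare with head(residuals(model))

# Sums of squares and degrees of freedom from anova(model)
ss_reg <- 69.003  # Sum Sq for lcavol
ss_res <- 58.915  # Sum Sq for Residuals
df_reg <- 1
df_res <- 95

ms_reg    <- ss_reg / df_reg               # mean square for the regression
ms_res    <- ss_res / df_res               # mean square for the residuals
f_stat    <- ms_reg / ms_res               # F value in the ANOVA table
r_squared <- ss_reg / (ss_reg + ss_res)    # Multiple R-squared
rse       <- sqrt(ms_res)                  # Residual standard error
p_value   <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

print(round(c(F = f_stat, R2 = r_squared, RSE = rse), 4))
print(p_value)
```

The recomputed quantities should line up with the F value of 111.27, the Multiple R-squared of 0.5394, and the residual standard error of 0.7875 reported above, which shows that the ANOVA table and the summary output are two views of the same fit.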
