Question


Write R code:

Here are the first six observations from the prostate data set in the faraway package. Use help(prostate) to describe the data set and its variables.

obs     lcavol  lweight  age      lbph  svi       lcp  gleason  pgg45      lpsa
  1  -0.579819   2.7695   50  -1.38629    0  -1.38629        6      0  -0.43078
  2  -0.994252   3.3196   58  -1.38629    0  -1.38629        6      0  -0.16252
  3  -0.510826   2.6912   74  -1.38629    0  -1.38629        7     20  -0.16252
  4  -1.203973   3.2828   58  -1.38629    0  -1.38629        6      0  -0.16252
  5  0.7514161   3.4324   62  -1.38629    0  -1.38629        6      0   0.37156
  6  -1.049822   3.2288   50  -1.38629    0  -1.38629        6      0   0.76547
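The question asks for a description of the data set and its variables, and the solution below does not give one. As far as I recall, help(prostate) describes a study of 97 men with prostate cancer who were about to undergo radical prostatectomy. The sketch below records paraphrased column descriptions in a named vector, `prostate_vars`. That name is a hypothetical helper and is not part of faraway, so please confirm the wording against help(prostate) on your own installation.

```r
# Hypothetical helper: column descriptions paraphrased from help(prostate)
# in the faraway package; check them against the help page on your system.
prostate_vars <- c(
  lcavol  = "log(cancer volume)",
  lweight = "log(prostate weight)",
  age     = "age (years)",
  lbph    = "log(benign prostatic hyperplasia amount)",
  svi     = "seminal vesicle invasion (coded 0/1)",
  lcp     = "log(capsular penetration)",
  gleason = "Gleason score",
  pgg45   = "percentage of Gleason scores 4 or 5",
  lpsa    = "log(prostate specific antigen)"
)

# Print one line per variable, with the name left-aligned
for (v in names(prostate_vars)) {
  cat(sprintf("%-8s %s\n", v, prostate_vars[[v]]))
}
```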

Perform a simple linear regression with lpsa as the response and lcavol as the predictor. Show the ANOVA table and provide a histogram of the residuals.

Hint: if your linear model is named “lmod”, then

      > residuals(lmod)   #prints out the residuals
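lm() chooses the intercept and slope that minimize the sum of squared residuals. With a single predictor, both estimates have a closed form. The sketch below uses only the six observations listed above, so its estimates will not match the full-data fit shown later. It compares the hand-computed estimates with coef() from lm(). The names `sample6`, `b0` and `b1` are illustrative.

```r
# Six observations copied from the table above (not the full data set)
sample6 <- data.frame(
  lcavol = c(-0.579819, -0.994252, -0.510826, -1.203973, 0.7514161, -1.049822),
  lpsa   = c(-0.43078, -0.16252, -0.16252, -0.16252, 0.37156, 0.76547)
)
x <- sample6$lcavol
y <- sample6$lpsa

# Least-squares slope: Sxy / Sxx; intercept makes the line pass through the means
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same model fitted with lm(), response lpsa, predictor lcavol
fit <- lm(lpsa ~ lcavol, data = sample6)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```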

Solutions


R code

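# install.packages("faraway")   # run once if faraway is not yet installed
library(faraway)

data(prostate)      # load the prostate data frame shipped with faraway
help(prostate)      # describes the data set and its variables
head(prostate)      # first six observations

# Simple linear regression: lpsa is the response, lcavol the predictor
model <- lm(lpsa ~ lcavol, data = prostate)
summary(model)

# Histogram of the residuals
hist(residuals(model), main = "Histogram of Residuals", xlab = "Residual")

# ANOVA table for the fitted model
anova(model)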
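The sketch below shows where the entries of the anova() table come from. It again uses only the six listed observations, so the numbers differ from the full-data output, and `sample6` is an illustrative name. It makes three checks:

- residuals(fit) equals the observed response minus the fitted values.
- The residuals sum to zero because the model has an intercept.
- The Sum Sq, F value and Pr(>F) columns can be rebuilt by hand.

```r
# Six observations copied from the table above (not the full data set)
sample6 <- data.frame(
  lcavol = c(-0.579819, -0.994252, -0.510826, -1.203973, 0.7514161, -1.049822),
  lpsa   = c(-0.43078, -0.16252, -0.16252, -0.16252, 0.37156, 0.76547)
)
fit <- lm(lpsa ~ lcavol, data = sample6)
y <- sample6$lpsa
n <- length(y)

# A residual is the observed response minus the fitted value
res <- y - fitted(fit)
print(sum(res))   # essentially zero: only floating-point error remains

# Split the total variation into regression and residual parts
ss_reg <- sum((fitted(fit) - mean(y))^2)   # "Sum Sq" row for lcavol
ss_res <- sum(res^2)                       # "Sum Sq" row for Residuals
df_res <- n - 2                            # n minus the two estimated coefficients

# F is the ratio of the two mean squares; the p-value is its upper-tail area
f_stat <- (ss_reg / 1) / (ss_res / df_res)
p_val  <- pf(f_stat, df1 = 1, df2 = df_res, lower.tail = FALSE)

tab <- anova(fit)
print(tab)
print(c(SS_reg = ss_reg, SS_res = ss_res, F = f_stat, p = p_val))
```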

Output

> library(faraway)
Warning message:
package ‘faraway’ was built under R version 3.6.2
> data("prostate")
> head(prostate)
      lcavol lweight age      lbph svi      lcp gleason pgg45     lpsa
1 -0.5798185  2.7695  50 -1.386294   0 -1.38629       6     0 -0.43078
2 -0.9942523  3.3196  58 -1.386294   0 -1.38629       6     0 -0.16252
3 -0.5108256  2.6912  74 -1.386294   0 -1.38629       7    20 -0.16252
4 -1.2039728  3.2828  58 -1.386294   0 -1.38629       6     0 -0.16252
5  0.7514161  3.4324  62 -1.386294   0 -1.38629       6     0  0.37156
6 -1.0498221  3.2288  50 -1.386294   0 -1.38629       6     0  0.76547
> model = lm(lpsa~lcavol,data=prostate)
> summary(model)

Call:
lm(formula = lpsa ~ lcavol, data = prostate)

Residuals:
     Min       1Q   Median       3Q      Max
-1.67625 -0.41648  0.09859  0.50709  1.89673

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.50730    0.12194   12.36   <2e-16 ***
lcavol       0.71932    0.06819   10.55   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7875 on 95 degrees of freedom
Multiple R-squared:  0.5394, Adjusted R-squared:  0.5346
F-statistic: 111.3 on 1 and 95 DF,  p-value: < 2.2e-16

> head(residuals(model))
          1           2           3           4           5           6
-1.52100281 -0.95463223 -1.30237079 -0.80377605 -1.67624667  0.01333025
> hist(residuals(model))
> hist(residuals(model),main="Histogram of Residuals")
> anova(model)
Analysis of Variance Table

Response: lpsa
          Df Sum Sq Mean Sq F value    Pr(>F)
lcavol     1 69.003  69.003  111.27 < 2.2e-16 ***
Residuals 95 58.915   0.620
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
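The ANOVA table and the summary() output are consistent with each other, which makes a useful reading check. The sketch below recomputes the derived quantities from rounded values copied out of the output above: Sum Sq, Df, and the lcavol Estimate and Std. Error. Because the inputs are rounded, the results agree only to the printed precision. The variable names are illustrative.

```r
# Rounded values copied from the anova(model) and summary(model) output above
ss_reg <- 69.003; ss_res <- 58.915
df_reg <- 1;      df_res <- 95

ms_reg <- ss_reg / df_reg                  # Mean Sq for lcavol
ms_res <- ss_res / df_res                  # Mean Sq for Residuals, about 0.620
f_stat <- ms_reg / ms_res                  # F value, about 111.27

# R-squared and adjusted R-squared from the sums of squares
r2     <- ss_reg / (ss_reg + ss_res)                               # about 0.5394
adj_r2 <- 1 - ms_res / ((ss_reg + ss_res) / (df_reg + df_res))     # about 0.5346

# Residual standard error is the square root of the residual mean square
rse <- sqrt(ms_res)                        # about 0.7875

# With one predictor, the squared slope t value equals F (up to rounding)
t_slope <- 0.71932 / 0.06819               # about 10.55

# Upper-tail p-value of the F statistic on 1 and 95 degrees of freedom
p_val <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

print(round(c(F = f_stat, R2 = r2, adjR2 = adj_r2, RSE = rse, t_squared = t_slope^2), 4))
print(p_val)
```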

