Question

Use RStudio to do this problem. This problem uses the wblake data set in the alr4 package. This data set includes samples of smallmouth bass collected in West Bearskin Lake, Minnesota, in 1991. Interest is in predicting length from age. Finish this problem without using lm().

(a) Compute the regression of length on age, and report the estimates, their standard errors, the value of the coefficient of determination, and the estimate of the error variance. Write a sentence or two that summarizes the results of these computations.

(b) Obtain a 99% confidence interval for β1 (the slope) from the data. Interpret this interval in the context of the data.

(c) Obtain a prediction and a 99% prediction interval for a smallmouth bass at age 1. Interpret this interval in the context of the data.
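For reference, the quantities asked for in (a) through (c) follow the standard least-squares formulas for simple linear regression, with x = Age, y = Length, and x0 = 1 in part (c). The notation below is chosen to match the solution code's SSxx, SSxy, SSE, SSR, SST and MSE (MSE is the estimate of σ²):

```latex
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i-\bar x)^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i-\bar x)(y_i-\bar y),\\
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x,\\
\hat\sigma^2 &= \frac{\mathrm{SSE}}{n-2} = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr)^2,
  \qquad R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1-\frac{\mathrm{SSE}}{\mathrm{SST}},\\
\operatorname{se}(\hat\beta_1) &= \sqrt{\frac{\hat\sigma^2}{S_{xx}}},
  \qquad \operatorname{se}(\hat\beta_0) = \sqrt{\hat\sigma^2\left(\frac{1}{n}+\frac{\bar x^2}{S_{xx}}\right)},\\
\text{99\% CI for } \beta_1 &:\ \hat\beta_1 \pm t_{0.995,\,n-2}\,\operatorname{se}(\hat\beta_1),\\
\text{99\% PI at } x_0 &:\ \hat y_0 \pm t_{0.995,\,n-2}\,\hat\sigma\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{S_{xx}}},
  \qquad \hat y_0 = \hat\beta_0+\hat\beta_1 x_0.
\end{aligned}
```

The extra 1 under the square root in the prediction interval accounts for the variability of a single new fish around the regression line, which is why it is wider than a confidence interval for the mean length at age 1.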

Solution

R code:

# install.packages("alr4")
library(alr4)
data(wblake)
names(wblake)

# Part (a) solution:
X = wblake$Age; Y = wblake$Length
xbar = mean(X); ybar = mean(Y); n = length(Y)
SSxy = sum((X-xbar)*(Y-ybar))
SSxx = sum((X-xbar)^2); SSyy = sum((Y-ybar)^2)

(b1 = SSxy/SSxx); (b0 = ybar-b1*xbar) # regression estimates: slope b1, intercept b0

Y.hat = b0+b1*X # fitted values from the regression of length on age

SSE = sum((Y-Y.hat)^2); SSR = sum((Y.hat-ybar)^2); SST = SSR+SSE # SST equals SSyy
(R.Sq = SSR/SST) # coefficient of determination R^2

(MSE = SSE/(n-2)) # estimate of the error variance sigma^2
(SE.b1 = sqrt(MSE/SSxx)) # SE of slope b1
(SE.b0 = sqrt(MSE*((1/n)+(xbar^2/SSxx)))) # SE of intercept b0

#------------------------------------------------------------------------
#
# Part (b) solution:
# Calculate a 99% confidence interval for beta1.
a = 1-0.99 # Confidence level = 0.99
t.star = qt(1-a/2,n-2) # t critical value with n-2 df (kept unrounded)
ME = t.star*SE.b1 # margin of error
b1-c(ME,-ME) # 99% CI for beta1: lower, upper

#-------------------------------------------------------------------------
#
# Part (c) solution:
# Calculate a prediction and a 99% prediction interval for a smallmouth bass at age 1
x0 = 1
(Y.Pred = b0+b1*x0) # Predicted Y when X=1
ME = t.star*sqrt(MSE)*sqrt(1+(1/n)+((x0-xbar)^2)/SSxx)
Y.Pred - c(ME,-ME) # 99% prediction interval: lower, upper
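As an optional sketch, the same computations can be wrapped in one function so that parts (a) through (c) come from a single call. The function name `simple_reg` and its arguments are hypothetical (not part of alr4 or base R); it uses only base R, still avoids lm(), and uses the unrounded t critical value from qt().

```r
# Hypothetical helper (not from alr4 or base R): simple linear regression of
# y on x computed from sums of squares, without calling lm().
simple_reg <- function(x, y, x0, level = 0.99) {
  n <- length(y)
  xbar <- mean(x); ybar <- mean(y)
  SSxx <- sum((x - xbar)^2)
  SSxy <- sum((x - xbar) * (y - ybar))
  b1 <- SSxy / SSxx                       # slope estimate
  b0 <- ybar - b1 * xbar                  # intercept estimate
  yhat <- b0 + b1 * x                     # fitted values
  SSE <- sum((y - yhat)^2)                # residual sum of squares
  SST <- sum((y - ybar)^2)                # total sum of squares
  sigma2 <- SSE / (n - 2)                 # estimate of the error variance
  se.b1 <- sqrt(sigma2 / SSxx)
  se.b0 <- sqrt(sigma2 * (1 / n + xbar^2 / SSxx))
  t.crit <- qt(1 - (1 - level) / 2, df = n - 2)  # unrounded critical value
  y0 <- b0 + b1 * x0                      # point prediction at x0
  me.pred <- t.crit * sqrt(sigma2 * (1 + 1 / n + (x0 - xbar)^2 / SSxx))
  list(
    b0 = b0, b1 = b1, se.b0 = se.b0, se.b1 = se.b1,
    R.sq = 1 - SSE / SST, sigma2 = sigma2,
    ci.b1 = c(lower = b1 - t.crit * se.b1, upper = b1 + t.crit * se.b1),
    pred = y0,
    pred.int = c(lower = y0 - me.pred, upper = y0 + me.pred)
  )
}

# Usage on wblake, run only if the alr4 package is installed.
if (requireNamespace("alr4", quietly = TRUE)) {
  data(wblake, package = "alr4")
  str(simple_reg(wblake$Age, wblake$Length, x0 = 1, level = 0.99))
}
```

If you want to check the hand-computed numbers outside the assignment, `confint(lm(Length ~ Age, data = wblake), level = 0.99)` and `predict(lm(Length ~ Age, data = wblake), data.frame(Age = 1), interval = "prediction", level = 0.99)` should agree with the function's `ci.b1` and `pred.int` up to floating-point error.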

