In: Statistics and Probability
Perform all the steps for ALL PARTS (a - d) in R code and show and explain the results. Thank you.
Problem 2.23 Consider the simple linear regression model y = 50 + 10x + E where E is NID(0,16). Suppose that the n = 20 pairs of observations are used to fit this model. Generate 500 samples of 20 observations, drawing one observation for each level of x = 0.5,1, 1.5, 2, ..., 10 [i.e. going up by an interval of 0.5 to 10 by 0.5 for n=20 observations; a for loop will possibly be needed to do this 500 times] for each sample.
# (a) For EACH sample, compute the least-squares regression estimates of the slope, (Beta1) and intercept, (Beta0). Construct HISTOGRAMS of the sample values of Beta0hat and Beta1hat. Discuss the shape of these histograms.
# (b) For each sample, compute an estimate of E(y|x=5). Construct a histogram of the estimates you obtained. Discuss the shape of the histogram.
# (c) For each sample, compute a 95% CI (i.e. alpha = 0.05; alpha/2 = 0.025). on the Slope, (Beta1). How many of these intervals contain the true value, Beta1 = 10? Is this what you would expect? Why or Why not?
# (d) For each estimate of E(y|x = 5) in part b, compute the 95% CI. How many of these intervals contain the true VALUE of E(y|x=5) = 100 (i.e. 10^2; POTENTIALLY associated with the variance, s^2)? Is this what you would expect?
ii) estimate of E(y|x=5)
The Estimated model is E(y|x=5)=beta0hat+beta1hat*x
Rcode:
n=20; x1=seq(0.5,10,0.5); x0=rep(1,20);
x=as.matrix(data.frame(x0,x1));
beta=as.matrix(c(50,10))
result=matrix(0,nrow=500, ncol=2)
for( i in 1:500)
{
y=x%*%beta+rnorm(20,0,4); ## var=16 and sd=4
paraest=as.matrix(lm(y~x1)$coefficients)
result[i,]=t(paraest);
}
hist(result[,1], main="histogram of beta0hat")
hist(result[,2],main="histogram of beta1hat")
### computation of E(y|x=5)
est=result%*%as.matrix(c(1,5))
hist(est,main=" Histogram of E(y|x=5)");
iii) #### 95% CI for beta1########
n=20;
x1=seq(0.5,10,0.5);
x0=rep(1,20);
x=as.matrix(data.frame(x0,x1));
beta=as.matrix(c(50,10))
ci=matrix(0,nrow=500, ncol=2)
z={};
for( i in 1:500)
{
y=x%*%beta+rnorm(20,0,4); ## var=16 and sd=4
paraest=lm(y~x1)
ci[i,]=as.matrix(confint(paraest,'x1',level=0.95));
z[i]=ifelse(ci[i,1]<10 && ci[i,2],1,0);
}
mean(z)
Answer: 0.972
coverage probability is 0.972