In: Statistics and Probability
student_id=100
set.seed(student_id)
Group1=round(rnorm(15,mean=10,sd=4),2)
Group2= round(rnorm(12,mean=7,sd=4),2)
For this question you are not allowed to use the lm() command in R (or its equivalent in Python or other software). If you use software at all, use it only as a calculator: in R you may only use mean(), sum(), qt(), and of course +, -, *, /. (Similar restrictions apply to Python, Excel, and others.)
Consider all 27 numbers printed under Group1 and Group2 as your y values, and the two group indicators as a categorical variable x (indicating Group1 vs Group2).
(a) Fit a least square regression line and calculate the intercept and the slope.
(b) At 5% level of significance, test that the true slope parameter is zero.
(c) Match your answer from part (b) to your answers from Question 2. Describe briefly any similarity that you see.
Let the regression model be y = β0 + β1·x + ε, where x = 1 for Group1 and x = 0 for Group2.
a) R code with comments
---
student_id=100
set.seed(student_id)
#set the values of groups
Group1=round(rnorm(15,mean=10,sd=4),2)
Group2= round(rnorm(12,mean=7,sd=4),2)
#combine the group values into y
y<-c(Group1,Group2)
#set the dummy variable x, with Group1=1 and Group2=0
x<-c(rep(1,length(Group1)),rep(0,length(Group2)))
#part a)
#we calculate the following
n<-length(x)
xbar<-mean(x)
ybar<-mean(y)
x2<-sum(x^2)
y2<-sum(y^2)
xy<-sum(x*y)
#get these sum of squares
ssx<-x2-n*xbar^2
ssy<-y2-n*ybar^2
ssxy<-xy-n*xbar*ybar
#estimate the slope
b1<-ssxy/ssx
#estimate the intercept
b0<-ybar-b1*xbar
sprintf('The estimated slope is %.4f',b1)
sprintf('The estimated intercept is %.4f',b0)
sprintf('The regression equation is yhat=%.4f+%.4fx',b0,b1)
---
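For reference, these are the standard least-squares formulas that the code above implements (textbook results, not specific to this dataset):

$$
S_{xx}=\sum x_i^2-n\bar x^2,\qquad S_{yy}=\sum y_i^2-n\bar y^2,\qquad S_{xy}=\sum x_i y_i-n\bar x\bar y,
$$
$$
\hat\beta_1=\frac{S_{xy}}{S_{xx}},\qquad \hat\beta_0=\bar y-\hat\beta_1\bar x.
$$

Because x is a 0/1 dummy, these reduce to $\hat\beta_1=\bar y_{\text{Group1}}-\bar y_{\text{Group2}}$ and $\hat\beta_0=\bar y_{\text{Group2}}$, which gives a quick sanity check on the computed slope and intercept.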
b) The hypotheses are H0: β1 = 0 versus Ha: β1 ≠ 0 (a two-sided test at the 5% level).
R code
---
#part b)
#get the sum of square error
sse<-ssy-b1*ssxy
#get the mean square error
mse<-sse/(n-2)
#get the standard error of regression
s<-sqrt(mse)
#set the standard error of slope
sb1<-s/sqrt(ssx)
#get the test statistic
t<-(b1-0)/sb1
#get the right tail critical value for alpha=0.05
tc<-qt(1-0.05/2,df=n-2)
#reject null if t is not within -tc and +tc
sprintf('The test statistic is t=%.3f',t)
sprintf('The critical values are -%.4f,%.4f',tc,tc)
if (abs(t)>tc) { #reject null
sprintf('Reject the null hypothesis')
} else {
sprintf('Fail to Reject the null hypothesis')
}
---
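The quantities computed above follow the usual t-test for a regression slope (standard formulas):

$$
SSE=S_{yy}-\hat\beta_1 S_{xy},\qquad s^2=\frac{SSE}{n-2},\qquad se(\hat\beta_1)=\frac{s}{\sqrt{S_{xx}}},
$$
$$
t=\frac{\hat\beta_1-0}{se(\hat\beta_1)}\sim t_{n-2}\ \text{under }H_0:\beta_1=0,
$$

and H0 is rejected at the 5% level when $|t|>t_{0.975,\,n-2}$.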
Ans: We reject the null hypothesis. There is sufficient evidence to reject the claim that the true slope parameter is zero; that is, the slope coefficient is statistically significant.
c) The answers from Question 2 are not given here for comparison. In general, when x is a 0/1 dummy variable, the t-test for the slope is identical to the pooled two-sample t-test comparing the two group means, so if Question 2 was such a test, its test statistic, degrees of freedom, and conclusion should match those of part (b).
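To see why part (b) should line up with a pooled two-sample t-test (assuming that is what Question 2 asked, which is not shown here): with x a 0/1 dummy equal to 1 for $n_1$ observations and 0 for $n_2$ observations,

$$
\bar x=\frac{n_1}{n},\qquad S_{xx}=n_1-\frac{n_1^2}{n}=\frac{n_1 n_2}{n},
$$

so the slope test statistic becomes

$$
t=\frac{\hat\beta_1}{s/\sqrt{S_{xx}}}
 =\frac{\bar y_1-\bar y_2}{s\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}},
$$

which is exactly the pooled two-sample t statistic, since $s^2=SSE/(n-2)$ equals the pooled variance estimate with $n-2=n_1+n_2-2$ degrees of freedom.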
All the code together
---
student_id=100
set.seed(student_id)
#set the values of groups
Group1=round(rnorm(15,mean=10,sd=4),2)
Group2= round(rnorm(12,mean=7,sd=4),2)
#combine the group values into y
y<-c(Group1,Group2)
#set the dummy variable x, with Group1=1 and Group2=0
x<-c(rep(1,length(Group1)),rep(0,length(Group2)))
#part a)
#we calculate the following
n<-length(x)
xbar<-mean(x)
ybar<-mean(y)
x2<-sum(x^2)
y2<-sum(y^2)
xy<-sum(x*y)
#get these sum of squares
ssx<-x2-n*xbar^2
ssy<-y2-n*ybar^2
ssxy<-xy-n*xbar*ybar
#estimate the slope
b1<-ssxy/ssx
#estimate the intercept
b0<-ybar-b1*xbar
sprintf('The estimated slope is %.4f',b1)
sprintf('The estimated intercept is %.4f',b0)
sprintf('The regression equation is yhat=%.4f+%.4fx',b0,b1)
#part b)
#get the sum of square error
sse<-ssy-b1*ssxy
#get the mean square error
mse<-sse/(n-2)
#get the standard error of regression
s<-sqrt(mse)
#set the standard error of slope
sb1<-s/sqrt(ssx)
#get the test statistic
t<-(b1-0)/sb1
#get the right tail critical value for alpha=0.05
tc<-qt(1-0.05/2,df=n-2)
#reject null if t is not within -tc and +tc
sprintf('The test statistic is t=%.3f',t)
sprintf('The critical values are -%.4f,%.4f',tc,tc)
if (abs(t)>tc) { #reject null
sprintf('Reject the null hypothesis')
} else {
sprintf('Fail to Reject the null hypothesis')
}
---