In: Statistics and Probability
student_id=100
set.seed(student_id)
Group1=round(rnorm(15,mean=10,sd=4),2)
Group2= round(rnorm(12,mean=7,sd=4),2)
For this question you are not allowed to use the lm() command in R (or its equivalent in Python or other software). If you use software at all, use it only as a calculator: in R you may only use mean(), sum(), qt(), and of course +, -, *, /. (Similar restrictions apply to Python, Excel, and others.)
Consider all 27 numbers printed under Group1 and Group2 as your y values, and the two group indicators as a categorical variable x (indicating Group1 vs Group2).
(a) Fit a least square regression line and calculate the intercept and the slope.
(b) At 5% level of significance, test that the true slope parameter is zero.
(c) Match your answer from part (b) to your answers from Question 2. Describe briefly any similarity that you see.
Let the regression model be y = β0 + β1·x + ε, where x = 1 for Group1 and x = 0 for Group2.
a) R code with comments
---
student_id=100
set.seed(student_id)
#set the values of groups
Group1=round(rnorm(15,mean=10,sd=4),2)
Group2= round(rnorm(12,mean=7,sd=4),2)
#combine the group values into y
y<-c(Group1,Group2)
#set the dummy variable x, with Group1=1 and Group2=0
x<-c(rep(1,length(Group1)),rep(0,length(Group2)))
#part a)
#we calculate the following
n<-length(x)
xbar<-mean(x)
ybar<-mean(y)
x2<-sum(x^2)
y2<-sum(y^2)
xy<-sum(x*y)
#get these sum of squares
ssx<-x2-n*xbar^2
ssy<-y2-n*ybar^2
ssxy<-xy-n*xbar*ybar
#estimate the slope
b1<-ssxy/ssx
#estimate the intercept
b0<-ybar-b1*xbar
sprintf('The estimated slope is %.4f',b1)
sprintf('The estimated intercept is %.4f',b0)
sprintf('The regression equation is yhat=%.4f+%.4fx',b0,b1)
---
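For reference, these are the standard least-squares formulas that the code above implements (textbook results, not specific to this dataset):

$$
S_{xx}=\sum x_i^2-n\bar x^2,\qquad S_{yy}=\sum y_i^2-n\bar y^2,\qquad S_{xy}=\sum x_i y_i-n\bar x\bar y,
$$
$$
\hat\beta_1=\frac{S_{xy}}{S_{xx}},\qquad \hat\beta_0=\bar y-\hat\beta_1\bar x.
$$

Because x is a 0/1 dummy, these reduce to $\hat\beta_1=\bar y_{\text{Group1}}-\bar y_{\text{Group2}}$ and $\hat\beta_0=\bar y_{\text{Group2}}$, which gives a quick sanity check on the computed slope and intercept.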
b) The hypotheses are H0: β1 = 0 versus Ha: β1 ≠ 0 (a two-sided test at the 5% level).
R code
---
#part b)
#get the sum of square error
sse<-ssy-b1*ssxy
#get the mean square error
mse<-sse/(n-2)
#get the standard error of regression
s<-sqrt(mse)
#set the standard error of slope
sb1<-s/sqrt(ssx)
#get the test statistic
t<-(b1-0)/sb1
#get the right tail critical value for alpha=0.05
tc<-qt(1-0.05/2,df=n-2)
#reject null if t is not within -tc and +tc
sprintf('The test statistic is t=%.3f',t)
sprintf('The critical values are -%.4f,%.4f',tc,tc)
if (abs(t)>tc) { #reject null
sprintf('Reject the null hypothesis')
} else {
sprintf('Fail to Reject the null hypothesis')
}
---
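The quantities computed above follow the usual t-test for a regression slope (standard formulas):

$$
SSE=S_{yy}-\hat\beta_1 S_{xy},\qquad s^2=\frac{SSE}{n-2},\qquad se(\hat\beta_1)=\frac{s}{\sqrt{S_{xx}}},
$$
$$
t=\frac{\hat\beta_1-0}{se(\hat\beta_1)}\sim t_{n-2}\ \text{under }H_0:\beta_1=0,
$$

and H0 is rejected at the 5% level when $|t|>t_{0.975,\,n-2}$.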
Ans: We reject the null hypothesis. There is sufficient evidence to reject the claim that the true slope parameter is zero; that is, the slope coefficient is statistically significant.
c) The answers from Question 2 are not given here for comparison. In general, when x is a 0/1 dummy variable, the t-test for the slope is identical to the pooled two-sample t-test comparing the two group means, so if Question 2 was such a test, its test statistic, degrees of freedom, and conclusion should match those of part (b).
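To see why part (b) should line up with a pooled two-sample t-test (assuming that is what Question 2 asked, which is not shown here): with x a 0/1 dummy equal to 1 for $n_1$ observations and 0 for $n_2$ observations,

$$
\bar x=\frac{n_1}{n},\qquad S_{xx}=n_1-\frac{n_1^2}{n}=\frac{n_1 n_2}{n},
$$

so the slope test statistic becomes

$$
t=\frac{\hat\beta_1}{s/\sqrt{S_{xx}}}
 =\frac{\bar y_1-\bar y_2}{s\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}},
$$

which is exactly the pooled two-sample t statistic, since $s^2=SSE/(n-2)$ equals the pooled variance estimate with $n-2=n_1+n_2-2$ degrees of freedom.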
All the code together
---
student_id=100
set.seed(student_id)
#set the values of groups
Group1=round(rnorm(15,mean=10,sd=4),2)
Group2= round(rnorm(12,mean=7,sd=4),2)
#combine the group values into y
y<-c(Group1,Group2)
#set the dummy variable x, with Group1=1 and Group2=0
x<-c(rep(1,length(Group1)),rep(0,length(Group2)))
#part a)
#we calculate the following
n<-length(x)
xbar<-mean(x)
ybar<-mean(y)
x2<-sum(x^2)
y2<-sum(y^2)
xy<-sum(x*y)
#get these sum of squares
ssx<-x2-n*xbar^2
ssy<-y2-n*ybar^2
ssxy<-xy-n*xbar*ybar
#estimate the slope
b1<-ssxy/ssx
#estimate the intercept
b0<-ybar-b1*xbar
sprintf('The estimated slope is %.4f',b1)
sprintf('The estimated intercept is %.4f',b0)
sprintf('The regression equation is yhat=%.4f+%.4fx',b0,b1)
#part b)
#get the sum of square error
sse<-ssy-b1*ssxy
#get the mean square error
mse<-sse/(n-2)
#get the standard error of regression
s<-sqrt(mse)
#set the standard error of slope
sb1<-s/sqrt(ssx)
#get the test statistic
t<-(b1-0)/sb1
#get the right tail critical value for alpha=0.05
tc<-qt(1-0.05/2,df=n-2)
#reject null if t is not within -tc and +tc
sprintf('The test statistic is t=%.3f',t)
sprintf('The critical values are -%.4f,%.4f',tc,tc)
if (abs(t)>tc) { #reject null
sprintf('Reject the null hypothesis')
} else {
sprintf('Fail to Reject the null hypothesis')
}
---