1) For your sample, use R to find the mean of x, the mean of y, the standard deviation of x, the standard deviation of y, and the correlation between x and y. (You must give R code for credit.)
> set.seed(0003653838)
> x=rnorm(500)
> y=2*x+rnorm(500)
> Xbar=mean(x)
> Xbar
[1] 0.0712552
> Ybar=mean(y)
> Ybar
[1] 0.2370446
> sdx=sd(x)
> sdx
[1] 0.9841084
> sdy=sd(y)
> sdy
[1] 2.277837
> corxy=cor(x,y)
> corxy
[1] 0.9040743
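As a sanity check, the sample values can be compared with the values implied by the simulation model (x ~ N(0, 1), y = 2x + e, with e ~ N(0, 1) independent of x). Under that model both means are 0, sd(x) = 1, sd(y) = √5 ≈ 2.236, and cor(x, y) = 2/√5 ≈ 0.894. The sketch below reuses the seed from the question. The variable names `sample_stats` and `theory_stats` are illustrative choices, not part of the assignment:

```r
# Reproduce the sample from Question 1 (R reads 0003653838 as 3653838)
set.seed(0003653838)
x <- rnorm(500)
y <- 2 * x + rnorm(500)

# Sample statistics collected in one named vector
sample_stats <- c(mean_x = mean(x), mean_y = mean(y),
                  sd_x = sd(x), sd_y = sd(y), cor_xy = cor(x, y))

# Population values implied by x ~ N(0, 1), y = 2x + e, e ~ N(0, 1)
theory_stats <- c(mean_x = 0, mean_y = 0,
                  sd_x = 1, sd_y = sqrt(5), cor_xy = 2 / sqrt(5))

# Side-by-side comparison, rounded to 4 decimals
round(rbind(sample = sample_stats, theory = theory_stats), 4)
```

With n = 500, the sample values should land close to the theoretical row, which is a quick way to catch a typo in the simulation code.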
2) Find the equation of the regression line to predict y from x.
> lm(y~x)
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.08794 2.09259
ŷ = 0.08794 + 2.09259*x
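The coefficients can be cross-checked against the Question 1 statistics. For a least-squares line, the slope is b = r·sy/sx and the intercept is a = ȳ − b·x̄. This short R check plugs in the values printed in Question 1 and confirms that the intercept is positive:

```r
# Summary statistics printed in Question 1
xbar <- 0.0712552; ybar <- 0.2370446
sdx  <- 0.9841084; sdy  <- 2.277837
r    <- 0.9040743

b <- r * sdy / sdx     # least-squares slope
a <- ybar - b * xbar   # least-squares intercept (line passes through the means)

round(c(intercept = a, slope = b), 5)   # intercept 0.08794, slope 2.09259
```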
3) We select a point (x1,y1) from the parent population of your data. Suppose x = 1. What is the probability that y is greater than 3? (You may find this either by using theory or based on your data.)
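Theory gives the answer to Question 3 directly. In the simulation, y = 2x + e with e ~ N(0, 1), so given x = 1, y ~ N(2, 1). Therefore P(y > 3 | x = 1) = P(Z > (3 − 2)/1) = P(Z > 1) ≈ 0.1587. Here is a minimal R check, assuming the error SD is exactly 1, as in `rnorm(500)`:

```r
# Conditional distribution of y given x = 1 under y = 2*x + e, e ~ N(0, 1)
x0    <- 1
mu    <- 2 * x0   # conditional mean of y
sigma <- 1        # SD of the error term rnorm(500)

# Upper-tail probability P(y > 3)
p_theory <- pnorm(3, mean = mu, sd = sigma, lower.tail = FALSE)
p_theory          # equals 1 - pnorm(1), about 0.1587
```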
4) Find a 95% confidence interval for the slope coefficient of the regression line predicting y from x.
> confint(lm(y~x), level = 0.95)
2.5 % 97.5 %
(Intercept) 0.002089795 0.1737838
x 2.005495780 2.1796802
The 95% confidence interval for the slope is (2.005, 2.180).
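The interval that `confint()` reports for the slope is the estimate ± t(0.975, n − 2) × SE(slope). The sketch below rebuilds it by hand from `coef(summary(fit))` and compares it with `confint()`. It reuses the question's seed; `fit` and `ci_manual` are illustrative names:

```r
# Same sample as in the question
set.seed(0003653838)
x <- rnorm(500)
y <- 2 * x + rnorm(500)
fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))
b  <- coefs["x", "Estimate"]
se <- coefs["x", "Std. Error"]
tcrit <- qt(0.975, df = df.residual(fit))   # n - 2 = 498 degrees of freedom

# Estimate +/- t * SE
ci_manual <- b + c(-1, 1) * tcrit * se
ci_manual
as.numeric(confint(fit, "x", level = 0.95))   # should agree with ci_manual
```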
** I believe my answers to Questions 1, 2, and 4 are correct. I am stuck on Question 3, so please help with that, and let me know if I have missed anything on the other three questions.
Here is R code for the problem above:
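x=rnorm(500)            # 500 draws from N(0, 1); no set.seed, so the values differ from the question's sample
y=2*x+rnorm(500)        # y = 2x plus N(0, 1) noise
xbar=mean(x)
xbar
ybar=mean(y)
ybar
sdx=sd(x)
sdx
sdy=sd(y)
sdy
corxy=cor(x,y)
corxy
l=lm(y~x)               # least-squares fit of y on x
summary(l)              # coefficients, standard errors, residual SE, R-squared
confint(l,level=0.95)   # 95% confidence intervals for the intercept and slope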
And the output is:
$ Rscript main.r
[1] -0.02588524
[1] -0.05605924
[1] 1.019452
[1] 2.219479
[1] 0.8981331

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-3.00930 -0.68355  0.00277  0.61991  2.64296

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.005444   0.043704  -0.125    0.901
x            1.955352   0.042899  45.580   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9769 on 498 degrees of freedom
Multiple R-squared:  0.8066,  Adjusted R-squared:  0.8063
F-statistic:  2078 on 1 and 498 DF,  p-value: < 2.2e-16

Error in confin(l, level = 0.95) : could not find function "confin"
Execution halted

The final error comes from a run in which confint was misspelled as confin. With confint spelled correctly, as in the code above, the script also prints the 95% confidence intervals.
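For the third question, simulate y from the model with x held at 1. Because x is continuous, a test like x[i]==1 is essentially never TRUE. In addition, y must keep its dependence on x, so y=rnorm(500) does not work:
n=100000                 # number of simulated points with x = 1
x=rep(1,n)               # condition on x = 1 directly
y=2*x+rnorm(n)           # same model as the data: y = 2x plus N(0, 1) noise
p=mean(y>3)              # proportion of simulated y values above 3
p
The probability is therefore approximately 0.16, which matches the theoretical value P(Z > 1) = 1 - pnorm(1) ≈ 0.1587.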
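The question also allows an answer based on the data. In that case, the fitted regression can stand in for the true model: use the predicted mean at x = 1 and the residual standard error, and treat y given x = 1 as approximately normal. This sketch assumes the question's seed. The plug-in approach ignores the uncertainty in the fitted coefficients, which is small for n = 500. `yhat`, `s`, and `p_data` are illustrative names:

```r
# Same sample as in the question
set.seed(0003653838)
x <- rnorm(500)
y <- 2 * x + rnorm(500)
fit <- lm(y ~ x)

# Fitted mean of y at x = 1 and the residual standard error
yhat <- unname(predict(fit, newdata = data.frame(x = 1)))
s    <- summary(fit)$sigma

# Approximate y | x = 1 by N(yhat, s^2) and take the upper tail above 3
p_data <- pnorm(3, mean = yhat, sd = s, lower.tail = FALSE)
p_data
```

The result should be close to, but not exactly equal to, the theoretical 0.1587, because yhat and s are sample estimates of 2 and 1.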