1) Generate a data set with three variables (X, Y and Z). X and Y have 10 observations each (N = 10), and Z has 13 observations (N = 13). Each observation should have two digits (such as “83” or “8.3”).
2) Draw a stem-and-leaf display for variable Z only, and draw a box plot for variable Z after specifying its five numbers: upper extreme (UEX), lower extreme (LEX), upper fourth (FU), lower fourth (FL) and median (MD).
3) Calculate the mean and standard deviation of variable X.
4) Calculate the mean and standard deviation of variable Y.
5) To predict Y from X, set up a regression equation: (a) calculate the two regression constants (slope and y-intercept) and (b) present the equation.
6) Since you have the means of X and Y (from questions 3 and 4 above), once you have the mean of Z, can you obtain the mean of the entire data set by computing the mean of the three means? Why or why not? Explain.
#### (1) ####
X=round(rnorm(10,50,5),2)
Y=round(rnorm(10,50,5),2)
Z=round(rnorm(13,50,5),2)
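The assignment asks for two-digit observations. However, round(..., 2) keeps two decimal places, which gives values such as 47.39 with four digits. The output in this answer comes from the two-decimal version above. The sketch below is an alternative that regenerates the data with whole-number, two-digit values. It assumes an arbitrary seed (123) so the data set can be reproduced.

```r
# Arbitrary seed (an assumption) so the same data set is regenerated on every run
set.seed(123)

# round() with the default digits = 0 keeps whole numbers such as 47 or 55,
# i.e. two-digit values when drawing around a mean of 50 with sd 5
X <- round(rnorm(10, mean = 50, sd = 5))
Y <- round(rnorm(10, mean = 50, sd = 5))
Z <- round(rnorm(13, mean = 50, sd = 5))

# Guard: every observation must really have two digits (10 to 99)
stopifnot(all(c(X, Y, Z) >= 10 & c(X, Y, Z) <= 99))
print(Z)
```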
######## (2) ######
stem(Z)
The decimal point is 1 digit(s) to the right of the |
4 | 4
4 | 677899
5 | 02
5 | 668
6 | 0
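The following is a minimal sketch of how a display like this is assembled. It assumes whole-number data, with the tens digit as the stem and the units digit as the leaf. Each stem is split into a "*" line (leaves 0-4) and a "." line (leaves 5-9), following Tukey's convention; this matches how the display above repeats each stem. The helper stem_leaf() is hypothetical. The vector z is reconstructed from the display above by rounding each value to a whole number, so it is not the original Z.

```r
# Hypothetical helper: stem-and-leaf display for two-digit whole-number data
stem_leaf <- function(x) {
  x <- sort(round(x))                    # leaves are whole units, so round first
  stems  <- x %/% 10                     # tens digit
  leaves <- x %% 10                      # units digit
  half   <- ifelse(leaves < 5, "*", ".") # "*" line: leaves 0-4, "." line: leaves 5-9
  key    <- paste0(stems, half)
  # Collect the leaves of each line in order; lines with no leaves are omitted here
  rows <- tapply(leaves, factor(key, levels = unique(key)), paste, collapse = "")
  paste(format(names(rows), width = 3), "|", rows)
}

# Values reconstructed from the stem-and-leaf display of Z (illustrative only)
z <- c(44, 46, 47, 47, 48, 49, 49, 50, 52, 56, 56, 58, 60)
writeLines(stem_leaf(z))
```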
boxplot(Z)
summary(Z)
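   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  43.73   47.39   49.06   50.90   55.50   60.05
Five numbers for the box plot: LEX = 43.73, FL = 47.39, MD = 49.06, FU = 55.50, UEX = 60.05
(with n = 13, Tukey's fourths fall exactly on the 4th value from each end, the same values that summary() reports as the 1st and 3rd quartiles)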
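The five numbers can also be computed directly from depths, and the same numbers give the box-plot fences. This is a sketch with a hypothetical helper, five_numbers(). It uses the illustrative vector z, which is reconstructed from the stem-and-leaf display rather than taken from the original Z. The fences use the 1.5 × fourth-spread rule, which is also the default range used by boxplot().

```r
# Hypothetical helper: Tukey's five numbers (LEX, FL, MD, FU, UEX) from depths
five_numbers <- function(x) {
  x <- sort(x)
  n <- length(x)
  d_md <- (n + 1) / 2            # depth of the median
  d_f  <- (floor(d_md) + 1) / 2  # depth of the fourths
  # Value at a depth counted from the bottom; a half-depth averages two neighbours
  at_depth <- function(d) (x[floor(d)] + x[ceiling(d)]) / 2
  c(LEX = x[1], FL = at_depth(d_f), MD = at_depth(d_md),
    FU = at_depth(n + 1 - d_f), UEX = x[n])
}

# Illustrative values reconstructed from the stem-and-leaf display of Z
z <- c(44, 46, 47, 47, 48, 49, 49, 50, 52, 56, 56, 58, 60)
f <- five_numbers(z)
print(f)
print(fivenum(z))  # base R's Tukey five-number summary, for comparison

# Fences: 1.5 fourth-spreads beyond each fourth
d_F <- f[["FU"]] - f[["FL"]]
fences <- c(lower = f[["FL"]] - 1.5 * d_F, upper = f[["FU"]] + 1.5 * d_F)
outliers <- z[z < fences[["lower"]] | z > fences[["upper"]]]
print(fences)
print(outliers)    # values a box plot would draw as separate points
```

With no outliers, the whiskers simply run from LEX to UEX.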
########### (3) #####
mean(X)
[1] 49.683
sd(X)
[1] 5.123309
########### (4) #####
mean(Y)
[1] 50.704
sd(Y)
[1] 5.522938
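The values above come from mean() and sd(). As a sketch, both statistics follow directly from their definitions. The sample standard deviation divides by n - 1, as R's sd() does. The helper names and the data vector v are hypothetical, and v is not the X or Y generated above.

```r
# Hypothetical helpers: sample mean and standard deviation from their definitions
sample_mean <- function(x) sum(x) / length(x)
sample_sd <- function(x) {
  m <- sample_mean(x)
  sqrt(sum((x - m)^2) / (length(x) - 1))  # n - 1 denominator, as in sd()
}

v <- c(52, 47, 55, 49, 50, 44, 58, 51, 46, 53)  # hypothetical data
cat("mean =", sample_mean(v), " sd =", sample_sd(v), "\n")
cat("R:   mean =", mean(v), " sd =", sd(v), "\n")
```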
######### (5) ####
L=lm(Y~X)
summary(L)
Call:
lm(formula = Y ~ X)

Residuals:
     Min       1Q   Median       3Q      Max
-10.7655  -0.7191   0.0195   2.8518   5.9810

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  80.4052    15.8325   5.078 0.000955 ***
X            -0.5978     0.3172  -1.885 0.096175 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.875 on 8 degrees of freedom
Multiple R-squared:  0.3075,  Adjusted R-squared:  0.221
F-statistic: 3.553 on 1 and 8 DF,  p-value: 0.09617
(a) Slope b = -0.5978; y-intercept a = 80.4052
(b) Regression equation: Ŷ = 80.41 - 0.60 X
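The two regression constants come from the least-squares formulas b = Sxy / Sxx and a = mean(Y) - b × mean(X). The sketch below computes them by hand and checks them against lm(). The helper ls_constants() and the vectors x and y are hypothetical.

```r
# Hypothetical helper: least-squares slope and intercept from their formulas
ls_constants <- function(x, y) {
  b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # Sxy / Sxx
  a <- mean(y) - b * mean(x)                                      # line passes through the means
  c(intercept = a, slope = b)
}

x <- c(45, 48, 50, 52, 55)  # hypothetical data
y <- c(55, 52, 51, 49, 46)
k <- ls_constants(x, y)
print(k)

# Same constants from lm(); predict() gives the predicted Y for a new X
fit <- lm(y ~ x)
print(coef(fit))
print(predict(fit, newdata = data.frame(x = 50)))  # x = mean(x), so this is mean(y)
```

Equivalently, b = r × sY / sX. With R² = 0.3075 and a negative slope, r ≈ -0.555. Then -0.555 × 5.522938 / 5.123309 ≈ -0.598, which is consistent with the slope reported by summary(L).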
######### (6) ####
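mean(Z)
[1] 50.9
mean(c(mean(X), mean(Y), mean(Z)))
[1] 50.429
The three means must be combined with c(). Calling mean(mean(X), mean(Y), mean(Z)) passes mean(Y) and mean(Z) as the trim and na.rm arguments, so it simply returns mean(X) = 49.683.
No. The mean of the three means (50.429) gives X, Y and Z equal weight, but they have different numbers of observations (10, 10 and 13). The mean of the entire data set must weight each mean by its sample size:
(10 × 49.683 + 10 × 50.704 + 13 × 50.9) / 33 = 1665.57 / 33 ≈ 50.472
The two results agree only when every variable has the same number of observations (or all three means are equal).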
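The comparison in question 6 can be checked with a few lines. The sketch uses only the means and sample sizes reported in this answer; the names means and sizes are illustrative. weighted.mean() is the base R function for a size-weighted average.

```r
# Means and sample sizes reported above for X, Y and Z
means <- c(X = 49.683, Y = 50.704, Z = 50.9)
sizes <- c(X = 10, Y = 10, Z = 13)

# Unweighted mean of the three means: each variable counts equally
print(mean(means))

# Grand mean of all 33 observations: weight each mean by its sample size
print(weighted.mean(means, w = sizes))
print(sum(means * sizes) / sum(sizes))  # same thing written out

# Pitfall: mean() averages only its first argument; the extra arguments are
# taken as 'trim' and 'na.rm', so this just returns the first value
print(mean(means[1], means[2], means[3]))
```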