Use the Random Number Generator (under Data Analysis) to simulate the following data set. Create 10 columns, each 20 points long, using the following parameters: Number of variables (10), number of data points (20), Distribution (Normal), Mean (40), Standard Deviation (10), Random seed (1234). The data should be in columns A, B, C, …, I, J.
Randomly pick two columns (say Column B and Column H) and perform a two-sided t-test on them. Record the P-value and repeat this procedure several times (at least 5 times). That is, each time randomly pick two columns, perform a two-sided t-test, and record the P-value. Then answer the questions. (Pick the closest answer.)
17. What did you observe?
a. Most of the P-values are very small, and some below 5%.
b. P-values are very different, some small and some large, but very few, if any, below 5%
c. Most of the P-values are very large, around 0.9 and 0.95 range.
d. Essentially all P-values are below 5% and some even below 1% range.
18. What is the Statistical interpretation?
a. Since the data are created randomly, one expects to see a small P-value for the t-test.
b. The t-test worked as designed, since in most cases it detected the difference, sometimes even with 1% threshold.
c. The t-test worked as designed, in most cases it did not detect the difference since the data are created with equal means (equal averages).
d. None of the above
Create one more random column of data. This time use the following parameters: Number of variables (1), number of data points (20), Distribution (Normal), Mean (50), Standard Deviation (10), Random seed (3434). Cut and paste this data into the same sheet as the previous 10 columns and put it in column M.
Randomly pick one column out of A, B, …, J (say Column F) and perform a two-sided t-test on this randomly picked column and the newly created column M. Record the P-value and repeat this procedure several times (at least 5 times). That is, each time randomly pick one of the ten previously created columns and perform a two-sided t-test against the newly created column M. Record the P-value. Then answer the questions. (Pick the closest answer.)
19. What did you observe?
a. Most of the P-values are very small, and some below 5%.
b. P-values are very different, some small and some large, but very few, if any, below 5%
c. Most of the P-values are very large, around 0.9 and 0.95 range.
d. Essentially all P-values are below 5% and some even below 1% range.
20. What is the Statistical interpretation?
a. Since the data are created randomly, one expects to see a small P-value for the t-test.
b. The t-test worked as designed, since in most cases it detected the difference, sometimes even with 1% threshold.
c. The t-test worked as designed, in most cases it did not detect the difference since the data are created with equal means (equal averages).
d. None of the above
Random Number Generation in R:
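set.seed(1234)
# Note: rnorm(20,40.10) is read by R as n = 20, mean = 40.10, with the default sd = 1.
# The listings below come from these calls as written; the assignment's
# Mean (40) and Standard Deviation (10) would be written rnorm(20, 40, 10).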
A = rnorm(20,40.10)
B = rnorm(20,40.10)
C = rnorm(20,40.10)
D = rnorm(20,40.10)
E = rnorm(20,40.10)
F1 = rnorm(20,40.10)
G = rnorm(20,40.10)
H = rnorm(20,40.10)
I = rnorm(20,40.10)
J = rnorm(20,40.10)
> A
[1] 38.89293 40.37743 41.18444 37.75430 40.52912 40.60606 39.52526
39.55337 39.53555 39.20996 39.62281 39.10161
[13] 39.32375 40.16446 41.05949 39.98971 39.58899 39.18880 39.26283
42.51584
> B
[1] 40.23409 39.60931 39.65945 40.55959 39.40628 38.65180 40.67476
39.07634 40.08486 39.16405 41.20230 39.62441
[13] 39.39056 39.59874 38.47091 38.93238 37.91996 38.75901 39.80571
39.63410
> C
[1] 41.54950 39.03136 39.24464 39.81938 39.10566 39.13149 38.99268
38.84801 39.57617 39.60315 38.29397 39.51792
[13] 38.99111 39.08504 39.93769 40.66306 41.74782 39.32665 41.70591
38.94219
> D
[1] 40.75659 42.64899 40.06524 39.43037 40.09240 41.87708 38.96139
41.46783 41.42956 40.43647 40.10689 39.64453
[13] 39.73348 40.74829 42.17027 39.94660 38.70930 39.37642 40.35826
39.78294
> E
[1] 39.92221 39.93001 38.72770 39.92621 40.95023 40.79761 40.65000
39.69727 39.90841 38.90547 40.04684 40.35520
[13] 41.80596 41.10151 39.60442 40.45555 38.96539 40.97820 41.07292
42.22112
> F1
[1] 40.51452 39.62528 40.16599 39.59752 39.27400 40.26699 39.20374
40.26819 40.45497 40.04789 39.90407 39.45093
[13] 38.99023 40.94927 40.12236 40.93114 38.85571 40.26903 40.77317
40.07372
> H
[1] 39.76596 41.49515 40.73667 39.99157 40.61376 40.49927 41.76286
40.37589 40.60627 40.44755 39.72276 40.19762
[13] 41.73874 39.22441 40.22176 41.46213 39.86538 39.04662 39.23022
39.70987
> I
[1] 39.25265 39.83936 39.68558 39.91695 40.50706 40.72463 41.77821
40.03131 39.77916 41.57101 41.80433 40.14324
[13] 39.76734 38.27776 41.51126 39.26242 38.97624 43.14377 40.33502
40.06674
> J
[1] 37.36778 40.00021 41.07603 40.51387 41.01232 42.08373 41.26911
39.59126 40.80418 39.90158 39.56193 37.24424
[13] 39.31035 40.58781 42.26803 40.60069 40.72021 39.13410 40.26265
38.02176
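As a hedged aside, the ten columns could also be generated in a single call using the assignment's stated parameters (mean 40, standard deviation 10). This is only a sketch, not the code that produced the listings above, so its values will differ from them; the name `cols` is made up for illustration, and R's generator does not reproduce Excel's numbers even with the same seed.

```r
# Sketch: ten columns of 20 normal draws each, mean 40 and sd 10.
# `cols` is a hypothetical name, not something from the original answer.
set.seed(1234)
cols <- as.data.frame(matrix(rnorm(20 * 10, mean = 40, sd = 10), nrow = 20))
names(cols) <- LETTERS[1:10]   # column names "A" through "J"
str(cols)                      # compact overview of the data frame
```

Keeping the columns in one data frame makes it easy to pick columns by name later, for example `cols[["B"]]`.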
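We can run the two-sample t-test in R easily using
t.test(variable1,variable2)
By default, this function performs a two-sided Welch (unequal-variance) two-sample t-test, not a paired test, and reports a 95% confidence interval. We compare each p-value with the 0.05 level of significance.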
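To make the choice of test explicit, here is a small sketch of the three common variants of `t.test()`. With two vectors and no other arguments, R runs the Welch two-sample test; the pooled and paired versions need `var.equal = TRUE` and `paired = TRUE` respectively. The vectors `x` and `y` and the seed are illustrative assumptions, not data from the assignment.

```r
# Illustrative data only: two independent samples of size 20.
set.seed(1)
x <- rnorm(20, mean = 40, sd = 10)
y <- rnorm(20, mean = 40, sd = 10)

welch  <- t.test(x, y)                    # default: Welch, two-sided
pooled <- t.test(x, y, var.equal = TRUE)  # classic equal-variance test
paired <- t.test(x, y, paired = TRUE)     # only for matched observations

# The method label shows which test was actually run.
c(welch$method, pooled$method, paired$method)
```

Since the columns here are generated independently, the unpaired (Welch) test is the appropriate one.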
R-output:
The Welch two-sample t-test between A and B:
> t.test(A,B)
Welch Two Sample t-test
t = 1.1349, df = 35.889, p-value = 0.2639
The Welch two-sample t-test between I and C:
> t.test(I,C)
Welch Two Sample t-test
t = 1.9468, df = 37.189, p-value = 0.05914
The Welch two-sample t-test between F1 and D:
> t.test(F1,D)
Welch Two Sample t-test
t = -1.4528, df = 30.391, p-value = 0.1565
The Welch two-sample t-test between D and E:
> t.test(D,E)
Welch Two Sample t-test
t = 0.27281, df = 37.22, p-value = 0.7865
The Welch two-sample t-test between H and I:
> t.test(H,I)
Welch Two Sample t-test
t = 0.053938, df = 34.132, p-value = 0.9573
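The "pick two columns at random and repeat" procedure can also be automated. The sketch below assumes the assignment's parameters (mean 40, sd 10) and uses hypothetical names (`cols`, `n_rep`, `p_values`); it is one possible way to do it, and its p-values will not match the five recorded above.

```r
# Rebuild the ten columns so the block is self-contained.
set.seed(1234)
cols <- as.data.frame(matrix(rnorm(20 * 10, mean = 40, sd = 10), nrow = 20))
names(cols) <- LETTERS[1:10]

n_rep <- 5                      # number of random pairs to test
p_values <- numeric(n_rep)
for (k in seq_len(n_rep)) {
  pair <- sample(names(cols), 2)   # two distinct columns, chosen at random
  p_values[k] <- t.test(cols[[pair[1]]], cols[[pair[2]]])$p.value
  cat(pair[1], "vs", pair[2], ": p-value =", round(p_values[k], 4), "\n")
}
```

`sample()` draws without replacement by default, so a column is never compared with itself.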
# Observation of p-values:
Question 17:
Option b is correct.
b. P-values are very different, some small and some large, but very few, if any, below 5%
Question 18:
In all five cases the p-value is greater than 0.05, so the null hypothesis of equal means is not rejected in any of them.
That is, the tests found no significant difference between the column means, as expected for columns generated with the same mean.
So option c is correct.
c. The t-test worked as designed, in most cases, it did not detect the difference since the data are created with equal means (equal averages).
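The reason "very few, if any" p-values fall below 5% is that, when both samples come from the same distribution, the p-value is approximately uniform on [0, 1], so only about 5% of tests are expected to fall below 0.05. A short simulation can illustrate this; the number of repetitions (`n_sim`) and the variable names are assumptions chosen for the sketch.

```r
# Simulate many t-tests where the null hypothesis (equal means) is true.
set.seed(1234)
n_sim <- 2000
p_null <- replicate(n_sim, {
  x <- rnorm(20, mean = 40, sd = 10)
  y <- rnorm(20, mean = 40, sd = 10)
  t.test(x, y)$p.value
})

frac_below_5pct <- mean(p_null < 0.05)
frac_below_5pct          # expected to be close to 0.05
hist(p_null, breaks = 20, main = "p-values under equal means")  # roughly flat
```

A roughly flat histogram is what "some small and some large" looks like when the experiment is repeated many times.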
Part 2:
> set.seed(3434)
> M = rnorm(20,50,10)
> M
[1] 45.93904 35.87163 60.34832 44.38388 49.84033 49.61631 48.19858
46.23254 46.82688 67.83189 62.74389 50.82694 50.86437 57.84251
68.30389 28.44567 51.51725 44.33325 37.06757 47.23892
T-Test:
> t.test(A,M)
Welch Two Sample t-test
t = -4.3517, df = 19.384, p-value = 0.0003296
> t.test(I,M)
Welch Two Sample t-test
t = -4.1386, df = 19.497, p-value = 0.0005327
> t.test(F1,M)
Welch Two Sample t-test
t = -4.3046, df = 19.142, p-value = 0.0003767
> t.test(D,M)
Welch Two Sample t-test
t = -4.1123, df = 19.425, p-value = 0.0005699
> t.test(H,M)
Welch Two Sample t-test
t = -4.1446, df = 19.247, p-value = 0.0005379
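Instead of testing five hand-picked columns, every column can be compared with M in one pass. The sketch below assumes the assignment's parameters for columns A to J (mean 40, sd 10), so its p-values will differ from those listed above; `cols` and `p_vs_M` are hypothetical names.

```r
# Ten columns with mean 40, sd 10 (assignment parameters).
set.seed(1234)
cols <- as.data.frame(matrix(rnorm(20 * 10, mean = 40, sd = 10), nrow = 20))
names(cols) <- LETTERS[1:10]

# The extra column with mean 50, sd 10, as in Part 2.
set.seed(3434)
M <- rnorm(20, mean = 50, sd = 10)

# Welch two-sample t-test of each column against M.
p_vs_M <- sapply(cols, function(col) t.test(col, M)$p.value)
round(p_vs_M, 4)
sum(p_vs_M < 0.05)   # how many of the ten comparisons are significant at 5%
```

With both samples at sd 10 the separation is less extreme than in the listing above, so a few comparisons may not reach significance.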
Observation:
Question 19:
Option d is correct. All five p-values above are below 0.001.
d. Essentially all P-values are below 5% and some even below 1% range.
Question 20:
Option b is correct.
b. The t-test worked as designed, since in most cases it detected the difference, sometimes even with 1% threshold.
This is because the second variable (column M) was generated with a different mean (50 instead of 40).
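The phrase "in most cases" can be quantified with a power calculation. As a rough sketch under the assignment's parameters (20 points per column, true difference in means of 10, sd 10), `power.t.test()` from base R's stats package gives the probability that a two-sided test at the 5% level detects the difference. It assumes the equal-variance test, so it is only an approximation for the Welch test used above.

```r
# Power of a two-sample, two-sided t-test at the 5% level.
pw <- power.t.test(n = 20, delta = 10, sd = 10, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")
pw$power   # roughly 0.87, so most (but not all) comparisons should be significant
```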