Question

Use Random number generator (under Data Analysis) to simulate the following data set. Create 10 columns, each 20 points long, and use the following parameters: Number of variables (10), number of data points (20), Distribution (Normal), Mean (40), Standard Deviation (10), Random seed (1234). The data should be in columns A, B, C, …, I, J.

Randomly pick two columns (say Column B and Column H) and perform 2-sided t-test on these two data columns. Record the P-value and repeat this procedure several times (at least 5 times). That is, each time randomly pick two columns, perform 2-sided t-test and record the P-value. And answer the questions. (Pick the closest answer)

17. What did you observe?

a. Most of the P-values are very small, and some below 5%.

b. P-values are very different, some small and some large, but very few, if any, below 5%

c. Most of the P-values are very large, around 0.9 and 0.95 range.

d. Essentially all P-values are below 5% and some even below 1% range.

18. What is the Statistical interpretation?

a. Since the data are created randomly, one expects to see a small P-value for the t-test.

b. The t-test worked as designed, since in most cases it detected the difference, sometimes even with 1% threshold.

c. The t-test worked as designed, in most cases it did not detect the difference since the data are created with equal means (equal averages).

d. None of the above

Create one more random column of data. This time use the following parameters: Number of variables (1), number of data points (20), Distribution (Normal), Mean (50), Standard Deviation (10), Random seed (3434). Cut and paste this data into the same sheet as the previous 10 columns and put it in column M.

Randomly pick one column out of A,B,…,J (say Column F) and perform 2-sided t-test based on this randomly picked column and the newly created column M. Record the P-value and repeat this procedure several times (at least 5 times). That is, each time randomly pick one data from the ten previously created and perform 2-sided t-test versus the newly created column M. Record the P-value. And answer the questions. (Pick the closest answer)

19. What did you observe?

a. Most of the P-values are very small, and some below 5%.

b. P-values are very different, some small and some large, but very few, if any, below 5%

c. Most of the P-values are very large, around 0.9 and 0.95 range.

d. Essentially all P-values are below 5% and some even below 1% range.

20. What is the Statistical interpretation?

a. Since the data are created randomly, one expects to see a small P-value for the t-test.

b. The t-test worked as designed, since in most cases it detected the difference, sometimes even with 1% threshold.

c. The t-test worked as designed, in most cases it did not detect the difference since the data are created with equal means (equal averages).

d. None of the above

Solutions

Expert Solution

Random Number Generation in R:

set.seed(1234)

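# Note: rnorm(20,40.10) passes 40.10 as the mean and leaves sd at its default of 1,
# which is why the values printed below cluster tightly around 40.
# The parameters in the question (mean 40, standard deviation 10) correspond to rnorm(20,40,10).
# The name F1 is used instead of F because F is R's shorthand for FALSE.
A = rnorm(20,40.10)
B = rnorm(20,40.10)
C = rnorm(20,40.10)
D = rnorm(20,40.10)
E = rnorm(20,40.10)
F1 = rnorm(20,40.10)
G = rnorm(20,40.10)
H = rnorm(20,40.10)
I = rnorm(20,40.10)
J = rnorm(20,40.10)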

> A
[1] 38.89293 40.37743 41.18444 37.75430 40.52912 40.60606 39.52526 39.55337 39.53555 39.20996 39.62281 39.10161
[13] 39.32375 40.16446 41.05949 39.98971 39.58899 39.18880 39.26283 42.51584
> B
[1] 40.23409 39.60931 39.65945 40.55959 39.40628 38.65180 40.67476 39.07634 40.08486 39.16405 41.20230 39.62441
[13] 39.39056 39.59874 38.47091 38.93238 37.91996 38.75901 39.80571 39.63410
> C
[1] 41.54950 39.03136 39.24464 39.81938 39.10566 39.13149 38.99268 38.84801 39.57617 39.60315 38.29397 39.51792
[13] 38.99111 39.08504 39.93769 40.66306 41.74782 39.32665 41.70591 38.94219
> D
[1] 40.75659 42.64899 40.06524 39.43037 40.09240 41.87708 38.96139 41.46783 41.42956 40.43647 40.10689 39.64453
[13] 39.73348 40.74829 42.17027 39.94660 38.70930 39.37642 40.35826 39.78294
> E
[1] 39.92221 39.93001 38.72770 39.92621 40.95023 40.79761 40.65000 39.69727 39.90841 38.90547 40.04684 40.35520
[13] 41.80596 41.10151 39.60442 40.45555 38.96539 40.97820 41.07292 42.22112
> F1
[1] 40.51452 39.62528 40.16599 39.59752 39.27400 40.26699 39.20374 40.26819 40.45497 40.04789 39.90407 39.45093
[13] 38.99023 40.94927 40.12236 40.93114 38.85571 40.26903 40.77317 40.07372
> H
[1] 39.76596 41.49515 40.73667 39.99157 40.61376 40.49927 41.76286 40.37589 40.60627 40.44755 39.72276 40.19762
[13] 41.73874 39.22441 40.22176 41.46213 39.86538 39.04662 39.23022 39.70987
> I
[1] 39.25265 39.83936 39.68558 39.91695 40.50706 40.72463 41.77821 40.03131 39.77916 41.57101 41.80433 40.14324
[13] 39.76734 38.27776 41.51126 39.26242 38.97624 43.14377 40.33502 40.06674
> J
[1] 37.36778 40.00021 41.07603 40.51387 41.01232 42.08373 41.26911 39.59126 40.80418 39.90158 39.56193 37.24424
[13] 39.31035 40.58781 42.26803 40.60069 40.72021 39.13410 40.26265 38.02176
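The calls above pass 40.10 as a single argument, so R reads it as the mean and uses a standard deviation of 1. A minimal sketch of the generation step with the parameters stated in the question (mean 40, standard deviation 10) could look like the following. The names `cols` and `dat` are illustrative, and because R's generator differs from Excel's, the numbers will not match an Excel sheet created with the same seed.

```r
# Sketch: ten columns of 20 normal draws each, mean 40 and sd 10.
set.seed(1234)

cols <- LETTERS[1:10]  # column labels "A" ... "J"

# matrix() fills column by column, so column A holds the first 20 draws,
# column B the next 20, and so on.
dat <- as.data.frame(
  matrix(rnorm(20 * 10, mean = 40, sd = 10),
         nrow = 20, ncol = 10,
         dimnames = list(NULL, cols))
)

# Quick sanity check: sample means should sit near 40, sample sds near 10.
round(colMeans(dat), 2)
round(sapply(dat, sd), 2)
```

Keeping the columns in one data frame makes it easy to pick pairs by name, for example `t.test(dat$B, dat$H)`.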
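We can run the two-sample t-test in R using

t.test(variable1,variable2)

By default this function performs a two-sided, unpaired Welch test (equal variances are not assumed) and reports a 95% confidence interval. The P-value is then compared with the 0.05 level of significance.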
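To make the defaults of `t.test` visible, here is a small sketch that compares the bare call with one that spells out every relevant argument. The vectors `x` and `y` and the seed are made up for illustration only and are not part of the exercise data.

```r
# Illustrative data, not the exercise columns.
set.seed(1)
x <- rnorm(20, mean = 40, sd = 10)
y <- rnorm(20, mean = 40, sd = 10)

# Bare call, as used in the solution.
default_call <- t.test(x, y)

# Same test with the default arguments written out explicitly.
explicit_call <- t.test(x, y,
                        alternative = "two.sided",  # 2-sided test
                        mu = 0,                     # H0: no difference in means
                        paired = FALSE,             # independent samples
                        var.equal = FALSE,          # Welch correction
                        conf.level = 0.95)          # level of the reported interval

default_call$method
all.equal(default_call$p.value, explicit_call$p.value)
```

Note that `conf.level` only controls the confidence interval. The P-value itself does not depend on it; the 5% threshold is applied by the reader.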

R-output:

The two-sample t-test between A and B:

> t.test(A,B)

Welch Two Sample t-test

t = 1.1349, df = 35.889, p-value = 0.2639

The two-sample t-test between I and C:

> t.test(I,C)

Welch Two Sample t-test

t = 1.9468, df = 37.189, p-value = 0.05914

The two-sample t-test between F1 and D:

> t.test(F1,D)

Welch Two Sample t-test

t = -1.4528, df = 30.391, p-value = 0.1565

The two-sample t-test between D and E:

> t.test(D,E)

Welch Two Sample t-test

t = 0.27281, df = 37.22, p-value = 0.7865

The two-sample t-test between H and I:

> t.test(H,I)

Welch Two Sample t-test

t = 0.053938, df = 34.132, p-value = 0.9573

Observation of the P-values:

Question 17

Option b is correct:

b. P-values are very different, some small and some large, but very few, if any, below 5%

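Question 18

In all five cases the P-value is greater than 0.05, so we fail to reject the null hypothesis of equal means.

That is, none of the tests detected a difference between the two column means, which is expected because every column was drawn from the same distribution.

So option c is correct: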

c. The t-test worked as designed, in most cases, it did not detect the difference since the data are created with equal means (equal averages).
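Rather than picking five pairs by hand, one can test every pair of columns and look at the whole set of P-values. The sketch below assumes the columns are generated with the question's parameters (mean 40, standard deviation 10) and stored in a data frame; `dat`, `col_pairs` and `pvals` are illustrative names. When the true means are equal, P-values are spread roughly uniformly between 0 and 1, so only about one test in twenty is expected to fall below 0.05 by chance.

```r
# Sketch: Welch t-test on all 45 pairs of the ten equal-mean columns.
set.seed(1234)
dat <- as.data.frame(
  matrix(rnorm(200, mean = 40, sd = 10),
         nrow = 20, dimnames = list(NULL, LETTERS[1:10]))
)

# Every unordered pair of column names: a 2 x 45 character matrix.
col_pairs <- combn(names(dat), 2)

# P-value of the two-sided Welch test for each pair.
pvals <- apply(col_pairs, 2, function(p) {
  t.test(dat[[p[1]]], dat[[p[2]]])$p.value
})
names(pvals) <- paste(col_pairs[1, ], col_pairs[2, ], sep = "-")

summary(pvals)      # spread of the P-values
mean(pvals < 0.05)  # share of pairs flagged at the 5% level
```

Because the t statistic is unchanged when both samples are shifted and rescaled in the same way, the pairwise P-values do not depend on whether the columns were drawn with standard deviation 1 or 10.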

Part 2:

> set.seed(3434)
> M = rnorm(20,50,10)
> M
[1] 45.93904 35.87163 60.34832 44.38388 49.84033 49.61631 48.19858 46.23254 46.82688 67.83189 62.74389 50.82694 50.86437 57.84251 68.30389 28.44567 51.51725 44.33325 37.06757 47.23892

T-Test:

> t.test(A,M)

Welch Two Sample t-test

t = -4.3517, df = 19.384, p-value = 0.0003296

> t.test(I,M)

Welch Two Sample t-test

t = -4.1386, df = 19.497, p-value = 0.0005327

> t.test(F1,M)

Welch Two Sample t-test

t = -4.3046, df = 19.142, p-value = 0.0003767

> t.test(D,M)

Welch Two Sample t-test

t = -4.1123, df = 19.425, p-value = 0.0005699

> t.test(H,M)

Welch Two Sample t-test

t = -4.1446, df = 19.247, p-value = 0.0005379

Observation:

Question 19

Option d is correct:

d. Essentially all P-values are below 5% and some even below 1% range.

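Question 20

Option b is correct:

b. The t-test worked as designed, since in most cases it detected the difference, sometimes even with 1% threshold.

This is because column M was generated with a different mean (50 instead of 40), so there is a real difference for the test to detect.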
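The comparison against column M can also be run for all ten columns at once. This is a sketch under the assumption that columns A to J are generated with the question's parameters (mean 40, standard deviation 10); `dat` and `pvals_M` are illustrative names. With a true gap of 10 between the means and only 20 points per column, most P-values should be small, but an individual test is not guaranteed to fall below 5%.

```r
# Sketch: each of the ten mean-40 columns tested against the mean-50 column M.
set.seed(1234)
dat <- as.data.frame(
  matrix(rnorm(200, mean = 40, sd = 10),
         nrow = 20, dimnames = list(NULL, LETTERS[1:10]))
)

set.seed(3434)
M <- rnorm(20, mean = 50, sd = 10)

# Two-sided Welch t-test of every column versus M.
pvals_M <- sapply(dat, function(col) t.test(col, M)$p.value)

round(pvals_M, 4)
sum(pvals_M < 0.05)  # how many of the ten tests detect the difference at 5%
sum(pvals_M < 0.01)  # how many detect it at 1%
```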
