Use the Happy1 variable for this exercise. Suppose someone claims the population mean is 55 and the standard deviation is 10.
PART 1 - For now, assume both of the claims about the population are correct.
1a. Given the assumed population mean and standard deviation, calculate the probability of observing a value above the first data point in your data set (which is 36).
1b. Suppose you collected 8 new data points in a new sample.
Calculate the probability that the mean of these 8 new data points
is above the number for your first data point in your file.
1c. If this is a normally distributed variable, above what value
should you find 70% of data points? How many of the values from
your data set are above this value?
1d. If this is a normally distributed variable, between what two
numbers (centered around the assumed mean) should you find 68% of
data points? What percentage of your data points are between these
numbers?
1e. Think about your answers to 1c and 1d. Does this variable
appear to be normally distributed with this mean and standard
deviation?
Happy1
36
18
66
43
28
39
47
40
24
46
48
57
36
58
39
62
43
65
74
36
39
44
61
50
47
63
60
38
45
51
55
46
68
32
42
38
61
45
31
32
44
30
29
62
49
54
64
38
49
55
28
53
55
52
50
54
76
28
49
70
29
34
77
40
50
40
56
54
36
51
42
71
45
53
55
37
51
36
39
36
51
40
51
52
53
33
66
37
76
67
55
46
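1a) In this question we want to calculate P(X > 36), where X is assumed to be normally distributed with mean μ = 55 and standard deviation σ = 10. Standardizing,
P(X > 36) = P(Z > (36 - 55) / 10)
or P(X > 36) = P(Z > -1.9)
or P(X > 36) = 1 - P(Z ≤ -1.9) = 1 - 0.0287 = 0.9713
Hence the probability of observing a value greater than 36 is approximately 0.97.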
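The table lookup in 1a can be checked numerically. This is only a minimal sketch using Python's standard library; the variable names are my own, and it assumes the claimed population (mean 55, standard deviation 10) is exactly normal.

```python
from statistics import NormalDist

# Assumed population from the exercise: mean 55, standard deviation 10.
population = NormalDist(mu=55, sigma=10)

first_value = 36  # first data point in the Happy1 column

# Standardize the value, then take the upper-tail probability P(X > 36).
z = (first_value - population.mean) / population.stdev
p_above = 1 - population.cdf(first_value)

print(f"z = {z:.2f}")
print(f"P(X > {first_value}) = {p_above:.4f}")
```

The printed probability should agree with the z-table value to about four decimal places.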
1b) Now we take 8 new data points in a new sample. We use the central limit theorem, under which the sample mean x̄ has mean μ and standard deviation σ/√n, where
μ = mean of the population = 55
σ = standard deviation of the population = 10
n = sample size = 8
So the standard deviation of the sample mean is 10/√8 ≈ 3.536. For this question as well, we want to calculate the probability
P(x̄ > 36) = P(Z > (36 - 55) / 3.536)
or P(x̄ > 36) = P(Z > -5.37)
or P(x̄ > 36) ≈ 1
Hence, if we take 8 new data points in a new sample, the probability that the sample mean is greater than 36 is approximately equal to 1.
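The same check works for the sample mean in 1b; the only change is that the standard deviation is replaced by the standard error σ/√n. Again, this is a rough sketch with my own variable names, assuming the sample mean is normally distributed around 55.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 55, 10, 8   # assumed population mean, st. dev., and sample size
threshold = 36             # first data point in the Happy1 column

# Standard deviation of the sample mean (the standard error).
standard_error = sigma / sqrt(n)

# Distribution of the mean of 8 observations under the assumed population.
sampling_dist = NormalDist(mu, standard_error)

z = (threshold - mu) / standard_error
p_mean_above = 1 - sampling_dist.cdf(threshold)

print(f"standard error = {standard_error:.3f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > {threshold}) = {p_mean_above:.8f}")
```

Because the threshold is more than five standard errors below the assumed mean, the probability is 1 for all practical purposes.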
1c) We want to know the value such that 70% of the data points lie above it.
In other words, we want the 30th percentile of the assumed normal distribution of Happy1, because P(X > x) = 0.70 is the same as P(X ≤ x) = 0.30.
From the z-table, P(Z > z) = 0.70 at z ≈ -0.525. The z-score is negative because more than half of the distribution has to lie above the cutoff, so the cutoff is below the mean.
Let the required value of the data point be x.
So, (x - 55) / 10 = -0.525
or x = 55 - 0.525 * 10 = 49.75
Hence, 70% of the data points should have a value greater than 49.75.
In the given data set, only 42 of the 92 data points (about 46%) have a value greater than 49.75, which contradicts the expectation that 70% of the data lie above this value.
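The cutoff that leaves 70% of a normal distribution above it is its 30th percentile, so it can be computed with an inverse CDF and then compared against the data. The sketch below is only illustrative: the names are mine, the list is the Happy1 column copied from the question, and the population is assumed to be normal with mean 55 and standard deviation 10.

```python
from statistics import NormalDist

# Happy1 column from the question (92 values, in file order).
happy1 = [
    36, 18, 66, 43, 28, 39, 47, 40, 24, 46,
    48, 57, 36, 58, 39, 62, 43, 65, 74, 36,
    39, 44, 61, 50, 47, 63, 60, 38, 45, 51,
    55, 46, 68, 32, 42, 38, 61, 45, 31, 32,
    44, 30, 29, 62, 49, 54, 64, 38, 49, 55,
    28, 53, 55, 52, 50, 54, 76, 28, 49, 70,
    29, 34, 77, 40, 50, 40, 56, 54, 36, 51,
    42, 71, 45, 53, 55, 37, 51, 36, 39, 36,
    51, 40, 51, 52, 53, 33, 66, 37, 76, 67,
    55, 46,
]

population = NormalDist(mu=55, sigma=10)  # assumed population

# 70% of values lie ABOVE the cutoff, so 30% lie below it.
cutoff = population.inv_cdf(0.30)

# Compare with what the data actually show.
n_above = sum(1 for x in happy1 if x > cutoff)
share_above = n_above / len(happy1)

print(f"cutoff = {cutoff:.2f}")
print(f"{n_above} of {len(happy1)} values ({share_above:.1%}) are above the cutoff")
```

If the assumed distribution were right, the observed share should be close to 70%.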
1d) We again assume the variable is normally distributed. This time we want the range of values, centered on the assumed mean, within which 68% of the data points should lie.
In other words, we want the central 68% interval of the assumed distribution of Happy1, which leaves equal areas in the two tails.
α = 1 - 0.68 = 0.32
α/2 = 0.16
Two-tailed value of z for α = 0.32 is 0.995 (from the z-table)
Now, the central 68% interval for the variable Happy1 is given by
( μ - z * σ , μ + z * σ )
= ( 55 - 0.995 * 10 , 55 + 0.995 * 10 )
= (45.05 , 64.95)
Hence, we can say that 68% of the values should lie between 45.05 and 64.95.
In the data set, we find that 40 of the 92 values (about 43%) lie in this interval. This is less than 50% of the values, which again contradicts the expectation that 68% of the values lie in this interval.
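The central 68% interval runs from the 16th to the 84th percentile of the assumed distribution, so both endpoints can be obtained from the inverse CDF and the data counted directly. As before, this is a sketch under the assumption of a normal population with mean 55 and standard deviation 10; the names are mine and the list is the Happy1 column from the question.

```python
from statistics import NormalDist

# Happy1 column from the question (92 values, in file order).
happy1 = [
    36, 18, 66, 43, 28, 39, 47, 40, 24, 46,
    48, 57, 36, 58, 39, 62, 43, 65, 74, 36,
    39, 44, 61, 50, 47, 63, 60, 38, 45, 51,
    55, 46, 68, 32, 42, 38, 61, 45, 31, 32,
    44, 30, 29, 62, 49, 54, 64, 38, 49, 55,
    28, 53, 55, 52, 50, 54, 76, 28, 49, 70,
    29, 34, 77, 40, 50, 40, 56, 54, 36, 51,
    42, 71, 45, 53, 55, 37, 51, 36, 39, 36,
    51, 40, 51, 52, 53, 33, 66, 37, 76, 67,
    55, 46,
]

population = NormalDist(mu=55, sigma=10)  # assumed population

# Leave (1 - 0.68) / 2 = 0.16 of the distribution in each tail.
lower = population.inv_cdf(0.16)
upper = population.inv_cdf(0.84)

# Count the observed values that fall inside the interval.
n_inside = sum(1 for x in happy1 if lower < x < upper)
share_inside = n_inside / len(happy1)

print(f"interval = ({lower:.2f}, {upper:.2f})")
print(f"{n_inside} of {len(happy1)} values ({share_inside:.1%}) are inside")
```

The endpoints differ from the hand calculation only in the third decimal place, because the z-table value 0.995 is rounded.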
1e) The answers to 1c and 1d strongly contradict the assumption that this variable is normally distributed with mean 55 and standard deviation 10. Because the observed percentages are so far from the expected 70% and 68%, we have to say that the assumption is not true. That is, the data do not appear to come from a normal distribution with this mean and standard deviation.
We can show this by plotting the data in Excel.
As the plot shows, the data points do not follow the assumed normal distribution.
Hence, the assumption is not true, and that is why we get contradictory results in 1c and 1d.
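For readers without Excel, a rough text histogram gives a similar picture. This is not the plot referred to above, only a sketch of one way to look at the data: the bin width of 10 and all names are my own choices, and the sample mean and standard deviation are printed next to the assumed values of 55 and 10 for comparison.

```python
from collections import Counter
from statistics import mean, stdev

# Happy1 column from the question (92 values, in file order).
happy1 = [
    36, 18, 66, 43, 28, 39, 47, 40, 24, 46,
    48, 57, 36, 58, 39, 62, 43, 65, 74, 36,
    39, 44, 61, 50, 47, 63, 60, 38, 45, 51,
    55, 46, 68, 32, 42, 38, 61, 45, 31, 32,
    44, 30, 29, 62, 49, 54, 64, 38, 49, 55,
    28, 53, 55, 52, 50, 54, 76, 28, 49, 70,
    29, 34, 77, 40, 50, 40, 56, 54, 36, 51,
    42, 71, 45, 53, 55, 37, 51, 36, 39, 36,
    51, 40, 51, 52, 53, 33, 66, 37, 76, 67,
    55, 46,
]

# Sample statistics to set against the claimed mean 55 and st. dev. 10.
sample_mean = mean(happy1)
sample_sd = stdev(happy1)
print(f"sample mean = {sample_mean:.2f} (claimed 55)")
print(f"sample st. dev. = {sample_sd:.2f} (claimed 10)")

# Group values into bins of width 10 (10-19, 20-29, ...) and draw one '#' per value.
bins = Counter((x // 10) * 10 for x in happy1)
for start in sorted(bins):
    print(f"{start:2d}-{start + 9:2d} | {'#' * bins[start]}")
```

A sample mean well below 55, together with a spread larger than 10, is in line with the low percentages found in 1c and 1d.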