I recently ran an experiment: the moment the fuel light in my car comes on, I find the closest gas station and fill up the tank. But the number of gallons needed to fill up is all over the place. This makes me wonder: 1) how can we verify that the pump really delivers one gallon when it says it has pumped one gallon of gas into your car, and 2) if we know some gas stations are shortchanging us, how can we find out what the true volume of gas is?
These two problems are known as hypothesis testing (testing the claim, i.e., the hypothesis, that a supposed gallon is indeed one gallon), and estimation (estimating the true volume) in statistical inference. They are a lot more complex than they might appear because of the nature of continuous variables.
Since the volume is measured, i.e., a continuous variable, you know that if you measure the volume of a supposed gallon of gas pumped out of the machine, it will never be exactly 1.0000000... gallon.
So if it turns out to be 0.9823, can you say they are cheating?
You might say, "I will take 30 measurements and look at their mean," but again, if it turns out to be 0.9912, can you say they are cheating?
1. Outline your thoughts and share. The statistical inference in modern statistics textbooks is the result of two statisticians' answers (Fisher's p-value and Neyman's critical values) to these questions, and they are not even in full agreement. So don't feel bad if you can't solve them in an hour. However, the more thought you put into this, the less challenging you will find the answers provided in the next two sections.
1)
Sample size, $n = 30$
Sample mean, $\bar{x} = 0.9912$ gallon.
Let the population standard deviation based on the historical data be $\sigma = 0.02$ gallon.
Null Hypothesis, H0:
The population mean is equal to 1 gallon. $\mu = 1$
Alternative Hypothesis, H1:
The population mean is different from 1 gallon. $\mu \neq 1$ (claim) (two-tailed test).
Test statistic, $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n}) = (0.9912 - 1)/(0.02/\sqrt{30}) = -2.41$
For a two-tailed test, at 0.05 significance level, Z-critical =1.96
Decision criteria:
We reject the null hypothesis, H0 if the test statistic, Z falls in the rejection region. Otherwise, we fail to reject H0.
Conclusion:
Since the test statistic, Z: -2.41 fell in the rejection region, we reject the null hypothesis, H0 at 5% significance level.
(OR For Z = -2.41, for a two-tailed test, the P-value =0.016. Since P-value: 0.016 < 0.05 significance level, we reject the null hypothesis, H0 at 5% significance level).
Thus, we have sufficient statistical evidence to claim that the population mean, $\mu$, is significantly different from 1 gallon.
So, there is evidence to support the claim that they are cheating.
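The test above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_sample_z_test` is made up for illustration, and it assumes, as the answer does, that the population standard deviation (0.02 gallon) is known, so a z-test applies.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test, assuming sigma is known (hypothetical helper)."""
    se = sigma / sqrt(n)                        # standard error of the mean
    z = (sample_mean - mu0) / se                # test statistic
    std_normal = NormalDist()                   # standard normal, mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    p_value = 2 * std_normal.cdf(-abs(z))       # two-tailed p-value
    return z, z_crit, p_value, abs(z) > z_crit


# Numbers taken from the worked answer above.
z, z_crit, p_value, reject = one_sample_z_test(0.9912, 1.0, 0.02, 30)
print(f"Z = {z:.2f}, critical = {z_crit:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```

Both routes in the answer (comparing |Z| with the critical value, or comparing the p-value with 0.05) lead to the same decision, which is why the function returns all of them together.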
2)
The true population mean, $\mu$, is estimated by constructing a confidence interval: a range of values that, at a chosen confidence level, we are confident contains the true population mean, $\mu$.
Let the confidence level =95%.
So, the significance level, $\alpha = 1 - 0.95 = 0.05$
For a two-tailed case, at 0.05 significance level, Z-critical =1.96
Standard Error, SE $= \sigma/\sqrt{n} = 0.02/\sqrt{30} = 0.00365$
Margin of Error, MoE =Z-critical*SE =1.96*0.00365 =0.0072
95% confidence interval for the population mean, $\mu$: $\bar{x} \pm \text{MoE} = 0.9912 \pm 0.0072 = (0.9840, 0.9984)$
Interpretation:
We are 95% confident that the interval (0.9840, 0.9984) contains the true population mean, $\mu$.
Since this interval does not contain 1 and all of its values are less than 1 gallon, we have sufficient evidence to claim that the true population mean is not equal to 1 gallon. So, we can say that they are cheating.
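The interval can be reproduced with a few lines of code. This is a sketch under the same assumption of a known population standard deviation of 0.02 gallon; the helper name `z_confidence_interval` is hypothetical and uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean, assuming sigma is known (hypothetical helper)."""
    alpha = 1 - confidence                        # significance level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    se = sigma / sqrt(n)                          # standard error
    moe = z_crit * se                             # margin of error
    return sample_mean - moe, sample_mean + moe


# Numbers taken from the worked answer above.
lower, upper = z_confidence_interval(0.9912, 0.02, 30)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print("Interval contains 1 gallon:", lower <= 1.0 <= upper)
```

Because the code uses the unrounded critical value rather than 1.96, the bounds can differ from a hand calculation in the later decimal places, but they agree with the answer to four decimals.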