Recall that Benford's Law claims that numbers chosen from very
large data files tend to have "1" as the first nonzero digit
disproportionately often. In fact, research has shown that if you
randomly draw a number from a very large data file, the probability
of getting a number with "1" as the leading digit is about 0.301.
Now suppose you are the auditor for a very large corporation. The
revenue file contains millions of numbers in a large computer data
bank. You draw a random sample of n = 229 numbers from
this file and r = 88 have a first nonzero digit of 1. Let
p represent the population proportion of all numbers in
the computer file that have a leading digit of 1.
(i) Test the claim that p is more than 0.301. Use
α = 0.05.
(a) What is the level of significance?
State the null and alternate hypotheses.
H0: p = 0.301; H1: p < 0.301
H0: p > 0.301; H1: p = 0.301
H0: p = 0.301; H1: p ≠ 0.301
H0: p = 0.301; H1: p > 0.301
(b) What sampling distribution will you use?
The standard normal, since np < 5 and nq < 5.
The Student's t, since np < 5 and nq < 5.
The standard normal, since np > 5 and nq > 5.
The Student's t, since np > 5 and nq > 5.
What is the value of the sample test statistic? (Round your answer
to two decimal places.)
(c) Find the P-value of the test statistic. (Round your
answer to four decimal places.)
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level α?
At the α = 0.05 level, we reject the null hypothesis and conclude the data are statistically significant.
At the α = 0.05 level, we reject the null hypothesis and conclude the data are not statistically significant.
At the α = 0.05 level, we fail to reject the null hypothesis and conclude the data are statistically significant.
At the α = 0.05 level, we fail to reject the null hypothesis and conclude the data are not statistically significant.
(e) Interpret your conclusion in the context of the
application.
There is sufficient evidence at the 0.05 level to conclude that the true proportion of numbers with a leading 1 in the revenue file is greater than 0.301.
There is insufficient evidence at the 0.05 level to conclude that the true proportion of numbers with a leading 1 in the revenue file is greater than 0.301.
(ii) If p is in fact larger than 0.301, it would seem
there are too many numbers in the file with leading 1's. Could this
indicate that the books have been "cooked" by artificially lowering
numbers in the file? Comment from the point of view of the Internal
Revenue Service. Comment from the perspective of the Federal Bureau
of Investigation as it looks for "profit skimming" by unscrupulous
employees.
No. There do not seem to be too many entries with a leading digit 1.
Yes. There seem to be too many entries with a leading digit 1.
No. There seem to be too many entries with a leading digit 1.
Yes. There do not seem to be too many entries with a leading digit 1.
(iii) Comment on the following statement: If we reject the null
hypothesis at level of significance α, we have not proved
H0 to be false. We can say that the probability
is α that we made a mistake in rejecting
H0. Based on the outcome of the test, would you
recommend further investigation before accusing the company of
fraud?
We have not proved H0 to be false. Because our data lead us to accept the null hypothesis, more investigation is not merited.
We have not proved H0 to be false. Because our data lead us to reject the null hypothesis, more investigation is not merited.
We have proved H0 to be false. Because our data lead us to reject the null hypothesis, more investigation is not merited.
We have not proved H0 to be false. Because our data lead us to reject the null hypothesis, more investigation is merited.
We look first at all five parts of question (i).
(i) a) The level of significance is given in the problem as α = 0.05.
Because we are testing whether p is more than 0.301, the null and alternative hypotheses are:
H0: p = 0.301; H1: p > 0.301
b) np = 229 × 0.301 ≈ 68.93 and nq = 229 × (1 − 0.301) ≈ 160.07, so np > 5 and nq > 5.
Therefore the correct choice is: The standard normal, since np > 5 and nq > 5.
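A minimal Python sketch of the check in part (b), assuming only the values given in the problem (n = 229, p = 0.301); the variable names are illustrative, not part of the original solution:

```python
# Check the normal-approximation conditions used in part (b):
# both n*p and n*q should exceed 5 for the z test on a proportion.
n = 229      # sample size
p0 = 0.301   # proportion claimed by Benford's Law (value under H0)
q0 = 1 - p0  # complementary proportion

n_p = n * p0
n_q = n * q0

print(f"np = {n_p:.2f}, nq = {n_q:.2f}")
print("Use the standard normal:", n_p > 5 and n_q > 5)
```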
The sample proportion is
p̂ = r/n = 88/229 ≈ 0.3843
The test statistic is
z = (p̂ − p)/√(pq/n) = (0.3843 − 0.301)/√(0.301 × 0.699/229) ≈ 0.0833/0.0303 ≈ 2.75
Therefore the sample test statistic is z ≈ 2.75.
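The arithmetic for the sample proportion and the test statistic can be reproduced with a short Python sketch. It assumes the usual large-sample z statistic for one proportion, with the standard error computed from the hypothesized p = 0.301; the names are illustrative:

```python
import math

# One-proportion z statistic for H0: p = 0.301 vs H1: p > 0.301.
n = 229      # sample size
r = 88       # numbers in the sample with leading digit 1
p0 = 0.301   # hypothesized population proportion
q0 = 1 - p0

p_hat = r / n                  # sample proportion, about 0.3843
se = math.sqrt(p0 * q0 / n)    # standard error computed under H0
z = (p_hat - p0) / se          # standardized test statistic

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}")
```

Unrounded, z is about 2.747, which still rounds to 2.75.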
c) Because this is a right-tailed test, the P-value is found from the standard normal table:
P-value = P(z > 2.75) = 1 − 0.9970 = 0.0030
Therefore the P-value is 0.0030.
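Instead of a printed table, the right-tail area can be computed with the standard library's complementary error function, since P(z > a) = erfc(a/√2)/2. A sketch, where the helper name upper_tail_p_value is hypothetical:

```python
import math

def upper_tail_p_value(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Right-tailed test (H1: p > 0.301), using the rounded statistic from part (b)
p_value = upper_tail_p_value(2.75)
print(f"P-value = {p_value:.4f}")
```

It gives about 0.00298, which rounds to the table value 0.0030.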
d) The P-value 0.0030 is less than the level of significance α = 0.05, so the test is significant and we reject the null hypothesis. At the α = 0.05 level, we reject the null hypothesis and conclude the data are statistically significant.
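Parts (b) to (d) can be bundled into one small function. This is a sketch under the assumptions the solution already relies on (right-tailed test, normal approximation valid, reject when the P-value is at most α); one_proportion_z_test is a hypothetical helper, not a library call:

```python
import math

def one_proportion_z_test(r: int, n: int, p0: float, alpha: float):
    """Right-tailed z test of H0: p = p0 against H1: p > p0.

    Returns (z, p_value, reject_h0). Assumes n*p0 > 5 and n*(1 - p0) > 5
    so that the normal approximation is reasonable.
    """
    p_hat = r / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z)
    return z, p_value, p_value <= alpha

z, p_value, reject = one_proportion_z_test(r=88, n=229, p0=0.301, alpha=0.05)
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

With the unrounded statistic (about 2.747) the P-value is still about 0.0030, so the decision does not depend on rounding.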
e) Because the test is significant, we conclude: There is sufficient evidence at the 0.05 level to conclude that the true proportion of numbers with a leading 1 in the revenue file is greater than 0.301.