You might think that if you looked at the first digit of
randomly selected numbers, the distribution would be uniform.
Actually, it is not! Simon Newcomb and later Frank Benford both
discovered that the digits occur according to the following
distribution: (digit, probability)
(1,0.301),(2,0.176),(3,0.125),(4,0.097),(5,0.079),(6,0.067),(7,0.058),(8,0.051),(9,0.046)
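The problem does not state where these probabilities come from, but they are Benford's Law, P(d) = log10(1 + 1/d), rounded to three decimals. A minimal Python sketch (the function name `benford_probabilities` is just an illustrative choice) that regenerates the list:

```python
import math

def benford_probabilities():
    """Return {digit: P(first digit = digit)} under Benford's Law, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for d, p in probs.items():
    print(d, round(p, 3))

# The nine probabilities sum to 1 because the logs telescope to log10(10) = 1.
print(round(sum(probs.values()), 10))
```

Rounded to three decimals, the printed values match the (digit, probability) pairs above.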
The IRS currently uses Benford's Law to detect fraudulent tax
data. Suppose you work for the IRS and are investigating an
individual suspected of embezzling. The first digits of 177 checks
written to a supposed company are as follows:
Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
Observed Frequency | 53 | 20 | 19 | 21 | 11 | 16 | 17 | 9 | 11 |
a. State the appropriate null and alternative hypotheses for this
test.
b. Explain why α = 0.01 is an appropriate choice for the level
of significance in this situation.
c. What is the P-Value? Report your answer to 4 decimal places.
P-Value =
d. What is your decision?
Reject the Null Hypothesis
Fail to reject the Null Hypothesis
e. Write a statement to the law enforcement officials who will use
it to decide whether to pursue the case further or not. Structure
your essay as follows:
Give a brief explanation of what a Goodness of Fit test is.
Explain why a Goodness of Fit test should be applied in this
situation.
State the hypotheses for this situation.
Interpret the answer to part c.
Use the answer to part c to justify the decision in part d.
Use the decision in part d to make a conclusion about whether the
individual is likely to have embezzled.
Use this to then tell the law enforcement officials whether they
should pursue the case or not.
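Chi-square goodness-of-fit test:
a) Hypotheses:
H0: The first digits of the checks follow Benford's Law (the observed frequencies match the expected Benford frequencies).
Ha: The first digits of the checks do not follow Benford's Law (at least one observed frequency differs from its expected frequency).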
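b) A significance level of α = 0.01 is appropriate because a Type I error (false positive) here would mean concluding that the check amounts do not follow Benford's Law, and so suggesting fraud, when they actually do. That could lead to pursuing an innocent person, so α = 0.01 keeps the probability of this error at 1%.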
Degrees of freedom = k - 1 = 9 - 1 = 8, where k = 9 is the number of digit categories (not the sample size n = 177).
c)
Digit | Observed (O) | Probability | Expected (E) | (O-E) | (O-E)^2 | (O-E)^2/E |
---|---|---|---|---|---|---|
1 | 53 | 0.301 | 53.277 | -0.277 | 0.076729 | 0.00144 |
2 | 20 | 0.176 | 31.152 | -11.152 | 124.3671 | 3.992267 |
3 | 19 | 0.125 | 22.125 | -3.125 | 9.765625 | 0.441384 |
4 | 21 | 0.097 | 17.169 | 3.831 | 14.67656 | 0.854829 |
5 | 11 | 0.079 | 13.983 | -2.983 | 8.898289 | 0.636365 |
6 | 16 | 0.067 | 11.859 | 4.141 | 17.14788 | 1.44598 |
7 | 17 | 0.058 | 10.266 | 6.734 | 45.34676 | 4.417179 |
8 | 9 | 0.051 | 9.027 | -0.027 | 0.000729 | 0.0000808 |
9 | 11 | 0.046 | 8.142 | 2.858 | 8.168164 | 1.003213 |
Sum | 177 | 1.000 | 177 | 0 | | 12.79274 |
Test statistic: χ² = Σ (O-E)^2/E = 12.7927
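The expected counts and test statistic in the table can be reproduced with a short Python sketch. It assumes the rounded probabilities from the problem statement (as the table does) and uses a hypothetical helper named `chi_square_gof`; it is one illustration of the calculation, not the only way to do it.

```python
# Observed first-digit counts for digits 1..9 (from the problem statement).
observed = [53, 20, 19, 21, 11, 16, 17, 9, 11]
# Benford probabilities, rounded as in the problem statement.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

def chi_square_gof(observed, probs):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # k - 1 degrees of freedom: the distribution is fully specified,
    # so no parameters are estimated from the data.
    df = len(observed) - 1
    return expected, stat, df

expected, stat, df = chi_square_gof(observed, benford)
# Common rule of thumb for the chi-square approximation: every expected count >= 5.
print("min expected count:", round(min(expected), 3))
print([round(e, 3) for e in expected])
print(round(stat, 4), df)
```

It prints the expected counts from the table, a minimum expected count of 8.142 (so the rule of thumb is met), and a statistic of 12.7927 with 8 degrees of freedom.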
P-value = P(χ² with 8 degrees of freedom ≥ 12.7927) = 0.1192
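The P-value is the upper-tail area of the chi-square distribution with 8 degrees of freedom beyond the test statistic. For an even number of degrees of freedom this tail has a closed form, exp(-x/2) multiplied by the sum over k = 0 to df/2 - 1 of (x/2)^k / k!, so the sketch below needs only Python's standard library. The helper name `chi2_sf_even_df` is hypothetical; if SciPy is available, `scipy.stats.chi2.sf(12.79274, 8)` computes the same probability.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with an EVEN
    number of degrees of freedom: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total

chi_sq = 12.79274  # test statistic from the table
df = 8
alpha = 0.01

p_value = chi2_sf_even_df(chi_sq, df)
print(round(p_value, 4))
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

With these inputs it prints 0.1192 and "Fail to reject H0", which agrees with parts c and d.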
d) Fail to reject the null hypothesis, because the P-value (0.1192) is greater than α = 0.01.
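e) Statement to law enforcement officials:
A chi-square goodness-of-fit test checks whether observed counts in a set of categories are consistent with a specified probability distribution by comparing the observed counts with the counts that distribution predicts.
It applies here because each of the 177 checks falls into one of nine first-digit categories, and Benford's Law gives the expected proportion for each digit. Legitimate financial data of this kind tend to follow Benford's Law, which is why the IRS uses it, so a poor fit would be a warning sign of fabricated amounts.
The hypotheses are H0: the first digits of the checks follow Benford's Law, and Ha: the first digits of the checks do not follow Benford's Law.
The P-value of 0.1192 means that if the checks truly followed Benford's Law, there would be about an 11.9% chance of seeing a discrepancy from the expected counts at least as large as the one observed (χ² = 12.7927).
Because 0.1192 is greater than α = 0.01, we fail to reject H0. The data do not provide statistically significant evidence that the first digits depart from Benford's Law.
Therefore this test gives no statistical indication that the individual embezzled. This does not prove the individual is innocent; it only means that this analysis does not support the suspicion.
Based on this test alone, we would not recommend pursuing the case further unless other evidence points to wrongdoing.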