You might expect the first digits of randomly selected numbers
to be uniformly distributed.
Actually, they are not! Simon Newcomb and, later, Frank Benford each
discovered that first digits occur according to the following
distribution: (digit, probability)
(1,0.301),(2,0.176),(3,0.125),(4,0.097),(5,0.079),(6,0.067),(7,0.058),(8,0.051),(9,0.046)
The IRS currently uses Benford's Law to detect fraudulent tax data.
Suppose you work for the IRS and are investigating an individual
suspected of embezzling. The first digit of 201 checks to a
supposed company are as follows:
Digit | Observed Frequency |
---|---|
1 | 49 |
2 | 31 |
3 | 24 |
4 | 14 |
5 | 15 |
6 | 20 |
7 | 21 |
8 | 20 |
9 | 7 |
a. State the appropriate null and alternative hypotheses for this
test.
b. Explain why α = 0.01 is an appropriate choice for the level
of significance in this situation.
c. What is the P-Value? Report answer to 4 decimal places
P-Value =
d. What is your decision?
Fail to reject the Null Hypothesis
Reject the Null Hypothesis
a) Chi-square goodness-of-fit test:
H0: The first digits of the checks follow the Benford's Law distribution.
Ha: The first digits of the checks do not follow the Benford's Law distribution.
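b) Significance level: α = 0.01
Rejecting H0 here means flagging the individual for possible embezzlement, so a Type I error (rejecting a true H0, i.e. a false accusation) is costly. A small significance level such as 0.01 keeps the probability of that error low.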
c)
Digit | Observed Freq (O) | P(x) | Expected E = n*P(x) | (O-E) | (O-E)^2 | (O-E)^2/E |
---|---|---|---|---|---|---|
1 | 49 | 0.301 | 60.501 | -11.501 | 132.273 | 2.186294 |
2 | 31 | 0.176 | 35.376 | -4.376 | 19.14938 | 0.54131 |
3 | 24 | 0.125 | 25.125 | -1.125 | 1.265625 | 0.050373 |
4 | 14 | 0.097 | 19.497 | -5.497 | 30.21701 | 1.549829 |
5 | 15 | 0.079 | 15.879 | -0.879 | 0.772641 | 0.048658 |
6 | 20 | 0.067 | 13.467 | 6.533 | 42.68009 | 3.169235 |
7 | 21 | 0.058 | 11.658 | 9.342 | 87.27296 | 7.486101 |
8 | 20 | 0.051 | 10.251 | 9.749 | 95.043 | 9.271583 |
9 | 7 | 0.046 | 9.246 | -2.246 | 5.044516 | 0.545589 |
Sum | 201 | | | | | 24.84897 |
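The table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the variable names (`observed`, `benford`, `expected`, `contributions`, `chi_square`) are illustrative and not part of the original problem, and the probabilities are the rounded values given in the question.

```python
# Observed first-digit counts for digits 1 through 9 (from the question).
observed = [49, 31, 24, 14, 15, 20, 21, 20, 7]
# Benford probabilities, rounded to 3 decimals as stated in the question.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

n = sum(observed)                      # total number of checks
expected = [n * p for p in benford]    # E = n * P(x)
# Per-digit contribution (O - E)^2 / E to the test statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

rows = zip(observed, expected, contributions)
for digit, (o, e, c) in enumerate(rows, start=1):
    print(f"{digit}  O={o:3d}  E={e:7.3f}  (O-E)^2/E={c:.6f}")
print(f"chi-square = {chi_square:.5f}")
```

Digits 7 and 8 contribute most of the statistic, which matches the last column of the table.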
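Degrees of freedom: df = k - 1 = 9 - 1 = 8, where k is the number of digit categories.
Test statistic: chi-square = 24.84897
P-value: 0.0016 (to 4 decimal places)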
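The P-value is the upper-tail area of the chi-square distribution with 8 degrees of freedom beyond the test statistic. As a rough check, it can be computed with the standard library alone, because for an even number of degrees of freedom the tail area has a closed form: exp(-x/2) times the sum of (x/2)^k / k! for k from 0 to df/2 - 1. The helper name `chi2_sf_even_df` is hypothetical, introduced here only for illustration; a statistics package would normally be used instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form that is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0    # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

chi_square = 24.84897   # test statistic from the table
df = 8                  # 9 digit categories minus 1
alpha = 0.01            # significance level from part b

p_value = chi2_sf_even_df(chi_square, df)
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The computed tail area is well below 0.01, so the decision does not depend on small rounding differences in the P-value.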
d) Reject the null hypothesis, because the P-value is less than the significance level of 0.01. The first digits of the checks are not consistent with Benford's Law.