You might think that the first digits of randomly selected numbers would be uniformly distributed. Actually, they are not! Simon Newcomb and, later, Frank Benford both discovered that first digits occur according to the following distribution, listed as (digit, probability):
( 1 , 0.301 ) , ( 2 , 0.176 ) , ( 3 , 0.125 ) , ( 4 , 0.097 ) , ( 5 , 0.079 ) , ( 6 , 0.067 ) , ( 7 , 0.058 ) , ( 8 , 0.051 ) , ( 9 , 0.046 )
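The tabulated probabilities agree with the closed form usually given for Benford's Law, P(d) = log10(1 + 1/d). A minimal Python sketch that reproduces the table to three decimals follows; the function name `benford_probability` is illustrative and does not come from the original problem.

```python
import math

def benford_probability(digit: int) -> float:
    """Probability that the leading digit equals `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Round to three decimals to compare with the table in the problem statement.
benford_table = {d: round(benford_probability(d), 3) for d in range(1, 10)}
print(benford_table)
```

The nine unrounded probabilities sum to 1, as a probability distribution must.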
The IRS currently uses Benford's Law to detect fraudulent tax data. Suppose you work for the IRS and are investigating an individual suspected of embezzling. The first digits of 133 checks written to a supposed company are distributed as follows:
Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
---|---|---|---|---|---|---|---|---|---
Observed Frequency | 37 | 22 | 18 | 9 | 9 | 12 | 7 | 10 | 9
a. State the appropriate null and alternative hypotheses for this test.
b. Explain why α = 0.01 is an appropriate choice for the level of significance in this situation.
c. What is the P-value? Report the answer to 4 decimal places. P-value =
d. What is your decision? (Fail to reject the null hypothesis / Reject the null hypothesis)
e. Write a statement for the law enforcement officials, who will use it to decide whether or not to pursue the case further. Structure your essay as follows: Give a brief explanation of what a Goodness of Fit test is. Explain why a Goodness of Fit test should be applied in this situation. State the hypotheses for this situation. Interpret the answer to part c. Use the answer to part c to justify the decision in part d. Use the decision in part d to draw a conclusion about whether the individual is likely to have embezzled. Then tell the law enforcement officials whether they should pursue the case or not.
Solution:-
State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.
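Null hypothesis: The first digits of the checks follow Benford's Law, i.e. p1 = 0.301, p2 = 0.176, p3 = 0.125, p4 = 0.097, p5 = 0.079, p6 = 0.067, p7 = 0.058, p8 = 0.051, p9 = 0.046.
Alternative hypothesis: At least one of the proportions in the null hypothesis differs from its stated value.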
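Formulate an analysis plan. For this analysis, the significance level is 0.01. A small significance level limits the chance of a Type I error, which here would mean flagging an honest individual's checks as suspicious. Using sample data, we will conduct a chi-square goodness of fit test of the null hypothesis.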
Analyze sample data. Applying the chi-square goodness of fit test to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic. Based on the chi-square statistic and the degrees of freedom, we determine the P-value.
DF = k - 1 = 9 - 1 = 8
Ei = n * pi
X2 = Σ [ (Oi - Ei)^2 / Ei ] = 5.844
where DF is the degrees of freedom, k is the number of levels of the categorical variable, n is the number of observations in the sample, Ei is the expected frequency count for level i, Oi is the observed frequency count for level i, and X2 is the chi-square test statistic.
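The expected counts and the test statistic can be checked with a short Python sketch. It uses the usual goodness-of-fit statistic, the sum over all digits of (Oi - Ei)^2 / Ei, with the probabilities and observed counts from the problem statement; the variable names are illustrative.

```python
# Benford probabilities for digits 1..9, as given in the problem statement.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
# Observed first-digit counts for the 133 checks.
observed = [37, 22, 18, 9, 9, 12, 7, 10, 9]

n = sum(observed)                     # total number of checks
expected = [n * p for p in benford]   # Ei = n * pi
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # DF = k - 1

print(n, df, round(chi_square, 3))
```

Every expected count is above 5 (the smallest is 133 * 0.046 = 6.118), so the usual rule of thumb for applying the chi-square approximation is met.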
The P-value is the probability that a chi-square statistic having 8 degrees of freedom is more extreme than 5.844.
c) We use the Chi-Square Distribution Calculator to find P(X2 > 5.844) = 0.6647 (about 0.665).
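As a cross-check on the calculator, the upper-tail probability can be computed directly. For an even number of degrees of freedom the chi-square tail has a closed form, P(X2 > x) = exp(-x/2) * Σ (x/2)^k / k! for k = 0, ..., DF/2 - 1. The sketch below assumes that closed form; the helper name `chi2_sf_even_df` is hypothetical and it does not handle odd DF.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    # Partial sum of the exponential series, with df/2 terms.
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

p_value = chi2_sf_even_df(5.844, 8)
print(round(p_value, 4))
```

A statistics library routine for the chi-square survival function would serve the same purpose and also covers odd degrees of freedom.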
Interpret results. Since the P-value (0.665) is greater than the significance level (0.01), we fail to reject the null hypothesis. Note that this is not the same as accepting it: the data are merely consistent with Benford's Law.
d) Do not reject H0.
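e) At the 0.01 significance level, the test does not provide sufficient evidence that the first digits of the checks deviate from Benford's Law. The digit pattern alone therefore gives no statistical reason to pursue the case further, although failing to reject the null hypothesis does not prove that the checks are legitimate.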