Internal auditors sometimes check random samples of transactions within a database. Suppose that in a particular set of transactions, 2% contain an error of some kind. The auditor takes a random sample of 20 transactions for checking. Let X denote the number of transactions found to be in error in the sample.
(a) State the probability distribution of X (including the values of all parameters) and find the probability that 2 transactions are found to be in error.
(b) If three or more transactions are found to be in error then a larger sample is taken for checking. How often will this happen? (Use the appropriate template).
(c) What assumption is required for the validity of the above answers?
(a) In this set of transactions, 2% contain an error of some kind, and the auditor checks a random sample of 20. Let X = number of transactions found to be in error in a sample of 20.
Probability Distribution of X = Binomial with number of trials n = 20 and success probability p = 0.02, where a "success" is a transaction in error. So P[X=k] = Combin(20,k) * (0.02)^k * (1-0.02)^(20-k) for k = 0, 1, ..., 20.
P[X=2] = Combin(20,2) * (0.02)^2 * (1-0.02)^18
= 190*0.0004 * 0.695135 = 0.05283
Probability that 2 transactions are found to be in error = 0.05283
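The calculation in (a) can be checked outside Excel. The following is a minimal Python sketch, not part of the original solution; the helper name binom_pmf is an illustrative choice, and math.comb plays the role of Excel's Combin.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p); helper name is illustrative."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.02           # sample size and error rate from the question
p2 = binom_pmf(2, n, p)   # probability of exactly 2 transactions in error
print(f"P[X=2] = {p2:.5f}")  # P[X=2] = 0.05283
```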
(b) If three or more transactions are found to be in error then a larger sample is taken for checking. How often will this happen?
P[X>=3] = 1- P[X<3] = 1- { P[X=0] + P[X=1] + P[X=2] }
Using the formula from (a),
P[X=0] = Combin(20,0) * (0.02)^0 * (1-0.02)^20
P[X=1] = Combin(20,1) * (0.02)^1 * (1-0.02)^19
P[X=2] = Combin(20,2) * (0.02)^2 * (1-0.02)^18
Evaluating these in Excel gives:
P[X=0] = 0.667607972
P[X=1] = 0.27249305
P[X=2] = 0.05283
Therefore P[X=0] + P[X=1] + P[X=2] = 0.992931
Therefore P[X>=3] = 1 - 0.992931 = 0.007069
Hence three or more transactions are found to be in error in only about 0.707% of samples, i.e. roughly 7 samples in every 1,000, so a larger sample will rarely be needed.
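The complement calculation in (b) can be reproduced the same way. This is a small Python sketch under the same Binomial(20, 0.02) model; the variable names are illustrative and not taken from the Excel template mentioned in the question.

```python
from math import comb

n, p = 20, 0.02  # sample size and error rate from the question

# P[X=0], P[X=1], P[X=2] from the binomial formula
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3)]

p_less_than_3 = sum(pmf)             # P[X<3]
p_three_or_more = 1 - p_less_than_3  # P[X>=3]

for k, prob in enumerate(pmf):
    print(f"P[X={k}] = {prob:.6f}")
print(f"P[X>=3] = {p_three_or_more:.6f}")
```

Computing the three small terms and subtracting from 1 avoids summing the 18 terms from X=3 to X=20 directly.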
(c) What assumption is required for the validity of the above answers?
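Assumptions: Each sampled transaction is in error independently of the others, with the same probability 0.02. Sampling of 20 is done without replacement, so this holds only approximately; we assume that the total population of transactions N is sufficiently large relative to the sample, so that we can use the Binomial distribution for the sample of 20 and do not have to use the Hypergeometric distribution.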
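To see why the population size matters, the sketch below compares the Binomial value of P[X=2] with the exact Hypergeometric value for a few population sizes. The population sizes N = 100, 1,000 and 10,000 are hypothetical choices for illustration only (the question does not give N), each assumed to contain exactly 2% erroneous transactions; the function names are also illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, N, K, n):
    """P[X = k] when n items are drawn without replacement
    from a population of N items, K of which are in error."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

n, p = 20, 0.02
binom_p2 = binom_pmf(2, n, p)

# Hypothetical population sizes (assumption: N is not given in the question)
hyper_p2 = {}
for N in (100, 1_000, 10_000):
    K = N // 50  # exactly 2% of the population in error
    hyper_p2[N] = hypergeom_pmf(2, N, K, n)
    print(f"N = {N:>6}: hypergeometric P[X=2] = {hyper_p2[N]:.5f}, "
          f"binomial P[X=2] = {binom_p2:.5f}")
```

For the smallest population the two answers differ noticeably, while for the largest they nearly coincide, which is the sense in which N must be "sufficiently large".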