In: Computer Science
Consider inserting n distinct keys into a hash table with m buckets, where collisions are resolved by chaining and simple uniform hashing applies.
1. What is the probability that none of the n keys hashed to a particular bucket?
2. What is the expected number of empty buckets?
3. What is the expected number of collisions?
1.
Let's take the bucket 1 to be empaty means note a single key inserted in that bucket. If bucket 1 is empty after n insertions that means keys must have mapped to remaining m-1 locations, n insertions means n keys mapped.
So the probability that none of the keys hashed to bucket 1 = (1 - 1 / m)n
same applies for all the buckets, so the probability that none of the n keys hashed to a particular bucket is
= (1 - 1 / m)n
2.
The probability that a bucket remains empty after mapping all n keys is (1 − 1 / m )n. Defining
X = { 1 if bucket remains empty; 0 otherwise }
thus E(X) = (1 − 1 / m )n.
The number of empty buckets is X = X1 + X2 + . . . + Xm.
Hence, the expected number of empty buckets is
sum(X) = m (1 - 1 / m)n
3.
The number of collisions can be determined from the number of empty buckets.
Writing X for the number of empty buckets, as above,
we have m − X items hashed without collision and therefore a total of n − m + X collisions.
Writing Z for the number of collisions,
thus E(Z) = n − m + E(X)
= n − m + m (1 − 1 / m )n
For m = n, we get limn→∞ n(1 − 1 / n )n = n / e
in other words the third of the keys is expected to cause collisions.