In: Advanced Math
Please Answer 4.4.2: Do you see any problems with the choice of hash functions in Exercise 4.4.1? What advice could you give someone who was going to use a hash function of the form h ( x ) = ax + b mod 2 k ?
Exercise 4.4.1:
Suppose our stream consists of the integers 3, 1, 4, 1, 5, 9, 2,6, 5. Our hash functions will all be of the form h (x) = ax+b mod 32 for some a and b. You should treat the result as a 5-bit binary integer. Determine the tail length for each stream element and the resulting estimate of the number of distinct elements if the hash function is:
(a) h(x) = 2x+ 1 mod 32.
(b) h(x) = 3x+ 7 mod 32.
(c) h(x) = 4x mod 32.
4.4.2
During the computational processes, it is obvious that this algorithm is quite sensitive to the hash function parameters. Some of the following advice may be useful for someone who is going to use the hash function of the form h ( x ) = ax + b mod 2^k. That is the parameters have to be odd numbers for the best result. Since when it is even number, whatever b is, the hash function does not generate good results.
When a is even, and b is odd, the hash function always returns odd numbers. That causes the binary value always ends by 1's, making the tail length is always equal to 0 as in case [a].
When a is even, and b is also even, the hashed value will be badly affected. For instance, in case [c] hashed values of 2 different elements become the same (both 1 and 9 have the same hashed value of 4). It revokes the primary rule of a hash function that is: "with different elements, the hash function is supposed to generate different values".