Homework 6 Optional Problems

Recall that a set of hash functions (mapping the elements of a universe to the buckets is universal if for every distinct , the probability that and collide, assuming that the hash function is chosen uniformly at random from , is at most . In this problem you will prove that a collision probability of is essentially the best possible. Precisely, suppose that is a family of hash functions mapping to , as above. Show that there must be a pair of distinct elements such that, if is chosen uniformly at random from , then

ANSWER: Let be the size of the set , for . That is, maps inputs to bucket 1, maps inputs to bucket 2, and so on.

  • Let us fix a hash function .
  • Let .
  • Given = number of buckets.

Then, the number of distinct that collide in bucket 1 is . The number of distinct that collide in all buckets is, therefore, (call this expression ). Note that . We are interested in lower-bounding the expression .

It can be shown that the summation is minimized when all are equal, that is, when 1 2. Therefore, we get:

Suppose . Let us define be the set of all distinct . Note that . Let be an indicator random variable, such that, given a hash function and distinct :

Taking expectation of :

By the Pigeon hole Principle, we get: There exists such that the above inequality holds.