# Homework 6 Optional Problems

Recall that a set $H$ of hash functions (mapping the elements of a universe $U$ to the buckets $\{0,1,2,...,n - 1\})$ is universal if for every distinct $x, y \in U$, the probability $Prob[h(x) = h(y)]$ that $x$ and $y$ collide, assuming that the hash function $h$ is chosen uniformly at random from $H$, is at most $\frac{1}{n}$. In this problem you will prove that a collision probability of $\frac{1}{n}$ is essentially the best possible. Precisely, suppose that $H$ is a family of hash functions mapping $U$ to $\{0,1,2,...,n - 1\}$, as above. Show that there must be a pair $x, y \in U$ of distinct elements such that, if $h$ is chosen uniformly at random from $H$, then $Prob[h(x) = h(y)] \geq \frac{1}{n} - \frac{1}{\lvert U \rvert}$

ANSWER: Let $n_i$ be the size of the set $\{x: x \in U, h(x) = i\}$, for $i \in \{0,1,2,...,n - 1\}$. That is, $h_1$ maps inputs to bucket 1, $h_2$ maps inputs to bucket 2, and so on.

• Let us fix a hash function $h \in H$.
• Let $N = \lvert U \rvert$.
• Given $n$ = number of buckets.

Then, the number of distinct $x_1, x_2 \in U$ that collide in bucket 1 is $\binom{n_i}{2}$. The number of distinct $x_1, x_2 \in U$ that collide in all buckets is, therefore, $\sum_{i=1}^{n} \binom{n_i}{2}$ (call this expression $L_h$). Note that $n_i \geq 0 \text{ and } \sum_{i=1}^{n} n_i = N$. We are interested in lower-bounding the expression $L_h$.

It can be shown that the summation is minimized when all $n_i$ are equal, that is, when $n_i = \frac{N}{n}$ 1 2. Therefore, we get:

Suppose $H = \{h_1,h_2,...,h_k\} \text{ and } \vert H \rvert = K$. Let us define $P$ be the set of all distinct $x_1, x_2 \in U$. Note that $\lvert P \rvert = \frac{N(N-1)}{2}$. Let $X_i$ be an indicator random variable, such that, given a hash function $h_i \in H$ and distinct $x_1, x_2 \in U$:

Taking expectation of $X_i$:

By the Pigeon hole Principle, we get: There exists $x, y \in U$ such that the above inequality holds.