4 Hashing
4.1 Symmetric Cryptography
Symmetric Cryptography refers to cryptographic systems where encryption and decryption happen with the same key. It includes for example hash functions, block ciphers as well as symmetric encryption schemes.
4.1.1 Hash Functions
A hash function takes an arbitrary length bit-string and maps it to a fixed length output. It generally looks like this: \(H: \{0,1\}^*\rightarrow\{0,1\}^n\). Examples of well-known hash functions include MD5, SHA1, SHA3.
We use hash functions for a variety of different purposes:
- They can help identify a long piece of data \(m\) from a short hash \(H(m)\).
- If we hash a short seed, it can easily be used for random number generation.
- Hash functions need to be hard to invert (even partially). This is for example used in proofs of work for blockchains.
- Time-stamping and signatures, by signing or storing the hash instead of the message itself.
- Building blocks in bigger cryptographic constructions, etc…
Hash functions need to fulfill certain security requirements. They need to possess, among others, collision resistance, one-wayness and pseudorandomness.
4.1.1.1 Collision resistance
Collision resistance means it is hard to find two inputs to the hash function which share the same output. A hash function is collision resistant iff it is hard to find \(x\neq x'\) such that \(H(x)=H(x')\). A hash function without this resistance would be problematic because finding e.g. two documents with the same hash, you could substitute the documents without anyone noticing.
Collisions of course always exist, because the output is shorter than the input. We just need collisions to be hard to find.
This definition is, unfortunately, impossible to satisfy for reasonable \((t,\epsilon)\). Since \((x,x')\) exist, we can just create an adversary that ouputs one hard-coded collision always and instantly. In practice this definition still makes sense, this collision of course still takes infeasibly long to find.
(This can be made formal, see (2006). Alternatively, one can choose to use a variant where the hash function depends on some parameter or key \(k\).) We now ask ourselves: How secure can a hash function be in the best case, classically or quantumly?
4.2 The Birthday Attack
The birthday problem is about the chance of two people in a room having the same birthday. For a surprisingly low number of people, this chance starts to get surprisingly high. The probability for two people in the room to have the same birthday is \(\approx\frac{n^2}{2\cdot 365}\) (as long as n is not too large).
Correspondingly, the birthday attack against collision resistance works as follows: We first pick \(2^{\frac{n}{2}}\) different values \(x_i\xleftarrow{\$}\{0,1\}^m\). With our collision defined as \(H: \{0,1\}^m\rightarrow\{0,1\}^n\) with \(m\gg n\), find \(x\neq x'\) with \(H(x)=H(x')\). We then compute \(h_i:=H(x_i)\) for all \(i\). Since these are, essentially, random values, the probability of a collision is similar to before. Pr[two \(h_i\) are equal]\(\approx\frac{{(2^{n/2}})^2}{2^n}\approx 1\). Now we find the collision in the \(h_i\) (just find two identical \(h_i\)), and output that collision. Thus we can find a collision using \(O(2^{n/2})\) evaluations of H, and time \(\tilde{O}(2^{n/2})\) (for sorting the \(h_i\)’s to find n duplicates). Here, it’s important to note that \(\tilde{O}\) is a “soft” notation, meaning we ignore logarithmic factors.
We now take a look at a variation of this attack. It is not necessarily useful in the classical case, but will make it easier to understand the quantum algorithm we are about to introduce.
- Pick \(2^{\frac{n}{2}}\) values \(x_i\)
- \(h:= H(x_i)\) \(\mathbb{D}:=\{\underline{h_i}\}\), \(\mathcal{L}_{\mathbb{D}}(z):=[H(z)\in\mathbb{D}]\) (The square brackets here mean we evaluate whether it is true)
- For random \(z\), Pr[\(\mathcal{L}_{\mathbb{D}}(z)=1\)]\(\approx\frac{\lvert\mathbb{D}\rvert }{2^n}=\frac{2^{\frac{n}{2}}}{2^n}=2^{-\frac{n}{2}}\)
- Search \(z\) with \(\mathcal{L}_{\mathbb{D}}(z)=1\) by trying random \(z\)’s.
- This succeeds after \(\approx2^{\frac{n}{2}}\) tries. If we found it, then the hash of \(z\) is in \(\mathbb{D}\), which means \(H(z)=h_i=H(x_i)\).
- Find \(x_i\) by table lookup. Output the collision \((z,x_i)\).
In short, we pick a few values out of all possible ones, and hash them. All the hashes go into the set \(\mathbb{D}\). This set is analogous to the room that we pick two people out of in the birthday problem. Only in this case, the hashes of the \(x_i\) are the birthdays, so to speak. We then randomly pick two \(z\)’s to compare. This search is faster the more items are in the set \(\mathbb{D}\). However, generating \(\mathbb{D}\) takes time. Therefore, there is a trade-off. If \(\mathbb{D}\) contains \(2^{cn}\) and not \(2^{n/2}\) many items, then search needs \(2^{(1-c)n}\) iterations. This means, for us, that the number of evaluations is equal to \(O(2^{cn}+2^{(1-c)n})=O(2^{max(c,1-c)\cdot n})\). (The first term being for populating \(\mathbb{D}\), and the second term for searching). This is optimal for \(c=\frac{1}{2}\)
All in all, we found collisions using \(O(2^{\frac{n}{2}})\) evaluations and \(\tilde{O}(2^{n/2})\) time (assuming a suitable datastructure for representing the set \(\mathbb{D}\)) of H.
4.3 BHT algorithm (quantum)
We could now optimize this algorithm. In the quantum world, search is faster. We can use Grover for search. Search being faster, however has lost us our optimum point for the trade-off. Remember, our runtime is the maximum of the two. We therefore adjust by making the set \(\mathbb{D}\) smaller in return. Parametrized with the factor \(c\) our algorithm looks like this.
- Pick \(2^{c\times n}\) values \(x_i\)
- \(h_i:= H(x_i)\) \(\mathbb{D}:=\{\underline{d_i}\}\), \(\mathcal{L}_{\mathbb{D}}(z):=[H(z)\in\mathbb{D}]\)
- For random z, Pr[\(\mathcal{L}_{\mathbb{D}}(z)=1\)]\(\approx\frac{\lvert\mathbb{D}\rvert }{2^n}=\frac{2^{c\times n }}{2^n}=2^{(c-1))\times n}\)
- Search z with \(\mathcal{L}_{\mathbb{D}}=1\) by .
- This succeeds after \(\approx \sqrt{\frac{2^n}{\lvert\mathbb{D}\rvert}}\) tries. If we found it, then the hash of \(z\) is in \(\mathbb{D}\), which means \(H(z)=h_i=H(x_i)\).
- Find \(x_i\) by table lookup. Output the collision \((z,x_i)\).
Our runtime then becomes \(O(2^{cn}+2^{(1-c)/2 \times n})=O(2^{max(c,(1-c)/2)\cdot n})\). The runtime therefore becomes optimal at \(c=\frac{1}{3}\). For that value of \(c\), both the number of elements in \(\mathbb{D}\) and the runtime for rover are balanced. We conclude, that the algorithm takes \(O(2^{n/3})\) queries as well as \(\tilde{O}(2^{n3})\) time. Time here means the number of gates applied and the classical computation steps applied. This is a noticeable speedup over the classical case. We see that quantumly, Hash-functions still make sense, however some hash functions that might have seemed secure become breakable.