10  Code-based cryptography

10.1 Error correcting codes

Rather than assuming our information transmission channels to be perfect, we want to be able to transmit our encrypted messages through noisy channels as well. This can be done by defining a with a limited number of . Even after noise acts on the codeword, we can still recover the original codeword by performing some error-correction procedure.

In this section, we will study . The codewords in such codes are generated by multiplying the message \(m\in\mathbb{F}_2^k\) by a generator matrix \(G\in\mathbb{F}_2^{n\times k}\). We also define a parity check matrix \(H\) such that \(c\in\mathbb{F}_2^n\) is a codeword iff \(Hc = 0\) iff \(c \in \textrm{im } G\) (i.e. \(\ker H = \textrm{im } G\)). Such a matrix \(H\) always exists but is not necessarily unique. Given \(c' = c+e\), a codeword corrupted by an error \(e\) of Hamming weight \(|e|\leq t\), its associated syndrome \(\sigma\) is defined by: \[ \sigma:=Hc'=Hc+He=He \] Note that the syndrome only depends on the error, and not on the codeword! If you recover \(e\) from \(\sigma\), you can then calculate the original codeword \(c = c'-e\).

Given generator matrix \(G\) and \(c+e\) (where \(c\in\textrm{im } G\), \(|e|\leq t\)), find \(c\) (if unique). This problem is NP-hard in general.

Given parity check matrix \(H\) and \(\sigma=He\) where \(|e|\leq t\), find \(e\) (if unique). This is also NP-hard in general.

For \(G\) and \(H\) from the same code, the two problems are equivalent. These problems are of special interest to us since NP-hard problems are where we can expect to find some quantum advantage. NP-hardness alone, however, does not mean that the specific codes used in cryptosystem are hard to decode.

TipExample

We define the repetition code by:

\(m \in \mathbb{F}_2^1\) (1 bit) \(\mapsto c=\begin{pmatrix}m\\ m\\ m \end{pmatrix}\in \mathbb{F}_2^3\), so \(G = \begin{pmatrix}1\\ 1\\ 1 \end{pmatrix}\).

The error correction of a non-codeword \(c'=(c'_1, c'_2, c'_3)^\top\) is given by the majority \(m\) of \(c'_1\), \(c'_2\), \(c'_3\), so that \(c=(m, m, m)^\top\). This error correction procedure corrects up to 1 error, and has the corresponding \(H\) matrix

\[H=\begin{pmatrix}1 & 1 & 0 \\ 0 & 1 & 1\end{pmatrix}, Hc'=0 \textrm{ iff } c'_1=c'_2 \wedge c'_2=c'_3 \textrm{ iff } c' \in \textrm{im} G. \]

10.2 McEliece encryption

How can we make an encryption scheme from this? We could start by, given a code \(G\), encrypting a message \(m\in\mathbb{F}_2^k\) as \(c=Gm\in\mathbb{F}^n_2\). This, however, is easy to decrypt by Gaussian elimination.

We can, on top of that, add a random error \(e\) with \(|e|\leq t\). But either error-correcting \(G\) is easy, which makes attacking it easy, or error-correcting it is hard, making honest decryption hard. Another alternative is to scramble the code, which leads us to the following encryption scheme:

Definition 10.1 (McEliece encryption)  

KeyGen: Enc(\(pk,m\)): Dec(\(sk,c\)):

McEliece is randomized, but this is not enough to make it IND-CPA (e.g. Enc(\(pk,0\))=\(e\) with \(|e|\leq t\) is easy to recognize). We will need another security definition.

Definition 10.2 (OW-CPA) For each \(t\)-time adversary \(A\): \[\Pr[m'=m: (pk,sk)\leftarrow\textrm{KeyGen()}, m\xleftarrow{\$}\mathcal{M}, c\leftarrow\textrm{Enc}(pk,m), m'\leftarrow A(pk,c)]\leq\epsilon.\]

This intuitively means that the adversary cannot decrypt an encryption of a uniformly random message.

McEliece is commonly assumed to be OW-CPA for suitable choice of code. However, OW-CPA is enough for most appllications. We will later discuss ways to fix this.

10.3 Niederreiter encryption

A closely related encryption scheme is the Niederreiter encryption:

Definition 10.3 (Niederreiter)  

KeyGen: Enc(\(pk,m\)): Dec(\(sk,c\)):

If McEliece is \((t, \epsilon)\)-OW-CPA, then Niederreiter is \((t', \epsilon)\)-OW-CPA when using the same family of codes for suitable value of \(t'\) and using parity check matrix \(H\) with linearly independent rows.