To describe a quantum computer mathematically, we can use mathematics very similar to that of the well-known topic of probabilistic systems. We therefore first look at how to describe a probabilistic system.
Deterministic possibilities
First, we need to define all the different possible outcomes of our system. For a coin flip these could be heads or tails, and for a die these could be the labels of its sides. We call these possibilities deterministic possibilities. Note that we will only ever use a finite number of possibilities.
Imagine you have a random number generator, which outputs 2-bit numbers. The deterministic possibilities of this generator are \(00\), \(01\), \(10\) and \(11\).
We will always assume the deterministic possibilities to be ordered in some way (even if it is an arbitrary one). In the example above, the deterministic possibilities are \(00\), \(01\), \(10\), \(11\), not \(00\), \(10\), \(01\), \(11\). We will need this to know the order of entries in vectors and matrices later.
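To make this ordering concrete, here is a minimal Python sketch (the list name `possibilities` is our own choice, not from the text): the position of a possibility in the list is the index of its entry in every vector and matrix that follows.

```python
# Fixed (if arbitrary) ordering of the deterministic possibilities
# of the 2-bit generator.
possibilities = ["00", "01", "10", "11"]

# The entry belonging to "10" sits at index 2 in every vector below.
print(possibilities.index("10"))  # 2
```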
Probability distribution
Next, we need to assign each possibility a probability. We write this as \(\Pr[x]=p\) where \(p \in [0,1]\) is the probability of the deterministic possibility \(x\).
For a coin flip the probability of heads would be \(\Pr[\text{heads}] = \frac{1}{2}\) and the probability for tails would be \(\Pr[\text{tails}] = \frac{1}{2}\).
If we combine the probabilities of all the possible outcomes and write them as a vector, we get a probability distribution. Here it comes in handy that we have an ordering on the deterministic possibilities: if the deterministic possibilities are \(x_1, \dots, x_n\), their probabilities appear in the vector in that order.
Definition 2.1 (Probability distribution) A vector \(d \in \mathbb{R}^n\) is a valid probability distribution iff \(\sum d_i = 1\) and \(\forall i\) \(d_i \geq 0\).
This vector has \(n\) entries, where entry \(d_i\) corresponds to the deterministic possibility \(x_i\), so that \(\Pr[x_i] = d_i\). The sum over all probabilities has to be \(1\), and each entry needs to be nonnegative in order to be a valid probability.
For a coin flip the probability distribution would be \(d_{\text{coin}} \in \mathbb{R}^2\) with \(d_{\text{coin}} = \begin{pmatrix}\frac{1}{2}\\ \frac{1}{2} \end{pmatrix}\).
Recall your random 2-bit number generator from above. Imagine your generator outputs each deterministic possibility with equal probability, except for the possibility \(00\), which is never generated. The corresponding probability distribution would be \[
d_{\text{2-bit}}
= \begin{pmatrix} 0 \\[2pt] \frac{1}{3}\\[2pt] \frac{1}{3} \\[2pt] \frac{1}{3} \end{pmatrix}.
\]
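As a quick sanity check, we can encode this distribution and verify the two conditions of Definition 2.1. This is a minimal sketch; numpy and the helper name `is_distribution` are our own choices, not part of the text.

```python
import numpy as np

def is_distribution(d: np.ndarray, tol: float = 1e-12) -> bool:
    """Definition 2.1: all entries nonnegative and summing to 1
    (up to a floating-point tolerance)."""
    return bool(np.all(d >= -tol)) and abs(d.sum() - 1.0) <= tol

# The generator that never outputs 00 and is otherwise uniform,
# entries ordered as 00, 01, 10, 11.
d_2bit = np.array([0, 1/3, 1/3, 1/3])

assert is_distribution(d_2bit)
```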
Probabilistic processes
With a probability distribution, we can only describe the probabilities of the possibilities themselves, without any dependence on a prior state. We therefore add another element to our toolbox of probabilistic systems: the probabilistic process.
A probabilistic process is a collection of \(n\) probability distributions, one distribution \(a_i\) for each deterministic possibility \(i\). If the system is in deterministic possibility \(i\) before the process is applied, it will afterwards be distributed according to \(a_i\). We can write this as a matrix whose \(i\)-th column is the probability distribution \(a_i\).
Definition 2.2 (Probabilistic process) A matrix \(A \in \mathbb{R}^{n \times n}\) is a valid probabilistic process iff every column \(a\) of \(A\) is a valid probability distribution.
From Definition 2.1 we know that a valid probability distribution \(a\) has the properties \(\sum_i a_i = 1\) and \(\forall i\) \(a_i \geq 0\). Therefore a matrix \(A \in \mathbb{R}^{n \times n}\) is a probabilistic process iff each of its columns sums to \(1\) and all of its entries are nonnegative. Such a matrix is also called a (column-)stochastic matrix.
Imagine a second device, which receives a 2-bit number as input and flips both bits simultaneously with probability \(\frac{1}{3}\). The probability distributions for each of the deterministic possibilities would then be \[
a_{00} = \begin{pmatrix} \frac{2}{3} \\ 0 \\ 0 \\ \frac{1}{3} \end{pmatrix},
a_{01} =\begin{pmatrix} 0 \\ \frac{2}{3} \\ \frac{1}{3} \\ 0 \end{pmatrix},
a_{10} =\begin{pmatrix} 0 \\ \frac{1}{3} \\ \frac{2}{3} \\ 0 \end{pmatrix} \text{ and }
a_{11} = \begin{pmatrix} \frac{1}{3} \\ 0 \\ 0 \\ \frac{2}{3} \end{pmatrix}.
\] From these distributions we can construct the process matrix as follows: \[
A_{\text{flip}}
= \begin{pmatrix} a_{00} & a_{01} & a_{10} & a_{11} \end{pmatrix}
= \begin{pmatrix}
\frac{2}{3} & 0 & 0 & \frac{1}{3} \\
0 & \frac{2}{3} & \frac{1}{3} & 0 \\
0 & \frac{1}{3} & \frac{2}{3} & 0 \\
\frac{1}{3} & 0 & 0 & \frac{2}{3}
\end{pmatrix}.
\]
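We can again check Definition 2.2 numerically. The following is a minimal numpy sketch (the helper name `is_process` is our own); it verifies that every column of \(A_{\text{flip}}\) sums to \(1\) and that all entries are nonnegative.

```python
import numpy as np

# The bit-flip process: with probability 1/3 both bits are flipped,
# rows and columns ordered as 00, 01, 10, 11.
A_flip = np.array([
    [2/3, 0,   0,   1/3],
    [0,   2/3, 1/3, 0  ],
    [0,   1/3, 2/3, 0  ],
    [1/3, 0,   0,   2/3],
])

def is_process(A: np.ndarray, tol: float = 1e-12) -> bool:
    """Definition 2.2: every column of A is a valid probability
    distribution, i.e. A is a (column-)stochastic matrix."""
    return bool(np.all(A >= -tol)) and np.allclose(A.sum(axis=0), 1.0, atol=tol)

assert is_process(A_flip)
```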
Applying a probabilistic process
Having defined probability distributions and probabilistic processes, we can now combine these two elements and apply a probabilistic process to a probability distribution.
Definition 2.3 (Applying a probabilistic process) Given an initial probability distribution \(x \in \mathbb{R}^n\) and a probabilistic process \(A \in \mathbb{R}^{n \times n}\), the result \(y \in \mathbb{R}^n\) of applying the process \(A\) is defined as \[
y = Ax.
\] Entry-wise this reads \(y_i = \sum_j A_{ij} x_j\): the probability of ending up in possibility \(i\) is the probability of starting in possibility \(j\) times the probability of transitioning from \(j\) to \(i\), summed over all \(j\).
Recall the 2-bit number generator and the bit-flip device from above. Imagine you first draw a random 2-bit number from the generator and then run it through the bit-flip device. We already know that the probability distribution of the generator is \(d_\text{2-bit}\). Using \(A_\text{flip}\) we can calculate the final probability distribution: \[
A_\text{flip} \cdot d_\text{2-bit}
= \begin{pmatrix}
\frac{2}{3} & 0 & 0 & \frac{1}{3} \\[2pt]
0 & \frac{2}{3} & \frac{1}{3} & 0 \\[2pt]
0 & \frac{1}{3} & \frac{2}{3} & 0 \\[2pt]
\frac{1}{3} & 0 & 0 & \frac{2}{3}
\end{pmatrix}
\begin{pmatrix} 0 \\[2pt] \frac{1}{3} \\[2pt] \frac{1}{3} \\[2pt] \frac{1}{3} \end{pmatrix}
= \begin{pmatrix} \frac{1}{9} \\[2pt] \frac{1}{3} \\[2pt] \frac{1}{3} \\[2pt] \frac{2}{9} \end{pmatrix}.
\]
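The same computation in numpy, reproducing the hand calculation above (the array names follow our earlier sketches and are our own):

```python
import numpy as np

A_flip = np.array([
    [2/3, 0,   0,   1/3],
    [0,   2/3, 1/3, 0  ],
    [0,   1/3, 2/3, 0  ],
    [1/3, 0,   0,   2/3],
])
d_2bit = np.array([0, 1/3, 1/3, 1/3])

# Definition 2.3: applying a process is a matrix-vector product.
y = A_flip @ d_2bit
print(y)  # [0.111... 0.333... 0.333... 0.222...], i.e. (1/9, 1/3, 1/3, 2/9)
```

Note that the result \(y\) is again a valid probability distribution; stochastic matrices always map probability distributions to probability distributions.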