Normal cases

We consider a tri-diagonal linear system whose size is \(n\):

\[\begin{split}\begin{bmatrix} c_0 & u_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & l_2 & c_2 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & c_{i-1} & u_{i-1} & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & l_{i } & c_{i } & u_{i } & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & l_{i+1} & c_{i+1} & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & c_{n-3} & u_{n-3} & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & l_{n-2} & c_{n-2} & u_{n-2} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & l_{n-1} & c_{n-1} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{i-1} \\ x_{i } \\ x_{i+1} \\ \vdots \\ x_{n-3} \\ x_{n-2} \\ x_{n-1} \end{bmatrix} = \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ \vdots \\ q_{i-1} \\ q_{i } \\ q_{i+1} \\ \vdots \\ q_{n-3} \\ q_{n-2} \\ q_{n-1} \end{bmatrix},\end{split}\]

where \(x_i\) is the answer of the system and is to be computed.

For notational simplicity, we concatenate the tri-diagonal matrix and the right-hand-side vector to yield a matrix with \(n\) rows and \(n + 1\) columns:

\[\begin{split}\begin{bmatrix} c_0 & u_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_0 \\ l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1 \\ 0 & l_2 & c_2 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & c_{i-1} & u_{i-1} & 0 & \cdots & 0 & 0 & 0 & q_{i-1} \\ 0 & 0 & 0 & \cdots & l_{i } & c_{i } & u_{i } & \cdots & 0 & 0 & 0 & q_{i } \\ 0 & 0 & 0 & \cdots & 0 & l_{i+1} & c_{i+1} & \cdots & 0 & 0 & 0 & q_{i+1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & c_{n-3} & u_{n-3} & 0 & q_{n-3} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & l_{n-2} & c_{n-2} & u_{n-2} & q_{n-2} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & l_{n-1} & c_{n-1} & q_{n-1} \end{bmatrix},\end{split}\]

and consider the Gaussian elimination.

Our objective is to convert the tri-diagonal matrix to an identity matrix (i.e., matrix inversion):

\[\begin{split}\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & x_0 \\ 0 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & x_1 \\ 0 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & x_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & x_{i-1} \\ 0 & 0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & x_{i } \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & 0 & x_{i+1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & x_{n-3} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & 0 & x_{n-2} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & x_{n-1} \end{bmatrix}.\end{split}\]

Forward Sweep

First we eliminate the lower-diagonal components \(l_1, l_2, \cdots, l_{n-2}, l_{n-1}\). To get started, we look at the top two rows:

\[\begin{split}\begin{bmatrix} c_0 & u_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_0 \\ l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1 \end{bmatrix}.\end{split}\]

Dividing the first row by \(c_0\) yields

\[\begin{split}\begin{bmatrix} 1 & u_0 / c_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_0 / c_0 \\ l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1 \end{bmatrix} = \begin{bmatrix} 1 & v_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & r_0 \\ l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1 \end{bmatrix},\end{split}\]

namely

\[v_0 \leftarrow \frac{u_0}{c_0}, r_0 \leftarrow \frac{q_0}{c_0}.\]
src/tri_diagonal_solver.c
63v[0] = u[0] / c[0];
64q[0] = q[0] / c[0];

Note

In the code, q is used to store the solution \(x_i\) as well as the input \(q_i\); namely the input array is overwritten by the solver.

Now we focus on the general \(i-1\)-th and \(i\)-th rows:

\[\begin{split}\begin{bmatrix} 0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\ 0 & 0 & 0 & \cdots & l_i & c_i & u_i & \cdots & 0 & 0 & 0 & q_{i } \end{bmatrix},\end{split}\]

where the upper row (\(i-1\)-th row) has already been updated, while the bottom row (\(i\)-th row) is to be updated now.

To eliminate \(l_i\) from the \(i\)-th row, we subtract the \(i-1\)-th row times \(l_i\) from the \(i\)-th row:

\[\begin{split}\begin{bmatrix} 0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\ 0 & 0 & 0 & \cdots & 0 & c_i - l_i v_{i-1} & u_i & \cdots & 0 & 0 & 0 & q_{i} - l_i r_{i-1} \end{bmatrix}\end{split}\]

and divide the \(i\)-th row by \(c_i - l_i v_{i-1}\):

\[\begin{split}\begin{bmatrix} 0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\ 0 & 0 & 0 & \cdots & 0 & 1 & \frac{u_i}{c_i - l_i v_{i-1}} & \cdots & 0 & 0 & 0 & \frac{q_{i} - l_i r_{i-1}}{c_i - l_i v_{i-1}} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\ 0 & 0 & 0 & \cdots & 0 & 1 & v_{i} & \cdots & 0 & 0 & 0 & r_{i } \end{bmatrix},\end{split}\]

where

\[v_i \leftarrow \frac{u_i}{c_i - l_i v_{i-1}}, r_i \leftarrow \frac{q_i - l_i r_{i-1}}{c_i - l_i v_{i-1}}.\]

This is repeated from \(i = 1\) to \(n - 2\):

src/tri_diagonal_solver.c
66for (size_t i = 1; i < size - 1; i++) {
67  // assume positive-definite system
68  //   to skip zero-division checks
69  const double val = 1. / (c[i] - l[i] * v[i - 1]);
70  v[i] = val * u[i];
71  q[i] = val * (q[i] - l[i] * q[i - 1]);
72}

Basically we do the same thing for the last row, expect the treatment of the singularity; namely the denominator

\[c_{n-1} - l_{n-1} v_{n-2}\]

can be \(0\), or equivalently the rank of the matrix is \(n-1\) (degeneracy). This is expected when dealing with Poisson equations subject to Neumann boundary conditions.

To take into account the singularity and to avoid the resulting zero divisions, we need a special treatment:

src/tri_diagonal_solver.c
74// NOTE: do the same thing but consider singularity (degeneracy)
75const double val = c[size - 1] - l[size - 1] * v[size - 2];
76if (DBL_EPSILON < fabs(val)) {
77  q[size - 1] = 1. / val * (q[size - 1] - l[size - 1] * q[size - 2]);
78} else {
79  // singular
80  q[size - 1] = 0.;
81}

Backward Substitution

After the forward sweep, we are left with the following system:

\[\begin{split}\begin{bmatrix} 1 & v_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & v_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 & v_i & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & v_{n-3} & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & v_{n-2} \\ 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{i-1} \\ x_{i } \\ x_{i+1} \\ \vdots \\ x_{n-3} \\ x_{n-2} \\ x_{n-1} \end{bmatrix} = \begin{bmatrix} r_0 \\ r_1 \\ r_2 \\ \vdots \\ r_{i-1} \\ r_{i } \\ r_{i+1} \\ \vdots \\ r_{n-3} \\ r_{n-2} \\ r_{n-1} \end{bmatrix}.\end{split}\]

The last row:

\[x_{n-1} = r_{n-1}\]

has already been computed in the forward sweep and thus we do not have to do anything. For the rest, since we have

\[x_i = r_i - v_i x_{i+1},\]

\(x_i\) can be computed sequentially from \(i = n-2\) to \(i = 0\), which is known as the backward substitution.

src/tri_diagonal_solver.c
83for (size_t i = size - 2; ; i--) {
84  q[i] -= v[i] * q[i + 1];
85  if (0 == i) {
86    break;
87  }
88}

Note again that q is shared among \(x_i\) (output) and \(q_i\) (input) in the code.

Note

Although l[0] and u[n-1] are not used in this case, l, c, and u all have the length \(n\) for simplicity. In particular these values are used for \(periodic systems <sherman_morrison>\).