Normal cases
We consider a tri-diagonal linear system whose size is \(n\):
\[\begin{split}\begin{bmatrix}
c_0 & u_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & l_2 & c_2 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & c_{i-1} & u_{i-1} & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & l_{i } & c_{i } & u_{i } & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & l_{i+1} & c_{i+1} & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & c_{n-3} & u_{n-3} & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & l_{n-2} & c_{n-2} & u_{n-2} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & l_{n-1} & c_{n-1}
\end{bmatrix}
\begin{bmatrix}
x_0 \\
x_1 \\
x_2 \\
\vdots \\
x_{i-1} \\
x_{i } \\
x_{i+1} \\
\vdots \\
x_{n-3} \\
x_{n-2} \\
x_{n-1}
\end{bmatrix}
=
\begin{bmatrix}
q_0 \\
q_1 \\
q_2 \\
\vdots \\
q_{i-1} \\
q_{i } \\
q_{i+1} \\
\vdots \\
q_{n-3} \\
q_{n-2} \\
q_{n-1}
\end{bmatrix},\end{split}\]
where \(x_i\) is the answer of the system and is to be computed.
For notational simplicity, we concatenate the tri-diagonal matrix and the right-hand-side vector to yield a matrix with \(n\) rows and \(n + 1\) columns:
\[\begin{split}\begin{bmatrix}
c_0 & u_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_0 \\
l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1 \\
0 & l_2 & c_2 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & c_{i-1} & u_{i-1} & 0 & \cdots & 0 & 0 & 0 & q_{i-1} \\
0 & 0 & 0 & \cdots & l_{i } & c_{i } & u_{i } & \cdots & 0 & 0 & 0 & q_{i } \\
0 & 0 & 0 & \cdots & 0 & l_{i+1} & c_{i+1} & \cdots & 0 & 0 & 0 & q_{i+1} \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & c_{n-3} & u_{n-3} & 0 & q_{n-3} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & l_{n-2} & c_{n-2} & u_{n-2} & q_{n-2} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & l_{n-1} & c_{n-1} & q_{n-1}
\end{bmatrix},\end{split}\]
and consider the Gaussian elimination.
Our objective is to convert the tri-diagonal matrix to an identity matrix (i.e., matrix inversion):
\[\begin{split}\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & x_0 \\
0 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & x_1 \\
0 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & x_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & x_{i-1} \\
0 & 0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & x_{i } \\
0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & 0 & x_{i+1} \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & x_{n-3} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & 0 & x_{n-2} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & x_{n-1}
\end{bmatrix}.\end{split}\]
Forward Sweep
First we eliminate the lower-diagonal components \(l_1, l_2, \cdots, l_{n-2}, l_{n-1}\).
To get started, we look at the top two rows:
\[\begin{split}\begin{bmatrix}
c_0 & u_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_0 \\
l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1
\end{bmatrix}.\end{split}\]
Dividing the first row by \(c_0\) yields
\[\begin{split}\begin{bmatrix}
1 & u_0 / c_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_0 / c_0 \\
l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1
\end{bmatrix}
=
\begin{bmatrix}
1 & v_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & r_0 \\
l_1 & c_1 & u_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & q_1
\end{bmatrix},\end{split}\]
namely
\[v_0 \leftarrow \frac{u_0}{c_0},
r_0 \leftarrow \frac{q_0}{c_0}.\]
src/tri_diagonal_solver.c
63v[0] = u[0] / c[0];
64q[0] = q[0] / c[0];
Note
In the code, q
is used to store the solution \(x_i\) as well as the input \(q_i\); namely the input array is overwritten by the solver.
Now we focus on the general \(i-1\)-th and \(i\)-th rows:
\[\begin{split}\begin{bmatrix}
0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\
0 & 0 & 0 & \cdots & l_i & c_i & u_i & \cdots & 0 & 0 & 0 & q_{i }
\end{bmatrix},\end{split}\]
where the upper row (\(i-1\)-th row) has already been updated, while the bottom row (\(i\)-th row) is to be updated now.
To eliminate \(l_i\) from the \(i\)-th row, we subtract the \(i-1\)-th row times \(l_i\) from the \(i\)-th row:
\[\begin{split}\begin{bmatrix}
0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\
0 & 0 & 0 & \cdots & 0 & c_i - l_i v_{i-1} & u_i & \cdots & 0 & 0 & 0 & q_{i} - l_i r_{i-1}
\end{bmatrix}\end{split}\]
and divide the \(i\)-th row by \(c_i - l_i v_{i-1}\):
\[\begin{split}\begin{bmatrix}
0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\
0 & 0 & 0 & \cdots & 0 & 1 & \frac{u_i}{c_i - l_i v_{i-1}} & \cdots & 0 & 0 & 0 & \frac{q_{i} - l_i r_{i-1}}{c_i - l_i v_{i-1}}
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 & r_{i-1} \\
0 & 0 & 0 & \cdots & 0 & 1 & v_{i} & \cdots & 0 & 0 & 0 & r_{i }
\end{bmatrix},\end{split}\]
where
\[v_i \leftarrow \frac{u_i}{c_i - l_i v_{i-1}},
r_i \leftarrow \frac{q_i - l_i r_{i-1}}{c_i - l_i v_{i-1}}.\]
This is repeated from \(i = 1\) to \(n - 2\):
src/tri_diagonal_solver.c
66for (size_t i = 1; i < size - 1; i++) {
67 // assume positive-definite system
68 // to skip zero-division checks
69 const double val = 1. / (c[i] - l[i] * v[i - 1]);
70 v[i] = val * u[i];
71 q[i] = val * (q[i] - l[i] * q[i - 1]);
72}
Basically we do the same thing for the last row, expect the treatment of the singularity; namely the denominator
\[c_{n-1} - l_{n-1} v_{n-2}\]
can be \(0\), or equivalently the rank of the matrix is \(n-1\) (degeneracy).
This is expected when dealing with Poisson equations subject to Neumann boundary conditions.
To take into account the singularity and to avoid the resulting zero divisions, we need a special treatment:
src/tri_diagonal_solver.c
74// NOTE: do the same thing but consider singularity (degeneracy)
75const double val = c[size - 1] - l[size - 1] * v[size - 2];
76if (DBL_EPSILON < fabs(val)) {
77 q[size - 1] = 1. / val * (q[size - 1] - l[size - 1] * q[size - 2]);
78} else {
79 // singular
80 q[size - 1] = 0.;
81}
Backward Substitution
After the forward sweep, we are left with the following system:
\[\begin{split}\begin{bmatrix}
1 & v_0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & v_1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & v_{i-1} & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & v_i & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & v_{n-3} & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & v_{n-2} \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
x_0 \\
x_1 \\
x_2 \\
\vdots \\
x_{i-1} \\
x_{i } \\
x_{i+1} \\
\vdots \\
x_{n-3} \\
x_{n-2} \\
x_{n-1}
\end{bmatrix}
=
\begin{bmatrix}
r_0 \\
r_1 \\
r_2 \\
\vdots \\
r_{i-1} \\
r_{i } \\
r_{i+1} \\
\vdots \\
r_{n-3} \\
r_{n-2} \\
r_{n-1}
\end{bmatrix}.\end{split}\]
The last row:
\[x_{n-1} = r_{n-1}\]
has already been computed in the forward sweep and thus we do not have to do anything.
For the rest, since we have
\[x_i = r_i - v_i x_{i+1},\]
\(x_i\) can be computed sequentially from \(i = n-2\) to \(i = 0\), which is known as the backward substitution.
src/tri_diagonal_solver.c
83for (size_t i = size - 2; ; i--) {
84 q[i] -= v[i] * q[i + 1];
85 if (0 == i) {
86 break;
87 }
88}
Note again that q
is shared among \(x_i\) (output) and \(q_i\) (input) in the code.
Note
Although l[0]
and u[n-1]
are not used in this case, l
, c
, and u
all have the length \(n\) for simplicity.
In particular these values are used for \(periodic systems <sherman_morrison>\).