
Why Krylov Subspace?

This note serves to remind me why the Krylov subspace is incredibly popular when it comes to numerical linear algebra.

We want to solve the linear system $Ax = b$, where $A$ is $n\times n$ and $b\in\mathbb{R}^n$. Let $x^*$ be a solution to this linear system. The goal is to show that $x^*\in \mathcal{K}\stackrel{\mathrm{def}}{=}\text{span}(\{b, Ab, A^2b,\ldots, A^{n-1}b\})$.
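
As a quick numerical illustration (my own sketch, not part of the argument below), here is a minimal NumPy snippet that builds the Krylov basis $\{b, Ab, \ldots, A^{n-1}b\}$ for a random $A$ and checks that the solution of $Ax=b$ lies in its span; the helper name `build_krylov_basis` is just for illustration.

```python
import numpy as np

def build_krylov_basis(A, b):
    """Columns are b, Ab, A^2 b, ..., A^{n-1} b."""
    cols = [b]
    for _ in range(b.shape[0] - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))   # a generic random A is invertible almost surely
b = rng.standard_normal(n)

K = build_krylov_basis(A, b)
x_star = np.linalg.solve(A, b)

# x* lies in span(K) iff the least-squares problem K c = x* has zero residual
c, *_ = np.linalg.lstsq(K, x_star, rcond=None)
print(np.linalg.norm(K @ c - x_star))   # ~ 0, up to the conditioning of K
```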

To prove this statement, we separate it into two cases:

  • Case 1: AA is invertible
  • Case 2: AA is not invertible

But by the end of this note, we will see that the argument reduces entirely to Case 2. I still include Case 1 because it makes the whole line of thought more intuitive to me.


Case 1: $A$ is invertible

In this case, $x^* = A^{-1}b$. Let $f(x)$ be the characteristic polynomial of $A$. By the Cayley-Hamilton theorem, $A$ satisfies $f$, i.e., $f(A) = O$. Writing $f(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = O$ with $c_0 = (-1)^n\det(A) \ne 0$, we get $A^{-1} = -\frac{1}{c_0}\left(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I\right)$, so $x^* = A^{-1}b$ can be expressed as a linear combination of the vectors $b, Ab,\ldots, A^{n-1}b$, implying that $x^*\in\mathcal{K}$.
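
To make this concrete, here is a small NumPy sketch (my own addition, assuming the random $A$ is invertible) that recovers $A^{-1}b$ from the Krylov vectors using the characteristic-polynomial coefficients returned by `np.poly`:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))     # assumed invertible (true almost surely here)
b = rng.standard_normal(n)

# Characteristic polynomial f(x) = x^n + p[1] x^{n-1} + ... + p[n]
p = np.poly(A)
# p[n] = f(0) = (-1)^n det(A) is nonzero since A is invertible, so Cayley-Hamilton
# gives A^{-1} = -(A^{n-1} + p[1] A^{n-2} + ... + p[n-1] I) / p[n].
x_from_krylov = np.zeros(n, dtype=p.dtype)
Akb = b.astype(p.dtype)             # holds A^k b in the k-th iteration
for k in range(n):
    x_from_krylov += p[n - 1 - k] * Akb
    Akb = A @ Akb
x_from_krylov /= -p[n]

print(np.linalg.norm(x_from_krylov - np.linalg.solve(A, b)))   # ~ 0
```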

Case 2: $A$ is not invertible

In this case, we still want to proceed with a trick similar to the one used in Case 1, so we express $x^*$ as $x^* = A^{\dagger}b$, where $A^{\dagger}$ denotes the pseudo-inverse of $A$. Let $A=U\Sigma V^*$ be the singular value decomposition (SVD) of $A$, where $U$ and $V$ are unitary and $\Sigma$ is the diagonal matrix whose nonzero diagonal entries $\sigma_1\ge\cdots\ge\sigma_r>0$ ($r$ being the rank of $A$) are the singular values of $A$ sorted in descending order. Then $A^{\dagger}=V\Sigma^{\dagger}U^*$, where $\Sigma^{\dagger}$ inverts the nonzero singular values. Write $b=\sum_{j\le n}\beta_j u_j$, where $\{u_j\}$ are the columns of $U$. Rewriting $x^*$ in vector form (with $\{v_j\}$ the columns of $V$ below), we have

$$x^*=V\Sigma^{\dagger}U^*b=\left(\sum_{j\le r}\sigma_j^{-1}v_ju_j^*\right)\left(\sum_{j\le n}\beta_j u_j\right)=\sum_{j\le r}\beta_j\sigma_j^{-1}v_j$$
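
As a sanity check of this sum (again my own sketch, not from the note), the following builds a deliberately rank-deficient $A$, chooses $b$ in its column space so that $Ax=b$ is solvable, evaluates $\sum_{j\le r}\sigma_j^{-1}(u_j^*b)\,v_j$, and compares it against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r matrix
b = A @ rng.standard_normal(n)       # b lies in the column space, so Ax = b is solvable

U, s, Vh = np.linalg.svd(A)          # singular values in s, descending order
rank = int(np.sum(s > s[0] * n * np.finfo(float).eps))          # numerical rank (= r here)

# x* = sum_{j <= r} sigma_j^{-1} (u_j^* b) v_j, i.e. the pseudo-inverse solution
x_star = sum((U[:, j] @ b) / s[j] * Vh[j] for j in range(rank))

print(np.linalg.norm(x_star - np.linalg.pinv(A) @ b))   # ~ 0
print(np.linalg.norm(A @ x_star - b))                    # ~ 0 since b is in range(A)
```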

What is $A^kb$?

$$A^kb = \left(\sum_{j\le r}\sigma_j^k u_j v_j^*\right)b=\sum_{j\le r}\sigma_j^k(v_j^*b)u_j$$

To prove that $x^*\in\mathcal{K}$, it suffices to show that $\beta_j\sigma_j^{-1}v_j = c\,\sigma_j^k(v_j^*b)u_j$ for some $c\in\mathbb{R}$ for every $j\le r$, or, simplifying the notation, that $v_j = cu_j$ for some $c$; such a $c$ is easily found by taking the dot product of both sides with $u_j$ and solving for $c$.

We can immediately recognize that Case 2 is actually a generalization of Case 1 since every matrix admits the singular value decomposition.

Now that we have gained more familiarity with Krylov subspaces, the ideas behind the Conjugate Gradient, MINRES, and GMRES algorithms should be easier to absorb.