















These notes cover the optimality and convergence properties of the conjugate gradient (CG) algorithm, an iterative process for minimizing quadratic functions. CG is a popular method for solving large-scale linear systems and can be interpreted as an optimization algorithm. The notes explain how the CG iteration minimizes the quadratic function φ(x) over the Krylov subspace K_n at each step, and how the choice of the search direction p_n ensures that it minimizes the function over all of K_n. They also discuss the analogy between the CG iteration and the Lanczos iteration, and the connection between Krylov subspace iteration and polynomials of matrices. The rate of convergence of the CG iteration is determined by the location of the spectrum of A, and two theorems give insight into the convergence behavior for matrices with clustered eigenvalues or with large 2-norm condition numbers.
Ming-Hsuan Yang
Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang
Lecture 21
Conjugate gradient
Convergence rate of conjugate gradient
Preconditioning
It implies that

‖e‖_A^2 = e_n^T A e_n + (Δx)^T A (Δx)

Only the second term depends on Δx, and since A is positive definite, that term is greater than or equal to 0. It is 0 if and only if Δx = 0, i.e., x_n = x. Thus ‖e‖_A is minimal if and only if x_n = x, as claimed. The monotonicity property is a consequence of the inclusion K_n ⊆ K_{n+1}, and since K_n is a subspace of ℝ^m of dimension n as long as convergence has not yet been achieved, convergence must be achieved in at most m steps. That is, each step of the conjugate direction method cuts down the error, component by component.
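As a concrete illustration of this optimality and monotone decrease, here is a minimal NumPy sketch (not the lecture's code; the helper name cg_a_norm_errors and the random test matrix are illustrative). It runs plain CG from x_0 = 0 and records ‖e_n‖_A at every step: the values decrease monotonically and reach roughly machine precision by step m.

```python
import numpy as np

def cg_a_norm_errors(A, b, n_steps):
    """Run plain CG from x0 = 0 and record the A-norm error after every step.

    Assumes A is symmetric positive definite; the exact solution is computed
    directly only so the error can be measured."""
    m = A.shape[0]
    x_star = np.linalg.solve(A, b)          # reference solution (for the error only)
    x = np.zeros(m)
    r = b.copy()                            # r0 = b - A x0 = b
    p = r.copy()
    errs = [np.sqrt((x_star - x) @ A @ (x_star - x))]
    for _ in range(n_steps):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)          # optimal step length along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        errs.append(np.sqrt((x_star - x) @ A @ (x_star - x)))
    return np.array(errs)

# Small SPD test problem: the errors decrease monotonically and are ~0 by step m.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)
b = rng.standard_normal(6)
print(cg_a_norm_errors(A, b, 6))
```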
The guarantee that the CG iteration converges in at most m steps is void in floating point arithmetic. For arbitrary matrices A on a real computer, no decisive reduction in ‖e_n‖_A will necessarily be observed at all when n = m. In practice, however, CG is used not for arbitrary matrices but for matrices whose spectra are well behaved (partly thanks to preconditioning), so that convergence to a desired accuracy is achieved for n ≪ m. The theoretical exact convergence at n = m has no relevance to this use of the CG iteration in scientific computing.
We cannot monitor ‖e‖_A or ‖e‖_A^2 directly, as neither can be evaluated without knowing x_*. On the other hand, given A, b, and x ∈ ℝ^m, the quantity

φ(x) = (1/2) x^T A x − x^T b

can certainly be evaluated, and

‖e_n‖_A^2 = e_n^T A e_n = (x_* − x_n)^T A (x_* − x_n)
          = x_n^T A x_n − 2 x_n^T A x_* + x_*^T A x_*
          = x_n^T A x_n − 2 x_n^T b + x_*^T b
          = 2 φ(x_n) + constant

Like ‖e‖_A^2, φ must achieve its minimum uniquely at x = x_*.
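As a quick sanity check of this identity, the following sketch (illustrative, not from the lecture) evaluates ‖x_* − x‖_A^2 − 2φ(x) at a few random trial points and confirms that the difference is always the same constant x_*^T b, independent of x.

```python
import numpy as np

# Check that ||x_* - x||_A^2 and 2*phi(x) differ only by the constant x_*^T b.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)               # symmetric positive definite
b = rng.standard_normal(5)
x_star = np.linalg.solve(A, b)

phi = lambda x: 0.5 * x @ A @ x - x @ b
for _ in range(3):
    x = rng.standard_normal(5)            # any trial point plays the role of x_n
    err_sq = (x_star - x) @ A @ (x_star - x)
    print(err_sq - 2 * phi(x))            # always equals x_star @ b
print(x_star @ b)
```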
The CG iteration can therefore be interpreted as an iterative process for minimizing the quadratic function φ(x) over x ∈ ℝ^m. At each step, an iterate x_n = x_{n−1} + α_n p_{n−1} is computed that minimizes φ(x) over all x in the one-dimensional space x_{n−1} + ⟨p_{n−1}⟩. It can be readily confirmed that the formula

α_n = (r_{n−1}^T r_{n−1}) / (p_{n−1}^T A p_{n−1})

ensures that α_n is optimal among all step lengths α. What makes the CG iteration remarkable is the choice of the search direction p_{n−1}, which has the special property that minimizing φ(x) over x_{n−1} + ⟨p_{n−1}⟩ actually minimizes it over all of K_n.
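A small numerical check of the step-length claim (a sketch under the assumption x_0 = 0; the random test matrix is illustrative): the exact one-dimensional minimizer of φ along p_{n−1} is p_{n−1}^T r_{n−1} / p_{n−1}^T A p_{n−1}, and this coincides with the CG formula because p_{n−1}^T r_{n−1} = r_{n−1}^T r_{n−1} for CG search directions.

```python
import numpy as np

# Verify that alpha_n = (r^T r)/(p^T A p) equals the exact 1-D minimizer of phi
# along x_{n-1} + alpha * p_{n-1}, namely (p^T r)/(p^T A p).
rng = np.random.default_rng(2)
B = rng.standard_normal((8, 8))
A = B @ B.T + 8 * np.eye(8)               # symmetric positive definite
b = rng.standard_normal(8)

x, r = np.zeros(8), b.copy()              # x0 = 0, r0 = b
p = r.copy()
for _ in range(2):                        # the second step is the non-trivial one
    Ap = A @ p
    alpha_cg = (r @ r) / (p @ Ap)
    alpha_exact = (p @ r) / (p @ Ap)      # setting d/dalpha phi = 0
    print(alpha_cg, alpha_exact)          # identical
    x, r_new = x + alpha_cg * p, r - alpha_cg * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
```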
Connection between Krylov subspace iteration and polynomials of matrices:

The Arnoldi and Lanczos iterations solve the Arnoldi/Lanczos approximation problem: find p^n ∈ P^n (the monic polynomials of degree n) such that ‖p^n(A) b‖ = minimum.

The GMRES iteration solves the GMRES approximation problem: find p_n ∈ P_n such that ‖p_n(A) b‖ = minimum.

For CG, the appropriate approximation problem involves the A-norm of the error: find p_n ∈ P_n such that

‖p_n(A) e_0‖_A = minimum    (1)

where e_0 denotes the initial error e_0 = x_* − x_0 = x_* (since x_0 = 0), and P_n is defined as in GMRES, i.e., the set of polynomials p of degree ≤ n with p(0) = 1.
If the CG iteration has not already converged before step n (i.e., r_{n−1} ≠ 0), then (1) has a unique solution p_n ∈ P_n, and the iterate x_n has error e_n = p_n(A) e_0 for this same polynomial p_n. Consequently,

‖e_n‖_A / ‖e_0‖_A = inf_{p ∈ P_n} ‖p(A) e_0‖_A / ‖e_0‖_A ≤ inf_{p ∈ P_n} max_{λ ∈ Λ(A)} |p(λ)|

where Λ(A) denotes the spectrum of A.
From the theorem in the last lecture, it follows that e_n = p(A) e_0 for some p ∈ P_n. The equality above is a consequence of this fact together with the optimality (monotonic convergence) property of the CG iterate.
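To make the bound concrete, here is an illustrative sketch (the test matrix and the particular polynomial are arbitrary choices, not from the lecture): it compares ‖e_3‖_A from three CG steps with ‖p(A) e_0‖_A for one hand-picked p ∈ P_3, and with max_λ |p(λ)| · ‖e_0‖_A; by the results above the three printed quantities are nondecreasing.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((10, 10))
A = B @ B.T + 10 * np.eye(10)             # symmetric positive definite
b = rng.standard_normal(10)
x_star = np.linalg.solve(A, b)
a_norm = lambda v: np.sqrt(v @ A @ v)

# e_3 from three steps of plain CG with x0 = 0 (so e_0 = x_star).
x, r = np.zeros(10), b.copy()
d = r.copy()
for _ in range(3):
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)
    x, r_new = x + alpha * d, r - alpha * Ad
    d = r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new
e_n, e_0 = x_star - x, x_star

# One particular p in P_3: p(x) = (1 - x/mu_1)(1 - x/mu_2)(1 - x/mu_3), p(0) = 1,
# with shifts mu_j taken from the spectrum of A.
lams = np.linalg.eigvalsh(A)
mus = [lams[0], lams[len(lams) // 2], lams[-1]]
pA_e0 = e_0.copy()
p_vals = np.ones_like(lams)
for mu in mus:
    pA_e0 = pA_e0 - (A @ pA_e0) / mu      # multiply by (I - A/mu)
    p_vals *= 1 - lams / mu
# Nondecreasing: ||e_3||_A  <=  ||p(A)e_0||_A  <=  max|p(lambda)| * ||e_0||_A
print(a_norm(e_n), a_norm(pA_e0), np.max(np.abs(p_vals)) * a_norm(e_0))
```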
First, we suppose that the eigenvalues are perfectly clustered but assume nothing about the locations of these clusters
If A has only n distinct eigenvalues, then the CG iteration converges in at most n steps
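A quick numerical illustration of this statement (a minimal sketch; the particular eigenvalues and the 50×50 size are arbitrary): a large SPD matrix with only three distinct eigenvalues is solved essentially exactly by the third CG step.

```python
import numpy as np

# 50x50 SPD matrix with only 3 distinct eigenvalues: CG converges by step 3,
# no matter how large the matrix is (in exact arithmetic).
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))      # random orthogonal matrix
lams = np.repeat([1.0, 4.0, 9.0], [20, 20, 10])         # only three distinct values
A = Q @ np.diag(lams) @ Q.T
b = rng.standard_normal(50)
x_star = np.linalg.solve(A, b)

x, r = np.zeros(50), b.copy()
p = r.copy()
for n in range(1, 4):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x, r_new = x + alpha * p, r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    print(n, np.sqrt((x_star - x) @ A @ (x_star - x)))  # ~1e-13 once n = 3
```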
The theorem is a corollary of (1), since the polynomial p(x) = ∏_{j=1}^{n} (1 − x/λ_j) ∈ P_n is zero at any specified set of n points {λ_j}. At the other extreme, suppose we know nothing about any clustering of the eigenvalues, but only that their distances from the origin vary by at most a factor κ ≥ 1. In other words, suppose we know only the 2-norm condition number κ = λ_max / λ_min, where λ_max and λ_min are the extreme eigenvalues of A.
Let the CG iteration be applied to a symmetric positive definite matrix problem Ax = b, where A has 2-norm condition number κ. Then the A-norms of the errors satisfy

‖e_n‖_A / ‖e_0‖_A ≤ 2 / [ ((√κ + 1)/(√κ − 1))^n + ((√κ + 1)/(√κ − 1))^{−n} ] ≤ 2 ((√κ − 1)/(√κ + 1))^n
Since (√κ − 1)/(√κ + 1) ≈ 1 − 2/√κ as κ → ∞, this implies that if κ is large (but not too large), convergence to a specified tolerance can be expected in O(√κ) iterations. This is only an upper bound; convergence may be faster for special right-hand sides or if the spectrum is clustered.
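An illustrative check of this bound (a sketch; the spectrum, evenly spread over [1, 100] so that κ = 100, is an arbitrary choice): the observed ratio ‖e_n‖_A/‖e_0‖_A stays below 2((√κ − 1)/(√κ + 1))^n.

```python
import numpy as np

# Compare the observed A-norm error decay of CG with the bound
# 2 * ((sqrt(kappa) - 1)/(sqrt(kappa) + 1))^n.
rng = np.random.default_rng(5)
m = 200
lams = np.linspace(1.0, 100.0, m)                  # spectrum in [1, 100], kappa = 100
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = Q @ np.diag(lams) @ Q.T
b = rng.standard_normal(m)
x_star = np.linalg.solve(A, b)
kappa = lams[-1] / lams[0]
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

a_norm = lambda v: np.sqrt(v @ A @ v)
e0_norm = a_norm(x_star)                           # x0 = 0, so e_0 = x_star
x, r = np.zeros(m), b.copy()
p = r.copy()
for n in range(1, 31):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x, r_new = x + alpha * p, r - alpha * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    if n % 10 == 0:
        print(n, a_norm(x_star - x) / e0_norm, 2 * rho**n)   # observed <= bound
```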
For τ = 0.01, A has 3,092 nonzero entries and κ ≈ 1.06; CG convergence takes place in 9 steps. For τ = 0.05, A has 13,062 nonzero entries with κ ≈ 1.83, and convergence takes place in 19 steps. For τ = 0.1, A has 25,526 nonzero entries with κ ≈ 10.3, and the process converges in 20 steps. For τ = 0.2, with 50,834 nonzero entries, there is no convergence at all. For this example, CG beats Cholesky factorization by a factor of about 700 in terms of operation counts.
The convergence of a matrix iteration depends on the properties of the matrix: the eigenvalues, the singular values, or sometimes other information. In many cases, the problem of interest can be transformed so that the properties of the matrix are improved drastically. This process of preconditioning is essential to most successful applications of iterative methods.
Two extreme cases:
- If M = A, then (4) is the same as (2), and nothing has been gained.
- If M = I, then (3) is the same as (2), and applying the preconditioner is trivial.
Between these two extremes lie the useful preconditioners:
- structured enough that (4) can be solved quickly,
- but close enough to A in some sense that an iteration for (3) converges more quickly than an iteration for (2).
What does it mean for M to be "close enough" to A? If the eigenvalues of M^{-1}A are close to 1 and ‖M^{-1}A − I‖_2 is small, then any of the iterations we have discussed can be expected to converge quickly. However, preconditioners that do not satisfy such a strong condition may also perform well. A simple rule of thumb: the preconditioner M is good if M^{-1}A is not too far from normal and its eigenvalues are clustered.
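Below is a minimal sketch of preconditioned CG with a Jacobi (diagonal) preconditioner M = diag(A); the pcg helper and the badly scaled test matrix are illustrative, not from the lecture. Diagonal scaling exactly undoes the bad row/column scaling here, so the preconditioned run converges far faster than plain CG (recovered by setting M = I).

```python
import numpy as np

def pcg(A, b, M_inv_diag, n_steps):
    """Preconditioned CG from x0 = 0; M_inv_diag holds the diagonal of M^{-1}."""
    x = np.zeros_like(b)
    r = b.copy()
    z = M_inv_diag * r                        # z = M^{-1} r (diagonal M)
    p = z.copy()
    for _ in range(n_steps):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x, r_new = x + alpha * p, r - alpha * Ap
        z_new = M_inv_diag * r_new
        p = z_new + ((r_new @ z_new) / (r @ z)) * p
        r, z = r_new, z_new
    return x

rng = np.random.default_rng(6)
m = 300
B = rng.standard_normal((m, m))
A0 = B @ B.T / m + np.eye(m)                  # well-conditioned SPD "core"
S = np.diag(10.0 ** rng.uniform(-2, 2, m))    # wildly different row/column scales
A = S @ A0 @ S                                # badly scaled SPD matrix, huge kappa
b = rng.standard_normal(m)

x_jacobi = pcg(A, b, 1.0 / np.diag(A), 50)    # Jacobi preconditioner
x_plain  = pcg(A, b, np.ones(m), 50)          # M = I reproduces plain CG
print(np.linalg.norm(A @ x_jacobi - b), np.linalg.norm(A @ x_plain - b))
```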
What we have described is more precisely termed a left preconditioner. Another idea is to transform Ax = b into AM^{-1}y = b with x = M^{-1}y, in which case M is called a right preconditioner. If A is Hermitian positive definite, then it is usual to preserve this property in preconditioning. Suppose M is also Hermitian positive definite, with M = CC^* for some C. Then (2) is equivalent to

[C^{-1} A C^{-*}] C^* x = C^{-1} b

The matrix in brackets is Hermitian positive definite, so this equation can be solved by conjugate gradient or related iterations. At the same time, since C^{-1} A C^{-*} is similar to C^{-*} C^{-1} A = M^{-1} A, it is enough to examine the eigenvalues of the non-Hermitian matrix M^{-1} A to investigate convergence.
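A numerical check of this similarity (an illustrative sketch with random SPD matrices, in real arithmetic, so C^* = C^T): the Hermitian matrix C^{-1} A C^{-*} and the non-Hermitian matrix M^{-1} A have the same eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)                   # SPD system matrix
D = rng.standard_normal((6, 6))
M = D @ D.T + 6 * np.eye(6)                   # SPD preconditioner
C = np.linalg.cholesky(M)                     # M = C C^T

C_inv = np.linalg.inv(C)
sym = C_inv @ A @ C_inv.T                     # C^{-1} A C^{-T}, symmetric PD
print(np.sort(np.linalg.eigvalsh(sym)))
print(np.sort(np.linalg.eigvals(np.linalg.solve(M, A)).real))   # same spectrum
```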