






Bruce D. Lucas Takeo Kanade
Computer Science Department Carnegie-Mellon University Pittsburgh, Pennsylvania 15213
Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
Image registration finds a variety of applications in computer vision, such as image matching for stereo vision, pattern recognition, and motion analysis. Unfortunately, existing techniques for image registration tend to be costly. Moreover, they generally fail to deal with rotation or other distortions of the images.
In this paper we present a new image registration technique that uses spatial intensity gradient information to direct the search for the position that yields the best match. By taking more information about the images into account, this technique is able to find the best match between two images with far fewer comparisons of images than techniques which examine the possible positions of registration in some fixed order. Our technique takes advantage of the fact that in many applications the two images are already in approximate registration. This technique can be generalized to deal with arbitrary linear distortions of the image, including rotation. We then describe a stereo vision system that uses this registration technique, and suggest some further avenues for research toward making effective use of this method in stereo image understanding.
The translational image registration problem can be characterized as follows: We are given functions F(x) and G(x) which give the respective pixel values at each location x in two images, where x is a vector. We wish to find the disparity vector h which minimizes some measure of the difference between F(x + h) and G(x), for x in some region of interest R. (See Figure 1.)
Figure 1 : The image registration problem
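Typical measures of the difference between F(x + h) and G(x) are:
L1 norm: $$\sum_{\mathbf{x} \in R} \left| F(\mathbf{x} + \mathbf{h}) - G(\mathbf{x}) \right|$$
L2 norm: $$\left( \sum_{\mathbf{x} \in R} \left[ F(\mathbf{x} + \mathbf{h}) - G(\mathbf{x}) \right]^2 \right)^{1/2}$$
negative of normalized correlation: $$\frac{-\sum_{\mathbf{x} \in R} F(\mathbf{x} + \mathbf{h})\, G(\mathbf{x})}{\left( \sum_{\mathbf{x} \in R} F(\mathbf{x} + \mathbf{h})^2 \right)^{1/2} \left( \sum_{\mathbf{x} \in R} G(\mathbf{x})^2 \right)^{1/2}}$$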
We will propose a more general measure of image difference, of which both the L2 norm and the correlation are special cases. The L1 norm is chiefly of interest as an inexpensive approximation to the L2 norm.
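As a minimal sketch of the three measures, assuming one-dimensional images stored as Python lists and an integer disparity h (so that F(x + h) is a plain index lookup), the functions below evaluate each measure over a region R. The function names are illustrative and do not come from the paper.

```python
import math

# 1-D sketch: F and G are lists of pixel values, h is an integer disparity,
# and R is an iterable of indices x such that x + h is a valid index into F.


def l1_norm(F, G, h, R):
    """Sum over x in R of |F(x + h) - G(x)|."""
    return sum(abs(F[x + h] - G[x]) for x in R)


def l2_norm(F, G, h, R):
    """Square root of the sum over x in R of (F(x + h) - G(x))^2."""
    return math.sqrt(sum((F[x + h] - G[x]) ** 2 for x in R))


def neg_normalized_correlation(F, G, h, R):
    """Negative normalized correlation; it reaches -1 when F(x + h) is a
    positive multiple of G(x) over R."""
    R = list(R)
    num = sum(F[x + h] * G[x] for x in R)
    norm_f = math.sqrt(sum(F[x + h] ** 2 for x in R))
    norm_g = math.sqrt(sum(G[x] ** 2 for x in R))
    return -num / (norm_f * norm_g)


F = [0, 1, 4, 9, 16, 25, 36]
G = [F[x + 2] for x in range(5)]  # G(x) = F(x + 2), so the true disparity is 2
R = range(5)
print(l1_norm(F, G, 2, R), l2_norm(F, G, 2, R))  # 0 0.0
print(l1_norm(F, G, 0, R))  # 60
```

All three measures are minimized at the true disparity; the L1 version avoids the squaring and square root, which is why the text treats it as a cheap stand-in for the L2 norm.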
From Proceedings of Imaging Understanding Workshop, pp. 121-130 (1981).
An obvious technique for registering two images is to calculate a measure of the difference between the images at all possible values of the disparity vector h, that is, to exhaustively search the space of possible values of h. This technique is very time consuming: if the size of the picture G(x) is N×N, and the region of possible values of h is of size M×M, then this method requires O(M²N²) time to compute.
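The exhaustive search can be sketched as follows. This is an illustrative sketch, assuming images stored as lists of rows (img[y][x]), integer disparities restricted to an M×M window starting at (0, 0), and the squared L2 difference as the measure; the helper names ssd and exhaustive_search are hypothetical.

```python
# 2-D sketch: images are lists of rows, disparities are integer pairs (hy, hx)
# in [0, M) x [0, M), and F is assumed large enough that F[y + hy][x + hx]
# exists for every (y, x) in R.


def ssd(F, G, h, R):
    """Squared L2 difference between F(x + h) and G(x) over the region R."""
    hy, hx = h
    return sum((F[y + hy][x + hx] - G[y][x]) ** 2 for (y, x) in R)


def exhaustive_search(F, G, R, M):
    """Try all M*M disparities; with R covering an N x N image this costs O(M^2 N^2)."""
    best_h, best_err = None, float("inf")
    for hy in range(M):
        for hx in range(M):
            err = ssd(F, G, (hy, hx), R)
            if err < best_err:
                best_h, best_err = (hy, hx), err
    return best_h, best_err


N, M = 5, 4
F = [[10 * y + x * x for x in range(N + M - 1)] for y in range(N + M - 1)]
G = [[F[y + 2][x + 1] for x in range(N)] for y in range(N)]  # true h = (2, 1)
R = [(y, x) for y in range(N) for x in range(N)]
print(exhaustive_search(F, G, R, M))  # ((2, 1), 0)
```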
Speedup at the risk of possible failure to find the best h can be achieved by using a hill-climbing technique. This technique begins with an initial estimate h_0 of the disparity. To obtain the next guess from the current guess h_k, one evaluates the difference function at all points in a small (say, 3×3) neighborhood of h_k and takes as the next guess h_{k+1} that point which minimizes the difference function. As with all hill-climbing techniques, this method suffers from the problem of false peaks: the local optimum that one attains may not be the global optimum. This technique operates in O(M²N) time on the average, for M and N as above.
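A sketch of the 3×3 hill-climbing search follows, under the same assumptions as an exhaustive search over list-of-rows images (integer disparities in an M×M window, squared L2 difference); the helper names are hypothetical. It stops at the first disparity none of whose neighbors is better, which is exactly where a false peak can trap it.

```python
def ssd(F, G, h, R):
    """Squared L2 difference between F(x + h) and G(x) over the region R."""
    hy, hx = h
    return sum((F[y + hy][x + hx] - G[y][x]) ** 2 for (y, x) in R)


def hill_climb(F, G, R, h0, M, max_steps=100):
    """Move to the best disparity in the 3x3 neighborhood of the current guess
    until no neighbor improves; the result may be a local optimum only."""
    h, err = h0, ssd(F, G, h0, R)
    for _ in range(max_steps):
        best_h, best_err = h, err
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                cand = (h[0] + dy, h[1] + dx)
                if 0 <= cand[0] < M and 0 <= cand[1] < M:  # stay inside the window
                    cand_err = ssd(F, G, cand, R)
                    if cand_err < best_err:
                        best_h, best_err = cand, cand_err
        if best_h == h:  # no neighbor is better: stop here
            break
        h, err = best_h, best_err
    return h, err


N, M = 5, 4
F = [[10 * y + x * x for x in range(N + M - 1)] for y in range(N + M - 1)]
G = [[F[y + 2][x + 1] for x in range(N)] for y in range(N)]  # true h = (2, 1)
R = [(y, x) for y in range(N) for x in range(N)]
print(hill_climb(F, G, R, (0, 0), M))  # ((2, 1), 0)
```

Here the error surface has a single basin, so the climb reaches the true disparity in two moves while evaluating far fewer than all M×M candidates; on images with repeated texture it could stop at a false peak instead.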
Another technique, known as the sequential similarity detection algorithm (SSDA) [2], only estimates the error for each disparity vector h. In SSDA, the error function must be a cumulative one such as the L1 or L2 norm. One stops accumulating the error for the current h under investigation when it becomes apparent that the current h is not likely to give the best match. Criteria for stopping include a fixed threshold such that when the accumulated error exceeds this threshold one goes on to the next h, and a variable threshold which increases with the number of pixels in R whose contributions to the total error have been added. SSDA leaves unspecified the order in which the h's are examined.
Note that in SSDA if we adopt as our threshold the minimum error we have found among the h examined so far, we obtain an algorithm similar to alpha-beta pruning in min-max game trees [7]. Here we take advantage of the fact that in evaluating min_h Σ_x d(x, h), where d(x, h) is the contribution of pixel x at disparity h to the total error, the Σ_x can only increase as we look at more x's (more pixels).
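The SSDA early-termination idea, combined with the best-error-so-far threshold just described, can be sketched as follows. It assumes list-of-rows images, squared-error accumulation, and an exhaustive visiting order; the fixed and pixel-count-dependent thresholds are not shown, and the function names are hypothetical. The examined counter makes the saving visible: pixels are only accumulated until a candidate h is known to lose.

```python
def ssda_error(F, G, h, R, threshold):
    """Accumulate squared error pixel by pixel, abandoning h as soon as the
    running total exceeds threshold. Returns (error so far, pixels examined)."""
    hy, hx = h
    total = 0
    for count, (y, x) in enumerate(R, start=1):
        total += (F[y + hy][x + hx] - G[y][x]) ** 2
        if total > threshold:
            return total, count
    return total, len(R)


def ssda_search(F, G, R, M):
    """Visit every h in an M x M window, using the best error found so far as
    the abandonment threshold (the alpha-beta-like variant)."""
    best_h, best_err, examined = None, float("inf"), 0
    for hy in range(M):
        for hx in range(M):
            err, n = ssda_error(F, G, (hy, hx), R, best_err)
            examined += n
            if err < best_err:  # an abandoned h always has err > best_err
                best_h, best_err = (hy, hx), err
    return best_h, best_err, examined


N, M = 5, 4
F = [[10 * y + x * x for x in range(N + M - 1)] for y in range(N + M - 1)]
G = [[F[y + 2][x + 1] for x in range(N)] for y in range(N)]  # true h = (2, 1)
R = [(y, x) for y in range(N) for x in range(N)]
best_h, best_err, examined = ssda_search(F, G, R, M)
print(best_h, best_err, examined, "of", M * M * len(R), "pixel evaluations")
```

Because the running sum never decreases, abandoning a candidate once it exceeds the best complete error cannot discard the true minimum; this is the monotonicity argument in the paragraph above.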
Some registration algorithms employ a coarse-fine search strategy. See [6] for an example. One of the techniques discussed above is used to find the best registration for the images at low resolution, and the low resolution match is then used to constrain the region of possible matches examined at higher resolution. The coarse-fine strategy is adopted implicitly by some image understanding systems which work with a "pyramid" of images of the same scene at various resolutions.
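A two-level coarse-fine search can be sketched in one dimension as follows. The sketch assumes pairwise averaging as the low-pass filter, a factor-of-two reduction in resolution, exhaustive search at the coarse level, and a ±1 refinement at full resolution; these choices and the function names are illustrative, not taken from [6].

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (a crude low-pass filter)."""
    return [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]


def sq_err(F, G, h, R):
    """Squared difference between F(x + h) and G(x) over R."""
    return sum((F[x + h] - G[x]) ** 2 for x in R)


def best_of(F, G, R, candidates):
    """Return the candidate disparity with the smallest squared error."""
    return min(candidates, key=lambda h: sq_err(F, G, h, R))


def coarse_fine(F, G, max_h):
    """Exhaustive search at half resolution, then refine within +/-1 at full
    resolution. Assumes len(F) >= len(G) + max_h and 0 <= true h <= max_h."""
    Fc, Gc = downsample(F), downsample(G)
    hc = best_of(Fc, Gc, range(len(Gc)), range(max_h // 2 + 1))
    fine = [h for h in (2 * hc - 1, 2 * hc, 2 * hc + 1) if 0 <= h <= max_h]
    return best_of(F, G, range(len(G)), fine)


F = [x * x for x in range(30)]
print(coarse_fine(F, F[6:26], 10), coarse_fine(F, F[7:27], 10))  # 6 7
```

The coarse level examines about half as many disparities, each on half as many samples, and the fine level only checks three candidates; with more levels the savings compound.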
It should be noted that some of the techniques mentioned so far can be combined because they concern orthogonal aspects of the image registration problem. Hill climbing and exhaustive search concern only the order in which the algorithm searches for the best match, and SSDA specifies only the method used to calculate (an estimate of) the difference function. Thus, for example, one could use the SSDA technique with either hill climbing or exhaustive search; in addition, a coarse-fine strategy may be adopted.
The algorithm we present specifies the order in which to search the space of possible h's. In particular, our technique starts with an initial estimate of h, and it uses the spatial intensity gradient at each point of the image to modify the current estimate of h to obtain an h which yields a better match. This process is repeated in a kind of Newton-Raphson iteration. If the iteration converges, it will do so in O(M² log N) steps on the average. This registration technique can be combined with a coarse-fine strategy, since it requires an initial estimate of the approximate disparity h.
In this section we first derive an intuitive solution to the one dimensional registration problem, and then we derive an alternative solution which we generalize to multiple dimensions. We then show how our technique generalizes to other kinds of registration. We also discuss implementation and performance of the algorithm.
4.1. One dimensional case
In the one-dimensional registration problem, we wish to find the horizontal disparity h between two curves F(x) and G(x) = F(x + h). This is illustrated in Figure 2.
Figure 2 : Two curves to be matched
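Our solution to this problem depends on a linear approximation to the behavior of F(x) in the neighborhood of x, as do all subsequent solutions in this paper. In particular, for small h,
$$F'(x) \approx \frac{F(x + h) - F(x)}{h} = \frac{G(x) - F(x)}{h},$$
so that
$$h \approx \frac{G(x) - F(x)}{F'(x)}.$$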
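The one-point estimate can be checked numerically. This sketch assumes F(x) = sin x with its exact derivative and G(x) = F(x + h); the helper names are hypothetical. For a small disparity the estimate is close to the true h, while for a larger one the linear approximation breaks down and the estimate falls short.

```python
import math


def pointwise_disparity(F, dF, G, x):
    """h ~ (G(x) - F(x)) / F'(x): only valid for small h and where F'(x) != 0."""
    return (G(x) - F(x)) / dF(x)


def shifted(F, h):
    """Return the function x -> F(x + h)."""
    return lambda x: F(x + h)


small = pointwise_disparity(math.sin, math.cos, shifted(math.sin, 0.1), 0.3)
large = pointwise_disparity(math.sin, math.cos, shifted(math.sin, 1.0), 0.3)
print(small, large)  # roughly 0.098 and 0.699
```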
Since lowpass filtered images can be sampled at lower resolution with no loss of information, the above observation suggests that we adopt a coarse-fine strategy. We can use a low resolution smoothed version of the image to obtain an approximate match. Applying the algorithm to higher resolution images will refine the match obtained at lower resolution.
While the effect of smoothing is to extend the range of convergence, the weighting function serves to improve the accuracy of the approximation, and thus to speed up the convergence. Without weighting, i.e. with w(x) = 1, the calculated disparity h_1 of the first iteration of (10) with f(x) = sin x falls off to zero as the disparity approaches one-half wavelength. However, with w(x) as in (5), the calculation of disparity is much more accurate, and only falls off to zero at a disparity very near one-half wavelength. Thus with w(x) as in (5), convergence is faster for large disparities.
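The effect of weighting on the first iteration can be explored with a small experiment. The sketch assumes F(x) = sin x sampled uniformly over one period, G(x) = sin(x + h), h_0 = 0, and, for the weighted case, w(x) = 1/|G'(x) − F'(x)|, which is our reading of the weighting in (5) (not reproduced in this excerpt); the eps guard against division by zero and the function name are our additions. Without weighting, uniform sampling makes the first estimate equal to sin h up to rounding, which falls to zero as h approaches half a wavelength (π).

```python
import math


def first_iteration(h_true, weighted, n=64, eps=1e-9):
    """h_1 from one weighted update starting at h_0 = 0, with F = sin and
    G(x) = sin(x + h_true) sampled at n evenly spaced points over one period."""
    num = den = 0.0
    for k in range(n):
        x = 2 * math.pi * k / n
        f, fp = math.sin(x), math.cos(x)
        g, gp = math.sin(x + h_true), math.cos(x + h_true)
        w = 1.0 / max(abs(gp - fp), eps) if weighted else 1.0  # assumed w(x)
        num += w * fp * (g - f)
        den += w * fp * fp
    return num / den


for h in (0.5, 1.0, 2.0, 3.0):
    print(h, first_iteration(h, weighted=False), first_iteration(h, weighted=True))
```

Printing both columns for disparities between 0 and π reproduces the kind of comparison the paragraph describes; we make no claim here about the exact shape of the weighted curve.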
4.4. Implementation
Implementing (10) requires calculating the weighted sums of the quantities F'G, F'F, and (F')² over the region of interest R. We cannot calculate F'(x) exactly, but for the purposes of this algorithm, we can estimate it by
$$F'(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x},$$
and similarly for G'(x), where we choose Δx appropriately small (e.g. one pixel). Some more sophisticated technique could be used for estimating the first derivatives, but in general such techniques are equivalent to first smoothing the function, which we have proposed doing for other reasons, and then taking the difference.
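Putting the update (10) together with the forward-difference estimate of F' gives a one-dimensional sketch of the iteration. It assumes sampled signals stored as lists, linear interpolation to evaluate F(x + h_k) at non-integer positions, and, when weighted=True, w(x) = 1/|G'(x) − F'(x + h_k)| as an assumed reading of (5); the function names, the eps guard, and the fixed iteration count are illustrative choices rather than details from the paper.

```python
import math


def sample(signal, t):
    """Linearly interpolate a list of samples at real position t (clamped to the ends)."""
    t = min(max(t, 0.0), len(signal) - 1.0)
    i = min(int(t), len(signal) - 2)
    frac = t - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]


def derivative(signal, t, dx=1.0):
    """Forward-difference estimate (F(t + dx) - F(t)) / dx, with dx = one pixel."""
    return (sample(signal, t + dx) - sample(signal, t)) / dx


def register_1d(F, G, R, h0=0.0, iterations=20, weighted=False, eps=1e-6):
    """Iterate h_{k+1} = h_k + sum w F'(x+h_k)[G(x) - F(x+h_k)] / sum w F'(x+h_k)^2."""
    h = h0
    for _ in range(iterations):
        num = den = 0.0
        for x in R:
            fp = derivative(F, x + h)
            w = 1.0
            if weighted:  # assumed form of w(x); eps avoids division by zero
                w = 1.0 / max(abs(derivative(G, x) - fp), eps)
            num += w * fp * (G[x] - sample(F, x + h))
            den += w * fp * fp
        if den == 0.0:  # flat region: no gradient information
            break
        h += num / den
    return h


F = [math.sin(i / 5.0) for i in range(60)]
G = [F[x + 3] for x in range(40)]  # G(x) = F(x + 3)
print(register_1d(F, G, range(2, 38)))  # close to 3
```

Each pass over R accumulates only the two weighted sums in the update, so the cost per iteration is linear in the size of the region.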
4.5. Generalization to multiple dimensions
The one-dimensional registration algorithm given above can be generalized to two or more dimensions. We wish to minimize the L2 norm measure of error:
$$E = \sum_{\mathbf{x} \in R} \left[ F(\mathbf{x} + \mathbf{h}) - G(\mathbf{x}) \right]^2,$$
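where x and h are n-dimensional row vectors. We make a linear approximation analogous to that in (8),
$$F(\mathbf{x} + \mathbf{h}) \approx F(\mathbf{x}) + \mathbf{h}\,\frac{\partial}{\partial \mathbf{x}} F(\mathbf{x}),$$
where ∂/∂x is the gradient operator with respect to x, as a column vector:
$$\frac{\partial}{\partial \mathbf{x}} = \left[ \frac{\partial}{\partial x_1}\ \frac{\partial}{\partial x_2}\ \cdots\ \frac{\partial}{\partial x_n} \right]^T.$$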
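Using this approximation, to minimize E, we set
$$0 = \frac{\partial E}{\partial \mathbf{h}} \approx \frac{\partial}{\partial \mathbf{h}} \sum_{\mathbf{x}} \left[ F(\mathbf{x}) + \mathbf{h}\,\frac{\partial F}{\partial \mathbf{x}} - G(\mathbf{x}) \right]^2 = \sum_{\mathbf{x}} 2\,\frac{\partial F}{\partial \mathbf{x}} \left[ F(\mathbf{x}) + \mathbf{h}\,\frac{\partial F}{\partial \mathbf{x}} - G(\mathbf{x}) \right],$$
from which
$$\mathbf{h} \approx \left[ \sum_{\mathbf{x}} \left( \frac{\partial F}{\partial \mathbf{x}} \right)^T \left[ G(\mathbf{x}) - F(\mathbf{x}) \right] \right] \left[ \sum_{\mathbf{x}} \frac{\partial F}{\partial \mathbf{x}} \left( \frac{\partial F}{\partial \mathbf{x}} \right)^T \right]^{-1},$$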
which has much the same form as the one-dimensional version in (9).
The discussions above of iteration, weighting, smoothing, and the coarse-fine technique with respect to the one-dimensional case apply to the n-dimensional case as well. Calculating our estimate of h in the two-dimensional case requires accumulating the weighted sum of five products, (G − F)F_x, (G − F)F_y, F_x², F_y², and F_xF_y, where F_x and F_y are the two components of the gradient ∂F/∂x.
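For two dimensions, one update step amounts to accumulating the products of the residual and the gradient components and solving a 2 × 2 linear system, which is what the sketch below does. It assumes F and G are given as Python functions of continuous (x, y), so no interpolation is needed, uses forward differences with a hypothetical step d for the gradient, and sets w(x) = 1; the test image and all names are illustrative.

```python
import math


def lk_step_2d(F, G, R, h, d=1e-4):
    """One update of h = (hx, hy): accumulate (G - F)F_x, (G - F)F_y, F_x^2,
    F_x F_y, F_y^2 over R (w = 1) and solve the 2 x 2 system for the correction."""
    hx, hy = h
    sxx = sxy = syy = bx = by = 0.0
    for (x, y) in R:
        f = F(x + hx, y + hy)
        fx = (F(x + hx + d, y + hy) - f) / d  # forward-difference gradient
        fy = (F(x + hx, y + hy + d) - f) / d
        e = G(x, y) - f                       # residual G(x) - F(x + h)
        sxx += fx * fx
        sxy += fx * fy
        syy += fy * fy
        bx += e * fx
        by += e * fy
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("singular gradient matrix (e.g. a featureless region)")
    # Solve [[sxx, sxy], [sxy, syy]] @ [dhx, dhy] = [bx, by] by Cramer's rule.
    dhx = (syy * bx - sxy * by) / det
    dhy = (sxx * by - sxy * bx) / det
    return (hx + dhx, hy + dhy)


def F(x, y):  # hypothetical smooth test image
    return math.sin(0.3 * x + 0.1 * y) + math.cos(0.2 * y - 0.1 * x)


def G(x, y):  # G(x) = F(x + h) with true h = (1.0, -0.5)
    return F(x + 1.0, y - 0.5)


R = [(x, y) for x in range(20) for y in range(20)]
h = (0.0, 0.0)
for _ in range(10):
    h = lk_step_2d(F, G, R, h)
print(h)  # close to (1.0, -0.5)
```

Because the gradient matrix is symmetric, solving it against the accumulated vector gives the same result as the row-vector form with the inverse on the right. A singular matrix signals that the region has too little gradient structure to fix both components of h.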
4.6. Further generalizations
Our technique can be extended to registration between two images related not by a simple translation, but by an arbitrary linear transformation, such as rotation, scaling, and shearing. Such a relationship is expressed by
$$G(\mathbf{x}) = F(\mathbf{x}A + \mathbf{h}),$$
where A is a matrix expressing the linear spatial transformation between F(x) and G(x). The quantity to be minimized in this case is
$$E = \sum_{\mathbf{x}} \left[ F(\mathbf{x}A + \mathbf{h}) - G(\mathbf{x}) \right]^2.$$
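To determine the amount ΔA to adjust A and the amount Δh to adjust h, we use the linear approximation
$$F(\mathbf{x}(A + \Delta A) + \mathbf{h} + \Delta\mathbf{h}) \approx F(\mathbf{x}A + \mathbf{h}) + (\mathbf{x}\,\Delta A + \Delta\mathbf{h})\,\frac{\partial F}{\partial \mathbf{x}}(\mathbf{x}A + \mathbf{h}). \tag{11}$$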
When we use this approximation, the error expression again becomes quadratic in the quantities to be minimized, ΔA and Δh. Differentiating with respect to these quantities and setting the results equal to zero yields a set of linear equations to be solved simultaneously.
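A one-dimensional analogue of this generalization, G(x) = F(ax + h), keeps the linear algebra small while following the same recipe: linearize in Δa and Δh, then solve the resulting 2 × 2 normal equations. This is a Gauss-Newton-style sketch in the spirit of the iteration above, assuming F and its derivative are available as Python functions; the names and the target values a = 1.05, h = 0.2 are hypothetical.

```python
import math


def affine_step_1d(F, dF, G, xs, a, h):
    """One update of (a, h) for G(x) = F(a x + h), using the linearization
    F(x(a + da) + h + dh) ~ F(a x + h) + (x da + dh) F'(a x + h)."""
    suu = suv = svv = bu = bv = 0.0
    for x in xs:
        y = a * x + h
        fp = dF(y)
        u, v = x * fp, fp  # partial derivatives with respect to a and h
        e = G(x) - F(y)    # current residual
        suu += u * u
        suv += u * v
        svv += v * v
        bu += u * e
        bv += v * e
    det = suu * svv - suv * suv
    da = (svv * bu - suv * bv) / det
    dh = (suu * bv - suv * bu) / det
    return a + da, h + dh


xs = [i / 10 for i in range(-20, 21)]


def G(x):  # hypothetical target: a = 1.05, h = 0.2
    return math.sin(1.05 * x + 0.2)


a, h = 1.0, 0.0
for _ in range(20):
    a, h = affine_step_1d(math.sin, math.cos, G, xs, a, h)
print(a, h)  # expected to approach 1.05 and 0.2
```

In two dimensions the same structure gives six unknowns (four entries of ΔA and two of Δh), so the normal equations become a 6 × 6 system instead of 2 × 2.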
This generalization is useful in applications such as stereo vision, where the two views of the object will differ, due to the difference of the viewpoints of the cameras or to differences in the processing of the two images. If we model this difference as a linear transformation, we have (ignoring the registration problem for the moment)
F(x) = αG(x) + β.
where α may be thought of as a contrast adjustment and β as a brightness adjustment. Combining this with the general linear transformation registration problem, we obtain
$$E = \sum_{\mathbf{x}} \left[ F(\mathbf{x}A + \mathbf{h}) - (\alpha G(\mathbf{x}) + \beta) \right]^2$$
as the quantity to minimize with respect to α, β, A, and h. The minimization of this quantity, using the linear approximation in equation (11), is straightforward. This is the general form promised in section 2. If we ignore A, minimizing this quantity is equivalent to maximizing the correlation coefficient (see, for example, [3]); if we ignore α and β as well, minimizing this form is equivalent to minimizing the L2 norm.
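The link between the α, β fit and the correlation coefficient can be made concrete for a fixed alignment (A and h held fixed). The sketch below, with hypothetical names and sample values, fits α and β by least squares to samples f = F(xA + h) and g = G(x). The remaining residual equals S_ff(1 − ρ²), where S_ff is the sum of squared deviations of f from its mean and ρ is the correlation coefficient, so a larger |ρ| leaves a smaller residual.

```python
import math


def fit_contrast_brightness(f, g):
    """Least-squares alpha, beta minimizing sum (f - (alpha * g + beta))^2.

    Returns (alpha, beta, residual, rho, s_ff)."""
    n = len(f)
    f_mean, g_mean = sum(f) / n, sum(g) / n
    s_fg = sum((a - f_mean) * (b - g_mean) for a, b in zip(f, g))
    s_gg = sum((b - g_mean) ** 2 for b in g)
    s_ff = sum((a - f_mean) ** 2 for a in f)
    alpha = s_fg / s_gg              # contrast adjustment
    beta = f_mean - alpha * g_mean   # brightness adjustment
    residual = sum((a - (alpha * b + beta)) ** 2 for a, b in zip(f, g))
    rho = s_fg / math.sqrt(s_ff * s_gg)  # correlation coefficient
    return alpha, beta, residual, rho, s_ff


alpha, beta, residual, rho, s_ff = fit_contrast_brightness([1, 3, 2, 5, 4], [1, 2, 3, 4, 5])
print(alpha, beta, residual, rho)  # approximately 0.8 0.6 3.6 0.8
print(abs(residual - s_ff * (1 - rho ** 2)) < 1e-9)  # True
```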
In this section we show how the generalized registration algorithm described above can be applied to extracting depth information from stereo images.
5.1. The stereo problem
The problem of extracting depth information from a stereo pair has in principle four components: finding objects in the pictures, matching the objects in the two views, determining the camera parameters, and determining the distances from the camera to the objects. Our approach is to combine object matching with solving for the camera parameters and the distances of the objects by using a form of the fast registration technique described above.
Techniques for locating objects include an interest operator [6], zero crossings in bandpass-filtered images [5], and linear features [1]. One might also use regions found by an image segmentation program as objects.
Stereo vision systems which work with features at the pixel level can use one of the registration techniques discussed above. Systems whose objects are higher-level features must use some difference measure and some search technique suited to the particular feature being used. Our registration algorithm provides a stereo vision system with a fast method of doing pixel-level matching.
Many stereo vision systems concern themselves only with calculating the distances to the matched objects. One must also be aware that in any real application of stereo vision the relative positions of the cameras will not be known with perfect accuracy. Gennery [4] has shown how to simultaneously solve for the camera parameters and the distances of objects.
5.2. A mathematical characterization
The notation we use is illustrated in Figure 3. Let c be the vector of camera parameters that describe the orientation and position of camera 2 with respect to camera 1's coordinate system. These parameters are azimuth, elevation, pan, tilt, and roll, as defined in [4]. Let x denote the position of the image of an object in the camera 1 film plane. Suppose the object is at a distance z from camera 1. Given the position x in picture 1 and the distance z of the object, we could directly calculate the position p(x, z) that it must have occupied in three-space. We express p with respect to camera 1's coordinate system so that p does not depend on the orientation of camera 1. The object would appear on camera 2's film plane at a position q(p, c) that is dependent on the object's position in three-space p and on the camera parameters c. Let G(x) be the intensity value of pixel x in picture 1, and let F(q) be the intensity value of pixel q in picture 2.
Figure 3 : Stereo vision
5.3. Applying the registration algorithm
First consider the case where we know the exact camera parameters c, and we wish to discover the distance z of an object. Suppose we have an estimate of the distance z. We wish to see what happens to the quality of our match between F and G as we vary z by an amount Δz. The linear approximation that we use here is
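$$F(\mathbf{q}(\mathbf{p}(\mathbf{x}, z + \Delta z), \mathbf{c})) \approx F(\mathbf{q}(\mathbf{p}(\mathbf{x}, z), \mathbf{c})) + \Delta z\,\frac{\partial F}{\partial z},$$
where
$$\frac{\partial F}{\partial z} = \frac{\partial \mathbf{p}}{\partial z}\,\frac{\partial \mathbf{q}}{\partial \mathbf{p}}\,\frac{\partial F}{\partial \mathbf{q}}.$$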
This equation is due to the chain rule of the gradient operator; ∂q/∂p is a matrix of partial derivatives of the components of q with respect to the components of p, and ∂F/∂q is the spatial intensity gradient of the image F(q). To update our estimate of z, we want to find the Δz which
Acknowledgements
We would like to thank Michael Horowitz, Richard Korf, and Pradeep Sindhu for their helpful comments on early drafts of this paper.
References
Figure 4.
Figure 5.
Figure 6.
Figure 9.
Figure 10.