



















































Master's Thesis in Numerical Analysis (30 ECTS credits) at the Scientific Computing International Master Program, Royal Institute of Technology, year 2007. Supervisors at CSC were Alireza Tavakoli Targhi and Babak Rasolzadeh. Examiner was Axel Ruhe.
TRITA-CSC-E 2007:139, ISRN-KTH/CSC/E--07/139--SE, ISSN 1653-5715
Royal Institute of Technology, School of Computer Science and Communication, KTH CSC, SE-100 44 Stockholm, Sweden. URL: www.csc.kth.se
Acknowledgement
This research would not have been started at all without the great response in the early days from Alireza Tavakoli. Without the encouraging support from my supervisors, Alireza Tavakoli and Babak Rasolzadeh at the Royal Institute of Technology (KTH-CVAP), there would have been very little to write. Likewise, the great interest shown by my examiner, Axel Ruhe, at the Royal Institute of Technology (KTH-NA) has been encouraging.
1.1 Face Detection and Recognition
Finding faces in an arbitrary scene and successfully recognizing them have been active topics in computer vision for decades. A general statement of the face recognition problem (in computer vision) can be formulated as follows: given still or video images of a scene, identify or verify one or more persons in the scene using a stored database of faces. Although face detection and recognition remain unsolved problems, meaning there is no 100% accurate face detection and recognition system, many methods and techniques have gradually been developed and applied during the past decade. Basically, there are three modes of automatic face recognition: verification, identification and watch-list. In the verification mode, only two images are compared; the comparison is positive if the two images match. In the identification mode, more than one comparison is made, and the closest match to the input image is returned. The watch-list mode works like identification, with the difference that the input face can also be rejected (no match). The method presented in this thesis consists of three steps: skin detection, face detection, and face recognition. The novelty of the proposed method is the use of a skin detection filter as a pre-processing step for face detection. A scheme of the main tasks is shown in Figure 1.1.
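The three recognition modes above can be sketched as nearest-neighbour matching over face descriptors. This is only an illustrative sketch: the descriptor format (plain feature vectors), the Euclidean distance and the threshold value are assumptions, not the method of this thesis.

```python
import numpy as np

def verify(a, b, threshold=0.5):
    """Verification mode: compare exactly two face descriptors."""
    return np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)) < threshold

def identify(probe, gallery):
    """Identification mode: return the index of the closest gallery entry."""
    dists = [np.linalg.norm(np.asarray(probe, float) - np.asarray(g, float))
             for g in gallery]
    return int(np.argmin(dists))

def watch_list(probe, gallery, threshold=0.5):
    """Watch-list mode: closest match, or None if even the best match
    is farther away than the threshold (rejection)."""
    idx = identify(probe, gallery)
    d = np.linalg.norm(np.asarray(probe, float) - np.asarray(gallery[idx], float))
    return idx if d < threshold else None
```

Note how watch-list differs from identification only by the final rejection test.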
CHAPTER 1. INTRODUCTION
Figure 1.1. General scheme for our system
The input to the face detection step is the output of step 1. In the case that no skin is detected, the entire original image is used as input for face detection.
1.2 Review of Recent Work
A primitive face detection method is finding faces in images with a controlled background, by using images with a plain monocolor background or with a predefined static background: removing the background then always yields the face boundaries. The drawback of such methods is that they only work in these controlled settings. When color is available, another method is finding faces with the help of color. Given access to color images, one might use the typical skin color to find face segments. The process is carried out in two main steps. The first step is skin filtering: detecting regions which are likely to contain human skin in the color image. The result of this step, followed by thresholding, is a binary skin map which shows the skin regions. The second step is face detection: extracting information from regions which might indicate the location of a face in the image, by taking the marked skin regions (from the first step) and removing the darkest and brightest regions from the map. The removed regions have been shown through empirical tests to correspond to those regions in faces which are usually the eyes and eyebrows, nostrils, and mouth. Thus, in [1], skin detection is performed using a skin filter which relies on color and texture information. The face detection is performed on a
h_mod(A, B) = (1/|A|) Σ_{a∈A} min_{b∈B} ‖a − b‖.
By taking the average of the single point distances, this version decreases the impact of outliers, making it more suitable for pattern recognition purposes. Now let A and B be the image and the model, respectively; the goal is to find the transformation parameters such that the HD between the transformed model and A is minimized. The detection optimization problem can be formulated as:
d_{p*} = min_{p∈P} H(A, T_p(B)),
where T_p(B) is the transformed model, and h(T_p(B), A) and h(A, T_p(B)) are the forward and reverse distances, respectively. The value d_{p*} is the distance value at the best matching position and scale. The implemented face detection system consists of a coarse detection phase and a refinement phase, each containing a segmentation and a localization step. Coarse detection: an AOI (Area Of Interest) with a preset width/height ratio is defined for an incoming image. This AOI is then resampled to a fixed size which is independent of the dimensions of the image.
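The modified directed Hausdorff distance above can be sketched as follows, assuming point sets are given as arrays of 2D coordinates; the symmetric combination shown (maximum of forward and reverse distances) is one common choice, not necessarily the exact combination used in [3].

```python
import numpy as np

def h_mod(A, B):
    """Modified directed Hausdorff distance: the average, over points a in A,
    of the distance from a to its nearest neighbour in B."""
    A, B = np.atleast_2d(np.asarray(A, float)), np.atleast_2d(np.asarray(B, float))
    # Pairwise distance matrix of shape (|A|, |B|).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).mean()

def H_mod(A, B):
    """Symmetric version: maximum of the forward and reverse distances."""
    return max(h_mod(A, B), h_mod(B, A))
```

Averaging over the nearest-neighbour distances (instead of taking their maximum, as the classical Hausdorff distance does) is what gives the outlier robustness mentioned above.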
Refinement phase: given d_{p*}, a second AOI is defined covering the expected area of the face. This AOI is resampled from the original image, resulting in a greyscale image of the face area. Segmentation and localization then proceed as in the previous phase, with a modified box reverse distance h_box(A′, T_{p′}(B′)). Validation is based on the distance between the expected and the estimated eye positions: the so-called (normalized) relative error, defined as
d_eye = max(d_l, d_r) / ‖C_l − C_r‖,

where d_l and d_r are the distances between the true eye centers C_l, C_r and the estimated positions. In [3], a face is found if d_eye < 0.25. Two different databases are utilized. The first one contains 1180 color images of 295 test persons (360 × 288). The second one contains 1521 images of 23 persons with a larger variety of illumination, background and face size (384 × 288). A robustness of 98.4% on the first test set and 91.8% on the second is obtained. The average processing time per frame on a PIII 850 MHz system is 23.5 ms for the coarse detection step and an additional 7.0 ms for the refinement step, which allows use in real-time video applications (> 30 fps) [3]. A major problem of the Hausdorff distance method is the actual creation of a proper face model (T_p(B)). While a simple "hand-drawn" model is sufficient for the detection of simple objects, a general face model must cover the broad variety of different faces. To optimize the method, finding a well-suited model for HD-based face localization can be formulated as a discrete global optimization problem. For this purpose, a standard approach for multi-dimensional global optimization problems is employed, namely the simple Genetic Algorithm (SGA) described by Goldberg [4]. A Genetic Algorithm (GA) approach is presented for obtaining a binary edge model that allows localization of a wide variety of faces with the HD method. The GA performs better when starting from scratch than from a hand-drawn model. Three different initializations of the population are tested in [3]: blank model, average edge model, and hand-drawn model. An improvement in localization performance from 60% to 90% is achieved. Therefore, GA is a powerful tool that can help in finding an appropriate model for face localization. Face localization can be improved by a multi-step detection approach that uses more than one model at different levels of detail. Each of these models can then be optimized separately. This not only speeds up the localization procedure but also produces more exact face coordinates [6].
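The relative eye-distance error used for validation can be computed directly from its definition; the coordinate format (2D point tuples) is an illustrative assumption.

```python
import numpy as np

def relative_eye_error(true_l, true_r, est_l, est_r):
    """Normalized relative eye error: the larger of the two eye-position
    errors, divided by the true inter-ocular distance."""
    true_l = np.asarray(true_l, float)
    true_r = np.asarray(true_r, float)
    d_l = np.linalg.norm(true_l - np.asarray(est_l, float))
    d_r = np.linalg.norm(true_r - np.asarray(est_r, float))
    return max(d_l, d_r) / np.linalg.norm(true_l - true_r)
```

A detection is accepted when this value is below 0.25, i.e. when both estimated eye centers lie within a quarter of the inter-ocular distance of the true ones.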
1.3 Thesis Outline
The rest of the thesis consists of three main parts, namely color, face detection and face recognition. Each part is described in a separate chapter. In Chapter 2, color is discussed. Chapters 3 and 4 explain basic principles of face detection and recognition, respectively. In Chapter 5, the method utilized in this work is discussed, including its three main parts. The experimental evaluation is then presented in Chapter 6, and finally the conclusions and proposed future work in Chapter 7.
Skin color has proved to be a useful and robust cue for face detection. Image content filtering and image color balancing applications can also benefit from automatic detection of skin regions in images. Numerous techniques for skin color modeling and recognition have been proposed in past years. Face detection methods that use skin color as a detection cue have gained strong popularity among other techniques. Color allows fast processing and is highly robust to geometric variations of the skin pattern. Experience suggests that human skin has a characteristic color, which is easily recognized by humans, so trying to employ skin color modeling for face detection was an idea suggested both by task properties and by common sense. In this chapter, we discuss pixel-based skin detection methods, which classify each pixel as skin or non-skin individually. Our goal in this work is to evaluate the two most relevant color spaces and to summarize their advantages.
2.1 Skin modeling
The major goal of skin modeling is to discriminate between skin and non-skin pixels. This is usually accomplished by introducing a metric which measures the distance (in a general sense) of a pixel color to skin tone. The type of this metric is defined by the skin color modeling method. A classification of skin-color modeling approaches is given in [11]. In this work, the Gaussian model will be discussed.
The Gaussian model is the most popular parametric skin model. The model performance directly depends on the representativeness of the training set, and the resulting skin model representation can be made more compact for certain applications.
Skin color distribution can be modeled by an elliptical Gaussian joint probability density function (pdf), defined as:
P(c|skin) = (2π |Σ_s|^{1/2})^{−1} · e^{−(1/2)(c − μ_s)^T Σ_s^{−1} (c − μ_s)}.
Here, c is a color vector and μs and Σ s are the distribution parameters (mean vector and covariance matrix respectively). The model parameters are estimated from the training data by:
μ_s = (1/n) Σ_{j=1}^{n} c_j  and  Σ_s = (1/(n − 1)) Σ_{j=1}^{n} (c_j − μ_s)(c_j − μ_s)^T,
where j = 1, ..., n and n is the total number of skin color samples c_j. The probability P(c|skin) can be used directly as a measure of how "skin-like" the color c is [16]; alternatively, the Mahalanobis distance from the color vector c to the mean vector μ_s, given the covariance matrix Σ_s, can serve the same purpose [17]:
λ_s(c) = (c − μ_s)^T Σ_s^{−1} (c − μ_s).
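A minimal sketch of fitting the single Gaussian skin model and evaluating the Mahalanobis skin-likeness measure λ_s; the function names are illustrative, and in practice the samples would be pixel color vectors from a labeled skin training set.

```python
import numpy as np

def fit_gaussian_skin_model(samples):
    """Estimate (mu_s, Sigma_s) from an (n, d) array of skin color samples,
    using the sample mean and the 1/(n-1)-normalized sample covariance."""
    samples = np.asarray(samples, float)
    mu = samples.mean(axis=0)
    sigma = np.cov(samples, rowvar=False)  # unbiased estimate, as in the text
    return mu, sigma

def mahalanobis_distance(c, mu, sigma):
    """lambda_s(c) = (c - mu)^T Sigma^{-1} (c - mu): small values mean
    the color c is close to the skin tone cluster."""
    diff = np.asarray(c, float) - mu
    return float(diff @ np.linalg.solve(sigma, diff))
```

Classification then reduces to thresholding this distance: a pixel is labeled skin when λ_s(c) falls below a chosen threshold.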
A more sophisticated model, capable of describing complex shaped distributions is the Gaussian mixture model. It is the generalization of the single Gaussian, the pdf in this case is:
P(c|skin) = Σ_{i=1}^{k} π_i · P_i(c|skin),
where k is the number of mixture components, π_i are the mixing parameters, obeying the normalization constraint Σ_{i=1}^{k} π_i = 1, and P_i(c|skin) are Gaussian pdfs, each with its own mean and covariance matrix. Model training is performed with the well-known iterative Expectation Maximization (EM) algorithm, which assumes the number of components k to be known beforehand. Details of training a Gaussian mixture model with EM can be found, for example, in [18]. Classification with a Gaussian mixture model is done by comparing the P(c|skin) value to some threshold. The choice of the component number k is important here: the model needs to explain the training data reasonably well on one hand, and avoid over-fitting on the other. The number of components used by different researchers varies significantly: from 2 in [18] to 16 in [19]. A bootstrap test justifying the k = 2 hypothesis was performed in [20]. In [17], k = 8 was chosen as a "good compromise between the accuracy of estimation of the true distributions and the computational load for thresholding".
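Given already-trained mixture parameters (which EM would produce), evaluating the mixture pdf and the threshold classification is straightforward; this sketch assumes the parameters are supplied and does not implement EM itself.

```python
import numpy as np

def gaussian_pdf(c, mu, sigma):
    """Single multivariate Gaussian density P_i(c|skin)."""
    c, mu = np.asarray(c, float), np.asarray(mu, float)
    d = c.size
    diff = c - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)

def mixture_pdf(c, weights, mus, sigmas):
    """P(c|skin) = sum_i pi_i * P_i(c|skin); weights must sum to 1."""
    return sum(w * gaussian_pdf(c, m, s)
               for w, m, s in zip(weights, mus, sigmas))

def is_skin(c, weights, mus, sigmas, threshold):
    """Classify a color as skin by thresholding the mixture density."""
    return mixture_pdf(c, weights, mus, sigmas) >= threshold
```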
and the computation of "brightness" (lightness, value), which conflicts badly with the properties of color vision.
H = arccos( (1/2)((R − G) + (R − B)) / √((R − G)^2 + (R − B)(G − B)) )
S = 1 − 3·min(R, G, B)/(R + G + B)
V = (1/3)(R + G + B)
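The RGB-to-HSV conversion formulas above can be sketched as follows, for normalized RGB values in [0, 1]; returning the hue in radians and the handling of the degenerate grey case (den = 0) are implementation assumptions.

```python
import numpy as np

def rgb_to_hsv(r, g, b):
    """Convert normalized RGB in [0, 1] to (H, S, V) using the
    arccos-based formulas; H is returned in radians."""
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = float(np.arccos(num / den)) if den > 0 else 0.0
    if b > g:                       # hue lies in the lower half-circle
        h = 2 * np.pi - h
    s = 1 - 3 * min(r, g, b) / (r + g + b) if (r + g + b) > 0 else 0.0
    v = (r + g + b) / 3
    return h, s, v
```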
An alternative way of computing Hue-Saturation, using log opponent values, introduces an additional logarithmic transformation of the RGB values, aimed at reducing the dependence of chrominance on the illumination level. The polar coordinate system of Hue-Saturation spaces results in a cyclic color space, which makes it inconvenient for parametric skin color models that need a tight cluster of skin colors for best performance. Here, a different representation of Hue-Saturation using Cartesian coordinates can be used [8]:
X = S cos H ; Y = S sin H.
YCrCb is an encoded nonlinear RGB signal, commonly used by European television studios and for image compression work. Color is represented by luma (luminance computed from nonlinear RGB [22]), constructed as a weighted sum of the RGB values, and by two color difference values Cr and Cb, formed by subtracting luma from the red and blue components of RGB [9].
Cr = R − Y
Cb = B − Y
The simplicity of the transformation and explicit separation of luminance and chrominance components makes this color space attractive for skin color modelling.
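The simple RGB-to-YCrCb transformation can be sketched as below. The luma weights used here are the common ITU-R BT.601 coefficients; this is an assumption for illustration, as [9] may use a different weighting.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert RGB to (Y, Cr, Cb): luma as a weighted sum of R, G, B
    (BT.601 weights, an assumption), then Cr = R - Y and Cb = B - Y."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of RGB
    cr = r - y                              # red color difference
    cb = b - y                              # blue color difference
    return y, cr, cb
```

For a grey pixel (R = G = B) both chrominance components vanish, which is exactly the luminance/chrominance separation that makes this space attractive for skin modelling.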
One of the major questions in using skin color for skin detection is how to choose a suitable color space. A wide variety of color spaces has been applied to the problem of skin color modelling. Following a recent survey [8], a brief review of the most popular color spaces and their properties is presented. For real-world applications and dynamic scenes, color spaces that separate the chrominance and luminance components of color are typically preferable, the main reason being that by considering the chrominance-dependent components of color, increased robustness to illumination changes can be achieved. HSV, for example, seems to be a good alternative, but the HSV family shows lower reliability when scenes are complex and contain similar colors such as wood textures [10]. Moreover, transforming a frame would require converting each pixel to the new color space, which can be avoided if the camera provides RGB images directly, as most do. Therefore, for the purpose of this thesis, the choice considered is between the RGB and YCrCb color spaces.
Face detection is a useful task in many applications such as video conferencing, human-machine interfaces, Content Based Image Retrieval (CBIR), surveillance systems, etc. It is also often used as the first step of automatic face recognition, determining the presence of faces (if any) in the input image (or video sequence). The face region, including its location and size, is the output of a face detection step. In general, the face recognition problem (in computer vision) can be formulated as follows: given still or video images of a scene, determine the presence of faces and then identify or verify one or more faces in the scene using a stored database of faces. Thus, the accuracy of a face recognition system depends on the accuracy of the face detection system. However, the variability of appearance in face patterns makes this a difficult task. A robust face detector should be able to find faces regardless of their number, color, position, occlusion, orientation, facial expression, etc. Although this is still an unsolved problem, many methods have been proposed for detecting faces. Additionally, color and motion, when available, may serve as useful cues in face detection. Even though the disadvantages of color-based methods, such as sensitivity to varying lighting conditions, make them less robust on their own, they can still easily be used as a pre-processing step in face detection.
Most robust face detection methods can be classified into two main categories: feature-based and image-based techniques. The feature-based techniques make explicit use of face knowledge: they start by deriving low-level features and then apply knowledge-based analysis. The image-based techniques rely on a 2D representation of the face; by using training schemes and learning algorithms, the data can be classified into face or non-face groups. Here, a brief summary of feature-based and image-based techniques will be presented.
3.1 Feature Based Techniques
As the title indicates, the focus of this class of methods is on extracting facial features. The foundation of the face detection task in feature-based methods is the facial feature search problem. These techniques are quite old and were most actively developed up to the mid-90s; however, some feature extraction is still utilized, e.g. extracting facial features using Gabor filters. The advantages of the feature-based methods are their relative insensitivity to illumination conditions, occlusions and viewpoint, whereas complex (computationally heavy) analysis and difficulties with low-quality images are their main drawbacks.
3.2 Image Based Techniques
Basically, these methods scan an input image at all possible locations and scales and then classify the sub-windows as either face or non-face. The techniques rely on training sets to capture the large variability in facial appearance instead of extracting visual facial features (as the previous class of techniques does). Since the performance of the whole system strongly depends on the face detection step, a robust face detector should be employed. The accuracy and speed of face detectors have been studied in previous work. In this thesis, the chosen face detector is the efficient detection scheme presented by Viola and Jones (2001), using Haar-like features and AdaBoost as the training algorithm. In the next section, a brief description of the chosen scheme is given.
3.3 Face Detection based on Haar-like Features and the AdaBoost Algorithm
This technique relies on simple Haar-like features together with a new image representation (the integral image). AdaBoost is then used to select the most prominent features from a large number of extracted features. Finally, a strong classifier is obtained by boosting a set of weak classifiers. This approach has proven to be an effective algorithm for visual object detection, and it yielded one of the first real-time frontal-view face detectors. The effectiveness of this approach is based on four particular facts [12].
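The integral image that underpins this technique can be sketched as follows: once the summed-area table is built, the sum over any rectangle costs only four array references, which is what makes Haar-like features so cheap to evaluate. The specific two-rectangle feature shown is one illustrative example, not a feature selected by AdaBoost.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: cumulative sums along both axes."""
    return np.cumsum(np.cumsum(np.asarray(img, float), axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle with top-left corner (x, y),
    from four references into the (zero-padded) integral image."""
    ii = np.pad(ii, ((1, 0), (1, 0)))  # zero row/column for the image border
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_haar_feature(ii, x, y, w, h):
    """Example two-rectangle Haar-like feature:
    left half of the window minus its right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Padding inside `rect_sum` is a simplification for clarity; a real detector would pad once and reuse the table across all windows and scales.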