Ajay Bhojak
Mr. Rishi Raj Vyas, Assistant Professor
Engineering College Bikaner, Computer Science and Engineering

This is to certify that the Project Report titled "Face Emotion Recognition" has been submitted by "Ajay Bhojak" in partial fulfilment of the requirements for the degree of Bachelor of Technology, Final Year, for the academic session 2020-. This project work was carried out under the supervision and guidance of Mr. Rishi Raj Vyas (Assistant Professor), and the candidate has undergone the requisite work as prescribed by Rajasthan Technical University, Kota.

Mr. Ajay Bhojak
Roll No: 17EEBCS
Branch: CSE
E-mail: ajaybhojak2000@gmail.com
ECB, Bikaner
Date: 20 Jun 2021
Place: Bikaner
Chapter No.  Content
Abstract
Project Summary
List of Figures
List of Tables
List of Abbreviations
References
Appendix
[A] CNN Classifier Model & Results
[B] Code Link
[C] How to Run
[D] List of Files
Project Title: Face Emotion Recognition
Project Team Members (Name with Register No): Ajay Bhojak, 17EEBCS
Guide Name/Designation: Mr. Rishi Raj Vyas, Assistant Professor, Department of Computer Science and Engineering
Program Concentration Area: Deep Learning
Technical Requirements: Python, CV2, Jupyter Notebook or Google Colab, PyCharm or any other IDE

Engineering standards and realistic constraints:
Economic: This project is developed using open-source software, free of cost.
Sustainability: The project ensures sustainability, as it does not need much change after deployment.
Social: This project is useful for a general audience of any age.
Ethical: This project is designed keeping in mind the needs of people of all age groups.
Fig No  Title of Figures
Figure 1.1  A Model of CNN
Figure 1.2  Max Pooling
Figure 3.1  Training Phase
Figure 3.2  Testing Phase
Figure 3.3  Training, Testing and Validation Data Distribution
Figure 3.4  Architecture of CNN
Figure 3.5  System Flowchart of FER
Figure 3.6  Sequence Diagram
Figure 3.7  Face Detection
Figure 3.8  Precision
Figure 3.9  Recall
Figure 3.10  F-Score
Figure 4.1  Training Loss Graph
Figure 4.2  Training Accuracy Graph
Table No  Title of Table
CNN  Convolutional Neural Network
FACS  Facial Action Coding System
FER  Facial Expression Recognition
ReLU  Rectified Linear Unit
SIANN  Space Invariant Artificial Neural Network
LDA  Linear Discriminant Analysis
PCA  Principal Component Analysis
A facial expression is the visible manifestation of the affective state, cognitive activity, intention, personality and psychopathology of a person, and plays a communicative role in interpersonal relations. Human facial expressions can be classified into 7 basic emotions: happy, sad, surprise, fear, anger, disgust, and neutral. Our facial emotions are expressed through the activation of specific sets of facial muscles. These sometimes subtle, yet complex, signals in an expression often contain an abundant amount of information about our state of mind.

Automatic recognition of facial expressions can be an important component of natural human-machine interfaces; it may also be used in behavioral science and in clinical practice. The problem has been studied for a long time, with steady progress in recent decades. Though much progress has been made, recognizing facial expressions with high accuracy remains difficult due to the complexity and variety of facial expressions [1]. On a day-to-day basis, humans commonly recognize emotions by characteristic features displayed as part of a facial expression. For instance, happiness is undeniably associated with a smile, an upward movement of the corners of the lips. Similarly, other emotions are characterized by other deformations typical of a particular expression. Research into automatic recognition of facial expressions addresses the problems surrounding the representation and categorization of static or dynamic characteristics of these deformations and of face pigmentation [2].

In machine learning, a convolutional neural network (CNN, or ConvNet) is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. The receptive fields of different neurons partially overlap such that they tile the visual field. The response of an individual neuron to stimuli within its receptive field can be approximated mathematically by a convolution operation.
feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network. There will be an entire set of filters in each convolution layer (e.g. 20 filters), and each of them will produce a separate 2-dimensional activation map. The 2-dimensional convolution between image A and filter B is given as:

C(i, j) = Σ_{m=0}^{Ma−1} Σ_{n=0}^{Na−1} A(m, n) · B(i − m, j − n)    (2.1)

where the size of A is (Ma × Na), the size of B is (Mb × Nb), and 0 ≤ i < Ma + Mb − 1, 0 ≤ j < Na + Nb − 1.

A filter convolves with the input image to produce a feature map, and the convolution of another filter over the same image gives a different feature map. The convolution operation captures the local dependencies in the original image. A CNN learns the values of these filters on its own during the training process (although parameters such as the number of filters, filter size, and the architecture of the network still need to be specified before training). The more filters there are, the more image features get extracted and the better the network becomes at recognizing patterns in unseen images.

The size of the feature map (convolved feature) is controlled by three parameters:

Depth: the number of filters used for the convolution operation.
Stride: the number of pixels by which the filter slides over the input matrix at each step; a stride of 1 moves the filter one pixel at a time.
Zero-padding: sometimes it is convenient to pad the input matrix with zeros around the border, so that the filter can be applied to the bordering elements of the input image matrix. Zero-padding lets us control the size of the feature map.

Rectified Linear Unit: an additional operation called ReLU is applied after every convolution operation. A Rectified Linear Unit (ReLU) is a cell of a neural network which uses the following activation function to calculate its output given x:

R(x) = max(0, x)    (2.2)

Using these cells is more efficient than sigmoid units and still forwards more information than binary units. When the weights are initialized uniformly, half of them are negative; this helps create a sparse feature representation. Another positive aspect is the relatively cheap computation: no exponential function has to be calculated. This function also helps against the vanishing gradient problem, since the gradients are either linear or zero, and in no case saturating nonlinear functions.
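As a minimal sketch of equations (2.1) and (2.2) in Python with NumPy (the image and filter values below are made up for illustration and are not from the report):

```python
import numpy as np

def conv2d_full(A, B):
    """Full 2-D convolution of image A (Ma x Na) with filter B (Mb x Nb),
    following equation (2.1). Output size is (Ma+Mb-1) x (Na+Nb-1)."""
    Ma, Na = A.shape
    Mb, Nb = B.shape
    C = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            total = 0.0
            for m in range(Ma):
                for n in range(Na):
                    # B is indexed at (i-m, j-n); skip indices outside the filter
                    if 0 <= i - m < Mb and 0 <= j - n < Nb:
                        total += A[m, n] * B[i - m, j - n]
            C[i, j] = total
    return C

def relu(x):
    """Rectified Linear Unit, equation (2.2): R(x) = max(0, x)."""
    return np.maximum(0, x)

# Example: a 5x5 image convolved with a 3x3 vertical-edge filter, then ReLU
A = np.arange(25, dtype=float).reshape(5, 5)
B = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)
feature_map = relu(conv2d_full(A, B))
print(feature_map.shape)  # (7, 7), i.e. (Ma+Mb-1) x (Na+Nb-1)
```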
Pooling (sub-sampling): Spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map while retaining the most important information. Spatial pooling can be of different types: max, average, sum, etc. In the case of max pooling, a spatial neighborhood (for example, a 2×2 window) is defined and the largest element is taken from the rectified feature map within that window. In the case of average pooling, the average (or sum) of all elements in that window is taken. In practice, max pooling has been shown to work better. Max pooling reduces the input by applying the maximum function over the input x. Let m be the size of the filter; then the output is calculated as follows:

M(x_{i,j}) = max{ x_{i+k, j+l} : |k| ≤ m/2, |l| ≤ m/2, k, l ∈ ℕ }

Fig 1.2: Max Pooling

The function of pooling is to progressively reduce the spatial size of the input representation. In particular, pooling:

- makes the input representations (feature dimension) smaller and more manageable;
- reduces the number of parameters and computations in the network, thereby controlling over-fitting;
- makes the network invariant to small transformations, distortions and translations in the input image (a small distortion in the input will not change the output of pooling).
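A small illustration of non-overlapping 2×2 max pooling in NumPy (the window size and input values are chosen for demonstration only):

```python
import numpy as np

def max_pool(x, m=2):
    """Non-overlapping m x m max pooling: each output cell is the maximum
    of an m x m window of the rectified feature map."""
    h, w = x.shape
    h, w = h - h % m, w - w % m          # trim so dimensions divide evenly
    x = x[:h, :w]
    # reshape into (h/m, m, w/m, m) blocks and take the max over each block
    return x.reshape(h // m, m, w // m, m).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [3, 1, 2, 8],
              [0, 2, 4, 7]])
print(max_pool(x))
# [[6 5]
#  [3 8]]
```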
creation of an accurate and reliable facial expression recognition system. As a result, I am highly motivated to develop a system that recognizes facial expressions and tracks a person's activity.
Human emotions and intentions are expressed through facial expressions, and deriving an efficient and effective feature set is the fundamental component of a facial expression recognition system. Face recognition is important for the interpretation of facial expressions in applications such as intelligent man-machine interfaces and communication, intelligent visual surveillance, teleconferencing, and real-time animation from live motion images. Facial expressions are useful for efficient interaction. Most research and systems in facial expression recognition are limited to seven basic expressions (happy, sad, anger, disgust, neutral, fear, surprise). It has been found that this is insufficient to describe all facial expressions, and these expressions are categorized based on facial actions [7]. Detecting a face and recognizing its facial expression is a very complicated task, as it is vital to pay attention to primary components such as face configuration, orientation, and the location where the face is set.
The scope of this system is to tackle problems that can arise in day-to-day life. Some of the scopes are: 1. The system can be used to detect and track a user's state of mind.
to estimate general parameters for movement and displacement. Therefore, reaching robust decisions for facial actions under these varying conditions becomes difficult. Rather than tracking spatial points and using positioning and movement parameters that vary over time, appearance-based parameterizations process the color (pixel) information of the relevant regions of the face in order to obtain the parameters that form the feature vectors. Different features, such as Gabor and Haar wavelet coefficients, together with feature extraction and selection methods such as PCA, LDA, and AdaBoost, are used within this framework. For the classification problem, algorithms such as neural networks, Support Vector Machines (SVM), deep learning, and Naive Bayes are used. Histograms formed from any of these facial feature representations can be classified with a Support Vector Machine (SVM) for expression recognition. An SVM builds a hyperplane to separate the high-dimensional space; an ideal separation is achieved when the distance between the hyperplane and the training data of any class is largest.
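A minimal sketch of this SVM classification step, assuming scikit-learn (which is not listed in the report's technical requirements) and random placeholder data standing in for pre-computed feature histograms:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: 200 feature histograms (e.g. Gabor/Haar-based), 7 classes.
# In a real pipeline these would come from the feature-extraction stage.
rng = np.random.default_rng(0)
X = rng.random((200, 64))            # 64-bin histograms (illustrative size)
y = rng.integers(0, 7, size=200)     # 7 expression labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A linear-kernel SVM builds the maximum-margin hyperplane described above
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```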
The facial emotion recognition system is implemented using a convolutional neural network. Facial images are classified into seven facial expression categories, namely Anger, Disgust, Fear, Happy, Sad, Surprise and Neutral. A Kaggle dataset is used to train and test the classifier. The block diagram of the system is shown in the following figures.

Fig 3.1: Training Phase
Fig 3.2: Testing Phase

During training, the system receives training data comprising grayscale images of faces with their respective expression labels and learns a set of weights for the network. The training step takes as input an image with a face. Thereafter, an intensity normalization is applied to the image. The normalized images are used to train the convolutional network. To ensure that the training performance is not affected by the order of presentation of the examples, a validation dataset is used to choose the final best set of weights out of a set of trainings performed with samples presented in different orders. The output of the training step is the set of weights that achieves the best result on the training data.

During testing, the system receives a grayscale image of a face from the test dataset and outputs the predicted expression using the final network weights learned during training. The output is a single number that represents one of the seven basic expressions.
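A minimal sketch of such a training pipeline, assuming Keras/TensorFlow and 48×48 grayscale inputs as in the Kaggle facial expression dataset; the layer sizes here are illustrative and are not the report's exact architecture (Fig 3.4), and the random arrays stand in for the real images and labels:

```python
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # anger, disgust, fear, happy, sad, surprise, neutral

def build_model(input_shape=(48, 48, 1)):
    """Small CNN: Conv+ReLU blocks with max pooling, then a 7-way softmax."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Intensity normalization: grayscale pixels scaled to [0, 1] before training.
# These placeholder arrays stand in for the Kaggle face images and labels.
x_train = np.random.rand(32, 48, 48, 1).astype("float32")
y_train = np.random.randint(0, NUM_CLASSES, size=32)

model = build_model()
# validation_split holds out part of the data to pick the best weights
model.fit(x_train, y_train, validation_split=0.2, epochs=1)
```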