A
Project Report
On
Face Emotion Recognition

Submitted in partial fulfillment for the award of the Degree of
Bachelor of Technology
in Department of Computer Science Engineering

Submitted By: Ajay Bhojak
Submitted To: Mr. Rishi Raj Vyas, Assistant Professor

Engineering College Bikaner
(Rajasthan Technical University, Kota)
Session 2020-21

Engineering College Bikaner
Computer Science and Engineering

CERTIFICATE

This is to certify that the Project Report titled “Face Emotion Recognition” has been submitted by “Ajay Bhojak” in partial fulfilment of the requirements of the degree of Bachelor of Technology, Final Year, for the academic session 2020-21. This project work was carried out under the supervision and guidance of Mr. Rishi Raj Vyas (Assistant Professor), and the candidate has undergone the requisite work as prescribed by Rajasthan Technical University, Kota.

Mr. Ajay Bhojak
Roll No: 17EEBCS
Branch: CSE
E-mail: ajaybhojak2000@gmail.com
ECB, Bikaner
Date: 20 Jun 2021
Place: Bikaner

TABLE OF CONTENTS

Abstract  i
Project Summary  ii
List of Figures  iii
List of Tables  iv
List of Abbreviations  v

1. Introduction
   1.1 Motivation
   1.2 Problem Statement
   1.3 Objective
   1.4 Scope & Applications
2. Literature Review
3. Methodology
   3.1 Dataset
   3.2 Architecture of CNN
       3.2.1 Input Layer
       3.2.2 Convolution & Pooling (Conv Pool)
       3.2.3 Fully Connected Layer
       3.2.4 Output Layer
   3.3 System Flowchart of FER
   3.4 Emoji Used
   3.5 Sequence Diagram
   3.6 Phases in FER
       3.6.1 Face Detection
       3.6.2 Feature Extraction
       3.6.3 System Evaluation
   3.7 Software & Hardware Requirements
4. Result and Analysis
5. Conclusion and Future Scope
   5.1 Conclusion
   5.2 Future Scope

References
Appendix
   [A] CNN Classifier Model & Results
   [B] Code Link
   [C] How to Run
   [D] List of Files

PROJECT SUMMARY

Project Title: Face Emotion Recognition
Project Team Members (Name with Register No): Ajay Bhojak, 17EEBCS
Guide Name/Designation: Mr. Rishi Raj Vyas, Assistant Professor, Department of Computer Science and Engineering
Program Concentration Area: Deep Learning
Technical Requirements: Python, CV2, Jupyter Notebook or Google Colab, PyCharm or any other IDE

Engineering standards and realistic constraints:
- Economic: the project is developed using open-source software, free of cost.
- Sustainability: the project ensures sustainability, as it does not need much change after deployment.
- Social: the project is useful for a general audience of any age.
- Ethical: the project is designed keeping in mind the needs of people of all age groups.

LIST OF FIGURES


Fig No       Title of Figure                                     Page No
Figure 1.1   A Model of CNN                                      2
Figure 1.2   Max Pooling                                         4
Figure 3.1   Training Phase                                      10
Figure 3.2   Testing Phase                                       10
Figure 3.3   Training, Testing and Validation Data Distribution  11
Figure 3.4   Architecture of CNN                                 12
Figure 3.5   System Flowchart of FER                             16
Figure 3.6   Sequence Diagram                                    18
Figure 3.7   Face Detection                                      19
Figure 3.8   Precision                                           20
Figure 3.9   Recall                                              20
Figure 3.10  F-Score                                             20
Figure 4.1   Training Loss Graph                                 22
Figure 4.2   Training Accuracy Graph                             23

LIST OF TABLES

Table No     Title of Table                                      Page No

LIST OF ABBREVIATIONS

CNN    Convolutional Neural Network
FACS   Facial Action Coding System
FER    Facial Expression Recognition
ReLU   Rectified Linear Unit
SIANN  Space Invariant Artificial Neural Network
LDA    Linear Discriminant Analysis
PCA    Principal Component Analysis


CHAPTER 1

Introduction

A facial expression is the visible manifestation of the affective state, cognitive activity, intention, personality and psychopathology of a person, and it plays a communicative role in interpersonal relations. Human facial expressions can be classified into seven basic emotions: happy, sad, surprise, fear, anger, disgust, and neutral. Our facial emotions are expressed through the activation of specific sets of facial muscles. These sometimes subtle, yet complex, signals in an expression often contain an abundant amount of information about our state of mind. Automatic recognition of facial expressions can be an important component of natural human-machine interfaces; it may also be used in behavioral science and in clinical practice. The problem has been studied for a long time, with notable progress in recent decades. Although much progress has been made, recognizing facial expressions with high accuracy remains difficult due to the complexity and variety of facial expressions [1].

On a day-to-day basis, humans commonly recognize emotions by characteristic features displayed as part of a facial expression. For instance, happiness is undeniably associated with a smile or an upward movement of the corners of the lips. Similarly, other emotions are characterized by other deformations typical of a particular expression. Research into automatic recognition of facial expressions addresses the problems surrounding the representation and categorization of static or dynamic characteristics of these deformations and of face pigmentation [2].

In machine learning, a convolutional neural network (CNN, or ConvNet) is a type of feedforward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. The receptive fields of different neurons partially overlap such that they tile the visual field. The response of an individual neuron to stimuli within its receptive field can be approximated mathematically by a convolution operation.

CHAPTER 3

A filter may detect a feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network. There is an entire set of filters in each convolution layer (e.g. 20 filters), and each of them produces a separate 2-dimensional activation map. The 2-dimensional convolution between image A and filter B is given by

$$C(i, j) = \sum_{m=0}^{M_a - 1} \sum_{n=0}^{N_a - 1} A(m, n)\, B(i - m,\; j - n) \qquad (2.1)$$

where A has size $M_a \times N_a$, B has size $M_b \times N_b$, and $0 \le i < M_a + M_b - 1$, $0 \le j < N_a + N_b - 1$.

A filter convolves with the input image to produce a feature map, and the convolution of another filter over the same image gives a different feature map. The convolution operation captures the local dependencies in the original image. A CNN learns the values of these filters on its own during the training process (although parameters such as the number of filters, the filter size, and the architecture of the network still need to be specified before training). The more filters we use, the more image features get extracted and the better the network becomes at recognizing patterns in unseen images.

The size of the feature map (convolved feature) is controlled by three parameters:

- Depth: the number of filters used for the convolution operation.
- Stride: the number of pixels by which the filter slides over the input matrix at each step; for example, a stride of 2 moves the filter two pixels at a time and produces a smaller feature map.
- Zero-padding: sometimes it is convenient to pad the input matrix with zeros around the border so that the filter can be applied to the bordering elements of the input image matrix. Zero padding lets us control the size of the feature map.

Rectified Linear Unit: an additional operation called ReLU is used after every convolution operation. A Rectified Linear Unit (ReLU) is a cell of a neural network which uses the following activation function to calculate its output given x:

$$R(x) = \max(0, x) \qquad (2.2)$$

Using these cells is more efficient than sigmoid units while still forwarding more information than binary units. When the weights are initialized uniformly, half of them are negative, which helps create a sparse feature representation. Another positive aspect is the relatively cheap computation: no exponential function has to be calculated. This function also prevents the vanishing gradient problem, since its gradients are either linear functions or zero, but in no case saturating nonlinear functions.
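As an illustration, the following minimal NumPy sketch implements the full 2-D convolution of Eq. (2.1) and the ReLU of Eq. (2.2) directly. The function names and toy arrays are illustrative only; in practice a deep learning framework performs this far more efficiently.

```python
import numpy as np

def conv2d_full(A, B):
    """Full 2-D convolution of image A (Ma x Na) with filter B (Mb x Nb),
    as in Eq. (2.1); the output has size (Ma+Mb-1) x (Na+Nb-1)."""
    Ma, Na = A.shape
    Mb, Nb = B.shape
    C = np.zeros((Ma + Mb - 1, Na + Nb - 1))
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            # Accumulate A(m, n) * B(i - m, j - n) over indices where B is defined
            for m in range(Ma):
                for n in range(Na):
                    if 0 <= i - m < Mb and 0 <= j - n < Nb:
                        C[i, j] += A[m, n] * B[i - m, j - n]
    return C

def relu(x):
    """Rectified Linear Unit, Eq. (2.2): R(x) = max(0, x), element-wise."""
    return np.maximum(0, x)

A = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
B = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2x2 filter
print(relu(conv2d_full(A, B)))                # rectified feature map
```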


Pooling (sub-sampling): spatial pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map but retains the most important information. Spatial pooling can be of different types: max, average, sum, etc. In the case of max pooling, a spatial neighborhood (for example, a 2x2 window) is defined and the largest element is taken from the rectified feature map within that window. In the case of average pooling, the average (or sum) of all elements in that window is taken. In practice, max pooling has been shown to work better. Max pooling reduces the input by applying the maximum function over the input $x_i$. Let m be the size of the filter; then the output is calculated as

$$M(x_i) = \max\{\, x_{i+k,\, i+l} \;:\; |k| \le m/2,\ |l| \le m/2,\ k, l \in \mathbb{N} \,\}$$

(Fig 1.2: Max Pooling)

The function of pooling is to progressively reduce the spatial size of the input representation. In particular, pooling

- makes the input representations (feature dimension) smaller and more manageable;
- reduces the number of parameters and computations in the network, therefore controlling over-fitting;
- makes the network invariant to small transformations, distortions and translations in the input image (a small distortion in the input will not change the output of pooling).
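A minimal sketch of non-overlapping max pooling with an m x m window (stride m), matching the formula above; the helper name and the toy input are illustrative.

```python
import numpy as np

def max_pool(x, m=2):
    """Non-overlapping m x m max pooling (stride m); assumes the
    input dimensions are divisible by m."""
    h, w = x.shape
    out = np.zeros((h // m, w // m))
    for i in range(0, h, m):
        for j in range(0, w, m):
            # Keep only the largest activation within each m x m window
            out[i // m, j // m] = x[i:i + m, j:j + m].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [0, 8, 3, 4]], dtype=float)
print(max_pool(x))  # [[6. 4.]
                    #  [8. 9.]]
```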


… creation and ways of creating a system for accurate and reliable facial expression recognition. As a result, I am highly motivated to develop a system that recognizes facial expressions and tracks a person's activity.

1.2 Problem Statement

Human emotions and intentions are expressed through facial expressions, and deriving an efficient and effective feature is the fundamental component of a facial expression system. Face recognition is important for the interpretation of facial expressions in applications such as intelligent man-machine interfaces and communication, intelligent visual surveillance, teleconferencing, and real-time animation from live motion images. Facial expressions are useful for efficient interaction. Most research and systems in facial expression recognition are limited to seven basic expressions (happy, sad, anger, disgust, neutral, fear, surprise). It has been found that these are insufficient to describe all facial expressions, so expressions are instead categorized based on facial actions [7]. Detecting a face and recognizing its expression is a very complicated task, as it is vital to pay attention to primary components such as the face configuration, the orientation, and the location where the face is set.

1.3 Objectives

1. To develop a facial expression recognition system.
2. To experiment with machine learning algorithms in computer vision.
3. To detect emotions, thus facilitating intelligent human-computer interaction.

1.4 Scope and Applications

The scope of this system is to tackle problems that can arise in day-to-day life. Some of the applications are:

1. The system can be used to detect and track a user's state of mind.
2. The system can be used in mini-marts and shopping centers to view customer feedback and thereby enhance the business.
3. The system can be installed at busy places like airports, railway stations or bus stations to detect the faces and facial expressions of each person. If any face appears suspicious, for example angry or fearful, the system might raise an internal alarm.
4. The system can also be used for educational purposes, such as getting feedback on how students react during a class.
5. The system can be used for lie detection amongst criminal suspects during interrogation.
6. The system can help people in emotion-related research to improve the processing of emotion data.
7. Clever marketing is feasible using a person's emotional state, which can be identified by this system.

CHAPTER 2

Literature Review

… to estimate general parameters for movement and displacement. Ending up with robust decisions for facial actions under these varying conditions therefore becomes difficult. Rather than tracking spatial points and using positioning and movement parameters that vary over time, appearance-based parameterizations process the color (pixel) information of the relevant regions of the face in order to obtain the parameters that form the feature vectors. Different features, such as Gabor and Haar wavelet coefficients, are used within this framework, together with feature extraction and selection methods such as PCA, LDA, and AdaBoost. For the classification problem, algorithms such as neural networks, support vector machines, deep learning, and Naive Bayes are used. A histogram formed from any of these facial feature representations can be fed to a Support Vector Machine (SVM) for expression recognition. An SVM builds a hyperplane to separate the high-dimensional space; an ideal separation is achieved when the distance between the hyperplane and the training data of any class is largest.
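As a hedged illustration of this classification step, the sketch below trains a multi-class SVM with scikit-learn. The data here are purely synthetic placeholders; in the setting described above, each row of X would instead be a histogram of Gabor or Haar responses, and the seven labels would be the expression classes.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-ins: each row would be a histogram-style feature vector
# (e.g. Gabor or Haar responses); labels are the 7 expression classes.
rng = np.random.default_rng(0)
X = rng.random((700, 128))
y = rng.integers(0, 7, size=700)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# The SVM finds a maximum-margin separating hyperplane in the
# (kernel-induced) feature space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```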


CHAPTER 3

Methodology

The facial emotion recognition system is implemented using a convolutional neural network. Facial images are classified into seven facial expression categories, namely Anger, Disgust, Fear, Happy, Sad, Surprise and Neutral. A Kaggle dataset is used to train and test the classifier. The block diagram of the system is shown in the following figures.

(Fig 3.1: Training Phase)
(Fig 3.2: Testing Phase)

During training, the system receives training data comprising grayscale images of faces with their respective expression labels, and learns a set of weights for the network. The training step takes as input an image containing a face; an intensity normalization is then applied to the image. The normalized images are used to train the convolutional network. To ensure that training performance is not affected by the order in which examples are presented, a validation dataset is used to choose the final best set of weights out of a set of trainings performed with samples presented in different orders. The output of the training step is the set of weights that achieves the best result on the training data.

During testing, the system receives a grayscale face image from the test dataset and outputs the predicted expression using the final network weights learned during training. Its output is a single number that represents one of the seven basic expressions.
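As a minimal sketch of this pipeline (not the report's exact architecture, which is detailed in Section 3.2), the Keras model below assumes 48x48 grayscale inputs, as in the commonly used FER-2013 Kaggle set, applies the intensity normalization described above, and trains on random placeholder data; x_train and y_train are stand-ins for the real dataset.

```python
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 7          # anger, disgust, fear, happy, sad, surprise, neutral
IMG_SHAPE = (48, 48, 1)  # assumed: 48x48 grayscale, as in FER-2013

model = models.Sequential([
    layers.Input(shape=IMG_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for the grayscale faces and their labels;
# dividing by 255 performs the intensity normalization described above.
x_train = np.random.randint(0, 256, size=(64, 48, 48, 1)).astype("float32") / 255.0
y_train = np.random.randint(0, NUM_CLASSES, size=(64,))
model.fit(x_train, y_train, validation_split=0.1, epochs=1, batch_size=16)
```

At test time, the trained weights are reused unchanged: a normalized grayscale face is passed through `model.predict`, and the index of the largest softmax output gives the predicted expression class.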