Abstract—Face detection has been an active research field in computer vision for more than two decades. Its applications are many, ranging from biometric security to automatic face detection, since it has high practical value in detecting and recognizing images that contain humans. Previous research shows that traditional face detection methods face obstacles in uncontrolled environments, which reduce detection accuracy. Modern approaches, on the other hand, use deep learning. In this paper, we discuss the practical advantages of the modern neural network approach by running it on the FDDB face detection dataset, and we present experimental and qualitative results indicating that a Convolutional Neural Network (CNN), as the proposed method, performs better.
Keywords—Face detection; Traditional method for face detection; Deep learning; CNN
With the progressive development of image processing technology, computer vision has the primary purpose of making useful decisions about real physical objects and scenes based on images obtained from a sensor. Recognizing human faces is one of the most basic activities that humans perform with ease on a daily basis. Before a face can be recognized, it must first be detected. Face detection has been an active research area in computer vision for more than two decades, mainly because of the countless applications that require face detection as a first step [9], and it is one of the early stages of many detection tasks involving humans. Face detection plays an important role because many applications use this technology, particularly biometric identification, which is usually combined with other verification methods, automatic border control, crowd surveillance, and national information security, where it is used as part of a rigorous security system, as discussed in this paper. Traditional methods are still used to detect faces. However, traditional face detection, which extracts features and classifies them with classifiers such as SVMs and decision trees, has some gaps: the detection is sometimes not placed on the face area, and it is sensitive to minor changes, for example producing detection errors when the image contains many faces. Ultimately, the quality of the acquired information affects the result and therefore the accuracy of identification. In this paper, we present experimental results and describe qualitative results for both the traditional and the modern method of face detection.
II. RELATED WORK
In face detection technology, many neural networks have been proposed and used in commercial products such as digital cameras and smartphones. On the other hand, traditional face detection methods are still used in some technologies because of their speed and simplicity. The original Viola-Jones detector uses a Haar cascade and is fast to evaluate, but it has shortcomings in detecting faces from different angles. Some methods, such as the parallel cascade [5] and the pyramid cascade [6], address this issue by using one classifier cascade for each specific facial view, while in [4] a decision tree is used for pose estimation and the corresponding cascade is then used to verify the detection. Common face detection methods are divided into feature-based detection and statistical model-based detection, and statistical model-based detection is the mainstream of the research. Among these methods, the AdaBoost algorithm [1] is the fastest. The work in [2] uses an AdaBoost-based algorithm that selects critical features from a large set and combines increasingly complex classifiers in a cascade.
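For reference, the boosted classifier used in Viola-Jones-style detectors can be written as a weighted vote of weak classifiers. The symbols below are introduced here for illustration and do not appear in the original text: h_t is the t-th weak classifier (typically a threshold on one Haar feature), ε_t is its weighted training error, and T is the number of boosting rounds.

```latex
H(x) =
\begin{cases}
1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \tfrac{1}{2} \sum_{t=1}^{T} \alpha_t, \\
0 & \text{otherwise,}
\end{cases}
\qquad
\alpha_t = \log \frac{1 - \varepsilon_t}{\varepsilon_t}
```

Weak classifiers with a lower error ε_t receive a larger weight α_t, so the most reliable features dominate the vote.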
As face detection can mainly be formulated as a pattern recognition problem, numerous algorithms have been proposed to learn generic templates (e.g., eigenfaces and statistical distributions) or discriminant classifiers (e.g., neural networks, Fisher linear discriminant, sparse network of Winnows, decision trees, Bayes classifiers, support vector machines, and AdaBoost). Typically, a good face detection system needs to be trained over several iterations. One common way to further improve the system is to bootstrap a trained face detector with test sets and retrain it with the false positives as well as the false negatives [3].
The rest of this paper is organized as follows. First, we introduce the problem. Second, we expand on the paradigm that uses a CNN to address it. Finally, we run an experiment to compare the qualitative results of the traditional and the modern approach.
Face detection has some shortcomings; in particular, images that contain many faces are considered a challenging problem in the field of computer vision (CV), which is why deep learning has been adopted. With the help of modern computer vision algorithms and deep learning, these problems have declined significantly.
Classical computer vision (CV) algorithms follow a relatively limited and relatively linear pipeline with simpler classifiers. They mostly use simple and fast classifiers first to reject the most common negative samples, and then use progressively more complex classifiers to deal with the more difficult and unusual negative samples. Such a pipeline is designed for detecting similar content.
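The early-rejection idea behind such a pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the detector used in this paper: the three stage functions and their thresholds are hypothetical stand-ins for trained classifiers, and a window is simplified to a flat list of pixel intensities.

```python
from typing import Callable, List, Sequence

# A stage maps a window to True ("might be a face, keep going")
# or False ("reject now").
Stage = Callable[[Sequence[float]], bool]


def run_cascade(window: Sequence[float], stages: List[Stage]) -> bool:
    """Return True only if every stage accepts the window."""
    for stage in stages:
        if not stage(window):
            return False  # early rejection: later, costlier stages never run
    return True


# Hypothetical stages, ordered from cheapest to most selective.
def bright_enough(w: Sequence[float]) -> bool:
    return sum(w) / len(w) > 50.0


def has_contrast(w: Sequence[float]) -> bool:
    return max(w) - min(w) > 40.0


def upper_darker_than_lower(w: Sequence[float]) -> bool:
    half = len(w) // 2
    return sum(w[:half]) < sum(w[half:])


stages = [bright_enough, has_contrast, upper_darker_than_lower]

flat_dark = [10.0] * 8  # rejected by the first stage
face_like = [60.0, 70.0, 65.0, 60.0, 120.0, 130.0, 125.0, 120.0]

print(run_cascade(flat_dark, stages))  # False
print(run_cascade(face_like, stages))  # True
```

Because most windows in an image are background, most of them leave the loop at the first or second stage, which is where the speed of the classical approach comes from.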
Modern computer vision (CV) algorithms are also used for face detection and are more flexible. A modern algorithm is suited not only to face detection but to many other tasks as well. A neural network is used for this process; it makes it possible to analyze and handle all kinds of data in a fast and reliable way, particularly raw images and other complex signals, which has attracted more people to using it.
A. Traditional method for face detection
A traditional mathematical calculation is needed to represent an object for the classifiers, which scan all the pixels in a rectangle. The Haar feature classifier uses the rectangle integral to calculate the value of a feature. A rectangle feature is the difference between the sum of the black pixels and the sum of the white pixels. The Haar feature classifier multiplies the weight of each rectangle by its area, and the results are added together. In this technique, our data is based on OpenCV. For the base design of face detection, we use the seminal work of Viola and Jones. The Viola-Jones detector adopts an AdaBoost classifier with a cascade structure to achieve real-time face detection [7].
Fig. 1 The integral of the point (x, y)
Fig. 2 Rectangle calculation of Haar feature classifier.
As shown in Fig. 1, A(x, y) represents the integral of the point (x, y), and S(x, y) represents the sum of all original image pixels in the y direction at the point (x, y). The pixel sum over a rectangle R is calculated from its corners as L4 - L3 - L2 + L1, as shown in Fig. 1. To calculate the integral image for the Haar feature classifiers, the four corner points of each rectangle of the Haar feature classifier are evaluated by the method shown in Fig. 2. The integral image value of each rectangle is multiplied by its weight. The integral image at location (x, y), as shown in Fig. 1, contains the sum of the pixel values above and to the left of (x, y), inclusive:
$$A(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
The rectangle features depend only on the integral image. Once the integral image has been calculated, the feature values can be obtained by simply adding and subtracting a few of its entries, which speeds up feature calculation.
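The integral image and the four-corner rectangle sum described above can be sketched as follows. This is a minimal pure-Python illustration, not the OpenCV implementation; the function names, the 4 x 4 test image, and the choice of a two-rectangle feature with the left half white and the right half black are assumptions made for the example.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """A[y][x] = sum of img[y'][x'] for all x' <= x and y' <= y (inclusive)."""
    h, w = len(img), len(img[0])
    A = [[0] * w for _ in range(h)]
    col_sum = [0] * w  # S(x, y): running sum of column x down to row y
    for y in range(h):
        for x in range(w):
            col_sum[x] += img[y][x]
            A[y][x] = col_sum[x] + (A[y][x - 1] if x > 0 else 0)
    return A


def rect_sum(A: Image, x0: int, y0: int, x1: int, y1: int) -> int:
    """Pixel sum over the rectangle with inclusive corners (x0, y0) and (x1, y1)."""
    def at(x: int, y: int) -> int:
        return A[y][x] if x >= 0 and y >= 0 else 0
    # L4 - L3 - L2 + L1: four lookups, independent of the rectangle size
    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)


def two_rect_feature(A: Image, x: int, y: int, w: int, h: int) -> int:
    """Left half white, right half black; returns black sum minus white sum."""
    half = w // 2
    white = rect_sum(A, x, y, x + half - 1, y + h - 1)
    black = rect_sum(A, x + half, y, x + w - 1, y + h - 1)
    return black - white


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
A = integral_image(img)

print(A[3][3])                          # 136, the sum of all pixels
print(rect_sum(A, 1, 1, 2, 2))          # 34 = 6 + 7 + 10 + 11
print(two_rect_feature(A, 0, 0, 4, 4))  # 16 = 76 - 60
```

Each feature costs a fixed number of lookups regardless of its size, which is why a detector can afford to evaluate many features per window.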
B. Deep learning
One of the key aspects of most machine learning methods is the way data is represented, that is, which features to use. Given an image, a deep learning model is capable of learning concepts such as cars, cats, or humans by combining sets of basic features such as corners and edges. This is done through successive layers that increase the complexity of the learned concepts. In modern computer vision, machine learning is an integral part of the framework. The most vital component of deep learning is the neural network. Neural networks are composed of layers of interconnected neurons, and the number of layers is called the depth of the network.
Neural networks are well suited to problems that involve complex data or nonlinear features [8]. A neural network is a machine learning technique that mimics the nerve cells that form a fundamental part of the human brain. A neural network consists of an input layer and an output layer. Each layer consists of one or several units (neurons), each with an activation function that determines the output of the unit. Hidden layers can be added to increase the capability of the network. A neural network is trained using training data, and more training data generally gives better performance. However, the capability of a neural network is also bounded by its number of layers: the more layers, the higher the capacity of the network. More layers also bring a drawback, since more iterations of training are required. To overcome this, the deep learning technique of the Convolutional Neural Network was developed.
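The layer structure described above (input, hidden layer, output, each unit applying an activation function) can be sketched as a forward pass. This is a minimal illustration only: the weights are hypothetical hand-picked values rather than trained ones, and tanh is an assumed choice of activation function.

```python
import math
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float], activation: Callable[[float], float]) -> List[float]:
    """One layer: each unit outputs activation(w . x + b)."""
    return [
        activation(sum(w_i * x_i for w_i, x_i in zip(w, inputs)) + b)
        for w, b in zip(weights, biases)
    ]


def forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Feed x through every layer in order and return the final outputs."""
    out = list(x)
    for weights, biases in layers:
        out = dense(out, weights, biases, math.tanh)
    return out


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output unit.
hidden: Layer = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
output: Layer = ([[1.0, -1.0]], [0.0])
network = [hidden, output]

print(forward([0.0, 0.0], network))  # [0.0], since tanh(0) = 0 at every unit
print(forward([1.0, 0.5], network))
```

Adding another (weights, biases) pair to the list adds a hidden layer, which increases capacity but also the number of parameters that training has to fit.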
C. Convolutional Neural Network for face detection
Among neural networks, the ones most widely used for computer vision problems are Convolutional Neural Networks (CNNs). A CNN follows the same principles as the human visual perception system. CNNs are a special family of neural networks designed specifically to handle images or 2D spatial data [10].
Fig. 3 Convolutional Neural Network Architecture Visualization
A ConvNet is composed mainly of three different kinds of layers: the most important ‘convolutional layers’, optional and configurable ‘pooling layers’, and a ‘fully connected layer’ [10]. Fig. 3 shows an example that starts from a 224 x 224 pixel image, applies convolution and max pooling twice, applies convolution three more times, applies max pooling once more, and then ends with two fully connected layers. The end result is that the image is classified into one of 1000 categories.
Fig. 4 Example of Convolutional Neural Network
As the target output of the network has to be a binary mask, the last layer is proposed to consist of a sigmoid non-linearity, because its output lies in (0, 1) [10]:
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + \exp(-x)} \in (0, 1)$$
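As a small illustration of how a sigmoid output layer can yield a binary mask, the sketch below applies the sigmoid to a grid of raw scores and thresholds the result. The threshold of 0.5 and the example scores are assumptions made for the example, not values taken from the experiments.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for large negative x
    return e / (1.0 + e)


def to_binary_mask(scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Mark a position 1 (face) if its sigmoid output reaches the threshold."""
    return [[1 if sigmoid(s) >= threshold else 0 for s in row] for row in scores]


raw_scores = [[-3.0, 0.2],
              [4.0, -0.1]]

print(sigmoid(0.0))                # 0.5
print(to_binary_mask(raw_scores))  # [[0, 1], [1, 0]]
```

Because the sigmoid is monotonic, thresholding its output at 0.5 is equivalent to thresholding the raw score at 0.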
The stride parameter can also reduce the size of the output. Therefore, as the image is propagated further through the network it gets smaller, as shown in Fig. 4, which means that each neuron in the last layers processes a larger patch of the original image. The fully connected layer can potentially be an entire multi-layer perceptron (MLP). Fig. 4 shows each of the layers, having 6, 6, 16 feature maps each, and so on. It also shows how each neuron processes only a patch of its input. The image size keeps shrinking until it reaches the fully connected layers [11].
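The shrinking feature maps and the growing input patch seen by each neuron can be traced with simple arithmetic. The sketch below assumes a LeNet-style stack (32 x 32 input, 5 x 5 convolutions with stride 1, 2 x 2 pooling with stride 2, no padding); these hyperparameters are assumptions for illustration and are not taken from Fig. 4.

```python
from typing import List, Tuple


def out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def trace(input_size: int, layers: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Return (name, feature-map size, receptive-field size) after each layer."""
    size, rf, jump = input_size, 1, 1
    rows = []
    for name, kernel, stride in layers:
        size = out_size(size, kernel, stride)
        rf += (kernel - 1) * jump  # patch of the input seen by one neuron
        jump *= stride             # input-pixel distance between neighbours
        rows.append((name, size, rf))
    return rows


# (name, kernel size, stride): assumed LeNet-style hyperparameters
layers = [("conv1", 5, 1), ("pool1", 2, 2), ("conv2", 5, 1), ("pool2", 2, 2)]
rows = trace(32, layers)

for name, size, rf in rows:
    print(f"{name}: {size} x {size} map, each neuron sees {rf} x {rf} input pixels")
```

Under these assumptions the map shrinks from 32 to 28, 14, 10, and 5 pixels per side, while the patch of the original image seen by one neuron grows from 5 to 6, 14, and 16 pixels per side.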
The central purpose of this research is to compare the old and the modern approach to face detection, present qualitative results, and describe which method is more suitable. The details of the dataset and the results of our experiments are given below.
In this section, we focus on the results of the modern approach on the FDDB database. The FDDB database consists of 2845 images with 5171 face annotations collected from news articles and is one of the most commonly used benchmarks for face detection [12]. FDDB contains a large collection of images with wide variation in pose, lighting, background, and appearance. Some of the variation in the face images is due to factors such as motion, facial expression, and occlusion, which are characteristic of unconstrained image acquisition. FDDB was created from a data set that contains images and associated captions extracted from news articles. The annotated faces in that data collection were selected automatically based on the output of a face detector. Evaluating face detection algorithms on that existing set of annotations would therefore favor approaches whose output correlates with that detector. The richness of the images in this collection, however, motivated building an index of all of the faces present in a subset of its images.
Fig. 5 Sample data from FDDB dataset
Fig. 6 Output using the old method
Fig. 7 Comparison studies depicted
The experiments, with the CNN evaluated on the FDDB database as the representative of the modern method and the Haar cascade from the seminal work of Viola and Jones as the representative of the traditional method, indicate, as shown in Fig. 7, that the proposed method is robust to illumination, occlusion, and pose variation and achieves much better performance. In addition, the proposed face detector is able to output local facial components as well as landmarks. The traditional method, on the other hand, has some gaps: it is not able to place detections on the proper area of the face images, as shown in Fig. 6, since the original Viola-Jones detector with Haar features is fast to evaluate but fails to detect faces from different angles [12]. In this research, we therefore showed qualitative results indicating that the modern approach performs much better than the old approach. In future work, we intend to study this topic in more depth and may use quantitative results to assess the efficiency of the modern approach, which may be helpful for landmark detection initialization and pose estimation.