An example image on which my face detector ran.
This project implements a face detection pipeline. Face detection is one of the most common applications of computer vision: when you snap a picture on a smartphone, you'll often see suggested face outlines, and photo editing applications like Photoshop find faces for their users. In this project, I implement my own sliding window face detector. Overall, my detector works very well, with an average precision of 90.8%.
A sliding window face detector is straightforward to understand: a trained classifier looks at every image patch and determines whether or not there is a face in that window. For this to be effective, the image must also be resized so that faces can be detected at different scales. My implementation is modeled on the 2005 paper by Dalal and Triggs, which represents faces through the SIFT-like Histogram of Oriented Gradients (HOG) descriptor.
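The core loop can be sketched as follows. This is a minimal Python illustration, not the project's actual code: it scores each window of a 2-D feature map against a linear template with a dot product, which is exactly what a linear classifier's response reduces to.

```python
import numpy as np

def sliding_window_detect(features, template, step=1):
    """Slide a template over a 2-D feature map and score every window.

    `features` is an (H, W) feature array and `template` an (h, w) model;
    the score is a dot product, i.e. a linear classifier's response.
    Returns a list of (row, col, score) tuples.
    """
    h, w = template.shape
    H, W = features.shape
    scores = []
    for y in range(0, H - h + 1, step):
        for x in range(0, W - w + 1, step):
            patch = features[y:y + h, x:x + w]
            scores.append((y, x, float(np.sum(patch * template))))
    return scores
```

In practice the windows slide over the HOG feature map rather than raw pixels, but the structure of the loop is the same.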
The first step in training the classifier is to gather positive and negative examples. For positive examples, I used faces from the Caltech Face database cropped to 36x36 pixels, and extracted a HOG descriptor for each image. For negative examples, I used a database of non-face scenes drawn from Wu et al. and the SUN scene database: I randomly cropped 36x36 squares and extracted a HOG descriptor for each patch.
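As an illustration, the per-patch descriptor extraction could look like the sketch below, using scikit-image's `hog`. The function names, parameters, and cell layout here are my assumptions for a 36x36 patch, not the project's actual code.

```python
import numpy as np
from skimage.feature import hog

def extract_hog_36(patch, cell_size=6):
    """Compute a HOG descriptor for a 36x36 grayscale patch.

    cell_size=6 gives a 6x6 grid of cells; with 9 orientation bins and
    1x1 blocks this yields a 324-dimensional descriptor.
    """
    assert patch.shape == (36, 36)
    return hog(patch,
               orientations=9,
               pixels_per_cell=(cell_size, cell_size),
               cells_per_block=(1, 1),
               feature_vector=True)

def random_negative_patch(scene, rng, size=36):
    """Crop a random 36x36 square from a non-face scene image."""
    y = rng.integers(0, scene.shape[0] - size + 1)
    x = rng.integers(0, scene.shape[1] - size + 1)
    return scene[y:y + size, x:x + size]
```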
A representation of a face in the HOG feature space
Once I had a collection of 6,713 positive training examples and over 13,000 negative training examples (a count I increased in an attempt to decrease my false positive rate), I had to train a classifier to distinguish face from non-face examples. I trained a linear SVM (support vector machine) on these descriptors, using binary labels for the face and non-face examples.
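In scikit-learn terms, the training step could be sketched like this. The feature matrices below are small random stand-ins for the real HOG descriptors described above, and the regularization constant `C` is an arbitrary choice, not a value from the write-up.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical stand-ins: rows are HOG descriptors. The real data had
# 6,713 positives and over 13,000 negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, size=(100, 324))   # "face" descriptors
X_neg = rng.normal(-1.0, 0.5, size=(200, 324))  # "non-face" descriptors

X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])  # binary labels

clf = LinearSVC(C=0.01)  # C is an assumed regularization setting
clf.fit(X, y)

# The learned weight vector is the HOG-space face template.
w, b = clf.coef_.ravel(), float(clf.intercept_)
```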
As a result of training the classifier, I had a representation of a face in the HOG feature space. This model is what the sliding window compares image patches against: using it, I was able to compute a confidence score for each patch of the image.
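With a linear SVM, scoring a single window reduces to a dot product with the learned template plus the bias. A minimal sketch:

```python
import numpy as np

def window_confidence(hog_vec, w, b):
    """Linear SVM confidence for one window: the signed distance
    w . x + b from the decision boundary (larger = more face-like)."""
    return float(np.dot(w, hog_vec) + b)
```

A window is then kept as a candidate detection when this value exceeds the chosen threshold.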
I then ran my detector on the CMU + MIT test set of 130 images, which contains 511 faces, both human and non-human. For each image, I detect faces at six different scales: the original image and copies resized by factors of 0.9, 0.7, 0.5, 0.3, and 0.1. I convert each resized image to the HOG feature space and perform a sliding window comparison, stepping one position at a time. I evaluate the confidence of a face at each window from the output of my linear SVM classifier, and record the bounding box and confidence for every window above a certain threshold. Because many boxes overlap and some faces are detected at multiple scales, I then perform non-maximum suppression on all candidate detections.
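The suppression step can be implemented greedily: keep the highest-confidence box, discard everything that overlaps it too much, and repeat. The sketch below is an illustration under my own assumptions (a `[y1, x1, y2, x2]` box format and an IoU threshold of 0.3), not the project's actual code.

```python
import numpy as np

def non_max_suppression(boxes, confidences, overlap_thresh=0.3):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array-like of [y1, x1, y2, x2]; confidences: (N,) scores.
    Keeps the highest-scoring box, removes others whose intersection-over-
    union with it exceeds `overlap_thresh`, and repeats on the remainder.
    Returns the indices of the boxes to keep.
    """
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(confidences)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining boxes.
        y1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        x1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        y2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        x2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, y2 - y1) * np.maximum(0, x2 - x1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= overlap_thresh]
    return keep
```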
There are several key components to my face detector pipeline. The major design choices I made were to decrease hog_cell_size from 6 to 3, which yields a finer HOG template and improved my accuracy by 5%, and to step through every single position in the HOG feature space to make sure I could detect faces anywhere in the image. Finally, my detector works at six different scales; compared to my single-scale implementation, the multi-scale detector performs 10% better.
A graph of my precision recall.
I achieve an average precision of 90.8%, detecting the vast majority of faces in the input images. This is higher than the 89.5% achieved by the Viola-Jones 2001 algorithm. There was a direct trade-off between average precision and the number of false positives detected: I was able to push my program to 91.5%, but each image then had dozens of bounding boxes, making it impractical. Conversely, I was also able to reduce the false positives in each image to only one or two, but at the expense of several percentage points of precision. In the end, I compromised on a face detector that achieves around 90.8% average precision with only a handful of false positives.
My detector works well with detecting all the faces in a group.
My detector seems to work best with images where faces are the dominant part of the picture.
Even when a picture is distorted, my detector will find the faces in it.
An example of detecting faces at different scales.
My detector tends to find nearly all faces in an image.
Face detection even works on smaller images.
My detector even finds non-human faces.
My detector even works on playing card faces.
Since the detector is feature based, it finds many different kinds of faces, even sketches.
Another example of finding faces at many different scales.
Faces, alien and human, from a sci-fi show I used to watch.
Even faces close to each other are detected individually.
Different types of faces are detected as well.
Even faces close to each other are detected individually.
My best performance occurs when a face dominates the screen.