An example image on which my face detector ran.
This project implements a face detection pipeline. Face detection is one of the most common applications of computer vision: when you snap a picture on a smartphone, you'll often see suggested face outlines, and photo editing applications like Photoshop find faces for their users. In this project, I implement my own sliding window face detector. Overall, my detector works very well, with an average precision of 90.8%.
A sliding window face detector is straightforward to understand: a trained classifier looks at every image patch and determines whether or not there is a face in that window. For this to be effective, the image must also be resized so that faces can be detected at different scales. My implementation is modeled on the 2005 paper by Dalal and Triggs, which represents faces through the SIFT-like Histogram of Oriented Gradients (HOG) descriptor.
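The core loop can be sketched as follows. This is a minimal Python illustration, not the project's actual code: it scores each window of a 2-D feature map against a linear template with a dot product, which is exactly what a linear classifier's response reduces to.

```python
import numpy as np

def sliding_window_detect(features, template, step=1):
    """Slide a template over a 2-D feature map and score every window.

    `features` is an (H, W) feature array and `template` an (h, w) model;
    the score is a dot product, i.e. a linear classifier's response.
    Returns a list of (row, col, score) tuples.
    """
    h, w = template.shape
    H, W = features.shape
    scores = []
    for y in range(0, H - h + 1, step):
        for x in range(0, W - w + 1, step):
            patch = features[y:y + h, x:x + w]
            scores.append((y, x, float(np.sum(patch * template))))
    return scores
```

In practice the windows slide over the HOG feature map rather than raw pixels, but the structure of the loop is the same.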
The first step in training the classifier is to gather positive and negative examples. For positive examples, I used faces from the Caltech Face database cropped to 36x36 pixels, and extracted a HOG descriptor for each image. For negative examples, I used a database of non-face scenes drawn from Wu et al. and the SUN scene database: I randomly cropped 36x36 squares and extracted a HOG descriptor for each patch.
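As an illustration, the per-patch descriptor extraction could look like the sketch below, using scikit-image's `hog`. The function names, parameters, and cell layout here are my assumptions for a 36x36 patch, not the project's actual code.

```python
import numpy as np
from skimage.feature import hog

def extract_hog_36(patch, cell_size=6):
    """Compute a HOG descriptor for a 36x36 grayscale patch.

    cell_size=6 gives a 6x6 grid of cells; with 9 orientation bins and
    1x1 blocks this yields a 324-dimensional descriptor.
    """
    assert patch.shape == (36, 36)
    return hog(patch,
               orientations=9,
               pixels_per_cell=(cell_size, cell_size),
               cells_per_block=(1, 1),
               feature_vector=True)

def random_negative_patch(scene, rng, size=36):
    """Crop a random 36x36 square from a non-face scene image."""
    y = rng.integers(0, scene.shape[0] - size + 1)
    x = rng.integers(0, scene.shape[1] - size + 1)
    return scene[y:y + size, x:x + size]
```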
A representation of a face in the HOG feature space
Once I had a collection of 6,713 positive training examples and over 13,000 negative training examples (a count I increased in an attempt to decrease my false positive rate), I had to train a classifier to distinguish face from non-face examples. I trained a linear SVM (support vector machine) on these descriptors, using binary labels for the face and non-face examples.
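In scikit-learn terms, the training step could be sketched like this. The feature matrices below are small random stand-ins for the real HOG descriptors described above, and the regularization constant `C` is an arbitrary choice, not a value from the write-up.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical stand-ins: rows are HOG descriptors. The real data had
# 6,713 positives and over 13,000 negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, size=(100, 324))   # "face" descriptors
X_neg = rng.normal(-1.0, 0.5, size=(200, 324))  # "non-face" descriptors

X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_neg))])  # binary labels

clf = LinearSVC(C=0.01)  # C is an assumed regularization setting
clf.fit(X, y)

# The learned weight vector is the HOG-space face template.
w, b = clf.coef_.ravel(), float(clf.intercept_)
```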
As a result of training the classifier, I had a representation of a face in the HOG feature space. This model is what the sliding window compares image patches against: using it, I was able to compute a confidence score for each patch of the image.
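With a linear SVM, scoring a single window reduces to a dot product with the learned template plus the bias. A minimal sketch:

```python
import numpy as np

def window_confidence(hog_vec, w, b):
    """Linear SVM confidence for one window: the signed distance
    w . x + b from the decision boundary (larger = more face-like)."""
    return float(np.dot(w, hog_vec) + b)
```

A window is then kept as a candidate detection when this value exceeds the chosen threshold.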
I then ran my detector on the CMU + MIT test set of 130 images, which contains 511 faces, both human and non-human. For each image, I detect faces at six different scales: the original image and copies resized by factors of 0.9, 0.7, 0.5, 0.3, and 0.1. I convert each resized image to the HOG feature space and perform a sliding window comparison, stepping one position at a time. I evaluate the confidence of a face at each window from the output of my linear SVM classifier, and record the bounding box and confidence for every window above a certain threshold. Because many boxes overlap and some faces are detected at multiple scales, I then perform non-maximum suppression on all candidate detections.
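The suppression step can be implemented greedily: keep the highest-confidence box, discard everything that overlaps it too much, and repeat. The sketch below is an illustration under my own assumptions (a `[y1, x1, y2, x2]` box format and an IoU threshold of 0.3), not the project's actual code.

```python
import numpy as np

def non_max_suppression(boxes, confidences, overlap_thresh=0.3):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array-like of [y1, x1, y2, x2]; confidences: (N,) scores.
    Keeps the highest-scoring box, removes others whose intersection-over-
    union with it exceeds `overlap_thresh`, and repeats on the remainder.
    Returns the indices of the boxes to keep.
    """
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(confidences)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining boxes.
        y1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        x1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        y2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        x2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, y2 - y1) * np.maximum(0, x2 - x1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= overlap_thresh]
    return keep
```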
There are several key components to my face detector pipeline. The major design choices I made were to decrease hog_cell_size from 6 to 3, which yields a finer HOG template and improved my accuracy by 5%, and to step through every single position in the HOG feature space to make sure I could detect faces anywhere in the image. Finally, my detector works at six different scales; compared to my single-scale implementation, the multi-scale detector performs 10% better.
A graph of my precision recall.
I achieve an average precision of 90.8%, detecting the vast majority of faces in the input images. This is higher than the 89.5% achieved by the Viola-Jones 2001 algorithm. There was a direct trade-off between average precision and the number of false positives detected: I was able to push my program to 91.5%, but each image then had dozens of bounding boxes, making it impractical. Conversely, I was also able to reduce the false positives in each image to only one or two, but at the expense of several percentage points of precision. In the end, I compromised on a face detector that achieves around 90.8% average precision with only a handful of false positives.
My detector works well with detecting all the faces in a group.
My detector seems to work best with images where faces are the dominant part of the picture.
Even when a picture is distorted, my detector will find the faces in it.
An example of detecting faces at different scales.
My detector tends to find nearly all faces in an image.
Face detection even works on smaller images.
My detector even finds non-human faces.
My detector even works on playing card faces.
Since the detector is feature based, it finds many different kinds of faces, even sketches.
Another example of finding faces at many different scales.
Faces, alien and human, from a sci-fi show I used to watch.
Even faces close to each other are detected individually.
Different types of faces are detected as well.
Even faces close to each other are detected individually.
My best performance occurs when a face dominates the screen.