
Hand Detection with a Cascade of Boosted
Classifiers Using Haar-like Features
Qing Chen
Discover Lab, SITE, University of Ottawa
May 2, 2006
Outline






1. Introduction
2. Haar-like features
3. Adaboost
4. The Cascade of Classifiers
5. Preliminary Results
6. Future Work
1. Introduction
• A hand-based human-computer interface (HCI) should meet three requirements: real-time operation, accuracy and robustness.
• Haar-like features are used to meet the real-time requirement.
• A cascade of AdaBoost (Adaptive Boosting) classifiers is used to achieve both accuracy and speed.
• The algorithm has been used for face detection, where it achieved high detection accuracy while running approximately 15 times faster than any previous approach.
• The algorithm is a generic object detection/recognition method.
2. Haar-Like Features

Each Haar-like feature consists of two or three adjoining “black” and “white” rectangles:
Figure 1: A set of basic Haar-like features.
Figure 2: A set of extended Haar-like features.

The value of a Haar-like feature is the difference between the sums of the pixel gray-level values within the black and white rectangular regions:
$$f(x) = \sum_{\text{black rectangle}} (\text{pixel gray level}) - \sum_{\text{white rectangle}} (\text{pixel gray level})$$
Compared with raw pixel values, Haar-like features reduce the in-class variability and increase the out-of-class variability, which makes classification easier.
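As an illustration of the feature value defined above, the following minimal sketch evaluates a two-rectangle (edge) feature by summing pixels directly. The function names, the nested-list image layout and the choice of a black left half next to a white right half are assumptions made for this example, not details taken from the slides.

```python
def rect_sum(image, top, left, height, width):
    """Sum the gray levels inside a rectangle by visiting every pixel."""
    return sum(image[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def two_rect_feature(image, top, left, height, width):
    """Two-rectangle feature: black (left half) sum minus white (right half) sum.

    Which half is black is an assumption of this sketch.
    """
    half = width // 2
    black = rect_sum(image, top, left, height, half)
    white = rect_sum(image, top, left + half, height, half)
    return black - white


# Toy 2x4 gray-level image: dark on the left, bright on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
value = two_rect_feature(image, 0, 0, 2, 4)
print(value)
```

A large magnitude indicates a strong vertical edge inside the window. Summing every pixel this way costs time proportional to the rectangle area for each feature evaluation.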
2. Haar-Like Features (cont’d)
• Rectangular Haar-like features can be computed rapidly using an “integral image”.
• The integral image at location $(x, y)$ contains the sum of the pixel values above and to the left of $(x, y)$, inclusive:
$$P(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
• The sum of the pixel values within region “D” follows from four lookups:
$$P_1 = A,\quad P_2 = A + B,\quad P_3 = A + C,\quad P_4 = A + B + C + D$$
$$P_1 + P_4 - P_2 - P_3 = A + (A + B + C + D) - (A + B) - (A + C) = D$$
[Figure: adjacent image regions A, B, C and D, with the integral-image values P1, P2, P3 and P4 at their lower-right corners.]
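The four-lookup identity can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names are hypothetical, and the extra row and column of zeros is a convenience assumed here so that rectangles touching the image border need no special case.

```python
def integral_image(image):
    """Build P with a zero border: P[y + 1][x + 1] is the inclusive sum up to (x, y)."""
    rows, cols = len(image), len(image[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_total = 0
        for x in range(cols):
            row_total += image[y][x]
            P[y + 1][x + 1] = P[y][x + 1] + row_total
    return P


def region_sum(P, top, left, height, width):
    """Sum of region D from four lookups: P1 + P4 - P2 - P3."""
    p1 = P[top][left]                    # A
    p2 = P[top][left + width]            # A + B
    p3 = P[top + height][left]           # A + C
    p4 = P[top + height][left + width]   # A + B + C + D
    return p1 + p4 - p2 - p3


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
P = integral_image(image)
d = region_sum(P, 1, 1, 2, 2)  # the 2x2 block holding 5, 6, 8 and 9
print(d)
```

Once the integral image is built in a single pass, every rectangle sum takes the same four lookups regardless of the rectangle's size.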
2. Haar-Like Features (cont’d)

To detect the hand, the image is scanned by a sub-window containing a Haar-like feature.

Based on each Haar-like feature $f_j$, a weak classifier $h_j(x)$ is defined as:
$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta \\ 0 & \text{otherwise} \end{cases}$$
where $x$ is a sub-window, $\theta$ is a threshold, and $p_j$ indicates the direction of the inequality sign.
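A weak classifier of this kind is a single thresholded comparison. The sketch below assumes the usual Viola-Jones form, in which the classifier outputs 1 when $p_j f_j(x) < p_j \theta$ and 0 otherwise; the function and parameter names are hypothetical.

```python
def weak_classify(feature_value, threshold, parity):
    """Return 1 (object) if parity * feature_value < parity * threshold, else 0.

    parity is +1 or -1 and flips the direction of the inequality.
    """
    return 1 if parity * feature_value < parity * threshold else 0


# With parity +1, values below the threshold are accepted.
below = weak_classify(5.0, 10.0, +1)
# With parity -1, the same value is rejected because the test becomes "greater than".
flipped = weak_classify(5.0, 10.0, -1)
print(below, flipped)
```

Multiplying both sides by the parity lets one expression cover both "less than" and "greater than" tests, so training only has to choose a threshold and a sign for each feature.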
3. Adaboost

The computation cost using Haar-like features:
Example: original image size: 320×240,
sub-window size: 24×24,
frame rate: 15 frames/second.
The total number of sub-windows with one Haar-like feature per second:
(320 - 24 + 1) × (240 - 24 + 1) × 15 = 966,735


Considering the scaling factor and the total number of Haar-like features,
the computation cost is huge.
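The arithmetic above, and the effect of adding scales, can be checked with a short sketch. The one-pixel step, the scale factor of 1.3 and the truncation of scaled window sizes to whole pixels are assumptions chosen for illustration; they are not settings stated for this example in the slides.

```python
def positions(image_w, image_h, win_w, win_h, step=1):
    """Number of sub-window placements at a single scale."""
    if win_w > image_w or win_h > image_h:
        return 0
    across = (image_w - win_w) // step + 1
    down = (image_h - win_h) // step + 1
    return across * down


def positions_all_scales(image_w, image_h, win_w, win_h, scale=1.3):
    """Total placements when the window grows by `scale` until it no longer fits."""
    total = 0
    w, h = float(win_w), float(win_h)
    while int(w) <= image_w and int(h) <= image_h:
        total += positions(image_w, image_h, int(w), int(h))
        w *= scale
        h *= scale
    return total


per_frame = positions(320, 240, 24, 24)
per_second = per_frame * 15  # 15 frames per second
multi_scale_per_frame = positions_all_scales(320, 240, 24, 24)
print(per_frame, per_second, multi_scale_per_frame)
```

The single-scale count reproduces the 966,735 figure. Every added scale, and every additional feature evaluated per sub-window, multiplies the work further.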
AdaBoost (Adaptive Boosting) is an iterative learning algorithm that constructs a “strong” classifier using only a training set and a “weak” learning algorithm. At each iteration, the learning algorithm selects the “weak” classifier with the minimum classification error.
AdaBoost is adaptive in the sense that later classifiers are tuned in favor of the sub-windows misclassified by previous classifiers.
3. Adaboost (cont’d)

The algorithm:
3. Adaboost (cont’d)
• AdaBoost starts with a uniform distribution of “weights” over the training examples. The weights tell the learning algorithm the importance of each example.
• Obtain a weak classifier $h_j(x)$ from the weak learning algorithm.
• Increase the weights on the training examples that were misclassified.
• Repeat.
• At the end, carefully make a linear combination of the weak classifiers obtained at all iterations:
$$f_{\text{final}}(x) = \alpha_{\text{final},1} h_1(x) + \cdots + \alpha_{\text{final},n} h_n(x)$$
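The steps above can be sketched with threshold "stumps" as weak classifiers. This is a minimal sketch of textbook discrete AdaBoost with labels in {+1, -1}, which differs in bookkeeping from the 0/1 variant used by Viola and Jones. The function names, the exhaustive stump search and the clamping of the error away from 0 and 1 are assumptions of this example.

```python
import math


def stump_predict(x, feature, threshold, parity):
    """Weak classifier: +1 if parity * x[feature] < parity * threshold, else -1."""
    return 1 if parity * x[feature] < parity * threshold else -1


def best_stump(samples, labels, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            for parity in (1, -1):
                error = sum(w for x, y, w in zip(samples, labels, weights)
                            if stump_predict(x, feature, threshold, parity) != y)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, parity)
    return best


def adaboost(samples, labels, rounds):
    """Return a list of (alpha, feature, threshold, parity) tuples."""
    n = len(samples)
    weights = [1.0 / n] * n  # uniform starting distribution
    strong = []
    for _ in range(rounds):
        error, feature, threshold, parity = best_stump(samples, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        strong.append((alpha, feature, threshold, parity))
        # Misclassified examples (y * h = -1) have their weights increased.
        weights = [w * math.exp(-alpha * y * stump_predict(x, feature, threshold, parity))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong


def strong_predict(strong, x):
    """Sign of the weighted linear combination of the weak classifiers."""
    score = sum(alpha * stump_predict(x, f, t, p) for alpha, f, t, p in strong)
    return 1 if score >= 0 else -1


# Toy data: feature 0 is uninformative, feature 1 separates the classes.
samples = [[5, 1], [1, 2], [4, 3], [2, 4]]
labels = [1, 1, -1, -1]
strong = adaboost(samples, labels, rounds=3)
predictions = [strong_predict(strong, x) for x in samples]
print(predictions)
```

On this toy set the first selected stump uses feature 1, since no threshold on feature 0 separates the labels.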
4. The Cascade of Classifiers




A series of classifiers is applied to every sub-window.
The first classifier eliminates a large number of negative sub-windows and passes almost all positive sub-windows (at the cost of a high false positive rate) with very little processing.
Subsequent layers eliminate additional negative sub-windows (those passed by the first classifier) but require more computation.
After several stages of processing, the number of negative sub-windows has been reduced radically.
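The early-rejection behavior can be sketched as a loop that stops at the first stage to say "no". In this hypothetical example each stage is a pair of a scoring function and a threshold, and a window is a dictionary of precomputed feature values. Real stages would be boosted strong classifiers, so these names and numbers are illustrative only.

```python
def cascade_classify(stages, window):
    """Return (accepted, stages_evaluated); reject as soon as one stage fails."""
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, index
    return True, len(stages)


# Stage 1 is cheap (one feature); later stages use more features.
stages = [
    (lambda w: w["f1"], 0.2),
    (lambda w: w["f1"] + w["f2"], 1.0),
    (lambda w: w["f1"] + w["f2"] + w["f3"], 2.0),
]

background = {"f1": 0.1, "f2": 0.9, "f3": 0.9}
hand = {"f1": 0.9, "f2": 0.8, "f3": 0.7}

background_result = cascade_classify(stages, background)
hand_result = cascade_classify(stages, hand)
print(background_result, hand_result)
```

The background window is discarded after a single cheap stage, while the hand window pays for all three. Because most sub-windows in an image are negatives, the average cost per sub-window stays close to the cost of the first stage.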
4. The Cascade of Classifiers (cont’d)

Negative samples: non-object images. Negative samples are taken from arbitrary images, which must not contain any representation of the object.

Positive samples: images that contain the object (a hand in our case). The hand in each positive sample must be marked out for classifier training.
5. Preliminary Results






Number of positive samples: 144
Number of negative samples: 3142
Sample resolution: 640×480
Initial sub-window size: 15×30
Scale factor: 1.3
Cascade obtained: 12 stages
6. Future Work




Extended Haar-like features: will they improve the detection accuracy (still an open problem), and what is the performance tradeoff?
Parallel cascades for multiple hand gestures: how to select the hand gesture configurations that can be detected more effectively with the employed Haar-like feature set?
Improve the robustness against hand rotation.
How much improvement can be achieved with more training samples? For reference, the Intel face detection classifier was trained with 5000 positive and 10000 negative samples and reaches 98% accuracy.
Acknowledgements: Urtho’s training data for eye detection and F. Dadgostar’s hand palm database.
Thank you. Any questions?