Automatic Face Recognition for
Film Character Retrieval in
Feature-Length Films
Ognjen Arandjelović
Andrew Zisserman
The objective
Retrieve all shots in a video, e.g. a feature-length film, containing a particular person
Visually defined search – on faces
“Groundhog Day”
[Ramis, 1993]
Applications:
• intelligent fast forward on characters
• pull out all videos of “x” from 1000s of digital camera mpegs
The difficulty of face recognition
Image variations due to:
• pose/scale
• lighting
• expression
• partial occlusion
Previous work
There has been significant progress in face recognition in recent years:
1. Pose/illumination invariant recognition (e.g. the 3D Morphable Model [Blanz et al., 2002])
2. Local feature-based approaches (e.g. Elastic Bunch Graph Matching [Bolme, 2003], [Sivic et al., 2005])
3. Appearance manifold-based methods and online appearance model building (e.g. see previous talk)
4. Etc.
System overview
Five key steps:
1. Feature localization
2. Affine warping
3. Face outline detection
4. Refine registration
5. Robust distance
[Diagram labels: Detected Face; SVM Classifiers; Features Training Data; Warp; Normalized Pose; Probabilistic Model of Face Outline; Background Removal; Background Clutter Removed; Filter; Normalized Illumination; Face Signature Image]
Facial feature detection
Train support vector machines to detect the eyes and the mouth
(similar to “Names and Faces in the News” [Berg et al., 2004])
Independent Gaussian priors on feature locations
Example training data:
Learn invariance to:
• pose
• expression
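A minimal sketch of how a detector score and a location prior could be combined, assuming the SVM has already produced a score for each candidate position. The function names, the additive combination and the `prior_weight` parameter are illustrative assumptions, not details taken from the talk.

```python
def log_gaussian_prior(pos, mean, std):
    """Log of an axis-independent Gaussian prior on (x, y), up to a constant."""
    (x, y), (mx, my), (sx, sy) = pos, mean, std
    return -0.5 * (((x - mx) / sx) ** 2 + ((y - my) / sy) ** 2)


def locate_feature(candidates, mean, std, prior_weight=1.0):
    """Pick the candidate with the best combined score.

    candidates: list of ((x, y), svm_score) pairs (hypothetical detector output).
    mean, std:  expected feature location and its spread inside the face box.
    """
    def combined(candidate):
        pos, svm_score = candidate
        return svm_score + prior_weight * log_gaussian_prior(pos, mean, std)

    return max(candidates, key=combined)[0]


# A slightly stronger response far from the expected eye position is rejected.
left_eye_candidates = [((30, 40), 1.0), ((90, 95), 1.2)]
print(locate_feature(left_eye_candidates, mean=(32, 38), std=(5, 5)))
```

With `prior_weight=0` the prior is ignored and the raw strongest response wins, which is a quick way to see what the prior contributes.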
Detected eyes and mouths
Successful detections in spite of large pose and expression variation
Warped faces using detected features
Original detected faces
Faces after affine warping
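Three point correspondences (two eyes and the mouth) determine a 2D affine transform exactly. The sketch below solves for it with Cramer's rule; the canonical target coordinates are made-up values for illustration only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("feature points are collinear; affine map is undefined")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_features(src, dst):
    """Return ((a, b, c), (d, e, f)) with u = a*x + b*y + c, v = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    row_u = tuple(solve3(m, [u for u, _ in dst]))
    row_v = tuple(solve3(m, [v for _, v in dst]))
    return row_u, row_v


def apply_affine(transform, point):
    (a, b, c), (d, e, f) = transform
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# Detected left eye, right eye, mouth -> assumed canonical positions.
detected = [(30.0, 40.0), (70.0, 38.0), (52.0, 80.0)]
canonical = [(25.0, 30.0), (55.0, 30.0), (40.0, 65.0)]
T = affine_from_features(detected, canonical)
print(apply_affine(T, detected[2]))
```

Warping the image itself would apply the inverse of this map to every output pixel and interpolate, which is left out here.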
Background removal
Key features and ideas:
• we do not use colour
• only gradient information is used
• faces are smooth with limited shape variability
• model boundary traversal as a Markov chain
[Figure: significant clutter in images of detected faces]
Background removal
Radial mesh
Solved using dynamic programming
Image intensity – threshold
gradient to find interest points
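The dynamic programming step can be sketched as a Viterbi-style search over the radial mesh: one boundary radius is chosen per ray, trading gradient strength against smoothness between neighbouring rays. The additive cost, the `smooth` penalty and the open (non-closed) chain are simplifying assumptions for illustration.

```python
def trace_outline(grad, smooth=1.0):
    """Choose one radius index per ray.

    grad[k][r]: gradient strength on ray k at radius index r (higher is better).
    smooth:     penalty per unit change of radius between consecutive rays.
    Returns a list with the selected radius index for each ray.
    """
    n_radii = len(grad[0])
    score = list(grad[0])          # best total score ending at each radius
    back_pointers = []
    for k in range(1, len(grad)):
        new_score, pointers = [], []
        for r in range(n_radii):
            # Best predecessor radius on the previous ray for this radius.
            prev = max(range(n_radii),
                       key=lambda p: score[p] - smooth * abs(r - p))
            new_score.append(score[prev] - smooth * abs(r - prev) + grad[k][r])
            pointers.append(prev)
        score = new_score
        back_pointers.append(pointers)
    # Backtrack from the best final state.
    r = max(range(n_radii), key=lambda i: score[i])
    path = [r]
    for pointers in reversed(back_pointers):
        r = pointers[r]
        path.append(r)
    path.reverse()
    return path


# Ray 2 has a strong spurious edge at radius 4 (background clutter).
grad = [
    [0, 0, 5, 0, 0],
    [0, 0, 5, 0, 0],
    [0, 0, 4, 0, 6],
    [0, 0, 5, 0, 0],
]
print(trace_outline(grad, smooth=1.0))  # smooth outline ignores the clutter
print(trace_outline(grad, smooth=0.0))  # without smoothing the clutter wins
```

A closed face outline would additionally require the first and last rays to agree, for example by running the search once per starting radius.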
Background removal – examples
Registered
Segmented
Registration refinement
• faces already affine registered using 3 facial features
• feature localization errors amount to a significant registration error
• refinement using appearance – normalized cross-correlation of salient regions
[Figure labels: Salient regions; Face 1; Face 2; Face 1 registered to 2]
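Normalized cross-correlation compares two patches after removing their mean and scale, so it tolerates brightness and contrast changes. The sketch below searches small integer shifts of one salient region; the search radius and the restriction to pure translation are assumptions made for brevity.

```python
import math


def ncc(p, q):
    """Normalized cross-correlation of two equal-length pixel lists, in [-1, 1]."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    num = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_q) ** 2 for b in q))
    return num / den if den > 0 else 0.0   # flat patches carry no information


def crop(image, top, left, height, width):
    """Flatten a rectangular region of a 2D list image into a pixel list."""
    return [image[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)]


def best_shift(template, image, top, left, height, width, radius=2):
    """Return (score, dy, dx) of the best match within +/- radius pixels."""
    best = (-2.0, 0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + height > len(image) or l + width > len(image[0]):
                continue
            score = ncc(template, crop(image, t, l, height, width))
            if score > best[0]:
                best = (score, dy, dx)
    return best


# Toy 8x8 image with a small distinctive pattern.
image = [[0] * 8 for _ in range(8)]
image[3][4], image[4][5], image[5][6] = 9, 5, 2
template = crop(image, 3, 4, 3, 3)
print(best_shift(template, image, top=2, left=3, height=3, width=3))
```

In practice the shifts found for several salient regions would be used to re-estimate the registration rather than applied one region at a time.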
Occlusion detection
Key points:
• occlusion detected when a pair of images is compared
• from a training corpus learn the intra/inter-personal variance of each location/pixel
• occlusion = pixels with low intra/inter-personal probability
• contribution of occlusions to distance limited by the Blake-Zisserman function
[Figure labels: Two faces being compared; High occlusion probability; Grimace; Hand]
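A sketch of a robust distance, assuming one commonly quoted form of the Blake-Zisserman function, rho(d) = -log(exp(-d^2) + eps). The per-pixel scaling by a learned `sigma` and the value of `eps` are illustrative assumptions; the talk does not state the exact parameterization.

```python
import math


def blake_zisserman(d, eps=0.01):
    """Roughly quadratic for small d, saturating near -log(eps) for large d."""
    return -math.log(math.exp(-d * d) + eps)


def robust_distance(face_a, face_b, sigma, eps=0.01):
    """Sum of saturating per-pixel costs between two registered faces.

    face_a, face_b: flattened pixel lists of equal length.
    sigma:          per-pixel standard deviation (assumed learned from a corpus).
    """
    return sum(blake_zisserman((a - b) / s, eps)
               for a, b, s in zip(face_a, face_b, sigma))


# One grossly different pixel (e.g. a hand over the face) adds a bounded cost.
a = [10.0, 10.0, 10.0]
b = [10.0, 10.0, 200.0]
print(robust_distance(a, b, sigma=[5.0, 5.0, 5.0]))
```

Under a plain sum of squared differences the occluded pixel would contribute (190/5)^2 = 1444, whereas here its contribution stays below about 4.61.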
Evaluation - querying
The protocol:
1. faces are automatically detected
2. query consists of one or more faces of the reference actor
and, optionally,
3. images of non-reference actors
Evaluation - distances
Three matching methods:
• K-min distance
• Linear subspace (reference only)
• Nearest linear subspace (reference and other)
[Figure labels: Query; Correct person; Other]
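The two subspace methods can be sketched as point-to-subspace distances: a candidate face is scored by how far it lies from the span of the reference faces, and optionally compared against the span of the non-reference faces. Using raw face vectors as spanning vectors and the tie-breaking rule are assumptions; the K-min distance is not sketched because the slides do not define it.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt orthonormalization; near-dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = _dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def distance_to_subspace(probe, faces):
    """Euclidean distance from probe to the linear span of the given faces."""
    residual = list(probe)
    for b in orthonormal_basis(faces):
        coeff = _dot(residual, b)
        residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return math.sqrt(_dot(residual, residual))


def nearest_subspace_label(probe, reference_faces, other_faces):
    """Label the probe by whichever subspace it lies closer to."""
    d_ref = distance_to_subspace(probe, reference_faces)
    d_other = distance_to_subspace(probe, other_faces)
    return "reference" if d_ref <= d_other else "other"


# Toy 3-pixel "faces".
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
other = [[0.0, 0.0, 1.0]]
print(distance_to_subspace([3.0, 4.0, 5.0], reference))
print(nearest_subspace_label([0.1, 0.0, 1.0], reference, other))
```

For ranking, the reference-only variant would sort detected faces by `distance_to_subspace` in increasing order.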
Evaluation - performance
Google-like retrieval: faces are ordered by decreasing similarity
Performance measure:
• operates on sequences of recalled images
• rank-ordering score S
> S lies in the range [0,1]
> S = 1 indicates all N true positives are recalled first
> S = 0.5 indicates a random ordering
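One formula with exactly the properties listed above normalizes the sum of the ranks of the true positives between its best and worst possible values. This is a sketch of such a score, not necessarily the exact definition used in the talk.

```python
def rank_ordering_score(relevance):
    """Score a ranked retrieval list.

    relevance: booleans in retrieval order (most similar first),
               True where the retrieved face is the correct person.
    Returns 1.0 if all true positives come first, 0.0 if they all come last;
    a random ordering scores 0.5 on average.
    """
    total = len(relevance)
    n_pos = sum(1 for r in relevance if r)
    if n_pos == 0 or n_pos == total:
        raise ValueError("need at least one true positive and one negative")
    rank_sum = sum(rank for rank, r in enumerate(relevance, start=1) if r)
    best = n_pos * (n_pos + 1) // 2          # ranks 1..N
    max_excess = n_pos * (total - n_pos)     # worst rank sum minus best
    return 1.0 - (rank_sum - best) / max_excess


print(rank_ordering_score([True, True, False, False]))
print(rank_ordering_score([True, False, False, True]))
```

The normalization makes scores comparable across queries with different numbers of true positives.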
Results - data
Method evaluated on several films:
• Groundhog Day
• Pretty Woman
• Run, Lola Run
• Fawlty Towers
Typical input data
Results – rank ordering score
Rank ordering score for 35 retrievals of Basil and Sybil
Basil
Sybil
Results – example recalls
Fawlty Towers
(John Cleese)
Pretty Woman
(Julia Roberts)
Results – example recalls
Groundhog Day
(Andie MacDowell)
Groundhog Day
(Bill Murray)
Conclusions
Future work:
• Use of sequence information for disambiguation in recognition (see “Person spotting: video shot retrieval for face sets” [Sivic et al., CIVR 2005])
• Use of photometric models for improved illumination normalization