Describing People: A Poselet-Based Approach to Attribute Classification
Lubomir Bourdev (1,2), Subhransu Maji (1), Jitendra Malik (1)
(1) EECS, U.C. Berkeley
(2) Adobe Systems Inc.
Goal: Extract attributes from images of people
Who has long hair?
Who has short pants?
Male or female?
Prior work on poselets and on attributes
Prior work on Poselets
• Introduced by [Bourdev and Malik, ICCV09]
• Detection with poselets [Bourdev et al, ECCV10]
• Applications
  • Segmentation [Brox et al, ECCV10] [Maire et al, ICCV11]
  • Actions [Yang et al, CVPR10] [Maji et al, CVPR11] [Yao et al, ICCV11]
  • Human parsing [Wang et al, CVPR11]
  • Semantic contours [Hariharan et al, ICCV11]
  • Subordinate level categorization [Farrell et al, ICCV11]
Prior work on Attributes
Attributes as intermediate parts
Discovering attributes from text
Discovering attributes from images
Attributes from motion capture
Joint learning of classes & attributes
Image retrieval with attributes
Attributes and actions
Active learning with attributes
Attributes of people
Gender attribute
[Cottrell and Metcalfe, NIPS90] [Golomb et al, NIPS90] [Moghaddam & Yang, PAMI02]
[Ferrari & Zisserman, NIPS07] [Kumar et al, ECCV08] [Gallagher and Chen, CVPR08]
[Cao et al, ACM08] [Lampert et al, CVPR09] [Farhadi et al, CVPR09] [Wang et al,
BMVC09] [Wang and Forsyth, ICCV09] [Kumar et al, ICCV09] [Farhadi et al, CVPR10]
[Berg et al, ECCV10] [Wang and Mori, ECCV10] [Sigal et al, ECCV10] [Branson et al,
ECCV10] [Hwang et al, CVPR11] [Parikh and Grauman, CVPR11] [Douze et al, CVPR11]
[Kovashka et al, ICCV11] [Liu et al, CVPR11] [Qiu et al, ICCV11] [Yao et al, ICCV11]
[Dhar et al, CVPR11] [Parikh and Grauman, ICCV11] [Siddiquie et al, CVPR11]
Poselets for Attribute Classification
Male or female?
Gender recognition is easier if we factor out the pose
Poselets [Bourdev & Malik ICCV09]
Examples may differ visually but have common semantics
How do we train a poselet?
Finding correspondences at training time
Given part of a human pose, how do we find a similar pose configuration in the training set?
We use keypoints to annotate the joints, eyes, nose, etc. of people (e.g. Left Shoulder, Left Hip)
[Figure: candidate correspondences with their residual error]
Training poselet classifiers
[Figure: candidate patches with residual errors 0.15, 0.20, 0.10, 0.85, 0.15, 0.35]
1. Given a seed patch
2. Find the closest patch for every other person
3. Sort them by residual error
4. Threshold them
5. Use them as positive training examples to train a linear SVM with HOG features
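Steps 1 to 4 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each person is annotated with the same 2D keypoints in the same order, and it assumes the residual error is the normalized least-squares error left after aligning a candidate's keypoints to the seed's with a similarity transform (translation, rotation, scale). The function names `alignment_residual` and `select_positives` and the threshold value are hypothetical.

```python
def alignment_residual(seed_pts, cand_pts):
    """Normalized residual in [0, 1] after the best similarity alignment.

    Points are (x, y) pairs, treated as complex numbers so that a rotation
    plus scale is a single complex multiplier. Assumes at least two
    distinct keypoints in each configuration.
    """
    z = [complex(x, y) for x, y in seed_pts]
    w = [complex(x, y) for x, y in cand_pts]
    # Remove translation by centering both configurations.
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    z = [p - z_mean for p in z]
    w = [p - w_mean for p in w]
    zz = sum(abs(p) ** 2 for p in z)
    ww = sum(abs(p) ** 2 for p in w)
    cross = sum(p * q.conjugate() for p, q in zip(z, w))
    # Least-squares fit z ~ c * w leaves zz - |cross|^2 / ww; divide by zz.
    return 1.0 - abs(cross) ** 2 / (zz * ww)


def select_positives(seed_pts, candidates, threshold):
    """Sort candidates by residual error and keep those under the threshold."""
    scored = sorted(
        (alignment_residual(seed_pts, pts), i) for i, pts in enumerate(candidates)
    )
    return [i for residual, i in scored if residual <= threshold]


seed = [(0, 0), (1, 0), (1, 2), (0, 3)]
rotated_scaled = [(5, 7), (5, 9), (1, 9), (-1, 7)]  # seed rotated 90 degrees, scaled x2, shifted
mirrored = [(0, 0), (-1, 0), (-1, 2), (0, 3)]       # left/right flipped seed
print(select_positives(seed, [rotated_scaled, mirrored], threshold=0.05))
```

The patches selected this way would then serve as the positive examples in step 5; the HOG extraction and SVM training are left out of the sketch.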
Attribute Classification Algorithm at Test Time
Goal: Extract attributes of this person
Input:
• Target person bounds
• Bounds of other nearby people
Step 1: Detect poselet activations [Bourdev et al, ECCV10]
Step 2: Cluster the activations [Bourdev et al, ECCV10]
Step 3: Predict person bounds [Bourdev et al, ECCV10]
Step 4: Identify the correct cluster (max-flow in bipartite graph)
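Step 4 can be pictured as a one-to-one matching between the given person bounds and the bounds predicted from each cluster. The sketch below is a stand-in, not the max-flow formulation named on the slide: it assumes intersection-over-union as the matching score and simply tries every one-to-one assignment, which is only practical for a handful of people. The names `iou` and `match_clusters` and the box format `(x0, y0, x1, y1)` are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_clusters(given_bounds, predicted_bounds):
    """Return m where m[i] is the predicted cluster matched to given person i.

    Exhaustive search over one-to-one assignments that maximizes total IoU.
    Assumes there are at least as many predicted clusters as given people.
    """
    if len(predicted_bounds) < len(given_bounds):
        raise ValueError("need at least one predicted cluster per given person")
    best, best_score = None, -1.0
    for perm in permutations(range(len(predicted_bounds)), len(given_bounds)):
        score = sum(iou(g, predicted_bounds[j]) for g, j in zip(given_bounds, perm))
        if score > best_score:
            best, best_score = perm, score
    return list(best)


given = [(0, 0, 10, 20), (8, 0, 18, 20)]          # target person first, then a neighbor
predicted = [(9, 1, 19, 21), (1, 0, 11, 20), (50, 50, 60, 60)]
print(match_clusters(given, predicted))           # cluster index for each given person
```

Matching all nearby people jointly, rather than picking the best cluster for the target alone, keeps a neighbor's cluster from being assigned to the target when the two overlap.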
Start with its poselet activations
Features
• Pyramid HOG
• LAB histogram
• Skin features
• Hands-skin
• Legs-skin
[Figure: poselet patch, skin mask, arms mask, and the element-wise product B .* C]
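The element-wise product B .* C in the figure can be illustrated with a small sketch. It assumes B is the skin mask and C is the arms mask, both given as equally sized 2D grids of values in [0, 1], and it assumes the feature is the masked skin mass normalized by the size of the part mask. The normalization and the name `masked_skin_fraction` are assumptions for illustration, not taken from the slides.

```python
def masked_skin_fraction(skin_mask, part_mask):
    """Sum of the element-wise product, divided by the part mask's total mass."""
    overlap = sum(
        s * p
        for skin_row, part_row in zip(skin_mask, part_mask)
        for s, p in zip(skin_row, part_row)
    )
    part_mass = sum(p for part_row in part_mask for p in part_row)
    return overlap / part_mass if part_mass > 0 else 0.0


skin = [[1, 0],
        [1, 1]]
arms = [[1, 1],
        [0, 0]]
print(masked_skin_fraction(skin, arms))  # half of the arm region is skin
```

A high value for the arms mask would suggest bare arms, which is the kind of cue that could help an attribute such as "has long sleeves".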
Attribute Classification Overview
[Diagram, bottom to top:]
Poselet Activations → Features → Poselet-level Attribute Classifiers → Person-level Attribute Classifiers → Context-level Attribute Classifiers
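The three classifier levels can be sketched as a small cascade. This is only an illustration under simplifying assumptions: every level is a logistic-squashed linear function, poselets that did not fire contribute nothing at the person level, and the context level sees the person-level scores of all attributes. The slides do not specify the classifier types at each level, and all names and weights here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def poselet_level_score(features, weights, bias):
    """Score one attribute from the features of one poselet activation."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def person_level_score(poselet_scores, weights, bias):
    """Combine poselet-level scores; poselet_scores maps poselet id -> score.

    Poselets that did not fire are absent from the dict and add nothing.
    """
    return sigmoid(sum(weights[p] * s for p, s in poselet_scores.items()) + bias)


def context_level_score(person_scores, weights, bias):
    """Re-score one attribute from the person-level scores of all attributes."""
    return sigmoid(sum(s * w for s, w in zip(person_scores, weights)) + bias)


# Two poselets fired for this person; poselet 7 is a strong cue for the attribute.
fired = {
    3: poselet_level_score([0.2, 0.9], [1.0, 2.0], -1.0),
    7: poselet_level_score([0.8, 0.1], [3.0, -1.0], 0.0),
}
long_hair = person_level_score(fired, {3: 1.5, 7: 2.0}, -1.0)
female = context_level_score([long_hair, 0.3], [2.0, -1.0], 0.0)
print(long_hair, female)
```

The point of the last level is that attributes inform each other: a confident "long hair" score can raise or lower the final score of a correlated attribute.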
Results
Our dataset
• Source: VOC 2010 trainval for Person + H3D
• ~8000 annotations (4000 train + 4000 test)
• 9 binary attributes specified by 5 independent annotators via AMT
• Ground truth label: If 4 of the 5 agree
• Dataset will be made publicly available
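The agreement rule for ground truth labels can be written down directly. The sketch below assumes each annotator gives a yes/no vote per attribute, and that an attribute with no 4-of-5 agreement is left unlabeled (returned as `None`). How non-consensus cases are handled is an assumption, and the name `consensus_label` is hypothetical.

```python
def consensus_label(votes, min_agree=4):
    """Return True or False if at least min_agree annotators agree, else None."""
    positives = sum(1 for v in votes if v)
    negatives = len(votes) - positives
    if positives >= min_agree:
        return True
    if negatives >= min_agree:
        return False
    return None  # no consensus: treated here as unlabeled


print(consensus_label([True, True, True, True, False]))    # True
print(consensus_label([True, True, True, False, False]))   # None
print(consensus_label([False, False, False, False, False]))  # False
```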
Visual search on our test set
“Wears hat”
“Female”
“Has long hair”
“Wears glasses”
“Wears shorts”
“Has long sleeves”
“Doesn’t have long sleeves”
Our baseline
• Canny-modulated HOG with SPM kernel [Lazebnik et al CVPR06]
• To help the baseline, we trained a separate SPM for four viewpoints: full view, head zoom, upper body, legs
• For each attribute we pick the best SPM as our baseline
Precision/recall on our test set
[Figure: precision/recall curves per attribute. Legend: label frequency, SPM, no context, full model]
State-of-the-art Gender Recognition
• We outperform Cognitec (top-notch face recognizer)
• We outperform any gender recognizer based on frontal faces (are there others?)
• 61% of our test examples have frontal faces.
• Even with perfect classification of frontal faces, max AP=80.5% vs. our AP of 82.4%
Confusions
Men most confused as women
Women most confused as men
[Example labels: long hair, baseball hat, hair hidden]
Non-T-shirt most confused to be T-shirt
[Example label: annotation errors]
Short pants most confused to be long pants
[Example labels: "Are these pants short?", wrong person, occlusion]
Best poselets per attribute
Gender:
Long Hair:
Wears glasses:
We can describe a picture of a person
“A woman with long hair, glasses and long pants” (??)
Conclusion
How poselets help in high-level vision
The image is a complex function of the viewpoint, pose, appearance, etc.
Poselets decouple pose and camera view from appearance
Google “poselets” to get:
• The set of published poselet papers
• H3D data set + Matlab tools
• Java3D annotation tool + video tutorial
• Matlab code to detect people using poselets
• Our latest trained poselets
Poselets website
http://eecs.berkeley.edu/~lbourdev/poselets
[Figure: pictures of people with generated descriptions of gender, hair length, glasses, hat, sleeves, and pants]
Failure mode
“A computer vision professor who likes machine learning”