Describing People: A Poselet-Based Approach to Attribute Classification
Lubomir Bourdev1,2 Subhransu Maji1 Jitendra Malik1
1EECS U.C. Berkeley 2Adobe Systems Inc.

Goal: Extract attributes from images of people
• Who has long hair?
• Who has short pants?
• Male or female?

Prior work on poselets and on attributes

Prior work on Poselets
• Introduced by [Bourdev and Malik, ICCV09]
• Detection with poselets [Bourdev et al, ECCV10]
• Applications
  • Segmentation [Brox et al, ECCV10] [Maire et al, ICCV11]
  • Actions [Yang et al, CVPR10] [Maji et al, CVPR11] [Yao et al, ICCV11]
  • Human parsing [Wang et al, CVPR11]
  • Semantic contours [Hariharan et al, ICCV11]
  • Subordinate level categorization [Farrell et al, ICCV11]

Prior work on Attributes
Topics:
• Attributes as intermediate parts
• Discovering attributes from text
• Discovering attributes from images
• Attributes from motion capture
• Joint learning of classes & attributes
• Image retrieval with attributes
• Attributes and actions
• Active learning with attributes
• Attributes of people
• Gender attribute
References:
[Cottrell and Metcalfe, NIPS90] [Golomb et al, NIPS90] [Moghaddam & Yang, PAMI02]
[Ferrari & Zisserman, NIPS07] [Kumar et al, ECCV08] [Gallagher and Chen, CVPR08]
[Cao et al, ACM08] [Lampert et al, CVPR09] [Farhadi et al, CVPR09]
[Wang et al, BMVC09] [Wang and Forsyth, ICCV09] [Kumar et al, ICCV09]
[Farhadi et al, CVPR10] [Berg et al, ECCV10] [Wang and Mori, ECCV10]
[Sigal et al, ECCV10] [Branson et al, ECCV10] [Hwang et al, CVPR11]
[Parikh and Grauman, CVPR11] [Douze et al, CVPR11] [Kovashka et al, ICCV11]
[Liu et al, CVPR11] [Qiu et al, ICCV11] [Yao et al, ICCV11]
[Dhar et al, CVPR11] [Parikh and Grauman, ICCV11] [Siddiquie et al, CVPR11]

Poselets for Attribute Classification
Male or female?
Gender recognition is easier if we factor out the pose

Poselets [Bourdev & Malik ICCV09]
Examples may differ visually but have common semantics

How do we train a poselet?

Finding correspondences at training time
• Given part of a human pose, how do we find a similar pose configuration in the training set?
• We use keypoints to annotate the joints, eyes, nose, etc. of people
[Figure: keypoint annotations, e.g. Left Shoulder, Left Hip]

Training poselet classifiers
[Figure: candidate patches labeled with residual errors 0.15, 0.20, 0.10, 0.85, 0.15, 0.35]
1. Given a seed patch
2. Find the closest patch for every other person
3. Sort them by residual error
4. Threshold them
5. Use them as positive training examples to train a linear SVM with HOG features

Attribute Classification Algorithm at Test Time

Goal: Extract attributes of this person
Input:
• Target person bounds
• Bounds of other nearby people
Step 1: Detect poselet activations [Bourdev et al, ECCV10]
Step 2: Cluster the activations [Bourdev et al, ECCV10]
Step 3: Predict person bounds [Bourdev et al, ECCV10]
Step 4: Identify the correct cluster
• Max-flow in bipartite graph
Start with its poselet activations

Features
• Pyramid HOG
• LAB histogram
• Skin features
  • Hands-skin
  • Legs-skin
[Figure panels: Poselet patch, Skin mask, Arms mask, B .* C]

Attribute Classification Overview
• Context-level Attribute Classifiers
• Person-level Attribute Classifiers
• Poselet-level Attribute Classifiers
• Features
• Poselet Activations

Results

Our dataset
• Source: VOC 2010 trainval for Person + H3D
• ~8000 annotations (4000 train + 4000 test)
• 9 binary attributes specified by 5 independent annotators via AMT
• Ground truth label: If 4 of the 5 agree
• Dataset will be made publicly available

Visual search on our test set
[Figure: results for the queries “Wears hat”, “Female”, “Has long hair”, “Wears glasses”, “Wears shorts”, “Has long sleeves”, “Doesn’t have long sleeves”]

Our baseline
• Canny-modulated HOG with SPM kernel [Lazebnik et al CVPR06]
• To help the baseline, we trained a separate SPM for each of four viewpoints:
  • Full view
  • Head zoom
  • Upper body
  • Legs
• For each attribute we pick the best SPM as our baseline

Precision/recall on our test set
[Figure: precision/recall curves; legend: Label frequency, SPM, No context, Full Model]

State-of-the-art Gender Recognition
• We outperform Cognitec (top-notch face recognizer)
• We outperform any gender recognizer based on frontal faces (are there others?)
• 61% of our test examples have frontal faces
• Even with perfect classification of frontal faces, max AP=80.5% vs. our AP of 82.4%

Confusions
• Men most confused as women
• Women most confused as men
[Figure labels: long hair, baseball hat, hair hidden]
• Non-T-shirt most confused with T-shirt
[Figure label: annotation errors]
• Short pants most confused with long pants
[Figure labels: Are these pants short?, wrong person, occlusion]

Best poselets per attribute
[Figure: best poselets for Gender, Long Hair, Wears glasses]

We can describe a picture of a person
“A woman with long hair, glasses and long pants”(??)

Conclusion

How poselets help in high-level vision
• The image is a complex function of the viewpoint, pose, appearance, etc.
• Poselets decouple pose and camera view from appearance

Poselets website
http://eecs.berkeley.edu/~lbourdev/poselets
Google “poselets” to get:
• The set of published poselet papers
• H3D data set + Matlab tools
• Java3D annotation tool + video tutorial
• Matlab code to detect people using poselets
• Our latest trained poselets

[Figure: example images with generated descriptions of hair length, glasses, hat, sleeves and pants]
Failure mode
“A computer vision professor who likes machine learning”
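
Steps 1 to 4 of the "Training poselet classifiers" slide can be illustrated with a rough sketch. This is not the authors' code. The residual used here (mean squared keypoint distance after removing translation and scale) is an assumed simplification, and the keypoint coordinates, the names `person_a` and `person_b`, and the threshold of 0.1 are all hypothetical. Step 5 (training a linear SVM on HOG features of the selected patches) is left out.

```python
import math


def normalize(points):
    """Center a keypoint set and scale it to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Fall back to 1.0 if all keypoints coincide, to avoid dividing by zero.
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centered]


def residual_error(seed, candidate):
    """Assumed residual: mean squared distance between corresponding
    keypoints after removing translation and scale."""
    a, b = normalize(seed), normalize(candidate)
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(a, b))
    return total / len(a)


def select_positives(seed, candidates, threshold):
    """Score every candidate against the seed, sort by residual error,
    and keep those at or below the threshold."""
    scored = sorted((residual_error(seed, kp), name)
                    for name, kp in candidates.items())
    return [(name, err) for err, name in scored if err <= threshold]


# Hypothetical keypoints: left shoulder, right shoulder, left hip.
seed = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
candidates = {
    # Same configuration as the seed, shifted and twice as large.
    "person_a": [(10.0, 10.0), (14.0, 10.0), (10.0, 18.0)],
    # A clearly different configuration.
    "person_b": [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0)],
}

positives = select_positives(seed, candidates, threshold=0.1)
print(positives)
```

With these made-up inputs only `person_a` survives the threshold, since it matches the seed up to translation and scale. In practice the threshold would have to be tuned.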