Transcript Week 2
Data-Driven Attributes for Action Detection
Presented by Christina Peterson
Background
Liu et al. [1] propose a unified framework for action recognition where
manually specified attributes are:
Selected discriminatively to account for intra-class variability
Integrated with data-driven attributes to make the attribute set more descriptive
Yu et al. [2] propose a framework for an attribute-based query by using a
large pool of weak attributes composed of automatic classifier scores that
are easily acquired with no human labor.
Query attributes are acquired through a human labeling process
Weak attributes are generated automatically by machine
Query attributes are mapped to weak attributes
Background
Malisiewicz et al. [3] propose a method for object detection which combines a
discriminative object detector with a nearest-neighbor approach.
A separate linear SVM classifier is trained for each exemplar in the dataset
Each exemplar is represented using a rigid HOG template
This results in a large collection of simple individual Exemplar-SVM detectors rather
than a single complex category detector
Farhadi et al. [4] propose an attribute-based approach to object detection.
Semantic and discriminative attributes
Feature selection method for learning attributes that can be generalized across
categories
Base feature definition
Background
Tian et al. [5] propose a spatiotemporal deformable part model (SDPM)
that stays true to the structure of the original deformable part model (DPM).
SDPM has volumetric parts that displace in both time and space
The root filter captures the overall information of the action cycle and is
obtained by applying an SVM to the HOG3D features of the action cycle
Low Level Features
STIP
Histogram of Oriented Gradients (HOG)
72-element descriptor
Histogram of Optical Flow (HOF)
90-element descriptor
Color
Texture
Bag of Words
Concatenate the low-level features for each video
Cluster features with k-means
128 clusters for color, 256 for texture, 1000 for STIP
Each feature type is clustered separately with k-means
3 x 3 x 3 grid + 1 = 28 cells
Features collected for each cell
Create a histogram of cluster assignments per feature type for each cell in the bounding box
(128 + 256 + 1000) x 28 = 38,752-dimensional descriptor
Normalize based on the size of the bounding box (a sketch of this pipeline follows below)
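The slides do not spell out how the per-cell histograms are assembled, so here is a minimal Python sketch of one way to build the (128 + 256 + 1000) x 28 descriptor, assuming each local descriptor comes with an (x, y, t) position inside the action's bounding box. The function names (build_codebooks, cell_index, video_descriptor), the input layouts, the treatment of the "+1" cell as a whole-volume cell, and the volume-based normalization are illustrative assumptions, not the group's MATLAB implementation.

import numpy as np
from sklearn.cluster import KMeans

# Codebook sizes from the slides: 128 (color), 256 (texture), 1000 (STIP).
CODEBOOK_SIZES = {"color": 128, "texture": 256, "stip": 1000}
GRID = 3  # 3 x 3 x 3 spatio-temporal grid; the "+1" cell is assumed to cover the whole box

def build_codebooks(train_descriptors, seed=0):
    # Cluster each feature type separately with k-means.
    # train_descriptors: dict of feature type -> (N, D) array pooled over training videos.
    codebooks = {}
    for ftype, k in CODEBOOK_SIZES.items():
        codebooks[ftype] = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(
            train_descriptors[ftype])
    return codebooks

def cell_index(pos, bbox):
    # Map an (x, y, t) position to one of the 27 grid cells of the bounding box.
    x0, y0, t0, x1, y1, t1 = bbox
    rel = (np.asarray(pos, dtype=float) - (x0, y0, t0)) / (
        np.array([x1 - x0, y1 - y0, t1 - t0], dtype=float) + 1e-9)
    ix, iy, it = np.clip((rel * GRID).astype(int), 0, GRID - 1)
    return (it * GRID + iy) * GRID + ix

def video_descriptor(features, codebooks, bbox):
    # Bag-of-words descriptor for one video: (128 + 256 + 1000) x 28 values.
    # features: dict of feature type -> list of ((x, y, t), descriptor) pairs
    # falling inside the bounding box (assumed input layout).
    blocks = []
    for ftype, km in codebooks.items():
        hist = np.zeros((GRID ** 3 + 1, CODEBOOK_SIZES[ftype]))  # 27 grid cells + whole box
        pairs = features[ftype]
        if pairs:
            words = km.predict(np.array([d for _, d in pairs]))
            for (pos, _), w in zip(pairs, words):
                hist[cell_index(pos, bbox), w] += 1
                hist[-1, w] += 1  # whole-volume cell
        blocks.append(hist.ravel())
    desc = np.concatenate(blocks)
    x0, y0, t0, x1, y1, t1 = bbox
    # "Normalize based on size of bounding box": divide by the spatio-temporal volume.
    return desc / max((x1 - x0) * (y1 - y0) * (t1 - t0), 1.0)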
Exemplar SVM
Train a separate linear SVM classifier for each exemplar in the dataset with
a single positive example and many negative examples
This results in a large collection of simple individual Exemplar-SVM detectors
rather than a single complex category detector
Example: The action Diving-side will have multiple linear SVM classifiers,
each trained on a single positive example from this action class
At test time, all Exemplar-SVM detectors for the respective action class are
run over the test set to calculate label prediction accuracy (see the sketch after this list)
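As a concrete illustration of this training and testing loop, below is a minimal Python sketch using scikit-learn's LinearSVC (the actual plan is a MATLAB implementation). The function names, the input layouts, the class_weight="balanced" reweighting (a coarse stand-in for the separate positive/negative costs and calibration used by Malisiewicz et al. [3]), and taking the maximum over exemplar scores as the class score are all assumptions.

import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svms(positives, negatives, C=1.0):
    # One linear SVM per positive exemplar: a single positive vs. all negatives.
    # positives: (P, D) array, one row per exemplar of the action class.
    # negatives: (N, D) array of descriptors from other classes (assumed layout).
    y = np.r_[1, np.zeros(len(negatives), dtype=int)]
    detectors = []
    for exemplar in positives:
        X = np.vstack([exemplar[None, :], negatives])
        clf = LinearSVC(C=C, class_weight="balanced")  # rebalance the 1-vs-many split
        detectors.append(clf.fit(X, y))
    return detectors

def class_score(x, detectors):
    # Score of one test descriptor for an action class: max over its exemplar detectors.
    return max(clf.decision_function(x[None, :])[0] for clf in detectors)

def predict_action(x, detectors_by_class):
    # Label prediction: the class whose exemplar detectors respond most strongly.
    return max(detectors_by_class, key=lambda c: class_score(x, detectors_by_class[c]))

Taking the maximum over a class's exemplar scores is simply the most direct way to turn many per-exemplar detectors into one class score; calibrated scores or nearest-exemplar voting would be alternatives.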
Goals
Implement the Exemplar-SVM classifiers in MATLAB
Label propagation
Find the relationship between labels and prediction results
Conditional probability (one possible formulation is sketched below)
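One possible (assumed) reading of the conditional-probability bullet, since the slides do not define it: estimate on the training videos how often a class label co-occurs with a firing exemplar detector,

P(y = c | s_e = 1) ≈ N(y = c, s_e = 1) / N(s_e = 1)

where s_e = 1 means exemplar detector e fires on a video, and then propagate labels to test videos by weighting each detector's response with these estimates. The symbols and the propagation rule are illustrative, not taken from the slides.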
References
[1] J. Liu, B. Kuipers, and S. Savarese. Recognizing Human Actions by Attributes. In
CVPR, 2011.
[2] F. Yu, R. Ji, M.-H. Tsai, G. Ye, and S.-F. Chang. Weak Attributes for Large-Scale
Image Retrieval. In CVPR, 2012.
[3] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of Exemplar-SVMs for
Object Detection and Beyond. In ICCV, 2011.
[4] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing Objects by Their
Attributes. In CVPR, 2009.
[5] Y. Tian, R. Sukthankar, and M. Shah. Spatiotemporal Deformable Part Models for
Action Detection. In CVPR, 2013.
[6] Y. Wang and G. Mori. Hidden Part Models for Human Action Recognition:
Probabilistic vs. Max-Margin. In PAMI, 2011.