Transcript Week 2

Data-Driven Attributes for Action Detection
Week 2
Presented by Christina Peterson
Background
• Liu et al. [1] propose a unified framework for action recognition in which
  manually specified attributes are:
  • Selected discriminatively to account for intra-class variability
  • Integrated with data-driven attributes to make the attribute set more descriptive
• Yu et al. [2] propose a framework for attribute-based queries that uses a
  large pool of weak attributes, composed of automatic classifier scores that
  are easily acquired with no human labor.
  • Query attributes are acquired through a human labeling process
  • Weak attributes are generated automatically by machine
  • Query attributes are mapped to weak attributes
Background
• Malisiewicz et al. [3] propose a method for object detection that combines a
  discriminative object detector with a nearest-neighbor approach.
  • A separate linear SVM classifier is trained for each exemplar in the dataset
  • Each exemplar is represented using a rigid HOG template
  • This results in a large collection of simple individual Exemplar-SVM detectors
    rather than a single complex category detector
• Farhadi et al. [4] propose an attribute-based approach to object detection.
  • Semantic and discriminative attributes
  • Feature selection method for learning attributes that generalize across
    categories
  • Base feature definition
Background
• Tian et al. [5] propose a spatiotemporal deformable part model (SDPM)
  that stays true to the structure of the original deformable part model (DPM).
  • SDPM has volumetric parts that displace in both time and space
  • A root filter captures the overall information of the action cycle and is
    obtained by applying an SVM to the HOG3D features of the action cycle
Low-Level Features
• STIP (space-time interest points)
  • Histogram of Oriented Gradients (HOG)
    • 72-element descriptor
  • Histogram of Optical Flow (HOF)
    • 90-element descriptor
• Color
• Texture
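The two descriptor sizes can be checked with a few lines of arithmetic. This is a minimal sketch in Python, assuming the commonly used STIP layout of a 3 x 3 x 2 grid of cells with 4 HOG bins and 5 HOF bins per cell. The slide gives only the totals, so the grid, the bin counts, the concatenation of HOG and HOF into one STIP vector, and the helper name `concat_descriptor` are all assumptions.

```python
# Assumed STIP descriptor layout (not stated on the slide): the volume around
# each interest point is split into a 3 x 3 x 2 grid of cells.
GRID = (3, 3, 2)
HOG_BINS = 4  # assumed orientation bins per cell for HOG
HOF_BINS = 5  # assumed flow bins per cell for HOF

num_cells = GRID[0] * GRID[1] * GRID[2]
hog_dim = num_cells * HOG_BINS
hof_dim = num_cells * HOF_BINS


def concat_descriptor(hog, hof):
    """Join one HOG vector and one HOF vector into a single STIP descriptor."""
    if len(hog) != hog_dim or len(hof) != hof_dim:
        raise ValueError("unexpected descriptor length")
    return list(hog) + list(hof)


stip = concat_descriptor([0.0] * hog_dim, [0.0] * hof_dim)
print(hog_dim, hof_dim, len(stip))  # prints "72 90 162"
```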
Bag of Words
• Concatenate low-level features for each video
• Cluster features by k-means
  • Each feature type is clustered separately
  • Cluster counts: 128 for color, 256 for texture, 1000 for STIP
• Divide each bounding box into 3 x 3 x 3 + 1 = 28 cells
  • Features are collected for each cell
• Create a histogram of cluster centers per feature type for each cell in the bounding box
  • Descriptor length: (128 + 256 + 1000) x 28 = 38,752
• Normalize based on the size of the bounding box
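The descriptor construction on this slide can be sketched as follows. This is an illustrative Python sketch rather than the presenter's implementation. It assumes the k-means centers are already available, that the "+ 1" cell is a histogram pooled over the whole bounding box, and that normalization divides counts by the box volume. All function names are hypothetical.

```python
COLOR_K, TEXTURE_K, STIP_K = 128, 256, 1000  # cluster counts from the slide
GRID = 3                                      # 3 x 3 x 3 grid inside the box
NUM_CELLS = GRID ** 3 + 1                     # assumption: "+ 1" is the whole box


def nearest_codeword(feature, centers):
    """Index of the k-means center closest to a feature (squared Euclidean)."""
    dists = [sum((f - c) ** 2 for f, c in zip(feature, ctr)) for ctr in centers]
    return dists.index(min(dists))


def cell_index(point, box):
    """Grid cell of an (x, y, t) point inside box = (x0, y0, t0, x1, y1, t1)."""
    idx = 0
    for d in range(3):
        frac = (point[d] - box[d]) / (box[d + 3] - box[d])
        idx = idx * GRID + min(GRID - 1, max(0, int(frac * GRID)))
    return idx


def cell_histograms(words, vocab_size, box):
    """words: list of ((x, y, t), codeword_id) pairs for one feature type."""
    hists = [[0.0] * vocab_size for _ in range(NUM_CELLS)]
    for point, word in words:
        hists[cell_index(point, box)][word] += 1.0
        hists[-1][word] += 1.0  # final cell pools the whole bounding box
    volume = (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])
    return [[count / volume for count in h] for h in hists]  # assumed normalization


def box_descriptor(color, texture, stip, box):
    """Concatenate the per-cell histograms of all three feature types."""
    per_type = [cell_histograms(color, COLOR_K, box),
                cell_histograms(texture, TEXTURE_K, box),
                cell_histograms(stip, STIP_K, box)]
    out = []
    for cell in range(NUM_CELLS):
        for hists in per_type:
            out.extend(hists[cell])
    return out


box = (0, 0, 0, 30, 30, 30)
desc = box_descriptor([((5, 5, 5), 3)], [((15, 5, 25), 7)], [((29, 29, 29), 999)], box)
print(len(desc))  # (128 + 256 + 1000) * 28 = 38752
```

In practice the codeword id in each pair would come from `nearest_codeword` applied to the raw feature and the centers learned for that feature type.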
Exemplar SVM
• Train a separate linear SVM classifier for each exemplar in the dataset, using
  a single positive example and many negative examples
• This results in a large collection of simple individual Exemplar-SVM detectors
  rather than a single complex category detector
• Example: the action Diving-side will have multiple linear SVM classifiers,
  each based on one positive example from this action class
• At test time, all Exemplar-SVM detectors for the respective action class must
  be run to calculate label prediction accuracy
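The training scheme above can be illustrated with a small self-contained sketch. It is written in Python rather than MATLAB and uses a plain stochastic subgradient solver for the hinge loss in place of a library SVM. The hyperparameters, the toy 2-D features, the second class name "Kicking", and the rule of picking the class whose best exemplar scores highest are assumptions for illustration. The cited method [3] additionally calibrates detector scores, which is omitted here.

```python
import random


def train_linear_svm(positive, negatives, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Fit w, b for one exemplar by stochastic subgradient descent on hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(positive), 0.0
    data = [(positive, 1.0)] + [(x, -1.0) for x in negatives]
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # L2 shrinkage
            if margin < 1.0:                           # hinge loss is active
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(model, x):
    """Signed decision value of one Exemplar-SVM on feature vector x."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_ensembles(examples_by_class):
    """One Exemplar-SVM per positive; negatives are all examples of other classes."""
    ensembles = {}
    for label, positives in examples_by_class.items():
        negatives = [x for other, xs in examples_by_class.items()
                     if other != label for x in xs]
        ensembles[label] = [train_linear_svm(p, negatives) for p in positives]
    return ensembles


def predict(x, ensembles):
    """Assumed decision rule: the class whose best exemplar detector scores highest."""
    return max(ensembles, key=lambda label: max(score(m, x) for m in ensembles[label]))


# Toy 2-D features; "Kicking" is a hypothetical second action class.
toy = {"Diving-side": [(1.0, 0.1), (0.9, 0.0)], "Kicking": [(0.0, 1.0), (0.1, 0.9)]}
ensembles = train_ensembles(toy)
print(len(ensembles["Diving-side"]), predict((0.95, 0.05), ensembles))
```

Each action class ends up with as many detectors as it has positive examples, so test-time cost grows with the number of exemplars.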
Goals
• Implement the Exemplar-SVM classifiers in MATLAB
• Label propagation
• Find the relationship between labels and prediction results
  • Conditional probability
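One simple way to relate labels to prediction results is an empirical conditional probability table. The slide does not say how the relationship will be modelled, so this Python sketch is an assumption. It estimates P(true label | predicted label) from paired observations. The function name, the class name "Kicking", and the toy label lists are hypothetical.

```python
from collections import Counter


def conditional_probabilities(true_labels, predicted_labels):
    """Estimate P(true label | predicted label) from paired observations."""
    joint = Counter(zip(predicted_labels, true_labels))
    totals = Counter(predicted_labels)
    return {pair: count / totals[pair[0]] for pair, count in joint.items()}


true_labels = ["Diving-side", "Diving-side", "Kicking", "Kicking", "Diving-side"]
predicted = ["Diving-side", "Kicking", "Kicking", "Kicking", "Diving-side"]
table = conditional_probabilities(true_labels, predicted)
print(table[("Kicking", "Diving-side")])  # P(true = Diving-side | predicted = Kicking)
```

Keys are (predicted, true) pairs, so the entries that share a predicted label sum to 1.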
References
[1] J. Liu, B. Kuipers, and S. Savarese. Recognizing Human Actions by Attributes. In CVPR, 2011.
[2] F. Yu, R. Ji, M.-H. Tsai, G. Ye, and S.-F. Chang. Weak Attributes for Large-Scale Image Retrieval. In CVPR, 2012.
[3] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011.
[4] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing Objects by Their Attributes. In CVPR, 2009.
[5] Y. Tian, R. Sukthankar, and M. Shah. Spatiotemporal Deformable Part Models for Action Detection. In CVPR, 2013.
[6] Y. Wang and G. Mori. Hidden Part Models for Human Action Recognition: Probabilistic vs. Max-Margin. In PAMI, 2011.