Pyramidal Spatiotemporal Relationship Match (PSRM)


Real-time Action Recognition by Spatiotemporal Semantic
and Structural Forest
Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla
Machine Intelligence Laboratory, Engineering Department, University of Cambridge
Introduction and Motivations
• A novel real-time solution for action recognition that utilises local appearance and structural information.
Main features / major contributions:
• Continuous / frame-by-frame recognition
• High run-time performance
• Real-time feature extraction and classification
• Pyramidal spatiotemporal relationship match (PSRM)
• Local appearance + structural information
• Short response time
Main objective: efficiency
A short demo
Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for
the full demo video.
Related Work
“Bag of words” model: sophisticated spatiotemporal features, K-means codebook, learned classifier
[Schuldt et al. ICPR2004, Niebles et al. BMVC06, Ryoo and Aggarwal ICCV09, Willems BMVC09, Riemenschneider et al. BMVC09]
• Many current methods focus on accuracy and on the action representation model (feature design)
• Some achieve high accuracies, but take a long time to recognise
• How can we improve efficiency?
• Can we improve codebook learning and feature matching?
Related Work
• Vector quantisation by random forest [Moosmann et al. NIPS06]
• For image segmentation [Shotton et al. CVPR08]
• Can we apply it in video analysis?
• Pyramid match kernel [Grauman and Darrell ICCV05]
• Image recognition [Grauman and Darrell ICCV05], scene classification [Lazebnik et al. CVPR06], etc.
• Spatiotemporal relationship match [Ryoo and Aggarwal ICCV09]
Figures from: Moosmann et al. NIPS2006; Ryoo and Aggarwal ICCV09; Grauman and Darrell ICCV05
S. Lazebnik, C. Schmid, and J. Ponce. “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories” CVPR 2006
K. Grauman and T. Darrell. “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features” ICCV 2005
F. Moosmann, B. Triggs, and F. Jurie. “Fast discriminative visual codebooks using randomized clustering forests” NIPS 2006
J. Shotton, M. Johnson, and R. Cipolla. “Semantic texton forests for image categorization and segmentation” CVPR 2008
M. S. Ryoo and J. K. Aggarwal. “Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities” ICCV 2009
Our Contributions
• Our contribution is three-fold:
• Local appearance + structural information
• Efficient codebook learning
• High run-time performance
Comparison with existing approaches
Typical approaches:
• Feature encoding: K-means clustering
• Feature matching: the “Bag of Words” (BOW) model
• Slow for large codebooks; lacks structural information; quantisation error
Our method:
• Feature encoding: Semantic Texton Forest
• Feature matching: PSRM
• Efficient; robust; structural information; hierarchical matching
Overview
• Feature detection: V-FAST corner
• Feature extraction: spatiotemporal cuboids, spatiotemporal semantic texton forest
• Feature matching: PSRM, BOST
• Classification: K-means forest, random forest classifier
• Results
Feature detection
V-FAST: Spatiotemporal Feature Detection
• A novel spatiotemporal interest point detector
• Inspired by FAST [Rosten and Drummond ECCV2006]
• A cascade of three FAST detectors
• Considers three orthogonal Bresenham circles
• Features:
• Very fast!
E. Rosten and T. Drummond. “Machine learning for high-speed corner detection” ECCV 2006
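The slide text does not include the V-FAST formulation, so the following is only a rough sketch of the idea of running a FAST-style segment test on three orthogonal circles. It assumes a 16-pixel radius-3 Bresenham circle (as in 2D FAST), a run of n = 9 contiguous pixels, an intensity threshold of 20, and that a point must pass in all three planes; every one of these choices, and the names `segment_test` and `vfast_like`, are illustrative assumptions rather than the authors' detector.

```python
import numpy as np

# 16 offsets of a radius-3 Bresenham circle, in circular order (as in 2D FAST).
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]


def segment_test(center, ring, threshold, n):
    """True if n contiguous ring pixels are all brighter or all darker than center."""
    for sign in (1, -1):
        flags = [sign * (float(v) - float(center)) > threshold for v in ring]
        run = 0
        for flag in flags + flags:  # doubled list handles wrap-around runs
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


def vfast_like(video, t, y, x, threshold=20, n=9):
    """Segment test on circles in the XY, XT and YT planes of video[t, y, x].

    Assumption: a keypoint must pass in all three planes. The point must be
    at least 3 samples away from every border of the volume.
    """
    center = video[t, y, x]
    rings = (
        [video[t, y + a, x + b] for a, b in CIRCLE],  # XY plane
        [video[t + a, y, x + b] for a, b in CIRCLE],  # XT plane
        [video[t + a, y + b, x] for a, b in CIRCLE],  # YT plane
    )
    # all() stops testing as soon as one plane fails.
    return all(segment_test(center, ring, threshold, n) for ring in rings)
```

Under these assumptions a bright spot that never moves passes the spatial (XY) test but fails the XT test, so it is not reported as a spatiotemporal keypoint.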
Feature extraction
Building a codebook using STF
• Extract small video cuboids at detected keypoints
• Visual codebook using STF: a random-forest-based codebook
• Efficient visual codebook
• One feature → multiple codewords
• Quantisation and partial matching
• “Textonises” patches recursively
• Works on pixels directly
• Hierarchical splits
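A minimal sketch of how a single tree can map one cuboid to several codewords: the cuboid is pushed down the tree, and every node on its path counts as a codeword. The split test used here (difference of two voxels against a threshold) and the random, untrained parameters are assumptions for illustration; the class name `RandomTextonTree` is hypothetical, and a real STF would learn its split tests from labelled data.

```python
import numpy as np


class RandomTextonTree:
    """A complete binary tree with random (untrained) voxel-difference tests."""

    def __init__(self, cuboid_shape, depth, rng):
        self.depth = depth
        n_internal = 2 ** depth - 1
        size = int(np.prod(cuboid_shape))
        # Hypothetical split test per internal node: v[p] - v[q] > thr.
        self.p = rng.integers(0, size, n_internal)
        self.q = rng.integers(0, size, n_internal)
        self.thr = rng.uniform(-50, 50, n_internal)

    def path(self, cuboid):
        """Return the node indices visited from root to leaf (heap numbering)."""
        v = np.asarray(cuboid, dtype=float).ravel()
        node, visited = 0, [0]
        for _ in range(self.depth):
            go_right = v[self.p[node]] - v[self.q[node]] > self.thr[node]
            node = 2 * node + 1 + int(go_right)  # children of i are 2i+1 and 2i+2
            visited.append(node)
        return visited
```

Because the tests read voxel values directly, no separate descriptor is computed; the list returned by `path` runs from coarse (root) to fine (leaf) codewords.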
Feature matching
Pyramidal Spatiotemporal Relationship Match (PSRM)
A set of “rules” (shown in different colours) is designed to describe the spatiotemporal structure of the features.
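The rules themselves are not listed in the slide text. As a hedged illustration only, the sketch below counts ordered pairs of features that satisfy two made-up rules, “before” (in time) and “near” (in space), and accumulates one histogram per rule indexed by the pair of codewords. The rule definitions, the distance threshold of 10, and the function name `relationship_histograms` are all assumptions.

```python
import numpy as np
from itertools import permutations

# Hypothetical rules; each takes two features (x, y, t, codeword) and returns a bool.
RULES = {
    "before": lambda a, b: a[2] < b[2],
    "near": lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 10,
}


def relationship_histograms(features, n_codewords):
    """One (codeword, codeword) count matrix per rule, over ordered feature pairs."""
    hists = {name: np.zeros((n_codewords, n_codewords), dtype=int) for name in RULES}
    for a, b in permutations(features, 2):
        for name, rule in RULES.items():
            if rule(a, b):
                hists[name][a[3], b[3]] += 1
    return hists
```

Unlike a plain bag of words, these histograms change when the same codewords appear in a different temporal order or spatial layout.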
Pyramidal Spatiotemporal Relationship Match (PSRM)
• Applied to each tree in the STF
• Applied to each “association rule”
Pyramid match kernel:
• We apply it semantically rather than spatially
• Assumption: neighbouring codewords are similar
• Adjacent tree nodes are merged, instead of adjacent spatial bins
Figures: typical pyramid match kernel; our pyramid match kernel
Pyramidal Spatiotemporal Relationship Match (PSRM)
Pyramid match kernel (PMK) over multiple structural relationship histograms
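The idea of merging adjacent tree nodes instead of spatial bins can be sketched with a pyramid match over leaf histograms of a complete binary tree. This is a simplified illustration, not the authors' formulation: it uses a one-dimensional histogram over leaves (the relationship histograms are indexed by pairs of codewords, which would need the same merging along each axis), the usual pyramid match weighting of 1/2^level for newly found matches, and the hypothetical function name `semantic_pmk`.

```python
import numpy as np


def intersection(h1, h2):
    """Histogram intersection: number of matches at the current resolution."""
    return float(np.minimum(h1, h2).sum())


def semantic_pmk(h1, h2):
    """Pyramid match over leaf histograms; the length must be a power of two."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    score, prev, level = 0.0, 0.0, 0
    while True:
        cur = intersection(h1, h2)
        # Only matches that are new at this level count, down-weighted when coarser.
        score += (cur - prev) / 2 ** level
        if h1.size == 1:
            return score
        # Merge sibling nodes into their parent (semantic, not spatial, coarsening).
        h1 = h1.reshape(-1, 2).sum(axis=1)
        h2 = h2.reshape(-1, 2).sum(axis=1)
        prev, level = cur, level + 1
```

Two features that fall into sibling leaves still contribute half a match, and features that only meet at the root contribute a quarter, which reflects the assumption that neighbouring codewords are similar.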
Continuous action recognition
Figure: timelines of feature extraction and classification for our approach and for typical methods.
Combined Classification
• PSRM and BOST (bag of spatiotemporal textons) are classified independently:
• PSRM: k-means forest
• Originally used for nearest-neighbour (NN) approximation
• Uses PSRM as the matching kernel
• Combined with the BOST model for the final results
M. Muja and D. G. Lowe. “Fast approximate nearest neighbors with automatic algorithm configuration” VISAPP 2009
K-means tree figure courtesy of David Aldavert Miró : http://www.cvc.uab.cat/~aldavert/plor/
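A minimal sketch of combining the two independent classifiers, assuming each one outputs a per-class score vector and that the combination is a simple weighted sum. The weight value and the function name `combine_scores` are hypothetical; the slides do not state how the weights are chosen.

```python
import numpy as np


def combine_scores(psrm_scores, bost_scores, weight=0.5):
    """Weighted sum of two per-class score vectors; returns (label, combined)."""
    combined = (weight * np.asarray(psrm_scores, dtype=float)
                + (1.0 - weight) * np.asarray(bost_scores, dtype=float))
    return int(np.argmax(combined)), combined
```

For example, `combine_scores([0.6, 0.4], [0.1, 0.9])` favours class 1 with equal weights, while `weight=0.9` lets the PSRM scores dominate and selects class 0.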
Experiments
• Short video sequences (50 frames ~ 2 seconds) are extracted from the
input video.
• A subsequence is sampled every 5 frames in the experiments and every frame in the laptop demo (so the demo performs frame-by-frame recognition).
• Two datasets are used for performance evaluation:
KTH dataset
• The standard benchmark
• Six classes, with viewpoint changes, illumination changes, zoom, etc.
UT dataset (for ICPR contest on Semantic Description of Human Activities 2010)
• Human interactions, 6 classes of actions, cluttered background
Hardware specifications
• Intel Core i7 920 (for accuracy and speed tests)
• Core 2 Duo P9400 (for laptop demo)
Figures: KTH dataset; UT interaction dataset
Experiments: Results (KTH dataset)
Comparison with recent state-of-the-art (leave-one-out cross-validation, accuracy in %):
• Our method (snippets): 93.55
• Our method (sequence): 95.67
• Point clouds (CVPR2009): 93.17
• Vocabulary Forest (CVPR2008): 93.17
• Shape-motion tree (ICCV2009): 93.43
• Info. Max. (CVPR2008): 94.15
• Neighbourhood (CVPR2010): 94.53
• CCA (CVPR2007): 95.33
• Mined features (ICCV2009): 96.7
snippet: subsequence-level recognition
sequence: majority voting of subsequence labels
• Comparable to most state-of-the-art methods.
• About 3% lower than the top performer
• Is it a sensible trade-off?
• Useful for many more practical applications (surveillance, robotics, etc.)
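The snippet and sequence protocols can be sketched as follows, assuming 50-frame snippets taken every 5 frames as stated in the Experiments slide, and a plain majority vote over snippet labels for the sequence-level result. The function names `snippet_windows` and `sequence_label` are hypothetical, and how ties are broken is not specified in the slides.

```python
from collections import Counter


def snippet_windows(n_frames, length=50, step=5):
    """Start/end frame indices (end exclusive) of every full-length snippet."""
    return [(s, s + length) for s in range(0, n_frames - length + 1, step)]


def sequence_label(snippet_labels):
    """Majority vote over the labels predicted for the individual snippets."""
    return Counter(snippet_labels).most_common(1)[0][0]
```

Setting `step=1` corresponds to the frame-by-frame setting used for the laptop demo.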
Experiments: Results
• Results: UT interaction dataset
• Accuracy improved by ~20% simply by combining the class labels!
• PSRM and BOST gave low accuracies when applied separately.
• Run-time performance
• Can be further optimised (e.g. GPU, multi-core processing)
• < 25 fps, but enough for most real-time applications
Demo video
• Frame-level recognition
• Potential improvement:
• Delay (~1s) in recognition results (depends on the subsequence length)
Please visit: “http://www.youtube.com/watch?v=eD5b8d7hV6E” on the Internet for the full demo
video.
Conclusions
THANK YOU VERY MUCH
THE END
Extra slide
• Formulation of V-FAST
Extra slide
• Formulation of STF
• Split function model:
• Split criteria --- Information gain:
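The slide's own formula is not present in the text. As a hedged reference point, the sketch below computes the standard information gain of a binary split (parent entropy minus the size-weighted entropies of the two children); whether the slide used exactly this form is an assumption, and the function names are illustrative.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, go_left):
    """Entropy reduction obtained by splitting labels with a boolean mask."""
    labels = np.asarray(labels)
    go_left = np.asarray(go_left, dtype=bool)
    left, right = labels[go_left], labels[~go_left]
    if left.size == 0 or right.size == 0:
        return 0.0  # a split that sends everything one way gains nothing
    n = labels.size
    return (entropy(labels)
            - left.size / n * entropy(left)
            - right.size / n * entropy(right))
```

A candidate split test with a higher gain separates the action classes more cleanly, which is the usual criterion for choosing among randomly generated tests at each node.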
Extra slide
• Formulation of STF
Extra slide
• Formulation of PSRM
• Step 1 Feature matching:
• Step 2 Semantic PMK over histogram
Extra slide
• Formulation of Classifier training
• Optimise the clusters of features so that they maximise the PMK with the mean.
Extra slide
• Experiment parameters
Extra slide
• Confusion matrix:
Extra slide
• PSRM → kernel k-means forest
• BOST → random forest
• Weighted combination → action recognition results (class labels)