
Pascal Grand Challenge
Felix Vilensky
19/6/2011
1
Outline
• Pascal VOC challenge framework.
• Successful detection methods:
o Object Detection with Discriminatively Trained Part Based Models (P. F. Felzenszwalb et al.) - the "UoC/TTI" method.
o Multiple Kernels for Object Detection (A. Vedaldi et al.) - the "Oxford/MSR India" method.
• A successful classification method:
o Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) - the NEC/UIUC method.
• Discussion about bias in datasets.
• 2010 winners overview.
2
Pascal VOC
Challenge
Framework
The PASCAL Visual Object Classes
(VOC) Challenge
Mark Everingham · Luc Van Gool ·
Christopher K. I. Williams · John Winn ·
Andrew Zisserman
3
Pascal VOC Challenge
• Classification task.
• Detection task.
• Pixel-level segmentation.
• "Person Layout" detection.
• Action classification in still images.
4
Classification Task
Example image: at least one bus (confidence 100%).
5
Detection Task
Example detection (confidence 100%).
The predicted bounding box should overlap the ground truth by at least 50%!!!
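As an illustration (not part of the original slides), here is a minimal sketch of the VOC-style overlap criterion, computed as intersection-over-union of the predicted and ground-truth boxes; the box format and helper names are my own, the 50% threshold follows the rule stated above.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_detection(pred_box, gt_box, threshold=0.5):
    """A detection counts as a true positive if the overlap is at least 50%."""
    return iou(pred_box, gt_box) >= threshold
```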
6
Detections “near misses”
These did not fulfill the bounding-box (BB) overlap criterion.
7
Pascal VOC Challenge-The Object Classes
8
Pascal VOC Challenge-The Object Classes
Images were retrieved from the Flickr website.
9
Pixel Level Segmentation
Example figure: an input image with its object segmentation and its class segmentation.
10
Person Layout
11
Action Classification
• Classification among 9 action classes.
Example images: "speaking on the phone" (100%) and "playing the guitar" (100%).
12
Annotation
• Class.
• Bounding box.
• Viewpoint.
• Truncation.
• Difficult (for classification/detection).
13
Annotation Example
14
Evaluation
A way to compare between different methods.

$\text{Recall} = \dfrac{\#\text{True Positives}}{\#\text{True Positives} + \#\text{False Negatives}}$

$\text{Precision} = \dfrac{\#\text{True Positives}}{\#\text{True Positives} + \#\text{False Positives}}$

• Precision/Recall curves.
• Interpolated precision.
• AP (Average Precision).
15
Evaluation-Precision\Recall Curves(1)
• Practical tradeoff between precision and recall.

Rank  g.t.  Precision  Recall
1     Yes   1/1        0.2
2     No    1/2        0.2
3     Yes   2/3        0.4
4     No    2/4        0.4
5     Yes   3/5        0.6
6     No    3/6        0.6
7     No    3/7        0.6
8     No    3/8        0.6
9     No    3/9        0.6
10    No    3/10       0.6

• Interpolated precision: $P_{\text{interp}}(r) = \max_{\tilde{r}:\,\tilde{r} \ge r} p(\tilde{r})$
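A minimal sketch (my own, not from the slides) that reproduces the precision and recall columns of the worked example above from the ranked ground-truth labels, assuming 5 relevant objects in total so that recall reaches 0.6 after three true positives.

```python
def precision_recall_curve(ranked_hits, num_positives):
    """ranked_hits: booleans, one per detection, sorted by score.
    Returns (precision, recall) values after each detection."""
    precisions, recalls = [], []
    tp = 0
    for i, hit in enumerate(ranked_hits, start=1):
        tp += int(hit)
        precisions.append(tp / i)
        recalls.append(tp / num_positives)
    return precisions, recalls

# The ranked list from the table: Yes, No, Yes, No, Yes, No, No, No, No, No
hits = [True, False, True, False, True, False, False, False, False, False]
prec, rec = precision_recall_curve(hits, num_positives=5)
# prec -> [1.0, 0.5, 0.67, 0.5, 0.6, 0.5, 0.43, 0.375, 0.33, 0.3]
# rec  -> [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
```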
16
Evaluation-Precision\Recall Curves(2)
17
Evaluation-Average Precision(AP)
$AP = \frac{1}{11} \sum_{r \in \{0,\,0.1,\,\ldots,\,1\}} P_{\text{interp}}(r)$

AP is the single number used to rank the competing methods.
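A sketch (again my own, with illustrative helper names) of 11-point interpolated AP, computed from precision/recall pairs such as those in the previous sketch.

```python
def interpolated_precision(precisions, recalls, r):
    """Maximum precision over all operating points with recall >= r (0 if none)."""
    candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
    return max(candidates) if candidates else 0.0

def average_precision_11pt(precisions, recalls):
    """VOC-style 11-point interpolated average precision."""
    levels = [i / 10.0 for i in range(11)]   # r in {0, 0.1, ..., 1}
    return sum(interpolated_precision(precisions, recalls, r) for r in levels) / 11.0

# Reuses prec, rec from the previous sketch.
ap = average_precision_11pt(prec, rec)
```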
18
Successful Detection
Methods
19
UoC/TTI Method Overview
(P.Felzenszwalb et al.)
• Joint winner of the 2009 Pascal VOC challenge, together with the Oxford method.
• "Lifetime achievement" award in 2010.
• Mixture of deformable part models.
• Each component has a global template + deformable parts:
o HOG feature templates.
• Fully trained from bounding boxes alone.
20
UoC/TTI Method – HOG Features(1)
• Gradients are computed with the filter [-1 0 1] and its transpose.
• The gradient orientation $\theta(x,y)$ is discretized into one of $p$ values:

Contrast sensitive: $B_1(x,y) = \operatorname{round}\!\left(\dfrac{p\,\theta(x,y)}{2\pi}\right) \bmod p$

Contrast insensitive: $B_2(x,y) = \operatorname{round}\!\left(\dfrac{p\,\theta(x,y)}{\pi}\right) \bmod p$

• Pixel-level features: $F(x,y)_b = \begin{cases} r(x,y) & \text{if } b = B(x,y) \\ 0 & \text{otherwise} \end{cases}$ where $r(x,y)$ is the gradient magnitude.
• Pixel features are aggregated into cells of size k; 8-pixel cells (k = 8) are used.
• 18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total!
• Soft binning is used when aggregating pixel features into cell histograms.
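A minimal sketch (my own, not the paper's code; hard binning only, without the soft binning mentioned above) of the contrast-sensitive and contrast-insensitive orientation binning just defined, with gradients from the [-1 0 1] filter.

```python
import numpy as np

def orientation_bins(image, p_sensitive=18, p_insensitive=9):
    """Per-pixel gradient magnitude and orientation bins (contrast sensitive/insensitive)."""
    img = image.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # [-1 0 1] filter
    gy[1:-1, :] = img[2:, :] - img[:-2, :]      # its transpose
    magnitude = np.hypot(gx, gy)                # r(x, y)
    theta = np.arctan2(gy, gx) % (2 * np.pi)    # orientation in [0, 2*pi)
    b1 = np.round(p_sensitive * theta / (2 * np.pi)).astype(int) % p_sensitive
    b2 = np.round(p_insensitive * theta / np.pi).astype(int) % p_insensitive
    return magnitude, b1, b2
```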
21
UoC/TTI Method – HOG Features(2)
Figure: visualization of the 27 orientation bins.
22
UoC/TTI Method – HOG Features(3)
• Normalization: each cell histogram C(i, j) is normalized by four factors computed from neighbouring cells,

$N_{\delta,\gamma}(i,j) = \left( \lVert C(i,j) \rVert^2 + \lVert C(i+\delta,j) \rVert^2 + \lVert C(i,j+\gamma) \rVert^2 + \lVert C(i+\delta,j+\gamma) \rVert^2 \right)^{1/2}, \quad \delta,\gamma \in \{-1,1\}$

• Truncation.
• 27 bins × 4 normalization factors = a 4×27 matrix per cell.
• Dimensionality reduction to 31:
o $N_1, \ldots, N_4$: summing over the 27 bins for each normalization gives $V_1, \ldots, V_4$.
o $B_1, \ldots, B_{27}$: summing over the 4 normalization factors (NFs) for each bin gives $V_5, \ldots, V_{31}$.
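A sketch of the analytic dimensionality reduction described above (my own; it assumes the 4×27 matrix is given as a NumPy array with rows = normalization factors and columns = bins, and omits the scaling constants used in the paper).

```python
import numpy as np

def reduce_hog_cell(features_4x27):
    """Reduce a 4x27 matrix of normalized/truncated cell features to a 31-D vector:
    4 sums over the 27 bins (V1..V4) + 27 sums over the 4 normalization factors (V5..V31)."""
    features_4x27 = np.asarray(features_4x27)          # shape (4, 27)
    sums_over_bins = features_4x27.sum(axis=1)         # V1..V4, one per normalization
    sums_over_nfs = features_4x27.sum(axis=0)          # V5..V31, one per bin
    return np.concatenate([sums_over_bins, sums_over_nfs])   # 31 values
```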
23
UoC/TTI Method – Deformable Part Models
• Coarse root filter.
• High-resolution deformable parts.
• Each part is specified by an anchor position, a deformation cost, and a resolution level.
24
UoC/TTI Method – Mixture Models(1)
• Diversity of a rich object category.
• Different views of the same object.
• A mixture of deformable part models for each class.
• Each deformable part model in the mixture is called
a component.
25
UoC/TTI Method – Object Hypothesis
Slide taken from the method's presentation
26
UoC/TTI Method –Models(1)
6 component person model
27
UoC/TTI Method –Models(2)
6 component bicycle model
28
UoC/TTI Method – Score of a Hypothesis
Slide taken from method's presentation
29
UoC/TTI Method – Matching(1)
The score of a root location is obtained by maximizing over the part locations (figure: root location and best part locations):

$\text{score}(p_0) = \max_{p_1, \ldots, p_n} \text{score}(p_0, \ldots, p_n)$

• "Sliding window" approach (a brute-force sketch follows below).
• High-scoring root locations define detections.
• Matching is done for each component separately.
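A highly simplified sketch of scoring one root location by maximizing over part placements around their anchors (my own brute-force version; the actual method uses generalized distance transforms for efficiency, and all argument names are illustrative).

```python
import numpy as np

def score_root_location(root_score, part_response_maps, anchors, deform_costs, search_radius=4):
    """Score of one root placement: root filter score plus, for each part, the best
    (filter response - quadratic deformation cost) in a window around its anchor.
    part_response_maps[i] is a 2-D array of filter responses at the part resolution."""
    total = root_score
    for resp, (ax, ay), (dx, dy) in zip(part_response_maps, anchors, deform_costs):
        best = -np.inf
        for px in range(ax - search_radius, ax + search_radius + 1):
            for py in range(ay - search_radius, ay + search_radius + 1):
                if 0 <= px < resp.shape[1] and 0 <= py < resp.shape[0]:
                    deformation = dx * (px - ax) ** 2 + dy * (py - ay) ** 2
                    best = max(best, resp[py, px] - deformation)
        total += best
    return total
```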
30
UoC/TTI Method – Matching(2)
31
UoC/TTI Method – Post Processing &
Context Rescoring
Slide taken from method's presentation
32
UoC/TTI Method – Training & DM
• Weakly labeled data in the training set.
• Latent SVM (LSVM) training with $z = (c, p_0, \ldots, p_n)$ as the latent value.
• Training and data mining alternate over 4 stages (see the sketch after this list):
o Optimize the latent values z.
o Add hard negative examples.
o Optimize β.
o Remove easy negative examples.
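A simplified, self-contained sketch of this alternating loop (my own, not the authors' implementation): each positive example is represented by a list of candidate feature vectors standing in for its possible latent assignments, and scikit-learn's LinearSVC plays the role of the convex β step.

```python
import numpy as np
from sklearn.svm import LinearSVC

def latent_svm_train(positive_candidates, negative_feats, num_iterations=4):
    """positive_candidates: list of (m_i, d) arrays of candidate latent assignments;
    negative_feats: (N, d) array of negative feature vectors."""
    w = np.zeros(negative_feats.shape[1])
    cache = negative_feats[:100]                              # initial negative cache
    for _ in range(num_iterations):
        # Optimize z: pick the highest-scoring candidate for each positive example.
        pos = np.array([cands[np.argmax(cands @ w)] for cands in positive_candidates])
        # Add hard negatives: negatives that currently score above the margin.
        hard = negative_feats[negative_feats @ w > -1.0]
        if len(hard):
            cache = np.vstack([cache, hard])
        # Optimize beta: a convex problem once the latent values are fixed.
        X = np.vstack([pos, cache])
        y = np.array([1] * len(pos) + [-1] * len(cache))
        w = LinearSVC(C=0.01).fit(X, y).coef_.ravel()
        # Remove easy negatives that are far outside the margin.
        pruned = cache[cache @ w > -1.0]
        cache = pruned if len(pruned) else cache
    return w
```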
33
UoC/TTI Method – Results(1)
34
UoC/TTI Method – Results(2)
35
Oxford Method Overview
(A.Vedaldi et al.)
Pipeline:
• Candidate regions with different scales and aspect ratios.
• 6 feature channels.
• 3-level spatial pyramid.
• Cascade: 3 SVM classifiers with 3 different kernels.
• Post-processing.
36
Oxford Method – Feature Channels
• Bag of visual words - SIFT descriptors are extracted and quantized into a vocabulary of 64 words.
• Dense words (PhowGray, PhowColor) - another set of SIFT descriptors, quantized into 300 visual words.
• Histogram of oriented edges (Phog180, Phog360) - similar to the HOG descriptor used by the "UoC/TTI" method, with 8 orientation bins.
• Self-similarity features (SSIM).
37
Oxford Method – Spatial Pyramids
38
Oxford Method – Feature Vector
Chart taken from the method's presentation
39
Oxford Method – Discriminant Function(1)
$C(h^R) = \sum_{i=1}^{M} y_i\, \alpha_i\, K(h^R, h_i)$

• $h_i,\ i = 1, \ldots, M$ are the histogram collections acting as support vectors of an SVM.
• $y_i \in \{-1, 1\}$.
• $K$ is a positive definite kernel.
• $h^R$ is the collection of normalized feature histograms $\{h^R_{fl}\}$, where $f$ is the feature channel and $l$ is the level of the spatial pyramid.
40
Oxford Method – Discriminant Function(2)
• The kernel of the discriminant function $C(h^R)$ is a linear combination of histogram kernels:

$K(h^R, h_i) = \sum_{f,l} d_{fl}\, K(h^R_{fl}, h_{i,fl})$

• The parameters $\alpha_i$ and the weights $d_{fl} \ge 0$ (18 in total) are learned using MKL (Multiple Kernel Learning).
• The discriminant function $C(h^R)$ is used to rank candidate regions R by the likelihood of containing an instance of the object of interest.
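A small sketch of evaluating a discriminant of this form with a learned linear combination of per-channel, per-level histogram kernels (my own code; the choice of a χ² base kernel and all names are illustrative, not taken from the authors).

```python
import numpy as np

def chi2_kernel(h, h_prime, eps=1e-10):
    """Chi-squared histogram similarity (one possible base kernel)."""
    return float(np.sum(2 * h * h_prime / (h + h_prime + eps)))

def combined_kernel(hists_a, hists_b, weights):
    """K(a, b) = sum_{f,l} d_fl * k(a_fl, b_fl); hists_* map (channel, level) -> histogram."""
    return sum(w * chi2_kernel(hists_a[key], hists_b[key]) for key, w in weights.items())

def discriminant(region_hists, support_vectors, alphas, labels, weights):
    """C(h^R) = sum_i y_i * alpha_i * K(h^R, h_i)."""
    return sum(y * a * combined_kernel(region_hists, sv, weights)
               for sv, a, y in zip(support_vectors, alphas, labels))
```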
41
Oxford Method – Cascade Solution(1)
• Exhaustive search over the candidate regions R requires a number of operations that is O(MBN):
o N - the number of regions.
o M - the number of support vectors in $C(h^R)$.
o B - the dimensionality of the histograms.
o Typically $N \approx 10^5$, $B \approx 10^4$, $M \approx 10^3$.
• To reduce this complexity, a cascade is applied.
• The first stage uses a "cheap" linear kernel to evaluate $C(h^R)$.
• The second uses a more expensive and more powerful quasi-linear kernel.
• The third uses the most powerful non-linear kernel.
• Each stage evaluates the discriminant function on a smaller number of candidate regions.
42
Oxford Method – Cascade Solution(2)
Type          Evaluation Complexity
Linear        O(N)
Quasi-linear  O(BN)
Non-linear    O(MBN)

Stage 1 - linear kernel.
Stage 2 - quasi-linear kernel.
Stage 3 - non-linear kernel.
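A sketch of the three-stage cascade (my own simplification; the stage scorers and keep-fractions are placeholders): each stage re-scores only the regions kept by the previous, cheaper stage.

```python
def cascade_detect(regions, stage_scorers, keep_fractions=(0.1, 0.1, 1.0)):
    """regions: list of candidate regions; stage_scorers: [linear, quasi_linear, non_linear]
    scoring functions, ordered from cheapest to most expensive."""
    candidates = list(regions)
    for scorer, keep in zip(stage_scorers, keep_fractions):
        scored = sorted(candidates, key=scorer, reverse=True)
        candidates = scored[:max(1, int(keep * len(scored)))]
    return candidates   # surviving regions, ranked by the final (non-linear) stage
```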
43
Oxford Method – Cascade Solution(3)
Chart taken from the method's presentation
44
Oxford Method – The Kernels
• All of the aforementioned kernels have the following form:

$K(h, h') = f\!\left( \sum_{b=1}^{B} g(h_b, h'_b) \right), \qquad f: \mathbb{R} \to \mathbb{R}, \quad g: \mathbb{R}^2 \to \mathbb{R}$

where b is a histogram bin index.
• For linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
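For illustration (my choice of concrete kernels, consistent with the f(Σ g) form above but not taken from the slides): a linear kernel, a quasi-linear χ² kernel, and a fully non-linear exponential-χ² kernel.

```python
import numpy as np

def linear_kernel(h, hp):
    # g(a, b) = a * b and f = identity: both linear.
    return float(np.dot(h, hp))

def chi2_kernel(h, hp, eps=1e-10):
    # g(a, b) = 2ab / (a + b) is non-linear, f = identity is linear: quasi-linear.
    return float(np.sum(2 * h * hp / (h + hp + eps)))

def exp_chi2_kernel(h, hp, gamma=1.0, eps=1e-10):
    # g(a, b) = (a - b)^2 / (a + b) and f(x) = exp(-gamma * x): both non-linear.
    return float(np.exp(-gamma * np.sum((h - hp) ** 2 / (h + hp + eps))))
```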
45
Oxford Method – Post-Processing
• The output of the last stage is a ranked list of 100 candidate regions per image.
• Many of these regions correspond to multiple detections of the same object.
• Non-maximum suppression is used (see the sketch below).
• At most 10 regions per image remain.
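A minimal sketch of greedy non-maximum suppression (a standard formulation, not necessarily the exact variant used by the authors).

```python
def nms(detections, iou_threshold=0.5, max_keep=10):
    """detections: list of (score, box) with box = (xmin, ymin, xmax, ymax).
    Greedily keep the highest-scoring boxes, suppressing heavily overlapping ones."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
        if len(kept) == max_keep:
            break
    return kept
```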
46
Oxford Method – Training/Retraining(1)
• Jittered/flipped instances are used as positive samples.
• Training images are partitioned into two subsets.
• The classifiers are tested on each subset in turn, adding new hard negative samples for retraining.

Figure: the retraining loop - training data → training → classifier → testing → errors (overlap < 20%) are added back to the training data.
47
Oxford Method – Results(1)
48
Oxford Method – Results(2)
49
Oxford Method – Results(3)
Training and testing on VOC2009.
Training and testing on VOC2007.
Training and testing on VOC2008.
Training on VOC2008 and testing on VOC2007.
50
Oxford Method – Summary
51
A Successful
Classification
Method
52
NEC/UIUC Method Overview
(Xi Zhou Kai Yu et al.)
• A winner of the 2009 Pascal VOC classification challenge.
• A framework for classification is proposed:
o Descriptor coding: Super-Vector Coding (the important part!).
o Spatial pyramid pooling.
o Classification: linear SVM.
53
NEC/UIUC Method – Notation
$X$ - descriptor vector.
$\phi(X)$ - coding function.
$f(X)$ - unknown function on local features.
$\hat{f}(X)$ - approximating function.
$Y$ - set of descriptor vectors.
54
NEC/UIUC Method – Descriptor Coding(1)
Vector Quantization (VQ) coding:

$\hat{f}(X) = W^T \phi(X), \qquad W = [W_1, W_2, \ldots, W_K]^T$

where $\phi(X)$ is the code of X.
55
NEC/UIUC Method – Descriptor Coding(2)
Super-Vector coding:

$W = [W_1^T, W_2^T, \ldots, W_K^T]^T$

$\phi(X) = [C_1(X) \cdot X^T,\, C_2(X) \cdot X^T,\, \ldots,\, C_K(X) \cdot X^T]^T$

where $C_k(X) = 1$ if X belongs to cluster k, and $C_k(X) = 0$ otherwise.

$\hat{f}(X) = W^T \phi(X) = \sum_k C_k(X)\, W_k^T X$
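A small sketch of super-vector coding of a single descriptor against a codebook of cluster centers (my own code, following the simplified form on this slide rather than the full formulation in the paper).

```python
import numpy as np

def super_vector_code(x, codebook):
    """x: descriptor of dimension d; codebook: (K, d) array of cluster centers.
    Returns the K*d super-vector: the descriptor fills the block of its nearest
    cluster (C_k(X) = 1), all other blocks stay zero."""
    d = x.shape[0]
    k = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    code = np.zeros(codebook.shape[0] * d)
    code[k * d:(k + 1) * d] = x
    return code
```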
56
NEC/UIUC Method – Spatial Pooling
• The same pooling is applied within each cell of a spatial pyramid with 1×1, 2×2, and 3×1 grids:

$\Psi(Y) = \frac{1}{N} \sum_{k=1}^{C} \frac{1}{p_k} \sum_{X \in Y_k} \phi(X)$

o N - the size of the set of local descriptors.
o Y - the set of local descriptors ($Y_k$ are those assigned to cluster k).

• The pooled vectors of all 8 pyramid cells are concatenated:

$\Psi_s(Y) = [\Psi(Y_{11}^1), \Psi(Y_{11}^2), \Psi(Y_{12}^2), \Psi(Y_{21}^2), \Psi(Y_{22}^2), \Psi(Y_{11}^3), \Psi(Y_{12}^3), \Psi(Y_{13}^3)]$

• $\Psi_s(Y)$ is fed to the linear SVM classifier.
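A sketch (my own) of pooling coded descriptors within one pyramid cell, implementing the formula above literally; taking $p_k$ as the fraction of descriptors assigned to cluster k is an assumption on my part.

```python
import numpy as np

def pool_cell(descriptors, codebook):
    """Pool super-vector codes of all descriptors falling in one pyramid cell.
    descriptors: (N, d) array; codebook: (K, d) array of cluster centers."""
    n, d = descriptors.shape
    k_total = codebook.shape[0]
    assignments = np.argmin(
        np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2), axis=1)
    pooled = np.zeros(k_total * d)
    for k in range(k_total):
        members = descriptors[assignments == k]
        if len(members) == 0:
            continue
        p_k = len(members) / n                  # assumed: fraction of descriptors in cluster k
        pooled[k * d:(k + 1) * d] = (1.0 / p_k) * members.sum(axis=0)
    return pooled / n
```

The image-level feature $\Psi_s(Y)$ would then be the concatenation of pool_cell outputs over the 8 pyramid cells.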
57
NEC/UIUC Method – Results(1)
Feature extraction: SIFT - 128-dimensional vectors over a grid with a spacing of 4 pixels at three patch sizes (16×16, 25×25 and 31×31); PCA - reduction of dimensionality to 80.

• Comparison of non-linear coding methods.
• Comparison with other methods.
• Impact of codebook size (tested on the validation set).
• Images and visualization of patch-level scores (using $g(X)$).
58
NEC/UIUC Method – Results(2)
|C|=512
59
NEC/UIUC Method – Results(3)
|C|=2048
60
NEC/UIUC Method – Results(4)
61
Bias in Datasets
Unbiased Look at Dataset Bias
Antonio Torralba
Massachusetts Institute of Technology
Alexei A. Efros
Carnegie Mellon University
62
Name The Dataset
• People were asked to guess, based on three images, which dataset the images were taken from.
• People who work in the field got more than 75% correct.
63
Name The Dataset - The Dataset Classifier
• 4 classifiers were trained to play the "Name The Dataset" game.
• Each classifier used a different image descriptor:
o 32×32 thumbnail, grayscale and color.
o Gist.
o Bag of HOG visual words.
• 1000 images were randomly sampled from the training portions of 12 datasets.
• The classifiers were tested on 300 random images from each of the test sets, repeated 20 times.
64
Name The Dataset - The Dataset Classifier
• The best classifier performs at 39% (chance is about 8%)!!!

Figures: recognition performance vs. number of training examples per class; confusion table.
65
Name The Dataset - The Dataset Classifier
• Performance is 61% on car images from 5 different
datasets (chance is 20%).
Car images from different datasets
66
Cross - Dataset Generalization(1)
• Training on one dataset while testing on another.
• Dalal & Triggs detector (HOG detector + linear SVM) for the detection task.
• Bag-of-words approach with a Gaussian-kernel SVM for the classification task.
• The "car" and "person" objects are used.
• Each classifier (for each dataset) was trained with 500 positive images and 2000 negative ones.
• Each detector (for each dataset) was trained with 100 positive images and 1000 negative ones.
• Classification was tested with 50 positive and 1000 negative examples.
• Detection was tested with 10 positive and 20,000 negative examples.
• Each classifier/detector was run 20 times and the results averaged.
67
Cross - Dataset Generalization(2)
68
Cross - Dataset Generalization(3)
Logarithmic dependence on the number of training samples.
69
Types Of Dataset Biases
• Selection bias.
• Capture bias.
• Label bias.
• Negative set bias - what the dataset considers to be "the rest of the world".
70
Negative Set Bias-Experiment(1)
• Evaluation of the relative bias in the negative sets of
different datasets.
• Training detectors on positives and negatives of a
single dataset.
• Testing on positives from the same dataset and on
negatives from all 6 datasets combined.
• The detector was trained with 100 positives and
1000 negatives.
• For testing, multiple runs with 10 positive examples against 20,000 negatives were performed.
71
Negative Set Bias-Experiment(2)
72
Negative Set Bias-Experiment(3)
• A large negative training set is important for discriminating objects with similar contexts in images.
73
Dataset’s Market Value(1)
• A measure of the improvement in performance when adding training data from another dataset.
• α is the shift in the number of training samples between different datasets needed to achieve the same average precision:

$AP_j^j(n) = AP_i^j(n / \alpha)$

where $AP_i^j(n)$ is the average precision obtained when training on dataset i (with n samples) and testing on dataset j.
74
Dataset’s Market Value(2)
This table shows the sample value ("market value") of a "car" sample across datasets.
A sample from another dataset is worth less than a sample from the original dataset!!!
75
Bias In Datasets- Summary
• Datasets, though gathered from the internet, have distinguishable features of their own.
• Methods performing well on a certain dataset can perform much worse on another.
• The negative set is at least as important as the positive samples in a dataset.
• Every dataset has its own "market value".
2010 Winners
Overview
77
Pascal VOC 2010-Winners
Classification Winner:
NUSPSL_KERNELREGFUSING
Qiang Chen (1), Zheng Song (1), Si Liu (1), Xiangyu Chen (1), Xiaotong Yuan (1), Tat-Seng Chua (1), Shuicheng Yan (1), Yang Hua (2), Zhongyang Huang (2), Shengmei Shen (2)
(1) National University of Singapore; (2) Panasonic Singapore Laboratories

Detection Winner:
NLPR_HOGLBP_MC_LCEGCHLC
Yinan Yu, Junge Zhang, Yongzhen Huang, Shuai Zheng, Weiqiang Ren, Chong Wang, Kaiqi Huang, Tieniu Tan
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Honourable Mentions:
MITUCLA_HIERARCHY
Long Zhu, Yuanhao Chen, William Freeman, Alan Yuille, Antonio Torralba
MIT, UCLA

NUS_HOGLBP_CTX_CLS_RESCORE_V2
Zheng Song, Qiang Chen, Shuicheng Yan
National University of Singapore

UVA_GROUPLOC/UVA_DETMONKEY
Jasper Uijlings, Koen van de Sande, Theo Gevers, Arnold Smeulders, Remko Scha
University of Amsterdam
78
NUS-SPL Classification Method
79
NLPR Detection Method
80
Thank You….
81