New Features and Insights for
Pedestrian Detection
Stefan Walk, Nikodem Majer, Konrad Schindler, Bernt Schiele
1
Outline
• Authors
• Abstract
• Main contributions
• Algorithms
• Experiments
• Conclusion
2
Authors (1/4)
• Stefan Walk
– Experience
• 2007-, PhD Candidate in Computer Science, Technische
Universität Darmstadt
• 2003-2007, Diploma in Physics, Technische Universität
Darmstadt, Germany
– Research interest
• People Detection
• Detecting from video data (utilizing motion information)
– Papers
• Multi-cue Onboard Pedestrian Detection (CVPR09)
3
Authors (2/4)
• Nikodem Majer
– Experience
• 2007-, PhD Candidate in Computer Science, Technische
Universität Darmstadt
– Research interest
• …
– Papers
• …
4
Authors (3/4)
• Konrad Schindler
– Experience
• 2009-: assistant professor, TU Darmstadt, Germany
• 2007-2008: post-doc, ETH Zurich
• 2004-2006: post-doc, Monash University,
Melbourne/Australia
• 2001-2003: research assistant, Graz University of Technology, Austria
– Research interest
• computer vision (3D scene analysis, biologically inspired vision,
tracking)
• image processing, pattern recognition, machine learning,
photogrammetry
– Papers
• PAMI10, CVPR10, ICCV10…
5
Authors (4/4)
• Bernt Schiele
– Experience
• 1999-2004, Assistant Professor, ETH Zurich, Switzerland
• 1997-2000, Postdoctoral Associate and Visiting Assistant Professor,
MIT, Cambridge, MA, USA
• 1994, Visiting researcher at CMU
• AE of PAMI, IJCV, AC of ECCV’08, CVPR’09, ICCV’09,
PC of ICCV 2011
– Research interest
• Perceptual computing, human-computer interfaces
– Papers
• …
6
Abstract (1/2)
• Despite impressive progress in people detection the
performance on challenging datasets like Caltech
Pedestrians or TUD-Brussels is still unsatisfactory
• In this work we show that motion features derived from
optic flow yield substantial improvements on image
sequences, if implemented correctly—even in the case of
low-quality video and consequently degraded flow fields
• Furthermore, we introduce a new feature, self-similarity
on color channels, which consistently improves
detection performance both for static images and for
video sequences, across different datasets. In combination
with HOG, these two features outperform the state-of-the-art by up to 20%.
8
Abstract (2/2)
• Finally, we report two insights concerning detector
evaluations, which apply to classifier-based object
detection in general
• First, we show that a commonly under-estimated detail of
training, the number of bootstrapping rounds, has a
drastic influence on the relative (and absolute)
performance of different feature/classifier combinations
• Second, we discuss important intricacies of detector
evaluation and show that current benchmarking
protocols lack crucial details, which can distort
evaluations
9
Main contributions
• First, we introduce a new feature based on self-similarity of low-level features, in particular color
histograms from different sub-regions within the detector
window
• The second main contribution is to establish a standard
for what pedestrian detection with a global descriptor can
achieve at present, including a number of recent advances
which we believe should be part of the “best practice”, but
have not yet been included in systematic evaluations
• Our third main contribution consists of two important insights
that apply not only to pedestrian detection, but more
generally to classifier-based object detection:
(1) Bootstrapping is very important. (2) The existing
evaluation protocol is insufficient
11
Outline
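• The paper follows the usual style of this lab's publications
– Evaluate different features and classifiers from current pedestrian detection
research on two challenging datasets (Caltech Pedestrians, TUD-Brussels) and
compare their strengths and weaknesses (better-performing algorithms receive more attention)
– Propose a new feature and conclude from experiments that "adding our feature
to existing methods further improves the performance of a pedestrian detection system"
• Related Features
– Haar-like: Viola and Jones, successfully applied to face detection in 2001
– HOG (Histogram of Oriented Gradients): Dalal, successfully applied to
pedestrian detection in 2005
– HOF (Histogram of Flow): proposed by Dalal in 2006 for pedestrian detection in video
– HOG-LBP: Xiaoyu Wang, applied to pedestrian detection in 2009, high performance
– CSS (Color Self-Similarity): proposed in this paper
• Related Classifiers
– SVM
– MPLBoost (Multiple Pose Boosting): proposed by Dollar in 2008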
13
Haar-like feature (1/2)
• Haar-like feature
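– The difference between the pixel sums of two rectangles arranged in a specific pattern inside the image
– Haar feature responses can be computed quickly with an integral image
• Variants of the Haar feature
– Rotated by 45, 22.5, 11.25 degrees…, but still restricted to rectangles
– Haar features over arbitrary polygonal regions (CVPR10)
[Figure: traditional Haar features]
[Figure: integral-image computation of Haar features]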
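As a minimal sketch of the integral-image trick mentioned on this slide, the following pure-Python example builds a summed-area table and evaluates one two-rectangle (left minus right) Haar-like feature. The function names and the zero-padded table layout are illustrative assumptions, not taken from the paper.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left half minus right half of a w x h window (w is assumed to be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


img = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 2))                  # 24
print(haar_two_rect_horizontal(ii, 0, 0, 4, 2))  # -16
```

Each rectangle sum costs four table lookups regardless of the rectangle size, which is what makes dense evaluation of Haar-like features cheap.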
14
Haar-like feature (2/2)
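• Haar features of arbitrary shape
– The pixel sum over an arbitrary polygonal region can be expressed as a combination of pixel sums over trapezoidal regions
– The pixel sum over a trapezoidal region equals the difference of the pixel sums over two right triangles
– The key step is computing the integral image for right-triangle regions, parameterized by (x, y, slope)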
15
HOG feature (1/1)
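• HOG feature: histogram of oriented gradients
– Apply gamma correction to the input image
– Compute the gradient magnitude and orientation at every pixel
– Weight the gradient magnitudes with a Gaussian and use trilinear
interpolation to build the orientation histogram of each cell
– Normalize the histograms of neighboring cells to obtain the final feature vector
[Figure: HOG feature computation pipeline]
[Figure: trilinear interpolation in HOG]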
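The per-cell part of the HOG pipeline can be sketched as follows. This is a simplified illustration under stated assumptions: gradients use central differences, votes are interpolated only between the two nearest orientation bins (the spatial part of the trilinear interpolation and the Gaussian weighting are omitted), and the bin count of 9 over 0 to 180 degrees is the common choice rather than something stated on the slide.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Unsigned-gradient orientation histogram of one grayscale cell.

    Simplification: each vote is split between the two nearest orientation
    bins only; spatial interpolation and Gaussian weighting are left out.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            pos = angle / bin_width - 0.5  # position relative to bin centers
            lo = math.floor(pos)
            frac = pos - lo
            hist[lo % n_bins] += mag * (1.0 - frac)
            hist[(lo + 1) % n_bins] += mag * frac
    return hist


def l2_normalize(vec, eps=1e-6):
    """L2 normalization with a small constant to avoid division by zero."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


# An 8x8 cell containing a vertical edge: all gradients point along x.
cell = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
hist = cell_orientation_histogram(cell)
print(l2_normalize(hist))
```

Because an angle of 0 degrees lies halfway between the centers of the first and last bins, the vote mass for this edge is split evenly between those two bins.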
16
HOF feature (1/1)
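• HOF feature: histogram of optical flow
– Compute the optical flow of the input image in the x and y directions (e.g. with the Lucas-Kanade (LK) algorithm)
– For specific pairs of regions, compute the magnitude and orientation of the flow
gradient from the x and y flow differences of corresponding pixels
– Build a histogram over flow-gradient orientation, weighted by flow-gradient magnitude
[Figure: original 3x3 IMHwd (Internal Motion Histogram, wavelet difference)]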
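To illustrate the idea of histogramming flow differences rather than raw flow, here is a deliberately simplified sketch. It assumes a precomputed flow field (`u`, `v` as nested lists), compares each pixel with one neighbor at a fixed offset, and uses hard binning over 8 directions; it is not the IMHwd scheme from the figure, and all names and parameters are illustrative.

```python
import math


def flow_difference_histogram(u, v, dx=1, dy=0, n_bins=8):
    """Histogram of the directions of flow differences between pixel pairs
    offset by (dx, dy), weighted by the magnitude of the difference."""
    h, w = len(u), len(u[0])
    hist = [0.0] * n_bins
    for y in range(h - dy):
        for x in range(w - dx):
            du = u[y + dy][x + dx] - u[y][x]
            dv = v[y + dy][x + dx] - v[y][x]
            mag = math.hypot(du, dv)
            if mag == 0:
                continue
            angle = math.atan2(dv, du) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
            hist[b] += mag
    return hist


v_zero = [[0, 0, 0, 0], [0, 0, 0, 0]]
u_uniform = [[3, 3, 3, 3], [3, 3, 3, 3]]   # pure translation, e.g. camera motion
u_boundary = [[0, 0, 2, 2], [0, 0, 2, 2]]  # motion boundary in the middle

print(flow_difference_histogram(u_uniform, v_zero))
print(flow_difference_histogram(u_boundary, v_zero))
```

A uniform translation produces an all-zero histogram, while a motion boundary produces a response, which is why differential flow is less sensitive to camera motion than raw flow.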
17
HOG-LBP (1/1)
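• HOG-LBP feature: concatenates HOG and LBP
– HOG: trilinear interpolation and Gaussian weighting are replaced by convolution
– LBP (Local Binary Pattern): binary pattern of a local neighborhood
– This feature achieved the best results reported so far on the INRIA Person dataset
[Figure: illustration of the LBP feature]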
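A basic 3x3 LBP code can be sketched as below. The neighbor ordering (clockwise from the top-left corner) and the `>=` comparison are common conventions assumed here for illustration; the HOG-LBP paper may use a different variant (for example uniform patterns on a circular neighborhood).

```python
def lbp_code(patch):
    """8-bit LBP code of the center pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of an image."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            patch = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
print(lbp_code(patch))  # 26: bits 1, 3 and 4 are set (neighbors 9, 7 and 8)
```

The histogram of these codes over a block is what gets concatenated with the HOG descriptor in a HOG-LBP style feature.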
18
CSS (1/1)
• CSS feature: color self-similarity
– For each 8x8 image block, compute a color histogram using trilinear interpolation
– We experimented with different color spaces, including 3x3x3
histograms in RGB, HSV, HLS and CIE Luv space, and 4x4
histograms in normalized rg, HS and uv, discarding the intensity
and only keeping the chrominance. Among these, HSV worked
best, and is used in the following
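– The similarities between these histograms form the feature vector. The authors tried the L1-norm,
L2-norm, chi-square distance and histogram intersection, and found that histogram intersection
performed best
– In the implementation, the 64x128 window is divided into 8x16=128 blocks of 8x8 pixels, giving 128
histograms and 128x127/2=8,128 pairwise similarities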
• Furthermore, second order image statistics, especially
co-occurrence histograms, are gaining popularity,
pushing feature spaces to extremely high dimensions
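The CSS construction described on this slide can be sketched as follows. This is a simplified illustration: histograms are 3x3x3 in HSV as on the slide, but use hard binning instead of trilinear interpolation, blocks are passed in as plain lists of RGB pixels, and the function names are assumptions made for the example.

```python
import colorsys
from itertools import combinations


def block_histogram(block_rgb, bins=3):
    """L1-normalized bins^3 HSV histogram of a list of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in block_rgb:
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        idx = [min(int(c * bins), bins - 1) for c in hsv]
        hist[(idx[0] * bins + idx[1]) * bins + idx[2]] += 1.0
    total = sum(hist)
    return [x / total for x in hist] if total else hist


def histogram_intersection(p, q):
    """Similarity in [0, 1] for two L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def css_feature(block_hists):
    """One similarity value per unordered pair of blocks."""
    return [histogram_intersection(p, q) for p, q in combinations(block_hists, 2)]


# 128 synthetic 8x8 blocks standing in for the 8x16 grid of a 64x128 window.
blocks = [[(2 * i, 0, 255 - 2 * i)] * 64 for i in range(128)]
feature = css_feature([block_histogram(b) for b in blocks])
print(len(feature))  # 8128 = 128 * 127 / 2
```

The feature depends only on how similar the blocks are to each other, not on the absolute colors, which is the property the slides credit for its better generalization across cameras and lighting.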
19
Classifiers
• SVMs
– Linear SVM
– Histogram Intersection Kernel SVM (HIKSVM)
• MPLBoost: Multiple Pose Boosting (In ECCV08 workshop)
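– The initial training samples are split into K subsets and K strong classifiers are trained
simultaneously; the output is the maximum of the K classifier responses
– During training, a sample keeps its weight only if it is misclassified by all strong classifiers;
otherwise its weight is decreased
– During detection, a scanning window is positive if any strong classifier labels it
positive, and negative only if all strong classifiers label it
negative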
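The detection-time rule of MPLBoost (take the maximum over the K strong classifiers) can be sketched like this. Training is not shown, and the weak learners are assumed to be decision stumps encoded as hypothetical `(feature_index, threshold, polarity, alpha)` tuples purely for illustration.

```python
def strong_score(x, weak_learners):
    """Weighted vote of decision stumps: sum of alpha * (+1 or -1)."""
    total = 0.0
    for index, threshold, polarity, alpha in weak_learners:
        vote = 1.0 if polarity * (x[index] - threshold) > 0 else -1.0
        total += alpha * vote
    return total


def mpl_score(x, strong_classifiers):
    """MPLBoost-style output: the maximum response over the K strong classifiers."""
    return max(strong_score(x, wl) for wl in strong_classifiers)


def mpl_predict(x, strong_classifiers, threshold=0.0):
    """Positive if any strong classifier fires, negative only if all reject."""
    return mpl_score(x, strong_classifiers) > threshold


# Two toy strong classifiers, each "specialized" on a different feature.
classifiers = [
    [(0, 0.5, +1, 1.0)],
    [(1, 0.5, +1, 1.0)],
]
print(mpl_predict([0.9, 0.1], classifiers))  # True: the first classifier fires
print(mpl_predict([0.1, 0.1], classifiers))  # False: both classifiers reject
```

Taking the maximum lets each strong classifier cover one cluster of appearances (for example one pose) without being penalized for rejecting the others.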
20
Evaluation protocol (1/4)
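• Shortcomings of the evaluation criteria for pedestrian detection systems
– Whether a detection window hits a pedestrian is currently decided by the VOC criterion:
intersection over union > 50%
– There is no clear rule for matching pedestrians and detection boxes in crowds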
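The overlap criterion can be written down in a few lines. The box format `(x1, y1, x2, y2)` with continuous coordinates is an assumption for this sketch; benchmark toolkits differ in details such as pixel-inclusive box sizes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_match(detection, annotation, min_iou=0.5):
    """VOC-style hit test: overlap must exceed 50%."""
    return iou(detection, annotation) > min_iou


gt = (0, 0, 10, 20)
print(is_match((2, 0, 12, 20), gt))  # True:  IoU = 160 / 240
print(is_match((5, 0, 15, 20), gt))  # False: IoU = 100 / 300
```

Note that the criterion only scores one detection against one annotation; it says nothing about which annotation a detection should be assigned to when several overlap, which is the crowd ambiguity raised above.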
21
Evaluation protocol (2/4)
• We split the set of annotations and detections into
considered and ignored sets
• Annotations can fall into the ignored set because of size,
position, occlusion level, aspect ratio or non-pedestrian
label in the Caltech setting
• Detections can fall into the ignored set because of size. E.g.
if we wish to evaluate on 50-pixel-or-taller, unoccluded
pedestrians, any annotation labeled as occluded and any
annotation or detection <50 pixels falls in the ignored set
22
Evaluation protocol (3/4)
• For considered detections
– If they match a considered annotation they count as true positive
– If they match no annotation, or only one that has already been
matched to another detection, they count as false positive
– If they match an ignored annotation they are discarded
• For ignored detections
– If an ignored detection matches an ignored annotation, it should be
discarded
– If an ignored detection matches no annotation, it seems reasonable
to discard it, but this may introduce a bias
– If an ignored detection matches a considered annotation, count it
as a true positive
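The rules on this slide can be turned into a small matching routine. This is one possible reading, not the authors' evaluation script: detections are matched greedily in order of decreasing score, an ignored annotation may absorb any number of detections, and an ignored detection that overlaps only an already-matched annotation is discarded. Those three choices, the tuple layouts and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(detections, annotations, min_iou=0.5):
    """Return (true_positives, false_positives, misses).

    detections:  list of (box, score, ignored)
    annotations: list of (box, ignored)
    """
    matched = [False] * len(annotations)
    tp = fp = 0
    for box, _score, det_ignored in sorted(detections, key=lambda d: -d[1]):
        overlaps = [i for i, (gt, _) in enumerate(annotations)
                    if iou(box, gt) > min_iou]
        free = [i for i in overlaps if not annotations[i][1] and not matched[i]]
        hits_ignored = any(annotations[i][1] for i in overlaps)
        if free:
            # Matches an unmatched considered annotation: true positive,
            # for considered and ignored detections alike.
            best = max(free, key=lambda i: iou(box, annotations[i][0]))
            matched[best] = True
            tp += 1
        elif hits_ignored or det_ignored:
            continue  # discarded
        else:
            fp += 1   # no match, or only an already-matched annotation
    misses = sum(1 for i, (_, ign) in enumerate(annotations)
                 if not ign and not matched[i])
    return tp, fp, misses


annotations = [((0, 0, 10, 20), False), ((100, 0, 110, 20), True)]
detections = [
    ((0, 0, 10, 20), 0.9, False),      # true positive
    ((1, 0, 11, 20), 0.8, False),      # duplicate of the first: false positive
    ((100, 0, 110, 20), 0.7, False),   # hits an ignored annotation: discarded
    ((200, 0, 210, 20), 0.6, False),   # matches nothing: false positive
    ((300, 0, 305, 10), 0.5, True),    # ignored detection, no match: discarded
]
print(evaluate(detections, annotations))  # (1, 2, 0)
```

Changing any of the assumed choices shifts the counts near the size or occlusion boundary, which is the kind of protocol detail the slides argue must be published along with the results.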
23
Evaluation protocol (4/4)
• To summarize, there is no single correct way how to
evaluate on a subset of annotations, and all choices have
undesirable side effects
• It is therefore imperative that published results are
accompanied by detections, and that evaluation scripts are
made public
• As there are boundary effects in almost any setting (all
realistic datasets have a minimum annotation size), it must
be possible for others to verify that differences are not
artifacts of the evaluation
24
Database
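• INRIA Person dataset
• Caltech Pedestrians dataset
– Introduced by Dollar in 2009
– Video sequences
– The training set contains 192k annotated pedestrians, the test set 155k
– Covers many difficult conditions: illumination, occlusion, small scale (pedestrians as small as 3 pixels tall), crowds…
– Very thorough annotation, which makes it easy to test different properties of a detector
• TUD-Brussels dataset
– Introduced by Wojek in 2009
– Video sequences
– Test set only, containing 1,326 pedestrians at various scales and viewpoints
• All experiments use aligned 64x128 training samples with a pedestrian size of 48x96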
26
Experiment1 – HOG-LBP (1/1)
[Figure: results on INRIA and TUD]
• However, while we were able to reproduce their good
results on INRIA Person, we could not gain anything
with LBPs on other datasets. They seem to be affected
when imaging conditions change (in our case, we suspect
demosaicing artifacts to be the issue)
27
Experiment2 – Color information (1/2)
[Figure: two result plots on TUD]
• More than 1 FPPI (false positive per image) is usually not acceptable in any practical
application
• Self-similarity of colors is more appropriate than using the
underlying color histograms directly as feature
• On the contrary, adding the color histogram values directly
even hurts the performance of HOG
28
Experiment2 – Color information (2/2)
• Why is CSS effective?
– Self-similarity encodes relevant parts like clothing and visible skin
regions
• Why does using color information directly show no
improvement?
– The training data was recorded with a different camera and in
different lighting conditions than the test data, so the weights
learned for color do not generalize from one to the other. (The same
reasoning applies to Haar features.)
29
Experiment3 – Bootstrap (1/2)
• With fewer than two bootstrapping rounds, performance
depends heavily on the initial training set
• At least two retraining rounds are required in the
HOG + linear SVM framework
• Using more initial negative samples alleviates this problem
but does not solve it
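The bootstrapping loop discussed here (train, collect false positives as hard negatives, retrain) can be sketched generically. In this toy version the "classifier" is a single threshold on 1-D samples and the negative pool is a list of numbers; in a real detector the pool would be windows scanned from pedestrian-free images and `train` would fit an SVM or a boosted classifier. All names and parameters are illustrative assumptions.

```python
import random


def bootstrap_train(positives, negative_pool, train, score,
                    rounds=2, n_initial=5, seed=0):
    """Train, then repeatedly add false positives from the pool and retrain."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(negative_pool)), n_initial))
    model = train(positives, [negative_pool[i] for i in sorted(chosen)])
    for _ in range(rounds):
        # Hard negatives: pool samples the current model scores as positive.
        hard = {i for i, x in enumerate(negative_pool)
                if i not in chosen and score(model, x) > 0}
        if not hard:
            break
        chosen |= hard
        model = train(positives, [negative_pool[i] for i in sorted(chosen)])
    return model, len(chosen)


def train_threshold(positives, negatives):
    """Toy trainer: threshold halfway between the classes."""
    return (min(positives) + max(negatives)) / 2.0


def score_threshold(model, x):
    """Toy scorer: positive when x lies above the threshold."""
    return x - model


positives = [50, 60, 70]
negative_pool = list(range(41))
model, n_used = bootstrap_train(positives, negative_pool,
                                train_threshold, score_threshold)
print(model, n_used)
```

With `rounds=0` the result depends entirely on which negatives the seed happens to select; the retraining rounds are what remove that dependence in this toy setting.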
30
Experiment3 – Bootstrap (2/2)
• For boosting classifiers (Fig. 3(c)), the situation is worse:
although mean performance seems stable over
bootstrapping rounds, the overall variance only decreases
slowly—the initial selection of negative samples has a
high influence on the final performance even after 3
bootstrapping rounds
31
Experiment4 – Seed & self-similarity (1/1)
[Figure: results on TUD]
• Self-similarity on HOG blocks shows little improvement
• It is important to make sure the result does not depend on
the initial selection of negative samples, e.g. by retraining
enough rounds with SVMs
32
Experiment5 – CalTech pedestrian (1/2)
33
Experiment5 – CalTech pedestrian (2/2)
• Color self-similarity is indeed complementary to gradient
information
• Motion information contributes greatly to pedestrian
detection. The reason that HOF works so well on the
“near” scale is probably that during multi-scale flow
estimation compression artifacts are less visible at
higher pyramid levels, so that the flow field is more
accurate for larger people
• The performance of all evaluated algorithms is abysmal
under heavy occlusion
34
Experiment6 – Haar feature (1/1)
[Figure: results on TUD]
• Judging from the available research our feeling is that
Haar features can potentially harm more than they
help
35
Conclusion
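• Main conclusions
– Motion information greatly improves pedestrian detection in video (HOF)
– Color self-similarity considerably improves detector performance (CSS)
– Bootstrapping plays a key role in training the detector
– Current evaluation protocols for object detection are inadequate…
• Secondary conclusions
– LBP is effective only on the INRIA dataset
– HOG + linear SVM needs at least 2 bootstrapping rounds
– Using Haar features to assist pedestrian detection may do more harm than good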
37
Thanks!!
38