Object Size ↔ Camera Viewpoint
CV Reading Group
Putting Objects in Perspective
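Fujiyoshi Laboratory, 土屋成光
July 1, 2008
Background
Generic object recognition / image scene recognition
– Low resolution
– Variation in appearance
– Variation in size due to depth
⇒ Local recognition methods break down
Humans exploit the relationships between objects
– Model the 3D structure
– Make local recognition methods more accurate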
Putting Objects in Perspective
Derek Hoiem, Alexei A. Efros, Martial Hebert
Carnegie Mellon University Robotics Institute
CVPR2006
Understanding an Image
Today: Local and Independent
Detection results
Local Object Detection
False Detections
True Detection
Missed
Missed
True Detections
Local Detector: [Dalal-Triggs 2005]
Object Support
Surface Estimation
[Figure: image with estimated surface classes Support, Vertical, and Sky; vertical subclasses V-Left, V-Center, V-Right, V-Porous, V-Solid; annotations "Object Surface?" and "Support?"]
[Hoiem, Efros, Hebert ICCV 2005]
Software available online
Object Size in the Image
Image
World
Object Size ↔ Camera Viewpoint
Input Image
Loose Viewpoint Prior
Object Size ↔ Camera Viewpoint
Object Position/Sizes
Viewpoint
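The coupling between object size and viewpoint on this slide can be made concrete with a small sketch. It assumes the ground-plane approximation from the paper: the camera is roughly level, objects stand on the ground, and image rows are measured upward from the bottom of the image, so that world height ≈ image height × camera height / (horizon row − object bottom row). The function names and the numbers are illustrative, not taken from the slides.

```python
def world_height(image_height, bottom_v, horizon_v, camera_height):
    """Approximate world height of an object standing on the ground plane.

    image_height  : object height in the image (same units as the v coordinates)
    bottom_v      : row of the object's bottom, measured upward from the image bottom
    horizon_v     : row of the horizon, same convention
    camera_height : camera height above the ground (e.g. meters)
    """
    gap = horizon_v - bottom_v
    if gap <= 0:
        raise ValueError("object bottom must lie below the horizon")
    return image_height * camera_height / gap


def expected_image_height(world_h, bottom_v, horizon_v, camera_height):
    """Inverse relation: image height expected for an object of known world height."""
    return world_h * (horizon_v - bottom_v) / camera_height


# Hypothetical detection: 100 px tall, bottom at row 200, horizon at row 300,
# camera 1.7 m above the ground.
h = world_height(image_height=100, bottom_v=200, horizon_v=300, camera_height=1.7)
```

Read in one direction, a known viewpoint predicts how tall a pedestrian or car should appear at each image position; read in the other, confident detections of objects with known typical size constrain the horizon and camera height.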
Efficient from surface and viewpoint
Image
P(surfaces)
P(viewpoint)
P(object)
P(object | surfaces)
P(object | viewpoint)
Efficient from surface and viewpoint
Image
P(object)
P(surfaces)
P(viewpoint)
P(object | surfaces, viewpoint)
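One way to read the step from P(object | surfaces) and P(object | viewpoint) to P(object | surfaces, viewpoint) is sketched below. It assumes that the surface and viewpoint evidence are conditionally independent given the object, which gives P(o | s, v) ∝ P(o | s) P(o | v) / P(o). That independence assumption, the function name, and the example numbers are illustrative and are not stated on the slide.

```python
def combine_evidence(p_obj, p_obj_given_surfaces, p_obj_given_viewpoint):
    """P(object | surfaces, viewpoint) for a binary object-present variable.

    Assumes surfaces and viewpoint are conditionally independent given the object.
    All inputs must lie strictly between 0 and 1.
    """
    # Unnormalized posterior mass for "object present" and "object absent".
    present = p_obj_given_surfaces * p_obj_given_viewpoint / p_obj
    absent = ((1 - p_obj_given_surfaces) * (1 - p_obj_given_viewpoint)
              / (1 - p_obj))
    return present / (present + absent)


# Hypothetical numbers: a weak prior, with both cues favoring the object.
p = combine_evidence(p_obj=0.1, p_obj_given_surfaces=0.5, p_obj_given_viewpoint=0.5)
```

When both cues only repeat the prior, the combined probability stays at the prior; when both raise it, the combined probability rises above either one alone.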
Scene Parts Are All Interconnected
Objects
Camera Viewpoint
3D Surfaces
Input to Algorithm
Object Detection
Surface Estimates
Viewpoint Prior
Local Car Detector
Local Ped Detector
Local Detector: [Dalal-Triggs 2005]
Surfaces: [Hoiem-Efros-Hebert 2005]
Approximate Model
Objects
Viewpoint
3D Surfaces
Inference over Tree
[Figure: tree-structured model. Nodes: viewpoint θ; objects o1 ... on, each with local object evidence; local surfaces s1 ... sn, each with local surface evidence.]
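Because the model is a tree, the posteriors can be computed exactly by passing messages through the viewpoint node. The sketch below is a simplified illustration, not the authors' implementation: it discretizes the viewpoint into a few candidate values, treats each object as a binary variable, and leaves out the surface nodes for brevity. All names and numbers are hypothetical.

```python
from math import prod


def infer(viewpoint_prior, p_obj_given_view, evidence_lik):
    """Exact inference in a two-level tree: viewpoint -> objects -> local evidence.

    viewpoint_prior[k]     : P(theta_k)
    p_obj_given_view[i][k] : P(o_i = 1 | theta_k)
    evidence_lik[i]        : (L(e_i | o_i = 0), L(e_i | o_i = 1)), both positive
    Returns (viewpoint posterior, list of P(o_i = 1 | all evidence)).
    """
    n_views = len(viewpoint_prior)
    n_objs = len(evidence_lik)
    # Message from object i to the viewpoint:
    # sum over o_i of P(o_i | theta) * L(e_i | o_i).
    msg = [[(1 - p_obj_given_view[i][k]) * evidence_lik[i][0]
            + p_obj_given_view[i][k] * evidence_lik[i][1]
            for k in range(n_views)]
           for i in range(n_objs)]
    # Unnormalized viewpoint posterior: prior times all incoming messages.
    joint = [viewpoint_prior[k] * prod(msg[i][k] for i in range(n_objs))
             for k in range(n_views)]
    z = sum(joint)
    view_post = [j / z for j in joint]
    # Object posterior: replace object i's message by its "present" term only.
    obj_post = []
    for i in range(n_objs):
        num = sum(joint[k] / msg[i][k]
                  * p_obj_given_view[i][k] * evidence_lik[i][1]
                  for k in range(n_views))
        obj_post.append(num / z)
    return view_post, obj_post


# Two candidate viewpoints and two candidate objects (hypothetical values).
view_post, obj_post = infer(
    viewpoint_prior=[0.5, 0.5],
    p_obj_given_view=[[0.8, 0.1], [0.7, 0.1]],
    evidence_lik=[(0.2, 0.8), (0.5, 0.5)],
)
```

In this example the second object has uninformative local evidence, yet its posterior rises above its prior of 0.4 because the confident first detection shifts belief toward the viewpoint that is consistent with both objects.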
Viewpoint estimation
[Figure: Viewpoint Prior and Final Viewpoint likelihood plots, each with Height and Horizon axes]
Object Identities
Local detector
Surface Geometry
Probability map
Object detection
Car: TP / FP
Initial (Local)
Final (Global)
4 TP / 2 FP
4 TP / 1 FP
3 TP / 2 FP
4 TP / 0 FP
Ped: TP / FP
Car Detection
Ped Detection
Local Detector: [Dalal-Triggs 2005]
Experiments on LabelMe Dataset
Testing with LabelMe dataset: 422 images
– 923 Cars at least 14 pixels tall
– 720 Peds at least 36 pixels tall
Each piece of evidence improves performance
Car Detection
Pedestrian Detection
Local Detector from [Murphy-Torralba-Freeman 2003]
Can be used with any detector that outputs confidences
Car Detection
Pedestrian Detection
Local Detector: [Dalal-Triggs 2005]
(SVM-based)
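To plug an arbitrary detector into a probabilistic model, its raw confidence first has to be turned into a probability. A common choice is a logistic mapping, sketched below. The parameters `a` and `b` are hypothetical and would be fit on held-out detections; the slides do not give the calibration actually used.

```python
import math


def score_to_probability(score, a=1.0, b=0.0):
    """Map a raw detector confidence (e.g. an SVM margin) to a value in (0, 1).

    a, b are calibration parameters (assumed here, normally fit on validation data).
    """
    z = a * score + b
    # Numerically stable logistic function for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical SVM margins from a local detector.
probs = [score_to_probability(s, a=2.0, b=-1.0) for s in (-1.5, 0.5, 2.0)]
```

The mapping is monotonic, so the detector's ranking of candidates is preserved; only the scale changes.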
Accurate Horizon Estimation
Median Error:
– Horizon Prior: 8.5%
– [Murphy-Torralba-Freeman 2003]: 4.5%
– [Dalal-Triggs 2005]: 3.0%
90% Bound: [Figure]
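The two summary statistics on this slide can be computed as sketched below. This assumes that "Median Error" is the median absolute horizon error over the test images and that the "90% Bound" is the error that 90% of the images stay within; the error values are made up for illustration and are not data from the paper.

```python
from statistics import median


def horizon_error_summary(errors, percent=90):
    """Return (median absolute error, bound containing `percent`% of the errors)."""
    ordered = sorted(abs(e) for e in errors)
    n = len(ordered)
    if n == 0:
        raise ValueError("no errors given")
    # Nearest-rank percentile: integer ceiling of percent * n / 100.
    rank = max(1, -(-percent * n // 100))
    return median(ordered), ordered[rank - 1]


# Hypothetical signed per-image errors, as fractions of image height.
errors = [0.01, -0.02, 0.03, 0.04, -0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
med, bound = horizon_error_summary(errors)
```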
Qualitative Results
Car: TP / FP Ped: TP / FP
Initial: 2 TP / 3 FP
Final: 7 TP / 4 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
Qualitative Results
Car: TP / FP Ped: TP / FP
Initial: 1 TP / 14 FP
Final: 3 TP / 5 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
Qualitative Results
Car: TP / FP Ped: TP / FP
Initial: 1 TP / 23 FP
Final: 0 TP / 10 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
Qualitative Results
Car: TP / FP Ped: TP / FP
Initial: 0 TP / 6 FP
Final: 4 TP / 3 FP
Local Detector from [Murphy-Torralba-Freeman 2003]
Geometric Context
Estimate surface
ground: green, sky: blue, vertical: red, o: porous, x: solid
Geometric Cues
Color
Texture
Location
Perspective
Robust Spatial Support
RGB Pixels
Superpixels
[Felzenszwalb and Huttenlocher 2004]
oversegmentation
Multiple Segmentations
…
Superpixels
A single segmentation may contain segmentation errors
Segment the image with several different numbers of segments
Labeling Segments
…
…
Combine the results from each segmentation
Learn from training images
Homogeneity Likelihood
Label Likelihood
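The combination of the two likelihoods can be sketched as a weighted vote. For one pixel, each segmentation contributes the label likelihood of the segment containing that pixel, weighted by that segment's homogeneity likelihood. The data layout, the function name, and the numbers below are assumptions for illustration.

```python
def pixel_label_confidence(hypotheses):
    """Combine label estimates for one pixel across multiple segmentations.

    hypotheses: one (homogeneity_likelihood, {label: label_likelihood}) pair per
    segmentation, describing the segment that contains the pixel.
    Returns a normalized {label: confidence} dict.
    """
    scores = {}
    for homogeneity, label_lik in hypotheses:
        for label, lik in label_lik.items():
            # A segment that is probably mixed contributes little to the vote.
            scores[label] = scores.get(label, 0.0) + homogeneity * lik
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("all hypotheses have zero weight")
    return {label: s / total for label, s in scores.items()}


# Two segmentations: a likely-homogeneous segment that looks like ground,
# and a likely-mixed segment that looks vertical (hypothetical values).
conf = pixel_label_confidence([
    (0.9, {"ground": 0.8, "vertical": 0.1, "sky": 0.1}),
    (0.3, {"ground": 0.2, "vertical": 0.7, "sky": 0.1}),
])
best = max(conf, key=conf.get)
```

Weighting by homogeneity is what makes multiple segmentations robust: an error in any single segmentation is discounted when that segment is judged unlikely to carry a single label.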
Preparation
– Compute multiple segmentations
– Compute the label of each segment: ground, vertical, sky, or “mixed”
Density estimation with boosted decision trees
– 8 nodes per tree
– Logistic regression version of Adaboost [Collins and Schapire and Singer 2002]
Image Labeling
Labeled Segmentations
…
Learned from training images
Labeled Pixels
Summary & Future Work
[Figure: detected Ped, Ped, and Car; axes in meters]
Reasoning in 3D:
• Object to object
• Scene label
• Object segmentation
Conclusion
Image understanding is a 3D problem
– Must be solved jointly
This paper is a small step
– Much remains to be done
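CV Reading Group
Recovering Occlusion Boundaries from a Single Image,
Closing the Loop in Scene Interpretation
Fujiyoshi Laboratory, 土屋成光
August 26, 2008
Background
Generic object recognition / image scene recognition
– Low resolution
– Variation in appearance
– Variation in size due to depth
⇒ Local recognition methods break down
Humans exploit the relationships between objects
– Model the 3D structure
– Make local recognition methods more accurate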
Recovering Occlusion Boundaries from a Single Image
Derek Hoiem, Andrew N. Stein, Alexei A. Efros, Martial Hebert
Carnegie Mellon University Robotics Institute
ICCV’07
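Understanding occlusion from a single image
Occlusion and boundary understanding
– Essential when searching for objects
– Estimated from edge, region, and depth cues
Method overview
1. Segment the image into about a thousand regions
Watershed with Pb soft boundaries
2. Compute region, boundary, and 3D cues
depth: horizon + junction to ground
3. Compute boundaries
Conditional random field (CRF)
4. Segment further using the boundaries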
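The depth cue in step 2 (horizon plus the junction where a region meets the ground) follows from pinhole geometry. The sketch below assumes a level camera with known focal length and camera height, which the slides do not specify; without those values the same relation gives depth only up to scale. Names and numbers are hypothetical.

```python
def ground_contact_depth(contact_row, horizon_row, focal_length_px, camera_height):
    """Depth of a ground-contact point seen by a level pinhole camera.

    Rows are pixel rows increasing downward, so the contact point must have a
    larger row index than the horizon.
    """
    drop = contact_row - horizon_row
    if drop <= 0:
        raise ValueError("ground contact must be below the horizon")
    # A ground point at depth Z projects f * camera_height / Z pixels below
    # the horizon; invert that relation to recover Z.
    return focal_length_px * camera_height / drop


# Hypothetical values: 800 px focal length, camera 1.6 m high,
# horizon at row 240, region touching the ground at row 400.
depth = ground_contact_depth(400, 240, 800.0, 1.6)
```

Contact points closer to the horizon map to larger depths, which is why the ground junction of a region is informative about which of two neighboring regions is in front.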
Results
Boundary
Object popout
Closing the Loop in Scene Interpretation
Derek Hoiem, Alexei A. Efros, Martial Hebert
Carnegie Mellon University Robotics Institute
CVPR’08
Putting Objects in Perspective
Car: TP / FP
Initial (Local)
Final (Global)
4 TP / 2 FP
4 TP / 1 FP
3 TP / 2 FP
4 TP / 0 FP
Ped: TP / FP
Car Detection
Ped Detection
Local Detector: [Dalal-Triggs 2005]
Scene Parts Are All Interconnected
Objects
Camera Viewpoint
3D Surfaces
with Occlusions
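Generic object recognition framework
Putting Objects in Perspective
Scene structure recognition
Automatic Photo Pop-up
Use of occlusion and boundary information
Relationship model
All parts are mutually related
Application to Putting Objects
Higher accuracy by sharing information between the components
Car: Up, Ped: Down
The accuracy of boundaries around crowds is the problem
Initial: Dalal-Triggs, Iter 1: Hoiem et al., Final: This paper
Application to Photo Pop-up
Higher accuracy by using occlusion and object information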
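Summary
Computing occlusions/boundaries
– Computed from a single image using geometry, depth, and other cues
– Accurate segmentation
Using occlusions/boundaries
– Reduces errors caused by segmentation
– Useful for generic object recognition
Open issues:
– More accurate boundaries for crowds and similar scenes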