
Ce Liu

Contour motion, motion estimation, and motion magnification

Bill Freeman, joint work with Ce Liu, Edward H. Adelson, Yair Weiss, Antonio Torralba, and Fredo Durand

Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

The sun never sets on Microsoft vision research: Beijing (116°E), Bangalore (77°E), Seattle (122°W), Cambridge, MA (71°W), Cambridge, UK (0°)

It's always lunchtime somewhere in Microsoft vision research.

Outline

• Motion magnification and exaggeration
• Motion of contours
• Human-assisted motion estimation, to find ground truth for difficult datasets

Motion Microscopy

[Video: original sequence vs. magnified sequence]

Seemingly simple examples where conventional motion analysis fails

[Examples: Kanizsa square; from real video]

Frame 1

Frame 2

Output from a state-of-the-art optical flow algorithm (dancer): flow field from T. Brox et al., High accuracy optical flow estimation based on a theory for warping, ECCV 2004.

Why you need to delay local decisions
• Corners: spurious junctions
• Lines: boundary ownership
• Flat regions: illusory boundaries

Frame 1

Frame 2

Output from a state-of-the-art optical flow algorithm (Kanizsa square): flow field from T. Brox et al., High accuracy optical flow estimation based on a theory for warping, ECCV 2004.

Challenge: Textureless Objects under Occlusion
• Corners are not always trustworthy (junctions)
• Flat regions do not always move smoothly (discontinuous at illusory boundaries)
• How about boundaries?
– Easy to detect and track for textureless objects

Analysis of Contour Motions

• Our approach: simultaneous grouping and motion analysis
– Multi-level contour representation
– Junctions are appropriately handled
– Formulate a graphical model that favors good contour and motion criteria
– Inference using importance sampling
• Contribution
– An important component in the motion analysis toolbox for textureless objects under occlusion

Three Levels of Contour Representation
– Edgelets: edge particles
– Boundary fragments: a chain of edgelets with small curvature
– Contours: a chain of boundary fragments

Forming boundary fragments is easy (for textureless objects); forming contours is hard (the focus of our work).

Overview of our system

1. Extract boundary fragments
2. Gather local evidence of contour motion
3. Boundary grouping and illusory boundary generation
4. Motion estimation based on the grouping
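To make the data flow between the four stages concrete, here is a minimal Python skeleton; every function name is a hypothetical placeholder, not the authors' code:

```python
# Hypothetical skeleton of the four-stage pipeline; the helper bodies
# are stubs standing in for the actual algorithms described below.

def extract_boundary_fragments(frame):
    """Stage 1: steerable-filter edge energy + spatial tracing (stub)."""
    return []

def gather_motion_evidence(fragments, frame1, frame2):
    """Stage 2: per-edgelet Gaussian motion likelihoods (stub)."""
    return {}

def group_fragments(fragments, evidence):
    """Stage 3: boundary grouping and illusory boundaries (stub)."""
    return []

def estimate_motion(grouping, evidence):
    """Stage 4: per-contour motion estimation given the grouping (stub)."""
    return {}

def analyze_contour_motion(frame1, frame2):
    fragments = extract_boundary_fragments(frame1)
    evidence = gather_motion_evidence(fragments, frame1, frame2)
    grouping = group_fragments(fragments, evidence)
    return grouping, estimate_motion(grouping, evidence)
```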

Boundary fragments and local evidence for motion

• Boundary fragment extraction in frame 1
– Steerable filters to obtain edge energy for each orientation band
– Spatially trace boundary fragments
– Boundary fragments: lines or curves with small curvature
• Temporal edgelet tracking with uncertainties
– Frame 1: edgelet (x, y, θ)
– Frame 2: orientation energy at θ
– A Gaussian pdf is fit, weighted by the orientation energy
– 1D uncertainty of motion (even for T-junctions)

Forming Contours: Boundary Fragment Grouping
• Grouping representation: switch variables (attached to every end of the fragments)
– Exclusive: one end connects to at most one other end
– Reversible: if end (i, t_i) connects to (j, t_j), then (j, t_j) connects to (i, t_i)
[Figure: possible connections and reversibility among fragments b1, b2, b3; two different legal contour groupings]

There are many candidate contour groupings (can you name some good grouping criteria?) [Figure: illusory boundaries corresponding to the groupings, generated by spline interpolation; motion stimulus]

Local spatio-temporal cues for grouping: (a) motion similarity. The grouping with higher motion similarity, i.e. smaller KL divergence between the velocity distributions of the two ends, is favored. [Figure: velocity space (v_x, v_y); motion stimulus]
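Because each edgelet's motion evidence is Gaussian (see the tracking step above), the similarity cue can be scored with the closed-form KL divergence between two Gaussians; a sketch in our notation:

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for 2D velocity Gaussians."""
    k = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)          # covariance mismatch
                  + diff @ inv1 @ diff           # mean mismatch
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```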

Local spatio-temporal cues for grouping: (b) minimal total curvature and length. The grouping with a smoother and shorter illusory boundary is favored. [Motion stimulus]

Local spatio-temporal cues for grouping: (c) contrast consistency. The grouping with consistent local contrast is favored. [Motion stimulus]

The Graphical Model for Grouping
• Affinity metric terms
– (a) Motion similarity
– (b) Curve smoothness
– (c) Contrast consistency
• The graphical model for grouping combines the affinities with two hard constraints: reversibility and no self-intersection
[Figure: the three affinity terms illustrated on fragment pairs b1, b2 and their endpoint measurements]
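Schematically, the model has this shape (our notation; the slide's exact potentials were lost in transcription):

```latex
% Posterior over switch variables S: pairwise affinities times
% two hard constraints (reversibility, no self-intersection).
P(S \mid \text{evidence}) \;\propto\;
    \prod_{(i,t_i) \leftrightarrow (j,t_j)}
        \phi_{\mathrm{motion}} \, \phi_{\mathrm{curve}} \, \phi_{\mathrm{contrast}}
    \;\cdot\; \mathbf{1}[S \text{ is reversible}]
    \;\cdot\; \mathbf{1}[\text{no self-intersection}]
```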

Motion estimation for grouped contours

• Gaussian MRF (GMRF) within a boundary fragment
• The motions of two end edgelets are similar if they are grouped together
• The graphical model of motion: joint Gaussian given the grouping

This has been solved before; see, for example: Y. Weiss, Interpreting images by propagating Bayesian beliefs, NIPS 1997.
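Since the model is jointly Gaussian given the grouping, the MAP motion is a linear solve. A small dense sketch (names, the smoothness weight, and the dense solver are our assumptions; a real implementation would use sparse matrices):

```python
import numpy as np

def solve_gmrf_motion(means, precisions, pairs, lam=1.0):
    """MAP edgelet motions under a Gaussian MRF via least squares.

    means:      (N, 2) per-edgelet motion means from the local evidence.
    precisions: (N, 2, 2) per-edgelet inverse covariances; the 1D
                uncertainty enters as low precision along the edge.
    pairs:      (i, j) edgelet pairs tied by a fragment or a grouped end.
    """
    n = len(means)
    A = np.zeros((2 * n, 2 * n))
    b = np.zeros(2 * n)
    for i in range(n):  # data terms: (v_i - m_i)^T P_i (v_i - m_i)
        A[2*i:2*i+2, 2*i:2*i+2] += precisions[i]
        b[2*i:2*i+2] += precisions[i] @ means[i]
    for i, j in pairs:  # smoothness terms: lam * ||v_i - v_j||^2
        for (p, q), s in (((i, i), 1), ((j, j), 1), ((i, j), -1), ((j, i), -1)):
            A[2*p:2*p+2, 2*q:2*q+2] += s * lam * np.eye(2)
    return np.linalg.solve(A, b).reshape(n, 2)
```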

Inference

• Two-step inference
– Grouping (switch variables)
– Motion based on the grouping (easy: least squares)
• Grouping: importance sampling to estimate the marginals of the switch variables
– Bidirectional proposal density
– Toss the sample if self-intersection is detected
• Obtain the optimal grouping from the marginals
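A sketch of that sampling loop; every callable here is an assumed placeholder for the corresponding component above:

```python
def estimate_switch_marginals(ends, propose, target_score,
                              self_intersects, num_samples=2000):
    """Importance sampling for switch-variable marginals (sketch).

    propose(ends) draws a complete grouping and returns (grouping, q),
    where q is the proposal probability of that draw; target_score
    returns the unnormalized model probability of a grouping.
    """
    totals, norm = {}, 0.0
    for _ in range(num_samples):
        grouping, q = propose(ends)
        if self_intersects(grouping):        # toss invalid samples
            continue
        w = target_score(grouping) / q       # importance weight
        norm += w
        for connection in grouping.items():  # accumulate per assignment
            totals[connection] = totals.get(connection, 0.0) + w
    return {c: v / norm for c, v in totals.items()}
```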

Example of Sampling
• A grouping with self-intersection (tossed)
• A valid grouping
• More valid groupings
[Motion stimulus shown in each sample]

From Affinity to Marginals
• Affinity metric of the switch variables (darker, thicker means larger affinity)
• Marginal distribution of the switch variables, estimated by sampling (darker, thicker means larger marginal)
• A greedy algorithm searches for the best grouping based on the marginals
[Motion stimulus shown]

Experiments

• All results are generated using the same parameter settings
• Running time depends on the number of boundary fragments, varying from ten seconds to a few minutes in MATLAB

Two Moving Bars
• Frame 1 and frame 2
• Extracted boundary fragments; the green circles are the boundary fragment end points
• Boundary grouping and illusory boundaries (frames 1 and 2); fragments belonging to the same contour are plotted in one color
• Optical flow from the Lucas-Kanade algorithm; flow vectors are plotted only at the edgelets
• Estimated motion by our system, after grouping

Kanizsa Square
• Frame 1 and frame 2
• Extracted boundary fragments
• Boundary grouping and illusory boundaries (frames 1 and 2)
• Optical flow from the Lucas-Kanade algorithm
• Estimated motion by our system, after grouping

Dancer
• Frame 1 and frame 2
• Extracted boundary fragments
• Boundary grouping and illusory boundaries (frames 1 and 2)
• Optical flow from the Lucas-Kanade algorithm
• Estimated motion by our system, after grouping
• Side by side: Lucas-Kanade flow field vs. our estimated motion

Rotating Chair
• Frame 1 and frame 2
• Extracted boundary fragments
• Boundary grouping and illusory boundaries (frames 1 and 2)
• Estimated flow field from Brox et al. vs. estimated motion by our system, after grouping

Analysis of Contour Motions, Liu et al., NIPS 2006
• A contour-based representation to estimate motion for textureless objects under occlusion
• Motion ambiguities are preserved and then resolved through appropriate contour grouping
• An important component in the motion analysis toolbox
• To be combined with classical motion estimation techniques to analyze complex scenes

Human-Assisted Motion Annotation

Ce Liu, William T. Freeman, Edward H. Adelson (CSAIL, MIT); Yair Weiss (The Hebrew University of Jerusalem)

How good is optical flow?

• The AAE (average angular error) race on the Yosemite sequence has run for over 15 years
[Figure: AAE improvement over time on the Yosemite sequence. # I. Austvoll, Lecture Notes in Computer Science, 2005; * Brox et al., ECCV 2004 (state-of-the-art optical flow)]

But when optical flow is applied to real-life videos…

[Figure: a sample sequence; state-of-the-art optical flow; flow visualization color map]
Optical flow is far from being solved:
– It often fails to capture occluding boundaries correctly
– The right choice of smoothness remains unclear

Motion ground-truth databases

• Synthetic sequences: Yosemite, [Roth & Black, IJCV 07]
– Do not exhibit the phenomena of real-life videos
– Mislead people into thinking motion estimation is solved
• Middlebury flow database [Baker et al., ICCV 07]
– Indoor scenes only
– Artificial motion
• We need ground-truth motion for real-life videos

Measuring motion for real-life videos

• Challenging because of occlusion, shadow, reflection, motion blur, sensor noise, and compression artifacts [video courtesy: Antonio Torralba]
• Accurately measuring motion also has great impact on scientific measurement and graphics applications
• Humans are experts at perceiving motion. Can we use human expertise to annotate motion?

How to annotate motion?

• How do computer vision researchers check whether a flow field is good enough when there is no ground truth?
– The two frames match after warping
– The discontinuities of the flow field coincide with object boundaries
• Optical flow can be much better if users specify
– Occluding boundaries and boundary ownership
– Smoothness of the flow
• Label every pixel in a video sequence?

Our solution: human-assisted layer motion annotation
• We use the video layer representation [Adelson 1990] as the interface for users to specify occluding boundaries, boundary ownership, and flow smoothness
• Decompose a video into layers, annotate motion for each layer, and compose the layer-wise motions into full-frame motion
• Our goal is not to achieve 100% accurate annotation, but annotation far better than any machine vision algorithm can reach

System overview

Humans do the easy job:
– Label contours, specify depth ordering, make corrections
– Specify flow parameters, choose the best flow field, label feature points

Computer does the tough job:
– Track contours and propagate corrections, yielding the layer-based video representation
– Compute optical flow, track feature points, fit parametric motion, and interpolate dense flow, yielding a per-layer flow field

Demo: interactive layer segmentation


Contour tracking model

• Objective function for contour tracking: a data term plus a smoothness term (the equation itself was lost in transcription)
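A generic instance of such an objective, in our notation (the slide's exact terms are not recoverable): landmarks x_i on the contour move by displacements u_i between frames.

```latex
E(\{u_i\}) \;=\;
    \underbrace{\sum_i \big(I_2(x_i + u_i) - I_1(x_i)\big)^2}_{\text{data term}}
  \;+\; \alpha \underbrace{\sum_i \|u_{i+1} - u_i\|^2}_{\text{smoothness term}}
```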

Occlusion handling

• Label a relative depth function (w.r.t. time) for each layer
• Iterative algorithm for tracking
– Detect occlusion (no data term for occluded landmarks)
– Coarse-to-fine optimization of the objective function using IRLS (iteratively reweighted least squares)
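IRLS turns a robust objective into a sequence of weighted least-squares solves. A generic sketch with an L1-like loss (an assumption; the authors' exact loss is not given here):

```python
import numpy as np

def irls(A, b, iters=20, eps=1e-6):
    """Approximately minimize sum_i |a_i . x - b_i| by IRLS."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # plain LS initialization
    for _ in range(iters):
        r = A @ x - b
        w = 1.0 / np.sqrt(r**2 + eps)          # robust reweighting
        Aw = A * w[:, None]                    # rows scaled by weights
        x = np.linalg.solve(A.T @ Aw, Aw.T @ b)
    return x
```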

Occlusion handling is crucial!

[Figure: tracking with the car in front of the biker vs. the car behind the biker]

Layer-wise optical flow estimation

• Symmetric optical flow estimation
– Energy terms from frame 1 to frame 2
– Objective function (controlled by two parameters; the symbols were lost in transcription)
[Figure: frames, masks, flow field]

Select the best flow field

• Balance between two criteria
– The smoothness of the flow field reflects object properties
– The two frames match when the second frame is warped to the first through the flow field
[Figure: smooth flow, poor match; discontinuous flow, good match; a good balance of the two criteria]
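A minimal way to score the two criteria for a candidate flow field, assuming grayscale float frames and a dense flow array of shape (H, W, 2); the functions are ours, not the tool's:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def match_error(frame1, frame2, flow):
    """Mean |I1(x) - I2(x + u)| after warping frame 2 to frame 1."""
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    warped = map_coordinates(frame2,
                             [ys + flow[..., 1], xs + flow[..., 0]],
                             order=1, mode='nearest')
    return np.abs(frame1 - warped).mean()

def smoothness(flow):
    """Mean flow-gradient magnitude; lower means smoother flow."""
    gy, gx = np.gradient(flow[..., 0])
    hy, hx = np.gradient(flow[..., 1])
    return np.sqrt(gx**2 + gy**2 + hx**2 + hy**2).mean()
```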

Demo: interactive motion labeling


Validation 1: labeling consistency

• Nine subjects annotated motion on the same sequence with the layering fixed. Their flow fields agreed, on average, to within about 0.1 pixel per frame.
[Figure: a frame; mean of the annotated flow fields; error distribution; flow visualization color map]

Validation 2: compare to existing ground truth
• Compared our method to the "ground truth" of [Baker et al. 2007]
• The average error is 0.1 pixel per frame
[Figure: the sequence; annotation from Baker et al.; annotation using our tool; layering (20 layers); difference; flow visualization color map]

Motion database of natural scenes

[Figure: example sequences with annotated flow. Color map from Bruhn et al., Lucas/Kanade meets Horn/Schunck: combining local and global optical flow methods, IJCV 2005]

Optical flow is far from being solved

[Figure: frame, ground-truth motion, and estimated optical flow for three sequences; AAE = 8.99°, 5.24°, and 1.94°]

Conclusions

• It is possible to annotate motion for real-life videos using the video layer representation
• We obtained a motion ground-truth database to
– Benchmark motion analysis algorithms
– Measure motion statistics
• The tool, code, and database are online: http://people.csail.mit.edu/celiu/motion/

end


Layer Representation

• Video is a composite of layers
• Layer segmentation assumes sufficient texture in each layer to represent its motion
• Some initial work: J. Wang & E. H. Adelson 1994; Y. Weiss & E. H. Adelson 1994 (achieved with the help of spatial segmentation)

Layer Representation

• Video is a composite of layers
• Layer segmentation assumes sufficient texture in each layer to represent its motion
• A true success? The layer representation is good, but existing layer segmentation, built on spatial segmentation, fails for textureless objects (J. Wang & E. H. Adelson 1994; Y. Weiss & E. H. Adelson 1994)

Why bidirectional proposal in sampling?
[Figure: four boundary fragments b1, b2, b3, b4]

Why bidirectional proposal in sampling?
Affinity metric of the switch variable (darker, thicker means larger affinity).

Normalized affinity metrics:
b1→b2: 0.39   b1→b3: 0.01   b1→b4: 0.60
b2→b1: 0.50   b2→b3: 0.45   b2→b4: 0.05
b3→b1: 0.01   b3→b2: 0.45   b3→b4: 0.54
b4→b1: 0.20   b4→b2: 0.05   b4→b3: 0.85

Bidirectional proposal:
b1↔b2: 0.1750   b1↔b3: 0.0001   b1↔b4: 0.1200

Bidirectional proposal

Use bidirectional proposals in sampling

b

1 Bidirectional proposal of the switch variable (darker, thicker means larger affinity)

b

2

b

3

b

4 b 1  b 2 : 0.39

b 1  b 3 : 0.01

b 1  b 4 : 0.60

b 4  b 1 : 0.20

b 4  b 2 : 0.05

b 4  b 3 : 0.85

b b b 2 2 2    b b b 1 3 4 : 0.50

: 0.45

: 0.05

Normalized affinity metrics b 3  b 1 : 0.01

b 3  b 2 : 0.45

b 3  b 4 : 0.54

b 1  b 2 : 0.62

b 1  b 3 : 0.00

b 1  b 4 : 0.38

Bidirectional proposal (Normalized)
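One natural reading of these numbers is that the bidirectional proposal multiplies the two directional affinities (e.g. b1↔b4: 0.60 × 0.20 = 0.12) and then renormalizes per end; the remaining slide values differ slightly, so the exact combination rule may differ. A sketch under that assumption:

```python
import numpy as np

def bidirectional_proposal(affinity):
    """Combine directional affinities into a sampling proposal.

    affinity[i, j] is the normalized affinity for end i choosing end j.
    Multiplying both directions suppresses one-sided matches such as
    b1 -> b4, which b4 does not reciprocate (it prefers b3).
    """
    bi = affinity * affinity.T                   # both ends must agree
    return bi / bi.sum(axis=1, keepdims=True)    # renormalize per end

# Toy numbers from the slide (rows/columns are b1..b4; diagonal unused).
A = np.array([[0.00, 0.39, 0.01, 0.60],
              [0.50, 0.00, 0.45, 0.05],
              [0.01, 0.45, 0.00, 0.54],
              [0.20, 0.05, 0.85, 0.00]])
print(bidirectional_proposal(A)[0])  # proposal for fragment b1's end
```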


Analysis of Contour Motions

NIPS 2006

Ce Liu William T. Freeman Edward H. Adelson

Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Visual Motion Analysis in Computer Vision
• Motion analysis is essential in
– Video processing
– Geometry reconstruction
– Object tracking, segmentation, and recognition
– Graphics applications

• Is motion analysis solved?

• Do we have good representation for motion analysis?

• Is it computationally feasible to infer the representation from the raw video data?
• What is a good representation for motion?

Ground-truth database approach for computer vision

When you can measure what you are speaking about and express it in numbers, you know something about it; …

--Lord Kelvin

• Ground-truth databases have significantly influenced the direction of computer vision
– Berkeley image segmentation database [Martin et al., ICCV 01]
– Middlebury stereo database [Scharstein & Szeliski, IJCV 02]
– PASCAL object recognition database [Leibe et al., 04]
– LabelMe online object annotation database [Russell et al., IJCV 08]

Optical Flow Representation

We need to explicitly consider these possible interpretations:
• Corners: spurious junctions
• Lines: boundary ownership
• Flat regions: illusory boundaries
