Computer Vision: Motion - Carnegie Mellon University


Video Texture
© A.A. Efros
15-463: Computational Photography
Alexei Efros, CMU, Fall 2006
Weather Forecasting for Dummies™
Let’s predict weather:
• Given today’s weather only, we want to know tomorrow’s
• Suppose weather can only be {Sunny, Cloudy, Raining}
The “Weather Channel” algorithm:
• Over a long period of time, record:
– How often S followed by R
– How often S followed by S
– Etc.
• Compute percentages for each state:
– P(R|S), P(S|S), etc.
• Predict the state with highest probability!
• It’s a Markov Chain
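The counting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names `estimate_transitions` and `predict` and the observation string are made up for the example.

```python
from collections import Counter, defaultdict

def estimate_transitions(history):
    """Count how often each state follows each state, then normalise per row."""
    counts = defaultdict(Counter)
    for today, tomorrow in zip(history, history[1:]):
        counts[today][tomorrow] += 1
    return {
        today: {s: c / sum(followers.values()) for s, c in followers.items()}
        for today, followers in counts.items()
    }

def predict(transitions, today):
    """Return the most probable state for tomorrow given today's state."""
    followers = transitions[today]
    return max(followers, key=followers.get)

# Made-up observation record: S = Sunny, C = Cloudy, R = Raining.
history = list("SSSCRRSCCRSSSCR")
P = estimate_transitions(history)
print(P["S"])           # estimated P(next state | Sunny)
print(predict(P, "S"))  # most likely weather after a sunny day
```

Each row of `P` sums to 1, which is what makes it a valid set of transition probabilities.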
Markov Chain
$P = \begin{pmatrix} 0.3 & 0.6 & 0.1 \\ 0.4 & 0.3 & 0.3 \\ 0.2 & 0.4 & 0.4 \end{pmatrix}$
What if we know today’s and yesterday’s weather?
Text Synthesis
[Shannon,’48] proposed a way to generate
English-looking text using N-grams:
• Assume a generalized Markov model
• Use a large text to compute prob. distributions of
each letter given N-1 previous letters
• Starting from a seed repeatedly sample this Markov
chain to generate new letters
• Also works for whole words
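A word-level version of this sampling scheme might look like the sketch below. The names `build_model` and `generate` and the toy corpus are assumptions for illustration. Storing every observed follower in a list makes `rng.choice` sample in proportion to how often each word followed the context.

```python
import random
from collections import defaultdict

def build_model(words, n=2):
    """Map each (n-1)-word context to the words that followed it (n >= 2)."""
    model = defaultdict(list)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context].append(words[i + n - 1])
    return model

def generate(model, seed, length, rng):
    """Extend the seed (a tuple of n-1 words) by sampling one word at a time."""
    out = list(seed)
    k = len(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-k:]))
        if not followers:  # context never seen in the corpus: stop early
            break
        out.append(rng.choice(followers))
    return out

# Toy corpus, made up for the example.
corpus = "we need to eat cake and we need to bake bread and we eat bread".split()
model = build_model(corpus, n=2)
print(" ".join(generate(model, ("we",), 8, random.Random(0))))
```

Every adjacent word pair in the output occurs somewhere in the corpus, which is why the result looks locally plausible even when the whole sentence is nonsense.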
WE NEED TO EAT CAKE
Mark V. Shaney (Bell Labs)
Results (using alt.singles corpus):
• “As I've commented before, really relating to
someone involves standing next to
impossible.”
• “One morning I shot an elephant in my arms
and kissed him.”
• “I spent an interesting evening recently with
a grain of salt”
Video Textures
Arno Schödl
Richard Szeliski
David Salesin
Irfan Essa
Microsoft Research, Georgia Tech
Still photos
Video clips
Video textures
Problem statement
video clip
video texture
Our approach
• How do we find good transitions?
Finding good transitions
• Compute L2 distance $D_{i,j}$ between all pairs of frames
[Figure: frame i vs. frame j]
Similar frames make good transitions
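As a rough sketch of the distance computation, assuming each frame has been flattened to a list of pixel intensities (the function name `frame_distances` and the tiny four-pixel frames are made up for the example):

```python
import math

def frame_distances(frames):
    """Return D where D[i][j] is the L2 distance between frames i and j.

    Each frame is a flat list of pixel intensities, all of equal length.
    """
    n = len(frames)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[i], frames[j])))
            D[i][j] = D[j][i] = d  # the distance is symmetric
    return D

# Four made-up frames of four pixels each.
frames = [[0, 0, 0, 0], [3, 4, 0, 0], [0, 0, 0, 1], [3, 4, 0, 1]]
D = frame_distances(frames)
print(D[0][1])  # distance between frame 0 and frame 1
```

For real video the same computation would normally be vectorised with an array library, but the structure is the same: one distance per pair of frames.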
Markov chain representation
[Figure: Markov chain over frames 1 to 4]
Transition costs
• Transition from i to j if successor of i is similar to j
• Cost function: $C_{ij} = D_{i+1,\,j}$
[Figure: frames i, i+1 and j-1, j; the jump from i to j is scored by $D_{i+1,\,j}$]
Transition probabilities
• Probability for transition $P_{ij}$ inversely related to cost:
• $P_{ij} \propto \exp(-C_{ij} / \sigma^2)$
[Figure: transition probabilities for high σ and for low σ]
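The mapping from distances to probabilities can be sketched as follows, assuming the cost $C_{ij} = D_{i+1,j}$ from the previous slide and reading the scale parameter on this slide as σ. The function name and the toy distance matrix are made up. Rows are normalised so that each sums to 1.

```python
import math

def transition_probabilities(D, sigma):
    """Turn a frame-distance matrix D into row-normalised probabilities.

    Row i covers frames 0 .. n-2; the last frame has no successor, so it
    gets no row. P[i][j] is proportional to exp(-D[i+1][j] / sigma**2).
    """
    n = len(D)
    P = []
    for i in range(n - 1):
        weights = [math.exp(-D[i + 1][j] / sigma ** 2) for j in range(n)]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P

# Toy symmetric distance matrix for four frames.
D = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]
P_high = transition_probabilities(D, sigma=2.0)  # flatter distribution
P_low = transition_probabilities(D, sigma=0.5)   # concentrates on best jumps
print(P_high[0])
print(P_low[0])
```

A low σ keeps playback close to the original frame order, while a high σ allows more random jumps at the price of more visible discontinuities.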
Preserving dynamics
• Cost for transition ij
N -1
• Cij =  wk Di+k+1, j+k
k = -N
i-1
i
Di-1, j-2
Di, j-1
j-2
j-1
i j
i+1
i+2
Di+1, j
Di+2, j+1
j
j+1
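A minimal sketch of the windowed cost, assuming equal weights $w_k$ (the slide does not give their values) and marking transitions whose window would run past either end of the clip as infinitely expensive. The function name and the toy "pendulum" clip are made up.

```python
import math

def dynamics_cost(D, weights):
    """C[i][j] = sum over k = -N .. N-1 of w_k * D[i+k+1][j+k].

    `weights` has 2N entries, with weights[k + N] holding w_k. Transitions
    whose window would run off either end of the clip are set to infinity.
    """
    n = len(D)
    N = len(weights) // 2
    C = [[math.inf] * n for _ in range(n)]
    for i in range(N - 1, n - N):
        for j in range(N, n - N + 1):
            C[i][j] = sum(
                weights[k + N] * D[i + k + 1][j + k] for k in range(-N, N)
            )
    return C

# Toy "pendulum" clip: each frame is summarised by one position value.
x = [0, 1, 2, 1, 0, 1, 2, 1, 0]
D = [[abs(a - b) for b in x] for a in x]
C = dynamics_cost(D, weights=[0.5, 0.5])  # N = 1, equal weights (assumed)

print(D[1][3], C[0][3])  # frames 1 and 3 look alike, but the windowed cost is 1.0
print(D[1][5], C[0][5])  # frames 1 and 5 match in position and direction: cost 0.0
```

Frames 1 and 3 show the pendulum at the same position but moving in opposite directions. The single-frame distance cannot tell them apart, while the windowed cost can.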
Preserving dynamics – effect
• Cost for transition ij
N -1
• Cij =  wk Di+k+1, j+k
k = -N
Dead ends
• No good transition at the end of sequence
[Figure: transition diagram over frames 1 to 4]
Future cost
• Propagate future transition costs backward
• Iteratively compute new cost
• $F_{ij} = C_{ij} + \alpha \min_k F_{jk}$
[Figure: transition diagram over frames 1 to 4]
• Q-learning
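The backward propagation can be sketched as a fixed-point iteration. The discount factor, called `alpha` here, is assumed to lie in [0, 1), in which case the update is a contraction and the iteration converges. The function name and the four-frame cost matrix are made up, with the last frame acting as a dead end.

```python
def future_costs(C, alpha, tol=1e-9, max_iters=10_000):
    """Iterate F[i][j] = C[i][j] + alpha * min_k F[j][k] until it settles."""
    n = len(C)
    F = [row[:] for row in C]
    for _ in range(max_iters):
        best = [min(row) for row in F]  # min_k F[j][k] for every j
        new_F = [[C[i][j] + alpha * best[j] for j in range(n)] for i in range(n)]
        change = max(
            abs(new_F[i][j] - F[i][j]) for i in range(n) for j in range(n)
        )
        F = new_F
        if change < tol:
            break
    return F

# Made-up costs: the natural successor i -> i+1 is free, frame 1 has a
# cheap jump back to frame 0, and every transition out of frame 3 is bad.
C = [[5, 0, 5, 5],
     [1, 5, 0, 5],
     [5, 5, 5, 0],
     [10, 10, 10, 10]]
F = future_costs(C, alpha=0.5)

print(min(range(4), key=C[1].__getitem__))  # cheapest immediate move from frame 1
print(min(range(4), key=F[1].__getitem__))  # cheapest move once the future counts
```

With immediate costs alone, frame 1 continues to frame 2 and walks into the dead end. With future costs, the jump back to frame 0 becomes the better choice.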
Future cost – effect
Finding good loops
• Alternative to random transitions
• Precompute set of loops up front
Visual discontinuities
• Problem: Visible “Jumps”
Crossfading
• Solution: crossfade from one sequence to the other.
[Figure: crossfade from sequence A to sequence B: …, $A_{i-2}$, $\tfrac{3}{4}A_{i-1} + \tfrac{1}{4}B_{j-2}$, $\tfrac{2}{4}A_{i} + \tfrac{2}{4}B_{j-1}$, $\tfrac{1}{4}A_{i+1} + \tfrac{3}{4}B_{j}$, $B_{j+1}$, …]
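A linear crossfade over a few frame pairs might be sketched like this. The weighting scheme is inferred from the fractions 3/4, 2/4 and 1/4 in the slide's figure. The function name and the constant test frames are made up.

```python
def crossfade(a_frames, b_frames):
    """Blend two equally long runs of frames, fading from A to B.

    With m frame pairs, pair t (1-based) gets weight 1 - t/(m+1) on A and
    t/(m+1) on B, so three pairs put 3/4, 2/4, 1/4 on A.
    """
    m = len(a_frames)
    out = []
    for t, (a, b) in enumerate(zip(a_frames, b_frames), start=1):
        wb = t / (m + 1)
        out.append([(1 - wb) * pa + wb * pb for pa, pb in zip(a, b)])
    return out

# Three bright frames from A fading into three dark frames from B.
a_frames = [[100, 100]] * 3
b_frames = [[0, 0]] * 3
blended = crossfade(a_frames, b_frames)
print(blended)
```

Blending hides the jump but blurs fine detail wherever the two sequences disagree.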
Morphing
• Interpolation task: $\tfrac{2}{5}A + \tfrac{2}{5}B + \tfrac{1}{5}C$
• Compute correspondence
between pixels of all frames
• Interpolate pixel position and
color in morphed frame
• based on [Shum 2000]
Results – crossfading/morphing
Jump Cut
Crossfade
Morph
Crossfading
Frequent jump & crossfading
Video portrait
• Useful for web pages
Region-based analysis
• Divide video up into regions
• Generate a video texture for each region
Automatic region analysis
Video-based animation
• Like sprites in computer games
• Extract sprites
from real video
• Interactively control
desired motion
©1985 Nintendo of America Inc.
Video sprite extraction
blue screen matting
and velocity estimation
Video sprite control
• Augmented transition cost:
$C_{ij} = \alpha\, C_{ij} + \beta\, \mathrm{angle}$
– Similarity term: $\alpha\, C_{ij}$
– Control term: $\beta\, \mathrm{angle}$, the angle between the velocity vector and the vector to the mouse pointer
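One way to read the augmented cost is as a weighted sum of the visual similarity cost and the angle between the sprite's velocity and the direction to the mouse pointer. The sketch below assumes two weights, called `alpha` and `beta` here, and assumes that the velocity is that of the candidate frame. All names and values are illustrative.

```python
import math

def angle_between(v, w):
    """Unsigned angle in radians between 2D vectors v and w."""
    dot = v[0] * w[0] + v[1] * w[1]
    cross = v[0] * w[1] - v[1] * w[0]
    return abs(math.atan2(cross, dot))

def augmented_cost(similarity_cost, velocity, sprite_pos, mouse_pos, alpha, beta):
    """Similarity term plus a control term that penalises moving off-target."""
    to_mouse = (mouse_pos[0] - sprite_pos[0], mouse_pos[1] - sprite_pos[1])
    return alpha * similarity_cost + beta * angle_between(velocity, to_mouse)

# A sprite at the origin; the mouse pointer is to its right.
toward = augmented_cost(2.0, (1, 0), (0, 0), (5, 0), alpha=1.0, beta=1.0)
away = augmented_cost(2.0, (-1, 0), (0, 0), (5, 0), alpha=1.0, beta=1.0)
print(toward, away)  # heading away from the pointer costs more
```

Raising `beta` relative to `alpha` makes the sprite follow the pointer more eagerly, at the price of rougher transitions.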
Interactive fish
Lord of the Flies
Summary
• Video clips → video textures
• define Markov process
• preserve dynamics
• avoid dead-ends
• disguise visual discontinuities
Motion Analysis & Synthesis [Efros ’03]
• What are they doing?
– Activity recognition, surveillance, anti-terrorism
• Can we do the same?
– Motion retargeting, movies, video games, etc.
Gathering action data
• Low resolution, noisy data
• Moving camera
• Occlusions
Figure-centric Representation
• Stabilized spatio-temporal
volume
– No translation information
– All motion caused by person’s
limbs
• Good news: indifferent to camera
motion
• Bad news: hard!
• Good test to see if actions, not
just translation, are being
captured
Remembrance of Things Past
• “Explain” novel motion sequence with bits
and pieces of previously seen video clips
input sequence
motion analysis
run
swing
walk left
jog
walk right
motion synthesis
synthetic sequence
Challenge: how to compare motions?
How to describe motion?
• Appearance
– Not preserved across different clothing
• Gradients (spatial, temporal)
– Same problem (e.g. contrast reversal)
• Edges
– Too unreliable
• Optical flow
– Explicitly encodes motion
– Least affected by appearance
– …but too noisy
Motion Descriptor
Image frame → optical flow $F_{x,y}$ → $F_x, F_y$ → $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$
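The pipeline above can be sketched as follows, assuming the four channels are the positive and negative parts of each flow component (half-wave rectification). The 3×3 mean filter is only a stand-in for whatever blur the original work uses, and all function names are made up.

```python
def half_wave_rectify(channel):
    """Split a signed channel into non-negative positive and negative parts."""
    pos = [[max(v, 0.0) for v in row] for row in channel]
    neg = [[max(-v, 0.0) for v in row] for row in channel]
    return pos, neg

def box_blur(channel):
    """3x3 mean filter with edge clamping (a stand-in for the real blur)."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += channel[yy][xx]
            out[y][x] = total / 9.0
    return out

def motion_descriptor(flow_x, flow_y):
    """Return four blurred channels: Fx+, Fx-, Fy+, Fy-."""
    channels = [*half_wave_rectify(flow_x), *half_wave_rectify(flow_y)]
    return [box_blur(c) for c in channels]

# Tiny made-up flow field.
flow_x = [[1.0, -2.0], [0.0, 3.0]]
flow_y = [[0.0, 0.5], [-0.5, 0.0]]
descriptor = motion_descriptor(flow_x, flow_y)
print(len(descriptor))  # number of channels
```

Rectifying before blurring keeps opposite motions from cancelling each other out when the channels are smoothed.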
Comparing motion descriptors
[Figure: descriptors compared over time t: frame-to-frame similarity matrix, blurry I matrix, motion-to-motion similarity matrix]
Recognizing Tennis
• Red bars show classification results
“Do as I Do” Motion Synthesis
input sequence
synthetic sequence
• Matching two things:
– Motion similarity across sequences
– Appearance similarity within sequence
• Dynamic Programming
Smoothness for Synthesis
• $W_{act}$ is similarity between source and target frames
• $W_{app}$ is appearance similarity within target frames
• For every source frame $i$, find best target frame $\pi_i$ by maximizing the following cost function:
$\sum_{i=1}^{n} \alpha_{act}\, W_{act}(i, \pi_i) + \sum_{i=2}^{n} \alpha_{app}\, W_{app}(\pi_i, \pi_{i-1} + 1)$
• Optimize using dynamic programming
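A Viterbi-style sketch of this optimisation is shown below. It assumes the action similarity is given as a matrix indexed by (source frame, target frame) and the appearance similarity as a matrix over target frames. The smoothness term compares the chosen frame with the natural successor of the previous choice, and it is set to 0 when the previous choice is the last target frame (an assumption, since the slides do not cover that case). All names and numbers are illustrative.

```python
def best_target_sequence(W_act, W_app, a_act=1.0, a_app=1.0):
    """Pick one target frame per source frame, maximising the summed score."""
    n, m = len(W_act), len(W_act[0])

    def smooth(prev, cur):
        # Similarity between `cur` and the frame that naturally follows `prev`.
        return W_app[cur][prev + 1] if prev + 1 < m else 0.0

    score = [a_act * W_act[0][t] for t in range(m)]
    back = []
    for i in range(1, n):
        new_score, pointers = [], []
        for cur in range(m):
            prev = max(range(m), key=lambda p: score[p] + a_app * smooth(p, cur))
            pointers.append(prev)
            new_score.append(
                score[prev] + a_app * smooth(prev, cur) + a_act * W_act[i][cur]
            )
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final frame.
    cur = max(range(m), key=score.__getitem__)
    path = [cur]
    for pointers in reversed(back):
        cur = pointers[cur]
        path.append(cur)
    return path[::-1]

# Made-up similarities: three source frames, three target frames.
W_act = [[0.9, 0.1, 0.0],
         [0.0, 0.5, 0.6],
         [0.0, 0.1, 0.9]]
W_app = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
print(best_target_sequence(W_act, W_app, a_act=1.0, a_app=1.0))
print(best_target_sequence(W_act, W_app, a_act=1.0, a_app=0.0))
```

With the smoothness weight switched off, each source frame simply takes its best-matching target frame. With it switched on, the result prefers runs of consecutive target frames, which play back more smoothly.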
“Do as I Do”
Source Motion
Source Appearance
3400 Frames
Result
“Do as I Say” Synthesis
[Figure: action labels (run, walk left, swing, walk right, jog) → synthetic sequence]
• Synthesize given action labels
– e.g. video game control
“Do as I Say”
• Red box shows when constraint is applied
Application: Motion Retargeting
• Rendering new character into existing
footage
• Algorithm
– Track original character
– Find matches from new character
– Erase original character
– Render in new character
• Need to worry about occlusions
Actor Replacement
SHOW VIDEO