Multi-Scale Video Cropping
Hazem El-Alfy, David Jacobs, and Larry Davis
Department of Computer Science
University of Maryland, College Park
Sep 25th 2007, ACM MM ’07
Modern Surveillance Systems
Networks of surveillance cameras.
Control room:
  Fewer monitors than cameras.
  Far fewer operators than monitors.
  Cameras cycle through monitors.
Modern Surveillance Systems
Typical control rooms: airports, subways, metropolitan areas, seaports, crowd control.
“Future” Control Rooms


“Continuous” display wall versus a fixed set of discrete monitors.
Algorithms to control:
  where to display videos,
  how much area to assign to them,
  how to display them.
Barco Control Room, Vienna, Austria
Video Cropping
Munich Airport – Courtesy Siemens, NJ
Why Cropping?
Resize video to save bandwidth or to fit the display area.
Crop before resizing to focus operator attention on important areas.
Problem Definition
Determine trajectories of cropping windows through the video:
  variable-size window
  maximize captured saliency
  smooth trajectory
  occasional jumps (cuts) between trajectories.
[Figure: video volume with axes x, y, t]
Problem Definition




Each frame $t$ is covered by variable-size overlapping windows $W_{i,t}$.
Saliency measure: $S(W_{i,t})$.
Objective: $\arg\max_{Q} \sum_{t} S(W_{i,t})$, over all window sequences $Q$.
Subject to constraints for smooth window motion and size change.
[Figure: a frame with one of its windows labeled $W_{i,t}$]
Our Approach: Overview





Extract motion energy.
Model video as a graph.
Find trajectories as shortest paths in graph.
Merge trajectories.
Repeat for other segments of long videos.
[Diagram: Video Frames -> Extract Motion Energy -> Motion Frames -> Building Graph -> Shortest Path + Smoothing -> Trajectories -> Merging Trajectories -> Cropped Video; a Wiping Frames step loops back to Motion Frames]
Extracting Motion Energy


Motion energy as a saliency measure.
Frame differences are smoothed using morphological operations.
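The motion-energy step can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the threshold of 10 gray levels, the 3x3 neighbourhood, and the use of a morphological opening (erosion followed by dilation) are all assumptions, and the function names are hypothetical. Frames are plain nested lists of gray values.

```python
def frame_difference(prev, curr, threshold=10):
    """Binary motion mask: 1 where the absolute change exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(rp, rc)]
            for rp, rc in zip(prev, curr)]


def _morph(mask, op):
    """Apply op (min = erosion, max = dilation) over each 3x3 neighbourhood.

    Pixels outside the frame are treated as 0.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                    for yy in (y - 1, y, y + 1)
                    for xx in (x - 1, x, x + 1)]
            out[y][x] = op(vals)
    return out


def opening(mask):
    """Erosion then dilation: removes isolated noise pixels, keeps solid blobs."""
    return _morph(_morph(mask, min), max)


def motion_energy(prev, curr, threshold=10):
    """Assumed saliency map: thresholded frame difference cleaned by an opening."""
    return opening(frame_difference(prev, curr, threshold))
```

With these assumptions, a solid 3x3 moving blob survives the opening while a single flickering pixel is removed.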
Modeling Graph



Nodes: cropping windows in each frame.
Add dummy source and target nodes.
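Edges: allowable window changes (location and size) between consecutive frames.
[Figure: layered graph. A dummy source node connects with weight w=0 to the windows of the first frame; the windows of the i-th frame connect to those of the next frame; the windows of the last frame connect with weight w=0 to a dummy target node.]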
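The graph model can be sketched as follows. The window representation `(x, y, size)`, the limits `max_shift` and `max_resize`, and the `weight` callback are hypothetical names introduced for illustration; the slides only state that edges encode allowable changes in location and size, and that the dummy edges carry weight 0.

```python
def build_graph(windows_per_frame, weight, max_shift=1, max_resize=1):
    """Return an adjacency dict {node: [(neighbour, edge_weight), ...]}.

    windows_per_frame[t] is a list of (x, y, size) tuples for frame t.
    Nodes are (t, index) pairs plus the dummy "source" and "target".
    weight(t, window) gives the cost of an edge entering that window.
    """
    graph = {"source": [], "target": []}
    last = len(windows_per_frame) - 1
    for t, windows in enumerate(windows_per_frame):
        for i in range(len(windows)):
            graph[(t, i)] = []
            if t == 0:
                graph["source"].append(((0, i), 0.0))   # dummy edge, w = 0
            if t == last:
                graph[(t, i)].append(("target", 0.0))   # dummy edge, w = 0
    for t in range(last):
        for i, (x, y, s) in enumerate(windows_per_frame[t]):
            for j, (x2, y2, s2) in enumerate(windows_per_frame[t + 1]):
                # Keep only small changes in location and size (assumed limits).
                if (abs(x2 - x) <= max_shift and abs(y2 - y) <= max_shift
                        and abs(s2 - s) <= max_resize):
                    graph[(t, i)].append(
                        ((t + 1, j), weight(t + 1, (x2, y2, s2))))
    return graph
```

Because edges only connect consecutive frames, any source-to-target path selects exactly one window per frame.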
Modeling Graph

Multi-scale energy function for window $W$:
  $S(W)$: always favors large windows
  $E(W) = S(W)/A(W)$: favors small (dense) windows
  $E(W) = S(W_{in})/A(W_{in}) - S_{belt}/K$
Edge weight: $w_{ij} = 1 - E_{Norm}(W_j)$
[Figure: window $W$ with inner region $W_{in}$ and a surrounding belt ($S_{belt}$) divided into parts 1 to 4]
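A minimal sketch of the multi-scale energy and the edge weights follows. Several points are assumptions, since the slides do not define them: A(W) is read as the window area, the belt is taken to be a fixed-width border around the inner window, K is treated as a tuning constant, and the normalization is a min-max rescaling over the candidate windows. All function and parameter names are hypothetical.

```python
def region_sum(energy, x0, y0, x1, y1):
    """Sum of energy over columns x0..x1-1 and rows y0..y1-1, clipped to the frame."""
    h, w = len(energy), len(energy[0])
    x0, y0, x1, y1 = max(x0, 0), max(y0, 0), min(x1, w), min(y1, h)
    return sum(energy[y][x] for y in range(y0, y1) for x in range(x0, x1))


def window_energy(energy, x, y, width, height, belt=1, K=10.0):
    """E(W) = S(W_in)/A(W_in) - S_belt/K for a window lying inside the frame."""
    s_in = region_sum(energy, x, y, x + width, y + height)
    s_outer = region_sum(energy, x - belt, y - belt,
                         x + width + belt, y + height + belt)
    s_belt = s_outer - s_in          # saliency left just outside the window
    return s_in / (width * height) - s_belt / K


def edge_weights(energies):
    """w = 1 - E_norm, with E_norm an assumed min-max normalization to [0, 1]."""
    lo, hi = min(energies), max(energies)
    span = (hi - lo) or 1.0
    return [1.0 - (e - lo) / span for e in energies]
```

Under these assumptions a window that fits the activity tightly scores highest, a loose window is penalized by the density term, and a window that cuts through the activity is penalized by the belt term.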
Modeling Graph
Energy function computed for all windows in all frames.
Efficiently computed using integral images [Viola & Jones ’01]:
  $ii(x,y) = \sum_{x'<x,\; y'<y} i(x',y')$
  $E(W) = ii(x_3) - ii(x_2) - ii(x_4) + ii(x_1)$
[Figure: video frame containing a cropping window $W$ with corners $x_1$, $x_2$, $x_3$, $x_4$]
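The integral-image lookup can be sketched as below, using the strict-inequality convention from the slide, so the table has one extra row and column of zeros. The corner labeling is an assumption: the four lookups are taken as bottom-right (x3), top-right (x2), bottom-left (x4), and top-left (x1).

```python
def integral_image(img):
    """ii[y][x] = sum of img[y'][x'] for all x' < x and y' < y."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, x0, y0, x1, y1):
    """Sum over columns x0..x1-1 and rows y0..y1-1 using four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

After one pass over the frame to build the table, the sum inside any window costs four lookups regardless of window size, which is what makes evaluating every window in every frame affordable.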
Shortest Path
Dial’s implementation of Dijkstra’s algorithm: linear in the number of graph nodes.
Smoothing: low-pass filter + cubic Hermite interpolation.
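A minimal sketch of Dial's bucket-based variant of Dijkstra's algorithm follows. Dial's variant needs small non-negative integer weights, so the sketch assumes the real-valued edge weights have already been quantized (for example by rounding 100 times the weight); that quantization is an assumption, not something the slides state. The running time is O(E + V·C) for V nodes, E edges, and maximum weight C.

```python
def dial_shortest_path(graph, source, target, max_weight):
    """Shortest path with integer weights in [0, max_weight].

    graph: {node: [(neighbour, int_weight), ...]}.
    Returns (distance, path), or (None, []) if target is unreachable.
    """
    n_buckets = max_weight * len(graph) + 1   # upper bound on any distance + 1
    buckets = [[] for _ in range(n_buckets)]
    dist = {source: 0}
    parent = {}
    done = set()
    buckets[0].append(source)
    for d in range(n_buckets):
        bucket = buckets[d]
        while bucket:                 # zero-weight edges may refill this bucket
            node = bucket.pop()
            if node in done or dist[node] != d:
                continue              # stale entry left by an earlier relaxation
            done.add(node)
            for nxt, w in graph.get(node, []):
                nd = d + w
                if nd < dist.get(nxt, n_buckets):
                    dist[nxt] = nd
                    parent[nxt] = node
                    buckets[nd].append(nxt)
    if target not in dist:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return dist[target], path[::-1]
```

Buckets replace the priority queue: nodes are settled in order of increasing distance simply by scanning the bucket array from left to right.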
Merging Trajectories



More cropping windows are needed to capture simultaneous activity.
Wipe captured activity from motion frames and repeat the earlier process on the remaining motion.
Merge trajectories: find the shortest path through a graph of trajectories.
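The wiping step can be sketched as follows, assuming a trajectory is a list with one `(x, y, width, height)` window per frame and that wiping means setting the motion energy inside that window to zero. Both the representation and the function name are hypothetical.

```python
def wipe_trajectory(motion_frames, trajectory):
    """Return copies of the motion frames with each frame's window zeroed out.

    motion_frames[t] is a 2D list of motion energy for frame t.
    trajectory[t] = (x, y, width, height) is the window chosen for frame t.
    """
    wiped = []
    for frame, (x, y, w, h) in zip(motion_frames, trajectory):
        new = [row[:] for row in frame]          # leave the input untouched
        for yy in range(max(y, 0), min(y + h, len(new))):
            for xx in range(max(x, 0), min(x + w, len(new[0]))):
                new[yy][xx] = 0
        wiped.append(new)
    return wiped
```

Running the graph search again on the wiped frames then yields a second trajectory that follows activity the first one missed.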
Processing Long Videos

Problems:
  Graph gets too big if the video is long.
  Latencies must be short in surveillance systems.
Solution:
  Break long videos into segments with overlap.
  Process each segment, then stitch the results together.
[Figure: video timeline with two marked break points]
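Splitting a long video into overlapping segments can be sketched as below. The function name and the half-open `(start, end)` convention are assumptions made for illustration; how the per-segment results are stitched is not covered here.

```python
def split_into_segments(n_frames, segment_len, overlap):
    """Return (start, end) frame ranges with end exclusive.

    Consecutive ranges share `overlap` frames; the last range may be shorter.
    """
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, n_frames)
        segments.append((start, end))
        if end == n_frames:
            break
        start += step
    return segments
```

Each segment can then be processed as soon as its last frame arrives, which bounds both the graph size and the latency.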
Processing Long Videos

Issues:
  How short can segments be?
  Are there preferable locations to break the video?
  How much overlap is needed for smooth transitions?
We ran many experiments with a fixed-size cropping window:
  Shortest paths converge quickly. Segments can be as short as 40 frames.
  Avoid periods of low activity when breaking the video.
  Overlap intervals of 20 frames are sufficient.
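One possible way to avoid breaking in a quiet period is sketched below: look near the nominal break frame and pick the frame with the most activity. This heuristic, the per-frame `activity` list (for example total motion energy per frame), and the `radius` parameter are all assumptions; the slides only state that low-activity periods should be avoided.

```python
def choose_break(activity, nominal, radius):
    """Pick the frame with the highest activity within `radius` of `nominal`.

    Ties are resolved in favour of the earliest frame.
    """
    lo = max(nominal - radius, 0)
    hi = min(nominal + radius + 1, len(activity))
    return max(range(lo, hi), key=lambda t: activity[t])
```

A larger radius gives more freedom to find an active frame at the cost of less regular segment lengths.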
Results
Munich Airport: variable size single window.
Results
Munich Airport: video-in-video display.
Results
Traffic at a stop sign on campus (2 windows).
Contributions
Variable-size smooth cropping window.
Simultaneous multiple cropping windows.
Relatively short video segments are processed instead of the entire video (online).
Segment-based results are empirically shown to be identical to processing the largest video that can be processed as a whole.