Multi-Scale Video Cropping
Hazem El-Alfy, David Jacobs
and Larry Davis
Department of Computer Science
University of Maryland, College Park
Sep 25th 2007, ACM MM ’07
Modern Surveillance Systems
Networks of surveillance cameras.
Control Room:
Fewer monitors than cameras.
Far fewer operators than monitors.
Cameras cycle through monitors.
Modern Surveillance Systems
Typical Control Rooms: airports, subways, metropolitan areas, seaports, crowd control.
“Future” Control Rooms
“Continuous” display wall versus a fixed set of discrete monitors.
Algorithms to control:
where to display videos,
how much area to assign to them,
how to display them.
Barco Control Room, Vienna, Austria
Video Cropping
Munich Airport – Courtesy Siemens, NJ
Why Cropping?
Resize video to save bandwidth or to fit the display area.
Crop before resizing to focus operator attention on important areas.
Problem Definition
Determine trajectories of cropping windows through the video:
variable-size window,
maximize captured saliency,
smooth trajectory,
occasional jumps (cuts) between trajectories.
[Figure: cropping-window trajectory through the video volume; axes x, y, t]
Problem Definition
Each frame t is covered by variable-size overlapping windows $W_{i,t}$.
Saliency measure $S(W_{i,t})$.
$\arg\max_Q \sum_t S(W_{i,t})$, over all window sequences $Q$,
subject to constraints for smooth window motion and size change.
[Figure: a cropping window $W_{i,t}$ in frame t]
Our Approach: Overview
Extract motion energy.
Model video as a graph.
Find trajectories as shortest paths in graph.
Merge trajectories.
Repeat for other segments of long videos.
[Pipeline diagram: Video Frames → Extract Motion Energy → Motion Frames → Building Graph → Shortest Path + Smoothing → Trajectories → Merging Trajectories → Cropped Video, plus a Wiping Frames step]
Extracting Motion Energy
Motion energy as a saliency measure.
Frame differences are smoothed using morphological operations.
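The step above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the slides do not say which morphological operations are used, so the opening followed by a closing, the 3x3 neighborhood, the `threshold` value, and all function names here are assumptions.

```python
def frame_difference(prev, curr, threshold=10):
    """Binary motion mask: 1 where the absolute intensity change exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def _morph(mask, op):
    """Apply op (max = dilation, min = erosion) over each clipped 3x3 neighborhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [mask[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(neigh)
    return out


def dilate(mask):
    return _morph(mask, max)


def erode(mask):
    return _morph(mask, min)


def motion_energy(prev, curr, threshold=10):
    """Frame difference, then opening (drops isolated specks) and closing (fills small holes)."""
    mask = frame_difference(prev, curr, threshold)
    opened = dilate(erode(mask))
    return erode(dilate(opened))
```

Applied to two consecutive grayscale frames given as lists of rows, `motion_energy` returns a binary map in which compact moving regions survive and single-pixel noise is removed.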
Modeling Graph
Nodes: cropping windows in each frame.
Add dummy source and target nodes.
Edges: allowable window changes (location and size) between consecutive frames.
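[Figure: layered graph. A dummy source node, the windows of the first frame, the windows of the i-th frame, the windows of the last frame, and a dummy target node; the edges at the dummy nodes are labeled w=0.]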
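A layered graph of this kind could be built as follows. The sketch assumes square windows placed on a coarse grid and simple bounds on how far a window may shift or resize between frames; `candidate_windows`, `allowed`, `build_graph`, and the parameters `stride`, `max_shift`, and `max_resize` are hypothetical names, since the slides do not give the exact constraints.

```python
def candidate_windows(frame_w, frame_h, sizes, stride):
    """All square windows (x, y, size) on a grid of the given stride that fit in the frame."""
    wins = []
    for s in sizes:
        for y in range(0, frame_h - s + 1, stride):
            for x in range(0, frame_w - s + 1, stride):
                wins.append((x, y, s))
    return wins


def allowed(a, b, max_shift, max_resize):
    """True if window b is a permitted successor of window a in the next frame."""
    return (abs(a[0] - b[0]) <= max_shift
            and abs(a[1] - b[1]) <= max_shift
            and abs(a[2] - b[2]) <= max_resize)


def build_graph(num_frames, windows, max_shift, max_resize):
    """Adjacency dict. Nodes are "source", "target", or (frame index, window index)."""
    graph = {"source": [(0, i) for i in range(len(windows))], "target": []}
    for t in range(num_frames):
        for i, a in enumerate(windows):
            if t == num_frames - 1:
                # Windows of the last frame connect only to the dummy target.
                graph[(t, i)] = ["target"]
            else:
                graph[(t, i)] = [(t + 1, j) for j, b in enumerate(windows)
                                 if allowed(a, b, max_shift, max_resize)]
    return graph
```

Because edges only go from frame t to frame t+1, the graph is acyclic and any source-to-target path selects exactly one window per frame.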
Modeling Graph
Multi-scale energy function for window W:
$S(W)$: always favors large windows.
$E(W) = S(W)/A(W)$: favors small (dense) windows.
$E(W) = S(W_{in})/A(W_{in}) - S_{belt}/K$
Edge weight: $w_{ij} = 1 - E_{Norm}(W_j)$
[Figure: inner window $W_{in}$ surrounded by a belt $S_{belt}$, with regions labeled 1 to 4]
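To make the third energy definition concrete, here is a small sketch that evaluates it on a toy saliency map. The belt width `belt`, the constant `K` (left as a free parameter because the slides do not define it), and the min-max normalization used for $E_{Norm}$ are all assumptions made for illustration.

```python
def region_sum(saliency, x, y, w, h):
    """Sum of saliency over the rectangle, clipped to the frame borders."""
    H, W = len(saliency), len(saliency[0])
    return sum(saliency[j][i]
               for j in range(max(0, y), min(H, y + h))
               for i in range(max(0, x), min(W, x + w)))


def energy(saliency, win, belt, K):
    """E(W) = S(W_in)/A(W_in) - S_belt/K for the window win = (x, y, w, h)."""
    x, y, w, h = win
    s_in = region_sum(saliency, x, y, w, h)
    s_outer = region_sum(saliency, x - belt, y - belt, w + 2 * belt, h + 2 * belt)
    s_belt = s_outer - s_in  # saliency left just outside the window
    return s_in / (w * h) - s_belt / K


def edge_weights(energies):
    """w = 1 - E_norm, with E_norm assumed to be a min-max normalization to [0, 1]."""
    lo, hi = min(energies), max(energies)
    span = (hi - lo) or 1.0
    return [1.0 - (e - lo) / span for e in energies]
```

On a map with a single 2x2 active block, a window that fits the block tightly scores higher than one that is too large (low density) or too small (saliency spills into the belt), which is the behavior the belt term appears intended to produce.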
Modeling Graph
Energy function computed for all windows in all frames.
Efficiently computed using integral images [Viola & Jones ’01]:
$ii(x,y) = \sum_{x'<x,\, y'<y} i(x',y')$
$E(W) = ii(x_3) - ii(x_2) - ii(x_4) + ii(x_1)$
[Figure: video frame containing cropping window W with corners x1, x2, x3, x4]
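The four-corner lookup can be illustrated with a short sketch. It follows the strict-inequality definition above by padding the table with one extra row and column of zeros; the function names and the (x, y, w, h) window convention are choices made for this example.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, x, y, w, h):
    """Sum inside the window from four corner lookups, independent of window size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
```

After one pass over the frame to build the table, every window costs four lookups, which is what makes evaluating all windows in all frames affordable.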
Shortest Path
Dial’s implementation of Dijkstra’s algorithm: linear in the number of graph nodes.
Smoothing: low-pass filter + cubic Hermite interpolation.
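Dial's variant replaces the priority queue with an array of buckets indexed by distance, which requires small non-negative integer edge weights. The sketch below covers only the shortest-path step and assumes the real-valued weights have already been quantized to integers in 0..`max_weight`; that quantization and the function name are assumptions, not something stated on the slides.

```python
def dial_shortest_path(graph, source, target, max_weight):
    """Bucket-based Dijkstra.

    graph maps each node to a list of (neighbor, weight) pairs, where weight is
    an integer in 0..max_weight. Returns (cost, path) or None if unreachable.
    """
    max_dist = max_weight * max(len(graph) - 1, 1)  # bound on any simple path
    buckets = [[] for _ in range(max_dist + 1)]
    dist = {source: 0}
    parent = {}
    buckets[0].append(source)
    for d in range(max_dist + 1):
        bucket = buckets[d]
        i = 0
        while i < len(bucket):  # zero-weight edges may grow the current bucket
            u = bucket[i]
            i += 1
            if dist.get(u) != d:
                continue  # stale entry: u was later reached more cheaply
            if u == target:
                path = [u]
                while path[-1] != source:
                    path.append(parent[path[-1]])
                return d, path[::-1]
            for v, w in graph[u]:
                nd = d + w
                if nd < dist.get(v, max_dist + 1):
                    dist[v] = nd
                    parent[v] = u
                    buckets[nd].append(v)
    return None
```

For example, real weights in [0, 1] could be mapped with `round(100 * w)` and `max_weight=100`; the coarser the quantization, the fewer buckets are needed.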
Merging Trajectories
More cropping windows needed to capture simultaneous activity.
Wipe captured activity from motion frames and repeat earlier process on remaining motion.
Merge trajectories: find shortest path through a graph of trajectories.
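The wipe-and-repeat loop can be sketched as below. The trajectory finder is passed in as a hypothetical callback `find_trajectory`, and the stopping rule (`max_windows`, `min_energy`) is an assumption; the final merging step over a graph of trajectories is not shown.

```python
def wipe(motion_frames, trajectory):
    """Copy the motion frames, zeroing energy inside each frame's window (x, y, w, h)."""
    wiped = []
    for frame, (x, y, w, h) in zip(motion_frames, trajectory):
        new = [row[:] for row in frame]
        for j in range(y, min(y + h, len(new))):
            for i in range(x, min(x + w, len(new[0]))):
                new[j][i] = 0
        wiped.append(new)
    return wiped


def extract_trajectories(motion_frames, find_trajectory, max_windows, min_energy):
    """Repeatedly find a trajectory, wipe what it captured, and search the remainder."""
    trajectories = []
    frames = motion_frames
    for _ in range(max_windows):
        remaining = sum(sum(map(sum, f)) for f in frames)
        if remaining < min_energy:
            break  # nothing salient left to capture
        traj = find_trajectory(frames)
        trajectories.append(traj)
        frames = wipe(frames, traj)
    return trajectories
```

Each pass sees only the motion that earlier windows missed, so simultaneous activity in separate parts of the frame ends up in separate trajectories.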
Processing Long Videos
Problems:
Graph gets too big if video is long.
Latencies must be short in surveillance systems.
Solution:
Break long videos into segments with overlap.
Process each segment, then stitch results together.
[Figure: video timeline with two break points marked "break here"]
Processing Long Videos
Issues:
How short can segments be?
Are there preferable locations to break video?
Overlap amount needed for smooth transitions?
We ran many experiments with a fixed-size crop window:
Shortest paths converge quickly. Segments can be as short as 40 frames.
Avoid periods of low activity when breaking video.
Overlap intervals of 20 frames are sufficient.
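Using the figures reported above (segments of 40 frames, overlaps of 20 frames) as defaults, the segment boundaries could be generated as in this sketch. The function name and the half-open `(start, end)` convention are assumptions, and it does not attempt to steer breaks away from low-activity periods.

```python
def segment_bounds(num_frames, segment_len=40, overlap=20):
    """Half-open (start, end) frame ranges; consecutive ranges share `overlap` frames."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be in [0, segment_len)")
    if num_frames <= 0:
        return []
    step = segment_len - overlap
    bounds = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        bounds.append((start, end))
        if end == num_frames:
            break
        start += step
    return bounds
```

Each segment can then be processed independently as soon as its frames are available, which keeps the graph size bounded regardless of video length.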
Results
Munich Airport: variable size single window.
Results
Munich Airport: video-in-video display.
Results
Traffic at a stop sign on campus (2 windows).
Contributions
Variable-size smooth cropping window.
Simultaneous multiple cropping windows.
Relatively short video segments processed vs. the entire video (online).
Results empirically shown to be identical to processing the largest video that can be processed as a whole.