Multi-Scale Video Cropping
Hazem El-Alfy, David Jacobs and Larry Davis
Department of Computer Science, University of Maryland, College Park
Sep 25th 2007, ACM MM ’07

Modern Surveillance Systems
- Networks of surveillance cameras.
- Control room: fewer monitors than cameras, and far fewer operators than monitors.
- Cameras cycle through the monitors.

Modern Surveillance Systems (cont.)
Typical control rooms: airports, subways, metropolitan areas, seaports, crowd control.

"Future" Control Rooms
- A "continuous" display wall instead of a fixed set of discrete monitors.
- Algorithms to control where to display videos, how much area to assign to them, and how to display them.
(Image: Barco control room, Vienna, Austria.)

Video Cropping
(Image: Munich Airport, courtesy Siemens, NJ.)

Why Cropping?
- Resize video to save bandwidth or to fit the display area.
- Crop before resizing to focus the operator's attention on important areas.

Problem Definition
Determine trajectories of cropping windows through the video with:
- a variable-size window,
- maximum captured saliency,
- a smooth trajectory,
- occasional jumps (cuts) between trajectories.
[Figure: cropping-window trajectories in the (x, y, t) video volume.]

Problem Definition (cont.)
- Each frame t is covered by overlapping windows $W_{i,t}$ of variable size.
- Saliency measure: $S(W_{i,t})$.
- Objective: $\arg\max_{Q} \sum_{t} S(W_{i,t})$ over all window sequences Q, where $W_{i,t}$ is the window that Q selects in frame t, subject to constraints for smooth window motion and size change.

Our Approach: Overview
- Extract motion energy.
- Model the video as a graph.
- Find trajectories as shortest paths in the graph.
- Merge trajectories.
- Repeat for the other segments of long videos.
[Figure: pipeline diagram with the stages video frames, extract motion energy, motion frames, building graph, shortest path + smoothing, trajectories, merging trajectories, cropped video, and wiping frames.]

Extracting Motion Energy
- Motion energy is used as the saliency measure.
- Frame differences are smoothed using morphological operations.

Modeling the Graph
- Nodes: the cropping windows in each frame, plus a dummy source node and a dummy target node.
- Edges: allowable window changes (location and size) between consecutive frames.
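The graph model above, combined with the edge weight $w_{ij} = 1 - E_{norm}(W_j)$ from the energy-function slide and the zero-weight source and target edges in the graph figure, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the window tuples and the `allowed` test with its `max_shift`/`max_resize` limits are hypothetical stand-ins for the "allowable window changes". Because every edge goes from frame t to frame t+1 the graph is acyclic, so the sketch swaps in a frame-by-frame dynamic program for the Dial's algorithm named on the shortest-path slide; both find a source-to-target path of the same minimum total weight.

```python
from typing import List, Sequence, Tuple

# Hypothetical window representation: (x, y, width, height) in pixels.
Window = Tuple[int, int, int, int]


def allowed(a: Window, b: Window, max_shift: int = 2, max_resize: int = 2) -> bool:
    """Hypothetical smoothness test: between consecutive frames a window may
    move by at most max_shift pixels and resize by at most max_resize pixels."""
    return (abs(a[0] - b[0]) <= max_shift and abs(a[1] - b[1]) <= max_shift
            and abs(a[2] - b[2]) <= max_resize and abs(a[3] - b[3]) <= max_resize)


def best_trajectory(windows: Sequence[Sequence[Window]],
                    energy: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the chosen window in every frame.

    windows[t][i] is window i of frame t and energy[t][i] its normalized
    energy in [0, 1]. An edge into window j of frame t costs 1 - energy[t][j];
    edges leaving the dummy source and entering the dummy target cost 0.
    """
    inf = float("inf")
    cost = [0.0] * len(windows[0])  # source -> first-frame windows, weight 0
    back: List[List[int]] = []      # back[t][j]: predecessor (in frame t) of window j of frame t + 1
    for t in range(len(windows) - 1):
        nxt = windows[t + 1]
        new_cost, prev = [inf] * len(nxt), [-1] * len(nxt)
        for j, wj in enumerate(nxt):
            edge = 1.0 - energy[t + 1][j]
            for i, wi in enumerate(windows[t]):
                # Relax edge i -> j only if the window change is allowable.
                if allowed(wi, wj) and cost[i] + edge < new_cost[j]:
                    new_cost[j], prev[j] = cost[i] + edge, i
        cost = new_cost
        back.append(prev)
    # Last-frame windows -> target have weight 0, so the path ends at the cheapest one.
    j = min(range(len(cost)), key=cost.__getitem__)
    if cost[j] == inf:
        raise ValueError("no window sequence satisfies the constraints")
    path = [j]
    for prev in reversed(back):  # follow predecessors back to the first frame
        j = prev[j]
        path.append(j)
    return path[::-1]


# Two candidate windows per frame, about 20 px apart, so no path can jump between them.
frames = [[(0, 0, 10, 10), (20, 0, 10, 10)],
          [(1, 0, 10, 10), (21, 0, 10, 10)],
          [(2, 0, 10, 10), (22, 0, 10, 10)]]
energy = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(best_trajectory(frames, energy))  # → [1, 1, 1]
```

In the example, the path through window 1 has total weight 0.8 + 0.1 = 0.9 against 0.2 + 0.9 = 1.1 for window 0. The first frame's energies do not count because the source edges have weight 0. The inner double loop compares every pair of windows in consecutive frames; a practical version would enumerate only the neighbors that the motion and size constraints admit.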
[Figure: layered graph running from a dummy source node through the windows of the first frame, the windows of the i-th frame and the windows of the last frame to a dummy target node; the edges leaving the source and entering the target have weight w = 0.]

Modeling the Graph (cont.)
Multi-scale energy function for a window W, where A(W) is the window area:
- $E(W) = S(W)$ always favors large windows.
- $E(W) = S(W)/A(W)$ favors small (dense) windows.
- Multi-scale: $E(W) = \frac{S(W_{in})}{A(W_{in})} - \frac{S_{belt}}{K}$, where $W_{in}$ is an inner window of W and $S_{belt}$ is the saliency in the belt between $W_{in}$ and the border of W.
- Edge weight: $w_{ij} = 1 - E_{norm}(W_j)$, where $E_{norm}(W_j)$ is the normalized energy of the destination window $W_j$.
[Figure: window W containing an inner window W_in surrounded by a belt carrying S_belt; corners numbered 1 to 4.]

Modeling the Graph (cont.)
- The energy function is computed for all windows in all frames.
- It is computed efficiently using integral images [Viola & Jones ’01]: $ii(x,y) = \sum_{x'<x,\, y'<y} i(x',y')$. For a cropping window W with corners $x_1, \dots, x_4$ ($x_1$ and $x_3$ opposite, $x_1$ nearest the origin), the window sum is $S(W) = ii(x_3) - ii(x_2) - ii(x_4) + ii(x_1)$: four lookups per window, regardless of its size.
[Figure: cropping window W inside a video frame, with corners x_1 to x_4.]

Shortest Path
- Dial's implementation of Dijkstra's algorithm: linear in the number of graph nodes.
- Smoothing: low-pass filter + cubic Hermite interpolation.

Merging Trajectories
- More cropping windows are needed to capture simultaneous activity.
- Wipe the captured activity from the motion frames and repeat the earlier process on the remaining motion.
- Merge trajectories: find a shortest path through a graph of trajectories.

Processing Long Videos
Problems:
- The graph gets too big if the video is long.
- Latencies must be short in surveillance systems.
Solution:
- Break long videos into overlapping segments.
- Process each segment, then stitch the results together.
[Figure: timeline of a long video with two "break here" marks.]

Processing Long Videos (cont.)
Issues:
- How short can segments be?
- Are there preferable locations to break the video?
- How much overlap is needed for smooth transitions?
Findings from many experiments with a fixed-size crop:
- Shortest paths converge quickly.
- Segments can be as short as 40 frames.
- Avoid periods of low activity when breaking the video.
- Overlap intervals of 20 frames are sufficient.

Results
Munich Airport: single variable-size window.

Results
Munich Airport: video-in-video display.

Results
Traffic at a stop sign on campus (2 windows).

Contributions
- Variable-size, smooth cropping window.
- Multiple simultaneous cropping windows.
- Processing relatively short video segments instead of the entire video (online).
  Empirically shown to give results identical to processing the largest video that can be processed as a whole.
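The integral-image evaluation from the energy-function slides can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: a motion frame is a list of rows of motion-energy values, the corner labels $x_2$ and $x_4$ are assigned to the top and left edges by assumption, and in the multi-scale energy the inner window $W_{in}$ (W shrunk by a hypothetical `margin`) and the parameter `k` standing in for K are illustrative choices, because the slides do not define them.

```python
from typing import List, Sequence

Grid = List[List[float]]


def integral_image(img: Sequence[Sequence[float]]) -> Grid:
    """ii[y][x] = sum of img[y'][x'] over all y' < y and x' < x.

    The result has one extra leading row and column of zeros, matching the
    strict inequalities in ii(x, y) on the slide (indexed [row][column]).
    """
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii: Grid, x: int, y: int, w: int, h: int) -> float:
    """S(W) for the window with top-left pixel (x, y), width w and height h."""
    x1 = ii[y][x]          # corner nearest the origin
    x2 = ii[y][x + w]      # corner along the top edge (assumed labeling)
    x3 = ii[y + h][x + w]  # opposite corner
    x4 = ii[y + h][x]      # corner along the left edge (assumed labeling)
    return x3 - x2 - x4 + x1


def density(ii: Grid, x: int, y: int, w: int, h: int) -> float:
    """E(W) = S(W) / A(W), the variant that favors small, dense windows."""
    return window_sum(ii, x, y, w, h) / (w * h)


def multiscale_energy(ii: Grid, x: int, y: int, w: int, h: int,
                      margin: int, k: float) -> float:
    """E(W) = S(W_in)/A(W_in) - S_belt/K, with hypothetical choices:
    W_in is W shrunk by `margin` pixels on every side and `k` stands in for K."""
    if w <= 2 * margin or h <= 2 * margin:
        raise ValueError("window too small for the chosen margin")
    in_w, in_h = w - 2 * margin, h - 2 * margin
    s_in = window_sum(ii, x + margin, y + margin, in_w, in_h)
    s_belt = window_sum(ii, x, y, w, h) - s_in  # saliency between W_in and W's border
    return s_in / (in_w * in_h) - s_belt / k


# Toy motion frame: a 2x2 blob of motion in the middle of a 4x4 frame.
motion = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
ii = integral_image(motion)
print(window_sum(ii, 0, 0, 4, 4), density(ii, 0, 0, 4, 4))  # → 4.0 0.25
print(multiscale_energy(ii, 0, 0, 4, 4, margin=1, k=12))   # → 1.0
print(multiscale_energy(ii, 0, 0, 3, 3, margin=1, k=12))   # → 0.75
```

In this toy frame, the 4x4 window, whose belt contains no motion, scores 1.0, while the 3x3 window, whose belt cuts through the blob, is penalized to 0.75. Since each window sum costs four lookups after one pass over the frame, scoring every candidate window in every frame stays cheap.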