Object Based Video Coding - A Multimedia Communication Perspective
Muhammad Hassan Khan
2004-03-0020

Overview
- Motivation for Video Coding
- Today’s Video Coding
- Problems with today’s video coding
- Desirable Features
- Solution to get desirable features
- Object Based Video Coding
- MPEG-4 Support
- Model Based Coding
- Major Problem: Segmentation
- Segmentation by Graph Cuts
- Architecture to incorporate this segmentation mechanism with the MPEG-4 bit stream

Why Video Coding?
- Consider a 1 minute video at 60 fps
- Number of frames = 60 x 60 = 3600
- Each color frame in the video is 640 x 480 pixels, at 3 bytes per pixel (one per color channel)
- The size of the raw video comes out to be 3600 x 640 x 480 x 3 = 3,317,760,000 bytes (about 26.5 Gbit)
- Of the order of gigabytes
- Nowadays one might say that memory is no big deal... BUT
- What if we want to transfer this file from one node to another over a network? Things would collapse very soon
- Just imagine if the video were 1 hour long rather than 1 minute!
- The need for video coding should now be obvious

Today’s Video Coding
- Pipeline stages: YUV (lossy) -> Motion -> DCT -> Quantize (lossy) -> Order -> Entropy
- Designed for natural scenes
- Higher-frequency DCT coefficients are quantized coarsely
- Sharp edges are not well preserved

Problems with Today’s Video Coding
- Poor performance in case of:
  - Anything with sharp edges
  - Highly textured regions
  - Text (channel logos)
- It is also debatable whether the bit stream produced by today’s coders is the optimal bit stream
- In fact, what constitutes an optimal bit stream is still an open question

Desired Features
- Better compression
- Improved quality
- Interactivity and manipulation of content
- Error resilience
- Processing of content in the compressed domain
- Identification and selective coding/decoding of the object of interest
- Facilitate search / indexing (MPEG-7)

Solution to Get Desirable Features
- MPEG-4 support for object based coding
- Rather than conventional block based coding for natural images, the scene should be divided hierarchically into objects
- The scene will now be described by the objects placed in
  a hierarchical manner
- A sample is presented in the next slide

The scene divided into objects (Hierarchical Description)
[Figure: an audiovisual presentation built from audiovisual objects (voice, sprite, 2D background, 3D objects) placed in a scene coordinate system (x, y, z); video and audio compositors render onto a projection plane for a hypothetical viewer with display, speaker, and user input; user events and control/data travel over hierarchically multiplexed upstream and downstream channels.]

The decoding process
[Figure: the decoding process.]

Meshed Video
- 2D mesh tessellates the video into patches
- Motion vector for each vertex
- Motivation
- Modeling (Motion and Shape)
[Figure: numbered facial feature points grouped by right eye, left eye, nose, teeth, mouth, and tongue, shown against X, Y, Z axes; the legend distinguishes feature points affected by FAPs from other feature points.]

Problems with Mesh Based Coding
- Works fine with previously known models, but caters for only a small class of objects
- Requires reliable tracking of features or control points along the video, e.g. FAPs (Facial Animation Parameters)
- A ready-made model is assumed: a 2D or 3D model of the object has to be known
- A more general approach was required

Object Based Video Coding
- Shape, Color, and Motion
- Requires a Major Step!
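To make the "shape, color, and motion" description concrete, here is a minimal sketch of how a compositor might rebuild a frame from separately described objects. Everything in it is an assumption made for illustration: the `VideoObject` class, the `compose` function, and the translation-only motion model are hypothetical and are not taken from the MPEG-4 specification or from this talk.

```python
from dataclasses import dataclass


@dataclass
class VideoObject:
    """Hypothetical object description: shape, color, and motion."""
    shape: list      # binary mask rows, 1 = pixel belongs to the object
    color: list      # texture values, same dimensions as the mask
    position: tuple  # (row, col) of the mask's top-left corner at t = 0
    motion: tuple    # (d_row, d_col) translation per frame (assumed model)


def compose(background, objects, t):
    """Paint each object over a copy of the background at frame index t."""
    frame = [row[:] for row in background]
    height, width = len(frame), len(frame[0])
    for obj in objects:
        top = obj.position[0] + t * obj.motion[0]
        left = obj.position[1] + t * obj.motion[1]
        for r, mask_row in enumerate(obj.shape):
            for c, inside in enumerate(mask_row):
                y, x = top + r, left + c
                # Only pixels inside the mask and inside the frame are drawn.
                if inside and 0 <= y < height and 0 <= x < width:
                    frame[y][x] = obj.color[r][c]
    return frame


if __name__ == "__main__":
    background = [[0] * 5 for _ in range(3)]
    logo = VideoObject(
        shape=[[1, 1], [0, 1]],
        color=[[7, 8], [9, 6]],
        position=(0, 0),
        motion=(0, 1),  # moves one pixel to the right per frame
    )
    for t in (0, 2):
        print(t, compose(background, [logo], t))
```

The point of the sketch is that the background is sent once and each object carries only its own mask, texture, and motion parameters, which is what makes per-object manipulation and selective decoding possible.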
Segmentation
- Dividing the scene into objects
- In the simplest form these objects can be foreground and background
- In more complex situations there can be multiple objects in the scene
- Segmentation is required to extract the objects

Computing Motion
- Object based motion
- Parameterized motion information

Segmentation by Graph Cuts
- Uses the max-flow min-cut algorithm from graph theory
- Divides the data into regions based on an energy function, usually applied to the intensities of the image
- A smoothness function is also used to make sure that the resulting segmentation is consistent
- Details will be provided in the final presentation

Architecture
- We will also propose a mechanism to assign motion information to the segmented objects
- Our approach will be as consistent as possible with the support provided by MPEG

References
- Gary J. Sullivan, Pankaj Topiwala, Ajay Luthra, SPIE Conference on Applications of Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard: H.264/AVC, August 2004.
- Gabriel Antunes Abrantes, Fernando Pereira, MPEG-4 Facial Animation Technology: Survey, Implementation and Results, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 2, March 1999.
- Roger H Clarke, Image and Video Compression: A Survey, Department of Computing and Electrical Engineering, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, Scotland.
- Noel Brady, MPEG-4 Standardized Methods for the Compression of Arbitrarily Shaped Video Objects, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 8, December 1999.
- Y. Boykov, O. Veksler, R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 11, November 2001, pp. 1222-1239.
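
Returning to the Segmentation by Graph Cuts slide, the following is a minimal sketch of a two-label (foreground/background) graph cut on a toy grayscale image. It is not the method promised for the final presentation. The data term (absolute difference from an assumed foreground or background mean intensity), the constant smoothness penalty between 4-connected neighbors, and the names `segment` and `max_flow_min_cut` are all assumptions chosen for illustration. Max-flow is computed with a plain Edmonds-Karp search, picked for brevity rather than speed.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, nodes on the source side)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: the reachable nodes form the source side of a min cut.
            return flow, set(parent)
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def segment(image, fg_mean, bg_mean, smoothness):
    """Label each pixel 1 (foreground) or 0 (background) by a minimum cut."""
    rows, cols = len(image), len(image[0])
    cap = {"s": {}, "t": {}}
    for r in range(rows):
        for c in range(cols):
            p = (r, c)
            cap.setdefault(p, {})
            # Cut s->p means p is background, so it carries the background cost.
            cap["s"][p] = abs(image[r][c] - bg_mean)
            # Cut p->t means p is foreground, so it carries the foreground cost.
            cap[p]["t"] = abs(image[r][c] - fg_mean)
            # Smoothness: neighbors with different labels pay a fixed penalty.
            for q in ((r + 1, c), (r, c + 1)):
                if q[0] < rows and q[1] < cols:
                    cap[p][q] = smoothness
                    cap.setdefault(q, {})[p] = smoothness
    energy, source_side = max_flow_min_cut(cap, "s", "t")
    labels = [[1 if (r, c) in source_side else 0 for c in range(cols)]
              for r in range(rows)]
    return labels, energy


TOY_IMAGE = [
    [10, 12, 200, 210],
    [11, 90, 205, 198],
    [9, 13, 190, 202],
]

if __name__ == "__main__":
    labels, energy = segment(TOY_IMAGE, fg_mean=200, bg_mean=10, smoothness=50)
    for row in labels:
        print(row)
    print("energy:", energy)
```

In the toy image the pixel with intensity 90 is ambiguous on intensity alone; the smoothness term is what keeps it with its dark neighbors, which is the consistency effect the slide refers to. The value returned as `energy` is the cost of the minimum cut, that is, the minimized sum of data and smoothness costs.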