Transcript Document

Object Based Video Coding - A
Multimedia Communication Perspective
Muhammad Hassan Khan
2004-03-0020
Overview
Motivation for Video Coding
Today’s Video Coding
Problems with today’s video coding
Desirable Features
Solution to get desirable features
 Object Based Video Coding
 MPEG-4 Support
 Model Based Coding
Major Problem: Segmentation
Segmentation by Graph Cuts
Architecture to incorporate this segmentation mechanism with the MPEG-4 bit stream
Why Video Coding?
Consider a 1-minute video at 60 fps
Number of frames = 60 x 60 = 3600
Suppose each color frame is 640 x 480 pixels, with 3 color components of 8 bits (1 byte) each
What is the size of the raw video?
 3600 x 640 x 480 x 3 = 3,317,760,000 bytes (about 26.5 billion bits)
 On the order of gigabytes (about 3.3 GB)
Nowadays one might say that storage is no big deal… BUT
What if we want to transfer this file from one node to another over a network?
 Transfer times would quickly become impractical
 Just imagine if the video were 1 hour long rather than 1 minute!
 I hope the need for video coding is now obvious
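The arithmetic above can be checked with a few lines of Python. This is a sketch that assumes 3 bytes per pixel (8 bits per color component) and a hypothetical 100 Mbit/s link with no protocol overhead; `raw_video_bytes` and `transfer_seconds` are illustrative names, not part of any library.

```python
def raw_video_bytes(seconds, fps, width, height, bytes_per_pixel=3):
    """Size of uncompressed video, assuming 3 color components of 1 byte each."""
    return seconds * fps * width * height * bytes_per_pixel


def transfer_seconds(num_bytes, link_bits_per_second):
    """Ideal transfer time over a link, ignoring protocol overhead."""
    return num_bytes * 8 / link_bits_per_second


one_minute = raw_video_bytes(60, 60, 640, 480)
one_hour = raw_video_bytes(3600, 60, 640, 480)
link = 100e6  # hypothetical 100 Mbit/s link, chosen only for illustration

print(f"1 minute: {one_minute:,} bytes, {transfer_seconds(one_minute, link):.0f} s to transfer")
print(f"1 hour:   {one_hour:,} bytes, {transfer_seconds(one_hour, link) / 3600:.1f} h to transfer")
```

At these settings the raw stream alone needs 640 x 480 x 3 x 8 x 60 = 442,368,000 bits per second, more than four times the assumed link rate.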
Today’s Video Coding
YUV conversion (lossy) -> Motion compensation -> DCT -> Quantization (lossy) -> Coefficient ordering -> Entropy coding
Designed for natural scenes
Higher-frequency DCT coefficients are quantized more coarsely
As a result, sharp edges are not well preserved
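The effect of losing high-frequency detail can be seen on a single 8 x 8 block. This is a minimal pure-Python sketch: it computes an orthonormal 2D DCT, zeroes every coefficient with u + v >= 4 (an extreme stand-in for coarse high-frequency quantization, not any standard's quantization matrix), and measures the reconstruction error for a smooth ramp and for a sharp vertical edge. The block contents, the cutoff, and all function names are assumptions chosen for illustration.

```python
import math

N = 8  # 8x8 blocks, as in common DCT-based coders


def _c(k):
    # Orthonormal DCT-II scaling factors
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


# COS[k][n] = cos((2n + 1) k pi / 2N), shared by the forward and inverse transforms
COS = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)] for k in range(N)]


def dct2(block):
    """2D orthonormal DCT-II of an N x N block indexed [row][col]; result indexed [v][u]."""
    return [[_c(v) * _c(u) * sum(block[y][x] * COS[v][y] * COS[u][x]
                                 for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct2(coef):
    """Inverse of dct2 (a 2D DCT-III with the same scaling)."""
    return [[sum(_c(v) * _c(u) * coef[v][u] * COS[v][y] * COS[u][x]
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]


def drop_high_freq(coef, keep):
    """Zero every coefficient with u + v >= keep (crude model of coarse quantization)."""
    return [[coef[v][u] if u + v < keep else 0.0 for u in range(N)] for v in range(N)]


def reconstruction_mse(block, keep):
    rec = idct2(drop_high_freq(dct2(block), keep))
    return sum((block[y][x] - rec[y][x]) ** 2 for y in range(N) for x in range(N)) / (N * N)


ramp = [[16 * x for x in range(N)] for _ in range(N)]                     # smooth gradient
edge = [[0 if x < N // 2 else 255 for x in range(N)] for _ in range(N)]  # sharp vertical edge

for name, block in (("ramp", ramp), ("edge", edge)):
    print(name, round(reconstruction_mse(block, keep=4), 2))
```

The ramp's energy sits almost entirely in the lowest-frequency coefficients, so it survives nearly intact, while the edge spreads significant energy into the discarded coefficients and comes back with a far larger error, visible as ringing around the edge.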
Problems with Today’s Video Coding
Poor performance in the case of
Anything with sharp edges
Highly textured regions
Text (e.g., channel logos)
It is also debatable whether the bit stream produced by today’s coders is optimal
In fact, what constitutes an optimal bit stream is itself still an open question
Desired Features
Better compression
Improved quality
Interactivity and Manipulation of Content
Error Resilience
Processing of content in the compressed domain
Identification and selective coding/decoding of the object of interest
Facilitate Search / Indexing (MPEG-7)
Solution to Get Desirable Features
MPEG-4
Support for Object Based Coding
Rather than conventional block-based coding for natural images
The scene should be divided hierarchically into objects
The scene is then described by these objects arranged in a hierarchy
A sample is presented in the next slide
[Figure: The scene divided into objects, its hierarchical description, and the decoding process. Audiovisual objects (voice, sprite, 2D background, 3D objects) are hierarchically multiplexed with downstream and upstream control/data; the scene coordinate system (x, y, z) is projected onto a projection plane for a hypothetical viewer; video and audio compositors drive the display and speaker, and user input returns as user events.]
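The hierarchical description can be pictured as a tree of objects. The sketch below is a hypothetical Python structure, not MPEG-4's actual BIFS scene-description syntax; the node names follow the figure, while the `kind` labels and the grouping of the 3D object and voice under a "person" node are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class SceneNode:
    """One node of a hypothetical scene description: a group or a media object."""
    name: str
    kind: str  # "group", "video" or "audio" (labels chosen for this sketch)
    children: List["SceneNode"] = field(default_factory=list)


def walk(node: SceneNode, depth: int = 0) -> Iterator[Tuple[int, SceneNode]]:
    """Depth-first traversal yielding (depth, node), e.g. the order a compositor might visit objects."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def select(node: SceneNode, wanted: Set[str]) -> List[SceneNode]:
    """Keep only the objects whose names are in `wanted` (selective decoding of objects of interest)."""
    return [n for _, n in walk(node) if n.name in wanted]


scene = SceneNode("scene", "group", [
    SceneNode("2D background", "video"),
    SceneNode("sprite", "video"),
    SceneNode("person", "group", [  # hypothetical grouping node
        SceneNode("3D object", "video"),
        SceneNode("voice", "audio"),
    ]),
])

for depth, node in walk(scene):
    print("  " * depth + f"{node.name} ({node.kind})")
```

Because each object is a separate node, a decoder organized this way could decode only the objects of interest, one of the desired features listed earlier.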
Meshed Video
2D mesh tessellates the video into patches
Motion vector for each vertex
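With one motion vector per vertex, the motion of any pixel inside a patch has to be interpolated. A minimal sketch, assuming triangular patches and an affine (piecewise-linear) motion field, which is what barycentric interpolation of the three vertex vectors produces; the function names and numbers are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Barycentric coordinates of p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle")
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return l1, l2, 1.0 - l1 - l2


def patch_motion(p: Point, verts: Sequence[Point], mvs: Sequence[Point]) -> Point:
    """Motion of p inside a triangular patch, interpolated from the three vertex motion vectors.

    Barycentric weights are affine in p, so the resulting motion field is affine on the patch.
    """
    weights = barycentric(p, *verts)
    dx = sum(w * mv[0] for w, mv in zip(weights, mvs))
    dy = sum(w * mv[1] for w, mv in zip(weights, mvs))
    return dx, dy


tri = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
mvs = ((1.0, 0.0), (3.0, 0.0), (1.0, 2.0))  # one motion vector per vertex (hypothetical values)
print(patch_motion((8 / 3, 8 / 3), tri, mvs))  # centroid: average of the three vectors
```

Only the vertex vectors need to be coded; every pixel in the patch inherits its motion from them.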
Motivation
Modeling (Motion and Shape)
[Figure: MPEG-4 facial feature points on the head, eyes, nose, mouth, teeth and tongue, distinguishing the feature points affected by FAPs from other feature points]
Problems with Mesh Based Coding
Works well with previously known models but caters only to a small class of objects
Requires reliable tracking of features or control points throughout the video
 E.g., FAPs (Facial Animation Parameters)
A ready-made model is assumed: a 2D or 3D model of the object has to be known in advance
A more general approach was required

Object Based Video Coding

Shape, Color, and Motion
Requires a Major Step!

Segmentation
Dividing the scene into objects
In the simplest form, these objects can be foreground and background
In more complex situations, there can be multiple objects in the scene
Segmentation is required to extract the objects
Computing Motion


Object Based Motion
Parameterized Motion Information
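One common way to parameterize object motion, assumed here because the slide does not name a specific model, is a 6-parameter affine model fitted by least squares to the displacements of the object's pixels. The function names and test data below are hypothetical.

```python
def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, r):
    """Solve the 3x3 system m p = r with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; affine motion is not determined")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        sol.append(_det3(mc) / d)
    return sol


def fit_affine_motion(points, moved):
    """Least-squares fit of x' = a0*x + a1*y + a2 and y' = b0*x + b1*y + b2 via normal equations."""
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (xm, ym) in zip(points, moved):
        row = (x, y, 1.0)
        for i in range(3):
            atx[i] += row[i] * xm
            aty[i] += row[i] * ym
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atx), _solve3(ata, aty)


# Hypothetical object: 25 pixels on a 5x5 grid undergoing a small affine deformation
pts = [(x, y) for y in range(5) for x in range(5)]
true_a, true_b = (1.1, 0.05, 2.0), (-0.02, 0.95, -1.0)
moved = [(true_a[0] * x + true_a[1] * y + true_a[2],
          true_b[0] * x + true_b[1] * y + true_b[2]) for x, y in pts]
a, b = fit_affine_motion(pts, moved)
print([round(v, 3) for v in a], [round(v, 3) for v in b])
```

For the 25 sample points this replaces 50 displacement values with 6 parameters for the whole object; with real, noisy motion estimates the least-squares fit averages out per-pixel errors.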
Segmentation by Graph Cuts
Uses the max-flow/min-cut algorithm from graph theory
Divides the data into regions by minimizing an energy function, usually defined on the image intensities
A smoothness term is also included so that the resulting segmentation is spatially consistent
Details will be provided in the final presentation
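A minimal binary foreground/background sketch of the idea, assuming two known mean intensities mu_fg and mu_bg. It minimizes E(L) = sum over pixels p of |I_p - mu_{L_p}| plus lam times the number of 4-connected neighbor pairs with different labels, by building an s-t graph and computing a minimum cut with Edmonds-Karp max-flow (chosen here for simplicity). This is not the cited Boykov, Veksler and Zabih method, which handles more than two labels with move-making algorithms built on repeated graph cuts; all names and parameters are hypothetical.

```python
from collections import defaultdict, deque


def max_flow_source_side(cap, s, t):
    """Edmonds-Karp max-flow; returns (flow, nodes reachable from s in the residual graph),
    i.e. the source side of a minimum s-t cut."""
    residual = defaultdict(dict)
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            residual[u][v] = residual[u].get(v, 0.0) + c
            residual[v].setdefault(u, 0.0)  # reverse edge for pushing flow back
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push


def segment(image, mu_fg, mu_bg, lam):
    """Return a mask with 1 = foreground (source side) and 0 = background (sink side)."""
    h, w = len(image), len(image[0])
    s, t = h * w, h * w + 1
    cap = defaultdict(dict)
    for y in range(h):
        for x in range(w):
            p = y * w + x
            cap[s][p] = abs(image[y][x] - mu_bg)  # data cost, paid if p ends up background
            cap[p][t] = abs(image[y][x] - mu_fg)  # data cost, paid if p ends up foreground
            for yy, xx in ((y, x + 1), (y + 1, x)):  # each 4-connected pair added once
                if yy < h and xx < w:
                    q = yy * w + xx
                    cap[p][q] = lam  # smoothness cost, paid if p and q get different labels
                    cap[q][p] = lam
    _, source_side = max_flow_source_side(cap, s, t)
    return [[1 if y * w + x in source_side else 0 for x in range(w)] for y in range(h)]


image = [
    [200, 200, 20, 20],
    [200, 90, 20, 20],  # 90 is a noisy pixel inside the bright region
    [200, 200, 20, 20],
    [200, 200, 20, 20],
]
for lam in (0, 30):
    print(lam, segment(image, mu_fg=200, mu_bg=20, lam=lam))
```

With lam = 0 the noisy pixel is labeled background on its own data cost; with lam = 30 the smoothness term pulls it back into the foreground, which is the consistency the smoothness term is meant to provide.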
Architecture
We will also propose a mechanism to assign motion information to the segmented objects
Our approach will be as consistent as possible with the object-based coding support provided by MPEG-4
References

Gary J. Sullivan, Pankaj Topiwala, Ajay Luthra, SPIE Conference on Applications of Digital Image Processing XXVII, Special Session on Advances in the New Emerging Standard: H.264/AVC, August 2004

Gabriel Antunes Abrantes, Fernando Pereira, MPEG-4 Facial Animation Technology: Survey, Implementation and Results, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 2, March 1999

Roger H. Clarke, Image and Video Compression: A Survey, Department of Computing and Electrical Engineering, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, Scotland

Noel Brady, MPEG-4 Standardized Methods for the Compression of Arbitrarily Shaped Video Objects, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 8, December 1999

Yuri Boykov, Olga Veksler, Ramin Zabih, Fast Approximate Energy Minimization via Graph Cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 11, November 2001, pp. 1222-1239