Video coding
Types of redundancies:
– Spatial: Correlation between neighboring pixel values
– Spectral: Correlation between different color planes or spectral
bands
– Temporal: Correlation between different frames in a video
sequence
In video coding, temporal correlation is also exploited, typically
using motion compensation (a predictive coding based on motion
estimation)
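A minimal sketch of the motion estimation step behind motion compensation, assuming exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The function names, block size, search range, and toy frames are illustrative assumptions and are not taken from any of the standards reviewed here.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search):
    """Return ((dx, dy), cost) minimizing SAD within +/- search pixels,
    skipping candidates that fall outside the reference frame."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy frames (assumed data): a bright 2x2 patch moves one pixel right and
# one pixel down between the reference frame and the current frame.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        ref[2 + y][2 + x] = 200
        cur[3 + y][3 + x] = 200

# The best match for the current block lies one pixel up and to the left
# in the reference frame, so the vector points back to the old position.
mv, cost = full_search(cur, ref, bx=2, by=2, n=4, search=2)
```

Only the motion vector and the (here zero) prediction error would then need to be coded, which is how temporal redundancy is removed.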
Video standards review
H.261
• For video-conferencing/video phone
– Low delay (real-time, interactive)
– Slow motion in general
• For transmission over ISDN
– Fixed bandwidth: p×64 Kbps, p = 1, 2, …, 30
H.261
• Video Format:
– CIF (352x288, above 128 Kbps)
– QCIF (176x144, 64-128 Kbps)
– 4:2:0 color format, progressive scan
• Published in 1990
• Each macroblock can be coded in intra- or inter-mode
• Periodic insertion of intra-mode to eliminate error
propagation due to network impairments
DCT coefficient quantization
• DC coefficient in intra-mode: uniform
• Others: uniform with deadzone (to avoid coding too many small
coefficients, which are typically due to noise)
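The difference between the two quantizers can be sketched as follows. This is a generic illustration under assumed rounding rules and an assumed step size of 16; it does not reproduce H.261's exact quantization or reconstruction formulas.

```python
def quantize_uniform(coeff, step):
    """Uniform quantizer: round to the nearest multiple of `step`."""
    sign = -1 if coeff < 0 else 1
    return sign * int((abs(coeff) + step / 2) // step)


def quantize_deadzone(coeff, step):
    """Uniform quantizer with a deadzone: truncate toward zero, so every
    coefficient with magnitude below `step` maps to level 0."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) // step)


coeffs = [100, 12, -9, 5, -40]  # assumed example coefficients
uniform_levels = [quantize_uniform(c, 16) for c in coeffs]
deadzone_levels = [quantize_deadzone(c, 16) for c in coeffs]
```

With the deadzone, the small coefficients 12 and -9 become zero instead of ±1, which lengthens the zero runs seen by the entropy coder.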
• MVs are coded differentially (DMV)
• DCT coefficients are converted into run-length representations and
then coded using VLC (Huffman coding of each symbol)
– Symbol: (zero run-length, non-zero value range)
• Other information is also coded using VLC (Huffman coding)
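A small sketch of the run-length step, assuming the quantized coefficients have already been scanned into a one-dimensional list. The function name and the "EOB" marker are illustrative; the actual VLC tables of the standard are not reproduced.

```python
def run_level_pairs(scanned):
    """Convert scanned quantized coefficients into (zero_run, level)
    symbols, ending with an 'EOB' (end-of-block) marker."""
    symbols = []
    run = 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            symbols.append((run, level))
            run = 0
    symbols.append("EOB")  # trailing zeros are implied by end-of-block
    return symbols


# Assumed example: a few non-zero coefficients followed by zeros.
scanned = [5, 0, 0, -2, 1, 0, 0, 0, 3] + [0] * 54
symbols = run_level_pairs(scanned)
```

Each resulting symbol would then be mapped to a variable-length code, with shorter codes assigned to the more probable symbols.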
MPEG-1
• Finalized in ~1991
• Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240, 30 fps).
– Maximum: 1.856 Mbps, 768x576 pels
– Progressive frames only
• Prompted explosion of digital video applications: MPEG1 video
CD and downloadable video over Internet
• Software only decoding, made possible by the introduction of
Pentium chips, key to the success in the commercial market
• MPEG-1 Audio
– Offers 3 coding options (3 layers), higher layers have higher
coding efficiency with more computations
– MP3 = MPEG1 layer 3 audio
MPEG-1 vs H.261
• Developed at about the same time
• Must enable random access (Fast forward/rewind)
– Using GOP structure with periodic I-picture and P-picture
• Not for interactive applications
– Does not have as stringent delay requirement
• Fixed rate (1.5 Mbps), good quality (VHS equivalent)
– SIF video format (similar to CIF)
• CIF: 352x288, SIF: 352x240
– Using more advanced motion compensation
• Half-pel accuracy motion estimation, range up to +/- 64
– Using bi-directional temporal prediction
• Important for handling uncovered regions
MPEG-1 GOP
Encoding order: 1 4 2 3 8 5 6 7
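The encoding order above can be reproduced with a short sketch. It assumes the display-order picture types I B B P B B B P, which is inferred from the listed order because the original figure is not available; the function name is illustrative.

```python
def encoding_order(display_types):
    """Given picture types in display order, return the 1-based display
    indices in encoding order: each I/P anchor is sent before the
    B-pictures that precede it in display order."""
    order = []
    pending_b = []
    for index, ptype in enumerate(display_types, start=1):
        if ptype == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures with no later anchor
    return order


order = encoding_order(list("IBBPBBBP"))  # assumed GOP pattern
```

B-pictures are delayed because they need both the previous and the following anchor picture to be decoded first.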
MPEG-1 coder
H.263
• Targeted for visual telephone over PSTN or Internet
• Enable video phone over regular phone lines (28.8 Kbps) or
wireless modem
• Developed later than H.261, can accommodate computationally
more intensive options
– Initial version (H.263 baseline): 1995
– H.263+: 1997
– H.263++: 2000
• Result: Significantly better quality at lower rates
– Better video at 18-24 Kbps than H.261 at 64 Kbps
H.263
Some H.263 improvements over H.261
• Better motion estimation
– half-pel accuracy motion estimation with bilinear interpolation
filter
– larger motion search range [-31.5,31], and unrestricted MV at
boundary blocks
– more efficient predictive coding for MVs (median prediction
using three neighbors)
– overlapping block motion compensation (option)
– variable block size: 16x16 -> 8x8, 4 MVs per MB (option)
– use bidirectional temporal prediction (PB picture) (option)
• 3-D VLC for DCT coefficients (runlength, value, EOB)
• Syntax-based arithmetic coding (option; at 50% more
computations)
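The median prediction of motion vectors listed above can be sketched as below. The choice of left, above, and above-right neighbors and the function names are assumptions for illustration; border handling is omitted.

```python
def median_mv_predictor(left, above, above_right):
    """Component-wise median of three neighboring motion vectors."""
    xs = sorted(v[0] for v in (left, above, above_right))
    ys = sorted(v[1] for v in (left, above, above_right))
    return (xs[1], ys[1])


def mv_difference(mv, predictor):
    """Difference vector that would actually be entropy coded."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


# Assumed neighbor vectors in pixels; one neighbor is an outlier.
pred = median_mv_predictor((2.0, -1.0), (3.5, 0.0), (-8.0, 0.5))
dmv = mv_difference((3.0, 0.0), pred)
```

The median ignores the outlying neighbor (-8.0, 0.5), so the coded difference stays small even next to a motion boundary.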
H.263 and beyond
- Aimed particularly at video coding for low bit rates (typically
20-30 Kbps and above).
- The coding algorithm is similar to that used by H.261, with some
improvements and changes to improve performance and error
recovery.
- Main differences:
  - Half-pixel precision is used for motion compensation
  - Four negotiable options:
    - Unrestricted motion vectors
    - Syntax-based arithmetic coding
    - Advanced prediction
    - Forward and backward frame prediction (PB-frames, similar to
      MPEG B-pictures)
  - Five resolutions instead of two
Further improvements in H.263+ and H.264
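Half-pixel motion compensation needs reference samples between integer pixel positions. A minimal sketch using bilinear averaging follows; the coordinate convention (half-pixel units, non-negative positions inside the frame) and the round-half-up rule are assumptions of this example.

```python
def sample_half_pel(frame, x2, y2):
    """Sample `frame` at (x2 / 2, y2 / 2), with x2 and y2 given in
    half-pixel units. Averages the 1, 2, or 4 nearest integer pixels
    with round-half-up."""
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)  # equals x0 at integer horizontal positions
    y1 = y0 + (y2 % 2)  # equals y0 at integer vertical positions
    a, b = frame[y0][x0], frame[y0][x1]
    c, d = frame[y1][x0], frame[y1][x1]
    return (a + b + c + d + 2) // 4


frame = [[10, 20], [30, 40]]  # assumed 2x2 toy frame
full = sample_half_pel(frame, 0, 0)        # integer position
horizontal = sample_half_pel(frame, 1, 0)  # between 10 and 20
vertical = sample_half_pel(frame, 0, 1)    # between 10 and 30
center = sample_half_pel(frame, 1, 1)      # average of all four
```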
H.263
Example: MissAmerica

Description       Compr. Ratio   Average PSNR (dB)   Bitrate (Kbit/s)
Original, 30fps   1:1            n/a                 9124
10fps, 20Kbps     139:1          29.79               21.83
10fps, 100Kbps    29:1           36.0                105.47
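The figures in this example can be checked with a little arithmetic. The sketch assumes a QCIF (176x144) source in 4:2:0 format with 8 bits per sample, and assumes the compression ratio is taken against the raw rate at the coded frame rate of 10 fps; neither assumption is stated in the source, but both reproduce the listed numbers.

```python
def raw_bitrate_kbps(width, height, fps, bits_per_sample=8):
    """Raw bit rate of 4:2:0 video in kbit/s (1 kbit = 1000 bits).
    4:2:0 carries 1.5 samples per pixel: one luma sample plus two
    quarter-resolution chroma planes."""
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * fps / 1000


raw_30 = raw_bitrate_kbps(176, 144, 30)  # uncompressed source at 30 fps
raw_10 = raw_bitrate_kbps(176, 144, 10)  # raw rate after dropping to 10 fps
ratio_low = raw_10 / 21.83               # "10fps, 20Kbps" row
ratio_high = raw_10 / 105.47             # "10fps, 100Kbps" row
```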
MPEG-2
MPEG-2: finalized in 1994
» Field-interlaced video
» Levels and profiles
• Profiles: Define bit stream scalability and color space resolutions
• Levels: Define image resolutions and maximum bit-rate per
profile
MPEG-2
• A/V broadcast (TV, HDTV, Terrestrial, Cable, Satellite, High
Speed Inter/Intranet) as well as DVD video
• 4-8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV
resolutions (BT.601)
• 18-45 Mbps for HDTV applications
– MPEG-2 video main profile at high level (MP@HL) is the video
coding standard used in HDTV
• Test in 11/91, Committee Draft 11/93
• Consists of various profiles and levels
• Backward compatible with MPEG1
• MPEG-2 Audio
– Supports 5.1 channels
– MPEG2 AAC: requires 30% fewer bits than MPEG1 layer 3
MPEG-2 vs MPEG-1
• MPEG1 only handles progressive sequences (SIF).
• MPEG2 is targeted primarily at interlaced sequences and at
higher resolution (BT.601 = 4CIF).
• More sophisticated motion estimation methods (frame/field
prediction mode) are developed to improve estimation accuracy
for interlaced sequences.
- Frame Motion Vectors: one motion vector is generated per MB
in each direction, which corresponds to a 16x16 pels luminance
area.
- Field Motion Vectors: two motion vectors per MB are generated
for each direction, one for each of the fields. Each vector
corresponds to a 16x8 pels luminance area.
• Different DCT modes and scanning methods are developed for
interlaced sequences.
• MPEG2 has various scalability modes.
• MPEG2 has various profiles and levels, each combination
targeted for different application
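The 16x8 luminance areas used by field motion vectors come from splitting an interlaced macroblock into its two fields. A minimal sketch, assuming the macroblock is given as a list of 16 rows and that even rows belong to the top field:

```python
def split_fields(macroblock):
    """Split a 16x16 luminance macroblock (list of 16 rows) into its
    top field (even rows) and bottom field (odd rows). Each field is
    16 pels wide and 8 lines high."""
    top = macroblock[0::2]
    bottom = macroblock[1::2]
    return top, bottom


# Assumed toy macroblock in which every pel holds its row number.
mb = [[row] * 16 for row in range(16)]
top, bottom = split_fields(mb)
```

In field prediction mode, each of the two fields is matched separately against the reference, which gives one motion vector per field.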
MPEG-2 scalability
• Data partition
– All headers, MVs, first few DCT coefficients in the base layer
– Can be implemented at the bit stream level
– Simple
• SNR scalability
– Base layer includes coarsely quantized DCT coefficients
– Enhancement layer further quantizes the base layer
quantization error
– Relatively simple
• Spatial scalability
– Complex
• Temporal scalability
– Simple
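SNR scalability as described above can be sketched with two quantization passes. The step sizes, rounding rule, function names, and coefficient values are assumptions chosen for illustration, not values from the MPEG-2 specification.

```python
def quantize(value, step):
    """Round to the nearest quantization level (ties away from zero)."""
    sign = -1 if value < 0 else 1
    return sign * int((abs(value) + step / 2) // step)


def encode_snr_layers(coeffs, base_step, enh_step):
    """Base layer: coarse quantization of the DCT coefficients.
    Enhancement layer: finer quantization of the base-layer error."""
    base = [quantize(c, base_step) for c in coeffs]
    residual = [c - b * base_step for c, b in zip(coeffs, base)]
    enh = [quantize(r, enh_step) for r in residual]
    return base, enh


def decode(base, enh, base_step, enh_step, use_enhancement):
    """Reconstruct from the base layer, optionally refined by the
    enhancement layer."""
    out = [b * base_step for b in base]
    if use_enhancement:
        out = [o + e * enh_step for o, e in zip(out, enh)]
    return out


coeffs = [100, -37, 12, 5]  # assumed DCT coefficients
base, enh = encode_snr_layers(coeffs, base_step=32, enh_step=4)
coarse = decode(base, enh, 32, 4, use_enhancement=False)
refined = decode(base, enh, 32, 4, use_enhancement=True)
```

A decoder that receives only the base layer still produces a usable, lower-quality picture; adding the enhancement layer reduces the reconstruction error.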
[Figures: SNR scalability, spatial scalability, temporal scalability]
MPEG-2 profiles and levels
• Profiles: tools
• Levels: parameter range for a given profile
• Main profile at main level (MP@ML) is the most popular, used for
digital TV
• Main profile at high level (MP@HL): HDTV
• 4:2:2 at main level (4:2:2@ML) is used for studio production
MPEG-4
New features
» Provides technologies to view, access, and manipulate objects
rather than pixels
» Entire scene is decomposed into multiple objects
– Object segmentation is the most difficult task!
– But this does not need to be standardized ☺
» Each object is specified by its shape, motion, and texture (color)
- Shape and texture both change over time (specified by motion)
- Texture encoding is done with DCT (8x8 pixel blocks) or
Wavelets
» MPEG-4 assumes the encoder has a segmentation map
available, specifies how to code (actually decode!) shape,
motion and texture
MPEG-4
[Figures: Example of scene composition; object-based coding;
MPEG-4 block diagram]
MPEG-4
– Coding Tools
» Shape coding: Binary or Gray Scale
» Motion Compensation: Similar to H.263, Overlapped mode is
supported
» Texture Coding: Block-based DCT and Wavelets for Static
Texture
– Type of Video Object Planes (VOPs)
» I-VOP: VOP is encoded independently of any other VOPs
» P-VOP: Predicted VOP using another previous VOP and
motion compensation
» B-VOP: Bidirectional Interpolated VOP using other I-VOPs or
P-VOPs
» Similar concept to MPEG-2

Mesh Animation
• An object can be described by an initial mesh and MVs of the
nodes in the following frames
• MPEG-4 defines coding of mesh geometry, but not mesh
generation
Body and Face Animation
• MPEG-4 defines a default 3-D body model (including its
geometry and possible motion) through body definition parameters
(BDP)
• The body can be animated using the body animation parameters
(BAP)
• Similarly, face definition parameters (FDP) and face animation
parameters (FAP) are specified for a face model and its
animation
• E.g. eye blink (FAP19)
Text-to-Speech Synthesis with Face Animation
Others…
• Sprite
– Code a large background in the beginning of the sequence,
plus affine mappings, which map parts of the background to the
displayed scene at different time instances
– Decoder can vary the mapping to zoom in/out, pan left/right
• Global motion compensation
– Using 8-parameter projective mapping
– Effective for sequences with large global motion
• Quarter-pixel motion estimation
• DivX:
- based on MPEG-4
- can reduce an MPEG-2 video (the same format used for DVD
and pay per view) to 10 percent of its original size (so that a
DVD can be recorded on a CD)
- audio is normally coded using MP3
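The 8-parameter projective mapping mentioned under global motion compensation can be sketched as a single function. The parameter names and their ordering are assumptions of this example; the affine mapping used for sprites is the special case in which the two denominator parameters are zero.

```python
def projective_map(x, y, params):
    """Map (x, y) with an 8-parameter projective model:
        x' = (a0 + a1*x + a2*y) / (1 + c1*x + c2*y)
        y' = (b0 + b1*x + b2*y) / (1 + c1*x + c2*y)
    """
    a0, a1, a2, b0, b1, b2, c1, c2 = params
    denom = 1 + c1 * x + c2 * y
    return ((a0 + a1 * x + a2 * y) / denom,
            (b0 + b1 * x + b2 * y) / denom)


# Assumed parameter sets: a pan (pure translation) and a 2x zoom.
pan = (5, 1, 0, -3, 0, 1, 0, 0)
zoom = (0, 2, 0, 0, 0, 2, 0, 0)
panned = projective_map(10, 20, pan)
zoomed = projective_map(10, 20, zoom)
```

Because one parameter set describes the motion of the whole picture, only eight numbers per frame need to be sent instead of one motion vector per block.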
MPEG-7
MPEG-1/2/4 make content available, whereas MPEG-7 allows
you to find the content you need!
– A content description standard
» Video/images: Shape, size, texture, color, movements and
positions, etc…
» Audio: Key, mood, tempo, changes, position in sound space,
etc…
– Applications:
» Digital Libraries
» Multimedia Directory Services
» Broadcast Media Selection
» Editing, etc…
Examples:
– Draw an object and find objects with similar characteristics.
– Play a note of music and find similar types of music.

MPEG-21
• Aims at standardizing interfaces and tools to facilitate the
exchange of multimedia resources across heterogeneous
devices, networks and users.
• More specifically, it standardizes requisite elements for
packaging, identifying, adapting and processing these
resources as well as managing their usage rights.
• This framework will benefit the entire consumption chain from
creators and rights holders to service providers and consumers.
• Basic unit of transaction in the MPEG-21 Multimedia
Framework: the Digital Item, which packages resources along
with identifiers, metadata, licenses and methods that enable
interaction with the Digital Item.
• Another key concept: the User, i.e. any entity that interacts in
the MPEG-21 environment or makes use of Digital Items.
MPEG-21
• MPEG-21 can be seen as providing a framework in which one
User interacts with another User and the object of that
interaction is a Digital Item.
• Some example interactions include content creation,
management, protection, archiving, adaptation, delivery and
consumption.
MPEG-A
• MPEG’s Multimedia Application Formats (MAF) provide the
framework for integration of elements from several MPEG
standards into a single specification that is suitable for specific,
but widely usable applications.
• Typically, MAFs specify how to combine metadata with timed
media information for a presentation in a well-defined format
that facilitates interchange, management, editing, and
presentation of the media. The presentation may be ‘local’ to
the system or may be via a network or other stream delivery
mechanism.
MPEG-A
• MAF specifications shall integrate elements from different
MPEG standards into a single specification that is useful for
specific but very widely used applications. Examples are
delivering music, pictures or home videos. MAF specifications
may use elements from MPEG-1, MPEG-2, MPEG-4, MPEG-7
and MPEG-21. Typically, MAF specifications include:
- The ISO File Format family for storage
- A simple MPEG-7 tool set for Metadata
- One or more coding Profiles for representing the Media
- Tools for encoding metadata in either binary or XML form
MPEG-A
• MAFs may specify use of:
- MPEG-21 Digital Item Declaration Language for representing
the Structure of the Media and the Metadata
- Other MPEG-21 tools
- non-MPEG coding tools (e.g., JPEG) for representation of
"non-MPEG" media
- Elements from non-MPEG standards that are required to
achieve full interoperability
MPEG-A: 2 examples
• 3on4:
- MP3 is one of the most widely used MPEG standards. Currently,
ID3 simply appends simple metadata tags such as Artist, Album,
Song Title, etc.
- MPEG-4 specifies what MPEG expects to be another very
successful specification, the MPEG-4 File Format, while MPEG-7
specifies not only signal-derived metadata, but also archival
metadata such as Artist, Album and Song Title.
- As such, MPEG-4 and MPEG-7 represent an ideal
environment to support the current “MP3 music library” user
experience and to extend that experience in new
directions.
MPEG-A: 2 examples
• Jon4:
- Digital cameras lead to libraries with thousands of digital photos
- Searching for photographs of interest can be difficult
- Need for suitable metadata: photo content (e.g.
the subject being photographed), author, shoot location,
imaging parameters, etc., stored in a standardized format
- The EXIF standard (commonly adopted by camera
manufacturers) does not support advanced metadata.
• MPEG-7 defines rich metadata descriptions for still images and
audio, and also provides associated systems tools (file formats,
etc.)
• As such, MPEG-7 and the MPEG-4 file format represent an ideal
environment to support the current “Digital Photos Library” user
experience
Summary (1/2)
• H.261:
– First video coding standard, targeted for video conf. over ISDN
– Uses block-based hybrid coding framework with integer-pel MC
• H.263, H.264…
– Improved quality at lower bit rate, to enable video
conferencing/telephony below 54 Kbps (modems or internet
access, desktop conferencing); half-pixel MC
• MPEG-1 video
– Video on CD and video on the Internet (good quality at 1.5
Mbps)
– Half-pixel MC and bidirectional MC
• MPEG-2 video
– TV/HDTV/DVD (4-15 Mbps)
– Extended from MPEG-1, considering interlaced video
Summary (2/2)
• MPEG-4
– To enable object manipulation and scene composition at the
decoder -> interactive TV/virtual reality
– Object-based video coding: shape coding
– Coding of synthetic video and audio: animation
• MPEG-7
– To enable search and browsing of multimedia documents
– Defines the syntax for describing the structural and conceptual
content
• MPEG-21: beyond MPEG-7, considering intellectual property
protection, etc.
• MPEG-A: integration of elements from different MPEG
standards into a single specification that is useful for specific
but very widely used applications