Video Coding

TSBK01 Image Coding and Data Compression
Lecture 10
Jörgen Ahlberg
Outline
I. Colour coding
II. Moving images: From 2D to 3D?
III. Hybrid coding
IV. Video coding standards
Part I:
Colour Coding
The base colours of colour television are
– Red: 700 nm
– Green: 546 nm
– Blue: 435 nm
Three base colours enough to synthesize any visible colour!
The Colour Vector
[Figure: the colour vector in RGB space with axes R, G and B, and the plane R+G+B = 1]
In this plane, the luminance Y = R+G+B = 1
The PAL colours
[Diagram: R, G, B → Matrix → Y, R-Y, B-Y]
Y = 0.30R + 0.59G + 0.11B
Cr = R-Y = 0.70R - 0.59G - 0.11B
Cb = B-Y = - 0.30R - 0.59G + 0.89B
Y luminance; Cr, Cb chrominance
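As a minimal sketch of the matrix, the conversion can be written out in Python. It assumes the luminance weights 0.30, 0.59 and 0.11 apply to R, G and B respectively, and that the inputs are normalized to the range 0..1. The function names are illustrative only.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert normalized RGB to luminance Y and colour differences Cr, Cb."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    cr = r - y  # equals 0.70R - 0.59G - 0.11B
    cb = b - y  # equals -0.30R - 0.59G + 0.89B
    return y, cr, cb


def ycrcb_to_rgb(y, cr, cb):
    """Invert the conversion: R and B follow directly, G from the Y equation."""
    r = y + cr
    b = y + cb
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b


# A grey or white pixel has no colour difference: Cr and Cb are (close to) zero.
print(rgb_to_ycrcb(1.0, 1.0, 1.0))
```

Since Cr and Cb vanish for all grey levels, the chrominance signals carry only the colour information.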
Digital Colour Coding

Change basis to YUV (almost the same as YCrCb).
– For more info on color spaces, see colour FAQ at
www.poynton.com/Poynton-color.html

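The Human Visual System perceives the luminance in higher
resolution than the chrominance!
=> Subsample the colour components.
[Figure: Y, U and V sample grids for 4:2:0 (U and V subsampled by two both horizontally and vertically) and 4:2:2 (U and V subsampled by two horizontally only)]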
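The two subsampling formats can be sketched as follows. Averaging neighbouring samples is an assumption made here for simplicity (real systems use various filters and sample positions). The function names are hypothetical, and a chroma plane is a list of rows with even dimensions.

```python
def subsample_422(plane):
    """Halve the horizontal chroma resolution by averaging sample pairs."""
    return [[(row[x] + row[x + 1]) / 2 for x in range(0, len(row), 2)]
            for row in plane]


def subsample_420(plane):
    """Halve both chroma resolutions by averaging each 2x2 block."""
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


u_plane = [[1, 3],
           [5, 7]]
print(subsample_422(u_plane))  # one U sample per two pixels in each row
print(subsample_420(u_plane))  # one U sample per 2x2 pixel block
```

With Y kept at full resolution, 4:2:2 gives on average 2 samples per pixel and 4:2:0 gives 1.5, compared with 3 for unsubsampled colour.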
Part II:
Coding of Moving Images
Principle I - Extend known methods to 3D
Coding Method     Performance (bpp)   Complexity   Decoding complexity
PCM               6 – 8               Low          Low
VQ                0.5 – 2             Very high    Low
Predictive        2 – 5               Low          Low
Transform         0.5 – 1.5           High         High
Subband/Wavelet   0.1 – 1.0           High         High
Fractal           0.1 – 0.5           Very high    Low
Extending 2D Methods

Predictive coding
– 3D predictors
– Motion compensated predictors

Transform coding
– 3D transforms

Subband coding
– 3D subband filters
BUT! The properties of the image signal are different
in the temporal and the spatial domain!
Thus:
Principle II:
Hybrid methods
Hybrid predictive/transform coding is very popular.
Part III:
Hybrid Coding

Combine predictive coding and transform coding.

Use predictive coding to predict the next frame in the
sequence.

Use transform coding to code the prediction error.
Transform Coding
[Block diagram: input → T → Q → VLC]
T: Transform
Q: Quantizer
VLC: Variable Length Coder
Predictive Coding
[Block diagram: prediction error → Q → VLC, with a feedback loop through Q-1 and P]
Q: Quantizer
Q-1: Inverse quantizer (reconstructor)
P: Predictor
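The predictive coding loop can be illustrated with a one-dimensional sketch. It assumes the simplest possible predictor (the previous reconstructed sample) and a uniform quantizer with a hypothetical step size `step`. The VLC stage is left out, so the encoder simply returns the quantizer indices.

```python
def dpcm_encode(samples, step):
    """Closed-loop predictive coder: predict from the reconstructed signal."""
    indices = []
    prediction = 0  # P: previous reconstructed sample
    for s in samples:
        error = s - prediction
        q = round(error / step)       # Q: uniform quantizer
        indices.append(q)             # the indices would be passed on to the VLC
        prediction += q * step        # Q-1: reconstruct as the decoder will
    return indices


def dpcm_decode(indices, step):
    """Decoder: runs the same inverse quantizer and predictor as the encoder."""
    reconstruction = []
    prediction = 0
    for q in indices:
        prediction += q * step
        reconstruction.append(prediction)
    return reconstruction


signal = [10, 12, 15, 15, 11]
codes = dpcm_encode(signal, step=2)
print(codes)
print(dpcm_decode(codes, step=2))
```

Because the encoder predicts from its own reconstruction rather than from the original samples, quantization errors do not accumulate: each decoded sample stays within half a step of the original.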
Hybrid Coding
[Block diagram: prediction error → T → Q → VLC, with a feedback loop through Q-1, T-1 and P]
Frame Prediction
[Figure: an intra-coded I-frame followed by predictively coded P-frames]
Better prediction if it can compensate for motion!
Motion Compensation
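Motion Compensated Hybrid Coding
[Block diagram: hybrid coder with TQ → VLC in the forward path, TQ-1 and the predictor P in the feedback loop, and ME supplying motion vectors to the predictor and to a second VLC]
TQ: Transform + quantization
ME: Motion estimation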
Motion Compensation

Typically one motion vector per macroblock (4
transform blocks)

Motion estimation is a time consuming process
– Hierarchical motion estimation
– Maximum length of motion vectors
– Clever search strategies

Motion vector accuracy:
– Integer, half or quarter pixel
– Bilinear interpolation
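A minimal sketch of integer-pixel motion estimation follows. It assumes an exhaustive (full) search with the sum of absolute differences (SAD) as matching criterion, which is one common choice rather than anything a standard mandates. Frames are lists of rows, and `max_disp` plays the role of the maximum motion vector length.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and its displaced match in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n, max_disp):
    """Try every displacement within +/- max_disp and keep the best one."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), block_sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# A 2x2 object moves one pixel right and one pixel down between the frames.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (3, 4):
        ref[y][x] = 9
for y in (3, 4):
    for x in (4, 5):
        cur[y][x] = 9
print(full_search(cur, ref, bx=4, by=3, n=2, max_disp=2))
```

The cost grows with the square of `max_disp`, which is why limited vector lengths, hierarchical estimation and clever search strategies are used in practice.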
Part IV:
Video Coding Standards
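[Figure: applications and standards placed along a bitrate axis with ticks at 8, 16, 64 and 384 kbit/s and at 1.5, 5 and 20 Mbit/s]
Applications, from low to high bitrate: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, Digital TV, HDTV
Bitrate ranges: very low bitrate, low bitrate, medium bitrate, high bitrate
Standards, from low to high bitrate: MPEG-4, H.263, H.261, MPEG-1, MPEG-2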
Standards

H.26x
– Standards for real time communication like video telephony
and video conferencing.
– Standardized by ITU.

MPEG
– Standards for stored video data like movies on CDs, DVDs,
etc.
– Standardized by ISO.
H.261

Standard for ISDN picture phones in 1990.

Motion compensation:
– One motion vector per macroblock.
– One macroblock = four 8×8 luminance blocks + two chrominance
blocks (one U and one V).
– Motion vectors max 15 pixels long in each direction.

Format:
– CIF (352×288) or QCIF (176×144)
– 7.5 – 30 frames/s.

Bitrate: Multiple of 64 kbit/s (=ISDN) including audio.

Quality: Acceptable for small motion at 128 kbit/s.
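To give a feeling for these numbers, the following sketch computes the macroblock count and the uncompressed bitrate of the two formats. It assumes 8 bits per sample and 4:2:0 subsampling, which is what one U and one V block per four luminance blocks implies. The function names are illustrative.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """A macroblock covers 16x16 luminance pixels (four 8x8 blocks)."""
    return (width // mb_size) * (height // mb_size)


def raw_bitrate(width, height, fps, bits_per_sample=8):
    """Uncompressed bitrate in bit/s for 4:2:0 video."""
    # U and V each hold a quarter as many samples as Y: 1.5 samples per pixel.
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * fps


print(macroblocks_per_frame(352, 288))     # CIF
print(macroblocks_per_frame(176, 144))     # QCIF
print(raw_bitrate(352, 288, 30) / 128_000)  # compression ratio needed at 128 kbit/s
```

Under these assumptions, CIF at 30 frames/s is roughly 36.5 Mbit/s uncompressed, so coding it at 128 kbit/s needs a compression ratio of about 285, before any bits are set aside for audio.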
H.263

Standard for picture telephones over analog subscriber
lines in 1995.

Format:
– CIF, QCIF or Sub-QCIF.
– Usually less than 10 frames/s.

Bitrate: Typically 20 – 30 kbit/s.

Quality: With new options as good as H.261 (at half the
bitrate).
MPEG

Moving Picture Experts Group – a committee under
ISO and IEC.

Original plan:
– MPEG-1 for 1.5 Mbit/s (VideoCD)
– MPEG-2 for 10 Mbit/s (Digital TV)
– MPEG-3 for 40 Mbit/s (HDTV)

What happened:
– MPEG-1 for 1.5 Mbit/s (Video CD)
– MPEG-2 for 2 – 60 Mbit/s (TV and HDTV)
– MPEG-4, -7 and -21 for other things.
MPEG-1

ISO/IEC standard in 1991.

Target bitrate around 1.5 Mbit/s (Video CD).

Properties:
– Bi-directionally predictively coded frames (”B-frames”, see next
slide).
– More flexible than H.261.
– Almost JPEG for intra frames.

Format:
– CIF
– No interlace.
– 24 – 30 frames/s.
MPEG Frame Types
[Figure: frame sequence I B B P B B P B B P B B I, forming a group of frames (GOF)]
– I-frame: intra-coded
– P-frames: predictively coded
– B-frames: bi-directionally predictively coded
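Because a B-frame is predicted from both an earlier and a later frame, the later reference must be coded and transmitted before the B-frames that depend on it. The sketch below reorders a display-order sequence into one possible coding order. The function name and the string representation of the frame types are assumptions made for illustration.

```python
def coding_order(display_types):
    """Return (display index, type) pairs in an order where each B-frame
    comes after both of its reference frames."""
    ordered, waiting_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            waiting_b.append((index, frame_type))  # needs the next I/P first
        else:
            ordered.append((index, frame_type))    # the reference frame goes first
            ordered.extend(waiting_b)              # then the B-frames before it
            waiting_b = []
    ordered.extend(waiting_b)
    return ordered


for index, frame_type in coding_order("IBBPBBP"):
    print(index, frame_type)
```

The decoder restores the display order, which means it has to buffer frames and adds delay. This is one reason B-frames are less attractive for real-time communication.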
MPEG-coding of I-frames

Intracoded

8×8 DCT

Arbitrary weighting matrix for coefficients

Predictive coding of DC-coefficients

Uniform quantization

Zig-zag, run-level, entropy coding
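The last step can be sketched as below for an already quantized block. The scan order follows the usual zig-zag pattern along anti-diagonals. The run-level representation shown here (pairs of zero-run length and nonzero level, closed by an end-of-block marker) is a simplified illustration, and the entropy coder that would assign variable length codes to the pairs is omitted. All function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def zigzag_scan(block):
    """Read out a square block along the zig-zag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level(coefficients):
    """Turn a coefficient list into (zero run, level) pairs plus an end marker."""
    pairs, run = [], 0
    for c in coefficients:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


# A small 4x4 example block; the DC coefficient (first in scan order) is
# coded separately, so only the AC coefficients are run-level coded.
block = [[12, 5, 0, 0],
         [-3, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 0, 0, 0]]
print(run_level(zigzag_scan(block)[1:]))
```

The zig-zag path visits low-frequency coefficients first, so the many high-frequency zeros left after quantization end up in long runs at the end of the scan.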
MPEG-coding of P-frames

Motion compensated prediction from I- or P-frame.

Half-pixel accuracy of motion vectors, bilinear
interpolation.

Predictive coding of motion vectors.

Prediction error coded as I-frame.
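Half-pixel accuracy means that the reference frame must be sampled between pixel positions. A minimal sketch of bilinear interpolation follows, with coordinates given in half-pixel units. The exact integer rounding rules of the standard are left out, and the function name is hypothetical.

```python
def sample_half_pel(ref, x2, y2):
    """Sample ref at (x2 / 2, y2 / 2); x2 and y2 are non-negative half-pel units."""
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + x2 % 2  # right neighbour only when x is at a half position
    y1 = y0 + y2 % 2  # lower neighbour only when y is at a half position
    # At integer positions all four terms coincide, at half positions the
    # expression averages the two or four surrounding pixels.
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4


ref = [[10, 20],
       [30, 40]]
print(sample_half_pel(ref, 0, 0))  # integer position
print(sample_half_pel(ref, 1, 0))  # halfway between two horizontal neighbours
print(sample_half_pel(ref, 1, 1))  # centre of four pixels
```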
MPEG-coding of B-frames

Motion compensated prediction from two consecutive I- or P-frames.
– Forward prediction only (1 vector/macroblock).
– Backward prediction only (1 vector/macroblock).
– Average of fwd and bwd (2 vectors/macroblock).

Otherwise as P-frames.
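The choice between the three prediction modes can be sketched as follows. The sketch assumes that the forward and backward candidate blocks have already been found by motion estimation, and that the encoder simply picks the mode with the smallest sum of absolute differences. The selection rule is an encoder design choice, not something the source specifies.

```python
def prediction_sad(block, prediction):
    """Sum of absolute differences between a block and a candidate prediction."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, prediction)
               for a, b in zip(row_a, row_b))


def best_b_prediction(block, forward, backward):
    """Pick forward, backward or averaged prediction for one macroblock."""
    average = [[(f + b) / 2 for f, b in zip(row_f, row_b)]
               for row_f, row_b in zip(forward, backward)]
    candidates = {"forward": forward, "backward": backward, "average": average}
    mode = min(candidates, key=lambda m: prediction_sad(block, candidates[m]))
    return mode, candidates[mode]


# A block whose brightness lies between the two references, e.g. during a fade.
block = [[10, 10], [10, 10]]
forward = [[8, 8], [8, 8]]
backward = [[12, 12], [12, 12]]
print(best_b_prediction(block, forward, backward)[0])
```

The averaged mode costs two motion vectors per macroblock, so a real encoder would also weigh in the bits spent on the vectors.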
MPEG-2

ISO/IEC standard in 1994.

Properties:
– Handles interlace (optimized for TV)
– Even more flexible than MPEG-1

Format:
– 352×288
– 704×576 (25 frames/s) or 720×480 (30 frames/s)
– 1440×1152 or 1920×1080 (HDTV)

Bitrate:
– 2 – 60 Mbit/s
– ~4 Mbit/s: Image quality similar to PAL / NTSC / SECAM.
– 18 – 20 Mbit/s: HDTV.
MPEG-2 (cont.)

Profiles:
– Simple profile without B-frames.
– Scalable profiles.

Experience shows that:
– At 1.5 – 2 Mbit/s MPEG-2 is not better than MPEG-1.
– With manual interaction at the coding, good quality can be
achieved at 3 – 4 Mbit/s.
– Problems with implementing the full standard have caused
compatibility problems.
– Buffering and rate control are hard problems.
MPEG-4

ISO/IEC standard in 1998, version 2 in 1999

Instead of frames as coding units, MPEG-4 uses audio-visual
objects

Focus is not primarily on compression, but on content-based
functionality

Contains definitions of:
– Media object types (video, audio, text, graphics, ...)
– Parameters for describing the objects
– Bitstream syntax for the (compressed) parameters
– Scene description, file format, streaming, synchronization, ...

Allows mixing of media objects.
Parts of the MPEG-4 standard

Part 1, Systems, contains
– The bitstream syntax and the binary "language" for scene
description
– Computer graphics object descriptions
– Multiplexing, transport, ...

Part 2, Visual, contains
– Video coding
– Still image coding
– Texture coding, ...

Part 3, Audio, contains a toolbox of audio coders for different
applications

...
Structure of an MPEG-4 Decoder
[Block diagram: Bitstream → MUX → one decoder per A/V object → Compositor → Audio/Video scene]
MPEG-4 (Natural) Video

Instead of frames: Video Object Planes

Coded with Shape Adaptive DCT
[Figure: a video frame is divided by an alpha map into a VOP and a background VOP; the VOPs are coded with SA DCT]
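MPEG-4 Video Coding
[Block diagram: motion compensated hybrid coder (TQ, TQ-1, predictor, motion estimation) extended with shape coding; the texture, motion and shape data each pass through a VLC and are combined in a Mux]
TQ: Transform + quantization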
Synthetic/Natural Hybrid Coding

Mix traditional video with 2D/3D graphics
– Compose virtual environments
– Easy to add text, graphs, images, etc

High compression

Receive objects from separate sources
– Use predefined or locally defined objects

Scalability
– Progressive decoding
– Better terminal gives better quality.
Synthetic Objects

2D/3D graphics
– Lines, polygons
– Still images
– Image/video mapping on polygon meshes

VRML scenes and objects

Animated people

More on animation and virtual characters in Lecture 12!

Synthetic audio

More on natural and synthetic audio in Lecture 11!
[Figure: a composed scene with the labels: natural video object mapped on 2D mesh; natural video object; computer graphics generated virtual environment; still image or natural video object mapped on animated 3D mesh]
All mixed in the decoder!!!
Virtual Environments

Downloaded virtual environment

Different environments for different users

Simple change between environments

Synthetic environments are cheaper than real ones
Tools for Synthetic Objects

Wavelet-based still image compression
– Scalable quality and resolution
– Progressive decoding
– Can be mapped on 2D or 3D meshes

Compression of 2D and 3D meshes
– Mesh geometry and animation
– Transmit vertex coordinates and let the receiving terminal
calculate the polygons
– A moving or still image can be mapped on the mesh (texture
mapping).
More Tools for Synthetic Objects

Face and Body Animation

Text-to-speech (TTS) interface

View-dependent scalable texture
– Information about the user's view position in a 3D scene is
transmitted on a back-channel
– Only the necessary texture information is transmitted to the
user
View-dependent Scalable Texture
[Figure: the original texture, the texture mapped on a surface, and what the user sees]
Other formats

Microsoft, RealVideo, QuickTime, ...

All are variations of the hybrid coder used in MPEG coders, with some extra features.
New Stuff
ITU and ISO in cooperation: H.264 = MPEG-4 part 10
Finished in 2003.
H.264 / MPEG-4 part 10

4×4 integer transform (approximating DCT).

Prediction of blocks of sizes up to 16×16.

Motion vectors for blocks of sizes 4×4 up to 16×16.

Up to 5 reference images for prediction.

Non-uniform quantization.

Arithmetic coding of run-level pairs.
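The 4×4 integer transform can be illustrated with the forward core transform matrix commonly given for H.264. The sketch below applies it as Y = C X C^T using integer arithmetic only. The scaling that makes the transform orthonormal is folded into the quantizer in the standard and is not shown here, and the helper names are hypothetical.

```python
# Forward core transform matrix: an integer approximation of the 4x4 DCT.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(column) for column in zip(*m)]


def forward_core_transform(block):
    """Y = CF * X * CF^T, computed exactly with integers."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
for row in forward_core_transform(flat):
    print(row)
```

A flat block puts all its energy into the top-left (DC) coefficient, as it would with a DCT. Because only integers are involved, the encoder and decoder cannot drift apart from rounding differences.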
What about the sound?

MPEG-1
– Audio layer I, II and III (mp3).

MPEG-2
– Four channels, same codec as in MPEG-1.
– AAC (Advanced Audio Coding) added later.

MPEG-4
– AAC
– Two speech coders
– Structured audio
– And more...
More on audio coding in Lecture 11.
Conclusion

Colour coding
– Change basis from RGB to YUV
– Colour components are compressed harder than the
luminance

Moving image coding
– Hybrid coding: Motion compensated predictive coding and
transform coding of the prediction error
– I-, P-, and B-frames
– Object-based coding (MPEG-4) mixing synthetic and natural
audio & video
Conclusion (cont)

Standards
– MPEG-1: Video CD
– MPEG-2: Digital TV
– MPEG-4: Multimedia
– H.261: ISDN videophone
– H.263: PSTN videophone
– H.264 / MPEG-4 part 10: Universal video
That was the last slide!