VIDEO COMPRESSION: FUNDAMENTALS AND ALGORITHMS …


VIDEO COMPRESSION
FUNDAMENTALS, part 2
Pamela C. Cosman
1
Extra flavors and refinements

Many variations and improvements are
possible for motion compensation:





Increased accuracy of motion vectors
Unrestricted motion vectors
Multiple frame prediction
Variable sized blocks
Motion compensation for objects
2
Accuracy of Motion Vectors



Digital images are sampled on a grid. What if the actual
motion does not move in grid steps?
Solution: interpolate between grid points in the reference
frame to add a half-pixel grid
The reference frame then effectively has 4 times as many
positions at which the best-match block can be found
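[Figure: integer pixels A and B on the top row, C and D on the bottom row; half-pixel positions h (between A and B), v (between A and C) and m (centre of A, B, C, D)]
h = (A + B)/2
v = (A + C)/2
m = (A + B + C + D)/4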
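As a rough illustration of the half-pixel grid, the sketch below upsamples a small frame by averaging neighbouring integer pixels (two neighbours for h and v, four for m). The function name `half_pixel_grid` is made up for this example, and plain floating-point averages are an assumption; real standards specify exact integer rounding.

```python
def half_pixel_grid(frame):
    """Upsample a 2-D list of integer-pixel values onto a half-pixel grid.

    Output position (2r, 2c) holds the original pixel frame[r][c]; the odd
    positions hold the averages h, v and m of the neighbouring pixels.
    """
    rows, cols = len(frame), len(frame[0])
    out = [[0.0] * (2 * cols - 1) for _ in range(2 * rows - 1)]
    for r in range(2 * rows - 1):
        for c in range(2 * cols - 1):
            r0, c0 = r // 2, c // 2      # integer pixel above-left (A)
            r1 = r0 + (r % 2)            # row of C and D (same as r0 if r is even)
            c1 = c0 + (c % 2)            # column of B and D (same as c0 if c is even)
            # At an integer position all four terms are A, so the result is A.
            out[r][c] = (frame[r0][c0] + frame[r0][c1]
                         + frame[r1][c0] + frame[r1][c1]) / 4
    return out


grid = half_pixel_grid([[10, 20],
                        [30, 40]])
for row in grid:
    print(row)
```

A frame of W x H pixels becomes a grid of (2W - 1) x (2H - 1) positions, which is where the "4 times as many positions" comes from.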
3
Unrestricted Motion Vectors

Suppose the camera is panning to the left
[Figure: reference frame and current frame; the lower-left macroblock of the current frame is marked]
Now consider the lower left macroblock in the
current frame.
What is the best match for it in the reference frame?
4
Unrestricted Motion Vectors

If the macroblock were allowed to hang over the
edge, then the best match would be like this:
[Figure: reference frame and current frame; the best match for the lower-left macroblock hangs over the edge of the reference frame]
But then the motion vector is pointing outside the
frame!
The encoder and decoder can agree on some
standard interpolation to deal with this case
5
Unrestricted Motion Vectors
[Figure: reference frame and current frame]

The edge pixels in the reference frame are just replicated
outside the frame, for as many extra columns as necessary

In this way, a motion vector pointing outside the frame is
acceptable. Can get better matches!
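Edge replication can be sketched by clamping coordinates into the frame before reading a reference pixel. The helper names `ref_pixel` and `predict_block` are hypothetical, and frames are assumed to be plain 2-D lists.

```python
def ref_pixel(ref, r, c):
    """Read reference pixel (r, c); positions outside the frame return the
    nearest edge pixel, which is the same as replicating the edges outward."""
    rows, cols = len(ref), len(ref[0])
    r = min(max(r, 0), rows - 1)   # clamp the row into the frame
    c = min(max(c, 0), cols - 1)   # clamp the column into the frame
    return ref[r][c]


def predict_block(ref, top, left, mv, size):
    """Fetch the size x size block displaced by mv = (dy, dx) from (top, left)."""
    dy, dx = mv
    return [[ref_pixel(ref, top + dy + i, left + dx + j) for j in range(size)]
            for i in range(size)]


ref = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
# Motion vector pointing one row down and one column left of a block near the
# lower-left corner, so part of the block lies outside the frame.
block = predict_block(ref, top=1, left=0, mv=(1, -1), size=2)
print(block)
```

Because encoder and decoder apply the same clamping rule, both obtain the same prediction without any padded frame having to be transmitted.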
6
Arbitrary Multiple Reference Frames



In H.261, the reference frame for prediction is always
the previous frame
In MPEG and H.263, some frames are predicted from
both the previous and the next frames (bi-prediction)
In H.264, any frame may be designated to be used as
reference:
 Encoder and decoder maintain synchronized buffers
of available frames (previously decoded)
 Reference frame is specified as index into this buffer
7
Multiple Frame Prediction

H.264 allows multiple frames to be used as references
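The idea of synchronized buffers indexed by a reference number can be sketched with a small toy class. `ReferenceBuffer` and its methods are invented for this illustration; the buffer management actually specified by H.264 is more elaborate and is not modelled here.

```python
from collections import deque


class ReferenceBuffer:
    """Toy buffer of previously decoded frames, newest first."""

    def __init__(self, capacity):
        # With maxlen set, the oldest frame is discarded automatically
        # once the buffer is full.
        self.frames = deque(maxlen=capacity)

    def add(self, decoded_frame):
        self.frames.appendleft(decoded_frame)

    def get(self, ref_idx):
        # In this sketch, index 0 is the most recently decoded frame.
        return self.frames[ref_idx]


# Encoder and decoder perform identical updates, so an index sent in the
# bitstream selects the same frame on both sides.
encoder_refs = ReferenceBuffer(capacity=3)
decoder_refs = ReferenceBuffer(capacity=3)
for name in ["frame0", "frame1", "frame2", "frame3"]:
    encoder_refs.add(name)
    decoder_refs.add(name)

print(encoder_refs.get(2), decoder_refs.get(2))
```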
8
Some Advantages of Multiple References



If object leaves scene and then comes back,
can have a reference for it in long term past
Similarly, if the camera pans to the right, and
then back to the left, then the scene that
reappears has a reference
If there’s an error, and the receiver sends
feedback to say where the error is, then the
encoder can use another reference frame

Helpful even if there’s no feedback
9
Variable Block-Size MC

Motivation: size of moving/stationary objects is
variable



Many small blocks may take too many bits to encode
Few large blocks give lousy prediction
Choices: In H.264, each 16x16 macroblock may be:




Kept whole, or
Divided horizontally into two 16x8 sub-blocks, or
vertically into two 8x16 sub-blocks
Divided into 4 sub-blocks (8x8)
In the last case, each of the 4 sub-blocks may be divided once
more into 2 or 4 smaller blocks (8x4, 4x8 or 4x4).
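The bit-cost side of this trade-off can be seen by counting motion vectors per macroblock. The sketch below assumes one motion vector per partition; the table and function names are illustrative only.

```python
# Number of partitions produced by each macroblock-level choice.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
# Number of partitions produced inside one 8x8 sub-block.
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_mode, sub_modes=None):
    """Count motion vectors for one 16x16 macroblock (one vector per partition)."""
    if mb_mode != "8x8":
        return MB_PARTITIONS[mb_mode]
    # Each of the four 8x8 sub-blocks chooses its own split.
    return sum(SUB_PARTITIONS[m] for m in sub_modes)


print(motion_vector_count("16x16"))                              # whole macroblock
print(motion_vector_count("8x8", ["8x8", "8x4", "4x8", "4x4"]))  # mixed split
print(motion_vector_count("8x8", ["4x4"] * 4))                   # finest split
```

The finest split needs 16 vectors where the whole macroblock needs 1, which is why many small blocks can cost too many bits.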
10
H.264 Variable Block Sizes
Tree-Structured Motion Compensation
[Figure: macroblock partitions 16x16, 16x8, 8x16 and 8x8; each 8x8 sub-block may be further partitioned as 8x8, 8x4, 4x8 or 4x4]
11
Motion Scale Example
[Figure: frames at T=1 and T=2]
12
H.264 Variable Block Size Example
[Figure: frames at T=1 and T=2]
13
Variable Output Rate

Suppose the control
parameters of a video
encoder are kept
constant:



Quantization parameter
Motion estimation search
window size, etc.
Then the # of coded
bits per macroblock
(and per frame) will
vary

Typically, more bits produced
when there is high motion or
fine detail

Example: # of bits per frame
varies from 1300 to 9000

(32-225 kbits per second)
[Plot: bits per frame (vertical axis, 0 to 9000) versus frame number (horizontal axis, 0 to 200)]
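The conversion from bits per frame to bits per second in the example can be checked with a one-line calculation. A frame rate of 25 frames per second is assumed here, since that is the rate that reproduces the quoted 32-225 kbits per second; `kbits_per_second` is a made-up helper name.

```python
def kbits_per_second(bits_per_frame, frames_per_second=25):
    """Convert a per-frame bit count to kbits per second (25 fps is assumed)."""
    return bits_per_frame * frames_per_second / 1000


low = kbits_per_second(1300)
high = kbits_per_second(9000)
print(low, high)
```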
14
Rate Control




Streams are usually coded for target rates,
for example, 3 Mbit/second
How are bits allocated among frames?
Macroblocks in I-frames are all intra coded
Macroblocks in P/B frames can be coded as:




Intra (DCT blocks)
Motion vectors only
Motion vectors and difference DCT blocks
Nothing at all (skipped)
15
Rate Control

The frames will have differing numbers of bits

This variation in bit rate can be a problem for many
practical delivery and storage mechanisms


Constant bit rate channel (such as a circuit-switched
channel) cannot transport a variable-bitrate data stream
Even a packet-switched channel is limited by link rates and
congestion at any point in time
16
Constant rate channel

The variable data rate produced by an encoder can
be smoothed by buffering prior to transmission
[Diagram: ENCODER -> Buffer -> constant rate channel -> Buffer -> DECODER; the encoder output and the decoder input are both variable bit rate]
A First In/First Out (FIFO) buffer sits at the output of the
encoder; another one sits at the input to the decoder
The decoder buffer is filled at the constant channel rate and
emptied by the decoder at a variable rate
17
Decoder Buffer Contents

[Plot: decoder buffer contents versus time (0 to 9 seconds), marking where the first frame is decoded and where the decoder stalls]
Takes 0.5 sec before first
complete coded frame
received
Then, decoder can extract
and decode frames at
correct rate of 25 fps until…
At about 4 sec, buffer
empties, decoder stalls
(pauses decoding)
Problem: video clip freezes
until more data arrives
Partial solution: add
deliberate delay at decoder
(e.g., 1 sec delay to decode
frame 1, allow buffer to
reach higher fullness)
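The stall, and the effect of a deliberate start-up delay, can be reproduced with a toy simulation that advances one frame interval at a time. The function name, the frame sizes and the 100 kbit/s channel are all invented for the example; only the 25 fps decode rate comes from the slide.

```python
def count_stalls(frame_bits, channel_bps, fps=25, startup_delay_s=0.0):
    """Count frame intervals in which the decoder has to pause.

    Bits arrive at a constant rate; once decoding has started, the decoder
    tries to remove one complete coded frame per frame interval.
    """
    bits_per_tick = channel_bps / fps
    startup_ticks = round(startup_delay_s * fps)
    arrived = 0.0        # total bits received so far
    consumed = 0         # total bits removed by the decoder
    next_frame = 0
    started = False
    stalls = 0
    tick = 0
    while next_frame < len(frame_bits):
        tick += 1
        arrived += bits_per_tick
        if tick <= startup_ticks:
            continue                         # deliberate delay: let the buffer fill
        if arrived - consumed >= frame_bits[next_frame]:
            consumed += frame_bits[next_frame]
            next_frame += 1
            started = True
        elif started:
            stalls += 1                      # buffer underflow: the picture freezes
    return stalls


# 2 seconds of easy frames followed by 2 seconds of expensive frames.
frames = [2000] * 50 + [8000] * 50
print(count_stalls(frames, channel_bps=100_000))
print(count_stalls(frames, channel_bps=100_000, startup_delay_s=1.0))
```

With these made-up numbers the undelayed decoder runs dry partway through the expensive frames, while a one-second delay builds up enough buffered data to play the clip without freezing.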
18
Variable Bit Rate

Example shows that variable coded bit rate can
be adapted to a constant bit rate delivery
medium using buffers. This entails




Cost of buffer storage space
Delay
Not possible to cope with arbitrary variation of bit
rate using this method, unless buffer size and
decoding delay are allowed to get arbitrarily large.
So… encoder needs to keep track of buffer
fullness…
19
Rate Control




Goal: with the transmission system at the
target rate for the video sequence, the
encoder & decoder buffers of fixed size never
overflow or underflow
This is the problem of rate control
MPEG does not specify how to achieve this
In addition to preventing overflow/underflow,
the rate control algorithm should also make
the sequence look good
20
Choice of Rate Control Algorithm
Choice of rate control depends on application
1) Offline encoding of video for DVD storage




Processing time not a constraint
Complex algorithm can be employed
Two-pass encoding:



Encoder collects statistics about the video in the 1st pass
Encoder encodes the video on the 2nd pass
Goal is to “fit” the video on the DVD while:


maximizing the overall quality of the video
preventing buffer overflow or underflow during decoding
21
Choice of Rate Control

2) Encoding of live video for broadcast






One encoder and multiple decoders
Decoder processing and buffering are limited
Encoder may use expensive fast hardware
Delay of a few seconds usually OK
Medium-complexity rate-control algorithm
Perhaps two-pass encoding of each frame
22
Choice of Rate Control

3) Encoding for two-way videoconferencing






Each terminal does both encoding and decoding
Delay must be kept to a minimum (say <0.5 sec)
Low-complexity rate control
Buffering minimized to keep delay small
Encoder must tightly control output rate
This may cause the output quality to vary
significantly, e.g., may drop when there is
increased movement or detail in the scene
23
Rate Control


Various possible approaches to rate control
For example, calculate a target bit rate R_i for
a frame based on




The number of frames in the group of pictures
The number of bits available for the remaining
frames in the group
The maximum acceptable buffer contents
The estimated complexity of the frame
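One hypothetical way to combine these four inputs is shown below. The formula is an assumption made up for illustration (an even share of the remaining budget, scaled by relative complexity, capped by free buffer space); it is not taken from any standard or from the slides.

```python
def frame_target_bits(bits_left_in_gop, frames_left_in_gop, complexity,
                      mean_complexity, buffer_fullness, buffer_size):
    """Illustrative per-frame bit target (hypothetical formula)."""
    # Even share of the bits still available in the group of pictures ...
    share = bits_left_in_gop / frames_left_in_gop
    # ... scaled by how complex this frame is relative to the average ...
    target = share * complexity / mean_complexity
    # ... and capped by the free space left in the buffer.
    room = buffer_size - buffer_fullness
    return max(0.0, min(target, room))


print(frame_target_bits(120_000, 10, 2.0, 1.0, 0, 100_000))
print(frame_target_bits(120_000, 10, 2.0, 1.0, 90_000, 100_000))
```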
24
Rate Control: Example Algorithm

Let S be the mean absolute
value of the difference
frame after motion
compensation (a measure
of frame complexity)
R = X1*S/Q + X2*S/Q^2


Calculate S for the frame
Compute the quantizer step
size Q using the model



Encode the current
frame using parameter Q
Update the model
parameters X1 and X2
based on the actual
number of bits generated
for the frame
There are also
macroblock-level rate
control algorithms when
“tight” rate control is
needed
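Computing Q from the model amounts to solving a quadratic. Assuming the model has the form R = X1*S/Q + X2*S/Q^2 with R the target number of bits, multiplying through by Q^2 gives R*Q^2 - X1*S*Q - X2*S = 0, whose positive root is used below. The function names are made up, X1 and X2 are assumed non-negative, and the parameter update step is not shown.

```python
import math


def model_bits(S, Q, X1, X2):
    """Bits predicted by the quadratic model for complexity S and step size Q."""
    return X1 * S / Q + X2 * S / Q ** 2


def quantizer_for_target(R, S, X1, X2):
    """Positive root of R*Q^2 - X1*S*Q - X2*S = 0 (assumes R > 0, X1, X2 >= 0)."""
    a = X1 * S
    return (a + math.sqrt(a * a + 4 * R * X2 * S)) / (2 * R)


# Made-up numbers: target of 5000 bits for a frame with S = 10.
Q = quantizer_for_target(R=5000, S=10, X1=2000, X2=30000)
print(Q, model_bits(10, Q, 2000, 30000))
```

A real encoder would also round Q to an allowed value and limit how much it may change from frame to frame.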
25
Standards







Standards Groups (MPEG, VCEG)
H.261: Videophone/videoconferencing (1990)
MPEG-1: Low bit rates for digital storage (1992)
MPEG-2: Generic coding algorithms (1994)
H.263: Very low bit rate coding (1995)
MPEG-4: Flexibility and computer vision
approaches (1998)
H.264: Recent improvements (2003)
26
Advantages/Disadvantages

Disadvantages of
standardization:



Improvements in price
and performance come
from battle to create and
own proprietary approach
Proprietary codecs
generally exhibit higher
quality than a standard
Standards are slow
moving, developed by
committee, try to avoid
patents

Advantages of
standardization:





Interoperability
Different platforms
supported
Vendors can compete for
improved implementations
Worldwide technical
community can build on
each other’s work
Several standards have
been hugely successful
27
H.261: real-time, low complexity, low delay




Motivated by the definition
and planned deployment of
ISDN (Integrated Services
Digital Network)
Rate of p*64 kbits/s where
p is integer 1…30
For example, p=2→ 128
kbits/s with video coding at
112 kbits/s and audio at 16
kbits/s
Applications: videophone,
videoconferencing

Videoconferencing
compression:




Operate in real time
Not much coding delay
Low complexity
No particular advantage
to shifting the complexity
onto encoder or decoder
(each user will require
both encoding and
decoding capabilities)
28
H.261 Basics




Standardization started 1984, finished 1990
Uncompressed CIF (4:2:0 chroma sampling,
15 frames per sec.) requires 18.3 Mbps
To get this down to p x 64 kbps requires compression
from 10:1 up to 300:1
H.261 achieves compression using the same
basic elements discussed before:



Motion compensation (for temporal redundancy)
DCT + Quantization (for spatial redundancy)
Variable length coding (run-length, Huffman)
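The raw CIF rate and the required compression ratios can be checked with a short calculation. It assumes the usual CIF resolution of 352 x 288, 8 bits per sample, and 1.5 samples per pixel for 4:2:0; none of these values are stated on the slide.

```python
def raw_bitrate_bps(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate; 4:2:0 sampling averages 1.5 samples per pixel."""
    return width * height * samples_per_pixel * bits_per_sample * fps


cif = raw_bitrate_bps(352, 288, 15)
ratio_low = cif / (30 * 64_000)    # largest channel, p = 30
ratio_high = cif / (1 * 64_000)    # smallest channel, p = 1
print(cif / 1e6, ratio_low, ratio_high)
```

Under these assumptions the raw rate comes out a little above 18.2 Mbps and the ratios at roughly 9.5:1 and 285:1, in line with the rounded figures quoted above.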
29
H.261 Motion Compensation


Motion compensation done on macroblocks of size
16 x 16, same as MPEG-1 and -2
However, consider application fields:
videoconferencing, videophone






A call is set up, conducted, and terminated.
These events always occur together, in sequence
Don’t need random access into the video
Need low delay
Also, expect slow-moving objects
Question: What features should these facts lead to?
30
H.261 Motion Estimation

Slow movement: For each block of pixels in
the current frame, the search window is only
± 15 pixels in each direction
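[Figure: search window extending 15 pixels in each direction around the block position, shown in T=1 (previous frame) for a block in T=2 (current frame)]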
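A full search over a limited window can be sketched as follows. The sum of absolute differences (SAD) is used as the matching cost, which is an assumption since the slide does not name a metric; frames are plain 2-D lists, and the demo uses a tiny synthetic frame with a +/- 3 window so that it runs quickly.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
    return total


def full_search(cur, ref, top, left, size=16, search_range=15):
    """Try every displacement within +/- search_range and keep the cheapest one."""
    rows, cols = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, top, left, 0, 0, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would fall outside the reference frame.
            if not (0 <= top + dy <= rows - size and 0 <= left + dx <= cols - size):
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Synthetic ramp image; the current frame is the reference content displaced
# by one row and two columns.
ref = [[31 * r + 17 * c for c in range(12)] for r in range(12)]
cur = [[31 * (r + 1) + 17 * (c - 2) for c in range(12)] for r in range(12)]
mv, cost = full_search(cur, ref, top=4, left=4, size=4, search_range=3)
print(mv, cost)
```

The number of candidates grows with the square of the search range, so a small window suited to slow movement keeps the encoder cheap.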
31
H.261 Motion Compensation



No B pictures: don’t want the delay or
complexity associated with them
H.261 uses forward motion compensation
from the previous picture only
First frame is Intra-frame. NO frame after
that has to be Intra. Every subsequent frame
may use prediction from the one before

This means that to decode a particular frame in
the sequence, it is possible that we will have to
decode from the very beginning. No random
access.
32
ISO MPEG

Originally set up in 1988, committee had 3 work items:

MPEG-1: targeted at 1.5 Mbps
MPEG-2: targeted at 10 Mbps
MPEG-3: targeted at 40 Mbps
Later, became clear that algorithms developed for MPEG-2 would
accommodate higher rates, so 3rd work item dropped
Later MPEG-4 added

Goals:






MPEG-1: compression of video/audio for CD playback
MPEG-2: storage and broadcast of TV-quality audio and video
MPEG-4: coding of audio-visual objects
Also MPEG-7 and MPEG-21 which are about multimedia content and
not compression
33
MPEG-1 Audiovisual coder for digital
storage media


Goal: Coding full-motion video & associated audio at
bit rates up to about 1.5 Mbps
Brief history of MPEG-1






October 1988: working group formed
September 1989: 14 proposals made
October 1989: video subjective tests performed
March 1990: simulation model
November 1992: international standard
Solution to a specific problem:

Compress an audio-video source (~210 Mbps) to fit into a
CD-ROM originally designed to handle uncompressed
audio alone (requires aggressive compression 200:1)
34
MPEG-1 major differences

Unlike videoconferencing, for digital storage
media, random access capability is important




INTRA frames
In order to avoid a long delay between the frame a
user is looking for, and the frame where decoding
starts, INTRA frames should occur frequently
But then the coding efficiency goes down
Improve compression efficiency using B frames
35
B Frames



Bidirectionally predicted blocks
allow effective prediction of
uncovered background
Bidirectional prediction can
reduce noise (if good
predictions are available from both past
and future)
B pictures are not used for
prediction → substantial
reduction in bits (bits per picture roughly I:P:B = 5:3:1)
B pictures – forward, backward, &
interpolatively motion compensated
from previous/next I/P frames

Increases motion estimation
complexity in 2 ways:


Search 2 frames
Search bigger window if
anchor frame farther away
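The three prediction modes can be compared with a small sketch that picks whichever of the forward, backward and interpolated (averaged) predictions has the lowest error. Blocks are flat lists, SAD is an assumed cost, and all pixel values are made up.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_b_mode(current, forward_pred, backward_pred):
    """Return the name and block of the prediction with the lowest SAD."""
    interpolated = [(f + b) / 2 for f, b in zip(forward_pred, backward_pred)]
    candidates = {"forward": forward_pred,
                  "backward": backward_pred,
                  "interpolated": interpolated}
    best = min(candidates, key=lambda mode: sad(current, candidates[mode]))
    return best, candidates[best]


# Noisy predictions on both sides: averaging cancels much of the noise.
noisy = choose_b_mode([100, 100, 100, 100],
                      [104, 96, 102, 98],
                      [98, 104, 96, 102])
# Uncovered background: only the future frame shows what is behind the object.
uncovered = choose_b_mode([50, 50, 50, 50],
                          [200, 200, 50, 50],
                          [50, 50, 50, 50])
print(noisy[0], uncovered[0])
```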
36
MPEG-2 Generic Coding Algorithms


Goal: digital video transmission in range 2-15 Mbps
Generic coding algorithms to support:


Digital storage media, existing TV (PAL, SECAM, NTSC),
cable, direct broadcast satellite, HDTV, computer graphics,
video games
Brief history:




July 1990: working group established
Nov 1991: Subjective tests on 32 proposals
March 1993: technical contents of main level frozen
Nov 1994: international standard (parts 1-3)
37
Main differences MPEG-1 and -2

MPEG-2 aimed at higher bit rates


MPEG-2 has a wider range of bit rates


Tool kit approach allows use of different subsets
of algorithms
MPEG-2 supports scalable coding


Can be used for larger picture formats
SNR scalable, spatially scalable
MPEG-2 supports interlacing

This permeates everything: motion compensation,
DCTs, ZigZag ordering for variable length coding
38
Overview of MPEG-4 Visual

MPEG-4 Visual is meant to handle many
types of data, including





Moving video (rectangular frames)
Video objects (arbitrary-shaped regions of moving
video)
2D and 3D mesh objects (representing
deformable objects)
Animated human faces and bodies
Static texture (still images)
39
Video Objects






MPEG-4 moves away from traditional view of video
as a sequence of rectangular frames
Instead, collection of video objects
A video object is a flexible entity that a user can
access (seek, browse) and manipulate (cut, paste)
A video object (VO) is an arbitrarily-shaped area of
scene that may exist for an arbitrary length of time
An instance of a VO at a particular time is called a
video object plane (VOP)
Definition encompasses traditional view of
rectangular frames too
40
MPEG-4: Object-based motion
compensation
[Figure: frames at T=1 and T=2]
41
Static Sprite Coding


Background may be coded as a static sprite
The sprite may be much larger than the visible area
of the scene
Source: http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
42
Global Motion Compensation



The encoder sends up to 4 global motion vectors (GMVs) for
each VOP together with the location of each GMV in the VOP
For each pixel position, an individual MV is calculated by
interpolating between the GMVs and the pixel position is
motion compensated according to this interpolated vector
[Figures: GMVs and an interpolated vector; GMC compensating for rotation; GMC compensating for camera zoom]
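The per-pixel interpolation can be illustrated with a simplified sketch. It assumes the four GMVs sit at the corners of a rectangular VOP and uses bilinear interpolation between them; both are assumptions for the example, and the interpolation actually defined by MPEG-4 is not reproduced here.

```python
def interpolate_mv(x, y, width, height, gmv_tl, gmv_tr, gmv_bl, gmv_br):
    """Bilinearly interpolate a motion vector at pixel (x, y) from four corner GMVs.

    Each GMV is a (dx, dy) pair located at the top-left, top-right,
    bottom-left or bottom-right corner (corner placement is an assumption).
    """
    u = x / (width - 1)     # 0 at the left edge, 1 at the right edge
    v = y / (height - 1)    # 0 at the top edge, 1 at the bottom edge

    def lerp(a, b, t):
        return a + (b - a) * t

    mv = []
    for k in range(2):      # k = 0: horizontal component, k = 1: vertical
        top = lerp(gmv_tl[k], gmv_tr[k], u)
        bottom = lerp(gmv_bl[k], gmv_br[k], u)
        mv.append(lerp(top, bottom, v))
    return tuple(mv)


# Zoom-like field: the corner vectors all point away from the centre.
corners = dict(gmv_tl=(-4, -4), gmv_tr=(4, -4), gmv_bl=(-4, 4), gmv_br=(4, 4))
centre = interpolate_mv(4, 4, 9, 9, **corners)
right_of_centre = interpolate_mv(6, 4, 9, 9, **corners)
print(centre, right_of_centre)
```

With outward-pointing corner vectors the interpolated field is zero at the centre and grows toward the edges, which is the pattern expected for a camera zoom.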
43
Global Motion Estimation between 2
images assuming 2-D affine motion
Compression example: error images before and after global motion compensation
(Soccer sequence: global motion estimation between 1st and 10th frame)
44
Coding Synthetic Visual Scenes

Animated 2D mesh
coding




A 2-D mesh is made up of
triangular patches
Deformation or motion can
be modelled by warping the
triangles
Surface texture may be
compressed as static
texture
Mesh and texture
information might both be
transmitted for key frames



No texture transmitted for
intermediate frames
Mesh parameters
transmitted
Decoder animates mesh
45
Motion Vectors for Meshes



A mesh is warped
by transmitting
vectors which
displace the nodes
Mesh MVs are
predictively coded
Texture residual
can be coded with
a very small
number of bits

MPEG-4 also allows 3-D meshes

The vertices need not be in one plane

3-D mesh samples the surface of a
solid body
46
Shape-Adaptive DCT

The shape-adaptive DCT uses one-dimensional DCT, where the
number of points in the transform matches the number of opaque
values in each column (or row)
[Figure: shift opaque pixels vertically -> 1-D column DCT -> shift horizontally -> 1-D row DCT -> final coefficients]
More complex than
normal 8x8 DCT,
but improves
coding efficiency
for boundary MBs
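The shift-and-transform steps can be sketched in a few lines. This version uses an orthonormal N-point DCT-II for every column and row, which is an assumption; the exact normalisation in the MPEG-4 standard may differ. The names `dct_1d` and `sa_dct` are made up for the example.

```python
import math


def dct_1d(values):
    """Orthonormal N-point DCT-II, where N = len(values)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out


def sa_dct(block, mask):
    """Shape-adaptive DCT sketch; mask[r][c] is True for opaque pixels."""
    rows, cols = len(block), len(block[0])
    # Step 1: shift the opaque pixels of each column to the top, then apply
    # a column DCT whose length equals the number of opaque pixels.
    columns = []
    for c in range(cols):
        opaque = [block[r][c] for r in range(rows) if mask[r][c]]
        columns.append(dct_1d(opaque) if opaque else [])
    # Step 2: in each row, gather the coefficients that exist (shift left),
    # then apply a row DCT of matching length.
    result = []
    for r in range(rows):
        row = [col[r] for col in columns if len(col) > r]
        if row:
            result.append(dct_1d(row))
    return result


block = [[8, 8, 8],
         [8, 8, 8],
         [8, 8, 8]]
mask = [[False, True, True],
        [True, True, True],
        [False, True, False]]
coeffs = sa_dct(block, mask)
print([len(row) for row in coeffs])
```

The output contains exactly as many coefficients as there are opaque pixels, so no bits are spent on transparent positions.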
47
Face and Body Animation

Two basic steps:



Define basic shape of
face or body model
(typically carried out once
at start of session)
Send animation
parameters to animate
the model
Encoder has choice of


Generic facial definition
parameters (FDPs)
Custom FDPs for a
specific face

In similar way, a body
object is rendered from a
set of Body Definition
Parameters (BDPs) and
animated using Body
Animation Parameters
48
Face Animation





The generic face can be modified by Facial
Definition Parameters (FDPs) into a particular face
FDP decoder creates a neutral face: one which
carries no expression
Change expressions by moving the vertices
Not necessary to transmit data for each vertex,
instead use Facial Animation Parameters (FAPs)
Some combinations of vectors are common in
expressions such as a smile, so these are coded as
high-level FAPs (visemes and expressions)



Can be used alone
Can be used as predictions for more accurate FAPs
Resulting data rate is small, e.g., 2-3 kbps
49
H.264 Brief history


The work started in VCEG (in 1998) as a parallel
activity with the final version of H.263
First test model produced in 1999. Many small
steps over the next 4 years:






Many tweaks to the integer transform and to the variable
block size
1/8 pixel accurate MVs added in and then dropped
Many tweaks on the deblocking filter
Etc. etc.
Final version March 2003
Final results: 2-fold improvement in compression
(compared to H.263 and MPEG-2) & significantly
better than MPEG-4 ASP
50
Comparison of H.264 and MPEG-4
Comparison | MPEG-4 Visual | H.264
Supported data types | Rectangular video frames and fields, arbitrary-shaped objects, still texture and sprites, synthetic objects, 2D and 3D mesh objects | Rectangular video frames and fields
# profiles | 19 | 3
Compression efficiency | medium | high
Support for video streaming | Scalable coding | Switching slices
Motion comp. min block size | 8x8 | 4x4
MV accuracy | ½ or ¼ pixel | ¼ pixel
Transform | 8x8 DCT | 4x4 DCT approx.
Built-in deblocking filter | No | Yes
License payments required | Yes | Probably no for baseline
51
Question



A 16 by 16 MB to be
motion compensated is
shown above
The search window is
shown below
Which block(s) in the
search window will
provide the best match:


With MAE error metric?
With MSE error metric?
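The two metrics can disagree about which candidate is best, because MSE penalises a single large error more heavily than several small ones. The blocks below are made-up flat lists used only to show the effect; they are not the blocks from the question.

```python
def mae(block, cand):
    """Mean absolute error between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block, cand)) / len(block)


def mse(block, cand):
    """Mean squared error between two equally sized blocks."""
    return sum((a - b) ** 2 for a, b in zip(block, cand)) / len(block)


block = [10, 10, 10, 10]
cand_a = [18, 10, 10, 10]    # one large error
cand_b = [13, 13, 13, 13]    # several small errors

print(mae(block, cand_a), mae(block, cand_b))   # MAE prefers cand_a
print(mse(block, cand_a), mse(block, cand_b))   # MSE prefers cand_b
```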
52
Question





A sequence of frames is being coded by an MPEG-style coder that searches for best-match
macroblocks using full search with an MSE criterion
Frames are I, B, B, P, B, B, …
The camera is moving horizontally by 10 pixels per
frame during this sequence, so 30 pixels of offset
between I frame and subsequent P frame
Many macroblocks in the P frame might get coded
using MV=(30,0) with no difference block
Why might some MBs not get coded precisely this
way? List all the reasons you can think of.
53