PowerPoint file on video compression.

VIDEO COMPRESSION
FUNDAMENTALS
Pamela C. Cosman
1
Compressing Digital Video


Exploit spatial redundancy within frames (like JPEG: transforming, quantizing, variable length coding)
Exploit temporal redundancy between frames
[Figure: Previous Frame and Current Frame] Only the sun has changed position between these 2 frames
2
Simplest Temporal Coding - DPCM





Frame 0 (still image)
Difference frame 1 = Frame 1 – Frame 0
Difference frame 2 = Frame 2 – Frame 1
If there is no movement in the scene, all difference frames are 0. Can be greatly compressed!
If there is movement, it can be seen in the difference images
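The frame differencing above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: frames are small lists of lists of integers, and the names `frame_difference`, `dpcm_encode`, and `dpcm_decode` are hypothetical, not taken from any standard.

```python
def frame_difference(current, previous):
    """Pixel-wise difference current - previous for equally sized frames."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]

def dpcm_encode(frames):
    """Return [frame 0, difference frame 1, difference frame 2, ...]."""
    encoded = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append(frame_difference(cur, prev))
    return encoded

def dpcm_decode(encoded):
    """Rebuild the frames by adding each difference to the previous frame."""
    frames = [encoded[0]]
    for diff in encoded[1:]:
        prev = frames[-1]
        frames.append([[p + d for p, d in zip(prev_row, diff_row)]
                       for prev_row, diff_row in zip(prev, diff)])
    return frames

static = [[10, 10], [10, 10]]
moved = [[10, 50], [10, 10]]          # one pixel changed
encoded = dpcm_encode([static, static, moved])
print(encoded[1])                     # all zeros: nothing moved
print(encoded[2])                     # non-zero only where the scene changed
```

With no movement the difference frame is all zeros, and the decoder recovers every frame exactly by accumulating the differences.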
[Figure: frames 0, 1, 2, 3]
3
Difference Frames

Differences between two frames can be caused by
Camera motion: the outlines of background or stationary objects can be seen in the Diff Image
Object motion: the outlines of moving objects can be seen in the Diff Image
Illumination changes (sun rising, headlights, etc.)
Scene Cuts: Lots of stuff in the Diff Image
Noise
4
Difference Frames



If the only difference between two frames is noise (nothing moved), then you won’t recognize anything in the Difference Image
But, if you can see something in the Diff Image and recognize it, there’s still correlation in the difference image
Goal: remove the correlation by compensating for the motion
5
6
7
8
Types of Motion
Translation: simple movement of typically rigid objects
Camera pans vs. movement of objects
Rotation: spinning about an axis
Camera versus object rotation
Zooms: in/out
Camera zoom vs. object zoom (movement in/out)
[Figure: Frame n, Frame n+1 (Rotation), Frame n+2 (Zoom)]
9
Describing Motion

Translational: Move (object) from (x,y) to (x+dx,y+dy)
Rotational: Rotate (object) by (r rads) (counter/clockwise)
Zoom: Move (in/out) from (object) to increase its size by (t times)
Which is easiest? Which are we most likely to encounter?
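As a rough illustration of the three motion descriptions, the sketch below applies each one to a single point. The function names and the choice of a y-up coordinate system with an optional center are assumptions made for the example.

```python
import math

def translate(point, dx, dy):
    """Move a point from (x, y) to (x + dx, y + dy)."""
    x, y = point
    return (x + dx, y + dy)

def rotate(point, r, center=(0.0, 0.0)):
    """Rotate counterclockwise by r radians about center (y axis pointing up)."""
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(r) - y * math.sin(r),
            center[1] + x * math.sin(r) + y * math.cos(r))

def zoom(point, t, center=(0.0, 0.0)):
    """Scale the distance from center by a factor of t."""
    return (center[0] + t * (point[0] - center[0]),
            center[1] + t * (point[1] - center[1]))

print(translate((1, 2), 3, -1))
print(rotate((1, 0), math.pi / 2))
print(zoom((2, 3), 2))
```

Translation needs only two numbers (dx, dy) and no center, which is one reason it is the easiest of the three to estimate and to transmit.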
10
Motion Estimation



Determining parameters for the motion descriptions
For some portion of the frame, estimate its movement between 2 frames: the current frame and the reference frame
What is "some portion"?
Individual pixels (all of them)?
Lines/edges (have to find them first)
Objects (must define them)
Uniform regions (just chop up the frame)
11
General Idea


For a region PC in the current frame, find a region PR in the search window in the reference frame so that Error(PR,PC) is minimized
Issues: Error measures, search techniques, choice of search window, choice of reference frame, choice of region PC
[Figure: Reference Frame with the Search window; Current Frame with the Portion of interest PC]
12
Block-based Motion Estimation


PC is a block of pixels (in the current frame)
The search window is a rectangular segment (in the reference frame)
[Figure: T=1 (reference) and T=2 (current)]
13
Motion Vectors

A motion vector (MV) describes the offset between the location of the block being coded (in the current frame) and the location of the best-match block in the reference frame
[Figure: T=1 (reference) and T=2 (current)]
14
Motion Compensation
The blocks being predicted are on a grid
[Figure: blocks 1-16 of the current frame on a regular grid, and the blocks 1-16 used to predict them at off-grid positions in the reference frame]
The blocks used for prediction are NOT
15
Motion Vector Search

1. Mean squared error: select a block in the reference frame to minimize Σ(b(Bref)-b(Bcurr))²
2. Mean abs. error: select block to minimize Σ|b(Bref)-b(Bcurr)|
Given error measure, how to efficiently determine best-match block in search window?
Full search: best results, most computation
Logarithmic search: heuristic, faster
Hierarchical motion estimation
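The two error measures can be written out directly. This is a minimal sketch assuming blocks are equally sized lists of lists of numbers; the function names are hypothetical.

```python
def mean_squared_error(block_ref, block_cur):
    """Mean of the squared pixel differences between two blocks."""
    diffs = [r - c for row_r, row_c in zip(block_ref, block_cur)
             for r, c in zip(row_r, row_c)]
    return sum(d * d for d in diffs) / len(diffs)

def mean_abs_error(block_ref, block_cur):
    """Mean of the absolute pixel differences between two blocks."""
    diffs = [r - c for row_r, row_c in zip(block_ref, block_cur)
             for r, c in zip(row_r, row_c)]
    return sum(abs(d) for d in diffs) / len(diffs)

b_ref = [[1, 2], [3, 4]]
b_cur = [[1, 2], [3, 8]]              # differs from b_ref in one pixel
print(mean_squared_error(b_ref, b_cur))
print(mean_abs_error(b_ref, b_cur))
```

Because every candidate block has the same number of pixels, minimizing the sum (as written on the slide) and minimizing the mean select the same block. The absolute-error version avoids multiplications, which matters when the measure is evaluated at many search positions.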
16
Motion Vector Search

Full search: Evaluate every position in the search window
Logarithmic Search: First examine positions marked 1.
Choose best of these (lowest error measure) and examine positions marked 2 surrounding it
Choose the best of these, and examine the positions marked 3
Final result = best of these
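The two search strategies can be compared with a small sketch. It makes several assumptions not stated on the slide: the error measure is the sum of absolute differences, the logarithmic search examines a 3x3 pattern with spacings 4, 2, 1, and the test frames are synthetic. All function names are hypothetical.

```python
def sad(cur, top, left, ref, rtop, rleft, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(abs(cur[top + r][left + c] - ref[rtop + r][rleft + c])
               for r in range(size) for c in range(size))

def inside(frame, top, left, size):
    """True if a size x size block at (top, left) lies fully inside the frame."""
    return 0 <= top <= len(frame) - size and 0 <= left <= len(frame[0]) - size

def full_search(ref, cur, top, left, size, radius):
    """Evaluate every offset (dx, dy) with |dx|, |dy| <= radius."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if inside(ref, top + dy, left + dx, size):
                err = sad(cur, top, left, ref, top + dy, left + dx, size)
                if best is None or err < best[2]:
                    best = (dx, dy, err)
    return best

def logarithmic_search(ref, cur, top, left, size, first_step=4):
    """Examine a 3x3 pattern around the current best, then halve the spacing."""
    best = (0, 0, sad(cur, top, left, ref, top, left, size))
    step = first_step
    while step >= 1:
        cx, cy = best[0], best[1]     # center of this round's pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                x, y = cx + dx, cy + dy
                if inside(ref, top + y, left + x, size):
                    err = sad(cur, top, left, ref, top + y, left + x, size)
                    if err < best[2]:
                        best = (x, y, err)
        step //= 2
    return best

# Synthetic data (an assumption): the current frame is the reference frame
# shifted so that the true motion vector is (dx, dy) = (3, -2).
SIZE = 24
ref = [[r * r + 3 * c * c for c in range(SIZE)] for r in range(SIZE)]
cur = [[ref[(r - 2) % SIZE][(c + 3) % SIZE] for c in range(SIZE)]
       for r in range(SIZE)]

exhaustive = full_search(ref, cur, top=8, left=8, size=4, radius=7)
fast = logarithmic_search(ref, cur, top=8, left=8, size=4)
print(exhaustive, fast)
```

With these settings the full search evaluates up to 225 positions, while the logarithmic search evaluates fewer than 30. The logarithmic result is never better than the full-search result over the same range and can be worse, since it is a heuristic.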
17
Hierarchical Motion Estimation



Use an averaging filter on the image, then downsample by a factor of 2
Conduct a search on the downsampled image (only ¼ of the size)
Given the results of the search on the downsampled image, return to the full resolution image and refine the search there
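A two-level version of this idea might look like the sketch below. The 2x2 averaging filter, the refinement radius of 1, the sum-of-absolute-differences measure, and the synthetic test frames are all assumptions chosen for the example.

```python
def downsample(frame):
    """Average each 2x2 neighborhood, halving both dimensions."""
    return [[(frame[r][c] + frame[r][c + 1]
              + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
             for c in range(0, len(frame[0]) - 1, 2)]
            for r in range(0, len(frame) - 1, 2)]

def sad(cur, top, left, ref, rtop, rleft, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(abs(cur[top + r][left + c] - ref[rtop + r][rleft + c])
               for r in range(size) for c in range(size))

def search(ref, cur, top, left, size, center, radius):
    """Full search of the offsets within radius of center = (dx, dy)."""
    best = None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rtop, rleft = top + dy, left + dx
            if 0 <= rtop <= len(ref) - size and 0 <= rleft <= len(ref[0]) - size:
                err = sad(cur, top, left, ref, rtop, rleft, size)
                if best is None or err < best[2]:
                    best = (dx, dy, err)
    return best

def hierarchical_search(ref, cur, top, left, size, radius):
    """Coarse search at half resolution, then refine at full resolution."""
    coarse = search(downsample(ref), downsample(cur),
                    top // 2, left // 2, size // 2, (0, 0), radius // 2)
    # A coarse offset of 1 pixel corresponds to 2 pixels at full resolution.
    return search(ref, cur, top, left, size,
                  (2 * coarse[0], 2 * coarse[1]), 1)

# Synthetic data (an assumption): true motion vector (dx, dy) = (4, -2).
SIZE = 32
ref = [[r * r + 3 * c * c for c in range(SIZE)] for r in range(SIZE)]
cur = [[ref[(r - 2) % SIZE][(c + 4) % SIZE] for c in range(SIZE)]
       for r in range(SIZE)]

result = hierarchical_search(ref, cur, top=8, left=8, size=8, radius=8)
print(result)
```

The coarse search covers the same area with a quarter of the pixels per block and a quarter of the candidate positions, and the full-resolution step only has to check the 9 positions around the scaled-up coarse vector.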
18
Motion Compensation



The standards do not specify HOW the encoder will find the motion vectors (MVs)
The encoder can use exhaustive/fast search, MSE/MAE/other error metric, etc.
The standard DOES specify
The allowable syntax for specifying the MVs
What the decoder will do with them
What the decoder does is to grab the indicated block from reference frame, and glue it in place
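The decoder's "grab and glue" step can be sketched as follows. The layout of `motion_vectors` (a dictionary keyed by block row and block column, holding (dx, dy) offsets) is an assumption for this example, and the sketch assumes every vector points to a block fully inside the reference frame.

```python
def motion_compensate(reference, motion_vectors, block_size):
    """Build the predicted frame by copying one reference block per grid block."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            dx, dy = motion_vectors[(top // block_size, left // block_size)]
            for r in range(block_size):
                for c in range(block_size):
                    # Grab the pixel from the offset block, glue it on the grid.
                    predicted[top + r][left + c] = \
                        reference[top + dy + r][left + dx + c]
    return predicted

reference = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
mvs = {(0, 0): (0, 0), (0, 1): (-1, 0), (1, 0): (1, -1), (1, 1): (0, 0)}
predicted = motion_compensate(reference, mvs, block_size=2)
for row in predicted:
    print(row)
```

The destination blocks sit on the regular grid, while the source blocks can come from any position in the reference frame.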
19
Standard specifies bit stream

The video compression standards define syntax and semantics for the bit stream between encoder and decoder
[Figure: ENCODER (not this), bit stream (Standard defines this), DECODER (not this)]
Encoder is not specified by MPEG except that it produces a compliant bit stream
Compliant decoder must interpret all legal MPEG bit streams
This allows future encoders of better performance to remain compatible with existing decoders.
Also allows for commercially secret encoders to be compatible with standard decoders
[Figure: Today’s Ho-Hum Encoder, Tomorrow’s Nifty Encoder, and a Very secret Encoder all feed Today’s Decoder. Today’s decoder still works!]
20
Motion Compensation Example
[Figure: Frame n-1, Frame n, and the MOTION COMPENSATED Frame n, with per-block motion vectors (0,0), (-16,0), (5,0), (0,0), (0,0), (16,7), (0,0), (20,-24), (0,0), (5,2), (-20,-18), (0,0)]
21
Objects versus Macroblocks

Real moving objects will not coincide with boundaries of macroblocks
[Figure: a macroblock covering both background and a moving object. With a motion vector: moving object well encoded, prediction error in the background. With no motion vector: background well encoded, prediction error in the moving object.]
If encoder sends MV=(MotX,MotY), object well coded, but background poorly coded
If encoder sends MV=(0,0), background well coded, but moving object poorly coded
Either approach is valid
22
Motion Compensation




This glued together frame is called the motion compensated frame
The encoder can also form the difference between the motion compensated frame and the actual frame.
This is called the motion compensated difference frame
This difference frame formed using MC should have less correlation between pixels than the difference frame formed without using MC
23
Motion Compensated Difference
Frames





Suppose we are doing lossless coding
Encoder has sequence of frames: …, F(n-2), F(n-1)
Next: encode F(n)
Past frames have been losslessly encoded, so the decoder knows F(n-1) perfectly already
Encoder sends the motion vectors for frame F(n) relative to frame F(n-1), to form motion compensated frame M(n)
Encoder knows M(n), Decoder knows M(n)
24
Motion Compensation Example
[Figure: F(n-1), F(n), and the MOTION COMPENSATED Frame M(n), with per-block motion vectors (0,0), (-16,0), (5,0), (0,0), (0,0), (16,7), (0,0), (20,-24), (0,0), (5,2), (-20,-18), (0,0)]
25
Encoding Difference Frames

Encoder forms motion compensated diff frame: MCD(n) = F(n) – M(n)
Encoder losslessly encodes MCD(n)
Decoder can then do F(n) = MCD(n) + M(n) → knows F(n) exactly
With no motion compensation encoder could do frame diff: FD(n) = F(n) – F(n-1)
Encoder losslessly encodes FD(n)
Decoder can then do F(n) = FD(n) + F(n-1) → knows F(n) exactly
If successive frames are very similar:
fewer bits to send Motion Vectors + MCD(n) instead of FD(n)
fewer bits to send FD(n) instead of F(n)
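A tiny numeric example may make the comparison concrete. The frame values are made up, M(n) is taken as already formed from the motion vectors, and using the sum of squared values as a stand-in for "bits needed" is an assumption of this sketch.

```python
def subtract(a, b):
    """Pixel-wise a - b."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def add(a, b):
    """Pixel-wise a + b."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def energy(frame):
    """Sum of squared values: a rough proxy for how costly a frame is to code."""
    return sum(v * v for row in frame for v in row)

f_prev = [[0, 9, 0, 0], [0, 9, 0, 0]]   # F(n-1): bright column at x = 1
f_cur = [[0, 0, 9, 0], [0, 0, 9, 1]]    # F(n): column moved to x = 2
m_cur = [[0, 0, 9, 0], [0, 0, 9, 0]]    # M(n): assumed motion compensated frame

fd = subtract(f_cur, f_prev)            # FD(n) = F(n) - F(n-1)
mcd = subtract(f_cur, m_cur)            # MCD(n) = F(n) - M(n)
print(energy(fd), energy(mcd))

# Either way the decoder recovers F(n) exactly in the lossless case.
print(add(mcd, m_cur) == f_cur, add(fd, f_prev) == f_cur)
```

The plain frame difference contains the moving column twice (once where it left, once where it arrived), while the motion compensated difference contains only what the prediction missed.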
26
Motion compensated difference frames

Decoder knows F(n-1) and, once you send the motion vectors, it knows M(n)
[Figure: Reference Frame F(n-1) and Original Frame F(n). Without motion compensation: send FD(n), the Difference Image FD(n) = F(n) – F(n-1). With motion compensation: send Motion Vectors to form the Motion compensated frame M(n), then send MCD(n), the Motion compensated difference image MCD(n) = F(n) – M(n).]
27
Motion Compensated Difference
Frames






But we are NOT doing lossless coding
Encoder has sequence of frames: …, F(n-2), F(n-1)
Next: encode F(n)
Past frames have been lossy encoded, so the decoder has versions …, G(n-2), G(n-1)
Encoder knows …, G(n-2), G(n-1) also
Encoder sends the motion vectors for frame F(n) relative to frame G(n-1), to form motion compensated frame M(n)
28
Encoding Difference Frames





Encoder forms motion compensated difference frame: MCD(n) = F(n) – M(n)
Encoder lossy encodes MCD(n)
Call the decoder version MCD*(n)
If the decoder received MCD(n) exactly, could do: F(n) = MCD(n) + M(n)
But with MCD*(n), decoder can do G(n) = MCD*(n) + M(n) → knows F(n) approximately
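The lossy case can be sketched with a uniform quantizer standing in for the lossy coder. To keep the example short it assumes zero motion, so M(n) is simply G(n-1), and frames are flat lists; both are simplifications, and the function names are hypothetical.

```python
def quantize(values, step):
    """Lossy step: round each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]

def encode_sequence(frames, step):
    """Predict each frame from the decoder's version G(n-1), not from F(n-1)."""
    g_prev = [0] * len(frames[0])       # assumed starting reference: all zeros
    sent = []
    for f in frames:
        m = g_prev                      # M(n): zero-motion prediction (assumption)
        mcd = [a - b for a, b in zip(f, m)]
        mcd_star = quantize(mcd, step)  # MCD*(n): what the decoder receives
        sent.append(mcd_star)
        g_prev = [a + b for a, b in zip(mcd_star, m)]   # G(n) = MCD*(n) + M(n)
    return sent

def decode_sequence(sent):
    """Rebuild G(n) = MCD*(n) + M(n) with the same zero-motion prediction."""
    g_prev = [0] * len(sent[0])
    decoded = []
    for mcd_star in sent:
        g_prev = [a + b for a, b in zip(mcd_star, g_prev)]
        decoded.append(g_prev)
    return decoded

frames = [[11, 7, 3, 20], [13, 9, 2, 25], [14, 10, 1, 31]]
step = 4
sent = encode_sequence(frames, step)
decoded = decode_sequence(sent)
print(decoded)
```

Because the encoder predicts from G(n-1), the same frame the decoder holds, the reconstruction error in every frame stays within half a quantizer step instead of accumulating from frame to frame.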
29
Motion estimation philosophy


Goal of motion estimation is NOT to provide a careful analysis of the actual motion
Goal is to achieve a given quality of representation of the video while globally minimizing the bit rate required to send:
The motion information
The prediction error information
Most of the time, for a given representation quality:
fewer bits to send MV+MCD(n) instead of sending FD(n)
fewer bits to send FD(n) instead of sending F(n) itself.
30
Motion Compensation for Chrominance



Luminance is highly correlated, more so than chrominance
The “best” motion vectors are available by searching in the luminance plane
Motion vectors for chrominance are not computed separately, simply scaled as needed
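One simple reading of "scaled as needed" is to divide each vector component by the chrominance subsampling factor in that direction. The sketch below does exactly that; the table of factors and the function name are assumptions, and real standards add their own rounding rules that are not reproduced here.

```python
# (horizontal, vertical) chrominance subsampling factors per format.
CHROMA_SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def scale_mv_for_chroma(mv, chroma_format):
    """Scale a luminance motion vector (dx, dy) to the chrominance planes."""
    dx, dy = mv
    sx, sy = CHROMA_SUBSAMPLING[chroma_format]
    return (dx / sx, dy / sy)

print(scale_mv_for_chroma((16, 7), "4:2:0"))
print(scale_mv_for_chroma((-20, -18), "4:2:2"))
```

A fractional result such as 3.5 means the chrominance prediction falls between pixel positions, so a decoder would need some interpolation or rounding rule at that point.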
31
Motion Estimation/Compensation
Summary

At the encoder:



For each block in the frame being coded, examine the search window(s) in the reference frame to find the best match block (do this for luminance only)
Form the MC difference image = original image minus motion compensated image
Scale the motion vectors for the chrominance, form the motion compensated chrominance frames, and form chrominance difference image
32
Motion Estimation/Compensation
Summary

At the decoder:




Decode the reference frames (Y,Cr,Cb)
For each block in a temporally coded Y frame, use the motion vector to select a block from the reference frame and glue it in place
Add the Y difference image
For each block in temporally coded Cr,Cb frames, first scale the motion vector, then do the previous 2 steps with Cr and Cb data
33
Progress of Video Compression
34
Progress of Video Compression
35
Progress of Video Compression
36
Temporal Location of Reference

The reference frame need not occur before the temporally coded frames which use it
Why? Scene changes, allow better matches
37
Flavors of Motion Estimation




1. Forward predicted blocks: the best-match block occurs in the reference frame before the block’s frame
2. Backward predicted blocks: the best-match block occurs in the reference frame after the block’s frame
3. Interpolatively predicted blocks: the best-match block is the average of the best-match blocks from reference frames before & after
The motion compensation direction can be selected independently for each block in a frame.
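The per-block choice between the three flavors could be made as in the sketch below. It assumes the forward and backward best-match blocks have already been found, and it picks the candidate with the smallest sum of absolute differences, which is only one possible criterion.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def interpolate(forward_block, backward_block):
    """Average of the forward and backward best-match blocks."""
    return [[(x + y) / 2 for x, y in zip(row_f, row_b)]
            for row_f, row_b in zip(forward_block, backward_block)]

def choose_prediction(current, forward_block, backward_block):
    """Return (mode name, prediction block) with the lowest error."""
    candidates = {
        "forward": forward_block,
        "backward": backward_block,
        "interpolated": interpolate(forward_block, backward_block),
    }
    mode = min(candidates, key=lambda name: sad(current, candidates[name]))
    return mode, candidates[mode]

current = [[10, 20], [30, 40]]
fwd = [[8, 18], [28, 38]]               # best match in the earlier reference
bwd = [[12, 22], [32, 42]]              # best match in the later reference
mode, prediction = choose_prediction(current, fwd, bwd)
print(mode)
```

In this example the current block lies halfway between its two references, as it might during a fade, so the average predicts it better than either reference alone.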
38
MPEG Frame Types

Intra (I) pictures: coded by themselves, as still images. No temporal coding. No motion vectors.
39
MPEG Frame Types

Forward Motion Compensated predicted (P) pictures: forward motion compensated from the previous I or P frame
40
MPEG Frame Types

Motion Compensated interpolated (B) pictures: forward, backward, and interpolatively motion compensated from previous/next I/P frames
41
Motion Vector Coding

How are the motion vectors actually encoded for transmission to the decoder?
Start by taking the difference between the current motion vector and the most recent previous one of the same type (forward/backward/interpolative)
Encode the difference using variable length coding
Horizontal and vertical components coded separately
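The differencing step can be sketched as below for one stream of motion vectors of the same type. Starting the predictor at (0, 0) is an assumption, and the variable length coding of the differences is left out.

```python
def encode_mv_differences(motion_vectors):
    """Replace each (dx, dy) by its difference from the previous vector."""
    prev_dx, prev_dy = 0, 0             # assumed starting predictor
    diffs = []
    for dx, dy in motion_vectors:
        # Horizontal and vertical components are differenced separately.
        diffs.append((dx - prev_dx, dy - prev_dy))
        prev_dx, prev_dy = dx, dy
    return diffs

def decode_mv_differences(diffs):
    """Accumulate the differences to recover the motion vectors."""
    dx, dy = 0, 0
    motion_vectors = []
    for ddx, ddy in diffs:
        dx, dy = dx + ddx, dy + ddy
        motion_vectors.append((dx, dy))
    return motion_vectors

mvs = [(5, 2), (5, 2), (6, 2), (0, 0)]
diffs = encode_mv_differences(mvs)
print(diffs)
```

Neighboring blocks often move together, so many differences are zero or small, and a variable length code can give those values short codewords.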
42
MPEG Frame Structure Terminology

A block contains 8x8 pixels
The DCT unit
A macroblock (MB) contains 4 blocks from the luminance, plus the corresponding chrominance blocks
4 blocks from each of Cr/Cb if 4:4:4 format
2 blocks from each of Cr/Cb if 4:2:2 format
1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format
The motion compensation unit
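The block counts above translate into a small lookup. The 352x288 frame size used in the example is an assumption chosen only to have concrete numbers, and the sketch assumes frame dimensions that are multiples of 16.

```python
# Blocks contributed by EACH of Cr and Cb, per macroblock.
CHROMA_BLOCKS_PER_COMPONENT = {"4:4:4": 4, "4:2:2": 2, "4:2:0": 1, "4:1:1": 1}

def blocks_per_macroblock(chroma_format):
    """4 luminance blocks plus the Cr and Cb blocks for the given format."""
    return 4 + 2 * CHROMA_BLOCKS_PER_COMPONENT[chroma_format]

def macroblocks_in_frame(width, height):
    """Number of 16x16 luminance macroblocks in a frame."""
    return (width // 16) * (height // 16)

mb_count = macroblocks_in_frame(352, 288)
print(mb_count, mb_count * blocks_per_macroblock("4:2:0"))
```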
43
MPEG Frame Structure Terminology

A slice is a collection of macroblocks, tracing in a raster scan from upper left to lower right
The resynchronization unit
A picture is a frame, either progressive (non-interlaced) or interlaced
The primary coding unit
A Group of Pictures (GOP) contains ≥ 1 frame.
The unit for random access into the sequence
44
MPEG GOP Structure

A Group of Pictures (GOP) may contain




All I pictures
I & P pictures only
I, P, & B Pictures
A common GOP format for 30 frames/sec:


I-picture spacing 15 frames (1/2 second)
P-picture spacing 3 frames (1/10 second)
Type:   I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I
Frame:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
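The picture-type pattern follows from the two spacings. A minimal sketch, assuming frames are counted from 0 in display order and using a hypothetical function name:

```python
def gop_pattern(num_frames, i_spacing=15, p_spacing=3):
    """Picture types in display order for the given I and P spacings."""
    types = []
    for i in range(num_frames):
        if i % i_spacing == 0:
            types.append("I")           # every i_spacing frames
        elif i % p_spacing == 0:
            types.append("P")           # every p_spacing frames in between
        else:
            types.append("B")           # everything else
    return "".join(types)

print(gop_pattern(16))
```

Setting `p_spacing=1` gives an I and P only sequence, and setting both spacings to 1 gives all I pictures.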
45
Frame Ordering

Display order (encoder input order):
Type:   B  B  I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I
Frame: -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
But consider coding dependencies:
Frame 2 (B) needs frame 4 (P) to be decoded first, etc.
So better transmit frame 4 before frame 2
Transmission order:
Type:   I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I  B  B
Frame:  1 -1  0  4  2  3  7  5  6 10  8  9 13 11 12 16 14 15
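The reordering rule can be sketched as: hold each B picture back until the next I or P picture has been sent. The function name and the representation of frames as (number, type) pairs are assumptions for this example.

```python
def transmission_order(display):
    """Reorder (frame_number, type) pairs so anchors precede their B pictures."""
    out, pending_b = [], []
    for frame in display:
        if frame[1] == "B":
            pending_b.append(frame)     # wait for the next I or P picture
        else:
            out.append(frame)           # send the anchor first
            out.extend(pending_b)       # then the B pictures that needed it
            pending_b = []
    out.extend(pending_b)               # trailing B pictures with no later anchor
    return out

types = "BBIBBPBBPBBPBBPBBI"
display = list(zip(range(-1, 17), types))
sent_numbers = [number for number, _ in transmission_order(display)]
print(sent_numbers)
```

The decoder undoes this by holding each I or P picture until the B pictures that precede it in display order have been shown, which is part of the memory and delay cost of B pictures.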
46
Types of Coding Modes

What if the best-match block in the reference frame is a great match?
Then the motion vector is all you need to send
What if it is a terrible match?
Then don’t use the motion vector at all, just code the block by itself, with something like JPEG (called intra mode coding)
What if it is a so-so match?
Then you can send the MV, and also send the frame difference information for that macroblock
47
Coding Mode I (Inter-Coding)

Inter coding refers to coding with motion vectors
[Figure: a Macro Block in the Current Frame predicted from the Previous Frame by a Motion Vector]
48
Coding Mode II (Intra-Coding)


INTRA coding refers to coding without motion vectors
The MB is coded all by itself, in a manner similar to JPEG
[Figure: a Macro Block in the Current Frame coded without reference to the Previous Frame]
49
I-Picture Coding

Two possible coding modes for macroblocks in I-frames
Intra: code the 4 blocks with the current quantization parameters
Intra with modified quantization: scale the quantization matrix before coding this MB
All macroblocks in intra pictures are coded
Quantized DC coefficients are losslessly DPCM coded, then Huffman coded as in JPEG
50
I-Picture Coding

Use of macroblocks modifies block-scan order:
[Figure: the four 8x8 luminance blocks of a macroblock]
Quantized coefficients are zig-zag scanned and run-length/Huffman coded as in JPEG
Very similar to JPEG except (1) scaling Q matrix separately for each MB, and (2) order of blocks
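The zig-zag scan and the run-length step can be sketched as follows. The example uses a 4x4 block for brevity although MPEG blocks are 8x8, represents each AC coefficient as a (zero run, value) pair followed by an end-of-block marker, and omits the Huffman stage; these simplifications and the function names are assumptions.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    def key(position):
        r, c = position
        s = r + c                       # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (s, r if s % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Read a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length(ac_coefficients):
    """Code AC coefficients as (zeros skipped, value) pairs plus an EOB marker."""
    pairs, run = [], 0
    for value in ac_coefficients:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                 # trailing zeros are implied
    return pairs

block = [[50, 3, 4, 0],
         [-2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
scanned = zigzag_scan(block)
dc, ac = scanned[0], scanned[1:]
pairs = run_length(ac)
print(dc, pairs)
```

The scan visits low-frequency coefficients first, so the many zeros left by quantization at high frequencies collect at the end and collapse into the single end-of-block marker.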
51
P-Picture Coding: many coding modes


Motion compensated coding: Motion Vector only
Motion compensated coding: MV plus difference macroblock
Motion compensation: MV & difference MB with modified quant. scaling
MV = (0,0), just send difference block
MV = (0,0), just send diff block, with modified quantization scaling
Intra: the MB is coded with DCTs (no difference is computed)
Intra with modified quantization scaling
[Figure labels: Motion Vector, DCT]
52
How to choose a coding mode?

MPEG does not specify how to choose mode
Full search = try everything… the different possibilities will lead to different rate/distortion outcomes for that macroblock
[Figure: scatter plot of distortion versus rate, one point per coding mode; labeled points include "MV, no difference", "MV, plus difference", and "Intra"]
53
How to choose a coding mode?


Tree search: use a decision tree
For example:
First find the best-match block in the search window. If it’s a very good match, then use motion compensation. Otherwise, don’t.
If you decided to use motion compensation, then need to decide whether or not to send the difference block as well. Make decision based on how good a match it is.
If you decided to send the difference block, then have to decide whether or not to scale the quantization parameter… check the current rate usage…
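The example tree could be coded as a chain of tests like the one below. The thresholds, the parameter names, and the returned mode labels are all hypothetical; an actual encoder is free to choose its own criteria.

```python
def choose_mode(match_error, intra_threshold, mv_only_threshold, over_rate_budget):
    """Walk the decision tree for one macroblock and return a mode label."""
    # Step 1: is the best match good enough to use motion compensation at all?
    if match_error > intra_threshold:
        return "intra"
    # Step 2: is the match so good that no difference block is needed?
    if match_error <= mv_only_threshold:
        return "MV only"
    # Step 3: difference block is sent; scale quantization if rate is tight.
    if over_rate_budget:
        return "MV + difference, modified quantization"
    return "MV + difference"

print(choose_mode(5000, intra_threshold=2000, mv_only_threshold=100,
                  over_rate_budget=False))
print(choose_mode(500, intra_threshold=2000, mv_only_threshold=100,
                  over_rate_budget=True))
```

Unlike the full search over modes, this makes each decision once from a cheap measurement, at the cost of sometimes missing the best rate/distortion outcome.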
54
B-Picture Coding

B pictures have even more possible modes:








Forward prediction MV, no difference block
Forward prediction MV, plus difference block
Backward prediction MV, no difference block
Backward prediction MV, plus difference block
Interpolative prediction MV, no difference block
Interpolative prediction MV, plus difference block
Intra coding
Some of above with modified Quant parameter
55
Group of Pictures

IIIII…: Every picture is intra-coded.




Fully decodable without reference to any other picture
Editing is straightforward
Requires about 2.5× the bit rate of bidirectional coding
IBBPBBPB…: Forward and bidirectional




Best compression factor
Needs large decoder memory
Hard to edit
Most useful for final delivery of post-produced material (e.g., broadcast) because no editing requirement
56
Group of Pictures

IPPPPIPP…: Forward predicted only.


Needs less decoder memory
IBIBIB…: bidirectional compromise



Some of the bit rate advantage of bidirectional coding
Not nearly the full latency penalty of bidirectional
Editable with moderate processing.
For example, if the video after a B picture needs to be deleted, the B frame would not be decodable.
Solution is to decode the B frame first, then re-encode it using forward prediction only. Some quality loss.
57