Video Coding Standards
Heejune AHN
Embedded Communications Laboratory
Seoul National Univ. of Technology
Fall 2011
Last updated 2011. 5. 13
Agenda







History and Concepts
JPEG and JPEG-2000
MPEG-1 and MPEG-2
MPEG-4
H.261 and H.263
H.264
Beyond H.264
Heejune AHN: Image and Video Compression
p. 2
1. Standards and Standards Bodies

VCEG (Video Coding Experts Group) in ITU (formerly CCITT)
• Focus on real-time, two-way video communication

MPEG (Moving Picture Experts Group) and JPEG (Joint Photographic Experts Group) in ISO
• Focus on multimedia storage and distribution for entertainment

Some standards overlap (joint work of both bodies)
• H.262 = MPEG-2, H.264 = MPEG-4/AVC

ITU VCEG: H.261, H.262, H.263, H.264, H.264 High Profile, H.264 SVC, H.264 MVC, HEVC (H.265)
ISO MPEG/JPEG: JPEG, MPEG-1, MPEG-2, MPEG-4, JPEG-2000, MPEG-4/AVC, MPEG-7, MPEG-21
History of Video Coding Standards
[Figure: timeline of video coding standards up to 2011, ending with H.264 High Profile (HP), SVC, MVC, and HEVC]

ISO-MPEG/JPEG
• JPEG (1992): compression of still images (DCT)
• MPEG-1 (1993): real-time playback of VHS-quality video on Video CD (1.4 Mbps)
• MPEG-2 (1995): broadcast-quality video service (3~5 Mbps)
• MPEG-4 (1998): wide range of bit rates (from 20 kbps upward) and object-oriented coding
• JPEG-2000 (2000): better-quality still images

ITU-VCEG
• H.261 (1990): video telephony over ISDN (p x 64 kbps)
• H.263 (1995): video telephony over circuit and packet networks, from 20 kbps to high bandwidth
• H.264 (2003): multipurpose, better-quality video coding

Others
• MPEG-7 (Multimedia Content Description Interface): search and retrieval in multimedia databases
• MPEG-21 (Multimedia Framework): interoperable multimedia delivery
Standards process and usage

Standards process
• Scope & aim of the standard
• Proposals from companies and universities
• Performance & complexity evaluation
• Test model (documents & reference software)
• Improvement proposals, fed back into the test model
• Draft standard
• International standard
Understanding standards



Only the bitstream syntax and the decoder are defined in the standards.
The encoder, the application, and the implementation are left open to users.
Standards provide "profiles and levels" and recommended usage to help users
choose among the many technical options.
2. JPEG

ISO IS-10918



By ISO/IEC JTC1/SC29/WG10 (1984~1992)
Widely used on the WWW and in digital photography
Motion-JPEG is just a successive stream of JPEG images
Baseline JPEG Codec

[Figure: baseline JPEG encoder block diagram]
input image (8x8 blocks) => Level offset ([0,255] => [-128,127]) => 8x8 DCT => Uniform scalar quantization (quantization tables)
• DC quantization indices => Differential coding => VLC (DC Huffman tables, SSSS-value) => bits
• AC quantization indices => Zig-zag scan => Run-level coding => VLC (AC Huffman tables, RRRRSSSS-value) => bits
RGB or YCbCr components are coded either separately or in interleaved order

Example quantization table (luminance)
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
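The baseline chain (level offset, 8x8 DCT, uniform quantization, zig-zag scan, run-level coding, differential DC) can be sketched in Python as follows. This is a minimal illustration and not a conforming encoder: the function names are hypothetical, the Huffman (VLC) stage is left out, and the DCT is the slow textbook form.

```python
import math

# Example luminance quantization table (row-major, 8x8).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def dct_8x8(block):
    """Plain (slow) 2-D DCT-II of an 8x8 block."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def zigzag_order():
    """(row, col) positions of an 8x8 block in zig-zag scan order."""
    cells = [(r, c) for r in range(8) for c in range(8)]
    # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_level(ac):
    """Turn AC indices into (zero_run, level) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are absorbed by the end-of-block marker
    return pairs

def encode_block(pixels, prev_dc=0):
    """Return (DC difference, run-level pairs, DC index) for one 8x8 block."""
    shifted = [[p - 128 for p in row] for row in pixels]  # [0,255] => [-128,127]
    coef = dct_8x8(shifted)
    q = [[round(coef[r][c] / Q_LUMA[r][c]) for c in range(8)] for r in range(8)]
    scan = [q[r][c] for r, c in zigzag_order()]
    dc = scan[0]
    return dc - prev_dc, run_level(scan[1:]), dc
```

For a flat block of value 144, all AC indices quantize to zero and only the DC index remains, which shows why smooth image areas cost so few bits.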

Lossless JPEG


DPCM is used, with prediction from 3 neighboring pixels
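A small Python sketch of DPCM with three neighbors (a = left, b = above, c = above-left) follows. It assumes the predictor a + b - c, which is one of the predictors lossless JPEG offers; the function names and the simplified border handling (128 for the first sample, left neighbor on the first row, upper neighbor on the first column) are choices made for this illustration.

```python
def predict(img, r, c):
    """Predict sample (r, c) from already known neighbors."""
    if r == 0 and c == 0:
        return 128                      # assumed start value for 8-bit samples
    if r == 0:
        return img[r][c - 1]            # first row: left neighbor only
    if c == 0:
        return img[r - 1][c]            # first column: upper neighbor only
    a, b, cc = img[r][c - 1], img[r - 1][c], img[r - 1][c - 1]
    return a + b - cc                   # plane-like predictor from 3 neighbors

def dpcm_encode(img):
    """Return the prediction residuals (these would then be entropy coded)."""
    return [[img[r][c] - predict(img, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]

def dpcm_decode(res):
    """Rebuild the image in raster order; the result is bit-exact (lossless)."""
    img = [[0] * len(res[0]) for _ in res]
    for r in range(len(res)):
        for c in range(len(res[0])):
            img[r][c] = res[r][c] + predict(img, r, c)
    return img
```

On smooth gradients most residuals are zero or near zero, which is what makes the subsequent entropy coding effective.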
Optional modes

Progressive encoding
• Store image data in order of DC only, low-frequency AC, high
frequency AC

Hierarchical encoding
• Store image data in low resolution to high resolution

Motion-JPEG



Just a sequence of JPEG still images
Low complexity, error tolerance, market awareness
Used for video conferencing and surveillance before cheap MPEG-1/2/4
solutions became widely available on the market
JPEG-2000

Features

Better compression performance than JPEG
• No blocking effects at high compression ratios

Good compression for both continuous-tone and bi-level (text) images
Both lossless and lossy compression in one framework
ROI (region of interest) support
Error resilience support (data partitioning)
Rather slow on current embedded systems due to complexity
Encoding process
image => (Tiling) => Wavelet Transform => Quantizer => Arithmetic Encoder => bits

Comparison between JPEG and JPEG-2000
[Figure: Lenna, 256x256 RGB, coded with baseline JPEG (4572 bytes) and with JPEG-2000 (4572 bytes)]
MPEG-1/2

MC-DCT Hybrid Coding
[Figure: MC-DCT hybrid encoder block diagram. Blocks: coder control, intra-frame DCT coder (DCT, Quant), intra-frame decoder (DeQ), motion-compensated intra/inter predictor (prediction is 0 for intra coding), motion estimator, entropy coder. Inputs to the entropy coder: control data, DCT coefficients, motion data.]
MPEG-1

MPEG-1




Targeted VHS quality (352x240 at 30 fps or 352x288 at 25 fps, YCbCr 4:2:0) on VCD (600 MB)
1.4 Mbps (1.2 Mbps video + 0.2 Mbps audio) on VCD, 70 minutes
Three parts: Part 1 System, Part 2 Video, Part 3 Audio
Technology

MC-DCT Hybrid
• Macro-block (16x16 pixels): Motion estimation unit
• Block (8x8 pixels): DCT and Quant unit

GOP structure
• I, P, B picture
• Trade-off between random access and coding efficiency

Asymmetric complexity
• Larger memory and more computation are required at the encoder
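Most of the encoder-side cost comes from motion estimation on macroblocks. The Python sketch below shows a full-search block matcher using the sum of absolute differences (SAD). It is only one possible encoder strategy (the standard does not prescribe the search method), and the function names, the default search range, and the frame layout (lists of rows of luma samples) are assumptions for this example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, n=16, search=4):
    """Try every integer displacement in [-search, search]; return (dx, dy, cost)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference picture.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The decoder only has to apply the transmitted vector, which is the asymmetry noted above: (2 x search + 1)^2 SAD evaluations per macroblock at the encoder versus one block copy at the decoder.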
MPEG-1 Structure
SH: sequence header, GOP: group of pictures

Syntax Hierarchy

Sequence layer
• SH GOP SH GOP SH GOP ...

GOP layer
• I B B P B B P ...

Picture layer
• Slice, Slice, ...

Slice layer
• MB MB MB MB ...

MB layer
• 16x16 Y (blocks 1 to 4), 8x8 Cb (block 5), 8x8 Cr (block 6) (4:2:0)

Block layer
• 8x8 samples

Picture Coding
• I Picture: no interframe prediction
• P Picture: interframe prediction from one causal (previous) reference picture
• B Picture: interframe prediction from one previous and one future picture

GOP and picture order
• Display order (input at encoder)
I1 B1 B2 P1 B4 B5 P2 B6 B7 I2
• Transmission order (encoding/decoding order)
I1 P1 B1 B2 P2 B4 B5 I2 B6 B7
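The reordering from display order to transmission order follows from the rule that a B picture can only be coded after both of its reference pictures. A minimal Python sketch, assuming pictures are given as labels whose first letter is the picture type (the function name is hypothetical):

```python
def display_to_transmission(display):
    """Move each I/P picture ahead of the B pictures that precede it in display order."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)      # B pictures wait for their future reference
        else:
            out.append(pic)            # send the reference (I or P) first
            out.extend(pending_b)      # then the B pictures that depend on it
            pending_b = []
    out.extend(pending_b)              # trailing B pictures, if any
    return out

display = ["I1", "B1", "B2", "P1", "B4", "B5", "P2", "B6", "B7", "I2"]
print(display_to_transmission(display))
```

For the display sequence of the slide this yields I1 P1 B1 B2 P2 B4 B5 I2 B6 B7. The price of B pictures is extra delay and picture buffering at both encoder and decoder.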
MPEG-2

Major target application


Digital television quality (720x576/480, 25/30 fps) at 3 ~ 4Mbps
Interlaced video support


Frame picture vs field picture : motion compensation unit
Frame DCT vs field DCT in frame picture
[Figure: a frame picture versus two field pictures, and frame DCT versus field DCT within a frame picture]
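The difference between frame DCT and field DCT in a frame picture is only in how the 16 luma rows of a macroblock are grouped into 8x8 blocks before the transform. The Python sketch below illustrates that grouping; the function names are hypothetical and the DCT itself is not shown.

```python
def split_blocks(mb16):
    """Cut a 16x16 array into four 8x8 blocks: top-left, top-right, bottom-left, bottom-right."""
    return [[row[c:c + 8] for row in mb16[r:r + 8]] for r in (0, 8) for c in (0, 8)]

def frame_dct_blocks(mb16):
    """Frame DCT: blocks are taken from the macroblock rows as they are."""
    return split_blocks(mb16)

def field_dct_blocks(mb16):
    """Field DCT: even rows (top field) form the upper blocks, odd rows the lower blocks."""
    reordered = mb16[0::2] + mb16[1::2]
    return split_blocks(reordered)
```

When the two fields differ strongly (fast motion in interlaced material), adjacent rows alternate between different content; field DCT then gives smooth blocks where frame DCT would produce large high-frequency vertical coefficients.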

Scalability Support

Spatial scalability
• Low resolution at Base layer and high resolution at Enhancement layer
• BL is used for prediction of EL
• E.g. SD resolution at BL, HD resolution at EL

Temporal scalability
• 30 fps at BL, 60 fps at EL

SNR scalability
• Same resolution but different quality

Data partitioning
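• Coded data is packed into different streams
[Figure: two-layer scalable codec. The input video is downsampled and coded by the BL encoder into the BL bit stream, which the BL decoder turns into lower-quality video. The EL encoder produces the EL bit stream, which is decoded together with the base layer into higher-quality video.]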

Profile & Level


MPEG-2 has many options; not every implementation needs all of them
Profiles
• Simple: 4:2:0 input, I and P pictures only, low complexity and low performance
• Main: 4:2:0 input, I, P, and B pictures, interlaced
• 4:2:2: 4:2:2 input (chroma has the same vertical resolution as luma)
• SNR: SNR scalable
• Spatial: spatially scalable
• High: spatial scalability and 4:2:2
Levels
• Low (352x288), Main (720x576), High 1440 (1440x1152), High (1920x1152)

E.g.
• MPEG-1 : Main profile & Low Level
• SD DTV, DVD : Main profile & Main Level
• HDTV : Main profile & High Level (Historically MPEG-3’s target application)
MPEG-4

Features


Support for low bit rate (from 20 Kbps)
Support for object based coding
• Reuse of components, composition, and interactivity support.


In practice, object-based coding is not widely used
Object-based Coding



Video Object
Shape Coding : transparent/opaque region, binary or grey scale
Texture coding with arbitrary shape
• DCT after zero padding in inter blocks and extrapolation in intra blocks
[Figure: a scene composed of video objects VO1, VO2, and VO3]

Visual data structure
VS: visual sequence (video session)
VO1, VO2: video objects (VO), including 2-D/3-D synthetic objects
VOL1, VOL2: video object layers (VOL)
GOV1, GOV2: groups of VOPs (GOV)
VOP1, VOP2: video object planes (VOP)
MB: macroblock
H.261


ITU mostly focuses on real-time communication
H.261


First video coding standard (1990)
N-ISDN (1990s)
• p x 64 kbps (p = 1, ..., 30), typically 64 ~ 384 kbps
• Circuit network based: low delay, reliable

H.261 key features




YCbCr420 CIF, QCIF input
MC-DCT
Integer-pel motion compensation
Optional loop filter (for deblocking)
• Filtering at 8x8 block boundary

FEC used
H.261 syntax structure

H.261 bit structure
Picture layer
• PSC, TR, PTYPE, PEI, PSPARE, followed by the GOB layers
GOB (group of blocks) layer
• GBSC, GN, GQUANT, GEI, GSPARE, followed by the macroblock layers
Macroblock layer
• MBA, MTYPE, MQUANT, MVD, CBP, followed by the block layers
• MBA stuffing may be inserted
Block layer
• TCOEFF, EOB
Each field is either a fixed-length code or a variable-length code
[Figure: picture formats. A CIF picture (352x288) holds GOBs 1 to 12; a QCIF picture (176x144) holds GOBs 1, 3, and 5. A GOB holds macroblocks 1 to 33. A macroblock holds a 16x16 Y area plus 8x8 Cb and Cr blocks.]
H.263

H.263 Versions

Version 1 (1995)
• Improvement to H.261
• 4 optional modes

Version 2 (1998, H.263+)
• 12 optional modes

Version 3 (2000, H.263++)
• 19 optional modes

Key Features



Targets bit rates from 20 kbps and also packet-based networks
Half-pel prediction
Redesigned 3-D VLC

H.263 Optional Modes












Annex D: Unrestricted motion vectors
Annex E: Syntax-based arithmetic coding
Annex F: Advanced Prediction
Annex G: PB Frames
Annex I : Advanced Intra Coding
Annex J: Deblocking Filter
Annex K: Slice Structured Mode
Annex L: Supplemental enhancement information
Annex M: Improved PB frames
Annex N: Reference Picture Selection
Annex O: Scalability
Annex P: reference picture resampling








(continued)
Annex Q: Reduced resolution update
Annex R: Independent Segment Decoding
Annex S: Alternative inter VLC
Annex T: Modified Quantization
Annex U: Enhanced reference picture selection
Annex V: Data partition slice
Annex W: Additional supplemental enhancement information

Performance
H.264

Name




ITU H.264 = ISO MPEG-4 Part 10/AVC
H.26L: developed as a long-term enhancement, not backward compatible with H.263
Now adopted in T-DMB/S-DMB and IPTV, replacing many MPEG-2 solutions
Targets a 50% bit-rate gain over H.263+

Key features



Smaller processing units (down to 4x4 pixel blocks)
Intra prediction
Inter prediction
• Macroblock based Interframe prediction selection
• ¼ pixel motion vector support
• Motion vector options for subblocks




4x4 Integer DCT
Deblocking filter
Universal VLC
CABAC (context-based adaptive binary arithmetic coding)
Intra-frame Prediction

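luma
- 4x4: 9 modes
  Predicted from the neighboring samples A-H (above and above-right), I-L (left), and M (above-left); for example, the mean mode uses the mean of A-D and I-L
- 16x16: 4 modes
  V (vertical), H (horizontal), mean of H and V, and a fourth mode (plane)

chroma
- 8x8: 4 modes
  V, H, mean of H and V, and a fourth mode (plane)
- The same prediction mode is always applied to both chroma blocks
[Figure: prediction directions for the 4x4, 16x16, and 8x8 modes]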
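Three of the nine 4x4 luma modes (vertical, horizontal, and mean/DC) are simple enough to sketch in a few lines of Python. The sample names follow the figure (A-D above, I-L left). The function names and the mode labels are hypothetical, and choosing the mode by the smallest sum of absolute errors is only one possible encoder strategy, since mode decision is not standardized.

```python
def intra4x4_predict(mode, above, left):
    """above = [A, B, C, D], left = [I, J, K, L]; returns a 4x4 prediction block."""
    if mode == "vertical":
        return [list(above) for _ in range(4)]        # copy A-D down each column
    if mode == "horizontal":
        return [[left[r]] * 4 for r in range(4)]      # copy I-L across each row
    if mode == "dc":
        mean = (sum(above) + sum(left) + 4) >> 3      # rounded mean of 8 neighbors
        return [[mean] * 4 for _ in range(4)]
    raise ValueError("mode not covered by this sketch: " + mode)

def sae(block, pred):
    """Sum of absolute errors between a block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))

def best_mode(block, above, left):
    """Pick the mode with the smallest prediction error (encoder-side choice)."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sae(block, intra4x4_predict(m, above, left)))
```

Only the residual between the block and the chosen prediction is transformed and coded, together with the mode index.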
Inter-frame Prediction
References
• H.264: permits up to 15 (mostly 2 used) reference pictures; bi-predictive B-slices; a P-slice may reference a picture that has B-slices; supports explicit weighting coefficients as well as (a+b)/2 type averaging
• MPEG-1/2/4, H.261/3: a P-picture references only one previous picture (I or P); bi-directional B-pictures; only (a+b)/2 type prediction weighting is permitted

Block Sizes
• H.264: tree-structured (16x16 => 16x8, 8x16, 8x8 => 8x4, 4x8, 4x4)
• MPEG-1/2/4, H.261/3: either 16x16 or 8x8

Motion Estimation
• H.264: half- or ¼-pixel accuracy; 6-point interpolation for half-pixel and 2-point linear interpolation for ¼-pixel positions
• MPEG-1/2/4, H.261/3: MPEG-2 permits half-pixel accuracy and MPEG-4 permits ¼-pixel accuracy; 2-point linear interpolation
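The 6-point half-pixel interpolation and the 2-point quarter-pixel interpolation can be sketched in one dimension as follows. The sketch assumes the H.264 luma filter taps (1, -5, 20, 20, -5, 1)/32 with rounding and clipping to 8 bits; the function names are hypothetical, and replicating the edge sample at picture borders is a simplification made for this example.

```python
def clip255(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1] using the 6-tap filter."""
    last = len(row) - 1
    # Six integer samples around the half position; edges are replicated.
    e, f, g, h, i2, j = (row[min(max(k, 0), last)] for k in range(i - 2, i + 4))
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i2 + j + 16) >> 5)

def quarter_pel(a, b):
    """Quarter-pixel sample: rounded average of two neighboring samples."""
    return (a + b + 1) >> 1

row = [0, 10, 20, 30, 40, 50, 60, 70]
h = half_pel(row, 3)            # between 30 and 40
q = quarter_pel(row[3], h)      # between 30 and the half-pixel sample
```

The longer filter preserves more high-frequency detail than 2-point averaging, which is part of the prediction gain of H.264 over the earlier standards, at the cost of more arithmetic per sample.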
Transform and Quantization
Integer DCT


No encoder/decoder mismatch
Three types of transform followed by quantization

- Type 1: for the 4x4 array of luma DC coefficients in intra MBs predicted in 16x16 mode (block -1)
- Type 2: for the 2x2 arrays of chroma DC coefficients (blocks 16-17)
- Type 3: for all other 4x4 blocks (blocks 0-15, 18-25)
[Figure: block numbering within a macroblock. Block -1: luma DC (16x16 intra mode only); blocks 0-15: 4x4 luma blocks; blocks 16-17: chroma DC; blocks 18-25: 4x4 chroma blocks.]
*Data is transmitted in the numbered order
Transform and Quantization

4×4 DCT (X: input, Y: output)

4×4 integer transform
- forward
- backward
W: output of the integer core transform, before scaling
Post-scaling factor (PF), with a = 1/2, b = √(2/5)
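The relationship between the integer core transform, the post-scaling factors, and the constants a = 1/2 and b = √(2/5) can be checked numerically. The Python sketch below assumes the commonly cited 4x4 forward core matrix of H.264; the function names are hypothetical, and the inverse shown is the mathematical inverse of the scaled transform, not the bit-exact integer inverse of the standard (where scaling is merged into quantization).

```python
import math

# Assumed forward core matrix (integer entries only).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
A = 0.5
B = math.sqrt(2 / 5)

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def forward_core(x):
    """W = CF * X * CF^T, computed with integer additions and shifts in practice."""
    return matmul(matmul(CF, x), transpose(CF))

def post_scale_factors():
    """PF entries: a^2, ab/2, or b^2/4 depending on the coefficient position."""
    s = [A, B / 2, A, B / 2]
    return [[s[i] * s[j] for j in range(4)] for i in range(4)]

def forward(x):
    """Scaled transform Y = W (x) PF, element by element."""
    w, pf = forward_core(x), post_scale_factors()
    return [[w[i][j] * pf[i][j] for j in range(4)] for i in range(4)]

def inverse(y):
    """Mathematical inverse: X = CF^T * (Y (x) PF) * CF."""
    pf = post_scale_factors()
    scaled = [[y[i][j] * pf[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)
```

With b = √(2/5) the scaled rows of the core matrix are orthonormal, so inverse(forward(x)) returns x up to floating-point error. Because the core transform uses only small integers, encoder and decoder compute identical results, which removes the mismatch that floating-point DCT implementations can show.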
Entropy Coding
Parameters to be coded
• Macroblock type (intra/inter), coded block pattern, quantizer parameter, reference frame index, motion vector
  - entropy_coding_mode = 0: Exponential Golomb codes (Exp-Golomb), a form of variable length coding (VLC)
  - entropy_coding_mode = 1: Context-based Adaptive Binary Arithmetic Coding (CABAC)
• Residual data
  - entropy_coding_mode = 0: Context-adaptive variable length coding (CAVLC)
  - entropy_coding_mode = 1: Context-based Adaptive Binary Arithmetic Coding (CABAC)
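Exp-Golomb codes have a regular structure: a run of zeros, followed by the binary form of the value plus one. A minimal Python sketch of the unsigned and signed variants, working on strings of '0' and '1' for readability (the function names are hypothetical; a real codec would operate on a bit buffer):

```python
def ue_encode(k):
    """Unsigned Exp-Golomb: (len - 1) zeros followed by the binary form of k + 1."""
    bits = bin(k + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def se_encode(v):
    """Signed Exp-Golomb: map 1, -1, 2, -2, ... to code numbers 1, 2, 3, 4, ..."""
    return ue_encode(2 * v - 1 if v > 0 else -2 * v)

def ue_decode(bitstring, pos=0):
    """Decode one unsigned code starting at pos; return (value, next position)."""
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1                       # count the leading zeros
    end = pos + 2 * zeros + 1            # prefix + (zeros + 1) info bits
    return int(bitstring[pos + zeros:end], 2) - 1, end
```

Small values get short codewords (0 is "1", 1 is "010", 3 is "00100"), and no code table has to be stored, which is why one code family can serve many different syntax elements.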
Deblocking Filters

A boundary-strength (BS) parameter is assigned to every 4×4 block edge

Block modes and conditions => BS
• One of the blocks is intra-coded and the edge is a MB edge: 4
• One of the blocks is intra-coded: 3
• One of the blocks has coded residuals: 2
• Difference of block motion ≥ one luma sample distance: 1
• Motion compensation from different reference frames: 1
• Else: 0

Filter strength
• BS = 0: no filtering
• BS = 1-3: slight filtering
• BS = 4: strong filtering

Filters only when |P0-Q0| < α, |P1-P0| < β, and |Q1-Q0| < β
• P3 P2 P1 P0 | Q0 Q1 Q2 Q3 are the samples on either side of the block edge
• Thresholds α and β depend on the average quantization parameter (QP)

The deblocking filtering accounts for 1/3 of the computational complexity of a decoder.
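The boundary-strength rules and the filtering condition translate directly into code. The Python sketch below is an illustration only: the function names and the dictionary keys describing a block are hypothetical, and the thresholds alpha and beta are passed in by the caller because their QP-dependent tables are not reproduced here.

```python
def boundary_strength(p, q, mb_edge):
    """BS for the edge between blocks p and q.

    p, q: dicts with keys 'intra' (bool), 'coded_residual' (bool),
    'ref' (reference picture id), 'mv' ((x, y) in luma samples).
    """
    if p["intra"] or q["intra"]:
        return 4 if mb_edge else 3
    if p["coded_residual"] or q["coded_residual"]:
        return 2
    if p["ref"] != q["ref"]:
        return 1                       # prediction from different reference frames
    if abs(p["mv"][0] - q["mv"][0]) >= 1 or abs(p["mv"][1] - q["mv"][1]) >= 1:
        return 1                       # motion differs by one luma sample or more
    return 0

def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    """Filter only small steps across the edge; large steps are treated as real edges."""
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)
```

The sample conditions keep genuine image edges sharp: a large step across the boundary is more likely to be picture content than a coding artifact, so it is left unfiltered.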
Network Adaptation


VCL & NAL

VCL (video coding layer)

NAL (network abstraction layer)
Error Resilient Tools

Flexible macroblock ordering (FMO)
• Allows MBs to be assigned to slices in an order other than
scan order

Arbitrary slice ordering (ASO)
• Improved end-to-end delay in real-time applications

Redundant slices (RS)
• Redundant representations are coded using different
coding parameters
[Figure: FMO example with slice group #0 and slice group #1]
Profile & Level

Main application




Baseline : Video telephony
Main : DTV and Storage
Extended : Streaming
Profile & tools
Performance comparison
Contributions of the VCL Tools
• Spatial prediction for intra-coded macroblocks: saves 6-9% bits
• Temporal prediction: saves around 50% bits
• Transforms: PSNR difference of less than 0.02 dB
• Logarithmic quantization: a change in step size by 12% also saves 12% bits
• CAVLC: saves 5-8% bits
• CABAC: saves 5-15% bits over CAVLC
• Picture-adaptive frame/field (PAFF) coding: saves 16-20% bits
• MB-adaptive frame/field (MBAFF) coding: saves 14-16% bits over PAFF
• Deblocking filter: saves 5-10% bits
Conclusion

Many video coding standards



Standards reflect both coding technology and implementation technology
Coding performance has improved more than 4 times since H.261 (1990)
What’s next





SVC (Scalable Video Coding) in H.264 (done)
H.264ext (further improvement of H.264)
3-D and MVC (Multi-View Coding) are ongoing
UDTV (Ultra Definition TV: 3840x2160)
And what’s next?