Xilinx XC4000 FPGA devices - Department of Electronics

Chapter III
Video Coding
By Theerayod Wiangtong, PhD., DIC
Mahanakorn University of Technology
1
Part I: Introduction
2
What is “Video”?
• Generally speaking, “video” refers to pictorial (visual) information, including still images and time-varying images (a.k.a. image sequences)
• In most references, “video” refers to time-varying images unless otherwise specified
3
Video Signal
4
Digital Video
• What is digital video?
It refers to the capturing, manipulation and storage of video
in digital formats. A digital video (DV) camcorder, for
example, is a video camera that captures and stores
images on a digital medium.
• How does it differ from analog video?
– The way the signal is transmitted or stored
– Every picture element (pixel) is represented by a
sequence of 0’s and 1’s
5
How is it defined?
• Frame rate
• Vertical resolution
• Horizontal resolution
• Bit-depth
6
What is so special about it?
• Less subject to noise
• Better quality than analog
• Computer-friendly
• Fast, small, reusable media, rather than videotapes
• Allows advanced editing and processing
• Allows repeated reproduction without losing quality
• Allows better compression and encryption schemes
7
How to obtain it?
• Directly using a digital video camera
• Obtaining it from an analog source (sampling
and quantizing a raster scan)
8
Color Models
• In the RGB color model, the three colors are
equally important
• The human visual system is less sensitive to
color than to luminance (brightness)
• The luminance/chrominance representation
reduces the bandwidth requirements of the
video signal
• It ensures the compatibility with monochrome
TV systems
• YCbCr is the adopted color model in most of
the video coding standards
9
Digital Video Compression
• What? Reducing the quantity of data used to represent video content without excessively reducing the quality of the picture. It also reduces the number of bits required to store and/or transmit digital media.
• Why?
[Figure: data volume grows along three axes: temporal (30fps to 300fps), spatial (1M to 10M), and bit-depth (8bpp to 32bpp)]
10
The Need for Video Compression
• High-Definition Television (HDTV)
– 1920x1080
– 30 frames per second (full motion)
– 8 bits for each of the three primary colors (RGB)
Total 1.5 Gb/sec!
• Cable TV: each cable channel is 6 MHz
– Max data rate of 19.2 Mb/sec
– Reduced to 18 Mb/sec w/audio + control …
Compression ratio must be ~ 80:1!
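The arithmetic behind these figures can be reproduced with a few lines of Python. This is a minimal sketch: the function name `raw_bitrate_bps` is made up for illustration, and the numbers are the ones quoted on the slide.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

# HDTV figures from the slide: 1920x1080, 30 frames/s, 8 bits per RGB component
hdtv_bps = raw_bitrate_bps(1920, 1080, 30, 3 * 8)

# Usable channel payload from the slide: 18 Mb/s after audio and control
channel_bps = 18e6

ratio = hdtv_bps / channel_bps
print(f"raw rate: {hdtv_bps / 1e9:.2f} Gb/s")
print(f"required compression: about {ratio:.0f}:1")
```

The script reports a raw rate of 1.49 Gb/s and a ratio of about 83:1, in line with the rounded 1.5 Gb/sec and ~80:1 on the slide.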
11
Video Coding
Original Video → Encoder → Encoded Video “Bitstream” → Storage or Transmission Medium → Decoder → Reconstructed Video
12
Layers of DTV Standard
Transmitter Process  Layer            Receiver Process
------------------------------------------------------
Production           PICTURE FORMATS  Display
Compression          COMPRESSION      Decompression
Packetizing          TRANSPORT        Depacketizing
Modulation           TRANSMISSION     Demodulation
13
Typical Video Sequences
Sequence  Type      Resolution  Frame Rate  Description of the first ten frames                                   Type of motion
---------------------------------------------------------------------------------------------------------------------------------------------------
Foreman   QCIF      176×144     30 fps      A man talking to a still camera                                       Slow limited motion
Foreman   CIF       352×288     30 fps      A man talking to a still camera                                       Slow limited motion
Football  CIF       352×288     30 fps      A part of a football game                                             Extensive motion
Mobile    CIF       352×288     30 fps      A train moving horizontally and a calendar moving vertically          Random multidimensional motion
City      SD(4CIF)  704×576     30 fps      A scene of a city taken with a panning camera                         Regular motion
Shields   HD(720p)  1280×720    60 fps      A person pointing at a group of shields while the camera is panning   Camera shooting of highly textured scenes
14
Typical Video Sequences
[Figure: relative frame sizes of the QCIF, CIF, 4CIF (SD), and 720p (HD) formats]
15
Typical Transmission and Storage Capacity
Media/Network           Capacity
---------------------------------------------------------
Ethernet LAN (10 Mbps)  Max. 10 Mbps; Typical 1-2 Mbps
ADSL                    Typical 1-2 Mbps (downstream)
ISDN-2                  128 kbps
V.90 modem              56 kbps downstream; kbps upstream
DVD-5                   4.7 Gbytes
CD-ROM                  640 Mbytes
16
Subjective Video Quality
• A subjective characteristic of video quality. It is
concerned with how video is perceived by a
viewer and designates his or her opinion on a
particular video sequence.
17
Objective Video Quality
• Mathematical models that successfully emulate
the subjective quality assessment results, based
on criteria and metrics that can be measured
objectively.
$$\mathrm{MSE} = \frac{1}{N M} \sum_{i,j} \left[ f(i,j) - F(i,j) \right]^2$$

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}}$$
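The two metrics can be sketched directly from their definitions. The sketch assumes that f is the original frame, F the reconstructed frame, N×M the frame size, and n the bit depth per sample; the slide does not spell these out, and the tiny 2×2 frames are invented for illustration.

```python
import math

def mse(f, F):
    """Mean squared error between two equally sized 2-D frames (lists of rows)."""
    n_rows, n_cols = len(f), len(f[0])
    total = sum((f[i][j] - F[i][j]) ** 2
                for i in range(n_rows) for j in range(n_cols))
    return total / (n_rows * n_cols)

def psnr(f, F, n_bits=8):
    """PSNR in dB for n_bits-per-sample frames; infinite for identical frames."""
    err = mse(f, F)
    if err == 0:
        return math.inf
    peak = (2 ** n_bits - 1) ** 2
    return 10 * math.log10(peak / err)

# Made-up 2x2 frames, for illustration only
original = [[52, 55], [61, 59]]
decoded = [[54, 55], [60, 58]]
print(mse(original, decoded))             # squared errors 4, 0, 1, 1 -> 1.5
print(round(psnr(original, decoded), 2))  # PSNR in dB
```

A higher PSNR means the reconstruction is closer to the original; identical frames give an infinite PSNR.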
18
R-D Performance
Better
Performance
19
Part II: Video Coders
20
Video Compression Fundamentals
• Compression is offered at different profiles and levels and works by exploiting
– Spatial redundancy
– Temporal redundancy
• A profile defines a set of coding tools or algorithms
that can be used in generating a compliant
bitstream.
• Within each profile, there are a number of levels
designed for a wide range of applications, bitrates,
resolutions, qualities, and services. A level places
constraints on certain key parameters of the
bitstream.
21
Levels and Profiles
22
Types of Video Redundancy
• Spatial: Intra-frame Coding
• Temporal: Inter-frame Coding
• Psycho-visual: Use fewer bits for chroma
• Statistical: Entropy Coding
23
Spatial Redundancy
• Take advantage of similarity among most
neighboring pixels
24
Loss of Resolution
Original (63 kb)
Low (7kb)
Very Low (4 kb)
25
Temporal Redundancy
• Take advantage of similarity between successive frames
[Figure: three successive frames, numbered 950, 951, and 952]
26
Chroma Sub-sampling
Y sample
Cb sample
Cr sample
4:4:4
4:2:2
4:2:0
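The saving from each sampling format can be counted directly. This is a small sketch, assuming 8-bit samples and the usual meaning of the three labels (full-resolution chroma, chroma halved horizontally, chroma halved in both directions); the helper names are illustrative.

```python
# (horizontal, vertical) chroma subsampling factors for each format
CHROMA_FACTORS = {
    "4:4:4": (1, 1),  # chroma at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}

def samples_per_frame(width, height, fmt):
    """Total number of Y, Cb and Cr samples in one frame."""
    h, v = CHROMA_FACTORS[fmt]
    luma = width * height
    chroma = 2 * (width // h) * (height // v)  # Cb plus Cr
    return luma + chroma

def bits_per_pixel(fmt, bit_depth=8):
    """Average bits per pixel, computed on a 2x2 pixel group."""
    return samples_per_frame(2, 2, fmt) * bit_depth / 4

for fmt in CHROMA_FACTORS:
    print(fmt, bits_per_pixel(fmt), samples_per_frame(352, 288, fmt))
```

For 8-bit samples this gives 24, 16, and 12 bits per pixel, so 4:2:0 halves the raw data relative to 4:4:4 before any other compression is applied.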
27
Types of Coding
• Intra-frame coding
– Review of JPEG
• Inter-frame coding
– Conditional Replenishment (CR) Coding
– Motion Compensated Predictive (MCP) Coding
• Object-based and scalable video coding
– Motion segmentation, scalability issues
28
A Tour of JPEG Coding Standard
Components
• Transform
– 8×8 DCT
– boundary padding
• Quantization
– uniform quantization
– DC/AC coefficients
• Coding
– Zigzag scan
– run length/Huffman coding
29
JPEG: Spatial Redundancy Reduction
“Intra-Frame
Encoded”
Quantization
• major reduction
• controls ‘quality’
Zig-Zag Scan,
Run-length
coding
30
Step 1: Transform
• DC level shifting: subtract 128 from every sample

$$
\begin{bmatrix}
183 & 160 & 94 & 153 & 194 & 163 & 132 & 165 \\
183 & 153 & 116 & 176 & 187 & 166 & 130 & 169 \\
179 & 168 & 171 & 182 & 179 & 170 & 131 & 167 \\
177 & 177 & 179 & 177 & 179 & 165 & 131 & 167 \\
178 & 178 & 179 & 176 & 182 & 164 & 130 & 171 \\
179 & 180 & 180 & 179 & 183 & 169 & 132 & 169 \\
179 & 179 & 180 & 182 & 183 & 170 & 129 & 173 \\
180 & 179 & 181 & 179 & 181 & 170 & 130 & 169
\end{bmatrix}
- 128 =
\begin{bmatrix}
55 & 32 & -34 & 25 & 66 & 35 & 4 & 37 \\
55 & 25 & -12 & 48 & 59 & 38 & 2 & 41 \\
51 & 40 & 43 & 54 & 51 & 42 & 3 & 39 \\
49 & 49 & 51 & 49 & 51 & 37 & 3 & 39 \\
50 & 50 & 51 & 48 & 54 & 36 & 2 & 43 \\
51 & 52 & 52 & 51 & 55 & 41 & 4 & 41 \\
51 & 51 & 52 & 54 & 55 & 42 & 1 & 45 \\
52 & 51 & 53 & 51 & 53 & 42 & 2 & 41
\end{bmatrix}
$$

• 2D DCT: applying the 8×8 DCT to the level-shifted block and rounding gives

$$
\begin{bmatrix}
313 & 56 & -27 & 18 & 78 & -60 & 27 & -27 \\
-38 & -27 & 13 & 44 & 32 & -1 & -24 & -10 \\
-20 & -17 & 10 & 33 & 21 & -6 & -16 & -9 \\
-10 & -8 & 9 & 17 & 9 & -10 & -13 & 1 \\
-6 & 1 & 6 & 4 & -3 & -7 & -5 & 5 \\
2 & 3 & 0 & -3 & -7 & -4 & 0 & 3 \\
4 & 4 & -1 & -2 & -9 & 0 & 2 & 4 \\
3 & 1 & 0 & -4 & -2 & -1 & 3 & 1
\end{bmatrix}
$$
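The transform step can be checked numerically. The sketch below assumes the slide uses the orthonormal 8×8 DCT-II and rounds the result to integers; the slide does not state the normalization, so treat that as an assumption. The direct four-loop form is chosen for readability, not speed, and the block values are the ones from the slide.

```python
import math

# Original 8x8 block of samples from the slide
BLOCK = [
    [183, 160,  94, 153, 194, 163, 132, 165],
    [183, 153, 116, 176, 187, 166, 130, 169],
    [179, 168, 171, 182, 179, 170, 131, 167],
    [177, 177, 179, 177, 179, 165, 131, 167],
    [178, 178, 179, 176, 182, 164, 130, 171],
    [179, 180, 180, 179, 183, 169, 132, 169],
    [179, 179, 180, 182, 183, 170, 129, 173],
    [180, 179, 181, 179, 181, 170, 130, 169],
]

def level_shift(block, offset=128):
    """Subtract the offset from every sample (DC level shifting)."""
    return [[p - offset for p in row] for row in block]

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block, computed directly."""
    n = len(block)

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * acc
    return out

coeffs = dct_2d(level_shift(BLOCK))
for row in coeffs:
    print([round(c) for c in row])
```

Under that assumption the DC coefficient (top-left) comes out as 313, which is one eighth of the sum of all level-shifted samples.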
31
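Step 2: Quantization
Q-table

$$
Q =
\begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}
$$

Why increase from top-left to bottom-right?

Each DCT coefficient is divided by the matching Q-table entry and rounded:

$$
\begin{bmatrix}
313 & 56 & -27 & 18 & 78 & -60 & 27 & -27 \\
-38 & -27 & 13 & 44 & 32 & -1 & -24 & -10 \\
-20 & -17 & 10 & 33 & 21 & -6 & -16 & -9 \\
-10 & -8 & 9 & 17 & 9 & -10 & -13 & 1 \\
-6 & 1 & 6 & 4 & -3 & -7 & -5 & 5 \\
2 & 3 & 0 & -3 & -7 & -4 & 0 & 3 \\
4 & 4 & -1 & -2 & -9 & 0 & 2 & 4 \\
3 & 1 & 0 & -4 & -2 & -1 & 3 & 1
\end{bmatrix}
\xrightarrow{\;Q\;}
\begin{bmatrix}
20 & 5 & -3 & 1 & 3 & -2 & 1 & 0 \\
-3 & -2 & 1 & 2 & 1 & 0 & 0 & 0 \\
-1 & -1 & 1 & 1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$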
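The quantization step is an element-wise division followed by rounding. The sketch below uses the Q-table and the rounded DCT coefficients of this example, with coefficient signs as obtained by recomputing the DCT of the level-shifted block. Rounding ties away from zero is an assumption, chosen because it reproduces the slide's result (for instance -60/40 becomes -2).

```python
Q_TABLE = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

# Rounded DCT coefficients of the example block
DCT_COEFFS = [
    [313,  56, -27,  18,  78, -60,  27, -27],
    [-38, -27,  13,  44,  32,  -1, -24, -10],
    [-20, -17,  10,  33,  21,  -6, -16,  -9],
    [-10,  -8,   9,  17,   9, -10, -13,   1],
    [ -6,   1,   6,   4,  -3,  -7,  -5,   5],
    [  2,   3,   0,  -3,  -7,  -4,   0,   3],
    [  4,   4,  -1,  -2,  -9,   0,   2,   4],
    [  3,   1,   0,  -4,  -2,  -1,   3,   1],
]

def round_half_away(num, den):
    """Round num/den to the nearest integer, ties away from zero (den > 0)."""
    sign = -1 if num < 0 else 1
    return sign * ((2 * abs(num) + den) // (2 * den))

def quantize(coeffs, q_table):
    """Divide each coefficient by its Q-table entry and round."""
    return [[round_half_away(c, q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, q_table)]

def dequantize(levels, q_table):
    """Decoder side: multiply each level by its Q-table entry."""
    return [[lv * q for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, q_table)]

levels = quantize(DCT_COEFFS, Q_TABLE)
for row in levels:
    print(row)
print(dequantize(levels, Q_TABLE)[0][0])  # 20 * 16 = 320, not the original 313
```

The last line shows why this step is lossy: the decoder can only recover multiples of the step size, so the DC coefficient 313 comes back as 320.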
32
Step 3: Entropy Coding

$$
\begin{bmatrix}
20 & 5 & -3 & 1 & 3 & -2 & 1 & 0 \\
-3 & -2 & 1 & 2 & 1 & 0 & 0 & 0 \\
-1 & -1 & 1 & 1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

Zigzag Scan:
(20,5,-3,-1,-2,-3,1,1,-1,-1,0,0,1,2,3,-2,1,1,0,0,0,0,0,0,1,1,0,1,EOB)
End Of the Block (EOB): all following coefficients are zero
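The scan order can be generated by walking the anti-diagonals of the block and alternating direction. This sketch reproduces only the flat list with an end-of-block marker shown on the slide; a real JPEG encoder goes on to code the DC term differentially and the AC terms as run/size pairs, which is not shown here. The function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) visiting order of the zigzag scan for an n x n block."""
    order = []
    for s in range(2 * n - 1):          # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)       # even diagonals run from bottom-left up
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Flatten a block in zigzag order, replacing the trailing zeros by 'EOB'."""
    seq = [block[r][c] for r, c in zigzag_order(len(block))]
    while seq and seq[-1] == 0:
        seq.pop()
    return seq + ["EOB"]

# Quantized block of this example
QUANTIZED = [
    [20,  5, -3, 1, 3, -2, 1, 0],
    [-3, -2,  1, 2, 1,  0, 0, 0],
    [-1, -1,  1, 1, 1,  0, 0, 0],
    [-1,  0,  0, 1, 0,  0, 0, 0],
    [ 0,  0,  0, 0, 0,  0, 0, 0],
    [ 0,  0,  0, 0, 0,  0, 0, 0],
    [ 0,  0,  0, 0, 0,  0, 0, 0],
    [ 0,  0,  0, 0, 0,  0, 0, 0],
]

print(zigzag_scan(QUANTIZED))
```

Because quantization leaves the non-zero values clustered in the top-left corner, the zigzag order pushes the zeros to the end of the list, where one EOB symbol replaces 36 of the 64 coefficients.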
33
Entropy Coding (e.g.)
• Mainly concerned with representation of information
Symbol  Occurrence probability  Code 1  Code 2
----------------------------------------------
a1      0.1                     000     0000
a2      0.2                     001     01
a3      0.5                     010     1
a4      0.05                    011     0001
a5      0.15                    100     001
34
Entropy Coding
• Lavg,1 = 3 bits per symbol
• Lavg,2 = 4x0.1+2x0.2+1x0.5+4x0.05+3x0.15
= 1.95 bits per symbol
• Code 2 with variable-length coding is more
efficient than code 1 with natural binary coding.
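The comparison can be reproduced from the symbol table of the example. The sketch also computes the source entropy, which the slides do not mention; it is added here only as the lower bound that any lossless code for these probabilities has to respect.

```python
import math

PROBS = {"a1": 0.1, "a2": 0.2, "a3": 0.5, "a4": 0.05, "a5": 0.15}
CODE_1 = {"a1": "000", "a2": "001", "a3": "010", "a4": "011", "a5": "100"}
CODE_2 = {"a1": "0000", "a2": "01", "a3": "1", "a4": "0001", "a5": "001"}

def average_length(code, probs):
    """Expected codeword length in bits per symbol."""
    return sum(probs[s] * len(code[s]) for s in probs)

def entropy(probs):
    """Shannon entropy of the source in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values())

def is_prefix_free(code):
    """True if no codeword is a prefix of another, so decoding is unambiguous."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

l_avg_1 = average_length(CODE_1, PROBS)
l_avg_2 = average_length(CODE_2, PROBS)
print(l_avg_1, round(l_avg_2, 2), round(entropy(PROBS), 3))
print(is_prefix_free(CODE_2))
```

Code 2 gives short codewords to frequent symbols (a3 gets a single bit) and stays prefix-free, which is how it reaches 1.95 bits per symbol, close to the source entropy of roughly 1.92 bits.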
35
Importance of Coeff’s
36
Types of Coding
• Intra-frame coding
– Review of JPEG
• Inter-frame coding
– Conditional Replenishment (CR) Coding
– Motion Compensated Predictive (MCP) Coding
• Object-based and scalable video coding
– Motion segmentation, scalability issues
37
Conditional Replenishment
• Based on motion detection rather than motion
estimation
• Partition the current frame into “still areas” and
“moving areas”
– Replenishment is applied to moving regions only
– Repetition is applied to still regions
• Need to transmit the location of moving areas as
well as new (replenishment) information
– No motion vectors transmitted
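A block-level version of this idea can be sketched as follows. The 8×8 block size, the mean-absolute-difference test, and the threshold value are all assumptions made for illustration; the slides do not say how the still and moving areas are detected.

```python
def moving_blocks(prev, curr, block=8, threshold=4.0):
    """Top-left corners of blocks whose mean absolute difference from the
    previous frame exceeds the threshold (motion detection, no estimation)."""
    h, w = len(curr), len(curr[0])
    moving = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            diff = sum(abs(curr[r][c] - prev[r][c]) for r in rows for c in cols)
            if diff / (len(rows) * len(cols)) > threshold:
                moving.append((top, left))
    return moving

def replenish(prev, curr, block=8, threshold=4.0):
    """Decoder-side view: repeat still blocks, replace only the moving ones."""
    out = [row[:] for row in prev]
    for top, left in moving_blocks(prev, curr, block, threshold):
        for r in range(top, min(top + block, len(curr))):
            out[r][left:left + block] = curr[r][left:left + block]
    return out

# Made-up 16x16 frames: only the bottom-right 8x8 block changes
prev_frame = [[0] * 16 for _ in range(16)]
curr_frame = [row[:] for row in prev_frame]
for r in range(8, 16):
    for c in range(8, 16):
        curr_frame[r][c] = 100

print(moving_blocks(prev_frame, curr_frame))  # [(8, 8)]
```

Only the block positions and the new pixel data for those blocks need to be sent; everything else is repeated from the previous frame.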
38
Conditional Replenishment
39
Motion Detection
40
From Replenishment to
Prediction
• A more powerful approach of exploiting temporal
dependency is prediction
– Locate the best match from the previous frame
– Use the history to predict the current
41
Motion-Compensated Predictive Coding
Residual and Motion vector
42
A Closer Look
Frame n-1
Frame n
43
MCP Key Components
Motion Estimation/Compensation
– At the heart of MCP-based coding
Coding of Motion Vectors (overhead)
– Lossless: errors in MV are catastrophic
Coding of MCP residues
– Lossy: distortion is controlled by the quantization
step-size
44
Motion Estimation and
Compensation
• Predict the current frame based on reference
frames while compensating for the motion.
• For each motion compensation block
– Find the block in the reference decoded frame that
gives the least distortion.
• If the distortion is too high then code the block
independently. (intra block)
• Otherwise code the difference (inter block)
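One common way to find the least-distortion block is an exhaustive search over a small window. The sketch below assumes the sum of absolute differences (SAD) as the distortion measure, integer-pel displacements, and an arbitrary intra/inter threshold; none of these choices is specified on the slide, and the test frames are synthetic.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced
    block in the reference frame."""
    return sum(abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))

def full_search(cur, ref, top, left, size=4, search=2):
    """Return (best_mv, best_sad) over all displacements within +/- search."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidates that fall outside the reference frame
            if not (0 <= top + dy and top + dy + size <= h
                    and 0 <= left + dx and left + dx + size <= w):
                continue
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

def choose_mode(best_cost, size, intra_threshold=20):
    """Intra if the average distortion per pixel is too high, else inter."""
    return "intra" if best_cost > intra_threshold * size * size else "inter"

# Synthetic frames: the current frame is the reference shifted right by one pixel
ref_frame = [[3 * r * r + 5 * c * c for c in range(8)] for r in range(8)]
cur_frame = [[row[max(c - 1, 0)] for c in range(8)] for row in ref_frame]

mv, cost = full_search(cur_frame, ref_frame, top=2, left=2)
print(mv, cost, choose_mode(cost, 4))  # (0, -1) 0 inter
```

The motion vector (0, -1) says the best match lies one pixel to the left in the reference frame, and since the match is exact the residual to be coded is zero.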
45
Motion Compensation
If motion estimation was effective
– little data left in difference macroblock
– more efficient compression.
46
MCP Key Components
Motion Estimation/Compensation
– At the heart of MCP-based coding
Coding of Motion Vectors (overhead)
– Lossless: errors in MV are catastrophic
Coding of MCP residues
– Lossy: distortion is controlled by the quantization
step-size
47
Motion Vector
48
Block-based Motion Model
• Block size
– Fixed vs. variable
• Motion accuracy
– Integer-pel vs. fractional-pel
• Number of hypotheses
– Overlapped Block Motion Compensation (OBMC)
– Multi-frame prediction
49
Block-based Motion Model
Reference Frame
Current Frame
MV
Search Window
50
Motion Vector Coding
• 2D lossless DPCM
– Spatially (temporally) adjacent motion vectors are
correlated
– Code Motion Vector Difference (MVD) instead of MVs
• Entropy coding techniques
– Variable length codes (VLC)
– Arithmetic coding
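The differential step can be sketched with the simplest possible predictor. This example predicts each motion vector from the previous one in scan order, which is an assumption made for brevity; actual codecs typically predict from several neighbouring blocks.

```python
def encode_mvd(mvs):
    """Replace each motion vector by its difference from the previous one."""
    prev = (0, 0)
    mvds = []
    for mv in mvs:
        mvds.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return mvds

def decode_mvd(mvds):
    """Invert encode_mvd by accumulating the differences."""
    prev = (0, 0)
    mvs = []
    for d in mvds:
        prev = (prev[0] + d[0], prev[1] + d[1])
        mvs.append(prev)
    return mvs

# Made-up motion vectors of neighbouring blocks, for illustration
mvs = [(3, -1), (3, -1), (4, -1), (4, 0), (4, 0)]
mvds = encode_mvd(mvs)
print(mvds)  # [(3, -1), (0, 0), (1, 0), (0, 1), (0, 0)]
```

Because neighbouring vectors are correlated, most differences are zero or close to it, which is what lets a variable-length or arithmetic coder spend few bits on them. The round trip through `decode_mvd` is exact, as required for lossless coding.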
51
MCP Key Components
Motion Estimation/Compensation
– At the heart of MCP-based coding
Coding of Motion Vectors (overhead)
– Lossless: errors in MV are catastrophic
Coding of MCP residues
– Lossy: distortion is controlled by the quantization
step-size
52
MCP Residue Coding
Transform
Quantization
Coding
Conceptually similar to JPEG
Transform: Unitary transform
Quantization: Deadzone quantization
Coding: Run-length coding
53
Transform
• Unitary matrix: A is real, $A^{-1} = A^T$
• Unitary transform: A is unitary, $Y = A X A^T$
• Examples
– 8-by-8 DCT
– 4-by-4 integer transform

$$
A =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
$$
54
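The 4-by-4 integer transform Y = A X A^T can be tried out with plain integer arithmetic. One point the sketch makes visible: the rows of this A are mutually orthogonal but not of unit length, so A A^T is diagonal rather than the identity; in H.264 the missing normalization is folded into the quantization step. The helper names and the flat test block are illustrative.

```python
A = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(P, Q):
    """Product of two square integer matrices."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def forward_transform(X):
    """Y = A X A^T, using integer arithmetic only."""
    return matmul(matmul(A, X), transpose(A))

print(matmul(A, transpose(A)))  # diagonal matrix with 4, 10, 4, 10

# A flat 4x4 residual block puts all of its energy into the top-left coefficient
flat = [[10] * 4 for _ in range(4)]
print(forward_transform(flat)[0][0])  # 160
```

Working in integers means encoder and decoder compute exactly the same values, so no mismatch can accumulate between them.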
Types of Coding
• Intra-frame coding
– Review of JPEG
• Inter-frame coding
– Conditional Replenishment (CR) Coding
– Motion Compensated Predictive (MCP) Coding
• Object-based and scalable video coding
– Motion segmentation, scalability issues
55
Object-based Video Coding
• Waveform-based coding discussed so far uses a
simple source model (e.g., H.261/263/264, MPEG-1/-2)
– Does not consider the semantic content (e.g. objects and
their shape) of the video
• Object-based video coding identifies objects (or
regions) in a video and encodes them. Potential
benefits may include
– Improved coding efficiency
– Improved visual quality (e.g., no blocking artifacts)
– Content description
– Content-based interactivity
• Also called “content-dependent video coding”
56
Essential Tasks in Object-based
Video Coding
• Object/region segmentation
– Separate pixels based on their color, texture, motion
characteristics
– The major challenge in content/object-based coding
• 2D shape modeling and coding
– Not all shapes are equally probable
• 2D texture modeling and coding
– Extension of existing block-based MCP into region-based
– Shaped DCT
57
Motion-based Segmentation
• Motion-based segmentation: to segment an
image using motion information
– We can first estimate the motion field and then
segment the motion field
58
2-D Shape Modeling and Coding
• Bitmap coding: a binary map specifying whether
or not a pixel belongs to an object
– A special case of the general alpha-map
• Contour coding: code only the contour of the
object or the region
– Chain codes
– Polygon approximation
– Spline approximation
59
2-D Texture Modeling and Coding*
• Shape-adaptive DCT
• Shape-adaptive wavelet transform
60
Part III: Video Compression Standard
61
History and Naming
62
ITU organization with subgroups
relevant for video
63
IEC and ISO
64
ISO/IEC organization with subgroups
relevant for video
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
65
Video Compression Quality Road Map
66
Part IV: MPEG
67
Familiar things (1990-1995)
• MPEG-1: “Coding of moving pictures and associated audio for digital storage media” (1992)
– Target was VHS Quality at 1.5 Mbit/s
– Basis of Video-CD
– MP3 is still with us! (MPEG-1 Layer 3)
68
Familiar things (1995-2000)
• MPEG-2: “Generic coding of Moving Pictures and Associated Audio”
– Broadcasting and storage
– Bitrates: 4-9 Mbit/s
– Satellite TV, DVD
• MPEG-3?
– Aimed to do High Definition TV (HDTV)
– MPEG-2 could do that anyway
– Folded into MPEG-2
• MPEG-4: “Coding of audio-visual objects”
– Started as very low-bitrate project
– Turned out to be much more:
• Coding of media objects
• 64 kbps to 240 Mbps (Part 10/H.264)
• Synthetic/Semi-synthetic objects
• XMT: Like HTML, but to build videos
• First standard with Intellectual Property Management
69
Present & Future (2000-2010)
• MPEG-4 Part 10: Advanced Video Coding / H.264
– Designed by a Joint MPEG and ITU-T group
– Claims 50% bitrate savings over MPEG-2, 30% over MPEG-4!
• H.265 aims to have 50% better compression
• MPEG-7: “Multimedia Content Description Interface” (2001)
– Describing audio/video
• Applications
– Indexing of video databases
– Search & Retrieval
– Browsing
70
MPEG-1 = JPEG + Motion Prediction + Rate Control
• Early motivation: to encode motion video at 1.5 Mbit/s for transport over T1 data circuits and for replay from CD-ROM
• Defines the decoder but not the encoder
• Frames (pictures)
– Intra-coded using JPEG
– Inter-coded (Predicted and Bi-directional) using (interpolated) motion estimation & compensation and JPEG for the residuals
• MacroBlocks (MBs)
– 16×16 pixels block
• Rate control
– buffer at each end
– Test Model 5 (TM5)
71
MPEG-1 – Motion Prediction
• Motion prediction = motion estimation + error compensation
72
Group of Pictures
• I frames are independently encoded
• P frames are based on previous I, P frames
– Can send motion vector plus changes
• B frames are based on previous and following I and P frames
– In case something is uncovered
• Group of Pictures (GOP): starts with an I-frame and ends with the frame right before the next I-frame
73
Typical Compress. Performance
Type  Size    Compression
---------------------
I     18 KB   7:1
P     6 KB    20:1
B     2.5 KB  50:1
Avg   4.8 KB  27:1
---------------------
Note, results are Variable Bit Rate,
even if frame rate is constant
74
MPEG-2 = MPEG-1 + ?
• Improvements
– Color space: could support 4:2:2 and 4:4:4 coding
– Quantization: could have 9- or 10- bit precision for DC coefficients,
also Improved coding efficiency by different quantization, VLC tables
– Concealment motion vectors: used when an intra-MB is lost
– Pan and Scan: supports display of different aspect ratios, e.g., 16:9
• Profiles and levels
– Profiles: define the tools or syntactical elements
– Levels: define the permissible ranges of parameters
• Interlace tools
• Scalable coding profiles
• System layer: define two bit stream constructs
– Program stream (PS): modeled on MPEG-1 (backward compatibility)
– Transport stream (TS): more robust, does not need a common time
base, designed for use in error-prone environment.
75
MPEG-2 – Interlace Tools
• Interlaced Scanning: Image flicker is less apparent because the image is painted twice as many times as in non-interlaced scanning.
• Frame Pictures and Field Pictures
– two fields are processed sequentially or not
• Frame DCT and Field DCT
– Field pictures usually use field DCT
– Frame pictures use field DCT when there is obvious vertical motion
• Frame Prediction and Field Prediction
[Figure: Frame DCT vs. Field DCT]
76
MPEG-4 = MPEG-2+Objects+Other
Enhancements
• Objects (optional)
– Video (texture + shape), image, audio, speech, text, etc.
– Encoded using different techniques
– Transmitted independently
• Improvements in MPEG-4 version2
– Global motion compensation (GMC)
– Quarter pixel motion compensation
– Shape-adaptive DCT
• Why is MPEG-4 not as successful as MPEG-2?
– Not substantially better than MPEG-2
– Issue of licensing
77
MPEG4 Coder
78
Shape Adaptive DCT
79
MPEG Comparison
[Figure: R-D Performance of MPEG Codecs. PSNR (Y), from 32 to 50, plotted against bit rate, from 350 to 1050 kbps, for MPEG-1, MPEG-2, MPEG-4, and H.264]
80
Part V: ITU-T Rec.
81
ITU-T Rec. H.261
• International standard for ISDN picture phones and for video
conferencing systems (1990)
• Image format: CIF (352 x 288 Y samples) or QCIF (176 x 144 Y samples), frame rate 7.5 ... 30 fps
• Bit-rate: multiple of 64 kbps (= ISDN-channel), typically 128 kbps
including audio.
• Picture quality: for 128 kbps acceptable with limited motion in the
scene
• Stand-alone videoconferencing system or desk-top
videoconferencing system, integrated with PC
• Sampling format: 4:2:0
82
ITU-T Rec. H.263
• International standard for picture phones
over analog subscriber lines (1995)
• Image format usually CIF, QCIF or SubQCIF, frame rate usually below 10 fps
• Bit-rate: arbitrary, typically 20 kbps for
PSTN
• Picture quality: with new options as
good as H.261 (at half rate)
• Software-only PC video phone or TV
set-top box
• Widely used as compression engine for
Internet video streaming
• H.263 is also the compression core of
the MPEG-4 standard
83
H.261 vs. H.263
• Improved motion compensation
– H.261 (1990): integer-pel accuracy, loop filter, 1 motion vector per
MB
– H.263 (1995): half-pel accuracy, no loop filter, 1 motion vector per
MB
• Improved 3-D VLC for DCT coefficients (last, run, level)
• Reduced overhead
• Support more picture sizes and sampling formats
• More optional features in H.263++ (H.263 as of 2001)
84
Performance of H.263 and H.261
85
H.264/AVC
• H.264, a.k.a. MPEG-4 Part 10, is a digital video codec standard
• Written by the ITU-T Video Coding Experts Group (VCEG) and the
ISO/IEC Moving Picture Experts Group (MPEG) as the product of a
collective partnership effort known as the Joint Video Team (JVT).
• The ITU-T H.264 standard and the ISO/IEC MPEG-4 Part 10
standard (formally, ISO/IEC 14496-10) are technically identical, and
the technology is also known as AVC, for Advanced Video Coding.
• Goals:
– good video quality at bit rates that are substantially lower (e.g., half or less) than what previous standards would need, without excessive complexity
– applied to a very wide variety of applications (e.g., for both low and high
bit rates, and low and high resolution video) and
– to work well on a very wide variety of networks and systems (e.g., for
broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia
telephony systems).
http://en.wikipedia.org/wiki/H.264
86
Possible applications of H.264
– Conversational services operated
below 1Mbps with low latency.
• ISDN-based H.320, H.324, H.323
– Entertainment services operated between 1-8+ Mbps with
moderate latency such as 0.5-2s in modified MPEG2/H.222.0 systems.
• Broadcast via satellite, cable, terrestrial or DSL
• DVD for standard and high-definition video
• Video-on-demand via various channels
– Streaming services operated at 50-1500kbps with 2s or
more of latency.
87
Part VI: Comparisons
88
Comparisons
Prediction in space domain
– H.264: spatial prediction; encode the prediction modes (use predictive coding if 4x4 modes are used)
– MPEG-1/2/4, H.261/3: no spatial prediction
Transform
– H.264: integer transform of residue
– MPEG-1/2/4, H.261/3: 8x8 Discrete Cosine Transform (DCT) for pixel values
Quantization
– H.264: quantization including scaling
– MPEG-1/2/4, H.261/3: quantization
Prediction in frequency domain
– H.264: no coefficient prediction
– MPEG-1/2/4, H.261/3: coefficient prediction (for DC values in MPEG-2 and AC values in the first row and column in MPEG-4)
89
Comparisons
References
– H.264: permits up to 15 (2 mostly used) reference pictures; bi-predictive B-slices; a P-slice may reference a picture that has B-slices; supports explicit weighting coefficients and (a+b)/2 type
– MPEG-1/2/4, H.261/3: a P-slice references only one I-picture; bi-directional B-slices; only permit (a+b)/2 type prediction weighting
Block Sizes
– H.264: tree-structured (16x16 → 16x8, 8x16, 8x8 → 8x4, 4x8, 4x4)
– MPEG-1/2/4, H.261/3: either 16x16 or 8x8
Motion Estimation
– H.264: half or ¼-pixel accuracy; 6-point interpolation for half-pixel and 2-point linear interpolation for ¼-pixel
– MPEG-1/2/4, H.261/3: MPEG2 permits half-pixel accuracy and MPEG4 permits ¼-pixel accuracy; 2-point linear interpolation
90
Comparison: PSNR
• H.264 improves the coding efficiency over other coding
methods
91
Windows Media Codec
• Microsoft formed the codec team in 1994 and
shipped multiple codecs in Windows Media Player
– Early involvement with MPEG4
– Until now: VC-1
• Windows Media Video 9 codec released in
January 2003
• Widely integrated with various Microsoft products
– Windows Media Encoder, Movie Maker, Media Player,
Office communicator/Instant messenger, Media Center,
Windows CE, Xbox, MSN Video, MS IPTV etc
92
Scope of VC-1
• VC-1 is an 8 bit 4:2:0 format
– Both interlace and progressive modes supported
• WMV-9 is the Microsoft implementation of VC-1
• VC-1 technology is scalable over a wide range
of applications
– Next-generation HD DVD (7-15 Mbps)
– Video over IP / VOD (0.3-2 Mbps)
– “Internet” video and wireless (<30-500 kbps)
Why VC-1?
• Video quality / bit efficiency
– 3x better than MPEG-2
– 2x better than MPEG-4
• SD (D1) quality @ 1-2 Mbps
• HD (720/1080) quality @ 5-12 Mbps
– Equivalent performance to H.264
• Low Complexity – processor efficient algorithm
– Only around 40% more complex to decode than MPEG-4 simple profile
– Around half as complex to decode as H.264
– 1080p software decoding possible on today’s PCs
– Reduced power consumption for portable devices
VC-1 Architecture
• VC-1 uses block motion compensation & block transform
– Basic architecture is similar to MPEG2, MPEG4, H.264 etc, but details are very different
– I, P and B frames
• Important VC-1 Components
– Motion compensation
– Transform
– Entropy coding
– Loop filter
– Intra coding
– Interlace coding
– Post processing
Decoding Process Block Diagram for VC-1 Advanced Profile
[Figure: block diagram in two parts. Conforming implementation: bitstream parsing; intra path (VLC decode coeffs, AC/DC pred, inverse quant, inverse trans, overlap smooth); inter path (VLC decode coeffs, inverse quant, inverse trans; VLC decode MV, MV pred, motion comp with ½ or ¼ pel interpolation, 1 MV / 4 MV, intensity comp, decoded picture buffer); loop filter, range map, decoded frame. Implementation specific: dering/deblock, color conv, resize, display process, etc.]
Quality Comparison
Complexity of VC-1
• VC-1 decoding is ~ 1.8x faster than H.264/AVC decoding
– H.264/AVC ~ 3x slower than MPEG-2 MP / MPEG-4 SP
• Based on H.264’s own studies
– VC-1 ~ 1.6x slower than MPEG-2 MP / MPEG-4 SP
• Based on empirical data, e.g., testing VC-1 (WMV-9) vs. MPEG-4 SP on x86,
ARM based systems
• 1080p VC-1 software decoding possible on today’s PCs
– DVD companion discs with 720/1080p
Millions of ARM cycles/second
Sequence   VC-1 Main  H.264/AVC Baseline (Optimized by Nokia)
-------------------------------------------------------------
Foreman    27         38
News       17         22
Container  19         24
Silent     18         25
Glasgow    25         30
Average    21.2       27.8
Experimental data from 3GPP