ECE-C490 Winter 2000 Image Processing Architecture
ECEC 453
Image Processing Architecture
Lecture 12, 3/3/2004
MPEG and Friends
Oleh Tretiak
Drexel University
Lecture 12
Image Processing Architecture, © 2001-2004 Oleh Tretiak
Page 1
Lecture Outline
Review of Teleconferencing
Advanced Video Coding
Computational cost of video
Picture of Layers
Video Compression: Picture Types
Group of Pictures: Three types
I — intraframe coding only
P — predictive coding
B — bi-directional coding
Teleconferencing Standards
Digital video areas
Broadcast television
Recorded programs
Two-way communications
Review: Action in the Video Arena
The sponsors: ITU-T SG 15 and ISO/IEC MPEG
The players: H.x standards and MPEG-x standards
Standards, ITU-T (Telecom Guys)
H.261 (1990)
H.263 (draft March 1995)
New standards in the works
Standards, ISO/IEC (Entertainment Video)
MPEG family
Review: Video Telephone System
H.261
H.221
H.200/AV.250 -Series
H.320
Review: H.261 Features
Common Interchange Format
Interoperability between 25 fps and 30 fps countries
352 pix/line, 288 lines, 30 fps noninterlaced
Terminal equipment converts frame and line numbers
Y Cb Cr components, color sub-sampled by a factor of 2 in both
directions
Coding
DCT, 8x8, 4 Y and 2 chrominance blocks per macroblock
I and P frames only, P blocks can be skipped
Motion compensation optional, only integer compensation
(Optional) forward error correction coding
H.324/H.263
H.324: Like H.320
H.261/H.263
H.223
G.723.1
H.245
signaling
H.233, H.234
encryption
Parts of H.324
H.263: Video coding for low rate communications
G.723.1: Audio and speech for multimedia, 5.3 and 6.3 kbps
H.223: Multiplexing protocol
H.245: Control protocol. Can be used to specify standard, LAN,
and ATM networks
Features of H.263
Intended for lower rates than H.261, including 28.8 kbit/s modems
Includes QCIF (176 x 144) and sub-QCIF (128 x 96 in the Y channel) formats
Optional error correction for mobile channels
Half-pixel accuracy motion compensation
Differential encoding of motion vectors
Improved coding of DCT coefficients
Optional advanced coding options
better SNR at the same rate, lower rate at the same SNR
50% more complex than basic H.261
Picture Formats for H.263
Image size by format
Format      Y              Cb, Cr
sub-QCIF    128 x 96       64 x 48
QCIF        176 x 144      88 x 72
CIF         352 x 288      176 x 144
4CIF        704 x 576      352 x 288
16CIF       1408 x 1152    704 x 576
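The chroma sizes in the table follow from the luma sizes by halving each dimension. A minimal sketch of that relationship is given here; the struct and function names are hypothetical and are not taken from the H.263 text.

```c
#include <assert.h>

/* Luma (Y) dimensions of one H.263 picture format. */
struct picture_format {
    const char *name;
    int y_width;
    int y_height;
};

/* Luma sizes copied from the table of picture formats. */
static const struct picture_format H263_FORMATS[] = {
    {"sub-QCIF", 128, 96},
    {"QCIF", 176, 144},
    {"CIF", 352, 288},
    {"4CIF", 704, 576},
    {"16CIF", 1408, 1152},
};

/* Cb and Cr are subsampled by a factor of 2 in each direction. */
static int chroma_dim(int luma_dim)
{
    assert(luma_dim % 2 == 0);
    return luma_dim / 2;
}

/* Total number of samples (Y + Cb + Cr) in one frame. */
static long samples_per_frame(const struct picture_format *f)
{
    long y = (long)f->y_width * f->y_height;
    long c = (long)chroma_dim(f->y_width) * chroma_dim(f->y_height);
    return y + 2 * c;
}
```

For QCIF, `samples_per_frame` gives 25344 luma samples plus 2 x 6336 chroma samples, that is, 1.5 samples per pixel.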
All JPEG, ~12 Kbytes
[Figure: the same image JPEG-coded at four resolutions: 551x369, 389x261, 327x219, 231x155]
Experimental Procedure
Original image subsampled (using Photoshop®) to various resolutions (pixel count from max to max/8)
Each subsampled image JPEG coded to various quality levels with Matlab®
A group of images with ~ 12 Kbytes per image is compared
Result: Subsampling + JPEG coding is better, at given total bits,
than just JPEG coding
Future of Low-Rate Video
Solution looking for a user?
‘Picturephone’ - not popular
Liked by inventors, surveys of the public less than enthusiastic
Videoconferencing: some success, but limited acceptance
What is needed to make it successful?
Advanced Video Coding
H.263 and MPEG-4 based on ~1995 technology
After 1995, MPEG and VCEG (video coding) started working on
a new low-rate standard (H.26L)
Rec H.264 released in September 2002
Information on http://www.vcodex.com/ (some is on our web
site)
Site maintained by Ian Richardson, who has written books about
video coding
AVC Encoder
AVC Decoder
New Features
Prediction in I pictures
Different block transform
Different Block Sizes
Changes in motion compensation
VLC and arithmetic coding
I Picture Prediction
System operates with 4x4 blocks and 16x16 macroblocks
9 Prediction Modes for 4x4 Blocks
4 Modes for 16x16 Macroblocks
Mode 0: Vertical, extrapolate from upper samples
Mode 1: Horizontal, extrapolate from left samples
Mode 2: DC, mean of upper and left-hand samples
Mode 3: Plane, linear fit to left and upper samples
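The first three macroblock modes can be sketched in a few lines of C. This is an illustration only: the names are hypothetical, the DC rounding (add 16, divide by 32) is an assumption for the case where both neighbours are available, and mode 3 (plane) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MB 16  /* macroblock width and height */

enum intra16_mode { PRED_VERTICAL = 0, PRED_HORIZONTAL = 1, PRED_DC = 2 };

/*
 * Fill pred[][] from the row of reconstructed samples above the
 * macroblock (upper[]) and the column to its left (left[]).
 * Mode 3 (plane) is not implemented in this sketch.
 */
static void predict_16x16(enum intra16_mode mode,
                          const uint8_t upper[MB], const uint8_t left[MB],
                          uint8_t pred[MB][MB])
{
    int sum = 0;

    assert(mode == PRED_VERTICAL || mode == PRED_HORIZONTAL || mode == PRED_DC);

    for (int i = 0; i < MB; i++)
        sum += upper[i] + left[i];
    /* Mean of the 32 neighbouring samples, rounded to nearest. */
    uint8_t dc = (uint8_t)((sum + MB) / (2 * MB));

    for (int r = 0; r < MB; r++) {
        for (int c = 0; c < MB; c++) {
            switch (mode) {
            case PRED_VERTICAL:   pred[r][c] = upper[c]; break; /* copy down */
            case PRED_HORIZONTAL: pred[r][c] = left[r];  break; /* copy across */
            case PRED_DC:         pred[r][c] = dc;       break; /* flat block */
            }
        }
    }
}
```

An encoder would typically try each mode, subtract the prediction from the actual macroblock, and keep the mode with the cheapest residual.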
Different Block Transform
Basically, 4x4 DCT
Scanning sequence for 16x16 macroblock is shown below
4x4 and 2x2 DC coefficients transformed (again)
4x4 DCT Tricks
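The 4x4 DCT of a block X is
\[
Y = A X A^{T}, \qquad
A = \begin{bmatrix}
a & a & a & a \\
b & c & -c & -b \\
a & -a & -a & a \\
c & -b & b & -c
\end{bmatrix}
\]
\[
a = \tfrac{1}{2}, \qquad b = 0.707\cos(\pi/8), \qquad c = 0.707\cos(3\pi/8)
\]
Trick: Y = (C X C^{T}) .* E, where .* is element-by-element multiplication
\[
C = \begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}, \qquad
E = \begin{bmatrix}
a^{2} & ab/2 & a^{2} & ab/2 \\
ab/2 & b^{2}/4 & ab/2 & b^{2}/4 \\
a^{2} & ab/2 & a^{2} & ab/2 \\
ab/2 & b^{2}/4 & ab/2 & b^{2}/4
\end{bmatrix}
\]
The factorization is exact only if c/b = 1/2; the true ratio is about 0.414, so C defines an approximation to the DCT.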
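The point of the trick is that the matrix part, C X C^T, needs only integer additions, subtractions and multiplications by 2. A minimal sketch of that integer core follows; the function name is hypothetical, and the scaling by E is deliberately left out, on the assumption that it is folded into the quantizer.

```c
#include <assert.h>

/* Integer matrix C of the forward 4x4 transform (entries are +-1 and +-2). */
static const int CF[4][4] = {
    {1,  1,  1,  1},
    {2,  1, -1, -2},
    {1, -1, -1,  1},
    {1, -2,  2, -1},
};

/* w = CF * x * CF^T, using integer arithmetic only (no scaling by E). */
static void core_transform_4x4(int x[4][4], int w[4][4])
{
    int t[4][4];

    assert(x && w);

    /* First pass: t = CF * x. */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            t[i][j] = 0;
            for (int k = 0; k < 4; k++)
                t[i][j] += CF[i][k] * x[k][j];
        }
    }
    /* Second pass: w = t * CF^T. */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            w[i][j] = 0;
            for (int k = 0; k < 4; k++)
                w[i][j] += t[i][k] * CF[j][k];
        }
    }
}
```

For a block of all ones, the only nonzero output is w[0][0] = 16; multiplying by the E entry a^2 = 1/4 gives the DC value 4, which is what the orthonormal 4x4 DCT produces for that block.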
Motion Compensation Ideas
Adaptive motion compensation blocks:
16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Coding Ideas
Constant quantizer value
Zig-zag scan with novel run-length code
Arithmetic coding an option
Motion vectors to 1/4 pixel
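As a rough illustration of the zig-zag scan item, the sketch below reorders a 4x4 block of quantized coefficients and emits plain (run, level) pairs. The pair format is a textbook simplification chosen for this example, not the run-length code that AVC actually uses, and all names are hypothetical.

```c
#include <assert.h>

/* Zig-zag scan order for a 4x4 block, as indices into raster order. */
static const int ZIGZAG_4X4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* One nonzero coefficient and the number of zeros that precede it. */
struct run_level {
    int run;
    int level;
};

/*
 * Scan coeff[] (raster order) along the zig-zag path and write one
 * (run, level) pair per nonzero coefficient. Returns the number of
 * pairs; trailing zeros are implied by the end of the list.
 */
static int zigzag_run_level(const int coeff[16], struct run_level out[16])
{
    int n = 0;
    int run = 0;

    assert(coeff && out);

    for (int k = 0; k < 16; k++) {
        int v = coeff[ZIGZAG_4X4[k]];
        if (v == 0) {
            run++;
        } else {
            out[n].run = run;
            out[n].level = v;
            n++;
            run = 0;
        }
    }
    return n;
}
```

Because quantization leaves most high-frequency coefficients at zero, the scan tends to group the nonzero values at the front and end with one long run of zeros.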
Loop Filter
Concept to overcome block artifacts
Average across inter-block lines if difference is too big
Difference threshold depends on coding mode (intra or inter) and quantization step size
Example of Loop Filter
Summary: AVC
16 - 4 Block size reminiscent of wavelet
Flexible scheme of motion compensation
New software and hardware for videoconferencing are using this standard
Will broadband bring on the age of Picturephone?
Measuring Complexity
Count RISC-like operations
r1 = a + b
a, b in external memory, r1 ~ register
3 operations, 2 loads and one add
Example: 2-D 8 point DCT, YCbCr frames, 4:2:0 sampling, 15 frames/second
DCT:
8 coefficient loads, 8 data loads, 8 multiply & add, one store -> 25 ops. 2x25 (for 2D) -> 50 ops per sample -> 3200 ops per block
Y is 176x144, Cr & Cb are 88x72 -> 594 blocks
Processing rate = 3200 x 594 x 15 = 28.5 MOPS
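The arithmetic in this example can be checked with a short calculation. The sketch below only restates the counting rule from the slide; the function names are hypothetical.

```c
#include <assert.h>

enum { BLOCK = 8 };  /* DCT block width and height */

/* Number of 8x8 blocks in one width x height sample plane. */
static long blocks_in_plane(int width, int height)
{
    assert(width % BLOCK == 0 && height % BLOCK == 0);
    return (long)(width / BLOCK) * (height / BLOCK);
}

/* 8x8 blocks in one 4:2:0 frame: Y plus half-size Cb and Cr planes. */
static long blocks_per_frame_420(int y_width, int y_height)
{
    return blocks_in_plane(y_width, y_height)
         + 2 * blocks_in_plane(y_width / 2, y_height / 2);
}

/* Operations for one 8x8 block with the matrix DCT. */
static long dct_ops_per_block(void)
{
    const long ops_per_output = 8 + 8 + 8 + 1;      /* loads, loads, MACs, store */
    const long ops_per_sample = 2 * ops_per_output; /* row pass + column pass */
    return ops_per_sample * BLOCK * BLOCK;
}

/* DCT load in millions of operations per second. */
static double dct_mops(int y_width, int y_height, int fps)
{
    return (double)dct_ops_per_block()
         * (double)blocks_per_frame_420(y_width, y_height) * fps / 1e6;
}
```

Calling `dct_mops(176, 144, 15)` gives 396 + 2 x 99 = 594 blocks per frame and 28.512 MOPS, which rounds to the 28.5 MOPS quoted above.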
Processing Requirements
Range of standards
H.263 to HDTV
Processing options
Pentium Computers
RISC Computers
either can be in multiprocessor configurations
RISC cores
DSP systems
(High-end) commodity computers
Decode MPEG-1, MPEG-2, H.261
Encode H.261 at ~10 frames per second
Design of Video Coders
Feet first
Design options
Software
What platform? Pentium, RISC, choices of clock speed, bus architecture, memory
Hardware
DSP, ASIC, PLA, choices of architecture
Complete the designs
Evaluate - performance, cost
Expensive and time consuming
Forecast
Preliminary designs, preliminary evaluation
Complexity Measures, MIPS
Choose among outcomes from preliminary designs
Examine best designs in detail
More alternatives can be examined
Example: DCT
Hardware choices
RISC, DSP
RISC - more versatile instruction set
DSP - faster execution
Algorithm choices
Separable matrix implementation
Regular dataflow
Parallelizable
DCT
Fast Algorithm
Fewer operations
Less regular dataflow
RISC, conventional computer
DCT: DSP, Matrix, Separable
Basic operation
\[
y_i = \sum_{j=1}^{8} c_{ij}\, x_j, \qquad i = 1, 2, \ldots, 8
\]
8 data loads, 8 coefficient loads, 8 multiply-accumulate
operations, one data store = 25 operations per output coefficient
Basic operation: s = s + c *x
Repeat for 8 output values = 8x25 = 200 ops
Do on 8 rows = 1600 ops
(Separable) Do on 8 columns = 1600 ops
Assume coefficients are kept in a register file (fast)
Total 3200 ops (1024 loads, 128 stores)
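The operation count can be reproduced by instrumenting a straightforward separable implementation. The sketch below is an illustration with hypothetical names; it takes the coefficient matrix as an argument and counts coefficient fetches separately from data loads, since the slide assumes the coefficients sit in a register file. The column gather and scatter copies are not counted.

```c
#include <assert.h>

enum { N = 8 };  /* transform length */

struct mat8 {
    double v[N][N];
};

struct op_count {
    long data_loads;   /* samples read from memory */
    long coeff_loads;  /* coefficient fetches (register file) */
    long macs;         /* multiply-accumulate operations */
    long stores;       /* results written to memory */
};

/* 1-D transform of one 8-point vector: y[i] = sum over j of c[i][j] * x[j]. */
static void transform_1d(const struct mat8 *c, const double x[N],
                         double y[N], struct op_count *ops)
{
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        for (int j = 0; j < N; j++) {
            s += c->v[i][j] * x[j];  /* basic operation: s = s + c * x */
            ops->data_loads++;
            ops->coeff_loads++;
            ops->macs++;
        }
        y[i] = s;
        ops->stores++;
    }
}

/* Separable 2-D transform: 8 row passes, then 8 column passes. */
static void transform_2d(const struct mat8 *c, const struct mat8 *x,
                         struct mat8 *y, struct op_count *ops)
{
    struct mat8 t;
    double in[N], out[N];

    assert(c && x && y && ops);

    for (int r = 0; r < N; r++)
        transform_1d(c, x->v[r], t.v[r], ops);

    for (int col = 0; col < N; col++) {
        for (int r = 0; r < N; r++)
            in[r] = t.v[r][col];
        transform_1d(c, in, out, ops);
        for (int r = 0; r < N; r++)
            y->v[r][col] = out[r];
    }
}
```

One call to `transform_2d` records 1024 data loads, 1024 coefficient fetches, 1024 multiply-accumulates and 128 stores, for 3200 operations in total; the 1024 data loads and 128 stores are the memory references.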
DCT - RISC implementation
From jfdctfst.c (ftp://ftp.uu.net/graphics/jpeg/)
Arai, Agui, and Nakajima's algorithm for scaled DCT
Fast 8 point DCT, repeated over rows and columns
Integer implementation
Compiled with gcc, level 2 optimization, for SPARC (Sun) processor
Features
8 data loads and 8 stores per 1-D DCT; all other ops are register-to-register
No multiplications (shifts and adds)
90 instructions per 8 point DCT
Total of 16x90 = 1440 instructions (128 loads, 128 stores)
Sample of DCT code
tmp0 = dataptr[0] + dataptr[7];
tmp7 = dataptr[0] - dataptr[7];
tmp1 = dataptr[1] + dataptr[6];
tmp6 = dataptr[1] - dataptr[6];
ld [%o1],%i0
ld [%i4],%g2
ld [%i4-24],%i1
ld [%i4-4],%g3
ld [%i4-20],%i2
add %i0,%g2,%i5
sub %i0,%g2,%o2
add %i1,%g3,%g4
ld [%i4-8],%i0
sub %i1,%g3,%o7
ld [%i4-16],%g3
addcc %o3,-1,%o3
ld [%i4-12],%g2
add %i2,%i0,%i1
sub %i2,%i0,%i2
add %g3,%g2,%i0
DCT Code - more
#define FIX_0_382683433 ((INT32) 98) /* FIX(0.382683433) */
98 = binary 1100010
z5 = MULTIPLY(tmp10 - tmp12, FIX_0_382683433);

add %o7,%o2,%i0
sub %i3,%i0,%g3
sll %g3,1,%g2
add %g2,%g3,%g2
sll %g2,4,%g2
add %g2,%g3,%g2
sll %g2,1,%g2    ! dumb
sra %g2,8,%g3

g3=(((g3+2*g3)*16+g3)*2)/256
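The shift-and-add sequence generated by the compiler can be written back in C to confirm that it computes the same value as a multiplication by 98 followed by a right shift of 8. This is a sketch: `int32_t` stands in for the library's `INT32`, `CONST_BITS` is assumed to be 8 to match the final shift, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 8                          /* fraction bits, matching sra ...,8 */
#define FIX_0_382683433 ((int32_t) 98)        /* 0.382683433 * 256, rounded */

/* Reference: ordinary multiply followed by the descaling shift. */
static int32_t multiply_ref(int32_t x)
{
    return (x * FIX_0_382683433) >> CONST_BITS;
}

/* Same product with shifts and adds only: 98 = ((1 + 2) * 16 + 1) * 2. */
static int32_t multiply_shift_add(int32_t x)
{
    /* Restrict to non-negative inputs, where C fully defines the shifts. */
    assert(x >= 0);

    int32_t t = (x << 1) + x;  /* sll 1, add:  3x  */
    t = (t << 4) + x;          /* sll 4, add: 49x  */
    t <<= 1;                   /* sll 1:      98x  */
    return t >> CONST_BITS;    /* sra 8:      98x / 256 */
}
```

For example, an input of 256 yields 98 from both functions. The last two steps, a left shift by 1 followed by a right shift by 8, return the same value as a single right shift by 7 applied to 49x.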
Design study: DCT, DSP vs RISC
DSP: 3200 ops (include 1152 memory references)
RISC: 1440 ops (include 256 memory references)
Timing:
DSP instruction = 3 ns, DSP mem reference 20 ns
RISC instruction = 5 ns, RISC mem reference 20 ns
T_DSP = 3200*3 + 1152*(20-3) = 29184 ns
T_RISC = 1440*5 + 256*(20-5) = 11040 ns
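The timing model charges every operation one instruction time and adds the extra latency for each memory reference. A minimal sketch, with a hypothetical struct and function name, makes it easy to try other clock or memory speeds:

```c
#include <assert.h>

/* Inputs to the simple execution-time model used in the design study. */
struct cost_model {
    long ops;         /* total operations, memory references included */
    long mem_refs;    /* how many of those operations touch memory */
    double instr_ns;  /* time per instruction */
    double mem_ns;    /* time per memory reference */
};

/* Every op costs instr_ns; each memory reference costs the difference extra. */
static double exec_time_ns(const struct cost_model *m)
{
    assert(m->mem_refs <= m->ops);
    return (double)m->ops * m->instr_ns
         + (double)m->mem_refs * (m->mem_ns - m->instr_ns);
}
```

With `{3200, 1152, 3.0, 20.0}` for the DSP and `{1440, 256, 5.0, 20.0}` for the RISC processor, the function returns 29184 ns and 11040 ns. In this model the DSP loses despite its faster instructions, because the matrix algorithm makes 4.5 times as many memory references.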
Requirements for codec
Processing requirements, H.261 compression and
decompression, CIF @ 30 fps
Compression
Function                                             MOPS
RGB to YCbCr                                           27
Motion estimation (25 searches in a 16x16 region)     608
Inter/Intraframe coding                                40
Loop filtering                                         55
Pixel prediction                                       18
2-D DCT                                                60
Quantization, Zig-Zag                                  44
Entropy coding                                         17
Frame reconstruction                                   99
Total                                                 968

Decompression
Function                                             MOPS
Entropy decoding                                       17
Inverse quantization                                    9
2-D DCT                                                60
Motion estimation                                       0
Loop filtering                                         55
Pixel prediction                                       30
YCbCr to RGB                                           27
Total                                                 198
Reference: Table 8.1
Processor Speed Trends
[Figure: speed trends for general video processors, programmable DSPs, and general-purpose microprocessors]
Source: Figure 8.1, Bhaskaran