ECE-C490 Winter 2000 Image Processing Architecture

ECEC 453
Image Processing Architecture
Lecture 12, 3/3/2004
MPEG and Friends
Oleh Tretiak
Drexel University
Lecture 12
Image Processing Architecture, © 2001-2004 Oleh Tretiak
Page 1
Lecture Outline



Review of Teleconferencing
Advanced Video Coding
Computational cost of video
Picture of Layers
Video Compression: Picture Types

Group of Pictures: Three types
  I: intraframe coding only
  P: predictive coding
  B: bi-directional coding
Teleconferencing Standards

Digital video areas
  Broadcast television
  Recorded programs
  Two-way communications
Review: Action in the Video Arena



The sponsors: ITU-T SG 15 and ISO/IEC MPEG
The players: H.x standards and MPEG-x standards
Standards, ITU-T (Telecom Guys)
  H.261 (1990)
  H.263 (draft March 1995)
  New standards in the works
Standards, ISO/IEC (Entertainment Video)
  MPEG family
Review: Video Telephone System
H.261
H.221
H.200/AV.250 -Series
H.320
Review: H.261 Features

Common Interchange Format
  Interoperability between 25 fps and 30 fps countries
  352 pix/line, 288 lines, 30 fps noninterlaced
  Terminal equipment converts frame and line numbers
  Y Cb Cr components, color sub-sampled by a factor of 2 in both directions
Coding
  DCT, 8x8, 4 Y and 2 chrominance blocks per macroblock
  I and P frames only, P blocks can be skipped
  Motion compensation optional, only integer compensation
  (Optional) forward error correction coding
H.324/H.263

H.324: Like H.320
  H.261/H.263
  H.223
  G.723.1
  H.245 signaling
  H.233, H.234 encryption
Parts of H.324




H.263: Video coding for low rate communications
G.723.1: Audio and speech for multimedia, 5.3 and 6.3 kbps
H.223: Multiplexing protocol
H.245: Control protocol. Can be used to specify standard, LAN, and ATM networks
Features of H.263







Intended for lower rates than H.261, including 28.8 kbit/sec modem
Includes QCIF (176 x 144) and sub-QCIF format (128 x 96 in Y channel)
Optional error correction for mobile channels
Half-pixel accuracy motion compensation
Differential encoding of motion vectors
Improved coding of DCT coefficients
Optional advanced coding options
  better SNR at the same rate, lower rate at the same SNR
  50% more complex than basic H.261
Picture Formats for H.263
Image Size
Format      Y             Cb, Cr
sub-QCIF    128 x 96      64 x 48
QCIF        176 x 144     88 x 72
CIF         352 x 288     176 x 144
4CIF        704 x 576     352 x 288
16CIF       1408 x 1152   704 x 576
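
The Cb, Cr column follows from the Y column by halving each dimension (the factor-of-2 color subsampling in both directions). A minimal C sketch of that arithmetic; the function names `chroma_dim` and `samples_per_frame` are illustrative and are not taken from the standard:

```c
/* Chrominance planes are subsampled by 2 in each direction. */
static int chroma_dim(int luma_dim)
{
    return luma_dim / 2;
}

/* Samples per frame: one Y plane plus one Cb and one Cr plane. */
static long samples_per_frame(int y_width, int y_height)
{
    long luma = (long)y_width * y_height;
    long chroma = (long)chroma_dim(y_width) * chroma_dim(y_height);
    return luma + 2 * chroma;
}
```

For QCIF, `samples_per_frame(176, 144)` counts 176 x 144 luminance samples plus two 88 x 72 chrominance planes, i.e. 1.5 samples per pixel.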
All JPEG, ~ 12 Kbytes
Image sizes shown: 551x369, 389x261, 327x219, 231x155
Experimental Procedure




Original image subsampled (using Photoshop®) to various resolutions (pixel number from max to max/8)
Each subsampled image JPEG coded to various quality levels with Matlab®
A group of images with ~ 12 Kbytes per image is compared
Result: Subsampling + JPEG coding is better, at given total bits, than just JPEG coding
Future of Low-Rate Video


Solution looking for a user?
‘Picturephone’ - not popular
  Liked by inventors, surveys of the public less than enthusiastic
Videoconferencing: some success, but limited acceptance
What is needed to make it successful?
Advanced Video Coding





H.263 and MPEG-4 based on ~1995 technology
After 1995, MPEG and VCEG (video coding) started working on a new low-rate standard (H.26L)
Rec H.264 released in September 2002
Information on http://www.vcodex.com/ (some is on our web site)
Site maintained by Ian Richardson, who has written books about video coding
AVC Encoder
AVC Decoder
New Features





Prediction in I pictures
Different block transform
Different Block Sizes
Changes in motion compensation
VLC and arithmetic coding
I Picture Prediction

System operates with 4x4 blocks and 16x16 macroblocks
9 Prediction Modes for 4x4 Blocks
4 Modes for 16x16 Macroblocks




Mode 0: Vertical, extrapolate from upper samples
Mode 1: Horizontal, extrapolate from left samples
Mode 2: DC, mean of upper and left-hand samples
Mode 3: Plane, linear fit to left and upper samples
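
The first three 16x16 modes can be sketched in C as follows. This is an illustrative sketch, not the reference decoder: the function names are hypothetical, the neighboring samples are assumed to be available, and the rounding in the DC mode (add 16, then divide by 32) is an assumption about how the mean is taken. Mode 3 (plane) is omitted.

```c
#include <stdint.h>

#define MB 16  /* macroblock width and height */

/* Mode 0 (vertical): copy the row of samples above the macroblock downward. */
static void predict_vertical(const uint8_t top[MB], uint8_t pred[MB][MB])
{
    for (int r = 0; r < MB; r++)
        for (int c = 0; c < MB; c++)
            pred[r][c] = top[c];
}

/* Mode 1 (horizontal): copy the column of samples to the left across. */
static void predict_horizontal(const uint8_t left[MB], uint8_t pred[MB][MB])
{
    for (int r = 0; r < MB; r++)
        for (int c = 0; c < MB; c++)
            pred[r][c] = left[r];
}

/* Mode 2 (DC): fill with the rounded mean of the 32 upper and left samples. */
static void predict_dc(const uint8_t top[MB], const uint8_t left[MB],
                       uint8_t pred[MB][MB])
{
    int sum = 0;
    for (int i = 0; i < MB; i++)
        sum += top[i] + left[i];
    uint8_t dc = (uint8_t)((sum + MB) / (2 * MB));
    for (int r = 0; r < MB; r++)
        for (int c = 0; c < MB; c++)
            pred[r][c] = dc;
}
```

An encoder would form each candidate prediction, subtract it from the actual macroblock, and keep the mode with the cheapest residual.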
Different Block Transform



Basically, 4x4 DCT
Scanning sequence for 16x16 macroblock is shown below
4x4 and 2x2 DC coefficients transformed (again)
4x4 DCT Tricks
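Y = A X A^T, where

$$A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix}$$

a = 1/2, b = 0.707 cos(π/8), c = 0.707 cos(3π/8)

Trick: Y = (C X C^T) .* E, where .* is element-by-element multiplication

$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \quad E = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

The factorization is exact only if c/b = 1/2; the true ratio is about 0.414, so the integer transform approximates the DCT.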
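
Because C contains only the values ±1 and ±2, the core product C X C^T needs no multiplications. A minimal C sketch under that reading; `butterfly4` and `core_transform_4x4` are hypothetical names, and the scaling by E is left out (it can be folded into the quantizer step):

```c
/* One 4-point pass: out = C * (p0, p1, p2, p3)^T using adds and doublings. */
static void butterfly4(int p0, int p1, int p2, int p3, int out[4])
{
    int s0 = p0 + p3, s1 = p1 + p2;   /* sums of mirrored samples */
    int d0 = p0 - p3, d1 = p1 - p2;   /* differences of mirrored samples */
    out[0] = s0 + s1;                 /* row [1  1  1  1] */
    out[1] = 2 * d0 + d1;             /* row [2  1 -1 -2] */
    out[2] = s0 - s1;                 /* row [1 -1 -1  1] */
    out[3] = d0 - 2 * d1;             /* row [1 -2  2 -1] */
}

/* W = C X C^T: transform the columns of X, then the rows of the result. */
static void core_transform_4x4(int x[4][4], int w[4][4])
{
    int t[4][4], v[4];
    for (int j = 0; j < 4; j++) {             /* t = C X */
        butterfly4(x[0][j], x[1][j], x[2][j], x[3][j], v);
        for (int i = 0; i < 4; i++)
            t[i][j] = v[i];
    }
    for (int i = 0; i < 4; i++)               /* w = t C^T */
        butterfly4(t[i][0], t[i][1], t[i][2], t[i][3], w[i]);
}
```

Each 4-point pass costs 8 additions and 2 doublings, so the whole 4x4 block takes 8 passes of register arithmetic.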

Motion Compensation Ideas

Adaptive motion compensation blocks:
  16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
Coding Ideas




Constant quantizer value
Zig-zag scan with novel run-length code
Arithmetic coding an option
Motion vectors to 1/4 pixel
Loop Filter



Concept to overcome block artifacts
Average across inter-block lines if difference is too big
Difference threshold depends on coding mode (intra or inter) and quantization step size
Example of Loop Filter
Summary: AVC




16x16 down to 4x4 block sizes are reminiscent of wavelets
Flexible scheme of motion compensation
New software and hardware for videoconferencing is using this standard
Will broadband bring on the age of Picturephone?
Measuring Complexity

Count RISC-like operations
  r1 = a + b
  a, b in external memory, r1 ~ register
  3 operations, 2 loads and one add
Example: 2-D 8 point DCT, YCbCr frames, 4:2:0 sampling, 15 frames/second
  DCT: 8 coefficient loads, 8 data loads, 8 multiply & add, one store -> 25 ops per output sample. 2x25 (for 2D) -> 50 ops per sample -> 3200 ops per 8x8 block
  Y is 176x144, Cr & Cb are 88 x 72 -> 594 blocks
  Processing rate = 3200 x 594 x 15 = 28.5 MOPS
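
The same estimate can be written as a small C sketch. The function names are illustrative; the constants (25 ops per 1-D output, two passes, 64 samples per block, 4:2:0 sampling) are the ones used in the example above:

```c
/* Number of 8x8 blocks in a plane of width x height samples. */
static long blocks_in_plane(int width, int height)
{
    return (long)(width / 8) * (height / 8);
}

/* Operations per second for a 2-D 8-point DCT over 4:2:0 YCbCr frames. */
static long dct_ops_per_second(int y_width, int y_height, int frames_per_second)
{
    const long ops_per_block = 2L * 25 * 64;   /* 2 passes x 25 ops x 64 samples */
    long blocks = blocks_in_plane(y_width, y_height)
                + 2 * blocks_in_plane(y_width / 2, y_height / 2);
    return ops_per_block * blocks * frames_per_second;
}
```

Calling `dct_ops_per_second(176, 144, 15)` reproduces the QCIF figure: 396 luminance blocks plus 2 x 99 chrominance blocks gives 594 blocks per frame.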
Processing Requirements

Range of standards
  H.263 to HDTV
Processing options
  Pentium Computers
  RISC Computers
    either can be in multiprocessor configurations
  RISC cores
  DSP systems
(High-end) commodity computers
  Decode MPEG-1, MPEG-2, H.261
  Encode H.261 at ~10 frames per second
Design of Video Coders

Feet first
  Design options
    Software: What platform? Pentium, RISC, choices of clock speed, bus architecture, memory
    Hardware: DSP, ASIC, PLA, choices of architecture
  Complete the designs
  Evaluate - performance, cost
  Expensive and time consuming
Forecast
  Preliminary designs, preliminary evaluation
    Complexity Measures, MIPS
  Choose among outcomes from preliminary designs
  Examine best designs in detail
  More alternatives can be examined
Example: DCT

Hardware choices
  RISC, DSP
    RISC - more versatile instruction set
    DSP - faster execution
Algorithm choices
  Separable matrix implementation
    Regular dataflow
    Parallelizable
    DCT
  Fast Algorithm
    Fewer operations
    Less regular dataflow
    RISC, conventional computer
DCT: DSP, Matrix, Separable
Basic operation

$$y_i = \sum_{j=1}^{8} c_{ij} x_j, \quad i = 1, 2, \ldots, 8$$

8 data loads, 8 coefficient loads, 8 multiply-accumulate operations, one data store = 25 operations per output coefficient
  Basic operation: s = s + c*x
  Repeat for 8 output values = 8x25 = 200 ops
  Do on 8 rows = 1600 ops
  (Separable) Do on 8 columns = 1600 ops
  Assume coefficients are kept in a register file (fast)
  Total 3200 ops (1024 loads, 128 stores)
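
A minimal C sketch of the separable matrix form counted above. The names `transform_1d` and `transform_2d` are hypothetical, and the 8x8 coefficient table is assumed to be supplied by the caller (for a DCT it would hold the cosine basis). Floating point is used for clarity; a DSP would typically use fixed-point multiply-accumulate.

```c
#define N 8

/* 1-D pass: y[i] = sum over j of c[i][j] * x[j]. */
static void transform_1d(double c[N][N], const double x[N], double y[N])
{
    for (int i = 0; i < N; i++) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            s = s + c[i][j] * x[j];   /* the basic operation s = s + c*x */
        y[i] = s;                     /* one store per output coefficient */
    }
}

/* Separable 2-D transform: 8 row passes followed by 8 column passes. */
static void transform_2d(double c[N][N], double in[N][N], double out[N][N])
{
    double tmp[N][N], col[N], res[N];
    for (int r = 0; r < N; r++)
        transform_1d(c, in[r], tmp[r]);
    for (int k = 0; k < N; k++) {
        for (int r = 0; r < N; r++)
            col[r] = tmp[r][k];
        transform_1d(c, col, res);
        for (int r = 0; r < N; r++)
            out[r][k] = res[r];
    }
}
```

The inner loop runs 8 times for each of 64 outputs in each of the two passes, which is the regular, easily parallelized dataflow that suits a multiply-accumulate unit.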
DCT - RISC implementation

From jfdctfst.c (ftp://ftp.uu.net/graphics/jpeg/.)
  Arai, Agui, and Nakajima's algorithm for scaled DCT
  Fast 8 point DCT, repeated over rows and columns
  Integer implementation
  Compiled with gcc, level 2 optimization, for SPARC (Sun) processor
Features
  8 data loads and stores per 1-D DCT, all other ops are register
  No multiplications (shifts and adds)
  90 instructions per 8 point DCT
  Total of 16x90 = 1440 instructions (128 loads, 128 stores)
Sample of DCT code
tmp0 = dataptr[0] + dataptr[7];
tmp7 = dataptr[0] - dataptr[7];
tmp1 = dataptr[1] + dataptr[6];
tmp6 = dataptr[1] - dataptr[6];
ld [%o1],%i0
ld [%i4],%g2
ld [%i4-24],%i1
ld [%i4-4],%g3
ld [%i4-20],%i2
add %i0,%g2,%i5
sub %i0,%g2,%o2
add %i1,%g3,%g4
ld [%i4-8],%i0
sub %i1,%g3,%o7
ld [%i4-16],%g3
addcc %o3,-1,%o3
ld [%i4-12],%g2
add %i2,%i0,%i1
sub %i2,%i0,%i2
add %g3,%g2,%i0
DCT Code - more


#define FIX_0_382683433  ((INT32) 98)   /* FIX(0.382683433) */
98 = binary 1100010
z5 = MULTIPLY(tmp10 - tmp12, FIX_0_382683433);

add  %o7,%o2,%i0
sub  %i3,%i0,%g3
sll  %g3,1,%g2
add  %g2,%g3,%g2
sll  %g2,4,%g2
add  %g2,%g3,%g2
sll  %g2,1,%g2     ! dumb
sra  %g2,8,%g3

g3=(((g3+2*g3)*16+g3)*2)/256
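
The shift-and-add sequence can be mirrored in C. This is a sketch of the idea, not code from jfdctfst.c: the name `mul_fix_98` is hypothetical, and the input is assumed non-negative because shifting negative values is not portable in ISO C.

```c
/* Multiply x by 98/256 (about 0.3827) using only shifts and adds.
 * 98 = binary 1100010, built up as ((3x * 16) + x) * 2.
 * Assumes x >= 0 and small enough that 98 * x fits in an int. */
static int mul_fix_98(int x)
{
    int t = (x << 1) + x;   /* sll 1, add:  3x  */
    t = (t << 4) + x;       /* sll 4, add: 49x  */
    t = t << 1;             /* sll 1:      98x  */
    return t >> 8;          /* sra 8: divide by 256 */
}
```

The last two shifts could be merged into a single right shift by 7, which may be what the annotation on the listing refers to.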
Design study: DCT, DSP vs RISC



DSP: 3200 ops (include 1152 memory references)
RISC: 1440 ops (include 256 memory references)
Timing:
  DSP instruction = 3 ns, DSP mem reference 20 ns
  RISC instruction = 5 ns, RISC mem reference 20 ns
  T_DSP = 3200*3 + 1152*(20-3) = 29184 ns
  T_RISC = 1440*5 + 256*(20-5) = 11040 ns
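
The timing model charges every operation one instruction time and adds the extra latency of each memory reference. A minimal C sketch; the function name `exec_time_ns` is illustrative:

```c
/* Execution time in ns: every op costs t_instr_ns, and each memory
 * reference costs an additional (t_mem_ns - t_instr_ns). */
static long exec_time_ns(long ops, long mem_refs, long t_instr_ns, long t_mem_ns)
{
    return ops * t_instr_ns + mem_refs * (t_mem_ns - t_instr_ns);
}
```

With the numbers above, `exec_time_ns(3200, 1152, 3, 20)` and `exec_time_ns(1440, 256, 5, 20)` give the DSP and RISC totals. Under this model the RISC version wins despite its slower instruction time, because the fast algorithm makes far fewer memory references.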
Requirements for codec

Processing requirements, H.261 compression and decompression, CIF @ 30 fps

Compression
Function                                             MOPS
RGB to YCbCr                                           27
Motion estimation (25 searches in a 16x16 region)     608
Inter/Intraframe coding                                40
Loop filtering                                         55
Pixel prediction                                       18
2-D DCT                                                60
Quantization, Zig-Zag                                  44
Entropy coding                                         17
Frame reconstruction                                   99
Total                                                 968

Decompression
Function                MOPS
Entropy decoding          17
Inverse quantization       9
2-D DCT                   60
Motion estimation          0
Loop filtering            55
Pixel prediction          30
YCbCr to RGB              27
Total                    198

Reference: Table 8.1
Processor Speed Trends
General Video Processors
Programmable DSP’s
General Purpose Microprocessors
Source: Figure 8.1, Bhaskaran