Introduction to Video Compression - Institut Teknologi Bandung


Introduction to Multimedia
Data Compression
DSP Research and Technology
IURC Microelectronics ITB
DSP Research and Technology Group ITB
IURC-ME ITB
Contents
• Introduction: Compression Objectives
• Multimedia: System and Applications
• Digital Media
– Voice
– Music
– Image
– Video
• Compression
– Basic Principles
– Multi Media Compression
– Standards
• What Next? Future Technology: MPEG-7, Emotion Digitization
Multimedia Compression
Objectives
• Primary Objectives
– To use digital bits efficiently for representing
multimedia signals effectively
• Multimedia signals
– Speech, audio/music, images, video, etc.
• Applications
– Communications, Internet, Broadcasting, Storage
Multimedia Technology
Its Importance
Multimedia: ART Joins BRAIN
Merging of advanced digital and analog technology
[Figure: A/D and D/A converter technology provides the bridge between the analog domain (analog or real-world signals, analog signal processing) and the digital domain (digital signal processing, digital or computer-world signals). Source: ICE 1997]
Three Types of Contents: Control, Information, and Media
Content       Examples                                Bit rate   Errors             Traffic
Control       Automation, Interaction, Instruction    Low        Error intolerant   Intermittent
Information   Database, WEB                           Medium     Error disliked     Burst
Media         Speech, Music, Image, Video             High       Error tolerant     Stream
[Figure labels: D, NS, IP, PS, CS, MPLS; evolution trends]
Media Access
Information Infrastructure
[Figure: network diagram. Users connect over analog phone + V34, WLL, WLAN, Ethernet, RS232/RS422, fiber, and powerline links through transmux units to a media switch, an info switch, a gateway, and a control/gatekeeper with user services and billing. Local media and info servers attach over media, control, and information channels. The backbone comprises PSTN, leased lines, CDMA/GSM, the IP cloud, and private IP.]
Multimedia Systems
A technology of use and integration of different media, such as
• Text
• Graph
• Speech
• Audio
• Video
Multimedia System Cores
Computing
Communications
System/
Integration
Signal Processing
Applications
• Education
• Health Care
• Consumer Electronics
• Geographical IS
• Navigational Systems
• Business and Finance
• High Quality Communications
• Digital Libraries
• Entertainment
• Telecommuting
• Publishing
• Virtual Reality
• Commercial Electronics
• Cooperation
[Gray97]
Education
• Information Access
• Teaching Tools
• Interactive Teaching
• Distance Learning
Health Care
• Biomedical data acquisition, transmission, storage, interpretation
• Diagnosis aids
• Telemedicine
Collaboration Environment
Media Processing and Integration
• Text and Graph Compression
• Speech, audio, image, and video processing and coding
• Joint audio visual coding
• Hypermedia, 3D, VR processing
Multimedia System Design and Implementation
• Parallel DSP architecture
• ASIC design
• DSP software and hardware design
• Sound and display devices and peripherals
• Storage technology
• System integration
Application Developers
• Design houses
• Software developers
• Applications market systems
Media Digitalization
Converting Media Into Bits
Digital Media System Diagram
[Figure: Media (speech, audio, images, video) → A/D Converter → media digital bits → Compression Encoder → media bitstream → Transmitter / Encoder / Recorder → Channel (Internet, broadcast, point-to-point, telephone, satellite, cable, wireless) or Storage (CD, VCD, DVD, HD, tape) → Receiver / Decoder / Player → Compression Decoder → media digital bits → D/A Converter → Media (speech, audio, images, video)]
• The A/D converter converts multimedia signals into digital bits.
• Digital bits are the digital representation of the media signals.
• The compression encoder reduces the number of representation bits without eliminating media content.
• The bitstream is the highly compacted sequence of digital bits that results from compression.
• The compression decoder reconstructs the digital bits from the bitstream.
• The D/A converter converts digital bits back into a multimedia signal.
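Analog Signal
$x_a(t) = A\cos(\Omega t + \theta),\quad -\infty < t < \infty$
$x_a(t) = A\cos(2\pi F t + \theta),\quad -\infty < t < \infty$
A: Amplitude
$\Omega$: Frequency (radians per second)
F: Frequency (Hertz), with period $T_p = 1/F$
$\theta$: Phase
[Figure: x_a(t) versus t, showing the amplitude A, the period T_p = 1/F, and the initial value A cos θ]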
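Digital Signal
$x(n) = A\cos(\omega n + \theta),\quad -\infty < n < \infty$
$x(n) = A\cos(2\pi f n + \theta),\quad -\infty < n < \infty$
A: Amplitude
$\omega$: Frequency (radians per sample)
f: Frequency (cycles per sample)
$\theta$: Phase
[Figure: x(n) versus n, showing the amplitude A, the period N, and the initial value A cos θ]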
Analog-Digital Conversion
Analog Signal → Filter → Bandlimited Signal → Sampling → Discrete Time → Quantization → Discrete Valued → Coding → Digital
• Sampling, with $F_s = 1/T$: $x(n) = x_a(t)|_{t=nT} = x_a(nT)$
• Quantization
• Coding, with $2^b \ge L$
Sampling of Sinusoids
Suppose an analog signal is being sampled:
$x_a(t) = A\cos(2\pi F t + \theta)$
With sampling frequency $F_s = 1/T$, we obtain
$x(n) = x_a(t)|_{t=nT} = A\cos(2\pi F n T + \theta) = A\cos\left(2\pi \frac{F}{F_s} n + \theta\right)$
Thus there is a linear relationship $f = F/F_s$, or $\omega = \Omega T$.
[Figure: f versus F, linear between f = -1/2 and f = 1/2; the folding frequency is F = 0.5 Fs]
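The relationship f = F/Fs can be checked numerically. The following is a minimal sketch and not part of the original slides; the function names and the 8000 Hz and 1000 Hz values are illustrative assumptions. It also shows that a sinusoid at F + Fs produces the same samples as one at F, which is why content above the folding frequency aliases.

```python
import math

def normalized_frequency(F, Fs):
    """Return f = F / Fs in cycles per sample."""
    return F / Fs

def sample_sinusoid(F, Fs, count, A=1.0, theta=0.0):
    """Sample x_a(t) = A cos(2*pi*F*t + theta) at t = n*T, with T = 1/Fs."""
    T = 1.0 / Fs
    return [A * math.cos(2 * math.pi * F * n * T + theta) for n in range(count)]

Fs = 8000.0                                   # assumed sampling frequency (Hz)
base = sample_sinusoid(1000.0, Fs, 16)        # F = 1000 Hz, f = 0.125
alias = sample_sinusoid(1000.0 + Fs, Fs, 16)  # F = 9000 Hz, above Fs/2

# Largest difference between the two sample sequences (zero up to rounding).
max_difference = max(abs(a - b) for a, b in zip(base, alias))
print(normalized_frequency(1000.0, Fs), max_difference < 1e-9)
```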
Sampling Theorem
To reconstruct an analog signal with maximum frequency $F_{max} = B$, sampled at $F_s > 2F_{max}$, use the sinc interpolator:
$g(t) = \frac{\sin(2\pi B t)}{2\pi B t}$
$x_a(t) = \sum_{n=-\infty}^{\infty} x(n)\, g(t - nT)$
$F_N = 2F_{max}$ is called the Nyquist rate.
- The sampling frequency must exceed the Nyquist rate.
- If the sampling frequency cannot be increased, $F_{max}$ must be lowered with an anti-aliasing filter.
Quantization
• Quantization error: $e_q(n) = x_q(n) - x(n)$
• The error is limited by the resolution $\Delta$: $-\Delta/2 \le e_q(n) \le \Delta/2$
• Resolution improves as the number of quantization levels L increases or as the dynamic range $(x_{max} - x_{min})$ decreases: $\Delta = (x_{max} - x_{min})/(L-1)$
• $L = 2^b$, where b is the number of bits per sample
• Bit rate $= b \times F_s$ bits per second
• Signal-to-Noise Ratio: SNR (dB) $= 10\log_{10}(\text{signal energy}/\text{distortion energy})$
• SNR is about $6b$ dB
[Figure: amplitude versus n, showing x_a(t), the quantized samples x_q(n), the quantization levels between x_min and x_max, and the step Δ]
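The "about 6 dB per bit" rule can be observed with a small experiment. This is a sketch under stated assumptions, not the slides' own method: the test signal is a full-scale sinusoid, the quantizer spans [-1, 1], and all function names are hypothetical.

```python
import math

def quantize(x, bits, x_min=-1.0, x_max=1.0):
    """Uniform quantizer with L = 2**bits levels spanning [x_min, x_max]."""
    levels = 2 ** bits
    delta = (x_max - x_min) / (levels - 1)     # resolution
    index = round((x - x_min) / delta)
    index = min(max(index, 0), levels - 1)     # clip to the valid index range
    return x_min + index * delta

def snr_db(signal, quantized):
    """10 log10(signal energy / distortion energy)."""
    signal_energy = sum(s * s for s in signal)
    distortion_energy = sum((q - s) ** 2 for s, q in zip(signal, quantized))
    return 10 * math.log10(signal_energy / distortion_energy)

def measure_snr_db(bits, count=10000):
    # Test signal (assumption): a full-scale sinusoid.
    signal = [math.sin(2 * math.pi * 0.01234 * n) for n in range(count)]
    return snr_db(signal, [quantize(s, bits) for s in signal])

for bits in (4, 8, 12):
    print(bits, round(measure_snr_db(bits), 1))
```

Each additional bit halves the step size, so the measured SNR should rise by roughly 6 dB per bit.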
Applications On Digital Speech
• Speech signal from a microphone
– filtered (anti aliasing) 300Hz – 3300 Hz
– sampled at 8000 samples per second
– Quantization resolution is 8 bits per sample.
• Resulting digital speech data:
– Bit rates needed: 8000 x 8 = 64 kbps
– Quality: SNR about 48 dB
Applications on Audio / Music
• Audio signal from an audio source (microphone, audio out)
– filtered (anti-aliasing) 0 Hz – 20000 Hz
– sampled at 44100 samples per second
– quantized at a resolution of 16 bits per sample
– has two channels (L-R stereo)
• Resulting digital audio data:
– Bit rate: 44100 x 16 x 2 = 1,411,200 bps = about 1.4 Mbps
– Quality: SNR about 96 dB
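The speech and audio figures above follow from two small formulas. The sketch below reproduces them; the function names are illustrative assumptions, and the SNR uses the slides' rule of thumb of about 6 dB per bit.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def approx_snr_db(bits_per_sample):
    """Rule of thumb from the slides: SNR is about 6 dB per bit."""
    return 6 * bits_per_sample

speech_bps = pcm_bit_rate(8000, 8)                 # telephone speech
audio_bps = pcm_bit_rate(44100, 16, channels=2)    # stereo audio
print(speech_bps, approx_snr_db(8))
print(audio_bps, approx_snr_db(16))
```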
Digital Image
• In contrast with speech and music, an image is an intensity signal defined over a two-dimensional domain.
[Figure: Marco: 294x383, 24 bpp, colour, 331 KBytes]
Sampling of Digital Image
• Image size: Width x Height
• Digital image: N x M pixels (N columns along the width, M rows along the height)
• Resolution DPI = (N x M) / (Width x Height)
• Each sample is a picture element (pixel)
• The result is an N x M sample file, with new-line boundaries between rows
Interlaced vs Progressive Scan
[Figure: Original Image; Progressive; Interlaced (2 half-pictures being integrated)]
Color Table
[Figure: example color tables mapping index to color: Monochrome, 1 bpp (indices 0 to 1); Greyscale, 8 bpp (indices 0 to 255); Custom Colour, 8 bpp (indices 0 to 255); RGB Colour, 24 bpp (Red, Green, and Blue components, each 0 to 255)]
• A color table is a code that maps indices to particular colors.
• Every picture sample (pixel) contains an index (in bits) into the color table.
• More bits per index allow richer colors but increase the bits required per image.
Additive Color RGB
[Figure: additive mixing of the Red, Green, and Blue primaries, with Yellow, Cyan, and Magenta where two primaries overlap and White where all three overlap]
RGB Images
[Figure: a 24 bpp RGB image and its 8 bpp Red, Green, and Blue components]
Hue-Saturation-Brightness
A. Saturation B. Hue C. Brightness D. All hues
Alternative to RGB: Y-C1-C2
• A colour image can be represented by three ‘8 bpp’ images: R, G, B.
• Alternatively, one can use three ‘8 bpp’ images: Luminance Y, Chrominance 1 (C1), and Chrominance 2 (C2).
• NTSC, PAL, and SECAM use similar Luminance but slightly different Chrominance.
[Figure: RED, GREEN, BLUE → RGB to Y C1 C2 → Luminance, Chrominance 1, Chrominance 2]
Standards   Y   Cb   Cr
NTSC        Y   I    Q
PAL         Y   U    V
SECAM       Y   Db   Dr
Color Picture Sizes
Standard                     Component   30 fps (NTSC)   25 fps (PAL)
CCIR 601                     Y           720 x 480       720 x 576
                             Cb, Cr      360 x 480       360 x 576
SIF                          Y           360 x 240       360 x 288
                             Cb, Cr      180 x 120       180 x 144
Significant Pixel Area SIF   Y           352 x 240       352 x 288
                             Cb, Cr      176 x 120       176 x 144
RGB-NTSC Color Converter
• Conversion
– Y = 0.299 R + 0.587 G + 0.114 B
– I = 0.596 R – 0.274 G – 0.322 B
– Q = 0.211 R – 0.523 G + 0.311 B
• Inversion
– R = Y + 0.956 I + 0.621 Q
– G = Y – 0.272 I – 0.649 Q
– B = Y – 1.106 I + 1.703 Q
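The conversion and inversion above can be applied directly as two small functions. This is a sketch using the coefficients exactly as printed on the slide; the function names are assumptions. Because the coefficients are rounded to three decimals, the round trip is only approximately exact.

```python
def rgb_to_yiq(r, g, b):
    """RGB to NTSC YIQ, coefficients as listed on the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.311 * b
    return y, i, q

def yiq_to_rgb(y, i, q):
    """NTSC YIQ back to RGB, coefficients as listed on the slide."""
    r = y + 0.956 * i + 0.621 * q
    g = y - 0.272 * i - 0.649 * q
    b = y - 1.106 * i + 1.703 * q
    return r, g, b

# A grey pixel carries (almost) no chrominance; a coloured pixel survives
# the round trip to within the rounding of the coefficients.
grey = rgb_to_yiq(128, 128, 128)
round_trip = yiq_to_rgb(*rgb_to_yiq(200, 120, 40))
print(grey)
print(round_trip)
```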
RGB-PAL Color Converter
• Conversion
– Y = 0.299 R + 0.587 G + 0.114 B
– U = – 0.148 R – 0.289 G + 0.437 B
– V = 0.615 R – 0.515 G – 0.100 B
• Inversion
– R = Y + 1.14 V
– G = Y – 0.394 U – 0.581 V
– B = Y + 2.032 U
RGB-SECAM Color Converter
• Conversion
– Y = 0.299 R + 0.587 G + 0.114 B
– Db = – 0.450 R – 0.883 G + 1.333 B
– Dr = – 1.333 R + 1.116 G + 0.217 B
• Inversion
– R = Y – 0.526 Dr
– G = Y – 0.129 Db + 0.268 Dr
– B = Y + 0.665 Db
YUV Example
[Figure: the 24 bpp original image and its 8 bpp Y, U, and V components; the U and V components are offset by 0.5 for display]
Color Needs Many Bits
Marco 256 x 256:
• 1 bpp monochrome: 9 KBytes
• 8 bpp greyscale: 66 KBytes
• 8 bpp custom color: 66 KBytes
• 24 bpp colour: 193 KBytes
Size Factor
Marco: 256x256, 24 bpp, colour, 193 KBytes
Marco: 294x383, 24 bpp, colour, 331 KBytes
Number of Samples Factor
64x64: 12,344 bytes
32x32: 3,148 bytes
16x16: 824 bytes
8x8: 248 bytes
Comparison of Spatial Resolution
1920 x 1080 = 2 Mpixel per frame, HDTV
1280 x 720 = 1 Mpixel per frame, HDTV
800 x 600 SVGA
420 x 411 PAL TV
360 x 345 MPEG
352 x 288 CIF
176 x 144 QCIF
Digital Video Signal
[Figure: sequence of frames at time indices 1, 3, 5, 7, 9, 11]
• A digital video signal is a sequence of digital images (called frames) that are displayed in order of their time index.
Frames Per Second Factor
• Increasing the number of frames per second (fps) makes transitions smoother, so that the images come alive. However, it also increases the required number of bits.
• Typical rates are 5 fps, 30 fps, and 60 fps for videophone, TV, and HDTV, respectively.
fps   Dim         Depth    Mbps
5     256 x 256   24 bpp   7.8
25    256 x 256   24 bpp   39.3
30    256 x 256   24 bpp   47.2
60    256 x 256   24 bpp   94.4
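The Mbps column is simply width x height x bits per pixel x fps. The sketch below reproduces the calculation; the function name is an assumption, and rates are left in bits per second so no rounding convention has to be chosen.

```python
def raw_video_bit_rate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps

# Rates for the 256 x 256, 24 bpp example at the frame rates in the table.
rates = {fps: raw_video_bit_rate(256, 256, 24, fps) for fps in (5, 25, 30, 60)}
for fps, bps in rates.items():
    print(fps, "fps:", bps / 1e6, "Mbps")
```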
Digital Video Signal
[Figure: frames from time index 1 to time index L]
• N x M pixels per frame
• B bits per pixel
• Video file size: N x M x L x B bits
Digital Video Structure
• A video signal consists of Groups of Pictures (GOPs): GOP 1, GOP 2, GOP 3, ...
• A GOP consists of several frames.
• A frame (N x M pixels) consists of Groups of macroBlocks (GOBs): GOB 1, GOB 2, ..., GOB L.
• A GOB consists of several macroblocks (MBs): MB 1, MB 2, MB 3, ...
• An MB consists of four blocks: B1, B2, B3, B4.
• One block has 8 x 8 = 64 pixels.
Macroblocks and YUV File
[Figure: a 4:2:0 macroblock from a YUV file consists of a 16 x 16 Y block, an 8 x 8 U block, and an 8 x 8 V block]
[Figure: a 4:2:0 YUV file, from where its data bits start to where they end, holding the Y, U, and V data of Frame 1, Frame 2, ...]
Compression of Media Bits
Saving Bits For Multimedia
Compression: Bit Savings
Media                                                  Without compression   With compression
Speech (8 ksps, 8 bits per sample)                     64 kbps               2 – 8 kbps
Slow Motion Video (10 fps), 176 x 144, 24 bpp          6.08 Mbps             8 – 16 kbps
Audio Conference (8 ksps, 16 bits per sample)          128 kbps              6 – 64 kbps
Video Conference (15 fps), 352 x 288, 24 bpp           36.5 Mbps             64 – 768 kbps
Digital Audio Stereo (44.1 ksps, 16 bits per sample)   1.5 Mbps              128 – 768 kbps
MPEG VCD (30 fps), 352 x 288, 24 bpp                   72.99 Mbps            1.5 – 4 Mbps
Broadcast MPEG (30 fps), 720 x 480, 24 bpp             248.83 Mbps           3 – 8 Mbps
HDTV (59.94 fps), 1280 x 720, 24 bpp                   1.33 Gbps             20 Mbps
How To Compress?
[Figure: media data of N bits contains pockets of insignificant data (irrelevancy) and redundant data (duplication). Compression ‘deflates’ these pockets to give a media bitstream of M < N bits; decompression refills the pockets with redundancy (duplication).]
• Redundancy elimination: remove duplications. The data are preserved (lossless). Entropy is the lower bound, i.e., the most compact size achievable.
• Irrelevancy reduction: remove data that are not significant. The data are altered (lossy).
Compression Issues
[Figure: quality (Excellent: 5, Good: 4, Fair: 3, Poor: 2, Bad: 1) versus rate (1/8, 1/4, 1/2, 1, 2 bits per sample) for video, still image, speech, and wideband audio, with the technology target marked. Source: Jaya92]
Standards Available?
• Text and Data
– Algorithms: RLE, Statistical, Entropy
– Examples of compression standards: Huffman, Arithmetic, MNP5, MNP7, LZW, ZIP, ARJ
• Speech
– Algorithms: LPC, Hybrid, Waveform Coding
– Examples of compression standards: ADPCM, G.723.1, LPC-10, G.729, GSM, VSELP, CELP
• Music
– Algorithms: Subband Coding
– Examples of compression standards: MPEG, AAC, Dolby, MP3
• Image
– Algorithms: PCM, DPCM, SQ, VQ, Fractal, Transform Coding, Subband Coding, Wavelet, DCT
– Examples of compression standards: JPEG, EZW, SPIHT, JPEG2000
• Video
– Algorithms: Hybrid Coding, Motion Estimation
– Examples of compression standards: H.261, H.263, MPEG-1, MPEG-2, MPEG-4
Subsampling, PCM, and
DPCM
Compression of Multimedia
Through Spatial and Scalar
Quantization, as well as Prediction
Issues
• Factors determining bit rates
• Reducing bit rates with Spatial
Subsampling
• Reducing bit rates with PCM
quantization
• Reducing bit rates with prediction
DPCM
Subsampling Principles
• There are areas with small variations of pixel values.
• This means the picture has low spatial frequency content.
• Subsampling at a sampling frequency larger than twice the highest frequency does not remove information.
• The subsampling factor is the divisor of the sampling frequency (if the divisor is 8, the sampling frequency is reduced to 1/8 of the original sampling frequency).
[Figure: 256x256 image, 193 KBytes; subsampling by a factor of 4 gives a 64x64 image, 12,344 bytes]
Aliasing
• Aliasing occurs when the Nyquist criterion is not satisfied.
• Subsampling mixes spectra, especially at high frequencies, such that
– high frequencies are damaged
– false high frequencies occur.
• The spectrum of a digital signal is the sum of infinitely many duplicates of the basic analog spectrum, shifted by multiples of Fs.
[Figure: basic spectrum between -Fs/2 and Fs/2 with its first duplicate at Fs; subsampling by a factor of 2 damages the basic spectrum; a prefilter controls the damage]
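The effect of the prefilter can be seen on a one-dimensional toy signal. This is a minimal sketch, not the slides' implementation: the moving-average prefilter and the alternating test signal are assumptions chosen to make the aliasing obvious.

```python
def subsample(samples, factor):
    """Keep every `factor`-th sample."""
    return samples[::factor]

def box_prefilter(samples, factor):
    """Crude low-pass filter (assumption): average over `factor` samples."""
    filtered = []
    for start in range(len(samples)):
        window = samples[start:start + factor]
        filtered.append(sum(window) / len(window))
    return filtered

# Highest possible frequency: samples alternate between 0 and 1.
signal = [0, 1] * 8

# Without a prefilter the alternation aliases to a false constant 0.
aliased = subsample(signal, 2)
# With the prefilter the result is the local average, 0.5.
smoothed = subsample(box_prefilter(signal, 2), 2)
print(aliased)
print(smoothed)
```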
Subsampling, Spectrum and Aliasing
[Figure: basic spectrum of the original image, of the image subsampled without a prefilter, and of the image subsampled with a prefilter]
Subsampling Limits
[Figure: original 128x128 image and versions subsampled 2, 4, and 8 times]
Impacts on Color Subsampling
[Figure: 24 bpp original; 16 bpp YUV 4:2:2; 12 bpp YUV 4:2:0; 10 bpp YUV]
PCM Quantization
[Figure: histogram of pixel intensity in a 6 x 4 image (frequency of occurrence versus intensity)]
• In Pulse Code Modulation (PCM), the number of bits allocated per pixel is limited.
• These bits are used to index the pixel intensity. More pixel intensity levels require more bits.
• We assume that the dynamic range (max intensity – min intensity) is limited.
PCM Bit Rate Reduction
[Figure: intensity versus spatial sample (width direction) for line 1 of an image, general case, quantized with 1-bit, 2-bit, 3-bit, and 4-bit PCM]
Coding of Quantization Output
Encoder: x → Quantization → y_j → Mapping to Bits → 00101101
Decoder: 00101101 → Decoding bit → y_j
Quality: SNR
Pulse Code Modulation (PCM)
Typical compression scheme:
x → Quantization (Fine) → Natural Binary Code → bit stream → Inverse NBC → x^
Code table, Natural Binary Code (NBC):
Index   Code
0       000
1       001
2       010
3       011
4       100
5       101
6       110
7       111
PCM Quality
Bit rate of PCM:
$R_{PCM} = \log_2(L)$
$R_{\text{theoretical, Gaussian}} = \frac{1}{2}\log_2\left(\frac{\sigma_x^2}{\sigma_q^2}\right)$
Quality:
$SNR_{\text{Gaussian}} \approx 6.02 \times R_{PCM}$ (dB)
[Figure: graph of SNR vs R]
Bitrate
[Figure: the image coded at 8 bpp, 5 bpp, 2 bpp, and 1 bpp]
Probability Of Sample Levels
[Figure: quantizer characteristic Q(x) drawn over the probability density p(x); levels with higher probability are marked]
Probability of level k:
$P_k = \int_{x_k}^{x_{k+1}} p(x)\,dx$
Feedforward vs Feedback Schemes
Feedforward: x(n) → Signal Measurement → Select Quantization Step → Quantize + NBC → data bits (001010110101) plus overhead bits
Feedback: x(n) → Quantize + NBC → data bits (001010110101) → Decode → Signal Measurement → Select Quantization Step
Code with Fixed vs Variable Length
• Fixed Length Code (FLC): Although effective, a fixed length code such as the Natural Binary Code (NBC) is not efficient.
• Variable Length Code (VLC): Quantization levels with high probability (e.g., sample 0) are assigned short codewords. Example: Huffman code.
Quantization level   Index k   Probability Pk   NBC   Huffman
y0                   0         0.005            111   1001111
y1                   1         0.02             110   100110
y2                   2         0.14             101   101
y3                   3         0.20             100   11
y4                   4         0.51             000   0
y5                   5         0.08             001   1000
y6                   6         0.04             010   10010
y7                   7         0.005            011   1001110
Average Code Length
• Average code length (in bits) for a code:
$R_{\text{average}} = \sum_{k=1}^{L} p_k n_k$
where $n_k$ is the number of bits of the k-th codeword.
• The minimum average code length that can be obtained is the entropy (source coding theorem):
$R_{\text{minimum}} = H(x) = -\sum_{k=1}^{L} p_k \log_2 p_k$
• H(x) is the memoryless entropy.
• Example: $R_{NBC} = 3$; $R_{\text{minimum}} = H(x) \approx 2.024$; $R_{\text{Huffman}} = 2.04$
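The example values can be recomputed from the code table on the previous slide. The sketch below is illustrative; the function names are assumptions, while the probabilities and codewords are those of the table.

```python
import math

# Probabilities and codewords of levels y0..y7 from the code table.
probabilities = [0.005, 0.02, 0.14, 0.20, 0.51, 0.08, 0.04, 0.005]
nbc = ["111", "110", "101", "100", "000", "001", "010", "011"]
huffman = ["1001111", "100110", "101", "11", "0", "1000", "10010", "1001110"]

def average_length(probs, codewords):
    """Average code length: sum of p_k * n_k, in bits per sample."""
    return sum(p * len(c) for p, c in zip(probs, codewords))

def entropy(probs):
    """Memoryless entropy: -sum of p_k * log2(p_k), in bits per sample."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

r_nbc = average_length(probabilities, nbc)
r_huffman = average_length(probabilities, huffman)
h = entropy(probabilities)
print(r_nbc, r_huffman, h)
```

The Huffman average sits just above the entropy, as the source coding theorem requires.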
What is a good VLC
• A good VLC is a code satisfying:
– Average code length approaches H(x) bits/sample
– Codewords can be uniquely decoded and are not prefixes of each other
– Can be easily encoded and decoded
– Self synchronization (additional feature)
– Can be decoded from both sides (additional feature)
• The source coding theorem does not provide a way of obtaining a good VLC.
Example: y0 y4 y4 y3 y6 y1 → Encoder → 1001111 0 0 11 10010 100110 → 1001111001110010100110 → Decoder
Limitation of Huffman code
• Rate is always >= 1.0 bit/sample
• Predesigned code
• Fixed code table
• If the probability of the data does not match the probability used to create the code, expansion may occur.
• In practice:
- Implemented in two passes
- Block adaptive (one code table per data block)
- Recursive Huffman (continuous table update)
Runlength Coding
• Runlength coding is used for data containing clusters of similar values.
Example: values “0” and “1” in sequence
00000000001111111111110001111111111111000000
Run lengths: 10, 12, 3, 13, 6
• Very efficient for coding the quantization output “0” (dead zone)
• Used in combination with the signal amplitude: Huffman code for {run length of zeros, signal amplitude}
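The run lengths in the example can be produced by a few lines of code. This is a minimal sketch of plain run-length coding on a bit string; the function names are assumptions, and it does not include the Huffman stage mentioned in the last bullet.

```python
from itertools import groupby

def run_lengths(bits):
    """Return (symbol, run length) pairs for consecutive equal symbols."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]

def expand(runs):
    """Inverse operation: rebuild the original string from the runs."""
    return "".join(symbol * length for symbol, length in runs)

# The sequence from the slide: runs of 10, 12, 3, 13, and 6 symbols.
bits = "0" * 10 + "1" * 12 + "0" * 3 + "1" * 13 + "0" * 6
runs = run_lengths(bits)
print([length for _, length in runs])
```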
Example in JPEG Code
Null Length   Amplitude Category   Code Length   Code
0             1                    2             00
0             2                    2             01
0             3                    3             100
0             4                    4             1011
0             5                    5             11010
0             6                    6             111000
0             7                    7             1111000
1             1                    4             1100
1             2                    6             111001
1             3                    7             1111001
1             4                    9             111110110
2             1                    5             11011
2             2                    8             11111000
3             1                    6             111010
3             2                    9             111110111
4             1                    6             111011
5             1                    7             1111010
6             1                    7             1111011
7             1                    8             11111001
8             1                    8             11111010
9             1                    9             111111000
10            1                    9             111111001
End of Block (EOB)                 4             1010
Discrete Cosine Transform
[Figure: 128x128 original image at 8 bpp → 8x8 DCT coefficients → quantized DCT → I-DCT → reconstructed image at 1.02 bpp, 31.7 dB]
Example DCT Quantization
Luminance Quantization JPEG:
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99
Chrominance Quantization JPEG:
 17  18  24  47  99  99  99  99
 18  21  26  66  99  99  99  99
 24  26  56  99  99  99  99  99
 47  66  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
Example DCT Luminance:
214  49  -3  20 -10  -1   1  -1
 34 -25  11  13   5  -3  15  -6
 -6  -4   8  -9   3  -3   5  10
  8 -10   4   4 -15  10   6   6
-12   5  -1  -2 -15   9  -5  -1
  5   9  -8   3   4  -7 -14   2
  2  -2   3  -1   1   3  -3  -4
 -1   1   0   2   3  -2  -4  -2
Quantization Results of Luminance:
 13   4   0   1   0   0   0   0
  3  -2   1   1   0   0   0   0
  0   0   1   0   0   0   0   0
  1  -1   0   0   0   0   0   0
 -1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
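The quantization results follow from dividing each DCT coefficient by the matching table entry and rounding. The sketch below reproduces the luminance example; the function names are assumptions, and the rounding rule (nearest integer, halves away from zero) is inferred from the entry 8 / 16 → 1 rather than stated on the slide.

```python
import math

LUMINANCE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

DCT_LUMINANCE = [
    [214, 49, -3, 20, -10, -1, 1, -1],
    [34, -25, 11, 13, 5, -3, 15, -6],
    [-6, -4, 8, -9, 3, -3, 5, 10],
    [8, -10, 4, 4, -15, 10, 6, 6],
    [-12, 5, -1, -2, -15, 9, -5, -1],
    [5, 9, -8, 3, 4, -7, -14, 2],
    [2, -2, 3, -1, 1, 3, -3, -4],
    [-1, 1, 0, 2, 3, -2, -4, -2],
]

def round_half_away(value):
    """Round to the nearest integer, halves away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))

def quantize_block(coefficients, table):
    """Divide each coefficient by its table entry and round."""
    return [
        [round_half_away(c / q) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coefficients, table)
    ]

quantized = quantize_block(DCT_LUMINANCE, LUMINANCE_Q)
for row in quantized:
    print(row)
```

Most high-frequency coefficients become zero, which is what makes the subsequent run-length and Huffman stages effective.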
VLC: Huffman
Data DCT   Frequency   Fix Code   VLC
0          40          000        0
24         20          001        100
-24        15          010        101
40         10          011        110
-40        5           100        11100
56         4           101        11101
-56        3           110        11110
72         3           111        11111
Total      100         300 bit    250 bit
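Huffman Code Construction
• Rules for developing a Huffman code: repeatedly merge the two least probable entries, labelling one branch 0 and the other 1, until a single entry remains; each codeword is read from the final merge back to the symbol.
Symbols are listed in order of decreasing probability; the bit assigned to each branch is shown in parentheses.
Symbol   Probability (bit)   Merged node (bit)   Codeword
y0       0.51 (0)                                0
y1       0.20 (1)            0.49 (1)            11
y2       0.14 (1)            0.29 (0)            101
y3       0.08 (0)            0.15 (0)            1000
y4       0.04 (0)            0.07 (1)            10010
y5       0.02 (0)            0.03 (1)            100110
y6       0.005 (0)           0.01 (1)            1001110
y7       0.005 (1)                               1001111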
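The merging procedure can be sketched with a priority queue. This is an illustrative implementation, not the one used for the slides: the function name and the tie-breaking counter are assumptions, and the 0/1 labels it assigns may differ from the slide, although the codeword lengths agree.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman code for a dict mapping symbol to probability."""
    tie_breaker = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tie_breaker), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # least probable entry
        p1, _, codes1 = heapq.heappop(heap)   # second least probable entry
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie_breaker), merged))
    return heap[0][2]

# Probabilities from the slide, in order of decreasing probability.
probabilities = {"y0": 0.51, "y1": 0.20, "y2": 0.14, "y3": 0.08,
                 "y4": 0.04, "y5": 0.02, "y6": 0.005, "y7": 0.005}
code = huffman_code(probabilities)
average = sum(probabilities[s] * len(code[s]) for s in code)
for symbol in sorted(code):
    print(symbol, code[symbol])
```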
More on VLC
• More on VLC:
o Multi-symbol: several consecutive quantization levels are combined into a single codeword.
o Avoid storing a large code table.
o Observe the data statistics of the encoder output.
o More popular for text compression than for audio/image/video.
• Arithmetic code is one of the best VLCs.
Arithmetic Code
[Figure: the unit interval from 0 to 1 divided into sub-intervals for the 4-bit sequences 0000, 0001, 0010, 0011, 0100, 0101, ..., 1111 with P(0) = 0.7; the first boundaries are at .2401, .3430, and .4459]
• Interval width corresponds to probability.
• Coding: find a fraction in the interval.
Example:
0000 is coded as 0 = (0.)0
0010 is coded as 3/8 = (0.)011
1111 is coded as 15/16 = (0.)1111
Advanced VLC III
• The binary expansion can be built recursively.
• The probabilities can change while coding is in progress.
[Figure: recursive interval subdivision with P(0) = 0.7, showing the boundaries 0.7, 0.49, 0.343, and 0.4459 for the prefixes 0, 00, 000, 001, 0010, and 0011; the steps are labelled "send nothing", "send 0", and "send 1", and so on]
Quantization and VLC
[Figure: x → Quantization (selection of L based on p(x)) → y_j → VLC based on P(y_j) → 00101101 → Decode]
• Quantization is optimized for the selected L.
• The VLC is optimized for the probabilities of the values y_j.
• This combination is not always optimal, because another combination may give better rate-distortion performance.
Optimality
• Finding an optimal quantizer (with optimal R-D) is a complex problem.
• A possible approach is to use a UTQ: Uniform Threshold Quantizer.
[Figure: p(x) versus x with the design parameter marked]
$y_j = \frac{\int_{x_j}^{x_{j+1}} x\,p(x)\,dx}{\int_{x_j}^{x_{j+1}} p(x)\,dx}$
Quantization with Step Scale Q
[Figure: quantizer input x with thresholds at Q, 2Q, 3Q, 4Q and outputs 0, 1.5Q, 2.5Q, 3.5Q, each paired with a VLC codeword; codewords shown: 1000, 1110, 110, 101, 1111, 10011]
• Quantizer is scaled by Q
Synchronization in VLC
• The decoder needs to parse the bits before they can be decoded.
• A bit error can cause loss of synchronization between encoder and decoder.
• Example:
Quantization level: 4 4 3 0 5 3 4
NBC bits:           000 000 100 111 001 100 000
Huffman bits:       0 0 11 1001111 1000 11 0
Assume the 4th bit is in error:
NBC decoded:        4 3 3 0 5 3 4
Huffman decoded:    4 4 2 4 4 3 3 5 3 4
Synchronization is lost.
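The example can be replayed in code. The sketch below uses the NBC and Huffman tables from the earlier code-table slide; the helper names are assumptions. A single flipped bit corrupts one sample of the fixed-length stream but shifts the codeword boundaries of the variable-length stream.

```python
NBC = {0: "111", 1: "110", 2: "101", 3: "100", 4: "000", 5: "001", 6: "010", 7: "011"}
HUFFMAN = {0: "1001111", 1: "100110", 2: "101", 3: "11",
           4: "0", 5: "1000", 6: "10010", 7: "1001110"}

def encode(levels, table):
    """Concatenate the codewords of the quantization levels."""
    return "".join(table[level] for level in levels)

def decode(bits, table):
    """Parse a prefix-free bit string back into quantization levels."""
    inverse = {codeword: level for level, codeword in table.items()}
    levels, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            levels.append(inverse[current])
            current = ""
    return levels

def flip_bit(bits, position):
    """Flip the bit at a 1-based position to simulate a channel error."""
    i = position - 1
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]

levels = [4, 4, 3, 0, 5, 3, 4]
nbc_decoded = decode(flip_bit(encode(levels, NBC), 4), NBC)
huffman_decoded = decode(flip_bit(encode(levels, HUFFMAN), 4), HUFFMAN)
print(nbc_decoded)
print(huffman_decoded)
```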
Visual Quality PCM for Images
At low rate, image quality is determined by:
• Contour error
• Wide uniform area
[Figure: x(n) versus n for normal PCM, showing contour error, and for PCM with noise]
Dithering
Dithering improves visual quality for PCM at low rate by breaking up contour errors and simulating texture.
[Figure: Encoder: x(i,j) plus pseudo-random noise → Coarse PCM. Decoder: Inverse PCM, minus the same pseudo-random noise → x^(i,j). Both pseudo-random generators start from the same initial value.]
Dithering
• In PCM, quantization causes sharp errors that are not visually pleasant.
• Error is more acceptable if it is independent of the image, so that a human viewer can filter it out perceptually.
• Hence, random error is intentionally added to the signal before quantization: dithering.
[Figure: image at 3 bpp without dither and at 3 bpp with dither]
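The encoder and decoder of the dithering block diagram can be sketched in a few lines. This is an illustration under assumptions: the noise is uniform over one quantization step, both sides share a seed as the "initial value", and the function names are hypothetical.

```python
import random

def coarse_pcm(x, step):
    """Coarse uniform quantizer with the given step size."""
    return step * round(x / step)

def encode_with_dither(samples, step, seed):
    """Add pseudo-random noise before coarse quantization."""
    rng = random.Random(seed)
    return [coarse_pcm(x + rng.uniform(-step / 2, step / 2), step) for x in samples]

def decode_with_dither(levels, step, seed):
    """Subtract the same pseudo-random noise, regenerated from the seed."""
    rng = random.Random(seed)
    return [y - rng.uniform(-step / 2, step / 2) for y in levels]

samples = [i / 100 for i in range(100)]      # a slow ramp
step = 0.25
plain = [coarse_pcm(x, step) for x in samples]
restored = decode_with_dither(encode_with_dither(samples, step, seed=1), step, seed=1)

# Plain PCM turns the ramp into a few flat steps (contours); the dithered
# version replaces the steps with noise of bounded size.
print(len(set(plain)), len(set(restored)))
```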
Channel Error Impact
DPCM Principle: Prediction
• Pixel intensity can be estimated by a neighborhood predictor.
• Pixel intensity = sum (neighboring pixel intensities x prediction coefficients)
• The best coefficients can be estimated using the correlation method.
• In ideal cases, the predictors are sufficient to represent images.
Neighboring pixel   Coefficient
B                   a
U                   b
BL                  c
TL                  d
[Figure: predictor neighborhoods around the current pixel x, in panels labelled Order 1, Order 2, Order 3, and Order 3, using the neighbors B, U, BL, and TL]
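A first-order predictor along one image line gives the simplest DPCM coder. The following is a sketch under assumptions: the neighbour B is taken to be the previous pixel on the same line with coefficient a = 1, the prediction error is quantized uniformly, and the function names are hypothetical.

```python
def dpcm_encode(line, step, a=1.0):
    """Quantize the prediction error of each pixel along a line."""
    indices = []
    previous = 0.0                       # reconstructed previous pixel
    for x in line:
        prediction = a * previous
        index = round((x - prediction) / step)
        indices.append(index)
        previous = prediction + index * step   # track what the decoder sees
    return indices

def dpcm_decode(indices, step, a=1.0):
    """Rebuild the line from the quantized prediction errors."""
    line = []
    previous = 0.0
    for index in indices:
        previous = a * previous + index * step
        line.append(previous)
    return line

line = [100, 102, 105, 105, 104, 110, 120, 121]
step = 4
indices = dpcm_encode(line, step)
restored = dpcm_decode(indices, step)
print(indices)
print(restored)
```

After the first pixel, the transmitted indices stay small because neighbouring pixels are similar, which is the low dynamic range that DPCM exploits.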
Basic Model
[Figure: the prediction image is subtracted from the image to give the prediction error image; the prediction is produced by prediction estimation]
Slope Overload and Overshoot
[Figure: intensity (0 to 6) versus spatial sample (width direction) for line 1 of an image, showing the prediction result with a slope overload error and an overshoot error]
Prediction Error (Order 1)
[Figure: original image and 3 bpp DPCM result]
Performance of Order 4
[Figure: original image and DPCM results at 3 bpp, 2 bpp, and 1 bpp]
Impact of Channel Error
[Figure: impact of channel error for Order 1 and Order 2 predictors]
Conclusions
• Compression is performed targeting factors
determining bit rates
• Spatial Subsampling reduces pixels
• PCM quantization reduces number of bits
per pixel
• Dithering can affect perceptual quality
• DPCM quantization reduces bit rates of
pixel errors with low dynamic range.
• Different compression results in different
impacts of channel error.
Video Compression
Basics of Hybrid Coding For
MPEG
Overview
• Introduction: Video
• Digital Video
• Hybrid Coding
• MPEG Compression
• Closing Remarks
Color Picture Sizes
Standard                                    Component   30 fps (NTSC)   25 fps (PAL)
CCIR 601 (165.888 Mbps)                     Y           720 x 480       720 x 576
                                            Cb, Cr      360 x 480       360 x 576
SIF (31.104 Mbps)                           Y           360 x 240       360 x 288
                                            Cb, Cr      180 x 120       180 x 144
Significant Pixel Area SIF (30.4128 Mbps)   Y           352 x 240       352 x 288
                                            Cb, Cr      176 x 120       176 x 144
Reconstruction Quality
• Picture energy: average of the squared pixel values in the image
• MSE: energy of the difference image
• SNR (dB): $10\log_{10}(\text{Energy}/\text{MSE})$
• PSNR (dB): $10\log_{10}(\text{MaxPix}^2/\text{MSE})$
[Figure: original image, reconstructed image, and their difference image (width by height), with a detected pixel reconstruction error highlighted]
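The four measures can be computed directly from their definitions. This is a minimal sketch for images stored as lists of rows; the function names and the tiny 2 x 2 example are assumptions, and MaxPix is taken to be 255 for 8-bit pixels.

```python
import math

def picture_energy(image):
    """Average of the squared pixel values."""
    pixels = [p for row in image for p in row]
    return sum(p * p for p in pixels) / len(pixels)

def mse(original, reconstructed):
    """Energy of the difference image."""
    differences = [
        (o - r) ** 2
        for o_row, r_row in zip(original, reconstructed)
        for o, r in zip(o_row, r_row)
    ]
    return sum(differences) / len(differences)

def snr_db(original, reconstructed):
    return 10 * math.log10(picture_energy(original) / mse(original, reconstructed))

def psnr_db(original, reconstructed, max_pixel=255):
    return 10 * math.log10(max_pixel ** 2 / mse(original, reconstructed))

original = [[100, 110], [120, 130]]
reconstructed = [[101, 109], [121, 129]]     # every pixel is off by one
print(mse(original, reconstructed))
print(snr_db(original, reconstructed), psnr_db(original, reconstructed))
```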
Intra Code Only
[Figure: sequence of original images → JPEG coding → sequence of JPEG images]
• Every frame is compressed individually (called intra coding)
• Advantages
– Easy bit allocation
– Easy random access
– Robust
• Disadvantages
– Bit rate is still not very low
• Example: Motion JPEG
MPEG-H.26x Family
[Figure: quality (low to high) versus bit rate (0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50 Mbps) for MPEG-4, H.263, H.261, MPEG-1, and MPEG-2 = H.262; application ranges along the bit-rate axis: Internet, Mobile, PSTN; BN-ISDN; VCD; DVD, ISDN, DTV; HDTV]
Basic Strategy
• Irrelevancy Elimination
– Spatial Sampling: Picture Format (YUV 4:2:0), Video Data Structure
– Pixel Quantization
– Spatial Correlation: Intraframe Compression, DCT and Quantization
– Temporal Correlation: Prediction, Motion Compensation
• Redundancy Elimination
– Huffman Coding (VLC)
Compression: Image vs Video
Aspect                         Image   Video
Reduce samples                 Yes     Yes
Pixel intensity quantization   Yes     Yes
Spatial pixel prediction       Yes     Yes
Temporal pixel prediction      No      Yes
Transform coding               Yes     Yes
VLC / entropy coding           Yes     Yes
Hybrid Coding
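[Figure: hybrid coder block diagram. The predicted image is subtracted from the original image to give the prediction error image, which passes through DCT & Q and VLC to the output. The quantized prediction error also passes through inverse DCT & Q and is added to the predicted image to give the reconstructed image, which is stored in a frame buffer. Motion estimation between the original image and the buffered frame yields the motion vector and the prediction (for P and B pictures); the I, P, B selector uses a prediction of 0 for I pictures.]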
Bit Rate Reduction
• Factors determining bit rate
– Image: Width x Height x Bits per Pixel
– Video: Width x Height x Bits per Pixel x Frames per Second
• Ways to reduce bit rates
– Width x Height: Subsampling
– Bits per Pixel: Quantization, PCM, DPCM / Prediction, Transform Coding, Hybrid
– Frames per Second: Time Sampling
• Compression increases sensitivity to errors in transmission or storage
[Figure: frames from time index 1 to time index L, N x M pixels per frame, B bits per pixel; N x M x L x B bits per video file]
MPEG Compression
• Reduction in the number of pixels in the spatial and temporal domains of the luminance and chrominance components
• DCT compression for type I frames (intra and compensation)
• Prediction image compression based on motion compensation, for types P and B
• Huffman coding for motion vectors and DCT coefficients
[Figure: prediction of the current picture from the past picture (forward) and the future picture (backward): none for type I, forward for type P, bidirectional for type B]
Type                Typical bit rate
I (Intra)           156 kbps
P (Forward)         62 kbps
B (Bidirectional)   15 kbps
3 Types of Frames I, P, B
I B B P B B P B B I
[Figure: one GOP, with no prediction for the I pictures, forward prediction, and backward prediction marked]
Compression of one GOP. I: Intra, P: Forward Predicted, B: Bidirectional Predicted Picture
Use Standard Format
• Conversion from
– NTSC (30 fps)
– PAL (25 fps)
• To YCbCr
– Source Input Format (SIF)
– Common Intermediate Format (CIF)
– Quarter CIF (QCIF)
[Figure: RGB to Y C1 C2 conversion. Inputs: Red, Green, Blue. Outputs: Luminance, Chrominance 1, Chrominance 2]
Standards | Y | Cb | Cr
NTSC | Y | I | Q
PAL | Y | U | V
SECAM | Y | Db | Dr
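A sketch of the RGB to YCbCr step follows. The slides do not give the conversion matrix, so the coefficients here are an assumption: the ITU-R BT.601 luma weights in the full-range form used by JPEG/JFIF, with 8-bit samples and chrominance offset by 128.

```python
def clamp8(value):
    """Round and clamp a value to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr.

    Assumed coefficients: BT.601 luma weights, full-range (JFIF) form.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


if __name__ == "__main__":
    for name, pixel in [("white", (255, 255, 255)), ("red", (255, 0, 0))]:
        print(name, rgb_to_ycbcr(*pixel))
```

Gray pixels map to Cb = Cr = 128, so all of their information sits in the luminance component; this separation is what makes chrominance subsampling possible.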
Color Subsampling in YUV
[Figure: luminance and chrominance sample positions for 4:2:2 and 4:2:0]
• 4:2:2: chrominance components are subsampled by a factor of 2 in the horizontal direction. Hence if Y is 720 x 480, U and V are each 360 x 480.
• 4:2:0: chrominance components are subsampled by a factor of 2 in both the horizontal and vertical directions. Hence if Y is 720 x 480, U and V are each 360 x 240.
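The plane sizes quoted above follow directly from the subsampling factors. The sketch below computes them and also shows one possible way to produce a 4:2:0 chrominance plane. Averaging each 2 x 2 neighbourhood is an assumption made here; the slides do not specify a subsampling filter, and the function names are illustrative.

```python
# (horizontal factor, vertical factor) for each scheme
SUBSAMPLING_FACTORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def chroma_plane_size(luma_width, luma_height, scheme):
    """Size of each chrominance plane (U or V) for a given luma size."""
    horizontal, vertical = SUBSAMPLING_FACTORS[scheme]
    return luma_width // horizontal, luma_height // vertical


def subsample_420(plane):
    """Halve a chrominance plane in both directions by 2 x 2 averaging.

    plane is a list of rows with even width and height (assumed).
    """
    out = []
    for y in range(0, len(plane) - 1, 2):
        row = []
        for x in range(0, len(plane[0]) - 1, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out
```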
Color Subsampling
Format | Bits per pixel
Original | 24 bpp
YUV 4:2:2 | 16 bpp
YUV 4:2:0 | 12 bpp
YUV | 10 bpp
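The 24, 16, and 12 bpp figures can be derived by counting samples per pixel. This sketch assumes 8 bits per sample and covers only the three schemes whose names are given; the function name is illustrative.

```python
# Chrominance samples per luminance sample, for each of U and V
CHROMA_SAMPLES_PER_PIXEL = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}


def bits_per_pixel(scheme, bits_per_sample=8):
    """Average bits per pixel: one Y sample plus the shared U and V samples."""
    chroma = CHROMA_SAMPLES_PER_PIXEL[scheme]
    return bits_per_sample * (1 + 2 * chroma)


if __name__ == "__main__":
    for scheme in ("4:4:4", "4:2:2", "4:2:0"):
        print(scheme, bits_per_pixel(scheme), "bpp")
```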
Digital Video Structure
A video signal consists of Groups of Pictures (GOPs)
A GOP consists of several frames
A frame consists of Groups of macroBlocks (GOBs)
A GOB consists of several macroblocks (MBs)
An MB consists of four blocks
One block has 8 x 8 = 64 pixels
[Figure: a video as GOP 1, GOP 2, GOP 3, ...; an N x M frame divided into GOB 1 to GOB L; a GOB divided into MB 1, MB 2, MB 3, ...; a macroblock divided into blocks B1 to B4 of 8 x 8 pixels]
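The hierarchy translates into simple counts per frame. The sketch below assumes a 16 x 16 luminance macroblock made of four 8 x 8 blocks, as on the slide, and counts luminance data only; the function names and the 352 x 288 example are illustrative.

```python
def macroblock_grid(width, height, mb_size=16):
    """Number of macroblock columns and rows needed to cover a frame."""
    cols = -(-width // mb_size)    # ceiling division
    rows = -(-height // mb_size)
    return cols, rows


def count_units(width, height):
    """Count macroblocks, 8 x 8 luminance blocks, and block pixels in a frame."""
    cols, rows = macroblock_grid(width, height)
    macroblocks = cols * rows
    luma_blocks = macroblocks * 4          # four blocks per macroblock
    return {
        "macroblocks": macroblocks,
        "luma_blocks": luma_blocks,
        "luma_pixels": luma_blocks * 64,   # 64 pixels per block
    }


if __name__ == "__main__":
    print(count_units(352, 288))
```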
Macroblocks and YUV File
[Figure: a 4:2:0 macroblock from a YUV file consists of a 16 x 16 Y area (four 8 x 8 blocks), one 8 x 8 U block, and one 8 x 8 V block. The 4:2:0 YUV file diagram marks where the data bits start and end, with sections labeled Y Frame 1, Y Frame 2, U Frame 1, U Frame 2, V Frame 1, and V Frame 2.]
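A sketch of pulling one 4:2:0 macroblock out of already-separated planes is given below. It deliberately takes the Y, U, and V planes as inputs (lists of rows) rather than reading a file, since the byte order of the planes in the file is not assumed here. The function names are illustrative.

```python
def frame_bytes_420(width, height):
    """Bytes per 8-bit 4:2:0 frame: one Y plane plus quarter-size U and V."""
    return width * height * 3 // 2


def extract_macroblock_420(y_plane, u_plane, v_plane, mb_col, mb_row):
    """Return the 16 x 16 Y area and the 8 x 8 U and V blocks of one macroblock."""
    y0, x0 = mb_row * 16, mb_col * 16     # top-left corner in the Y plane
    cy0, cx0 = mb_row * 8, mb_col * 8     # top-left corner in U and V
    y_block = [row[x0:x0 + 16] for row in y_plane[y0:y0 + 16]]
    u_block = [row[cx0:cx0 + 8] for row in u_plane[cy0:cy0 + 8]]
    v_block = [row[cx0:cx0 + 8] for row in v_plane[cy0:cy0 + 8]]
    return y_block, u_block, v_block
```

Each macroblock therefore carries 256 + 64 + 64 = 384 samples, which matches the 12 bpp figure for 4:2:0.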
Motion Prediction
[Figure: the macroblock to be searched for, taken from the picture to be predicted, is compared against the anchor picture. The displacement to the best-matching macroblock is the motion vector.]
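One common way to find the best-matching macroblock is an exhaustive search that minimizes the sum of absolute differences (SAD). The slides do not name a matching criterion or search strategy, so both are assumptions here, as are the function names and the default search range.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size=16, search_range=7):
    """Find the motion vector for the block at (cx, cy) in the current picture.

    cur, ref: pictures as lists of rows (ref is the anchor picture).
    Returns ((mx, my), cost): the displacement of the best match in ref
    and its SAD. Candidates falling outside the picture are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return (best[1], best[2]), best[0]
```

The search costs (2 x search_range + 1)^2 block comparisons per macroblock, which is why practical encoders often replace it with faster, non-exhaustive search patterns.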
Compression of I-Image
[Figure: I-picture path through the hybrid coder. The prediction result is empty (0), so the original image itself passes through DCT & Q and VLC to the output. The quantized image also passes through inverse DCT & Q, and the reconstructed image is stored in the frame buffer.]
Compression of P-Image
[Figure: P-picture prediction. For the current macroblock at (x,y) in the current picture, motion estimation finds the best-matching macroblock in the past picture held in the frame buffer. The match forms the predicted picture (prediction P) and its displacement is the motion vector.]
Compression of B-Image
[Figure: B-picture prediction. For the current macroblock at (x,y) in the “current” picture, motion estimation finds a best-matching macroblock in the past picture (motion vector 1) and in the future picture (motion vector 2), both held in frame buffers. The predicted picture uses one of the two matches or their average.]
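The "pick one or average" step can be sketched as a choice among three candidate predictions. Selecting the candidate with the smallest SAD is an assumption made here; the slides do not state the selection rule. The function and mode names are illustrative.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def average_blocks(a, b):
    """Pixel-wise average of two blocks, rounded to the nearest integer."""
    return [[(p + q + 1) // 2 for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def choose_b_prediction(current, past_match, future_match):
    """Pick the forward, backward, or averaged prediction for a B macroblock.

    past_match / future_match: best-matching blocks already found by
    motion estimation in the past and future pictures.
    Returns (mode, predicted_block).
    """
    candidates = {
        "forward": past_match,
        "backward": future_match,
        "average": average_blocks(past_match, future_match),
    }
    mode = min(candidates, key=lambda m: block_sad(current, candidates[m]))
    return mode, candidates[mode]
```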
An Actual Macroblock Coding
• I Picture: I Macroblock
• P Picture: P & I Macroblock
– Optional: Skipped MV and Skipped Coding
• B Picture: B, P, & I Macroblock
– Optional: Skipped Coding
[Bhaskaran et al 1997]
[Table: number of macroblocks of each macroblock type (I, P, B, zero MV, skipped) for each picture type (I, P, B). Extracted values: 3300, 897, 60, 8587, 7365, 5128, 22845, 568, 429]
Reordering Caused by B-Picture
[Figure: picture sequence I B B P B B P B B I. I pictures use no prediction, P pictures use forward prediction, and B pictures use forward and backward prediction.]
Input | I1 B2 B3 P4 B5 B6 P7 B8 B9 I10
Input buffer | B2 B2 B2 B5 B5 B5 B8 B8 B8
Input buffer | B3 B3 B3 B6 B6 B6 B9 B9 B9
Transmit | I1 P4 B2 B3 P7 B5 B6 I10 B8 B9
Prediction buffer | I1 I1 I1 I1 P7 P7 P7 P7 P7 P7
Prediction buffer | P4 P4 P4 P4 P4 P4 I10 I10 I10
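The reordering can be sketched as a small buffering rule: hold each B picture until the next anchor (I or P) has been sent, because the B picture needs that anchor for backward prediction. The function name and the string labels are assumptions for illustration.

```python
def display_to_transmit_order(pictures):
    """Reorder pictures from display (input) order to transmit order.

    pictures: labels such as "I1", "B2", "P4"; the first character is
    the picture type.
    """
    transmit = []
    pending_b = []                    # the input buffer for B pictures
    for picture in pictures:
        if picture[0] == "B":
            pending_b.append(picture)
        else:
            transmit.append(picture)  # send the anchor first ...
            transmit.extend(pending_b)  # ... then the B pictures it unblocks
            pending_b = []
    transmit.extend(pending_b)        # trailing B pictures with no later anchor
    return transmit


if __name__ == "__main__":
    order = "I1 B2 B3 P4 B5 B6 P7 B8 B9 I10".split()
    print(" ".join(display_to_transmit_order(order)))
```

The decoder applies the inverse rule, so B pictures add both a reordering delay and the need for two reference buffers.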
Level and Profile MPEG-2
Profiles: Simple, Main, 4:2:2, SNR Scalable, Spatial Scalable, High
Levels: High (60 fps), High-1440 (60 fps), Main (30 fps), Low (30 fps)
Profile@Level | Chroma format | Resolution | Bit rate | Picture types
MP@HL | 4:2:0 | 1920 x 1152 | 80 Mbps | I, P, B
4:2:2@HL | 4:2:2 | 1920 x 1080 | 300 Mbps | I, P, B
HP@HL | 4:2:0, 4:2:2 | 1920 x 1152 | 100 Mbps | I, P, B
MP@H1440 | 4:2:0 | 1440 x 1152 | 60 Mbps | I, P, B
SSP@H1440 | 4:2:0 | 1440 x 1152 | 60 Mbps | I, P, B
HP@H1440 | 4:2:0, 4:2:2 | 1440 x 1152 | 80 Mbps | I, P, B
SP@ML | 4:2:0 | 720 x 576 | 15 Mbps | I, P
MP@ML | 4:2:0 | 720 x 576 | 15 Mbps | I, P, B
4:2:2@ML | 4:2:2 | 720 x 608 | 50 Mbps | I, P, B
SNR@ML | 4:2:0 | 720 x 576 | 50 Mbps | I, P, B
HP@ML | 4:2:0, 4:2:2 | 720 x 576 | 20 Mbps | I, P, B
MP@LL | 4:2:0 | 352 x 288 | 4 Mbps | I, P, B
SNR@LL | 4:2:0 | 352 x 288 | 4 Mbps | I, P, B
MPEG-2: Scalable
• MPEG-2 allows a bitstream to contain two layers: base and enhancement (auxiliary). The enhancement layer improves quality on complete decoders.
• Multiview, for example for three-dimensional movies
[Figure: the enhancement layer and the base layer are mixed into one output]
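The base/enhancement idea can be sketched in the style of SNR scalability: the base layer carries a coarsely quantized signal, and the enhancement layer carries a finer quantization of what the base layer missed. This is an illustrative reduction on plain sample values, not the MPEG-2 bitstream syntax; the step sizes and function names are assumptions.

```python
def encode_layers(samples, base_step=16, enh_step=4):
    """Split samples into a coarse base layer and a refinement layer."""
    base = [round(s / base_step) for s in samples]
    base_recon = [q * base_step for q in base]
    # The enhancement layer codes the error left by the base layer.
    enhancement = [round((s - r) / enh_step)
                   for s, r in zip(samples, base_recon)]
    return base, enhancement


def decode_layers(base, enhancement=None, base_step=16, enh_step=4):
    """Decode the base layer alone, or mix in the enhancement layer."""
    out = [q * base_step for q in base]
    if enhancement is not None:
        out = [o + e * enh_step for o, e in zip(out, enhancement)]
    return out
```

A decoder that ignores the enhancement layer still produces a usable, lower-quality signal, while a complete decoder mixes both layers.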
MPEG 4
• Anticipation of the merging of
– Communication (Telecom)
– Interactivity (Computer)
– Broadcasting (TV/Film)
[Figure: Communication, Computer, and TV/Film converging]
MPEG- 4 System
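[Figure: MPEG-4 system. At the source, audio-video objects (A, B, C) from an objects library are encoded and multiplexed. At the user side, the stream is demultiplexed and decoded, and a compositor combines the decoded objects with those from a local library for display and sound. User interaction is fed back to the system.]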
MPEG-4: Video Object Plane
[Figure: an original picture and its mask, separated into VOP 1 and VOP 2]
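The role of the mask can be sketched as follows. A binary mask is assumed, with VOP 1 taken to be the masked region and VOP 2 the remainder; this mapping, the fill value for pixels outside each VOP, and the function name are assumptions, since the figure itself is not recoverable.

```python
def split_into_vops(frame, mask, fill=0):
    """Split a frame into two video object planes using a binary mask.

    frame, mask: lists of rows of equal size; a truthy mask value marks
    a pixel of VOP 1, anything else belongs to VOP 2.
    """
    vop1 = [[p if m else fill for p, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]
    vop2 = [[fill if m else p for p, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]
    return vop1, vop2
```

With a fill value of 0, adding the two planes pixel by pixel gives back the original frame.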
Example of MPEG-4 Process
[Figure: segmentation and layering split the picture into a local background, VOP 1, and VOP 2. Layer encoding applies contour, motion, and texture coding to each VOP and produces one bitstream layer per VOP (VOP 1, VOP 2). The layers are decoded separately.]
MPEG VOP Coding
• Shape: shape information is stored losslessly in the alpha plane.
• Motion: macroblock-based motion estimation on the VOP.
• Texture: uses the DCT.
• Multiplex them: alpha plane (14%), motion (4.2%), texture (75%)
[Figure: a VOP window, shifted relative to the reference window, divided into standard macroblocks, contour macroblocks, and pixels outside the active area]
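The distinction between standard and contour macroblocks can be sketched by inspecting the alpha plane. A binary alpha plane is assumed, with dimensions that are multiples of the macroblock size; the labels "standard" and "contour" follow the slide, while "outside" and the function name are assumptions.

```python
def classify_macroblocks(alpha, mb_size=16):
    """Label each macroblock of a binary alpha plane.

    "standard": every pixel belongs to the object
    "contour":  the object boundary crosses the macroblock
    "outside":  no pixel belongs to the object
    """
    rows = len(alpha) // mb_size
    cols = len(alpha[0]) // mb_size
    labels = []
    for r in range(rows):
        row_labels = []
        for c in range(cols):
            inside = sum(
                1
                for y in range(r * mb_size, (r + 1) * mb_size)
                for x in range(c * mb_size, (c + 1) * mb_size)
                if alpha[y][x]
            )
            if inside == 0:
                row_labels.append("outside")
            elif inside == mb_size * mb_size:
                row_labels.append("standard")
            else:
                row_labels.append("contour")
        labels.append(row_labels)
    return labels
```

Only contour macroblocks need per-pixel shape information; standard macroblocks can be coded like ordinary rectangular macroblocks.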
MPEG-4 Scalability
• Object Scalability
• Quality Scalability
• Spatial Scalability
• Temporal Scalability
MPEG-4 Flexibility
• Video tools:
– Algorithm, shape coding module, motion compensation module, texture coding module
• MSDL: MPEG-4 System Description Language:
– Definition of the interfaces of coding tools
– Mechanism to combine coding tools and to construct algorithms and profiles
– Mechanism to download new tools
[Figure: Tools, Algorithms, Profiles]
MPEG-4 Decoder Flexibility
• Flex_0
– Standard audio, video, and system tools (non-programmable)
• Flex_1
– Sets of audio, video, and system tools with standard interfaces; downloadable algorithms
• Flex_2
– Arbitrary algorithms made of arbitrary tools
[Figure: an MPEG-4 player downloads and configures algorithms and receives the media stream]
MPEG-4 for Media Access
[Figure: media access architecture built around an MPEG-4 framework. Information infrastructure: PSTN, leased lines, CDMA/GSM, IP cloud, private IP. A transmux connects it to the MPEG-4 framework, which contains user services, billing, control/gatekeeper, media switch, local media server, info switch, gateway, and local info server, linked by media, control, and info channels. A second transmux connects to the user over analog phone + V34, WLL, WLAN, Ethernet, RS232/RS422, fiber, or powerline.]
Future Development
MPEG – 7
Emotion Media
MPEG 7 and 21
• MPEG-7 defines metadata
– Title, owner, actors, format, color space
– Shooting location, time, zoom angle
– Names, events, transcripts
– Objects, faces
– Contours, textures, shapes
• MPEG-21: structure for the management and usage of digital media
– Including business transactions, event reporting, IP protection, and content management
The Future
Integrated by MPEG-4, 7, and 21
[Figure: broadcasters (satellite & DVB), the telephone network, the Internet, wireless cellular, and fiber optics connect individual users and audiences to audio/video, application, game, data, and web servers, and to CD / tape / DVD / VCD / flash / HD storage]
MPEG-7: Content Descriptions
Visual group | Descriptor | Notes
Color | Color Space | RGB, YCrCb, HSV, HMMD, linear transform of RGB
Color | Color Quantization | Linear, nonlinear, lookup table, quantization parameters
Color | Dominant Color | Number of dominant colors, confidence measure, value of each color component and each percentage
Color | Color Histogram | Common color histogram, group-of-frames histogram, color-structure histogram
Color | Compact Color Descriptor | Haar transform of color histogram
Color | Color Layout | DCT of Y/Cb/Cr of dominant color of 8x8 blocks of reduced image
Shape | Object Bounding Box | Description of tightest rectangular box enclosing objects
Shape | Contour-Based Descriptor | Curvature scale space representation of specific 2D closed boundary
Shape | Region-Based Shape Descriptor | Shape description of any 2D regions using Zernike moments
Texture | Homogeneous Texture | Energy distribution in different orientations and frequency bands
Texture | Texture Browsing | Regularity, coarseness, and directionality of texture appearances
Texture | Edge Histogram | Edge orientation distribution in the image
Motion | Camera Motion | Panning, tracking, tilting, booming, zooming, dollying, rolling
Motion | Motion Trajectory | Key points of trajectories of non-rigid moving objects
Motion | Parametric Object Motion | Translation, rotation/scaling, affine, planar perspective, parabolic
Motion | Motion Activity | Intensity, direction, spatial distribution of activity, temporal distribution
Emotion Digitization
[Figure: emotions placed in a two-dimensional space with one axis from Very Active to Very Passive and one axis from Very Negative to Very Positive. Labels: Fear, Surprised, Ecstasy, Glad, Agitated, Spirited, Upset, Sure, Critical, Hoping, Possessive, Suspicious, De, Accord, Serene, Depressed, Panic, Angry, Disappointed, Trust, Obey, Bored, Joy, Happy, Anticipation, Calm, Sad, Retreat, Resigned]
[Cowie et al, 2001]
Conclusion
• Multimedia technology is ‘the next revolution’
• The key is media digitization
• Compression is a must for efficiency
• This is just the beginning ...