Transcript Document

Introduction to MPEG-4
MC2008
2015/7/21
MC2009
Outline
• Multimedia
• MPEG-4 Profiles
• Key Features of MPEG-4 Systems
• MPEG-4
– Systems
– DMIF
• Audiovisual Objects and Scene Graph
• Editing, Composition and Rendering
• Coding Basics
• Coding Techniques
Multimedia
• What is multimedia?
– Combination of audio, video, images, graphics
and text.
– Coverage of all human I/O’s.
• Why does multimedia need to be coded?
Multimedia Coding for Different Applications
• Mobile devices
– Low data rate, error resilience, scalability
• Streaming service
– Scalability, low to medium data rate, interactivity
• On-disk distribution (DVD)
– Interactivity
• Broadcast
– On-demand services
Profiles in MPEG-4
• Visual Profiles
• Audio Profiles
• Graphics Profiles
• Scene Graph Profiles
• MPEG-J Profiles
• Object Descriptor Profile
NewPred
H.263 Baseline
Key Features of MPEG-4 Systems
• Provides a consistent and complete
architecture for the coded representation of
the desired combination of streamed
elementary audio-visual information.
• Covers a broad range of applications,
functionality and bit rates.
– Through profile and level definitions, it
establishes a framework that allows consistent
progression from simple applications (e.g., an
audio broadcast application with graphics) to
more complex ones (e.g., a virtual reality home
theater).
Key Features of MPEG-4 Systems (2)
A set of tools for representing multimedia content:
1. a framework for object description (the OD framework),
2. BIFS: a binary format for describing interactive 2D and 3D multimedia scenes,
3. SDM (Systems Decoder Model) and SyncLayer: a framework for monitoring and synchronizing elementary data streams, and
4. MPEG-J: programmable extensions to access and monitor MPEG-4 content.
Key Features of MPEG-4 Systems (3)
MPEG-4 Systems defines an efficient mapping of MPEG-4 content onto existing delivery infrastructures:
1. FlexMux: an efficient and simple multiplexing tool to optimize the carriage of MPEG-4 data (into different QoS channels),
2. extensions allowing the carriage of MPEG-4 content on MPEG-2 and IP systems, and a flexible file format for authoring, streaming and exchanging MPEG-4 data.
MPEG-4
ISO/IEC 14496 Terminal Architecture
• Compression Layer (ISO/IEC 14496-2 Visual, ISO/IEC 14496-3 Audio): media aware, delivery unaware
• Elementary Stream Interface (ESI)
• Sync Layer (ISO/IEC 14496-1 Systems): media unaware, delivery unaware
• DMIF Application Interface (DAI)
• Delivery Layer (ISO/IEC 14496-6 DMIF): media unaware, delivery aware
Systems
• Timing Model
• Buffer Model
• Multiplexing of Streams
• Synchronization of Streams
• The Compression Layer
– Object Description Framework
– Scene Description Streams
– Audio-visual Streams
– Upchannel Streams
[Figure: MPEG-4 Systems layers, from top to bottom: display and user interaction; interactive audiovisual scene; composition and rendering; the Compression Layer, whose elementary streams carry object descriptors, scene description information, upstream information and AV object data; the Elementary Stream Interface; the Sync Layer (SL), which produces SL-packetized streams; the DMIF Application Interface; the Delivery Layer, with options such as FlexMux over MPEG-2 TS (PES), FlexMux over RTP/UDP/IP, AAL2/ATM, FlexMux over H.223/PSTN and the DAB multiplex; and the multiplexed streams on the transmission/storage medium]
Systems Decoder Model
[Figure: the DMIF Application Interface (which encapsulates the demultiplexer) feeds decoding buffers DB1 to DBn; decoders 1 to n each read one decoding buffer and write to composition memories (CB) 1 to n, which the compositor reads; the Elementary Stream Interface is also marked]
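The decoding buffers of the Systems Decoder Model can be illustrated with a small simulation. This is a minimal sketch, not the normative model of ISO/IEC 14496-1: it assumes each access unit (AU) enters its decoding buffer in one step at its arrival time and is removed instantly by the decoder at its decoding time stamp (DTS). `AccessUnit` and `check_decoding_buffer` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class AccessUnit:
    size: int            # bytes
    arrival_time: float  # seconds: when the whole AU is in the decoding buffer
    dts: float           # decoding time stamp, seconds

def check_decoding_buffer(aus, buffer_size):
    """Return (time, occupancy) pairs for one decoding buffer.

    Raises ValueError if an AU arrives after its DTS (underflow)
    or if the occupancy ever exceeds buffer_size bytes (overflow).
    """
    events = []
    for au in aus:
        if au.arrival_time > au.dts:
            raise ValueError(f"underflow: AU due at {au.dts} s arrives at {au.arrival_time} s")
        events.append((au.arrival_time, 1, au.size))  # kind 1: AU enters the buffer
        events.append((au.dts, 0, -au.size))          # kind 0: decoder removes the AU
    trace, occupancy = [], 0
    # Sorting by (time, kind) processes removals before arrivals at the same instant.
    for time, _, delta in sorted(events):
        occupancy += delta
        if occupancy > buffer_size:
            raise ValueError(f"overflow at {time} s: {occupancy} > {buffer_size} bytes")
        trace.append((time, occupancy))
    return trace
```

With AUs of 400, 300 and 500 bytes arriving at 0.00, 0.05 and 0.10 s and decoded at 0.10, 0.14 and 0.18 s, the occupancy peaks at 800 bytes; a 700-byte buffer would overflow at t = 0.10 s. This is the kind of constraint a buffer model lets an encoder check before sending.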
Network-based Multimedia System
The Objectives of DMIF
Delivery Multimedia Integration Framework
• to hide the delivery technology details from the
DMIF User
• to manage real time, QoS sensitive channels
• to allow service providers to log resources per
session for usage accounting
• to ensure interoperability between end-systems
DMIF
[Figure: DMIF, the multimedia content delivery integration framework, spans broadcast technology (cable, satellite, etc.), interactive network technology (Internet, ATM, etc.) and storage technology (CD, DVD, etc.)]
DMIF Communication Architecture
[Figure: an originating application reaches a DMIF filter through the DAI. The filter selects an originating DMIF for broadcast (broadcast source), for local files (local storage) or for a remote server (signaling over the network through the DNI and a signaling map). Each path ends at a target DMIF and a target application. Legend: flows between independent systems (normative); flows internal to a single system (either informative or out of DMIF scope)]
High View of a Service Activation
[Figure: an originating peer (App 1 and App 2 over a DMIF instance) and a target peer (App A and App B over a DMIF instance); numbered steps 1 to 4 mark the service activation. The legend distinguishes control plane connectivity from user plane connectivity]
Audiovisual Objects
• An audiovisual scene is composed of “objects”
• Different objects are mixed on the screen
• Visual
– Video
– Animated face & body
– 2D and 3D animated meshes
– Text and graphics
• Audio
– General audio: mono, stereo and multichannel
– Speech
– Synthetic sounds (“Structured audio”)
– Environmental spatialization
Example of MPEG-4 Video Objects
[Figure: a rectangular-shape video object, an arbitrary-shape video object and an animated face]
From Olivier Avaro
The Scene Graph
1. Composition
2. Description & Synchronization
3. Delivery of streaming data
4. Interaction with media objects
5. Management and identification of intellectual property
Major Components
[Figure: media objects, composition, rendering and the scene graph]
Adding or Removing Objects (1)
Adding or Removing Objects (2)
From Igor S. Pandžić
Adding or Removing Objects (3)
• Applications
– Video conferencing
• Real-time, automatic
• Separate the foreground (communication partner) from the background
– Object tracking in video
• May be off-line and semi-automatic
• Separate moving objects from the others
MPEG-4 Coding Basics
Toolbox Approach
[Figure: tools for natural scenes and tools for synthetic scenes; tools are grouped into algorithms, and algorithms into profiles]
Coding Techniques
• Video objects
– Shape
– Motion vectors
– Texture
• Audio objects
– MPEG
– AAC (Advanced Audio Coding)
– TTS (Text-To-Speech)
• Face and Body
– Animation parameters
• 2D Mesh
– Triangular patches
– Motion vectors
Content-based Audio-Visual Representation
• Audio-Visual Object (AVO)
• Video object component (video object
plane, VOP)
– natural or synthetic
– 2D or 3D
• Audio object component
– mono, stereo or multi-channel
Video Object Planes (VOP)
• Characteristics of VOP
– may have different spatial and temporal resolutions
– may be associated with different degrees of accessibility → sub-VOPs
– may be separate or overlapping
• VOP types
– Traditional I-, P- and B-VOPs
– S-VOP (sprite) for the background
Video Object Plane Type
[Figure: VOP types along the time axis: a background layer coded as S-VOPs, and an object layer coded as an I-VOP followed by B-VOPs and P-VOPs (I B B P B B P ...)]
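Because a B-VOP is predicted from both the previous and the next I- or P-VOP, the coder must transmit that next anchor first, so coding order differs from display order. A minimal sketch of this reordering for a single object layer (S-VOPs of the background layer are left out); `coding_order` is a hypothetical helper.

```python
def coding_order(display_types):
    """Reorder VOP types from display order to a typical coding order.

    display_types: sequence of 'I', 'P', 'B' in display order.
    Returns (display_index, type) pairs in coding order.
    """
    out, pending_b = [], []
    for i, vop_type in enumerate(display_types):
        if vop_type == "B":
            pending_b.append((i, vop_type))  # must wait for the next anchor VOP
        else:
            out.append((i, vop_type))        # the anchor (I or P) is coded first...
            out.extend(pending_b)            # ...then the B-VOPs displayed before it
            pending_b = []
    # Trailing B-VOPs have no future anchor in this sequence; they are kept at the end.
    return out + pending_b
```

For the display order I B B P B B P, the coding order is I P B B P B B (display indices 0, 3, 1, 2, 6, 4, 5).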
Content-based Object Manipulation
• Object manipulation
– change of the spatial position of a VOP
– application of a spatial scaling factor to a VOP
– change of the speed with which a VOP moves
– insertion of new VOPs
– deletion of an object in the scene
– change of the scene area
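Because each VOP carries its own shape, these manipulations reduce to compositing operations at the decoder. A minimal sketch, assuming 2-D lists of pixel values and a binary alpha plane (1 = object); `composite` and `scale_nearest` are hypothetical helpers, not MPEG-4 APIs. Moving the object means calling `composite` with a new `(top, left)`, scaling means resizing texture and alpha first, and deleting an object means simply not compositing it.

```python
def scale_nearest(image, factor):
    """Nearest-neighbour scaling of a 2-D list (apply to texture and alpha alike)."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[image[min(int(y / factor), h - 1)][min(int(x / factor), w - 1)]
             for x in range(new_w)] for y in range(new_h)]

def composite(background, obj, alpha, top, left):
    """Paste an arbitrarily shaped object onto a copy of the background.

    (top, left) places the object's bounding box in the scene; only pixels with
    alpha 1 are copied, and pixels falling outside the scene are clipped.
    """
    scene = [row[:] for row in background]  # leave the input untouched
    h, w = len(scene), len(scene[0])
    for y, (obj_row, a_row) in enumerate(zip(obj, alpha)):
        for x, (value, a) in enumerate(zip(obj_row, a_row)):
            sy, sx = top + y, left + x
            if a and 0 <= sy < h and 0 <= sx < w:
                scene[sy][sx] = value
    return scene
```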
Segmentation Process
• Depending on the application, segmentation can be performed
– online (real-time) or offline (non-real-time)
– automatically or semi-automatically
• Examples
– Video conferencing
• real-time, automatic
• separate the foreground (communication partner) from the background
– Object tracking in video
• may be off-line and semi-automatic
• separate moving objects from the others
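MPEG-4 does not standardize segmentation, so any method can produce the binary shape. For the real-time, automatic video-conferencing case, one very simple swapped-in technique is background subtraction against a model of the empty scene. A minimal sketch; `foreground_mask`, `update_background`, the threshold and the update rate are illustrative assumptions.

```python
def foreground_mask(frame, background, threshold):
    """Binary alpha mask: 1 where |frame - background| exceeds the threshold, else 0."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]

def update_background(background, frame, mask, rate=0.05):
    """Blend the new frame into the background model, but only where no object was found."""
    return [[b if m else (1 - rate) * b + rate * p
             for b, p, m in zip(b_row, f_row, m_row)]
            for b_row, f_row, m_row in zip(background, frame, mask)]
```

The running update lets the model follow slow lighting changes while the detected foreground (the communication partner) does not leak into it.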
Compression
• Improved coding efficiency
– 5-64 kbps for mobile applications
– up to 20 Mbps for TV/film applications
– subjectively better quality than existing standards
• Coding of multiple concurrent data streams
– can code multiple views of a scene efficiently, e.g., stereo video
Coding VO in MPEG-4
• Reduce temporal redundancy
• Motion estimation for arbitrarily shaped VOPs
– padding and modified block (polygon) matching motion estimation
[Figure: I-VOP, B-VOP and P-VOP along the time axis]
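A minimal full-search block-matching sketch using the sum of absolute differences (SAD). The optional `mask` (the binary alpha plane of the current VOP) restricts the SAD to object pixels, which is one simple way to realize shape-aware (polygon) matching for boundary blocks; padding of the reference is not shown. `sad` and `full_search` are hypothetical names.

```python
def sad(cur, ref, bx, by, dx, dy, size, mask=None):
    """SAD between the current block at (bx, by) and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            if mask is not None and not mask[by + y][bx + x]:
                continue  # transparent pixel: ignored in shape-aware matching
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, size, search_range, mask=None):
    """Return (dx, dy, sad) with minimum SAD within +/- search_range.

    Only candidates whose block lies entirely inside the reference are tested.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size, mask)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Full search is the simplest exhaustive strategy; practical encoders usually use faster search patterns with the same SAD criterion.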
Encoding of Visual Objects
• Binary alpha block
– Motion vector
– Context-based arithmetic encoding
• Texture
– Motion vector
– DCT
New Coding Features
• For each macroblock, the motion vectors can be computed on a 16 × 16 or 8 × 8 block basis
• Unrestricted motion estimation: prediction can extend beyond the image boundary
• Overlapped block motion compensation
• The bit depth of each texture component can range from 1 to 12 bits
• More robust coding
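Unrestricted motion vectors need a rule for reference samples that lie outside the picture. A minimal sketch, assuming the common edge-extension rule (an outside sample takes the value of the nearest edge sample); `ref_pixel` and `predict_block` are hypothetical helpers.

```python
def ref_pixel(ref, x, y):
    """Reference sample at (x, y); coordinates outside the picture are clamped to the nearest edge."""
    h, w = len(ref), len(ref[0])
    return ref[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def predict_block(ref, bx, by, dx, dy, size):
    """Motion-compensated prediction of a size x size block; (dx, dy) may point off the picture."""
    return [[ref_pixel(ref, bx + x + dx, by + y + dy) for x in range(size)]
            for y in range(size)]
```

For the 2 × 2 reference [[1, 2], [3, 4]], the vector (-1, 0) from the top-left block gives [[1, 1], [3, 3]]: the leftmost column is repeated outward.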
Robust Video Coding
• Resynchronization
– Allows insertion of resync markers within each VOP
– Video packet header: includes the macroblock number, quantizer value and timing information
• Data partitioning
– Allows shape, motion and texture data to be separated within a packet
• Reversible VLC
– Offers partial recovery from errors: codewords can also be decoded backward from the next resync marker
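A reversible VLC can be decoded in both directions because its codewords are both prefix-free and suffix-free. A toy sketch of the idea: the table below is illustrative only and is NOT the MPEG-4 RVLC table. Every codeword is a palindrome and the set is prefix-free, so reading the reversed bitstream with the same table works.

```python
# Toy reversible code (illustrative, not from the standard): palindromic,
# prefix-free codewords, hence also suffix-free.
CODE = {"a": "0", "b": "11", "c": "101", "d": "1001"}
DECODE = {v: k for k, v in CODE.items()}

def encode(symbols):
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Forward decoding: emit a symbol as soon as the buffered bits form a codeword."""
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in DECODE:
            out.append(DECODE[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return out

def decode_backward(bits):
    """Backward decoding: palindromic codewords read the same reversed."""
    return decode(bits[::-1])[::-1]
```

In a video packet, a decoder that detects an error can jump to the next resync marker and decode backward, so the symbols after the damaged region are not all lost.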
Sprite VOP
• Represents a background image
• Can be used for very efficient coding of scenes involving camera pan and zoom
• Is much larger than a single frame and thus requires more memory
Example of Sprite VOP
Object Mesh
• Useful for animation, content manipulation, content overlay, merging natural and synthetic video, and others
• Tessellate the object with triangular patches
• Define a motion vector for each node
– The 2D motion of a video object is represented by the motion vectors of the node points
– Motion compensation is achieved by warping the texture map of each patch with an affine transform
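The three node motion vectors of a triangular patch fully determine its affine transform: three point correspondences give six equations for the six affine parameters. A minimal sketch solving them with Cramer's rule; `affine_from_triangle`, `warp_point` and `move_nodes` are hypothetical names.

```python
def affine_from_triangle(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f mapping three src vertices onto dst."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)
    if det == 0:
        raise ValueError("degenerate triangle")

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + r = v at the three vertices
        p = (v1 * (y2 - y3) + v2 * (y3 - y1) + v3 * (y1 - y2)) / det
        q = (x1 * (v2 - v3) + x2 * (v3 - v1) + x3 * (v1 - v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) + x2 * (y3 * v1 - y1 * v3)
             + x3 * (y1 * v2 - y2 * v1)) / det
        return p, q, r

    a, b, c = solve(*(x for x, _ in dst))
    d, e, f = solve(*(y for _, y in dst))
    return a, b, c, d, e, f

def warp_point(params, x, y):
    """Map a point inside the patch with the affine parameters."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f

def move_nodes(nodes, motion_vectors):
    """Node positions in the current frame: previous positions plus node motion vectors."""
    return [(x + mx, y + my) for (x, y), (mx, my) in zip(nodes, motion_vectors)]
```

Warping a patch's texture then means mapping every pixel of the patch with `warp_point` (or its inverse) and interpolating the texture values.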
Example of Object Mesh
Face Animation
• Face model
– Default face model
– Downloaded from the encoder
• Low-level facial animation
– A set of 66 facial animation parameters
• High-level facial animation
– A set of primary facial expressions such as joy, sadness, surprise and disgust
• Speech animation
– 14 visemes for mouth shapes
– Text-to-speech synthesizer
Facial Animation
From Eine Übersicht
Still Texture Coding
• Discrete Wavelet Transform (DWT)
– Spatial and quality scalability
• Uses a 2D Daubechies (9, 3)-tap biorthogonal filter
• The lowest band is coded losslessly by arithmetic coding
• Higher bands are coded by multilevel quantization, zero-tree scanning and arithmetic coding
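The (9, 3) filter taps are not reproduced here. As a swapped-in, much simpler wavelet, a one-level 2-D Haar decomposition shows the subband split this coder relies on: a low band of local averages plus three detail bands. Applying it again to the low band gives further levels, which is where spatial scalability comes from. A minimal sketch; `haar_2d` and `inverse_haar_2d` are hypothetical names and the image must have even dimensions.

```python
def haar_2d(image):
    """One-level 2-D Haar split into (low band, row detail, column detail, diagonal detail)."""
    h, w = len(image), len(image[0])
    bands = ([], [], [], [])
    for y in range(0, h, 2):
        rows = ([], [], [], [])
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            rows[0].append((a + b + c + d) / 4)  # low band: 2x2 average
            rows[1].append((a + b - c - d) / 4)  # top row minus bottom row
            rows[2].append((a - b + c - d) / 4)  # left column minus right column
            rows[3].append((a - b - c + d) / 4)  # diagonal difference
        for band, row in zip(bands, rows):
            band.append(row)
    return bands

def inverse_haar_2d(low, row_d, col_d, diag_d):
    """Exact inverse of haar_2d."""
    image = []
    for y in range(len(low)):
        top, bottom = [], []
        for x in range(len(low[0])):
            s, r, k, g = low[y][x], row_d[y][x], col_d[y][x], diag_d[y][x]
            top += [s + r + k + g, s + r - k - g]
            bottom += [s - r + k - g, s - r - k + g]
        image += [top, bottom]
    return image
```

In smooth regions the detail bands are near zero, which is what quantization and zero-tree scanning of the higher bands exploit.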
Audio Coding
• Different bit rates, different types of source material and different algorithms
• Combination of parameter-based coding, LPC-based coding and time/frequency-based coding
• Speech at bit rates as low as 2 kbps: Harmonic Vector eXcitation Coding (HVXC)
• Text-to-Speech (TTS)
Natural Audio Coder
[Figure: quality (cellular, telephone, AM, FM, CD) versus bit rate (2 to 64 kbit/s) for the MPEG-4 natural audio coders: parametric speech (HVXC), high quality speech (CELP), parametric audio (HILN) and general audio (AAC, TwinVQ)]
From Olivier Dechazal
Multiview Video
Stereo Sequence Coding
• Multiview profile of MPEG-2
[Figure: left view coded as I B B P; right view coded as P B B B and predicted from the left view]
• The left-view sequence S_l is coded first. Each frame of the right-view sequence is then predicted from the corresponding frame in S_l, based on an estimated disparity field, and the prediction error image is coded.
Intermediate View Synthesis
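$$x_{c,n} = \frac{D_{cr}}{D_{cl} + D_{cr}}\, x_{l,n} + \frac{D_{cl}}{D_{cl} + D_{cr}}\, x_{r,n}$$
where x_{l,n}, x_{r,n} and x_{c,n} are corresponding pixels in the left, right and intermediate views, and D_{cl}, D_{cr} are the distances from the intermediate view to the left and right views.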
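The intermediate-view interpolation weights each view by the distance to the other view, so the nearer view contributes more. A minimal pixel-level sketch, assuming the left and right pixels are already disparity-matched correspondences; `synthesize` is a hypothetical name.

```python
def synthesize(x_left, x_right, d_cl, d_cr):
    """Intermediate-view pixel: x_c = d_cr/(d_cl+d_cr) * x_l + d_cl/(d_cl+d_cr) * x_r."""
    total = d_cl + d_cr
    if total == 0:
        raise ValueError("distances must not both be zero")
    return (d_cr / total) * x_left + (d_cl / total) * x_right
```

A viewpoint midway between the cameras averages the two pixels (100 and 200 give 150.0); with d_cl = 1 and d_cr = 3 the result is 0.75 · 100 + 0.25 · 200 = 125.0, and d_cl = 0 returns the left pixel unchanged.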
[Figure: original left and right images; a regular mesh on the left image and the corresponding mesh on the right image; the right image predicted by BMA (block-matching algorithm, 32.03 dB) and by the mesh (27.48 dB)]
The mesh-based scheme yields a visually more accurate prediction, even though its score (27.48 dB) is lower than that of BMA (32.03 dB).
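The dB figures above are presumably PSNR values. PSNR only measures mean squared error, which is why a prediction with a higher PSNR can still look worse. A minimal sketch, assuming 8-bit samples (peak value 255) and equally sized 2-D images; `psnr` is a hypothetical name.

```python
import math

def psnr(original, predicted, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    errors = [(o - p) ** 2
              for o_row, p_row in zip(original, predicted)
              for o, p in zip(o_row, p_row)]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

An MSE of 1 gives about 48.13 dB; every halving of the MSE adds about 3 dB.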
MPEG-4 Coding Techniques
Shape Coding
Shape-adaptive DCT
Object-based Inter-frame Coding
Overlapped Motion Estimation
Bit-plane Coding and FGS
Object-Based Coding
Shape Coding
• Bitmap Coding
– Context-Based Arithmetic Encoding (CAE)
• Contour Coding
– Chain Coding
– Baseline Shape Coding
– Polygon Approximation
– Skeleton-Based Shape Coding
• Quadtree Coding
Context-Based Arithmetic Encoding
[Figure: the shape's bounding box divided into 16 × 16 blocks: transparent blocks, opaque blocks and boundary blocks; boundary blocks are coded by conditional entropy coding]
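Conditional entropy coding of boundary blocks means each pixel is coded with a probability that depends on a context formed by already-coded neighbors. A sketch under stated assumptions: the 10-pixel template follows the usual picture of MPEG-4 intra CAE, but the bit order c_k → 2^k, treating pixels outside the block as 0, and the adaptive counts are all assumptions of this sketch (MPEG-4 specifies fixed probability tables indexed by the context, and takes border pixels from neighboring blocks). It reports the ideal arithmetic-coding cost instead of producing a bitstream.

```python
import math

# Intra template: offsets (dx, dy) of c0..c9 relative to the pixel being coded.
# Layout: c9 c8 c7 two rows above, c6..c2 one row above, c1 c0 to the left.
TEMPLATE = [(-1, 0), (-2, 0),
            (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),
            (1, -2), (0, -2), (-1, -2)]

def context(mask, x, y):
    """Context number sum(c_k * 2**k); pixels outside the block are treated as 0."""
    h, w = len(mask), len(mask[0])
    ctx = 0
    for k, (dx, dy) in enumerate(TEMPLATE):
        px, py = x + dx, y + dy
        if 0 <= px < w and 0 <= py < h and mask[py][px]:
            ctx |= 1 << k
    return ctx

def adaptive_code_length(mask):
    """Ideal code length in bits, with per-context probabilities estimated
    from the counts seen so far (Laplace estimator)."""
    counts = {}
    bits = 0.0
    for y, row in enumerate(mask):
        for x, pixel in enumerate(row):
            ctx = context(mask, x, y)
            zeros, ones = counts.get(ctx, (1, 1))
            p = (ones if pixel else zeros) / (zeros + ones)
            bits -= math.log2(p)
            counts[ctx] = (zeros + (not pixel), ones + bool(pixel))
    return bits
```

A uniform 16 × 16 block costs only log2(257) ≈ 8 bits for 256 pixels here, because every pixel sees the same context and its probability quickly approaches 1.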
Chain Coding
[Figure: a contour traced from its starting points, with the direction codes for 4-connected (0 to 3) and 8-connected (0 to 7) neighborhoods]
4-connected chain codes:
003 00 33 323 32 2
212 12 11 0 01 1
Chain Coding
[Figure: a contour traced from its starting point, with the 4-connected and 8-connected direction codes]
8-connected chain code: 070665644332012
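Chain coding stores a starting point and then one direction code per step along the contour. A minimal sketch using Freeman directions numbered counterclockwise from east, with y pointing up as in the direction diagrams; `chain_code` and `decode_chain` are hypothetical names.

```python
# Freeman directions, counterclockwise from east (y axis pointing up).
DIRS8 = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
DIRS4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def chain_code(points, dirs=DIRS8):
    """Encode a contour given as successive neighboring points into direction codes."""
    lookup = {d: i for i, d in enumerate(dirs)}
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in lookup:
            raise ValueError(f"{step} is not a unit move for this connectivity")
        codes.append(lookup[step])
    return codes

def decode_chain(start, codes, dirs=DIRS8):
    """Rebuild the contour points from the starting point and the chain code."""
    x, y = start
    points = [(x, y)]
    for c in codes:
        dx, dy = dirs[c]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

Decoding the 8-connected code 070665644332012 from any starting point returns to that point, i.e., it describes a closed contour.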
Differential Chain Code
• DCC records the move (forward, left or right) between two consecutive directional links.
[Figure: forward (F), left (L) and right (R) moves]
Example: F F R LF
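For a 4-connected chain code, the differential code follows from the difference of consecutive direction codes modulo 4. A minimal sketch, assuming the counterclockwise numbering of the chain-code diagrams, where adding 1 to a code is a left turn; `differential_chain` and `chain_from_differential` are hypothetical names.

```python
TURNS = {0: "F", 1: "L", 3: "R"}  # code difference modulo 4 -> move
STEPS = {"F": 0, "L": 1, "R": 3}

def differential_chain(codes):
    """Turn a 4-connected chain code into F/L/R moves between consecutive links."""
    moves = []
    for prev, cur in zip(codes, codes[1:]):
        turn = (cur - prev) % 4
        if turn not in TURNS:
            raise ValueError("a 180-degree reversal is not an F, L or R move")
        moves.append(TURNS[turn])
    return moves

def chain_from_differential(first_code, moves):
    """Recover the chain code from its first direction and the F/L/R moves."""
    codes = [first_code]
    for move in moves:
        codes.append((codes[-1] + STEPS[move]) % 4)
    return codes
```

Since each move has only three possible values, it needs fewer bits than an absolute direction code.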
Baseline Shape Coding
• Trace the contour and get the distance of each contour sample from the baseline
[Figure: contour samples S1 to S25 traced around a shape with a horizontal baseline; D(S23) is the distance between contour sample S23 and the baseline; the turning points (TPs) are S7, S9, S12 and S22]
Polygon Approximation
[Figure: polygon approximation of a contour, with distances d1, d2 and d3]
• Select vertices that are optimal in the rate-distortion sense.
• Splines are adopted to approximate the contour.
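Rate-distortion-optimal vertex selection is not shown here. As a swapped-in, simpler technique, the classic iterative refinement (Ramer-Douglas-Peucker) keeps adding the contour point farthest from the current polygon until no point is farther than `d_max`. A minimal sketch for an open contour (a closed contour can first be split at two far-apart points); function names are hypothetical.

```python
import math

def point_line_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def approximate(points, d_max):
    """Polygon vertices approximating the contour within distance d_max."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1  # farthest interior point
    if dists[i - 1] <= d_max:
        return [a, b]
    left = approximate(points[: i + 1], d_max)
    right = approximate(points[i:], d_max)
    return left[:-1] + right  # the split vertex appears only once
```

A larger `d_max` gives fewer vertices (lower rate) at the cost of higher shape distortion, which is the trade-off the rate-distortion formulation optimizes explicitly.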
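Skeleton-Based Shape Coding
$$S = \{(x, y, d) \mid m_k(x-d-1, y) = 0 \text{ and } m_k(x+d+1, y) = 0 \text{ and } m_k(x-a, y) = 1 \text{ and } m_k(x+a, y) = 1 \text{ for all } a \text{ with } 0 \le a \le d\}$$
That is, (x, y) is the center of a horizontal run of 2d + 1 object pixels in the binary mask m_k, bounded by background pixels at both ends.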
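The skeleton set S on this slide collects points (x, y, d) where a horizontal run of object pixels extends exactly d pixels on either side of (x, y). A direct, brute-force sketch of that definition for one binary mask m_k (rows indexed by y, columns by x), assuming pixels outside the mask are background; `horizontal_skeleton` is a hypothetical name. Only odd-length runs have a center pixel under this definition.

```python
def horizontal_skeleton(mask):
    """Return [(x, y, d)] for every center of a horizontal object run of length 2d + 1."""
    h, w = len(mask), len(mask[0])

    def m(x, y):
        return mask[y][x] if 0 <= x < w else 0  # outside the mask counts as background

    points = []
    for y in range(h):
        for x in range(w):
            if not m(x, y):
                continue
            d = 0
            while m(x - d - 1, y) and m(x + d + 1, y):
                d += 1  # both sides are still object pixels: grow the half-width
            if not m(x - d - 1, y) and not m(x + d + 1, y):
                points.append((x, y, d))  # run ends symmetrically around x
    return points
```

Storing (x, y, d) is enough to redraw each run as the pixels from x - d to x + d.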
Quadtree Coding
Shape-adaptive DCT
Inter-frame Coding
Reconstruction of Object Shape
[Figure: object shapes in the previous and current frames, with the object region, regions where the difference exceeds a threshold, regions identified by in-out type, regions identified by polygon matching, and a don't-care region]
$$MV_S = MVP_S + MVD_S$$
where MV_S is the motion vector for shape, MVP_S its prediction and MVD_S the difference (coded with BAC).
The context for Inter-frame Coding
[Figure: context for inter-frame shape coding: already-coded pixels of the current frame's object shape and pixels of the estimated object shape surround the pixel to be coded now; legend: background pixel, object pixel, already coded, not yet coded]
Overlapped Motion Estimation
Weighting Coefficients in Overlapped Motion Estimation
Fine Granularity Scalability
[Figure: video quality (bad, moderate, good) as a function of channel bit rate (low to high)]
FGS Video Encoder Structure
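[Figure: FGS encoder. Base layer: the input video minus the motion-compensated prediction is transformed (DCT), quantized (Q) and VLC coded into the base layer bitstream; Q⁻¹, IDCT, frame memory, motion estimation and motion compensation form the prediction loop. Enhancement layer: the DCT residuals are zig-zag scanned, their sign bits are recorded, and they are bit-plane coded and VLC coded into the enhancement layer bitstream]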
Bit-plane Coding
quantized residual:
5 7 8 7 6 2 0 4 3 8 1 2 3 0 3 5
4 6 8 6 6 2 0 4 2 8 0 2 0 0 0 4
binary transfer (one row per bit-plane of the first row, MSB first):
MSB 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
    1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 1
    0 1 0 1 1 1 0 0 1 0 0 1 1 0 1 0
LSB 1 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1
reordering (bit-plane by bit-plane):
0010000001000000110110010000000101011100100110101101000010101011………
run-length coding → enhancement layer bitstream
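A sketch reproducing the reordering step for the first row of quantized residuals, followed by a run-length step in the style of FGS (RUN, EOP) symbols: RUN counts the zeros before each 1 in a plane and EOP marks the last 1 of the plane. Assumptions: magnitudes only (sign bits are recorded separately, as in the encoder diagram), the VLC tables that map symbols to bits are omitted, and how an all-zero plane is signalled is left out. Function names are hypothetical.

```python
def bit_planes(values, n_bits=None):
    """Split non-negative quantized magnitudes into bit-planes, MSB first."""
    if n_bits is None:
        n_bits = max(max(values).bit_length(), 1)
    return [[(v >> b) & 1 for v in values] for b in range(n_bits - 1, -1, -1)]

def run_eop_symbols(plane):
    """(RUN, EOP) pairs for one bit-plane; an all-zero plane yields no pairs here."""
    ones = [i for i, bit in enumerate(plane) if bit]
    symbols, prev = [], -1
    for k, i in enumerate(ones):
        symbols.append((i - prev - 1, k == len(ones) - 1))
        prev = i
    return symbols
```

Because the planes are sent MSB first, the enhancement layer can be cut after any symbol and still refine every coefficient a little, which is the source of the fine granularity.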
FGS Video Decoder Structure
[Figure: FGS decoder. Base layer: VLD, Q⁻¹, IDCT and motion compensation from the frame memory produce the base layer video. Enhancement layer: VLD, bit-plane decoding, adding the sign bits, inverse zig-zag scan and IDCT produce the enhancement layer video]
Binary Shape Encoder
Padding
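Padding fills the transparent pixels of boundary blocks so that texture coding and motion estimation of arbitrarily shaped VOPs can operate on full blocks. A sketch of repetitive padding as commonly described for MPEG-4: a horizontal pass, then a vertical pass for the rows that were still empty. Assumptions: integer averaging with floor rounding, and fully transparent blocks (handled by separate extended padding) are left untouched. `repetitive_pad` is a hypothetical name.

```python
def repetitive_pad(block, mask):
    """Horizontal, then vertical repetitive padding of a block (mask 1 = object pixel).

    A transparent pixel takes the value of the nearest object pixel in its line;
    if object pixels lie on both sides, the average of the two nearest is used.
    """
    h, w = len(block), len(block[0])
    out = [[block[y][x] if mask[y][x] else None for x in range(w)] for y in range(h)]

    def pad_line(line):
        known = [i for i, v in enumerate(line) if v is not None]
        if not known:
            return line  # nothing to copy from in this line
        result = list(line)
        for i, v in enumerate(line):
            if v is not None:
                continue
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is not None and right is not None:
                result[i] = (line[left] + line[right]) // 2
            else:
                result[i] = line[left if left is not None else right]
        return result

    out = [pad_line(row) for row in out]                                   # horizontal pass
    columns = [pad_line([out[y][x] for y in range(h)]) for x in range(w)]  # vertical pass
    return [[columns[x][y] for x in range(w)] for y in range(h)]
```

A row such as 10, (transparent), (transparent), 30 becomes 10, 20, 20, 30, and an empty row below it is then filled column by column from the padded row above.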