Architecture of Global Motion Compensation for
MPEG-4 Advanced Simple Profile
Yi-Hau Chen, Ching-Yeh Chen, Liang-Gee Chen
IEEE International Symposium on Circuits and Systems 2005
김 용 환
2015-07-17
CONTENT
• ABSTRACT
• INTRODUCTION
• GLOBAL MOTION COMPENSATION IN MPEG-4 VM
• PROPOSED GMC HARDWARE ARCHITECTURE
• SIMULATION RESULTS
• CONCLUSION
ABSTRACT
• Global motion compensation (GMC) is an important coding tool in MPEG-4 Advanced Simple Profile (ASP).
• In this paper, we propose an efficient GMC hardware architecture for MPEG-4 ASP@L5.
• Based on analysis of the affine model, the proposed memory arrangement and cascaded scheduling reduce the impact of irregular memory access and improve processing ability.
INTRODUCTION (1)
• COMPARISON OF MPEG-4 SP AND ASP
– Simple Profile (SP)
• The basis of all MPEG-4 video profiles; to some extent compatible with H.263
• Suited for low bit rates, from 10 kbit/s upwards, and for low-latency applications
• Applications: industrial quality control, low-complexity desktop video, mobile video, closed-circuit video surveillance, teleconferencing or video telephony, and more
• Adopted by 3GPP for wireless video streaming and by ISMA (Level 0) for narrowband internet streaming
• Includes error resilience tools
• Real-time software encoding is easy to achieve
• Can be implemented in small or resource-constrained devices
• Coding of Intra (I) and Predicted (P) frames
• ½-pixel accurate motion compensation
• Block-based (16x16, 8x8 blocks) motion compensation
– Advanced Simple Profile (ASP)
• Based on Simple Profile; adds advanced coding tools to SP
• Supports a wide range of bit rates, from narrowband (56 kbit/s) to broadband (300-750 kbit/s) and from broadcast SD to HD (1-8+ Mbit/s)
• Applications: broadcast, unicast and multicast, advanced internet streaming, media asset management, browsing, VOD, education, security, and more
• Adopted by ISMA (Level 1) for broadband internet streaming; used in many consumer devices, including some DVD players and PDAs
• Includes interlace support
• Better coding efficiency at higher quality levels and bit rates
• Encoder and decoder are more complex; encoding and decoding in software are possible on modern personal computers
• Adds support for coding of B-frames (bidirectionally interpolated)
• ¼-pixel accurate motion compensation
• Adds global motion compensation (GMC) with up to a 6-parameter affine model
INTRODUCTION (2)
• The number of reference points is up to four in MPEG-4 ASP.
• In this case, a perspective model can be defined as:
x′ = (m0x + m1y + m2)/(m6x + m7y + 1)
y′ = (m3x + m4y + m5)/(m6x + m7y + 1)
– (x, y): coordinates of a pixel in the current frame
– (x′, y′): coordinates of the corresponding pixel in the reference frame
• Global motion parameters
– m0, m4: scaling factors
– m1, m3: rotation factors
– m2, m5: translation factors
– m6, m7: tilt factors
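The perspective mapping can be written as a small function. This is a minimal floating-point sketch, not the paper's implementation; the name `perspective_warp` and the tuple layout of the parameters are assumptions made for illustration. With m6 = m7 = 0 the denominator is 1 and the mapping reduces to an affine one.

```python
def perspective_warp(m, x, y):
    """Map pixel (x, y) of the current frame to (x', y') in the reference frame.

    m is a tuple (m0, ..., m7) of global motion parameters
    (hypothetical layout chosen for this sketch).
    """
    m0, m1, m2, m3, m4, m5, m6, m7 = m
    denom = m6 * x + m7 * y + 1
    if denom == 0:
        raise ZeroDivisionError("pixel maps to infinity under this model")
    return ((m0 * x + m1 * y + m2) / denom,
            (m3 * x + m4 * y + m5) / denom)


# Pure translation: scaling factors 1, rotation and tilt factors 0.
shift = (1, 0, 3, 0, 1, -2, 0, 0)
print(perspective_warp(shift, 10, 20))
```

Keeping all eight parameters in one tuple lets the same function cover the simpler models by zeroing the unused factors.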
GLOBAL MOTION COMPENSATION IN MPEG-4 VERIFICATION MODEL (1)
• Global Motion Models for GMC
– MPEG-4 ASP defines five kinds of motion models
• Stationary and translational models
– simpler; they represent video with a static or translating camera, similar to local motion compensation (LMC)
• Isotropic and affine models
– more complex; both support scaling and rotation
• Perspective model
– The affine model can be defined as follows
• x′ = m0x + m1y + m2
• y′ = m3x + m4y + m5
– In this paper, our GMC hardware implementation targets four global motion models: stationary, translational, isotropic, and affine.
GLOBAL MOTION COMPENSATION IN MPEG-4 VERIFICATION MODEL (2)
• Analysis of Affine Model
– Warped position of pixel (x, y)
i′(x, y) = (m0H′x + m1W′y)/(W′H′) + m2
j′(x, y) = (m3H′x + m4W′y)/(W′H′) + m5
– Six equations are needed to calculate the affine parameters
m0 = (W′(x′1 − x′0))/W
m1 = (W′(x′2 − x′0))/W
m2 = x′0
m3 = (H′(y′1 − y′0))/H
m4 = (H′(y′2 − y′0))/H
m5 = y′0
– W, H: width and height of the video sequence
– W′ = 2^α, H′ = 2^β, W′ ≥ W, H′ ≥ H, α > 0, β > 0
• The affine transformation of one 16×16 luma block is reduced from 1024 multiplications and 1024 additions to only 4 multiplications and 514 additions
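One accounting that reproduces these counts: evaluating x′ = m0x + m1y + m2 and y′ = m3x + m4y + m5 independently at all 256 pixels costs 4 multiplications and 4 additions per pixel (1024 each), whereas computing only the corner pixel directly (4 multiplications, 4 additions) and then stepping by the parameters costs 2 additions per step over the remaining 255 pixels (510 additions). The slides do not spell out the derivation, so this breakdown is an assumption; the function names are hypothetical and the sketch ignores the fixed-point scaling by W′H′.

```python
def warp_direct(m, x0, y0, size=16):
    """Evaluate the affine model independently at every pixel of the block."""
    m0, m1, m2, m3, m4, m5 = m
    return [[(m0 * x + m1 * y + m2, m3 * x + m4 * y + m5)
             for x in range(x0, x0 + size)]
            for y in range(y0, y0 + size)]


def warp_incremental(m, x0, y0, size=16):
    """Compute the same positions with running sums and count operations."""
    m0, m1, m2, m3, m4, m5 = m
    # Corner pixel: the only place where multiplications are needed.
    row_x = m0 * x0 + m1 * y0 + m2
    row_y = m3 * x0 + m4 * y0 + m5
    mults, adds = 4, 4
    block = []
    for r in range(size):
        if r > 0:
            # One row down: y grows by 1, so add m1 and m4.
            row_x += m1
            row_y += m4
            adds += 2
        px, py = row_x, row_y
        row = [(px, py)]
        for _ in range(size - 1):
            # One pixel to the right: x grows by 1, so add m0 and m3.
            px += m0
            py += m3
            adds += 2
            row.append((px, py))
        block.append(row)
    return block, mults, adds


params = (2, 1, 5, -1, 2, 7)  # integer parameters keep the comparison exact
block, mults, adds = warp_incremental(params, 32, 48)
print(mults, adds)
```

Integer parameters are used so that the incremental and direct results can be compared exactly; a hardware datapath would work in fixed point.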
PROPOSED GMC HARDWARE ARCHITECTURE (1)
• Based on the GMC algorithm in MPEG-4 VM and analysis in
Section II-B, a hardware architecture for GMC in MPEG-4
ASP@L5 is proposed
• Four major components
– GMC Controller
– Global Motion Parameter Generator
– Macroblock Setting
– Warping with Local Memory
PROPOSED GMC HARDWARE ARCHITECTURE (2)
• Global Motion Parameter Generator
– Each frame has only one set of global motion parameters
– The global motion parameter generator therefore runs once per frame
– One processing element is enough to calculate the global motion parameters
– This processing element handles the four motion models other than the perspective model
– Multipliers are shared with Macroblock Setting and Warping to reduce overall overhead
PROPOSED GMC HARDWARE ARCHITECTURE (3)
• Macroblock Setting
– Figure (a) shows the prototypical pixel distribution of LMC
– Figure (b) shows the pixel distribution of GMC, which leads to irregular memory access
– In Macroblock Setting, the processing element generates the location of the left-top corner pixel and then calculates the boundaries of the deformed block
– For example, if m3 < 0 (rotation), the right-top corner point alone decides the upper boundary, without considering the left-top corner point, as shown in (b)
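The corner selection can be sketched in software. Because the affine model is linear, each boundary of the deformed block is reached at a corner, and the signs of the parameters tell which one. The sketch below starts from the left-top corner and adds only the offsets that push a boundary outward. The function name and the floating-point arithmetic are assumptions for illustration, not the paper's datapath.

```python
def deformed_block_bounds(m, x0, y0, size=16):
    """Bounding box (min_x, max_x, min_y, max_y) of a warped size x size block."""
    m0, m1, m2, m3, m4, m5 = m
    span = size - 1
    # Warped position of the left-top corner pixel.
    left_top_x = m0 * x0 + m1 * y0 + m2
    left_top_y = m3 * x0 + m4 * y0 + m5
    # Displacement when moving to the right edge / bottom edge of the block.
    dx_across, dx_down = m0 * span, m1 * span
    dy_across, dy_down = m3 * span, m4 * span
    min_x = left_top_x + min(0, dx_across) + min(0, dx_down)
    max_x = left_top_x + max(0, dx_across) + max(0, dx_down)
    # If m3 < 0, dy_across is negative: the right-top corner sets the upper bound.
    min_y = left_top_y + min(0, dy_across) + min(0, dy_down)
    max_y = left_top_y + max(0, dy_across) + max(0, dy_down)
    return min_x, max_x, min_y, max_y


rotated = (1.0, 0.25, 3.0, -0.25, 1.0, 5.0)  # m3 < 0
print(deformed_block_bounds(rotated, 16, 32))
```

With m3 < 0 and m4 > 0, the upper boundary returned here equals the warped y-coordinate of the right-top corner, matching the example on the slide.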
PROPOSED GMC HARDWARE ARCHITECTURE (4)
• Warping
– Warping Controller
• Reading reference frame from external memory
• Accessing neighboring pixels from local memory
– External Address Generator
• Addresses of reference pixels of the block
– Corresponds to the current macroblock in raster scan order
– Memory Location Decision arranges the reference data in local memory
– Warping Address Generator
• Corresponding positions of current pixels in the reference frame
– Computed from the scaling, rotation, and translation factors
• Corresponding pixel
– Usually not located on the integer grid
– The luminance and chrominance of the reference pixels are interpolated by the bilinear Interpolation Filter
Figure: Block diagram of the scanline-based x-coordinate address PE. m0, m1, m3, and m4 are affine parameters.
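Bilinear interpolation at a non-integer position blends the four surrounding reference pixels by their fractional distances. The sketch below is a floating-point illustration only: the function name, the list-of-rows frame layout, and the edge clamping are assumptions, and MPEG-4 itself specifies fixed-point arithmetic with its own rounding.

```python
import math


def bilinear_sample(ref, x, y):
    """Sample frame `ref` (list of rows, ref[row][col]) at fractional (x, y).

    x is the column coordinate and y the row coordinate; positions outside
    the frame are clamped to the border (an assumption of this sketch).
    """
    height, width = len(ref), len(ref[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = (1 - fx) * ref[y0][x0] + fx * ref[y0][x1]
    bottom = (1 - fx) * ref[y1][x0] + fx * ref[y1][x1]
    return (1 - fy) * top + fy * bottom


frame = [[10, 20],
         [30, 40]]
print(bilinear_sample(frame, 0.5, 0.5))
```

Each sample reads a 2×2 neighborhood, which is why the memory arrangement matters for throughput.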
PROPOSED GMC HARDWARE ARCHITECTURE (5)
• Local Memory
– In the GMC algorithm, irregular memory access makes it harder to fetch reference data from external frame memory
– The interpolation of GMC
• Needs four neighboring pixels
• Results in high external memory bandwidth
– The whole flow of accessing memory data is as follows.
• Each of the four local memory banks is separated into two parts.
– The first four rows of data go to the first part of the four local memories
– The following four rows of data go to the second part of the four local memories
– Hence, at most eight rows of data are held in local memory at the same time
– As soon as the loaded reference data in local memory is enough for
Interpolation Filter to access
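One common way to let four banks deliver a 2×2 neighborhood in a single cycle is to interleave pixels by the parity of their coordinates, so that any four neighboring pixels fall into four different banks. The slides do not give the exact mapping used in the paper, so the scheme below (and the name `bank_of`) is an assumption for illustration.

```python
def bank_of(x, y):
    """Bank index 0..3 chosen from the parity of the pixel coordinates."""
    return (y % 2) * 2 + (x % 2)


def neighbor_banks(x, y):
    """Banks touched when reading the 2x2 neighborhood whose corner is (x, y)."""
    return [bank_of(x + dx, y + dy) for dy in (0, 1) for dx in (0, 1)]


# Every 2x2 window hits four distinct banks, so there is no bank conflict.
conflict_free = all(len(set(neighbor_banks(x, y))) == 4
                    for y in range(8) for x in range(32))
print(conflict_free)
```

Under this mapping each bank serves at most one pixel per interpolation, which is the property needed for a one-cycle fetch.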
PROPOSED GMC HARDWARE ARCHITECTURE (6)
• Cascaded scheduling
– Global motion compensation of one macroblock covers
• the predicted luminance block
• two chrominance blocks
– For example, after reconstructing the luminance block, the warping address generator is stalled until enough reference data of the Cb block are loaded into local memory
– Once the reference data of the luminance block have all been loaded into local memory, the warping controller identifies the reference data that fall outside the region corresponding to the current row, according to the scaling and rotation factors
– Cascaded scheduling reduces the processing cycles of GMC for one macroblock and makes the architecture more suitable for integration into MPEG-4 ASP encoders/decoders
SIMULATION RESULTS (1)
• Proposed Architecture
– Simulation results for memory bandwidth and processing ability of the proposed hardware architecture at 25 MHz
– With local memory and the interleaved memory arrangement, four neighboring pixels can be accessed in one cycle, and the external memory bandwidth is reduced by 60.94%
– With cascaded scheduling, the processing ability of the proposed architecture improves by about 12%, and it can process about 31 fps at a working frequency of 25 MHz
SIMULATION RESULTS (2)
• Hardware Implementation
– Target: MPEG-4 ASP@L5
– Frame size: 720×576 at 30 fps
– Working frequency is only 25 MHz
– Described in Verilog HDL and synthesized with Synopsys Design Vision
– 0.18 µm cell library
– Total gate count: 19.3 K
– Internal memory: 1.28 Kb
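The figures above imply a rough per-macroblock cycle budget. The calculation below is a back-of-the-envelope sketch that assumes 16×16 macroblocks and that every clock cycle is available to GMC; the function name is hypothetical and the result is not a number reported in the slides.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Average clock cycles available per macroblock at a given frame rate."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz / (mbs_per_frame * fps)


budget = cycles_per_macroblock(720, 576, 30, 25_000_000)
print(f"{budget:.1f} cycles per macroblock")
```

A budget of this size leaves only about two cycles per luma pixel, which helps explain why single-cycle access to four neighboring pixels and the overlap of luminance and chrominance processing matter.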
CONCLUSION
• In MPEG-4 ASP, global motion compensation is an important coding tool, but few GMC hardware architectures exist
• In this paper, we simplify the computation of GMC and propose an efficient hardware architecture
• With the interleaved memory structure and cascaded scheduling, the external memory bandwidth is reduced by 60.94%
• This architecture is suitable for integration into MPEG-4 ASP encoders and decoders
Global Elimination Algorithm and Architecture Design
for Fast Block Matching Motion Estimation
• Designer: Y.-W. Huang
– Technology: TSMC 0.35 μm 1P4M
– Chip size: 3.679 mm × 4.001 mm
– Supply voltage: 3.3 V
– Working frequency: 25.0 MHz (normal), 27.8 MHz (max.)
– Power consumption: 272.3 mW @ 25.0 MHz
– Transistor count: 357551
– Performance: CIF at 38 fps, search range [-16, +15], quality near full search