
DAC50, Designer Track, 156-VB543
Parallel Design Methodology for Video Codec LSI with
High-level Synthesis and FPGA-based Platform
Kazuya YOKOHARI, Koyo NITTA,
Mitsuo IKEDA, and Atsushi SHIMIZU
NTT Media Intelligence Laboratories
6/5/2013
Copyright(c) 2013 Nippon Telegraph and Telephone Corporation
1
Outline
• Introduction
• Proposed Design Methodology
• Case Study: 4K HEVC Intra Codec
• Evaluation
• Conclusion
Video Codec LSI
• MPEG-2 and H.264/AVC are major video coding standards.
• We have developed an MPEG-2 video codec LSI (VASA) and an
H.264/AVC codec LSI (SARA).
• Developing a video codec LSI requires many simulations.
[Figure: test data, codec LSI (VASA for MPEG-2, SARA for H.264/AVC), and bit stream (coded image).]
Objective evaluation examples: BD-Bitrate, SSIM, PSNR
• Coded images should be evaluated both subjectively and objectively.
• Some degradations in coded images are not detected by objective evaluation.
• Real-time subjective evaluation is important for finding these degradations.
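As an illustration of one of the objective metrics named on this slide, the following is a minimal PSNR sketch. It assumes 8-bit samples given as flat lists; the function name `psnr` and the sample values are hypothetical and not taken from the slides. It also shows one way an objective score can hide a difference: an error concentrated on a single sample and an error spread over all samples can give the same PSNR.

```python
import math

def psnr(original, coded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(coded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((o - c) ** 2 for o, c in zip(original, coded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical samples: both coded versions have a mean squared error of 1.
reference = [100, 100, 100, 100]
spread_error = [101, 99, 101, 99]        # small error on every sample
localized_error = [102, 100, 100, 100]   # larger error on one sample

print(psnr(reference, spread_error))
print(psnr(reference, localized_error))
```

Both calls print the same value, although a viewer might judge the two error patterns differently, which is the kind of case where subjective evaluation adds information.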
Existing LSI Design Flow
• In the existing design flow, even behavioral design, which is the
fastest simulation environment, takes 100 times as long as real time.
• A fast simulation environment is important because video codec
LSI design requires many simulations.
[Figure: existing LSI design flow. Steps: SystemC source codes, behavioral design and verification (existing architecture exploration loop), behavioral synthesis, RTL design (Verilog-RTL codes) and verification, logic synthesis, gate-level design, P&R. Other labels: Stimulus, Verilog-RTL codes (already verified), IP core, Technology Library, ASIC, FPGA. Simulation speed labels: X100 (on CPU) for behavioral design; X1,000 (on CPU) and X100 (on emulator); X10,000 (on CPU) and X1,000 (on emulator).]
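To make the simulation speed labels concrete, here is a small arithmetic sketch. It assumes that a label such as X100 means the simulation takes 100 times as long as the video being processed; the 10-second clip length and all function names are hypothetical.

```python
# Slowdown factors as labeled in the design flow figure.
SLOWDOWN_LABELS = {"X1": 1, "X100": 100, "X1,000": 1_000, "X10,000": 10_000}

def simulation_seconds(video_seconds, slowdown):
    """Wall-clock seconds needed to simulate video_seconds of video."""
    return video_seconds * slowdown

def format_duration(seconds):
    """Render a number of seconds as hours, minutes, and seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"

clip_seconds = 10  # hypothetical test clip length
for label, factor in SLOWDOWN_LABELS.items():
    print(label, format_duration(simulation_seconds(clip_seconds, factor)))
```

Under these assumptions a 10-second clip takes about 17 minutes at X100 and more than a day at X10,000, which is why repeated simulations dominate the design schedule.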
The Problems of The Video Codec LSI Development
• Developing a video codec LSI requires many simulations.
• In the existing LSI design flow, simulation takes at least 100
times as long as real time.
• To resolve these problems, we need simulation and circuit design
environments in which codec LSI performance can be checked and
improved smoothly.
• Simulation environment: FPGA-based platform.
Using FPGAs makes real-time simulation possible.
• Circuit design environment: High-level synthesis.
Using high-level synthesis makes rapid prototyping possible.
Video Codec Design Platform
• The video codec design platform can run large-scale circuit
simulations in real time using multiple FPGAs.
• The proposed platform inputs and outputs image data in real
time through SDI interfaces.
[Figure: platform with FPGA1, FPGA2, FPGA3, FPGA4, a center FPGA, and an SDI interface.]
• The proposed platform has many FPGAs because a product-level video
codec LSI is very large.
• With many FPGAs, the platform can simulate a product-level circuit.
Proposed Video Codec Design Flow (1/2)
• The proposed design flow enables rapid prototyping using high-level synthesis.
• The proposed design flow enables real-time simulation using the proposed platform.
• With a single architecture exploration loop, feedback takes a long time
because each design step must be repeated.
[Figure: design flow with the proposed architecture exploration loop shown next to the existing one. Steps: SystemC source codes, behavioral design and verification, behavioral synthesis, RTL design (Verilog-RTL codes) and verification, logic synthesis, gate-level design, P&R. Other labels: Stimulus, Verilog-RTL codes (already verified), IP core, Technology Library, ASIC, FPGA. Simulation speed labels: X100 (on CPU) for behavioral design; X1,000 (on CPU) and X100 (on emulator); X10,000 (on CPU) and X1,000 (on emulator); X1 (on video codec design platform).]
Proposed Video Codec Design Flow (2/2)
• Circuit design is subdivided and the parts are designed in parallel,
to reduce the feedback time caused by repeating each design step.
• With parallel design, architecture exploration can be carried out
quickly.
[Figure: the same design flow as on the previous slide, with the existing and proposed architecture exploration loops. Simulation speed labels: X100 (on CPU) for behavioral design; X1,000 (on CPU) and X100 (on emulator); X10,000 (on CPU) and X1,000 (on emulator); X1 (on video codec design platform).]
Summary of The Proposed Design Methodology
The proposed parallel design methodology has three features.
1. High-level synthesis.
– With high-level synthesis, a target circuit architecture can be changed
and tuned more easily than with an RTL design methodology.
2. Video codec design platform.
– With the video codec design platform, subjective image evaluation can
be performed, because the platform can run simulations in real time.
3. Parallel design.
– With parallel design and high-level synthesis, functions can be added
in smaller units, which reduces feedback time.
By combining these three features, the effect of each function on
subjective image quality can be evaluated and used for architecture exploration.
Case Study: 4K HEVC Intra Codec
• HEVC (High Efficiency Video Coding) is a next-generation video
coding standard.
• The HEVC intra codec consists of three blocks: intra prediction,
transform and quantization, and entropy coding.
[Figure: video coding pipeline from Input Data through Intra Prediction, Transform and Quantization, and Entropy Coding to Output Stream.]
• Intra Prediction generates a prediction difference image from
input data and predicted image data.
• Transform and Quantization generates quantized values from the
transformed difference image, and a reconstruction image from the
quantized values.
• Entropy Coding generates a bit stream from the quantized values.
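The data flow between the three blocks can be sketched as follows. This is a toy model under loud assumptions, not the codec described in the slides: the block size is 4x4, the prediction is a constant DC value passed in by the caller, the transform is a floating-point orthonormal DCT-II, and the entropy coder is a simple run-length stand-in rather than the arithmetic coder HEVC uses. All function names are hypothetical.

```python
import math

N = 4  # toy block size (assumption); the slides use units from 16x16 to 64x64

def dct_matrix(n):
    """Orthonormal DCT-II basis, used here as a generic stand-in transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

C = dct_matrix(N)
CT = transpose(C)

def intra_prediction(block, dc_value):
    """Return the predicted image and the prediction difference image."""
    predicted = [[dc_value] * N for _ in range(N)]
    difference = [[block[i][j] - predicted[i][j] for j in range(N)]
                  for i in range(N)]
    return predicted, difference

def transform_and_quantization(difference, predicted, qstep):
    """Return quantized values and the reconstruction image built from them."""
    coeffs = matmul(matmul(C, difference), CT)
    quantized = [[round(c / qstep) for c in row] for row in coeffs]
    dequantized = [[q * qstep for q in row] for row in quantized]
    residual = matmul(matmul(CT, dequantized), C)
    reconstruction = [[round(predicted[i][j] + residual[i][j])
                       for j in range(N)] for i in range(N)]
    return quantized, reconstruction

def entropy_coding(quantized):
    """Stand-in coder: (zero_run, value) pairs plus the trailing zero count."""
    symbols, run = [], 0
    for value in (v for row in quantized for v in row):
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    return symbols, run

block = [[104, 104, 96, 96] for _ in range(N)]  # hypothetical input samples
predicted, difference = intra_prediction(block, dc_value=100)
quantized, reconstruction = transform_and_quantization(difference, predicted, qstep=2)
stream = entropy_coding(quantized)
```

The reconstruction image is produced inside the transform and quantization step, matching the slide, so that later predictions can use the same samples a decoder would see.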
The Specifications of the HEVC Intra Codec
Specifications at STEP1 and at STEP2 (LOOP#1, LOOP#2, LOOP#3):
• Intra Prediction: PU: 32x32 and Prediction Mode: 4 at STEP1; PU: 64x64, 16x16 and Prediction Mode: 7 at STEP2
• Transform and Quantization: TU: 32x32 at STEP1; TU: 16x16 at STEP2
• Entropy Coding: CU: 32x32 at STEP1; CU: 64x64 at STEP2
• Base Algorithm: HM3.0 at STEP1; HM7.0 at STEP2
*CU stands for Coding Unit.
*PU stands for Prediction Unit.
*TU stands for Transform Unit.
*HM is the reference software of HEVC.
• Prediction Mode
[Figure: prediction mode directions, labeled 0: Planar, 1: DC, 2, 10, 18, 26, 34.]
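To give a sense of scale for the unit sizes listed above, the following sketch counts how many units of a given size cover one frame. The frame size of 3840x2160 is an assumption (the slides only say "4K"), partial units at the frame edge are counted as whole units, and the function name is hypothetical.

```python
def units_per_frame(width, height, unit_size):
    """Number of unit_size x unit_size units needed to cover a frame."""
    cols = -(-width // unit_size)   # ceiling division
    rows = -(-height // unit_size)
    return cols * rows

FRAME_WIDTH, FRAME_HEIGHT = 3840, 2160  # assumed 4K frame size

for size in (64, 32, 16):
    print(size, units_per_frame(FRAME_WIDTH, FRAME_HEIGHT, size))
```

Under this assumption a frame holds 2040 units of 64x64, 8160 units of 32x32, or 32400 units of 16x16, so supporting smaller units multiplies the number of blocks each circuit must process per frame.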
Evaluation (1/2)
Circuits Performances and Design Period
[Figure: four panels (STEP1; STEP2 LOOP#1; STEP2 LOOP#2; STEP2 LOOP#3) plotting Area and Cycle for IPD, TQ, and EC against Design Period (Month), 0 to 18. Annotations: Subjective Evaluation Period; Feedback data is available.]
The main changes to each block:
• LOOP#1: Upgrade of the base algorithm of each block
• LOOP#2: Functional expansion of IPD
• LOOP#3: Functional expansion of each block
• The circuit performance of each expanded function is evaluated at STEP2.
• Feedback data is available from other design loops at STEP2.
Evaluation (2/2)
• Using the proposed parallel design methodology, three design
loops were tried in only seven months.
• Using the proposed parallel design methodology, cycle*area was
reduced to 1/5 in four months after the preliminary design of
LOOP#1, and to 1/4 in three months after the preliminary design
of LOOP#2.
[Figure: Cycle*Area (axis 0 to 1.2) against Design Period (Month), 1 to 18, for STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), and STEP2 (LOOP#3). Annotations: LOOP#1 80% down (four months); LOOP#2 75% down (three months); 90% down.]
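The relation between the "percent down" annotations and the fractions quoted in the bullets can be checked with a short sketch. The helper names are hypothetical, and the cycle and area values in the second part are made-up numbers chosen only to show how the product behaves; they are not measurements from the slides.

```python
from fractions import Fraction

def remaining_fraction(percent_down):
    """Fraction of cycle*area left after a reduction of percent_down percent."""
    return Fraction(100 - percent_down, 100)

def cycle_area(cycles, area):
    """Figure of merit used in the evaluation: cycle count times circuit area."""
    return cycles * area

print(remaining_fraction(80))  # 1/5, the LOOP#1 annotation
print(remaining_fraction(75))  # 1/4, the LOOP#2 annotation

# Hypothetical values: halving the cycles and cutting area to 2/5 gives 1/5.
before = cycle_area(3000, Fraction(1))
after = cycle_area(1500, Fraction(2, 5))
print(after / before)  # 1/5
```

Because the metric is a product, the same reduction can come from fewer cycles, a smaller area, or a combination of both.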
Conclusion
• We proposed a new design methodology for video codec
LSI. With the proposed design methodology, we can
reduce feedback time, run simulations, and evaluate
coded images in real time.
• Using the proposed design methodology, three design
loops were tried in only seven months.
• Using the proposed design methodology, cycle*area
was reduced to 1/5 in four months after the preliminary
design of LOOP#1, and to 1/4 in three months after the
preliminary design of LOOP#2.
• To realize an HEVC codec, we need to add or expand
some functional tools while checking them by
subjective evaluation.