Implementation in Hardware of
Video Processing Algorithm
Semester project
SPRING 2008
Performed by: Yony Dekell & Tsion Bublil
Supervisor: Mike Sumszyk
1
High Speed Digital System Lab
Project Goals
• Real-time video signal filtering based on a nonlinear diffusion algorithm.
• Studying the nonlinear diffusion algorithm.
• Studying the Synplify DSP work environment.
• Implementing a real-time video processing algorithm on an FPGA.
2
NON LINEAR DIFFUSION
• It aims at filtering an image
• The filtered image is the solution of a
nonlinear differential equation
• This equation is called the nonlinear
diffusion equation:

$$I_t = \operatorname{div}\left( \frac{1}{1 + |\nabla I|^2} \, \nabla I \right)$$
• Since no analytic solution is known, we
need to solve it iteratively
3
ALGORITHM FEATURES
• It smooths the image without damaging the edges
• It is an iterative algorithm
• More iterations give a stronger smoothing effect
• Highly computationally demanding
• Real time implementation possible only in hardware
[Figure: original image and results after 15, 40, and 80 iterations]
4
CONTENTS
Part 1: Simulation in Simulink
environment
Part 2: Working directly with the
FPGA
5
PART 1
Brief recap of the work done up to
the midterm presentation
6
FROM Simulink to
SynplifyDSP
We had to change the original Matlab/Simulink
design because:
1) We chose not to use any buffer between the
DVI connection and the processing of the input
2) In the Simulink design we use matrices to
represent images, but SynplifyDSP can only
use vectors
7
REAL TIME DERIVATION
[Figure: matrix derivation versus vector derivation, with the derivative outputs marked as true or false results]
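The difference between the two derivations can be illustrated with a small sketch. It assumes a horizontal forward difference and assumes that the "false results" are the differences taken across a row boundary once the image is flattened into a pixel stream. The function names and the masking fix are hypothetical and are not taken from the project design.

```python
def row_derivative_matrix(img):
    """Horizontal forward difference, row by row (last column set to 0)."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] + [0]
            for row in img]


def row_derivative_stream(pixels, width):
    """Same difference on the flattened pixel stream.

    Returns the naive result and a masked result in which the difference
    that straddles the end of a row is forced to 0 (assumed fix).
    """
    naive = [pixels[i + 1] - pixels[i] for i in range(len(pixels) - 1)] + [0]
    masked = [0 if (i + 1) % width == 0 else d for i, d in enumerate(naive)]
    return naive, masked


img = [[1, 2, 3], [10, 20, 30]]
flat = [p for row in img for p in row]
naive, masked = row_derivative_stream(flat, width=3)
print(row_derivative_matrix(img))  # [[1, 1, 0], [10, 10, 0]]
print(naive)                       # [1, 1, 7, 10, 10, 0]
print(masked)                      # [1, 1, 0, 10, 10, 0]
```

In the naive stream result the value 7 mixes the last pixel of one row with the first pixel of the next, which has no counterpart in the matrix derivation.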
8
SynplifyDSP DESIGN
[Figure: SynplifyDSP design with separate R, G, and B channels]
9
SynplifyDSP DESIGN
[Figure: SynplifyDSP design with separate R, G, and B channels]
10
FIXED POINT PRECISION
• In Matlab and Simulink we work at full precision
• On the FPGA, however, we need to use
fixed-point precision
• We checked how many bits each block uses.
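As a rough illustration of what moving to fixed point means, the sketch below quantizes a real value to a signed fixed-point code and converts it back. The word lengths, the rounding mode, and the saturation behaviour are assumptions chosen for the example, not the settings used in the SynplifyDSP design.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real value to a signed fixed-point code with saturation."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    # Round to the nearest code, then clamp to the representable range.
    return max(lo, min(hi, round(value * scale)))


def from_fixed(code, frac_bits):
    """Convert a fixed-point code back to a real value."""
    return code / (1 << frac_bits)


code = to_fixed(0.3, frac_bits=8, total_bits=16)
print(code, from_fixed(code, 8))  # 77 0.30078125
```

The quantization error shrinks as `frac_bits` grows, while `total_bits` sets the range before saturation, which is why the bit width has to be checked block by block.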
11
SynplifyDSP SIMULATION
12
SynplifyDSP SIMULATION
13
SIMULATION RESULTS
[Figure: original image and SynplifyDSP result after 30 iterations]
14
Matlab AND SynplifyDSP
COMPARISON
• We measured the error between the Matlab code output
and the SynplifyDSP output.
• For 30 iterations: relative root MSE = 1.9481% per pixel
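The slides do not give the exact formula behind "relative root MSE", so the sketch below uses one common definition as an assumption: the root of the summed squared error divided by the root of the summed squared reference, expressed in percent. The function name and the sample values are hypothetical.

```python
import math


def relative_rmse_percent(reference, result):
    """Relative root-mean-square error in percent (assumed definition)."""
    num = sum((r - s) ** 2 for r, s in zip(reference, result))
    den = sum(r ** 2 for r in reference)
    return 100.0 * math.sqrt(num / den)


matlab_out = [100, 100, 100, 100]   # made-up reference pixels
fixed_out = [98, 102, 100, 100]     # made-up fixed-point pixels
print(round(relative_rmse_percent(matlab_out, fixed_out), 4))  # 1.4142
```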
15
ß PARAMETER
Let’s simplify the diffusion equation:
$$I_t = \operatorname{div}\left( \frac{1}{1 + |\nabla I|^2} \, \nabla I \right) \;\Rightarrow\; I_t = F$$
Now let’s show how one can get an iterative solution:
$$I_t = F \;\Rightarrow\; \frac{I(t+1) - I(t)}{dt} = F$$
$$\Rightarrow\; I(t+1) - I(t) = dt \cdot F$$
$$\Rightarrow\; I(t+1) = I(t) + dt \cdot F$$
We define ß (beta) in the following way: $\beta = dt$
In our implementation this parameter can be changed
online, while the design is running.
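The update I(t+1) = I(t) + ß·F can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware design: the diffusivity is assumed to be 1 / (1 + d²) and is evaluated per neighbour difference d rather than from the full gradient magnitude, and the function names are hypothetical.

```python
def diffusion_step(img, beta):
    """One explicit update I(t+1) = I(t) + beta * F on a 2-D list of floats."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # Visit the right and lower neighbour so each pixel pair is handled once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    d = img[ny][nx] - img[y][x]
                    g = 1.0 / (1.0 + d * d)  # assumed diffusivity
                    flux = beta * g * d
                    out[y][x] += flux        # this pixel moves toward its neighbour
                    out[ny][nx] -= flux      # and the neighbour moves toward it
    return out


def run_diffusion(img, beta, iterations):
    """Apply the explicit step a fixed number of times."""
    for _ in range(iterations):
        img = diffusion_step(img, beta)
    return img


smoothed = run_diffusion([[0.0, 1.0], [1.0, 0.0]], beta=0.1, iterations=10)
```

Because every flux is added to one pixel and subtracted from its neighbour, the sum of all pixel values is preserved, and a larger ß or more iterations gives stronger smoothing.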
16
Original image
17
NLD image ß=0.1 iterations=10
18
WORK FLOW – Matlab & Simulink
• Algorithm design
• Simulation and error measurement
19
WORK FLOW – SynplifyDSP
• VHDL code generation
20
WORK FLOW – SynplifyPRO
• Synthesizes the VHDL code into a logic schematic
• Creates a VQM file
21
WORK FLOW – ProcWizard
• Builds the VHDL code of the board interface
22
WORK FLOW – Quartus
• Configuration of the interface VHDL code
• Links the VHDL interface to the VQM file
• Place and route
• Creates an RBF file
23
WORK FLOW – ProcWizard
• Load the FPGA with the RBF file
24
PART 2
Working with the FPGA
25
PROJECT PROGRESS
• Memory lack
• Frequency problem (pipeline)
• Simple designs
• Checking the card at Gidel
• Gidel’s advice
• Why does it work
• Receiver configuration
• Ideal control signal waveforms
• Control signals on scope
• Maximum iteration minimum frequency
• Blanking check
26
MEMORY LACK
• We used a ROM block to implement the “pow” function
• There isn’t enough ROM to load more than one
iteration on the FPGA
• High MSE
27
MEMORY LACK
We replaced the ROM with a “DIV” block and 3 “CONVERTER” blocks.
This solution gives a 0.2% MSE for one iteration.
28
FREQUENCY PROBLEM (PIPELINE)
• Highest achievable frequency: 18 MHz.
• The pipeline was applied at the hardware level, not at the logic level.
30
FREQUENCY PROBLEM (PIPELINE)
To implement a correct pipeline we used the
SynplifyDSP program.
This solution gave us a frequency of 107 MHz.
31
SIMPLE DESIGNS
• The output was noisy
• To understand the problem we built 2 simple
designs: 1. “delay”, 2. “overhead_test”
[Figures: 1. delay; 2. overhead test]
• We still had noise, which comes from the DVI input
33
CHECKING THE CARD AT GIDEL
• Cleaning the card at the lab and switching the
DVI cables
• Checking the card at Gidel:
1. Automatic card test
2. Test with a new PSDB
• The board and the daughter board worked fine
35
DVI CONNECTION
[Diagram: sites on the DVI PSDB and the FPGA. From the graphics card, the DVI receiver delivers 24 data bits, 3 control bits, and a clock to the FPGA. The FPGA sends 12 double-data-rate bits (the 12 MSB and 12 LSB of each 24-bit word), 3 control bits, and a clock to the DVI transmitter, which drives the screen.]
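The 24-bit to 12-bit double-data-rate packing can be sketched as a split and a merge. Which half travels on which clock edge is not stated in the slides, so the sketch only shows the bit manipulation, and the function names are hypothetical.

```python
def split_pixel(pixel24):
    """Split a 24-bit word into its 12 MSB and 12 LSB."""
    msb = (pixel24 >> 12) & 0xFFF
    lsb = pixel24 & 0xFFF
    return msb, lsb


def merge_pixel(msb, lsb):
    """Rebuild the 24-bit word from the two 12-bit halves."""
    return ((msb & 0xFFF) << 12) | (lsb & 0xFFF)


msb, lsb = split_pixel(0xABCDEF)
print(hex(msb), hex(lsb))           # 0xabc 0xdef
print(hex(merge_pixel(msb, lsb)))   # 0xabcdef
```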
37
INVALID DVI CONNECTION
[Diagram: sites on the DVI PSDB and the FPGA. Graphics card → DVI receiver (24 data bits, 3 control bits, clk) → SynplifyDSP VHDL code → 12 MSB and 12 LSB data bits → MUX switched by clk → 12 double-data-rate bits → DVI transmitter (3 control bits, clk) → screen.]
38
VALID DVI CONNECTION
[Diagram: sites on the DVI PSDB and the FPGA. Graphics card → DVI receiver (24 data bits, 3 control bits, clk) → SynplifyDSP VHDL code → 12 MSB and 12 LSB data bits → DDR block → 12 double-data-rate bits → DVI transmitter (3 control bits) → screen. A PLL derives a phased clk from clk, and a second DDR block is fed with the constants ‘1’ and ‘0’.]
39
WHY DOES IT WORK
[Timing diagram: two flip-flops (FF) holding the 12 MSB and 12 LSB feed the DDR multiplexer, which sends 12-bit data to the transmitter and on to the screen. Waveforms: clk, DDR clk, data out (1# LSB, 1# MSB, 2# LSB, 2# MSB, 3# MSB), and transmitter clk, annotated with Tpd-ff, Tcd-ff, Tpd-mux, Tcd-mux, Tsu, and Thold.]
41
WHY DOES IT WORK
[Timing diagram: the same DDR circuit with a PLL that generates a phased clk. Waveforms: clk, DDR clk, data out (1# LSB, 1# MSB, 2# LSB, 2# MSB, 3# MSB), transmitter clk, and phased clk, annotated with Tpd-ff, Tcd-ff, Tpd-mux, Tcd-mux, Tsu, and Thold.]
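One way to reason about why a phase-shifted clock helps is a standard setup and hold slack calculation. The sketch below is an interpretation of the timing labels, not the authors' analysis: it assumes the data becomes valid Tpd-ff + Tpd-mux after the launching edge and stays valid until Tcd-ff + Tcd-mux after the next edge, and that the transmitter samples `phase` after the launching edge. All numbers are hypothetical, in nanoseconds.

```python
def ddr_slacks(half_period, phase, tpd_ff, tpd_mux, tcd_ff, tcd_mux, tsu, thold):
    """Return (setup_slack, hold_slack) for a capture edge `phase` after launch."""
    data_valid_from = tpd_ff + tpd_mux                   # latest arrival of new data
    data_valid_until = half_period + tcd_ff + tcd_mux    # earliest change to next data
    setup_slack = phase - tsu - data_valid_from
    hold_slack = data_valid_until - (phase + thold)
    return setup_slack, hold_slack


# Hypothetical timing values, chosen only to illustrate the calculation.
params = dict(half_period=4.0, tpd_ff=0.5, tpd_mux=0.7,
              tcd_ff=0.2, tcd_mux=0.3, tsu=0.6, thold=0.8)

# Capture edge aligned with the next data edge (no phase shift).
unphased = ddr_slacks(phase=4.0, **params)
# Capture edge moved into the middle of the data-valid window.
phased = ddr_slacks(phase=2.5, **params)
print(unphased[1] < 0)                   # True: hold requirement violated
print(phased[0] > 0 and phased[1] > 0)   # True: both requirements met
```

With these made-up numbers, sampling on the edge where the data changes violates the hold requirement, while a shifted capture edge leaves positive slack on both sides.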
42
DVI CONNECTION
[Diagram: graphics card → DVI receiver → FPGA (24 data bits, 3 control bits, clk) → DVI transmitter (12 double-data-rate bits, 3 control bits, clk) → screen.]
44
RECEIVER CONFIGURATION
• “overhead_test” worked perfectly
• But “delay” and “NLD” still had noise
• We found that the solution was to configure
the receiver differently
[Figure: bad configuration versus good configuration, each showing the valid data and control signal window relative to the FPGA clk obtained from the receiver]
46
IDEAL CONTROL SIGNAL
WAVEFORMS
• “NLD” works perfectly
• We still needed to check the control signals with a scope
48
CLOCK SIGNAL
AT SCOPE
EXPECTED CLOCK
50
CONTROL SIGNALS AT SCOPE
[Scope captures: hsync, vsync, and enable, compared with the 19” TFT LCD SXGA monitor data sheet]
Note on the captures: this signal occurs when vsync = ‘1’.
51
ITERATION AND FREQUENCY
TRADEOFF
The more we pipelined our design in order to get higher
frequencies, the fewer iterations we could load:
• 7 iterations at 53 MHz
• 12 iterations at 24.01 MHz
BLANKING CHECK
We built a special design which counts the clock
cycles per row, during blanking, and during data.
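In software, the same kind of measurement can be sketched by counting run lengths of a sampled data-enable signal, one sample per clock cycle. This is only an illustration: the signal values are synthetic, and the function name is hypothetical and unrelated to the hardware counter design.

```python
from itertools import groupby


def run_lengths(enable):
    """Return (level, cycle_count) for each run of equal enable samples."""
    return [(level, sum(1 for _ in run)) for level, run in groupby(enable)]


# Synthetic example: two rows of 8 data cycles followed by 3 blanking cycles.
samples = ([1] * 8 + [0] * 3) * 2
print(run_lengths(samples))  # [(1, 8), (0, 3), (1, 8), (0, 3)]
```

Adding a data run and the blanking run that follows it gives the number of clock cycles per row.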
AND THE BIG BONUS……..
56
JOINING OF FOUR FPGAs
[Diagram: the DVI PSDB connects to the 1st FPGA, which contains the receiver (REC), the transmitter (TRNS), a DDR block, PLLs, and a block labeled “11 ITR (vqm)”. The 2nd, 3rd, and 4th FPGAs each contain a block labeled “11 ITR (vqm)” with PLL-generated clocks. Control and data signals pass between the FPGAs.]
THANKS…
• Our great supervisor Mike Sumszyk
• Lab staff
• Michael Yampolsky
• Gadi Tuchman
• Our God in the sky
58
We are happy to invite you to
our demonstration at the lab.
59
APPENDIX
60
Image processing
Gaussian Smoothing
Beltrami Smoothing
61
Comparison between SynplifyDSP
and direct VHDL implementation
Pros:
• The SynplifyDSP tool plugs into the familiar Simulink
environment.
• The development is fast.
Cons:
• It is hard to obtain an optimal implementation (the critical
path is not optimal)
• The generated VHDL code is hard to understand and
therefore difficult to change
62
DVI
The Digital Visual Interface (DVI) is a video
interface standard designed to maximize the visual
quality of digital display devices
Simulink design
[Figure: Simulink design with separate R, G, and B channels]
64
Simulink design
[Figure: Simulink design with separate R, G, and B channels]
65
SynplifyDSP – VHDL code
66
Synplify Pro
67
Synplify Pro
68
Procwizard + Quartus
• In ProcWizard we create the interface
between the FPGA and the daughter board
DVI port.
• Quartus performs the place and route
according to the ProcWizard interface and
the SynplifyPRO node-level netlist.
69
Project stages
• Simulink design of an existing Matlab code
• Adaptation of the Simulink design to SynplifyDSP
components and constraints.
• Synthesis of the VHDL code produced by SynplifyDSP
using SynplifyPro
• Integration of the above RTL component within the Gidel
card architecture using Quartus II and ProcWizard
• Place and route by using Quartus II
• Loading RBF file to Gidel’s Procstar II card using
ProcWizard
70