
FIR Filter Design Using FPGAs
IEEE-Argentina
Part 2
07/07/2015
Ing Daniel A Jacoby
Outline
• Power of Parallelism
• Platform FPGA Virtex-II/Virtex-II Pro Series
• Why should I use FPGAs for DSP?
Introduction - 1 - 2
© 2003 Xilinx, Inc. All Rights Reserved
For Academic Use Only
Essence of a DSP Processor
• Program Counter and Control: cycles expended making decisions and controlling flow
• I/O: cycles expended communicating with the outside world or other processors
• Program Memory: the program must be stored in ROM, and many instructions do not directly contribute to processing
• Instruction Decode
• Registers
• ALU: supports many operations, but only one or a few can be used at one time; it contains a fixed set of operations, so multiple operations (cycles) are required to achieve the desired effect
• Fixed bit width: the algorithm may not require all bits
• Memory: all values currently not in use must be retained
Multiply Accumulate
Single Engine
• Sequential processing limits data throughput
– Time-shared MAC unit
– High clock frequency creates difficult system challenge
• 256 Tap FIR Filter
– 256 multiply and accumulate (MAC) operations per data sample
– One output every 256 clock cycles
[Figure: Data In → Reg → MAC unit (loop algorithm 256 times) → Data Out]
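The single-engine behaviour can be sketched in software. This is only an illustrative model, not the slide's hardware: the function name `fir_sequential` and the cycle counter are assumptions introduced here, and one inner-loop iteration stands in for one clock cycle of the time-shared MAC unit.

```python
def fir_sequential(samples, coeffs):
    """Direct-form FIR computed with a single, time-shared MAC unit.

    Hypothetical model: each inner-loop iteration represents one MAC
    clock cycle, so a filter with N taps needs N cycles per output.
    """
    taps = len(coeffs)
    delay_line = [0] * taps              # register storage for past samples
    outputs = []
    mac_cycles = 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        acc = 0
        for k in range(taps):                # loop algorithm "taps" times
            acc += coeffs[k] * delay_line[k]
            mac_cycles += 1
        outputs.append(acc)
    return outputs, mac_cycles


# Impulse into a 256-tap filter whose coefficients are all 1.
outputs, mac_cycles = fir_sequential([1, 0, 0, 0], [1] * 256)
print(outputs)     # [1, 1, 1, 1]
print(mac_cycles)  # 1024, i.e. 256 cycles for each of the 4 outputs
```

Four input samples cost 1024 MAC cycles here, which is the "one output every 256 clock cycles" figure from the slide.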
Sequential Processing
Limits System Performance
Max Sample Rate = Fixed Processor Clock Rate / Number of operations per sample
[Figure: sample rate (MSamples/s, 0 to 40) or channel density versus algorithmic complexity (number of coefficients, 2 to 104), plotted for a single 300 MHz processor and for two 300 MHz processors]
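The relation on this slide is easy to evaluate numerically. The sketch below is a hedged illustration: the helper name `max_sample_rate_hz` is made up for this example, it assumes one operation per coefficient per clock cycle, and it assumes that a second processor simply doubles the throughput (ideal scaling).

```python
def max_sample_rate_hz(clock_hz, ops_per_sample, processors=1):
    """Max sample rate = fixed processor clock rate / operations per sample.

    Assumes one operation per clock cycle and ideal scaling across
    processors; both are simplifications for illustration.
    """
    if ops_per_sample <= 0:
        raise ValueError("ops_per_sample must be positive")
    return processors * clock_hz / ops_per_sample


for taps in (8, 32, 104, 256):
    one = max_sample_rate_hz(300e6, taps) / 1e6
    two = max_sample_rate_hz(300e6, taps, processors=2) / 1e6
    print(f"{taps:3d} coefficients: {one:6.2f} MS/s (one CPU), {two:6.2f} MS/s (two CPUs)")
```

With 256 coefficients a single 300 MHz processor tops out at roughly 1.17 MSamples/s under these assumptions.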
Multiply Accumulate
Multiple Engines
• Parallel processing maximizes data throughput
– Support any level of parallelism
– Optimal performance/cost tradeoff
• 256 Tap FIR Filter
– 256 multiply and accumulate (MAC) operations per data sample
– One output every clock cycle
• Flexible architecture
– Distributed DSP resources (LUT, registers, multipliers, & memory)
[Figure: Data In → Reg0, Reg1, Reg2, ..., Reg255, multiplied by coefficients C0, C1, C2, ..., C255 → Data Out; all 256 MAC operations in one clock cycle]
Virtex-II Platform FPGA (1)
• Active Interconnect™
– Fully buffered
– Fast, predictable
• Powerful CLB
– 8 LUTs
– 128b distributed RAM
– Wide input functions (32:1)
– Support for slice-based multipliers
• Block RAM
– 18KBit True Dual Port
– Up to 3 Mbits / device
• Multipliers
– 18b x 18b multiplier
– 200+ MHz pipelined
[Figure: CLB with slices S0 to S3 and a switch matrix; switch matrix connecting CLB, IOB, and DCM; BRAM]
Virtex-II Platform FPGA (2)
• 16 Global Clocks
– Eight clocks to any quadrant
– Switch glitch-free between clocks
• DCM
– Zero delay clock
– Precision phase shift
– Frequency synthesis
– Duty cycle correction
– Clock multiply and divide
• DCI
– On-chip termination
– Guaranteed signal integrity
– Eliminates 100s of resistors
Virtex-II Memory Hierarchy
• Distributed RAM: built from 16x1 cells
• True-Dual Port™ Synchronous Block RAM: configurable as 16k x 1, 8k x 2, 4k x 4, 2k x 9, 1k x 18, or 512 x 36
• High-Performance External Memory Interfaces: DDR SDRAM, ZBT® SRAM, QDR SRAM
Virtex-II CLB
• Flexible resources
– Wide-input functions
  • 16:1 multiplexer in 1 CLB
– Fast arithmetic functions
  • Two dedicated carry chains
– Cascadable shift registers in LUT
  • 128-b shift register in 1 CLB
• Ease of Performance
– Direct routing enabling high speed
[Figure: CLB with slices S0 to S3, switch matrix, two TBUFs, CIN/COUT carry chains, SHIFT chain, and fast connects]
Virtex-II Slice
• Each slice contains two:
– Four-input lookup tables
– 16-bit distributed SelectRAM
– 16-bit shift register
• Each register:
– D flip-flop
– Latch
• Dedicated logic:
– Muxes
– Arithmetic logic
– MULT_AND
– Carry Chain
[Figure: slice with LUT G and LUT F (each usable as RAM16 or SRL16), carry logic (CY), MUXF5 and MUXFx, arithmetic logic, and two registers]
Unique Distributed RAM
• LUTs used as memory inside the fabric
• Flexible, can be used as RAM, ROM, or shift register
• Distributed memory with fast access time
• Cascadable with built-in CLB routing
• Applications
– Linear feedback shift register
– Distributed arithmetic
– Time-shared registers
– Small FIFO
– Digital delay lines (Z^-1)
[Figure: one LUT gives 16b of RAM (RAM16) or a 16b shift register (SRL16); one CLB gives 128b single-port RAM, 64b dual-port RAM, or a 128b shift register]
The SRL16E
• The 16 SRAM cells have been organized into a shift register
– The ‘CE’ is used, in conjunction with the clock, to write data into the first flip-flop and for all other data to move right by one position
– Because this is a predictable operation, no address is required for writing
• The SRL16E is excellent in implementing efficient DSP functions
– A very efficient way to delay data samples
– Shifting samples and scanning at faster rate
[Figure: SRL16E drawn as a chain of 16 D flip-flops sharing CE; the address A[3:0] selects which stage drives Q, from 0000 (first flip-flop) to 1111 (last flip-flop); the SRLC16E variant adds a Q15 output so that it is cascadable]
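A small behavioural model can make the write and read paths concrete. This is a sketch under stated assumptions, not Xilinx simulation code: the class name `SRL16E` and its methods are invented for this example, data is modelled as single bits, and address 0 is taken to select the first stage as in the figure.

```python
class SRL16E:
    """Behavioural sketch of a 16-stage addressable shift register."""

    def __init__(self):
        self.cells = [0] * 16            # cells[0] is the first flip-flop

    def clock(self, d, ce=1):
        """One rising clock edge: shift only when clock enable is high."""
        if ce:
            self.cells = [d & 1] + self.cells[:-1]

    def q(self, addr):
        """Read the stage selected by the 4-bit address (no clock needed)."""
        return self.cells[addr & 0xF]

    @property
    def q15(self):
        """Last stage, the output used for cascading in the SRLC16E."""
        return self.cells[15]


srl = SRL16E()
for bit in (1, 0, 1, 1):
    srl.clock(bit)
print([srl.q(a) for a in range(4)])  # [1, 1, 0, 1], newest sample first
```

Reading with address `a` returns the sample written `a + 1` enabled clocks ago, which is why a single LUT can act as a programmable delay of 1 to 16 samples.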
Multiplier Unit
• Embedded 18-bit x 18-bit multiplier (36-bit product)
• Quantity:
– Virtex-II: up to 168
  • 2V40: 4
  • 2V8000: 168
– Virtex-II Pro: up to 556
  • 2VP2: 12
  • 2VP125: 556
• 2s complement signed operation
• 4- to 18-bit operands
• Combinational & pipelined options
• Operates with block RAM and fabric to implement MAC function
Signed Multiply Performance (pipelined multiplier with registered inputs and outputs; preliminary V1.60 speeds file)
– Virtex-II 18 x 18: 245 MHz
– Virtex-II Pro 18 x 18: 300 MHz
Virtex-II Family
Part Number   LUTs + FFs   BRAM (kb)   Multipliers   DCM Units
XC2V40        512          72          4             4
XC2V80        1,024        144         8             4
XC2V250       3,072        432         24            8
XC2V500       6,144        576         32            8
XC2V1000      10,240       720         40            8
XC2V1500      15,360       864         48            8
XC2V2000      21,504       1,008       56            8
XC2V3000      28,672       1,728       96            12
XC2V4000      46,080       2,160       120           12
XC2V6000      67,584       2,592       144           12
XC2V8000      93,184       3,024       168           12
Available SelectIO by package (values listed from the smallest to the largest device offered in that package)
CS144: 88, 92, 92
FG256: 88, 120, 172, 172, 172
FG456: 200, 264, 324
FG676: 392, 456, 484
FF896: 432, 528, 624
FF1152: 720, 824, 824, 824
FF1517: 912, 1,104, 1,108
BG575: 328, 392, 408
BG728: 456, 516
BG957: 624, 684, 684, 684, 684
Virtex-II Pro versus Virtex-II
• Lower cost
• Up to 24 RocketIO™ embedded multi-gigabit transceivers
• Up to four PowerPC
• More Memory
– 10 MBits in block RAM
– 1,738 KBits in Distributed RAM
• More block RAMs and multiplier blocks
– 556 embedded multipliers
– 556 Block RAMs
• Smaller technology
– 0.13 µm Nine-Layer Copper Process with 90 nm high-speed transistors
• More I/O pins per package (up to 1704)
Virtex-II Pro Family
Device    Logic Cells   PPC405   MGT 3.125 Gb   BRAM (kb)   Multipliers   DCM Units
2VP2      3,168         0        4              216         12            4
2VP4      6,768         1        4              504         28            4
2VP7      11,088        1        8              792         44            4
2VP20     20,880        2        8              1,584       88            8
2VP30     30,816        2        8              2,448       136           8
2VP40     43,632        2        12*            3,456       192           8
2VP50     53,136        2        16*            4,176       232           8
2VP70     74,448        2        20             5,904       328           8
2VP100    99,216        2        20*            7,992       444           12
2VP125    125,136       4        24*            10,008      556           12
1040
1040
1164
1200
Package
MGT
Available SelectIO
FG256
4
140
140
FG456
8
156
248
248
FF672
8
204
348
396
FF896
8
FF1152
12
FF1148*
396
556
556
564
692
692
692
0
804
812
FF1517
16
804
852
FF1704
20
FF1696*
0
964
996
* FF1148 and FF1696 special bond option: No MGT with Maximum SelectIO
FPGAs Mean Parallelism
Reason 1: FPGAs handle high computational workloads
256 Tap FIR Filter Example
• Conventional DSP device (Von Neumann architecture): Data In → Reg → MAC unit → Data Out; 256 loops needed to process samples
• FPGA: Data In → Reg0, Reg1, Reg2, ..., Reg255 with coefficients C0, C1, C2, ..., C255 → Data Out; all 256 MAC operations in 1 clock cycle
FPGAs are Ideal for
Multi-channel DSP designs
• FPGAs are also ideally suited for multi-channel DSP designs
– Many low sample rate channels can be multiplexed (e.g. TDM) and processed in the FPGA at a high rate
– Interpolation (using zeros) can also drive sample rates higher
[Figure: four channels (ch1 to ch4), each with its own LPF running at 20 MHz samples, replaced by a single multi-channel LPF running at 80 MHz samples]
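The time-division multiplexing idea can be sketched as follows. The helper names `tdm_interleave` and `multichannel_fir` are hypothetical, and the model assumes each channel keeps its own delay line while all channels share one set of coefficients and one arithmetic path; real multi-channel filter cores may organise their storage differently.

```python
def tdm_interleave(channels):
    """Interleave equal-length channel streams sample by sample (TDM)."""
    return [s for frame in zip(*channels) for s in frame]


def multichannel_fir(tdm_stream, coeffs, num_channels):
    """Filter a TDM stream with one shared FIR datapath.

    Sketch only: one delay line per channel, one shared coefficient set.
    """
    taps = len(coeffs)
    delay = [[0] * taps for _ in range(num_channels)]
    out = []
    for i, x in enumerate(tdm_stream):
        ch = i % num_channels                  # which channel owns this slot
        delay[ch] = [x] + delay[ch][:-1]
        out.append(sum(c * d for c, d in zip(coeffs, delay[ch])))
    return out


channels = [[1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]]   # four 20 MHz streams
stream = tdm_interleave(channels)                           # one 80 MHz stream
print(multichannel_fir(stream, [1, 1], num_channels=4))
# [1, 2, 3, 4, 1, 2, 3, 4, 0, 0, 0, 0]
```

Four channels at 20 MHz produce one interleaved stream at 4 x 20 MHz = 80 MHz, so the shared filter has to run four times faster than any single channel.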
Why FPGAs for DSP? (2)
Reason 2: Tremendous Flexibility
Q = (A x B) + (C x D) + (E x F) + (G x H)
can be implemented in parallel
[Figure: inputs A to H feed four multipliers, whose products are summed by adders to give Q]
But is this the only way in the FPGA?
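Customize Architectures to Suit
Your Ideal Algorithms
[Figure: three implementations of the same sum of products. Parallel: four multipliers feeding an adder tree. Semi-Parallel: two multipliers feeding an adder and an accumulator register (D/Q). Serial: one multiplier feeding an accumulator (adder and D/Q register). Optimized for? Speed at the parallel end, cost at the serial end]
FPGAs allow Area (cost) / Performance tradeoffs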
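The area/performance tradeoff can be put into numbers with a rough model. Everything below is an assumption made for illustration: the helper names, the 200 MHz clock, and the idea that each multiplier completes one MAC per clock cycle with no pipeline overhead.

```python
import math


def cycles_per_output(taps, multipliers):
    """Clock cycles needed per output when MACs are shared across taps."""
    if multipliers < 1:
        raise ValueError("need at least one multiplier")
    return math.ceil(taps / multipliers)


def output_rate_hz(clock_hz, taps, multipliers):
    """Output sample rate for a given amount of parallelism (idealised)."""
    return clock_hz / cycles_per_output(taps, multipliers)


for label, mults in (("serial", 1), ("semi-parallel", 16), ("parallel", 256)):
    rate = output_rate_hz(200e6, 256, mults) / 1e6
    print(f"{label:13s}: {mults:3d} multipliers, {rate:8.3f} MS/s")
```

Under these assumptions a 256-tap filter runs at about 0.78 MS/s with one multiplier, 12.5 MS/s with 16, and 200 MS/s with 256, which is the speed-versus-cost axis of the slide.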
DSP Design Flows
in FPGA
This material exempt per Department of Commerce license exception TSU
© 2003 Xilinx, Inc. All Rights Reserved
Objectives
After completing this module, you will be able to:
• Describe the advantages and disadvantages of three different design flows
• Use HDL, CORE Generator, or System Generator for DSP depending on design requirements and familiarity with the tools
• Explain why there is a need for an integrated flow from system design to implementation
• Describe the System Generator and the tools it interfaces with
• Build a model, simulate it, generate VHDL, and go through the design flow
• Describe how Hardware in the Loop verification is beneficial in complex system design
Outline
• Using HDL
• Using the Xilinx CORE Generator
• Using the Xilinx System Generator for DSP
• Lab 1: Creating a 12x8 MAC
• HDL Co-Simulation
• Hardware Verification
• In System Debug
• Resource Estimator
• Summary
• Simulink Tips and Tricks
HDL
• Design: HDL → Synthesis → Implementation → Download
• Verification: Behavioral Simulation → Functional Simulation → Timing Simulation → In-Circuit Verification
• This step: implement your design using VHDL or Verilog
Synthesis
• Design: HDL → Synthesis → Implementation → Download
• Verification: Behavioral Simulation → Functional Simulation → Timing Simulation → In-Circuit Verification
• This step: synthesize the design to create an FPGA netlist
Implementation
• Design: HDL → Synthesis → Implementation → Download
• Verification: Behavioral Simulation → Functional Simulation → Timing Simulation → In-Circuit Verification
• This step: translate, place and route, and generate a bitstream to download into the FPGA
Outline
• Using HDL
• Using the Xilinx CORE Generator
• Using the Xilinx System Generator for DSP
• Lab 1: Creating a 12x8 MAC
• HDL Co-Simulation
• Hardware Verification
• Resource Estimator
• Summary
• Simulink Tips and Tricks
CORE Generator
• Design: HDL with COREGen → Synthesis → Implementation → Download
• Verification: Behavioral Simulation → Functional Simulation → Timing Simulation → In-Circuit Verification
• This step: instantiate optimized IP within the HDL code
Synthesize, Implement, Download
• Design: HDL with COREGen → Synthesis → Implementation → Download
• Verification: Behavioral Simulation → Functional Simulation → Timing Simulation → In-Circuit Verification
• This step: synthesize, implement, and download the bitstream, similar to the original design flow
Xilinx IP Solutions
Key: $ = License Fee, P = Parameterized, S = Project License Available, BOLD = Available in the Xilinx Blockset for the System Generator for DSP
IP CENTER: http://www.xilinx.com/ipcenter
DSP Functions
• $P Additive White Gaussian Noise (AWGN)
• $P Reed Solomon
• $ 3GPP Turbo Code
• $P Viterbi Decoder
• P Convolution Encoder
• $P Interleaver/De-interleaver
• P LFSR
• P 1D DCT
• P 2D DCT
• P DA FIR
• P MAC
• P MAC-based FIR filter
• Fixed FFTs 16, 64, 256, 1024 points
• P FFT 16- to 16384- points
• P FFT - 32 Point
• P Sine Cosine Look-Up Tables
• $P Turbo Product Code (TPC)
• P Direct Digital Synthesizer
• P Cascaded Integrator Comb
• P Bit Correlator
• P Digital Down Converter
Math Functions
• P Multiplier Generator
– Parallel Multiplier
– Dyn Constant Coefficient Mult
– Serial Sequential Multiplier
– Multiplier Enhancements
• P Pipelined Divider
• P CORDIC
Memory Functions
• P Asynchronous FIFO
• P Block Memory modules
• P Distributed Memory
• P Distributed Mem Enhance
• P Sync FIFO (SRL16)
• P Sync FIFO (Block RAM)
• P CAM (SRL16)
• P CAM (Block RAM)
Base Functions
• P Binary Decoder
• P Twos Complement
• P Shift Register RAM/FF
• P Gate modules
• P Multiplexer functions
• P Registers, FF & latch based
• P Adder/Subtractor
• P Accumulator
• P Comparator
• P Binary Counter
Xilinx CORE Generator
• List of available IP
• Fully parameterizable
[Screenshots: CORE Generator dialogs, including one that references the coefficient file Mi_filtro.coe]
• Enables directives for MAC placement
Xilinx Smart-IP Technology
• Pre-defined placement and routing enhances performance and predictability
– Relative Placement: other logic has no effect on the core
– Fixed Placement (I/Os): guarantees I/O and logic predictability
– Fixed Placement & Pre-defined Routing: guarantees performance
• Performance is independent of:
– Core Placement
– Number of Cores
– Device Size
[Figure: the same core reaches 200 MHz in every case shown]
Outputs
• .EDN (EDIF implementation netlist)
• .XCO (core implementation data file / log file)
• Optional:
– .ASY Foundation or Innoveda symbols
– .VEO Verilog instantiation template
– .V Verilog behavioral simulation model
– .VHO VHDL instantiation template
– .VHD VHDL behavioral simulation model
The Challenges for a
DSP Software Platform
• Industry Trends
– Trend towards platform chips (FPGAs, DSP) resulting in greater complexity
– Highly flexible systems required to meet changing standards
– Multiple design methodologies - control plane/datapath
– Challenges in modeling and implementing an entire platform
– Hardware in the loop verification is useful in complex system design and System Generator supports it
• System Design Challenges
– Leveraging legacy HDL code
– Modeling & implementing control logic and datapath
– No expert exists for all facets of system design
Traditional Simulink
FPGA Flow
• System Architect: Simulink, System Verification
• GAP between the system architect and the FPGA designer
• FPGA Designer: HDL → Synthesis → Implementation → Download
• Verification: Functional Simulation (Verify Equivalence) → Timing Simulation → In-Circuit Verification
System Generator for
DSP v8.1 – An Overview
• Industry’s system-level design environment (IDE) for FPGAs
– Integrated design flow from Simulink to bit file
– Leverages existing technologies
  • Matlab/Simulink from The MathWorks
  • HDL synthesis
  • IP Core libraries
  • FPGA implementation tools
• Simulink library of arithmetic, logic operators and DSP functions (Xilinx Blockset)
– Bit and cycle true to FPGA implementation
• Arithmetic abstraction
– Arbitrary precision fixed-point, including quantization and overflow
– Simulation of double precision as well as fixed point
System Generator for
DSP v8.1 – An Overview
• VHDL code generation for Virtex-4™, Virtex-II Pro™, Virtex™-II, Virtex™-E, Virtex™, Spartan™-3E, Spartan™-3, Spartan™-IIE and Spartan™-II devices
– Hardware expansion and mapping
– Synthesizable VHDL with model hierarchy preserved
– Mixed language support for Verilog
– Automatic invocation of CORE Generator to utilize IP cores
– ISE project generation to simplify the design flow
– HDL testbench and test vector generation
– Constraint file (.xcf), simulation ‘.do’ files generation
– HDL Co-Simulation via HDL C-Simulation
• Verification acceleration using Hardware in the Loop
System Generator for DSP
Platform Designs
ISIM
The SysGen Design Flow
DSP Development Flow
1. Develop Algorithm & System Model (Simulink MDL)
2. Automatic Code Generation (RTL VHDL & Cores, HDL Test Bench)
3. Xilinx Implementation Flow (Simulation with ISIM, Bitstream, Download to FPGA)
System Generator Based
Design Flow
• Design: MATLAB/Simulink with System Generator → HDL → Synthesis → Implementation → Download
• Verification: System Verification → Functional Simulation → Timing Simulation → In-Circuit Verification, plus HDL Co-Simulation*
• Files Used
– Configuration file
– VHDL
– IP
– Constraints File
* ModelSim helper block not needed when ISIM simulator is used
Creating a System
Generator Design
• Invoke Simulink library browser
• To open the Simulink library browser, click the Simulink library browser button or type “Simulink” in MATLAB console
• The library browser contains all the blocks available to designers
• Start a new design by clicking the new sheet button
Creating a System
Generator Design
• Build the design by dragging and dropping blocks from the Xilinx blockset onto your new sheet
• Design entry is similar to a schematic editor
• Connect up blocks by pulling the arrows on the sides of each block
Finding Blocks
• Use the Find feature to search ALL Simulink libraries
• Xilinx blockset has nine major sections
– Basic elements: Counters, delays
– Communication: Error correction blocks
– Control Logic: MCode, Black Box
– Data Types: Convert, Slice
– DSP: FDATool, FFT, FIR
– Index: All Xilinx blocks, a quick way to view all blocks
– Math: Multiply, accumulate, inverter
– Memory: Dual Port RAM, Single Port RAM
– Tools: ModelSim, Resource Estimator
Configure Your Blocks
• Double-click or go to Block Parameters to view a block’s configurable parameters
– Arithmetic Type: Unsigned or twos complement
– Implement with Xilinx Smart-IP Core (if possible)/Generate Core
– Latency: Specify the delay through the block
– Overflow and Quantization: Users can saturate or wrap on overflow, and truncate or round on quantization
– Override with Doubles: Simulation only
– Precision: Full, or the user can define the number of bits and where the binary point is for the block
– Sample Period: Can be inherited with a “-1” or must be an integer value
• Note: While all parameters can be simulated, not all are realizable
Values Can Be Equations
• You can also enter equations in the block parameters, which can aid calculation and your own understanding of the model parameters
• The equations are calculated at the beginning of a simulation
• Useful MATLAB operators
– + add
– - subtract
– * multiply
– / divide
– ^ power
– pi π (3.1415926535897…)
– exp(x) exponential (e^x)
Important Concept 1:
The Numbers Game
• Simulink uses a “double” to represent numbers in a simulation. A double is a 64-bit IEEE 754 floating-point number
– Because the binary point can move, a double can represent magnitudes up to about 1.8 x 10^308 with roughly 15 to 17 significant decimal digits: a wide, desirable range, but not efficient or realistic for FPGAs
• Xilinx Blockset uses n-bit fixed point numbers (twos complement optional)
Example, Format = Fix_16_13:
Integer part (weights -2^2, 2^1, 2^0): 1 0 1
Fraction part (weights 2^-1 down to 2^-13): 1 0 1 1 1 1 0 1 0 0 1 0 1
Value = -2.261108…
Format = Sign_Width_Binary point position from the LSB
(Sign: Fix = signed value, UFix = unsigned value)
Design Hint: Always try to maximize the dynamic range of the design by using only the required number of bits
• Thus, a conversion is required when Xilinx blocks communicate with Simulink blocks
(Xilinx Blockset → MATLAB I/O → Gateway In/Out)
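The Fix_16_13 example can be checked with a few lines of code. The function name `fixed_to_float` is an assumption made for this sketch; it only mirrors the Sign_Width_BinaryPoint naming of the slide and is not part of any Xilinx tool.

```python
def fixed_to_float(raw, width, binary_point, signed=True):
    """Interpret a raw bit pattern as a fixed-point number.

    raw:          integer holding the bit pattern
    width:        total number of bits
    binary_point: number of fractional bits (position from the LSB)
    """
    raw &= (1 << width) - 1                  # keep only "width" bits
    if signed and raw >= 1 << (width - 1):   # MSB set: negative in twos complement
        raw -= 1 << width
    return raw / (1 << binary_point)


# Bit pattern from the slide: 101.1011110100101 in Fix_16_13
value = fixed_to_float(0b1011011110100101, 16, 13)
print(value)  # -2.2611083984375
```

The MSB carries a weight of -2^2, which is what makes the pattern negative even though most of the remaining bits are set.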
What About All Those
Other Bits?
• The Gateway In and Out blocks support parameters to control the conversion from double precision to N-bit fixed point precision
[Figure: a DOUBLE value whose bits around the binary point are 1 0 1 1 0 1 1 1 1 0 1 0 0 1 0 1 (weights -2^2 down to 2^-13) passes through OVERFLOW (Saturate) and QUANTIZATION (Truncate) to give the FIX_12_9 value 1 0 1 1 0 1 1 1 1 0 1 0 (weights -2^2 down to 2^-9)]
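A software sketch of the double-to-fixed conversion helps show what the overflow and quantization settings do. The function `to_fixed` and its string options are hypothetical names, not the Gateway In parameters themselves, and the "round" branch rounds half up, which may differ from the rounding mode a given tool version uses.

```python
import math


def to_fixed(x, width, binary_point, signed=True,
             quantization="truncate", overflow="saturate"):
    """Convert a float to (raw integer, represented value).

    Sketch under assumptions: truncate = floor, round = round half up,
    overflow is either saturated to the limits or wrapped modulo 2**width.
    """
    scaled = x * (1 << binary_point)
    raw = math.floor(scaled + 0.5) if quantization == "round" else math.floor(scaled)
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if overflow == "saturate":
        raw = max(lo, min(hi, raw))
    else:  # wrap
        raw = (raw - lo) % (1 << width) + lo
    return raw, raw / (1 << binary_point)


raw, value = to_fixed(-18523 / 8192, 12, 9)   # the slide's value into FIX_12_9
print(format(raw & 0xFFF, "012b"), value)      # 101101111010 -2.26171875
```

Truncation keeps the top twelve bits of the slide's pattern unchanged, and the represented value moves slightly from -2.261108 to -2.261719.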
Other Type: Boolean
• The Xilinx Blockset also uses the type Boolean for control ports like CE and RESET
• The Boolean type is a variant on the 1-bit unsigned number in that it will always be defined (High or Low). A 1-bit unsigned number can become invalid; a Boolean type cannot
Knowledge Check
Using the technique shown, convert the following fractional values…
• Define the format of the following twos complement binary fraction and calculate the value it represents
1 1 0 0 0 1 1 . 0 1 0 1 1
Format = < _ _ _ >   Value =
• What format should be used to represent a signal that has:
a) Max value: +0.999…, Min value: -0.999…, quantized to 12 bit data. Format = < _ _ _ >
b) Max value: 0.8, Min value: 0.2, quantized to 10 bit data. Format = < _ _ _ >
c) Max value: 278, Min value: -138, quantized to 11 bit data. Format = < _ _ _ >
• Fill in the table:
Operation                   Full Precision Output Type
<Fix_12_9> + <Fix_8_3>
<Fix_8_7> x <Ufix_8_6>
Answers
Using the technique below, convert the following fractional values
• Define the format of the following twos complement binary fraction and calculate the value it represents
1 1 0 0 0 1 1 . 0 1 0 1 1
Format = < Fix_12_5 >   Value = -917/32 = -28.65625
• What format should be used to represent a signal that has:
a) Max value: +1, Min value: -1, quantized to 12-bit data. Format = < FIX_12_10 >
b) Max value: 0.8, Min value: 0.2, quantized to 10-bit data. Format = < UFIX_10_10 >
c) Max value: 278, Min value: -138, quantized to 11-bit data. Format = < FIX_11_1 >
• Fill in the table:
Operation                   Full Precision Output Type
<Fix_12_9> + <Fix_8_3>      <Fix_15_9>
<Fix_8_7> x <Ufix_8_6>      <Fix_16_13>
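The table answers follow from simple bit-growth rules, sketched below. The `FixType` tuple and both functions are illustrative assumptions rather than System Generator code: addition keeps the larger fraction, takes the larger integer field and adds one carry bit, while multiplication adds the widths and the binary points. Giving an unsigned operand one extra sign bit in a mixed addition is also an assumption of this sketch.

```python
from typing import NamedTuple


class FixType(NamedTuple):
    signed: bool
    width: int
    binary_point: int

    def __str__(self):
        prefix = "Fix" if self.signed else "UFix"
        return f"{prefix}_{self.width}_{self.binary_point}"


def full_precision_add(a, b):
    """Result type of a + b with no loss of precision (sketch)."""
    signed = a.signed or b.signed

    def int_bits(t):
        extra = 1 if signed and not t.signed else 0   # room for a sign bit
        return t.width - t.binary_point + extra

    frac = max(a.binary_point, b.binary_point)
    integer = max(int_bits(a), int_bits(b)) + 1       # one bit of growth
    return FixType(signed, integer + frac, frac)


def full_precision_mult(a, b):
    """Result type of a x b with no loss of precision (sketch)."""
    return FixType(a.signed or b.signed,
                   a.width + b.width,
                   a.binary_point + b.binary_point)


print(full_precision_add(FixType(True, 12, 9), FixType(True, 8, 3)))    # Fix_15_9
print(full_precision_mult(FixType(True, 8, 7), FixType(False, 8, 6)))   # Fix_16_13
```

For the sum, Fix_12_9 has 3 integer bits and Fix_8_3 has 5, so the result needs 5 + 1 integer bits and 9 fractional bits, 15 in total.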
Creating a System
Generator Design
[Figure: Simulink Sources → Gateway block → SysGen data path and helper blocks → Gateway block → Simulink Sinks]
• Gateway blocks are used to interface between Simulink and SysGen blocks
System Generator Design
• Every SysGen design must contain a System Generator block
• Used to set global netlisting attributes
• Designs may have levels of hierarchy
• Double-click to “push” into a subsystem
• Start simulation by pressing the play button
Picking Bits: Why We Do It
• To combine two data buses together to form a new bus
• To force a conversion of data type, including the number of bits and the binary point
• To reinterpret unsigned data as signed, or the converse
• To extract certain bits of data, especially when there is bit growth
The Xilinx Blocks
•
–
•
•
Concat
•
– Available in basic elements, math,
and index libraries
– Available in basic elements, data
types, math, and index libraries
Available in basic elements, data
types, and index libraries
Reinterpret
Convert
Slice
– Available in basic elements, control
logic, data types, and index libraries
The Concat Block
• Performs a concatenation of two bit vectors
• Both inputs must be unsigned integers
– i.e., two unsigned numbers with binary points at position zero
• The Reinterpret block provides signed to unsigned conversion capabilities that can extend the functionality of the Concat block
• Does not use a Xilinx LogiCORE or hardware resources
The Convert Block
• The Xilinx convert block converts each input sample to a number of a desired arithmetic type
– A number can be converted to a signed (twos complement) or unsigned value
– Total number of bits and binary point are specified by the user
– Overflow and quantization options apply to the output value
– Does not use a Xilinx LogiCORE but may use additional hardware depending on the overflow and quantization options
The Convert Block
• What is it doing?
– User specifies the total number of bits, where the binary point is, and the arithmetic type (signed or unsigned)
– First it lines up the binary point between input and output port types
– Next, the total number of bits and binary point the user specifies are applied; depending on the overflow and quantization options selected, the output value may change rather than simply having bits dropped
The Convert Block
• Passing the following value through the convert block results in the same value, represented with a different number of bits and a different binary point
[Figure: a FIX_10_8 bit pattern and the FIX_7_4 bit pattern that represents the same value]
The Convert Block
• Saturating the overflow may change the fractional number to get the saturated value
• Rounding the quantization may also affect the value to the left of the binary point (the whole number)
The Convert Block
• When we convert to a Fix_6_0, how do we get two different values?
– OVERFLOW options: Wrap, Saturate, Flag Error
– QUANTIZATION options: Truncate, Round
[Figure: FIX_10_8 input 0 1 . 1 0 0 0 0 0 0 0. Round (add ‘1’ to round) gives FIX_6_0 0 0 0 0 1 0, decimal +2. Truncate (drop the bits) gives FIX_6_0 0 0 0 0 0 1, decimal +1]
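The two results on this slide can be reproduced with integer arithmetic on the raw bit pattern. The helper `convert_to_integer` is a hypothetical name; the sketch only covers conversion to a type with no fractional bits and ignores overflow handling.

```python
def convert_to_integer(raw, binary_point, quantization):
    """Drop all fractional bits of a fixed-point raw value (sketch).

    truncate: discard the fractional bits.
    round:    add '1' at the position just below the new LSB, then discard.
    """
    if quantization == "truncate":
        return raw >> binary_point
    if quantization == "round":
        half = 1 << (binary_point - 1)       # requires binary_point >= 1
        return (raw + half) >> binary_point
    raise ValueError("quantization must be 'truncate' or 'round'")


raw = 0b0110000000                            # FIX_10_8 pattern, value +1.5
print(convert_to_integer(raw, 8, "round"))    # 2
print(convert_to_integer(raw, 8, "truncate")) # 1
```

Both outputs fit comfortably in Fix_6_0, so the difference comes purely from the quantization setting, not from overflow.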
The Reinterpret Block
• Forces its output to a new type without any regard for retaining the numerical value represented by the input
• Total number of bits in = total number of bits out
• Allows for unsigned data to be reinterpreted as signed data, and the converse
• Also allows scaling of the data through the repositioning of the binary point
• Does not use a Xilinx LogiCORE or hardware resources
The Reinterpret Block
• Reinterpret the UFIX_10_8 number and force the binary point to position 5
[Figure: FIX_10_8 pattern 0 1 . 1 0 0 0 0 0 0 0 (+1.5) becomes FIX_10_5 pattern 0 1 1 0 0 . 0 0 0 0 0 (+12)]
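Reinterpretation leaves the bits alone and changes only how they are read, which a short sketch makes visible. The function `reinterpret` is an illustrative name, not a tool API.

```python
def reinterpret(raw, width, new_binary_point, signed):
    """Read the same bit pattern under a new type; no bits change."""
    raw &= (1 << width) - 1
    if signed and raw >> (width - 1):        # MSB set under a signed reading
        raw -= 1 << width
    return raw / (1 << new_binary_point)


bits = 0b0110000000
print(bits / 2**8)                                  # 1.5, binary point at 8
print(reinterpret(bits, 10, 5, signed=False))       # 12.0, binary point at 5

# Same idea for signedness: UFix_10_8 value 3.0 read as Fix_10_8
print(reinterpret(0b1100000000, 10, 8, signed=True))  # -1.0
```

Moving the binary point three places to the right scales the value by 2^3, taking 1.5 to 12 without touching any hardware.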
The Slice Block
• The Xilinx slice block allows you to slice off a sequence of bits from your input data and create a new data value
• The output data type is unsigned with its binary point at zero
The Slice Block
• Take a slice of the FIX_10_8 number by taking a 4-bit slice and offsetting the bottom bit of the slice by 5 bits
[Figure: FIX_10_8 input 0 1 . 1 0 0 0 0 0 0 0 (+1.5); the 4-bit slice with its bottom bit offset by 5 is 1 1 0 0 (12)]
• Upper Bit Location + Width: offset of top bit from MSB = 0 and width = 4 gives 0 1 1 0 (6)
• Two Bit Locations: offset of top bit from MSB of input = -1 and offset of bottom bit from LSB of input = 5 gives 1 1 0 0 (12)
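The two ways of specifying a slice can be compared in code. All three function names are assumptions for this sketch; offsets follow the slide's convention, where an MSB offset of 0 means the top bit and -1 means one bit below it.

```python
def slice_bits(raw, top, bottom):
    """Extract bits top..bottom (LSB = bit 0) as an unsigned integer."""
    width = top - bottom + 1
    return (raw >> bottom) & ((1 << width) - 1)


def slice_upper_and_width(raw, input_width, msb_offset, width):
    """'Upper Bit Location + Width' style selection."""
    top = input_width - 1 + msb_offset
    return slice_bits(raw, top, top - width + 1)


def slice_two_locations(raw, input_width, msb_offset, lsb_offset):
    """'Two Bit Locations' style selection."""
    top = input_width - 1 + msb_offset
    return slice_bits(raw, top, lsb_offset)


raw = 0b0110000000                                   # the FIX_10_8 pattern
print(slice_upper_and_width(raw, 10, 0, 4))          # 6  (bits 9..6 = 0110)
print(slice_two_locations(raw, 10, -1, 5))           # 12 (bits 8..5 = 1100)
```

The result is always read as an unsigned integer, so the binary point of the input plays no part in the output value.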
Coffee Break
Important Concept 2:
Sample Period
• Every SysGen signal must be “sampled”; transitions occur at equidistant discrete points in time called sample times
• Each block in a Simulink design has a “Sample Period”, which corresponds to how often that block’s function is calculated and the results are output
• This sample period must be set explicitly for:
– Gateway In
– Blocks w/o inputs (note: constants are idiosyncratic)
• Sample period can be “derived” from input sample times for other blocks
Important Concept 2:
Sample Period
• The units of the sample period can be thought of as arbitrary, BUT a lot of Simulink source blocks do have an essence of time
– For example, a sample period of 1/44100 means the block’s function will be executed every 1/44100 of a second
• Remember the Nyquist theorem (Fs ≥ 2 fmax) when setting sample periods
• The sample period of a block DIRECTLY relates to how that block will be clocked in the actual hardware. More on this later
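A quick check of a chosen sample period against the Nyquist condition might look like the sketch below. The function name is made up for this example, and the sample period is assumed to be expressed in seconds.

```python
def satisfies_nyquist(sample_period_s, f_max_hz):
    """True when Fs = 1 / sample period is at least twice f_max."""
    fs = 1.0 / sample_period_s
    return fs >= 2.0 * f_max_hz


audio_period = 1 / 44100
print(satisfies_nyquist(audio_period, 20_000))   # True: 44.1 kHz covers 20 kHz audio
print(satisfies_nyquist(audio_period, 30_000))   # False: would need Fs of 60 kHz or more
```

A sample period of 1/44100 s therefore supports signal content up to 22.05 kHz.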
Setting the Global Sample Period
• The Simulink System Period MUST be set in the System Generator token. For single rate systems it will be the same as the Sample Periods set in the design. More on multi-rate designs later
[Figure: System Generator token dialog with Sample Period = 1]
SysGen Token
[Figure: System Generator token dialog, with callouts marking the master controls and the slave controls]
• “Simulink System Period” MUST be set correctly for simulation to work
Using the Scope
• Click Properties to change the number of axes displayed and the time range value (X-axis)
• Use the Data History tab to control how many values are stored and displayed on the scope
  – Can also direct output to the workspace
• Click Autoscale to quickly let the tools configure the display to the correct axis values
• Right-click on the Y-axis to set its value
Design and Simulate
in Simulink
Push “play” to simulate the design. Go to “Simulation
Parameters” under the “Simulation” menu to control
the length of simulations
Generate the HDL Code
Once complete, double-click the System Generator token
• Select the target device
• Select the synthesis tool
• Set the desired FPGA clock period
• Select to generate the testbench
• Generate the VHDL
System Generator
Output Files
• Design files
  – <design>.VHD/.V: HDL design files
  – <design>_cw.VHD/.V: Top-level HDL wrapper that contains the clock circuit and the SysGen design
  – .EDN and .NGC: Core implementation files
  – <design>_cw.XCF: Xilinx constraints file for timing and location constraints
• Project files
  – <design>_cw.ISE: Project Navigator project file
  – .PRJ: Synthesis project files for XST and Synplify
  – .TCL: Scripts for Synplify and Leonardo project creation
• Simulation files
  – .DO: Simulation scripts for MTI
  – .DAT: Data files containing the test vectors from System Generator
  – <design>_tb.VHD/.V: Simulation testbench
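The grouping of System Generator output files by extension can be sketched as a small Python helper. This is only an illustration of the list in the Output Files slide: the `classify` function and the `fir` design name are hypothetical, and the rule that a `_tb` suffix marks a testbench is taken from the `<design>_tb` naming shown there.

```python
from pathlib import PurePath

# Extension-to-category map taken from the Output Files list.
CATEGORIES = {
    ".vhd": "design", ".v": "design", ".edn": "design", ".ngc": "design", ".xcf": "design",
    ".ise": "project", ".prj": "project", ".tcl": "project",
    ".do": "simulation", ".dat": "simulation",
}


def classify(filename: str) -> str:
    """Return 'design', 'project', 'simulation' or 'unknown' for a generated file."""
    path = PurePath(filename)
    stem, ext = path.stem.lower(), path.suffix.lower()
    # HDL files named <design>_tb are the simulation testbench, not design files.
    if ext in (".vhd", ".v") and stem.endswith("_tb"):
        return "simulation"
    return CATEGORIES.get(ext, "unknown")


names = ["fir.vhd", "fir_cw.vhd", "fir_cw.xcf", "fir_cw.ise", "fir_tb.vhd", "fir.dat"]
grouped = {name: classify(name) for name in names}
```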
Black Box
• Provides a way to import HDL models into System Generator
• Allows co-simulation of black box HDL with Simulink by using either ModelSim or ISE Simulator
• Integrates the imported HDL and implementation files (EDN, NGC) with the netlist generated from System Generator
Choice of Target Hardware
• JTAG Hardware-in-the-loop development platforms:
  – Any board with a JTAG header
    • Configure your JTAG-based board in 20 minutes
• Xilinx
  – XtremeDSP Development kit (Virtex-4, Virtex-II Pro)
    » Also features high bandwidth interface via PCI
  – ML-401 and ML402 Boards (Virtex-4)
  – Multimedia Board (Virtex-II)
• Distributors:
  – Avnet, Insight, Nu Horizons
• Key board vendors:
  – Alphadata, Annapolis, Nallatech, Lyrtech
    » High bandwidth interfaces
• Ethernet Hardware Co-simulation
Add Your Own Board
in 20 Minutes
• Integrate your board into System Generator for hardware-in-the-loop co-simulation
• Simple wizard collects information for your target platform
Hardware Co-simulation
(1) Begin with a model that is ready to be compiled for hardware co-simulation.
(2) Open the SysGen GUI and select a compilation target.
(3) Press the Generate button in the SysGen GUI.
(4) A post-generation function is invoked to produce an FPGA configuration file.
Hardware Co-simulation
(5) The post-generation script creates a new library containing a parameterized run-time co-simulation block.
(6) The run-time co-simulation block is copied into the original model.
(7) The hardware output is bit and cycle accurate when compared to the original model.
Ethernet Hardware Co-simulation
• Networked Ethernet hardware co-simulation
  – Provides remote access over a 10/100 Mbps network
  – Communication handled by EMAC OPB peripheral
  – Reconfiguration via Ethernet connection using MicroBlaze + SystemACE
[Figure: FPGA Configuration]
WaveScope
• Debugging tool to visualize signals
  – Displays data in analog (à la Scope) and logic mode
  – Supports hex, decimal and binary radices
  – Allows cross-referencing signals in the model
Debugging the FPGA at
System Speed (1)
• Insert ChipScope Pro blocks into your Simulink design
• Configure FPGA using JTAG interface
• ChipScope probes will be inserted into the FPGA
• Perform in-system debug at system speed
Debugging the FPGA at
System Speed (2)
• New Shared Memory Interfaces
  – Allow multiple, independent resources (inside and outside Simulink) to share a common address space
• What types of resources?
  – FPGA hardware (co-simulation)
  – Simulink Blocks
  – MATLAB Console
  – Command Line Applications
• This makes for easy, in-system debugging
[Figure: Common Address Space]
Resource Estimator
• The block provides fast estimates of the FPGA resources required to implement the subsystem
• Most of the blocks in the System Generator Blockset carry resource information for:
  – LUTs
  – FFs
  – BRAM
  – Embedded multipliers
  – 3-state buffers
  – I/Os
Resource Estimator
• Three types of estimation
  – Estimate Area
    • This option computes resources for the current level and all sub-levels
  – Quick Sum
    • Uses the resources stored in the blocks directly and sums them up (no sub-level functions are invoked)
  – Post-Map Area
    • Opens a file browser and lets the user select a map report file. The design should have been generated and gone through the synthesis, translate, and mapping phases.
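The difference between Estimate Area and Quick Sum can be pictured with a toy Python model. This is a sketch under stated assumptions, not the Resource Estimator's implementation: the `Block` class, the function names and the resource numbers are all hypothetical, Quick Sum is read as adding the values already stored on the blocks at the current level, and Estimate Area is read as recomputing every sub-level and refreshing the stored values.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    stored: Counter = field(default_factory=Counter)   # resource counts kept on the block
    children: list = field(default_factory=list)       # non-empty for subsystems


def quick_sum(subsystem: Block) -> Counter:
    """Add up the counts already stored on the blocks at this level (no recursion)."""
    total = Counter()
    for child in subsystem.children:
        total.update(child.stored)
    return total


def estimate_area(block: Block) -> Counter:
    """Recompute this level and all sub-levels, refreshing stored counts on the way up."""
    if block.children:
        total = Counter()
        for child in block.children:
            total.update(estimate_area(child))
        block.stored = total
    return Counter(block.stored)


adder = Block("adder", Counter(LUTs=16, FFs=16))
delay = Block("delay", Counter(FFs=16))
mac = Block("mac", children=[adder, delay])            # nothing stored on it yet
top = Block("top", children=[mac, Block("reg", Counter(FFs=8))])

quick_before = quick_sum(top)     # only sees what is stored: the 8 FFs of "reg"
full = estimate_area(top)         # walks the hierarchy: 16 LUTs, 40 FFs
quick_after = quick_sum(top)      # stored values are now fresh, so it matches `full`
```

In this model Quick Sum is cheap but only as good as the stored values, which is one plausible reason to run a full estimate first.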
Summary
• Full VHDL/Verilog (RTL code)
  – Advantages:
    • Portability
    • Complete control of the design implementation and tradeoffs
    • Easier to debug and understand code that you own
  – Disadvantages:
    • Can be time-consuming
    • Don’t always have control over the synthesis tool
    • Need to be familiar with the algorithm and how to write it
    • Must be conversant with the synthesis tools to obtain an optimized design
Summary
• Full VHDL/Verilog (Instantiating Primitives)
  – Advantages:
    • Full access to all architecture features
    • Optimization can be carried further than with RTL code
    • Best optimization
  – Disadvantages:
    • Not as portable as RTL VHDL/Verilog
    • Must be an FPGA expert and know the architecture
    • Time-consuming
Summary
• CORE Generator
  – Advantages
    • Can quickly access and generate existing functions
    • No need to reinvent the wheel and re-design a block if it meets specifications
    • IP is optimized for the specified architecture
  – Disadvantages
    • IP doesn’t always do exactly what you are looking for
    • Need to understand signals and parameters and match them to your specification
    • You are dealing with a black box and have little information on how the function is implemented
Summary
• System Generator for DSP
  – Advantages
    • Huge productivity gains through high-level modeling
    • Ability to simulate complete designs at a system level
    • Very attractive for FPGA novices
    • Excellent capabilities for designing complex testbenches
    • HDL testbench, test vectors and golden data written automatically
    • Hardware-in-the-loop simulation improves productivity and quickly verifies whether the system functions correctly
  – Disadvantages
    • Minor cost of abstraction: doesn’t always give the best result from an area usage point of view
    • Customer may not be familiar with Simulink
    • Not well suited to multiple clock designs
    • No bi-directional bus supported
End Day 2