DSP for FPGA SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.

DSP for FPGA
SYSC5603 (ELG6163) Digital Signal
Processing Microprocessors, Software
and Applications
Miodrag Bolic
Objectives
• Comparison between PDSP and FPGA
• Virtex II Pro
• Altera Stratix FPGA
• Stratix DSP Block and its configuration
• Altera design flow
What Is an FPGA?
• Field Programmable Gate Array
• Device that Has a Regular Architecture
(Set of Blocks) that Can Be Programmed
for Various Functions
• “Glue” Logic
• Customizable Hardware Solution
• Configurable Processors
Why Use FPGAs in DSP Applications?
• 10x More DSP Throughput Than
DSP Processors
– Parallel vs. Serial Architecture
• Cost-Effective for Multi-Channel
Applications
• Flexible Hardware Implementation
• Single-Chip Solution
– System (Hardware/Software) Integration Benefits
[Figure: a DSP system built from software on a DSP plus an FPGA, next to a system with software on an embedded processor inside the FPGA]
DSP Processors vs. FPGAs
High Speed DSP Processor
[Figure: a processor with a small number of MAC units]
• 1-8 Multipliers
• Needs looping for more than 8 multiplications
• Needs multiple clock cycles because of serial computation
– 200 Tap FIR Filter would need 25+ clock cycles per sample with an 8 MAC unit processor
High Level of Parallel Processing in FPGA
[Figure: an array of MAC units operating in parallel]
• Can implement hundreds of MAC functions in an FPGA
• Parallel implementation allows for faster throughput
– 200 Tap FIR Filter would need 1 clock cycle per sample
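The cycle counts quoted on the two slides above follow from dividing the number of taps by the number of MAC units available per clock cycle. The sketch below reproduces that arithmetic; the function name is illustrative, and it ignores loop and memory overhead (which is why the slide says "25+" for the processor).

```python
import math


def cycles_per_sample(taps: int, mac_units: int) -> int:
    """Minimum clock cycles to compute one FIR output sample.

    Each MAC unit performs one multiply-accumulate per cycle, so the
    taps are processed in groups of `mac_units`.
    """
    return math.ceil(taps / mac_units)


# 200-tap FIR on a DSP processor with 8 MAC units (serial looping)
dsp_cycles = cycles_per_sample(200, 8)

# 200-tap FIR on an FPGA with one MAC per tap (fully parallel)
fpga_cycles = cycles_per_sample(200, 200)

print(dsp_cycles, fpga_cycles)  # 25 1
```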
Extending Range of Altera Reconfigurable DSP Solutions
[Figure: chart of performance (MMACs/sec, axis marks at 100 and 600) for three options: Embedded Processors, Embedded Processors with Hardware Acceleration, and Complete Hardware Implementation. The chart carries a "New!" label.]
Comparison of DSP Devices
Programmable DSP Processors
Benefits
• Easy to Use
• Programmed Via C-Code or Assembly
• Fast Development Time
Weaknesses
• Fixed Architecture
• Inefficient for Highly Recursive Algorithms Unless Hardware Accelerated
• Potential Bus Bottlenecks
• Other Devices (FPGAs) Often Used on Board for Other Functions
Reconfigurable DSP
Benefits
• Easy to Use
• Programmed via C-Code, Assembly, or HDL
• Efficient for Recursive Algorithms Using DSP IP Cores
• Higher Levels of Integration
Weaknesses
• Longer Development Time (But Getting Shorter!)
Objectives
• Comparison between PDSP and FPGA
• Virtex II Pro
• Altera Stratix FPGA
• Stratix DSP Block and its configuration
• Altera design flow
Stratix EP1S10 [2]
TriMatrix™ Memory [1]
M512 Blocks (512 bits per block + parity)
• Small FIFOs
• Shift Register
• Rake Receiver Correlator
• FIR Filter Delay Line
M4K Blocks (4 Kbits per block + parity)
• Header / Cell Storage
• Channelized Functions
• ATM cell–packet processing
• Nios Program Memory
M-RAM (512 Kbits per block + parity)
• Packet / Data Storage
• Nios Program Memory
• System Cache
• Video Frame Buffers
• Echo Canceller Data Storage
Dedicated External Memory Interface
• Look-Up Schemes
• Packet & Cell Buffering
• Cache
More Bits For Larger Memory Buffering
More Data Ports for Greater Memory Bandwidth
Memory Bandwidth Summary
Stratix Device Family [1]
Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10   920,448          1              60           94            1,245,024
EP1S20   1,669,248        2              82           194           2,096,928
EP1S25   1,944,576        2              138          224           2,894,400
EP1S30   3,317,184        4              171          295           3,750,192
EP1S40   3,423,744        4              183          384           4,384,800
EP1S60   5,215,104        6              292          574           6,762,528
EP1S80   7,427,520        9              364          767           8,784,720
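The Total RAM Bits figures can be cross-checked against the block counts. The sketch below assumes that "+ parity" means one parity bit per eight data bits, giving 576 bits per M512 block, 4,608 bits per M4K block and 589,824 bits per M-RAM block; those per-block sizes are an assumption of this example, not values stated on the slide.

```python
# Assumed bits per block including parity (one parity bit per 8 data bits)
BITS_PER_BLOCK = {"M512": 576, "M4K": 4_608, "M-RAM": 589_824}

# (M-RAM blocks, M4K blocks, M512 blocks, total RAM bits from the table)
DEVICES = {
    "EP1S10": (1, 60, 94, 920_448),
    "EP1S20": (2, 82, 194, 1_669_248),
    "EP1S25": (2, 138, 224, 1_944_576),
    "EP1S30": (4, 171, 295, 3_317_184),
    "EP1S40": (4, 183, 384, 3_423_744),
    "EP1S60": (6, 292, 574, 5_215_104),
    "EP1S80": (9, 364, 767, 7_427_520),
}


def total_ram_bits(mram: int, m4k: int, m512: int) -> int:
    """Sum the capacity of all three TriMatrix block types."""
    return (mram * BITS_PER_BLOCK["M-RAM"]
            + m4k * BITS_PER_BLOCK["M4K"]
            + m512 * BITS_PER_BLOCK["M512"])


for name, (mram, m4k, m512, listed) in DEVICES.items():
    computed = total_ram_bits(mram, m4k, m512)
    status = "matches" if computed == listed else "differs from"
    print(f"{name}: computed {computed:,} {status} the listed {listed:,}")
```

Under these assumed block sizes, the computed totals agree with the table for every device.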
Logic Element (LE) [2]
[Figure: functional diagram of a logic element. Inputs: data1, data2, data3, data4, cin, addnsub, LUT chain input, register chain input and register control signals. Blocks: 4-input LUT, sync load & clear logic, and a register (D/DATA) with register feedback. Outputs: row, column & DirectLink routing, local routing, LUT chain output and register chain output.]
Notes:
1) Functional diagram only. Please see datasheet for more details.
2) addnsub and data1 are connected via XOR logic.
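The 4-input LUT at the heart of the logic element can be thought of as a 16-entry truth table addressed by the four data inputs. The following is a minimal software model of that idea, not a description of the Stratix configuration format; the helper names and the bit ordering (data1 as the least significant address bit) are assumptions made for illustration.

```python
def make_lut4(func) -> int:
    """Build a 16-bit LUT mask from a Boolean function of four inputs."""
    mask = 0
    for index in range(16):
        # Bit i of the index is the value of input data(i+1)
        bits = [(index >> i) & 1 for i in range(4)]
        if func(*bits):
            mask |= 1 << index
    return mask


def lut4(mask: int, data1: int, data2: int, data3: int, data4: int) -> int:
    """Look up the output bit selected by the four data inputs."""
    index = data1 | (data2 << 1) | (data3 << 2) | (data4 << 3)
    return (mask >> index) & 1


# Any 4-input function fits in one LUT, for example a 4-input XOR or AND
xor_mask = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and_mask = make_lut4(lambda a, b, c, d: a & b & c & d)

print(hex(xor_mask), hex(and_mask))  # 0x6996 0x8000
print(lut4(xor_mask, 1, 0, 1, 1))    # 1
```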
Dynamic Arithmetic Mode
[Figure: functional diagram of a logic element in dynamic arithmetic mode. Inputs: data1, data2, data3, addnsub, LAB carry-in, carry-in0, carry-in1, register chain input and register control signals. Blocks: carry-in logic, sum calculator, carry calculator, carry-out logic, sync load & clear logic, and a register (D/DATA). Outputs: row, column & DirectLink routing, local routing, register chain output, carry-out0 and carry-out1.]
Note: Functional diagram only. Please see datasheet for more details.
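The addnsub signal lets the same carry chain add or subtract: subtraction is addition of the bitwise-inverted operand with a carry-in of 1. Below is a bit-level sketch of that technique with one loop iteration standing in for one logic element. The polarity (1 = add, 0 = subtract) and the choice of which operand is inverted are assumptions of this example; consult the datasheet for the actual wiring.

```python
def add_sub(a: int, b: int, width: int, addnsub: int) -> int:
    """Ripple-carry add (addnsub=1) or subtract (addnsub=0), modulo 2**width."""
    invert = 0 if addnsub else 1
    carry = invert          # subtract: carry-in of 1 completes two's complement
    result = 0
    for i in range(width):  # one iteration models one logic element
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ invert           # XOR with the control signal
        sum_bit = a_bit ^ b_bit ^ carry           # sum calculator
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))  # carry calculator
        result |= sum_bit << i
    return result


print(add_sub(5, 3, 8, addnsub=1))  # 8
print(add_sub(5, 3, 8, addnsub=0))  # 2
print(add_sub(3, 5, 8, addnsub=0))  # 254, i.e. -2 in 8-bit two's complement
```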
Logic Array Blocks (LAB) [2]
• 10 LEs
• Local Interconnect
• LAB-Wide Control Signals
[Figure: a LAB containing LE1 through LE10. The local interconnect is driven by 30 LAB input lines and 10 LE feedback lines and supplies 4 inputs to each LE; control signals are shared across the LAB.]
Avalon Switch Fabric Contents
• Avalon Switch Fabric provides the following to the peripherals it connects
– Data-Path Multiplexing
– Address Decoding
– Wait-State Generation
– Dynamic Bus Sizing
– Interrupt-Priority Assignment
– Latent Transfer Capabilities
– Streaming Read and Write Capabilities
• Avalon Switch Fabric tailors transactions to the characteristics of the peripherals that are attached
SOPC Design Example
• Allows masters and slaves to communicate without knowledge of each other's interface details
[Figure: block diagram of an example system.
Masters: 32-bit CPU (instruction master and data master); DMA controller with streaming (control port as slave, streaming read port and streaming write port as masters).
Avalon Switch Fabric: data-path multiplexing and slave arbitration.
Slaves: instruction memory (32-bit data path), data memory (32-bit data path), UART, VGA controller, and an Avalon tri-state bridge to external FLASH (1 MB, 16-bit data path) and external SRAM (256 KB, 32-bit data path).]
[Figure: the SOPC design example (CPU, DMA controller with streaming, instruction and data memories, UART, Avalon tri-state bridge with external FLASH and SRAM, VGA controller) with three functions of the Avalon Switch Fabric marked:]
1. Data-Path Multiplexing (MUX)
2. Slave Arbitration (Arbiter)
3. Address Decoding
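Two of the switch-fabric functions marked in the diagram, address decoding and slave arbitration, can be sketched in a few lines of software. This is only a behavioural illustration: the base addresses, the on-chip memory sizes and the round-robin policy are hypothetical choices made for the example, and only the 1 MB FLASH and 256 KB SRAM sizes come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slave:
    name: str
    base: int   # hypothetical base address
    span: int   # size in bytes


# Hypothetical address map; only the FLASH and SRAM sizes are from the slide
SLAVES = [
    Slave("instruction_memory", 0x0000_0000, 64 * 1024),
    Slave("data_memory",        0x0001_0000, 64 * 1024),
    Slave("ext_flash",          0x0010_0000, 1024 * 1024),
    Slave("ext_sram",           0x0020_0000, 256 * 1024),
]


def decode(address: int):
    """Address decoding: select the slave whose range contains the address."""
    for slave in SLAVES:
        if slave.base <= address < slave.base + slave.span:
            return slave
    return None


class RoundRobinArbiter:
    """Slave arbitration: grant one requesting master at a time, in rotation."""

    def __init__(self, masters):
        self.masters = list(masters)
        self.next_index = 0

    def grant(self, requesting):
        count = len(self.masters)
        for offset in range(count):
            i = (self.next_index + offset) % count
            if self.masters[i] in requesting:
                self.next_index = (i + 1) % count
                return self.masters[i]
        return None


arbiter = RoundRobinArbiter(["cpu_data", "dma_read", "dma_write"])
print(decode(0x0010_0004).name)                # ext_flash
print(arbiter.grant({"cpu_data", "dma_write"}))  # cpu_data
print(arbiter.grant({"cpu_data", "dma_write"}))  # dma_write
```

Each slave that is shared by several masters would have its own arbiter, so masters that target different slaves do not wait on each other.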
Objectives
• Comparison between PDSP and FPGA
• Virtex II Pro
• Altera Stratix FPGA
• Stratix DSP Block and its configuration
• Altera design flow
DSP Blocks
Each DSP block can be configured as one of:
• Eight 9 × 9 bit multipliers
• Four 18 × 18 bit multipliers
• One 36 × 36 bit multiplier
DSP Blocks (cont.)
The DSP block consists of
• A multiplier block
• An adder/subtractor/accumulator block
• A summation block
• An output interface
• Output registers
• Routing and control signals
Stratix DSP Blocks
• High Performance Dedicated Multiplier Circuitry
– 18x18 Functions at 280 MHz
– Signed & Unsigned Operations
• Add, Accumulate or Subtract
– Dynamically Change between Add & Subtract
– Supports DSP Requirements Including Complex Numbers
• Variable Operand Widths with Full Precision Outputs
– 9x9 (8 Max.)
– 18x18 (4 Max.)
– 36x36 (1 Max.)
[Figure: DSP block data path: input register unit, multipliers with optional pipelining, add/subtract/accumulate units, summation unit, output multiplexer and output register unit]
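The combination of four 18x18 multipliers with add and subtract units is what makes a complex multiplication fit in one block: (a + jb)(c + jd) = (ac - bd) + j(ad + bc). The sketch below models that arithmetic with Python integers and checks the full-precision widths (36-bit products, 37-bit sums). It is a numerical illustration under those width assumptions, not a model of the actual block wiring, and the function names are made up for the example.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's complement number."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def complex_multiply_18x18(a_re: int, a_im: int, b_re: int, b_im: int):
    """Complex product using four 18x18 multiplies, one subtract and one add."""
    for operand in (a_re, a_im, b_re, b_im):
        if not fits_signed(operand, 18):
            raise ValueError("operands must be signed 18-bit values")

    # Four multipliers, each producing a full-precision 36-bit product
    p_rr = a_re * b_re
    p_ii = a_im * b_im
    p_ri = a_re * b_im
    p_ir = a_im * b_re
    assert all(fits_signed(p, 36) for p in (p_rr, p_ii, p_ri, p_ir))

    real = p_rr - p_ii   # adder unit in subtract mode
    imag = p_ri + p_ir   # adder unit in add mode
    assert fits_signed(real, 37) and fits_signed(imag, 37)
    return real, imag


print(complex_multiply_18x18(3, 4, 5, -6))  # (39, 2)
```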
DSP Block for 18 x 18-bit Mode
Shift Register Chain
Adder/Output Block
Time-Domain Multiplexed FIR Filters
Operation of TDM Filter
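The slides above name the technique without spelling it out, so the following is a generic sketch of time-domain multiplexing rather than the Stratix implementation: a small number of multipliers is reused over several clock cycles per input sample, trading throughput for resources. The function name and the cycle-counting model are assumptions of this example.

```python
def tdm_fir(samples, coeffs, multipliers):
    """FIR filter that reuses `multipliers` multipliers over several cycles.

    Returns the output samples and the total number of clock cycles used.
    """
    taps = len(coeffs)
    delay_line = [0] * taps
    outputs = []
    cycles = 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        acc = 0
        for start in range(0, taps, multipliers):
            # One clock cycle: up to `multipliers` products computed in parallel
            group_c = coeffs[start:start + multipliers]
            group_d = delay_line[start:start + multipliers]
            acc += sum(c * d for c, d in zip(group_c, group_d))
            cycles += 1
        outputs.append(acc)
    return outputs, cycles


# A 4-tap moving sum with 2 shared multipliers needs 2 cycles per sample
print(tdm_fir([1, 2, 3, 4], [1, 1, 1, 1], multipliers=2))
# ([1, 3, 6, 10], 8)

# With one multiplier per tap the same filter needs 1 cycle per sample
print(tdm_fir([1, 2, 3, 4], [1, 1, 1, 1], multipliers=4))
# ([1, 3, 6, 10], 4)
```

The outputs are identical in both cases; only the number of clock cycles per sample changes with the number of multipliers.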
Resource Savings with DSP Blocks
• DSP Block
– Reduces LE Usage
– Reduces Routing Congestion
– Reduces Power
– Maintains Performance
• 90% of your problems are hidden under the surface!
• SAVES 652 ROUTING NETS!
[Figure: 18-bit operands feeding multipliers (X) with 36-bit products, followed by adders (+) producing a 38-bit result]
Design Flow
Design Flow Overview
1) Create Design in Simulink Using Altera Libraries
2) Simulate in Simulink
3) Add SignalCompiler to Model
4) Create HDL Code & Generate Testbench
5) Perform RTL Simulation
6) Synthesize HDL Code & Place & Route
7) Program Device
8) SignalTap II Logic Analyzer
Step 1- Create Design in Simulink
Using Altera Libraries
• Drag & Drop Library Blocks into
Simulink Design & Parameterize
Each Block
Parameterization of IP Megacores
Step 2 - Simulate in Simulink
Step 3 - Add “Signal Compiler” to Model to Generate HDL code
• Device family
– APEX20K/E/C
– APEX II
– Stratix & Stratix GX
– Cyclone & ACEX 1K
– Mercury
– FLEX10K & FLEX 6000
– DSP Boards
• Synthesis tool
– Leonardo Spectrum
– Synplify
– Quartus II
• Speed vs. Area
• Testbench Generation
• Message Window
Step 4 - Create HDL Code & Generate Testbench
• HDL Code Generation: AltrFir32.mdl → AltrFir32.vhd
• Enable "Generate Stimuli for VHDL Testbench" Button
• DSP Builder Report File Lists All Converted Blocks
– Port Widths
– Sampling Frequencies
– Warnings & Messages
Step 5 – Perform RTL Simulation (ModelSim)
1) Set working directory (File => Change Directory)
2) Run TCL file (Tools => Execute Macro)
Perform Verification: ModelSim vs Simulink
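Verifying ModelSim against Simulink amounts to checking that the RTL output matches the model output sample by sample once any pipeline latency is accounted for. The sketch below assumes both outputs have already been exported as lists of integers; the function name, the export step and the latency parameter are hypothetical and not part of the tool flow described on the slides.

```python
def compare_outputs(simulink, rtl, latency=0):
    """Return (index, simulink_value, rtl_value) for every mismatching sample.

    `latency` is the number of leading RTL samples to skip so that the
    two sequences line up.
    """
    aligned = rtl[latency:]
    length = min(len(simulink), len(aligned))
    return [
        (i, simulink[i], aligned[i])
        for i in range(length)
        if simulink[i] != aligned[i]
    ]


# RTL output delayed by two clock cycles but otherwise identical
print(compare_outputs([1, 2, 3], [0, 0, 1, 2, 3], latency=2))  # []

# One mismatching sample at index 1
print(compare_outputs([1, 2, 3], [1, 5, 3]))  # [(1, 2, 5)]
```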
Step 6 - Synthesize HDL & Place & Route
• Leonardo Spectrum
• Synplify
• Quartus II
– Synthesis
– Quartus II Fitter
Step 7 – Program Device
• Download Design to DSP Development Kits
Stratix DSP Development Board
[Figure: board photo with the following parts labelled]
• Nios Expansion Prototype Connector
• MAX 7000 Device
• Prototyping Area
• D/A Converters
• Mictor-Type Connectors for HP Logic Analyzers
• A/D Converters
• Analog SMA Connectors
• Texas Instruments Connectors on Underside of Board
• 40-Pin Connectors for Analog Devices
Stratix DSP Board – Key Features
• Stratix EP1S25F780C5 Device (Starter Version)
• Stratix EP1S80B956C7 Device (Professional
Version)
• Analog I/O
– Two 12-bit, 125 MHz A/D Converters
– Two 14-bit, 165 MHz D/A Converters
• Digital I/O
– Two 40-pin Connectors for Analog Devices A/D Converter
Evaluation Boards
– Connector for TI TMS320 Cross-Platform Daughter Card
– 3.3V Expansion/Prototype Headers
– RS-232 Serial Port
• Memory
– 2 Mbytes of 7.5-ns Synchronous SRAM
– 32 Mbytes of FLASH
Step 8 - SignalTap II Logic Analyzer
• Embedded Logic
Analyzer
– Downloads into Device
with Design
– Captures State of
Internal Nodes
– Uses JTAG for
Communication
SignalTap II Logic Analyzer
• Analysis of Imported Data
[Figure: screenshots showing the imported data and a plot of the imported data]