Using Programmable Logic to Accelerate DSP Functions


“An Overview”
Greg Goslin
Digital Signal Processing Applications Manager
Corporate Applications Group
15OCT95
Agenda

When to use FPGAs for DSP, an Overview
– What is Digital Signal Processing (DSP)?
– Where is DSP Used?
– Traditional DSP Approaches.

The Promise of Programmable Logic
– Case Study: Finite Impulse Response Filter.
– Case Study: Viterbi Decoder.

Building Fast Filters in FPGAs, a Tutorial
– Efficient Algorithms for FPGAs.
– Using Distributed Arithmetic for Filter Designs.
– How to Use an FPGA to Build Filter Designs.

Design Methodologies for DSP in FPGAs
– Design Entry and Third Party Software Tools.
What is Digital Signal Processing (DSP)?

DSP is the arithmetic processing
of digital signals sampled at
regular intervals

DSP can be reduced to three trivial
operations:
– Delay
– Add
– Multiply

[Figure label: Filter, 3 MACs]

Accumulate = Add + Delay

MAC = Multiply + Accumulate
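
The two identities above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation; the function names `accumulate` and `mac` are hypothetical.

```python
def accumulate(values):
    """Accumulate = Add + Delay: a running sum kept in a one-sample register."""
    acc = 0          # the delay element: holds the sum from the previous step
    history = []
    for v in values:
        acc = acc + v        # the add
        history.append(acc)  # value stored back into the delay register
    return history


def mac(coeffs, samples):
    """MAC = Multiply + Accumulate: multiply each pair, then sum the products."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # one multiply-accumulate step per coefficient/sample pair
    return acc
```

A processor with a single MAC unit runs the loop in `mac` one step per cycle; hardware with several MAC units can run several of those steps at once.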

The MAC is the engine behind DSP
– More MACs = Higher Performance, Better
Signal Quality
– MACs vs. MIPS, not always equal
[Figure labels: 50* MACs, 100 MACs]
Where is DSP Used?
DSP has many names and acronyms:

Filtering
– FIR
– IIR
– Viterbi

Compression – Decompression
– MPEG
– JPEG
– ADPCM

Convolution

Correlation

Modulation

[Chart: 1994 Consumption ($US MILLION), axis from $0 to $600, by segment: Communication, Computer, Instrumentation, MIL/Aero, Consumer, Industrial, Office Automation. (Source: Forward Concepts)]
Traditional DSP Approaches

Digital Signal Processor IC
– Software programmable, like a microprocessor
– Single MAC unit
– All processing done sequentially
– Fit the algorithm to the architecture
[Block diagram: ‘Traditional’ DSP Processor, with blocks labeled Analog input, ADC, MAC, Memory, Data Controller, DAC, Analog output, and Digital output]

ASIC (gate array)
– Fit the architecture to the algorithm
– Significantly higher performance than DSP processor
– High cost and high risk to develop
– Usually only for high-volume applications
The Promise of Programmable Logic
ASIC
Pros
– High performance
– Efficient IC architecture
– High density
– One chip solution
Cons
– High design risk
– Long design cycle

DSP Processor
Pros
– High flexibility
– Good adaptability
– Short design cycle
– Low design risk
Cons
– Performance
– Hardware complexity

FPGA
Best from both worlds, plus:
– System features
– Automatic migration to low cost HardWire
XC4000E Configurable Logic Blocks (CLBs)
Simplified Block Diagram

[Block diagram: XC4000E CLB. Labeled signals include inputs C1 to C4, F1 to F4, G1 to G4, H1, DIN, S/R, EC, and K (clock); function generator outputs F', G', and H'; S/R control blocks; and registered outputs XQ and YQ.]

– Look Up Tables can be defined as any 4-input function, including 16x1 SRAM.
– F' is a logic function of F1 to F4; G' is a logic function of G1 to G4; H' is a logic function of F', G', and H1.
– Muxes allow 3 independent inputs to the “H” function generator.
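
As a rough software analogy for how a look-up table can act as "any 4-input function", the sketch below stores the 16 output bits of a Boolean function and reads them back using the four inputs as an address. It is an illustrative model only; `make_lut4` and `lut_read` are hypothetical names, not part of any Xilinx tool.

```python
def make_lut4(func):
    """Precompute the 16 one-bit entries of a 4-input Boolean function."""
    table = []
    for addr in range(16):
        bits = [(addr >> i) & 1 for i in range(4)]  # bits[0] is the first input
        table.append(func(*bits) & 1)
    return table


def lut_read(table, a1, a2, a3, a4):
    """Use the four inputs as a 4-bit address into the 16x1 table."""
    addr = a1 | (a2 << 1) | (a3 << 2) | (a4 << 3)
    return table[addr]


# Two example functions held in the same kind of 16-entry table.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
```

Because the table is just 16 stored bits, the same structure can also be read as a 16x1 memory, which is the point the slide makes.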
XC4000E Dual-Port RAM

Each CLB can be configured as 16x1 dual-port, synchronous SRAM

Simultaneous read access through ADDR_F and ADDR_G

Write address, data, and control are synchronized to write clock
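[Block diagram: 16x1 dual-port RAM. ADDR_F (4 bits) is the common read/write address and ADDR_G (4 bits) is the read-only address. DIN, WE, and the write address pass through synchronization registers clocked by WCLK; a decoder enables one of the cells Bit_0 to Bit_15; two multiplexers select DOUT_F and DOUT_G.]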
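
The behavior described above can be approximated with a small behavioral model. This is a sketch under assumptions: the class name `DualPortRam16x1` and its methods are hypothetical, the write is modeled as a single call standing in for one write-clock edge, and ADDR_F is assumed to be the write address.

```python
class DualPortRam16x1:
    """Behavioral model of a 16x1 RAM with one write port and two read ports."""

    def __init__(self):
        self.bits = [0] * 16  # storage cells 0 through 15

    def clock(self, addr_f, din, we):
        """One write-clock edge: store din at addr_f when write enable is set."""
        if we:
            self.bits[addr_f & 0xF] = din & 1

    def read(self, addr_f, addr_g):
        """Simultaneous reads: returns (DOUT_F, DOUT_G)."""
        return self.bits[addr_f & 0xF], self.bits[addr_g & 0xF]
```

For example, after `ram.clock(3, 1, True)`, calling `ram.read(3, 0)` reads the freshly written cell on the F port and an untouched cell on the G port in the same access.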
XC6200 System Features Meet
Embedded Coprocessing Requirements

– 1000x improvement in reconfiguration time from external memory
– FastMAP™ assures high speed access to all internal registers
– All registers accessed via built-in low-skew FastMAP™ busses
– Partial Reconfiguration fully supported
– Microprocessor interface built-in
– High capacity distributed memory permits allocation of chip resources to logic or memory

[Block diagram: CPU, Memory, and I/O connected to the XC6200 Reconfigurable Coprocessor]
DSP Functions Are Parallel Algorithms

8-Bit, 16-Tap Finite Impulse Response (FIR) Filter
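[Block diagram: Data Input X[7:0] shifts through 16 registers (REG), forming Filter Taps 0 to 15. The taps are paired symmetrically (0 with 15, 1 with 14, and so on to 7 with 8), each pair is multiplied by one of the Filter Co-Efficients C0 to C7, and the values are accumulated into Data Output Y[9:0].]

Equation:

$$Y_j = \sum_{k=0}^{15} c_k x_k = c_0 x_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots + c_3 x_{12} + c_2 x_{13} + c_1 x_{14} + c_0 x_{15}$$

where $x_k$ is the sample held in tap register $k$ when output $Y_j$ is computed.

Symmetrical Coefficients: $c_k = c_{15-k}$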
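
One consequence of symmetrical coefficients is that mirrored taps can be added first and multiplied once, halving the number of multiplies. The sketch below compares a direct 16-tap sum with that folded form. The coefficient values are made up for illustration, and the function names are hypothetical.

```python
def fir_direct(coeffs, taps):
    """Direct form: one multiply per tap."""
    return sum(c * x for c, x in zip(coeffs, taps))


def fir_folded(half_coeffs, taps):
    """Folded form for symmetric coefficients: add mirrored taps, then multiply."""
    n = len(taps)
    return sum(half_coeffs[k] * (taps[k] + taps[n - 1 - k]) for k in range(n // 2))


# Hypothetical coefficients C0..C7; the full set mirrors them as C0..C7, C7..C0.
half = [1, -2, 3, 4, 5, -6, 7, 8]
full = half + half[::-1]
taps = list(range(16))  # example tap register contents x0..x15

direct_result = fir_direct(full, taps)
folded_result = fir_folded(half, taps)
```

Both forms give the same output, but the folded form needs 8 multiplies instead of 16.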
FPGAs Outperform
‘Traditional’ DSP Processors
8-Bit, 16-Tap FIR Filter Performance Comparisons
Performance Relative to 50 MHz Fixed-Point DSP

Implementation                          Rate       Relative performance
133 MHz Pentium™ Processor              750 kHz    0.24
Single 50 MHz DSP                       3 MHz      1.00
XC4003E-3 FPGA, SDA (68% util.)         8 MHz      2.60
Four 50 MHz DSPs                        12 MHz     4.00
XC4010E-3 FPGA, PDA (98% util.)         56 MHz     16.00
XC4013E-2 FPGA, PDA (75% util.)         66 MHz     22.00

SDA = Serial Distributed Arithmetic; PDA = Parallel Distributed Arithmetic.

[Chart annotations whose placement is not recoverable: (est.), (External Performance), MCM]
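
The DSP figures in this comparison are consistent with a simple estimate: a processor that completes one MAC per clock needs 16 clocks for a 16-tap filter. The sketch below works through that arithmetic. It assumes one MAC per cycle per DSP and no other overhead, which the slide does not state, and the function name is hypothetical.

```python
def sequential_sample_rate_hz(clock_hz, taps, macs_per_cycle=1):
    """Estimated sample rate when taps are processed sequentially."""
    return clock_hz * macs_per_cycle / taps


# One 50 MHz DSP on a 16-tap filter: 3.125 MHz, shown as 3 MHz on the chart.
single_dsp = sequential_sample_rate_hz(50e6, 16)

# Four 50 MHz DSPs sharing the taps: 12.5 MHz, shown as 12 MHz on the chart.
four_dsps = sequential_sample_rate_hz(50e6, 16, macs_per_cycle=4)

# Relative performance of the 8 MHz FPGA design against the single DSP.
fpga_relative = 8e6 / single_dsp
```

Under these assumptions the 8 MHz FPGA design comes out at about 2.56 times the single DSP, close to the 2.60 shown.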
Case Study: Viterbi Decoder
(FPGA-based DSP Co-Processor)
[Block diagram: Viterbi decoder datapath attached to the I/O Bus. Inputs Old_1, Old_2, Diff_1, and Diff_2 feed 24-bit adders and subtractors; the MSB of each subtraction controls a MUX that produces New_1 and New_2. The diagram also shows an INC block, a Prestate Buffer, and Optional Pipelining Registers.]

Relative Performance

DSP-Only (8 devices): 360 ns
– Two 66 MHz DSPs
– Six 15 ns SRAMs
– System logic

DSP + FPGA (4 devices): 135 ns
– One 66 MHz DSP
– XC4013E-3 FPGA (44%)
– Three 15 ns SRAMs

2.67 times better performance with FPGA-assisted DSP.
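
For readers unfamiliar with the operation a Viterbi decoder repeats for every state, the sketch below shows a textbook add-compare-select (ACS) step. It is not a reconstruction of the slide's exact datapath, which cannot be recovered from the diagram labels. The names `old_1`, `old_2`, `diff_1`, and `diff_2` are borrowed from those labels, and the 24-bit width and "smaller metric wins" rule are assumptions.

```python
def add_compare_select(old_1, old_2, diff_1, diff_2, width=24):
    """Textbook ACS: add branch metrics, compare by subtraction, select one."""
    mask = (1 << width) - 1
    cand_1 = (old_1 + diff_1) & mask  # add: candidate metric through path 1
    cand_2 = (old_2 + diff_2) & mask  # add: candidate metric through path 2

    # Compare: the MSB of the wrapped difference acts as a sign bit.
    difference = (cand_1 - cand_2) & mask
    msb = difference >> (width - 1)

    # Select: MSB set means cand_1 is smaller, so path 1 survives.
    if msb:
        return cand_1, 0
    return cand_2, 1
```

The function returns the surviving metric and a decision bit (0 for path 1, 1 for path 2). In hardware the two additions run side by side and the select is a multiplexer, so the whole step can complete in far fewer cycles than the same sequence on a single-MAC processor.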
What to Look for in Your DSP Application

Identify Parallel Data Paths

Find Operations that Require Multiple Clock Cycles

Processor Bottlenecks
[Comparison chart (YES/NO marks not recoverable): Flexibility, Parallel Data Paths, Scalable Bandwidth, Design Modification, Device Expansion]
When to Use FPGAs for DSP

High sample rates
– Up to 66 MHz with XC4000E-2

Short word lengths
– DA algorithm gets faster with shorter word length

Lots of filter taps
– FPGA processes all taps in parallel, faster than DSP

Low sample rates
– Integrate DSP + system logic in a low-cost DSP using serial sequential algorithm

[Chart: Data Rate (with 50 MHz system clock), axis from 0 to 50, versus Arithmetic Operations Per Sample (MACs), axis from 1 to 48. Curves for 1 DSP, 2 DSPs, 3 DSPs, and 4 DSPs (Number of DSPs) separate an FPGA Region from a DSP Region.]

Fast correlators

Single-chip solution required

HardWire gate array migration
path for high-volume designs
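
The points above about word length and taps follow from how serial distributed arithmetic (DA) works: a look-up table holds every possible sum of coefficients, and the filter consumes one bit of every sample per clock. The sketch below is a simplified model that assumes unsigned samples and a 4-tap filter so the table has 16 entries; the function names and values are hypothetical.

```python
def build_da_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] for every bit k set in addr."""
    n = len(coeffs)
    return [
        sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_fir(coeffs, samples, word_length=8):
    """Bit-serial DA: one table look-up and one shifted add per sample bit."""
    lut = build_da_lut(coeffs)
    acc = 0
    for bit in range(word_length):  # one "clock" per bit of the word
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << k  # gather this bit from every tap
        acc += lut[addr] << bit            # weight the partial sum by 2**bit
    return acc


coeffs = [3, -1, 2, 5]        # hypothetical coefficients
samples = [200, 17, 255, 4]   # hypothetical unsigned 8-bit tap contents
da_result = da_fir(coeffs, samples)
direct_result = sum(c * x for c, x in zip(coeffs, samples))
```

The loop count equals the word length, not the number of taps, so shorter words finish sooner while extra taps only widen the table address. Practical designs usually split long filters across several small tables rather than one large one.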
Information on DSP Applications

Greg Goslin
– Digital Signal and Image Processing Applications Manager

Email:
[email protected]

WEB:
http://www.xilinx.com/dsp.htm

Fax:
408-879-4442