Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic Outline • Introduction to the course • Computer architectures for signal processing • Design.

Introduction
SYSC5603 (ELG6163) Digital Signal Processing
Microprocessors, Software and Applications
Miodrag Bolic
Outline
• Introduction to the course
• Computer architectures for signal processing
• Design cycle
Course Outline
Hardware
• DSP Systems, A/D and D/A
converters
• Architectural Analysis of a DSP
Device, TMS320C6x,
TigerSharc, Blackfin
• FPGA for signal processing
(Altera, Xilinx),
• Application domain specific
instruction set processors
• SoC, DSP Multiprocessors
• Signal processing arithmetic
units
Algorithm design and
transformations
• Scheduling, Resource
Allocation, Synthesis
• Finite-word length effects
• Algorithmic transformations
• FIR filter design
• FFT design
• IIR filter design
• Adaptive filter design
Course Conduct
• Course notes will be posted on the course web page
• Assignments with solutions will be provided and will not
be graded
• There is no textbook
• The exam will be prepared based on lecture slides,
references and assignments
Paper Analysis and Presentation
• Topics are related to the studied material
• Each student will present for 15 minutes
• Discussion will follow after the presentation
• Each student has to choose one topic before January 16th at 7 pm.
• Each student has to send a document (8-10 pages, 12-point font, single-spaced) three days before the presentation.
• The document has to be revised after my comments
• 15 presentation slides max (10 minutes, 15 minutes max)
• The mark is 50% document, 50% presentation
• Some preliminary time schedule is given on the course web page.
This time schedule will be updated on January 16th
• Your reports will be posted on the course Web page. Please see the
paper on plagiarism: How to Handle Plagiarism: New Guidelines
Presentation topics- Computer architectures
• Configurable processors for DSP applications
– Analysis of processors with configurable instruction sets. Analysis of the tools. Include Tensilica, Altera and CoWare solutions (LISATek). An example of existing designs using configurable processors.
• Multiprocessors for DSP
– Analysis of papers including [Kumar05] and [Wiangtong05]. Analysis of current hardware solutions. Analysis of tools including CMPWARE. An example of existing designs using multiprocessors.
• IP core design
– Current standards related to IP core design. Standard buses used for IP cores. Advantages and disadvantages of hard and soft IP cores. DSP processor cores. DSP hardware cores.
Presentation topics- Tools
• Design space exploration tools
– Analysis of the tools for design space exploration. Simulink-based tools (AccelChip) vs. C-based tools (CoWare). Performance and differences.
• Direct mapping from algorithms to hardware
– Analysis of different tools (Simulink, Synopsys System Studio, CoWare's SPW 5-XP) and design processes used for automated implementation of signal processing algorithms on FPGAs. Analysis of the quality and speed of these automated implementations.
• Comparison between Handel-C, SpecC and SystemC
– What are the main differences between these languages? Which language should be chosen for which application? Which of these languages have full support from algorithm design to implementation (for example, the Synopsys SystemC solution)?
• Tools for the analysis of the optimal word length
– Analyze the tools for floating-point to fixed-point conversion. Compare solutions from MathWorks, Synopsys and AccelChip.
• TI standard for writing algorithms - eXpressDSP Algorithm
Presentation topics - Applications
• Software-defined radio
– Analysis of signal processing algorithms used for software defined
radios. Computer architectures for software defined radios. List of
commercial platforms and development tools.
• Signal processing for wireless sensor networks
– Analysis of signal processing algorithms used for wireless sensor
networks: positioning, tracking, data fusion, sensor processing. Analysis
of DSP architectures used in sensor networks. Specifics of algorithm
designs for wireless sensor networks.
• Tracking applications
– Detailed analysis of different tracking and navigation applications, including aircraft positioning, target tracking for radar and sonar applications, car collision detection, and positioning and tracking in homeland security applications. Define the requirements for each application, such as sampling rate, accuracy, latency and range. Discuss the algorithms and the hardware platforms used for each application.
Project
• Project proposals are expected by February 6th.
• Deadline for project demonstration: March 31
• Deadline for project report: March 27
• Grade: 20% Project Proposal, 20% Project Report, 20% Project Presentation, 40% Demonstration
• You propose the algorithm and the application
• Two defined projects
– Float-to-fixed point analysis and implementation of particle filters (Simulink or Synopsys System Studio) using FPGA
– Comparison of different implementations of atan function using PDSP and FPGA platforms (VHDL)
• Project platforms and tools:
1. Implementing signal processing algorithms using configurable processors with DSP blocks (Tensilica and NIOS II*)
2. The analysis of VLIW architectures and simulators for signal processing (Hardware design)
3. System level design using Simulink & Altera's DSP Builder*
4. System level design using SystemC under Synopsys System Studio
5. Multiprocessing using CMPWARE (Java, NIOS II)
* There might be a license problem.
Project topics
• Implementations of different algorithms on the same platform for the purpose of comparing the algorithms
Examples:
– Implementation of a multimedia signal processing algorithm on programmable DSP chips (TI TMS 32060) using algorithm transformation techniques, compared with existing implementations. It is required to discuss the VLIW instruction architecture and demonstrate how algorithm transformation/mapping techniques are used to generate the code.
– Comparison of different implementations of atan function using PDSP and FPGA platforms (VHDL).
• Implementation of a DSP algorithm on new platforms
Examples:
– Comparison of performance of Kalman filter implementations on configurable processors
– Development of a parallel Kalman filtering algorithm suitable for multiprocessor implementation.
• Implementation of complex algorithms on FPGAs
– This requires the full implementation cycle, from implementing the algorithms in Matlab/Simulink to their implementation on the FPGA. The mapping between the algorithms and the hardware has to be performed. Floating-point to fixed-point analysis has to be performed.
Project report
Proposal: The purposes of writing a project proposal are: (i) to determine the topic, (ii) to show that a preliminary study of the subject material has been done, (iii) to assess the likelihood of success of the project, and (iv) to give the plan for carrying out the project. You should submit a three- to five-page proposal to the instructor for approval of the project. A face-to-face discussion lasting 5-10 minutes between the instructor and the student is required. This discussion should take place during one of the instructor's office hours. At the end of this discussion, the instructor will either approve the proposal and assign a grade, or reject the proposal and let the team know the reason. In the latter case, the team must come up with a revised proposal or an alternate new proposal before a deadline specified in the course outline. Preliminary discussions with the instructor can also be held in advance during office hours. However, the opinions expressed by the teaching staff during these preliminary discussions are only suggestions. The team members are responsible for using their best judgement to prepare the proposal for approval.
The format of the proposal is as follows:
• Title of the project
• Project highlight -- explain what you want to do in this project.
• Motivation -- explain the significance of the proposed project and the relevance of the project to this course.
• Prior art -- list at least three previous works (papers, books, etc.) that report work most closely related to the current project. Briefly review their approaches, advantages and shortcomings.
• Approach -- outline the proposed approaches, including preliminary analytical results or an implementation prototype as appropriate, a schedule of tasks to be performed, etc.
• Expected results -- what can be promised in the final project report that is not part of the proposal.
• Task planning -- specify when you will do what.
Report: A typewritten hard-copy project report, as well as an electronic version (including source code and design files developed), is to be submitted at the end of the semester. The length of the report is not restricted. However, the report must include the following sections:
• Introduction: motivation and background.
• Main body of the report. Depending on the type of project, this part may include the method used, approaches taken, problem description, etc.
• Conclusion and discussion: highlight your achievements in this project and things that may be done in the future.
More details about the project will follow
Copied from http://homepages.cae.wisc.edu/~ece734/project/index.html
Course Objectives … To
• Understand tradeoffs in implementing DSP algorithms
• Know basic DSP architectures
• Know some reduced complexity strategies for algorithms
mainly on FPGA.
• Know about commercial DSP solutions
• Know and understand system-level design tools
• Understand research topics related to algorithmic
modifications and algorithm-architecture matching
Why this course?
There is a demand to derive more information per signal.
“More” means
• Faster: Derive more information per unit time;
– Faster hardware
– Newer algorithms with fewer operations
• Cheaper: Derive information at a reduced cost in
processor size, weight, power consumption, or dollars;
• Better: Derive higher quality information, (higher
precision, finer resolution, higher signal-to-noise ratio)
[Richards04]
Hardware and software elements
Progress in signal processing capability is the product of
progress in IC devices, architectures, algorithms and
mathematics.
[Richards04]
Moore’s Law
Predicts doubling of circuit density every 1.5 to 2 years.
http://www.icknowledge.com/trends/uproc.html
What is Signal Processing?
• Ways to manipulate
signal in its original
medium or an abstract
representation.
• Signal can be abstracted
as functions of time or
spatial coordinates.
• Types of processing:
– Transformation
– Filtering
– Detection
– Estimation
– Recognition and classification
– Coding (compression)
– Synthesis and reproduction
– Recording, archiving
– Analyzing, modeling
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Digital Signal Processing
• Signals generated via physical phenomena are analog in that
– Their amplitudes are
defined over the range of
real/complex numbers
– Their domains are
continuous in time or
space.
• Digital signal processing
concerns processing
signals using digital
computers.
– A continuous time/space
signal must be sampled to
yield countable signal
samples.
– The real- (or complex-) valued samples must be quantized to fit into the internal word length.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
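The two steps named on this slide, sampling and quantization, can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the function name `sample_and_quantize`, the 1 kHz test tone, the 8 kHz sampling rate and the 8-bit word length are all assumed for the example.

```python
import math

def sample_and_quantize(signal, fs, duration, bits, full_scale=1.0):
    """Sample a continuous-time function at fs Hz and quantize each
    sample to a signed integer code of the given word length."""
    levels = 2 ** (bits - 1)            # 128 codes per polarity for 8 bits
    step = full_scale / levels          # quantization step size
    n_samples = int(round(fs * duration))
    codes = []
    for n in range(n_samples):
        x = signal(n / fs)              # sampling: evaluate at t = n / fs
        code = round(x / step)          # quantization: nearest integer code
        code = max(-levels, min(levels - 1, code))  # clip to the word length
        codes.append(code)
    return codes, step

# Assumed example: one period of a 1 kHz sine sampled at 8 kHz, 8-bit words.
codes, step = sample_and_quantize(
    lambda t: math.sin(2 * math.pi * 1000 * t), fs=8000, duration=0.001, bits=8
)
print(codes, step)
```

The positive peak is clipped to 127 because a signed 8-bit word cannot represent +128, which is a small instance of the finite word-length effects covered later in the course.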
Signal Processing Systems
[Block diagram: A/D → Digital Signal Processing → D/A]
The task of digital signal processing (DSP) is to process sampled
signals (from A/D analog to digital converter), and provide its output
to the D/A (digital to analog converter) to be transformed back to
physical signals.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Stratix DSP Development Board
[Board photo with labeled components: Nios expansion prototype connector; MAX 7000 device; prototyping area; D/A converters; Mictor-type connectors for HP logic analyzers; A/D converters; analog SMA connectors; Texas Instruments connectors on underside of board; 40-pin connectors for Analog Devices]
[AlteraDSP]
Example DSP Applications….

• COMMUNICATIONS: Echo Cancellation; Digital PBXs; Line Repeaters; Modems; Global Positioning; Sound/Modem/Fax Cards; Cellular Phones; Speaker Phones; Video Conferencing; ATMs
• VOICE/SPEECH: Speech Recognition; Speech Processing/Vocoding; Speech Enhancement; Text-to-Speech; Voice Mail
• PRO-AUDIO: AV Editing; Digital Mixers; Home Theater; Pro Audio
• CONSUMER: Radar Detectors; Power Tools; Digital Audio / TV; Music Synthesizers; Toys / Games; Answering Machines; Digital Speakers
• INSTRUMENTATION: Spectrum Analyzers; Seismic Processors; Digital Oscilloscopes; Mass Spectrometers
• INDUSTRIAL/CONTROL: Robotics; Numeric Control; Power Line Monitors; Motor/Servo Control
• MEDICAL: Patient Monitoring; Ultrasound Equipment; Diagnostic Tools; Fetal Monitors; Life Support Systems; Image Enhancement
• MILITARY: Secure Communications; Sonar Processing; Image Processing; Radar Processing; Navigation, Guidance
www.analog.com/dsp
Implementation of DSP Systems
• Platforms:
– Native signal processing
(NSP) with general purpose
processors (GPP)
• Multimedia extension
(MMX) instructions
– Programmable digital signal
processors (PDSP)
– Application-Specific
Integrated Circuits (ASIC)
– Field-programmable gate
array (FPGA)
• Requirements:
– Real time
• Processing must be
done before a prespecified deadline.
– Streamed numerical
data
• Sequential processing
• Fast arithmetic
processing
– High throughput
• Fast data input/output
• Fast manipulation of
data
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
How Fast is Enough for DSP?
• Real time requirements:
– Example: data capture
speed must match
sampling rate. Otherwise,
data will be lost.
– Processing must be done
by a specific deadline.
• Different throughput rates
for processing different
signals
– Throughput sampling
rate.
– CD music: 44.1 kHz
– Speech: 8-22 kHz
– Video (depends on frame
rate, frame size, etc.) range
from 100s kHz to MHz.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
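The real-time requirement above can be turned into a per-sample cycle budget: a processor must finish all work on one sample before the next one arrives. The sketch below is an illustration under assumed numbers. The 300 MHz clock and the function names are hypothetical; the sampling rates are the ones quoted on the slide.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Whole clock cycles available between two consecutive samples."""
    return clock_hz // sample_rate_hz

def meets_deadline(clock_hz, sample_rate_hz, cycles_needed):
    """True if the per-sample workload fits into the cycle budget."""
    return cycles_needed <= cycles_per_sample(clock_hz, sample_rate_hz)

CLOCK_HZ = 300_000_000  # assumed processor clock, for illustration only

print(cycles_per_sample(CLOCK_HZ, 44_100))  # CD music  -> 6802
print(cycles_per_sample(CLOCK_HZ, 8_000))   # speech    -> 37500
```

The higher the sampling rate, the smaller the budget, which is why the same algorithm may run comfortably for speech and miss its deadline for video.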
ASIC: Application Specific ICs
• Custom or semi-custom
IC chip or chip sets
developed for specific
functions.
• Suitable for high volume,
low cost productions.
• Example: MPEG codec,
3D graphic chip, etc.
• ASIC becomes popular
due to availability of IC
foundry services. Fabless design houses turn
innovative design into
profitable chip sets using
CAD tools.
• Design automation is a
key enabling technology
to facilitate fast design
cycle and shorter time to
market delay.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Programmable Digital Signal Processors (PDSPs)
• Micro-processors designed
for signal processing
applications.
• Special hardware support
for:
– Multiply-and-Accumulate
(MAC) ops
– Saturation arithmetic ops
– Zero-overhead loop ops
– Dedicated data I/O ports
– Complex address
calculation and memory
access
– Real time clock and other
embedded processing
supports.
• PDSPs were developed to fill a
market segment between GPP
and ASIC:
– GPP flexible, but slow
– ASIC fast, but inflexible
• As VLSI technology improves,
role of PDSP changed over
time.
– Cost: design, sales,
maintenance/upgrade
– Performance
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
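Two of the hardware features listed above, multiply-and-accumulate and saturation arithmetic, can be modelled in software to see what they buy. The following is a minimal sketch, assuming 16-bit Q15 operands and an accumulator wide enough never to overflow; the helper names `saturate16`, `wrap16` and `mac_q15` are made up for this example and do not correspond to any vendor API.

```python
Q15_MAX = 2**15 - 1   # 32767
Q15_MIN = -2**15      # -32768

def saturate16(v):
    """Clamp an integer to the signed 16-bit range (saturation arithmetic)."""
    return max(Q15_MIN, min(Q15_MAX, v))

def wrap16(v):
    """Two's-complement wraparound, i.e. what a plain 16-bit adder does."""
    return (v + 2**15) % 2**16 - 2**15

def mac_q15(a, x):
    """Dot product of Q15 vectors: one MAC per tap into a wide accumulator,
    then a single conversion back to Q15 with saturation."""
    acc = 0
    for ak, xk in zip(a, x):
        acc += ak * xk              # Q15 * Q15 gives a Q30 product
    return saturate16(acc >> 15)    # Q30 -> Q15

big = 30000 + 10000
print(saturate16(big))  # 32767: clipped at full scale
print(wrap16(big))      # -25536: sign flips on overflow
print(mac_q15([16384, 16384], [32767, 32767]))  # 32767
```

Saturation turns an overflow into clipping at full scale, whereas wraparound flips the sign of the result, which is usually far more damaging to the output signal.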
[Seshan98]
PDSP Market – By Company
[Figure: two pie charts, "2002 Market Share" and "2001 Market Share", with slices for Texas Instruments, Motorola, Agere, Analog Devices and Other. Slice labels as extracted: 20%, 24%, 40%, 8%, 43%, 9%, 14%, 16%, 12%, 14%.]
Ref: Forward Concepts
http://www.fwdconcepts.com/Pages/press42.htm
DSP Market – By Application
[Figure: pie chart, "Market Share - 2003", with slices for Wireless, Consumer, Multipurpose, Wireline, Computer and Automotive. Slice labels as extracted: 6%, 4%, 3%, 8%, 11%, 68%.]
Ref: Forward Concepts
http://www.fwdconcepts.com/Pages/press42.htm
Computing using FPGA
• FPGA (Field programmable
gate array) is a derivative of
PLD (programmable logic
devices).
• They are hardware
configurable to behave
differently for different
configurations.
• Slower than ASIC, but faster
than PDSP.
• Once configured, it behaves
like an ASIC module.
• Use of FPGA
– Rapid prototyping: run at a fraction of ASIC speed without fab delay.
– Hardware accelerator:
using the same hardware
to realize different function
modules to save hardware
– Low quantity system
deployment
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Stratix EP1S10
Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
IP Cores
• Processor cores
StarCore
– 16-bit fixed-point VLIW DSP core from Lucent/Motorola (Lucent established a company called “Agere” for its DSP section)
– First VLIW machine to target low-power applications
– Pipeline relatively simple
– Targeting 198 mW @ 300 MHz, 1.5 V
• Hardware cores
Altera DSP cores
– FIR Compiler
– IIR Compiler
– FFT/IFFT Compiler
– NCO Compiler
– Reed-Solomon Compiler
– Constellation Mapper/Demapper
– Viterbi Compiler
SoC (System-on-Chip)
• With the continuing scaling of
modern IC devices, it is now
possible to incorporate
– Micro-processor cores + ASIC
function blocks
– Analog + digital components
– Computation + communication
functions
– I/O, memory + processor
into the same chip to form a comprehensive “system”. Thus, the notion of System-on-Chip (SoC).
• SoC uses intellectual properties (IPs), which are pre-designed modules.
• Designing an SoC thus becomes a task of system integration.
• Challenging issues in SoC design:
– Interfaces among IPs from different vendors
– Verification of function
– Physical design challenges
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Design Issues
• Given a DSP application,
which implementation
option should be chosen?
• For a particular
implementation option,
how to achieve optimal
design? Optimal in terms
of what criteria?
• Software design:
– NSP, PDSP
– Algorithms are implemented
as programs.
• Hardware design:
– ASIC, FPGA
– Algorithms are directly
implemented in hardware
modules.
• S/H Co-design: System level
design methodology.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Design Process Model
• Design is the process
that links algorithm to
implementation
• Algorithm
– Operations
– Dependency between
operations determines a
partial ordering of
execution
– Can be specified as a
dependence graph
• Implementation
– Assignment: Each
operation can be realized
with
• One or more instructions
(software)
• One or more function
modules (hardware)
– Scheduling: Dependence relations and resource constraints lead to a schedule.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
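The idea that dependencies impose a partial ordering of execution can be made concrete with an as-soon-as-possible (ASAP) schedule, which places each operation in the earliest time step allowed by its predecessors. The sketch below assumes unit latency and unlimited hardware. The six-node graph (three multiplications feeding a chain of three additions) and the name `asap_schedule` are hypothetical and chosen only for illustration.

```python
def asap_schedule(deps):
    """Map each operation to its earliest time step.

    deps: dict mapping an operation name to the list of operations it
    depends on. Every operation is assumed to take one time step.
    """
    step = {}

    def visit(op):
        if op not in step:
            # One step after the latest predecessor; step 0 if none.
            step[op] = 1 + max((visit(p) for p in deps[op]), default=-1)
        return step[op]

    for op in deps:
        visit(op)
    return step

# Hypothetical dependence graph: m1..m3 are multiplications,
# a1..a3 a chain of additions that consume the products.
deps = {
    "m1": [], "m2": [], "m3": [],
    "a1": ["m1"], "a2": ["a1", "m2"], "a3": ["a2", "m3"],
}
schedule = asap_schedule(deps)
print(schedule)
```

The multiplications are independent and can all run in step 0, while the additions must run one after another because each depends on the previous sum.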
A Design Example …
Consider the algorithm:
$$ y = \sum_{k=1}^{n} a(k)\,x(k) $$
• Operations:
– Multiplication
– Addition
• Dependency
– y(k) depends on y(k-1)
• Program:
y(0) = 0
For k = 1 to n Do
    y(k) = y(k-1) + a(k)*x(k)
End
y = y(n)
• Dependence graph: [Figure: each pair a(k), x(k) feeds a multiplier (*); the products feed a chain of adders (+) that starts at y(0) and ends at y(n).]
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
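The pseudocode program on this slide translates almost line for line into runnable Python. This is a sketch for illustration: the function name `inner_product` is an assumption, and the running sum `y` plays the role of y(k) in the recurrence y(k) = y(k-1) + a(k)*x(k).

```python
def inner_product(a, x):
    """Compute y = sum over k of a(k) * x(k) with one multiply and one
    add per iteration, following y(k) = y(k-1) + a(k) * x(k)."""
    y = 0                      # y(0) = 0
    for ak, xk in zip(a, x):   # k = 1 .. n
        y = y + ak * xk        # one multiplication, one addition
    return y                   # y = y(n)

print(inner_product([1, 2, 3], [4, 5, 6]))  # 32
```

Each loop iteration needs the previous value of `y`, which is exactly the dependency that forces the additions into a chain.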
Design Example cont’d …
• Software Implementation:
– Map each * op. to a MUL instruction, and each + op. to an ADD instruction.
– Allocate memory space for {a(k)}, {x(k)}, and {y(k)}
– Schedule the operations by sequentially executing y(1)=a(1)*x(1), y(2)=y(1) + a(2)*x(2), etc.
– Note that each instruction is
still to be implemented in
hardware.
• Hardware Implementation:
– Map each * op. to a multiplier,
and each + op. to an adder.
– Interconnect them according to
the dependence graph:
[Figure: dependence graph. Each pair a(k), x(k) feeds a multiplier (*); the products feed a chain of adders (+) that starts at y(0) and ends at y(n).]
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Observations
• Eventually, an
implementation is
realized with hardware.
• However, by using the same hardware to realize different operations at different times (scheduling), we have a software program!
• Bottom line – Hardware/software co-design. There is a continuum between hardware and software implementation.
• A design must explore
both simultaneously to
achieve best
performance/cost tradeoff.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
A Theme
• Matching hardware to
algorithm
– Hardware architecture
must match the
characteristics of the
algorithm.
– Example: ASIC architecture
is designed to implement a
specific algorithm, and
hence can achieve superior
performance.
• Formulate algorithm to match
hardware
– Algorithms must be formulated so that they can best exploit the potential of the architecture.
– Example: GPP and PDSP architectures are fixed. One must formulate the algorithm properly to achieve the best performance, e.g. to minimize the number of operations.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Algorithm Reformulation
• Algorithmic level equivalence
– Different filter structures implementing the same specification
• Exploiting parallelism
– Regular iterative algorithms and loop reformulation
• Well studied in parallel compiler technology
– Signal flow/Data flow representation
• Suitable for specification of pipelining
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Mapping Algorithm to Architecture
• Scheduling and Assignment Problem
– Resources: hardware modules, and time slots
– Demands: operations (algorithm), and throughput
• Constrained optimization problem
– Minimize resources (objective function) to meet demands
(constraints)
• For regular iterative algorithms and regular processor
arrays -> algebraic mapping.
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
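The scheduling and assignment problem above can be illustrated with a greedy list scheduler: at every time step it starts as many ready operations as the available hardware modules allow. This is a simplified sketch with unit latency, not an optimal solver. The function name `list_schedule`, the six-operation graph and the resource counts are all assumptions made for the example.

```python
def list_schedule(deps, kind, capacity):
    """Greedy list scheduling with unit-latency operations.

    deps:     operation -> list of operations it depends on
    kind:     operation -> resource type it needs (e.g. "mul", "add")
    capacity: resource type -> number of modules usable per time step
    Returns operation -> time step. Assumes the graph has no cycles.
    """
    start = {}
    t = 0
    while len(start) < len(deps):
        used = {r: 0 for r in capacity}
        for op in deps:                       # dict order acts as priority
            if op in start:
                continue
            # Ready only if every predecessor finished in an earlier step.
            ready = all(p in start and start[p] < t for p in deps[op])
            if ready and used[kind[op]] < capacity[kind[op]]:
                start[op] = t
                used[kind[op]] += 1
        t += 1
    return start

deps = {
    "m1": [], "m2": [], "m3": [],
    "a1": ["m1"], "a2": ["a1", "m2"], "a3": ["a2", "m3"],
}
kind = {op: ("mul" if op.startswith("m") else "add") for op in deps}

one_mul = list_schedule(deps, kind, {"mul": 1, "add": 1})
three_mul = list_schedule(deps, kind, {"mul": 3, "add": 1})
print(max(one_mul.values()) + 1, max(three_mul.values()) + 1)
```

For this graph both configurations finish in four steps, because the adder chain is the bottleneck. One multiplier therefore meets the same throughput demand as three, which is the kind of resource minimization the constrained optimization problem asks for.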
Implementation process for PDSP
[Wiangtong05]
Direct Mapping Techniques
[Wiangtong05]
FIR Filters
[DSPPrimer-Slides]
Transposed FIR Filter
• Algorithm transform techniques:
– Pipelining and parallelism
– Retiming
– Unfolding (loop unrolling)
[DSPPrimer-Slides]
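A direct-form FIR filter and its transposed form compute the same output but arrange the delays differently: the direct form delays the input samples, while the transposed form delays partial sums, so only one adder sits between consecutive registers. The sketch below is an illustration with hypothetical function names and coefficients, not code from the cited slides.

```python
def fir_direct(h, x):
    """Direct form: a delay line of input samples, then a sum of products."""
    delay = [0] * len(h)
    out = []
    for s in x:
        delay = [s] + delay[:-1]                 # shift the new sample in
        out.append(sum(hk * dk for hk, dk in zip(h, delay)))
    return out

def fir_transposed(h, x):
    """Transposed form: the input is broadcast to all multipliers and the
    registers hold partial sums between the adders."""
    n = len(h)
    state = [0] * (n - 1)                        # partial-sum registers
    out = []
    for s in x:
        out.append(h[0] * s + (state[0] if state else 0))
        for i in range(n - 1):
            # Register i takes tap i+1's product plus the next register.
            nxt = state[i + 1] if i + 1 < n - 1 else 0
            state[i] = h[i + 1] * s + nxt
        # Ascending i is safe: state[i + 1] is still the old value.
    return out

h = [1, 2, 3]          # assumed coefficients
x = [1, 2, 3, 4]       # assumed input samples
print(fir_direct(h, x), fir_transposed(h, x))
```

Both functions return the same sequence, which is the algorithmic-level equivalence that retiming and related transformations rely on.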
Example: One-to-one mapping and pipelining
[Figure: four operations A, B, C and D shown in three steps: allocation, assignment and pipelining. The pipelining step adds clocked flip-flops (ff) driven by a clock.]
Analyse timing
• if OK then stop
• else pipelining
[Meerbergen-Slides]
Coware SPW Design Flow
www.coware.com
System-level design flow: Simulink-Altera
[AlteraDSP]
Arithmetic
• CORDIC
– Compute elementary functions
• Distributed arithmetic
– ROM based implementation
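As an example of how CORDIC computes an elementary function, the sketch below evaluates an arctangent in vectoring mode: the vector (x, y) is rotated step by step toward the x-axis, and the accumulated rotation angle approximates atan(y/x). It is a floating-point illustration under stated assumptions: x must be positive, the name `cordic_atan` is hypothetical, and in hardware the multiplications by 2^-i would be bit shifts and the `math.atan` values a small lookup table.

```python
import math

def cordic_atan(y, x, iterations=16):
    """Vectoring-mode CORDIC approximation of atan(y / x) for x > 0."""
    angle = 0.0
    for i in range(iterations):
        shift = 2.0 ** -i              # a right shift by i bits in hardware
        step = math.atan(shift)        # a table lookup in hardware
        if y > 0:                      # rotate clockwise, toward the x-axis
            x, y = x + y * shift, y - x * shift
            angle += step
        else:                          # rotate counter-clockwise
            x, y = x - y * shift, y + x * shift
            angle -= step
    return angle

print(cordic_atan(1.0, 1.0), math.pi / 4)
print(cordic_atan(1.0, 2.0), math.atan(0.5))
```

Each iteration uses only additions, shifts and one table entry, and the residual angle error is bounded by the last table entry, roughly 2^-15 radians for 16 iterations.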
Floating to fixed point analysis
• Overflow of the number range
– Large errors in the output signal occur when the available number range is exceeded (overflow).
• Round-off errors
– Rounding or truncation of products must be done in recursive loops so that the word length does not increase for each iteration.
• Coefficient errors
– Coefficients can only be represented with finite precision.
• Design for fixed-point arithmetic:
– Peak value estimation
– Word-length optimization
– Saturation arithmetic
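Coefficient errors and peak value estimation can both be checked numerically before committing to a word length. The sketch below is an illustration under assumptions: the five filter coefficients, the 7 fractional bits and the helper names `quantize` and `worst_case_peak` are hypothetical. The peak estimate uses the worst-case bound, the input peak times the sum of the absolute coefficient values.

```python
def quantize(value, frac_bits):
    """Round a real value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def worst_case_peak(h, x_peak=1.0):
    """Upper bound on |output| of an FIR filter for inputs with |x| <= x_peak."""
    return x_peak * sum(abs(c) for c in h)

h = [0.1, 0.25, 0.4, 0.25, 0.1]            # assumed ideal coefficients
hq = [quantize(c, 7) for c in h]           # 7 fractional bits
max_coeff_error = max(abs(c - q) for c, q in zip(h, hq))
peak = worst_case_peak(hq)

print(hq)
print(max_coeff_error)   # never more than half a step, 2**-8
print(peak)              # above 1.0, so the output can exceed the input range
```

Here the coefficient error stays within half a quantization step, and the peak bound exceeds 1.0, which suggests that the output needs one more integer bit than the input or else saturation at the output.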
References
In order to prepare these slides, the following material was used:
• Slides from [Hu04-Slides] “Design and Implementation of
Signal Processing Systems: An Introduction” are copied
with permission.
• Slides from [DSPPrimer-Slides] and [Meerbergen-Slides]
• [Richards04], [AlteraDSP], [Seshan98]
• Details about these references can be found at:
http://www.site.uottawa.ca/~mbolic/elg6163/References.htm