Technion - Israel Institute of Technology
Department of Electrical Engineering
High Speed Digital Systems Laboratory
Project Summary Report
Subject:
Multi-Stage Algorithm Infrastructure
Design and Implementation
Across Linked FPGAs
Performed by: Eran Tuchman, Gad Tuchman
Supervisor: Mr. Michael Yampolsky
Winter semester 2009
Abstract
The subject of this project is to design and implement a combined
software/hardware environment in which multiple algorithm computation units
can be linked together across multiple FPGAs according to a certain multi-stage
data flow.
The platform of the environment is based on the GIDEL PROCSTAR-III system.
This is a PC card, connected to the PCI Express bus, which contains four Altera
Stratix-III FPGAs and a set of DDR memories.
The multi-stage algorithm computation flow demonstrated by this project is the
"Regularized Particle Filter using GPS/INS" algorithm.
This project is part of a combined system which includes: the software application
and hardware infrastructure design (this project), flow controllers, and multiple
computation units intended for testing the real-time abilities of the implemented
algorithm.
System description
The system consists of four FPGAs containing computation units linked together by
streaming units intended to ease the development of flow controllers.
The streaming units allow serialization/de-serialization of structured data words,
and handle DDR RAM/FIFO memory connections as well as inter/intra-chip connections.
The system's inputs are INS and GPS readings received at the input FIFOs.
The INS reading is received every 10 ms, while the GPS reading is received once
a second. Real-time requirements demand that both readings be handled within
10 ms of arrival.
Each computation iteration involves the transfer of 30,000 data structures called
particles, each containing 18 state variables of 24 bits each.
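To give a feel for the data volume implied by these figures, the short Python sketch below works out the size of one particle and of one full particle set. The constant names are illustrative only. The bandwidth figure assumes exactly one full transfer of the particle set per 10 ms INS period, which is a simplifying assumption and not a measured value from the project.

```python
# Figures taken from the system description.
PARTICLES = 30_000          # particles per computation iteration
STATE_VARS = 18             # state variables per particle
BITS_PER_VAR = 24           # width of each state variable
ITERATIONS_PER_SECOND = 100 # one INS reading every 10 ms

bits_per_particle = STATE_VARS * BITS_PER_VAR        # 432 bits
bytes_per_particle = bits_per_particle // 8          # 54 bytes
bytes_per_iteration = PARTICLES * bytes_per_particle # 1,620,000 bytes

# Assumption: the particle set is moved exactly once per iteration.
bytes_per_second = bytes_per_iteration * ITERATIONS_PER_SECOND

print(f"particle size:      {bits_per_particle} bits ({bytes_per_particle} bytes)")
print(f"per iteration:      {bytes_per_iteration:,} bytes")
print(f"single-pass stream: {bytes_per_second:,} bytes/s")
```

Under that single-pass assumption the particle stream alone is roughly 1.62 MB per iteration, or about 162 MB/s.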
System Block Diagram
[Figure: system block diagram. Four FPGAs (IC1 to IC4), each with a Debug port.
Computation units: Propagation, Revaluation, Dk Computation, Resampling,
Regularization, Reweight. Connecting blocks are labelled S.B., M.B., S.F. and M.F.
Data labels: X, X', W, W', X^, _tmp, GPS+Neff. Inputs: INS and GPS.]
Algorithm’s Data Flow
[Figure: data-flow diagrams for the GPS iteration and the INS iteration.]
• Same color means simultaneous computation.
Algorithm’s Timeline
• t = K-1: Initialization for K-1, Propagation to K, Revaluation for K
• t = K: GPS_Update, Revaluation for K, Neff_Check, Dk_Computation, Resampling, Regularization, Reweight, Propagation to K+1, Revaluation for K+1
• t = K+1: Propagation to K+2, Revaluation for K+2
• t = K+2: Propagation to K+3, Revaluation for K+3
• t = K+100: GPS_Update, Revaluation for K+100, Neff_Check, Dk_Computation, Resampling, Regularization, Reweight, Propagation to K+101, Revaluation for K+101
• t = K+101: Propagation to K+102, Revaluation for K+102
• t = K+102: Propagation to K+103, Revaluation for K+103
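The timeline can be read as a simple schedule: every time slot runs the INS stages, and every 100th slot (once a second at a 10 ms period) runs the GPS stages first. The Python sketch below reproduces that ordering. The function name `stages_for` and the `gps_period` parameter are hypothetical, the one-off Initialization step at K-1 is left out, and all GPS stages are listed unconditionally, exactly as they appear in the timeline.

```python
# Stage names as they appear in the timeline.
GPS_STAGES = [
    "GPS_Update", "Revaluation", "Neff_Check", "Dk_Computation",
    "Resampling", "Regularization", "Reweight",
]
INS_STAGES = ["Propagation", "Revaluation"]


def stages_for(offset, gps_period=100):
    """Return the stage sequence for time slot K + offset.

    Assumption: a GPS reading arrives at offsets 0, 100, 200, ...
    (one GPS reading per 100 INS readings).
    """
    stages = []
    if offset % gps_period == 0:
        stages.extend(GPS_STAGES)   # GPS iteration: update and resample first
    stages.extend(INS_STAGES)       # every slot propagates to the next step
    return stages


if __name__ == "__main__":
    for offset in (0, 1, 2, 100, 101):
        print(f"t = K+{offset}: {', '.join(stages_for(offset))}")
```

Slots K and K+100 yield the nine-stage GPS sequence, while all other slots yield only Propagation followed by Revaluation.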
Inter-Chip Communication
[Figure: inter-chip link between two FPGAs. On each chip the user's logic
connects to the link through I/O cells. Link signals: Data, Data-Valid, Wait.
On each chip a PLL (based on clk0) generates clk and the shifted clocks
clk_plus and clk_minus.]
Streaming Blocks Architecture
[Figure: Basic Streaming Block. Input path: Data in, Write request, Full.
Output path: Data out, Read request, Empty. Control: Start, Finished.]
• Data bus of 4 x 24-bit words
• FIFO-like streaming interfaces (Request + Empty/Full)
• Controllable Start/Finished activation mechanism
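As a rough behavioural illustration of the interface listed above, the Python sketch below models a streaming block as a bounded FIFO that carries beats of four 24-bit words. It is not the project's HDL. The class name, the `depth` parameter, and in particular the meaning given to Start/Finished (Start arms the block for a fixed number of beats, Finished is raised once they have all been read out) are assumptions made for the example.

```python
from collections import deque

WORD_BITS = 24        # width of one data word
WORDS_PER_BEAT = 4    # words carried by the data bus per transfer


class StreamingBlock:
    """Software model of a FIFO-like streaming block (assumed semantics)."""

    def __init__(self, depth=16):
        self.depth = depth          # hypothetical FIFO depth
        self._fifo = deque()
        self._remaining = 0         # beats still expected on the input path
        self.finished = False

    def start(self, beats):
        """Arm the block to pass exactly `beats` beats (assumed behaviour)."""
        self._remaining = beats
        self.finished = beats == 0

    @property
    def full(self):
        return len(self._fifo) >= self.depth

    @property
    def empty(self):
        return not self._fifo

    def write(self, beat):
        """Write request: returns False when the block is idle or full."""
        if self._remaining == 0 or self.full:
            return False
        if len(beat) != WORDS_PER_BEAT or any(
            not 0 <= word < (1 << WORD_BITS) for word in beat
        ):
            raise ValueError("a beat must hold four 24-bit words")
        self._fifo.append(tuple(beat))
        self._remaining -= 1
        return True

    def read(self):
        """Read request: returns None when the block is empty."""
        if self.empty:
            return None
        beat = self._fifo.popleft()
        if self._remaining == 0 and self.empty:
            self.finished = True    # all armed beats have left the block
        return beat
```

A producer keeps issuing `write` while `full` is false, a consumer keeps issuing `read` while `empty` is false, and a flow controller would use `start` and `finished` to sequence one stage after another.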
Application GUI
[Figure: application GUI screenshot. Labelled controls and fields: Init,
Read Memory, Write Memory, Input FIFOs, Output FIFO, Reset, Reload FPGAs,
Quick Launch, Next Step, Iteration Counter, Last Duration.]
Summary and Conclusions
• The suggested multi-stage algorithm infrastructure is suitable for the
specified RPF algorithm.
• Inter-chip communication is capable of running at 140 MHz using the current
double data rate I/Os (280 MHz in total).
• Computation units must sustain an average processing rate of one particle
per 50 ns, with a maximal combined latency of 2 ms, for the full system to
meet the real-time requirements.
• The implemented streaming units may be used for additional multi-stage
algorithms.
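The timing figures in these conclusions can be checked with a few lines of arithmetic. The Python sketch below assumes that the 30,000 particles are streamed once through a pipelined chain at one particle per 50 ns, and that the 2 ms combined latency is simply added on top. Both are simplifying assumptions for illustration, and the constant names are not taken from the project.

```python
PARTICLES = 30_000
NS_PER_PARTICLE = 50            # required average processing rate
LATENCY_NS = 2_000_000          # maximal combined latency (2 ms)
DEADLINE_NS = 10_000_000        # real-time requirement (10 ms)

# Time to stream the whole particle set once at the required rate.
stream_ns = PARTICLES * NS_PER_PARTICLE      # 1,500,000 ns = 1.5 ms

# Assumption: one pipelined pass, latency added once.
total_ns = stream_ns + LATENCY_NS            # 3,500,000 ns = 3.5 ms
slack_ns = DEADLINE_NS - total_ns            # 6,500,000 ns = 6.5 ms

# Double data rate: two transfers per clock cycle on each I/O.
LINK_CLOCK_HZ = 140_000_000
transfers_per_second = 2 * LINK_CLOCK_HZ     # 280,000,000

print(f"stream time: {stream_ns / 1e6} ms")
print(f"with latency: {total_ns / 1e6} ms, slack: {slack_ns / 1e6} ms")
print(f"link transfers per second per I/O: {transfers_per_second:,}")
```

Under these assumptions a single pass takes 3.5 ms of the 10 ms budget, which leaves room for the additional stages of a GPS iteration.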