Analysis of FPGA based
Kalman Filter Architectures
Arvind Sudarsanam
Dissertation Defense
12 March 2010
03/12/2010
1
Outline
Introduction
Literature review
PolyFSA architecture
Architecture analysis
  Area analysis
  Error analysis
  Performance analysis
Contributions
Future work
Kalman filters for Spacecraft navigation
Kalman filters
Research overview

An FPGA-based polymorphic systolic array architecture is proposed to accelerate Kalman filters. Portions of this architecture can be reused for other applications at run time.

A comprehensive architecture analysis is presented. Results are reported as area savings for varying performance and precision error.
Hardware design for Kalman filters
- Systolic arrays

Yeh [7], M. Lu [8] and P. Rao [9] proposed systolic array architectures for Kalman filters based on the Faddeev algorithm

Cardoso et al. [11] proposed a hardware/software co-processor system
  Profiling is used to guide partitioning by the designer
  The C2H [12] tool from Altera is used to generate RTL designs
But these architectures are not scalable.

Some efforts [15-20] target individual linear algebra
operations, like matrix inverse.
Error analysis

Initial efforts [28-35] were targeted towards analyzing
variable precision fixed-point arithmetic

Constantinides [36-45] proposed multiple ideas
towards error analysis for fixed-point arithmetic

Availability of FPGAs has caused a surge in work
towards developing variable precision architectures,
especially in the floating point domain [46-53]
Performance and area analysis


Existing performance and area estimation approaches
target a parameter-specific architecture [72]
Parameters include:
 Overall data path width
 Memory size
 Number of processing elements
Proposed research is also parameter-specific, but looks
at latency, precision and input rates of floating point
arithmetic units
Outline
Introduction
Literature review
PolyFSA architecture
  Application analysis
  Mapping to systolic array
  Architecture details
Architecture analysis
Contributions
Future work
Extended Kalman Filter
Faddeev algorithm

The Faddeev algorithm is a method for efficiently computing the Schur complement $D - CA^{-1}B$.
Given matrices $A$, $B$, $C$, $D$, arrange them in matrix $M$ as:
$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$
Reduce $M$ to row echelon form, and $D - CA^{-1}B$ will result in the lower right corner.
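The elimination described above can be sketched in a few lines of Python. This is a minimal illustration and not the PolyFSA implementation: it assumes the block layout M = [[A, B], [C, D]] with A square and invertible, uses exact rational arithmetic so the result is easy to check, and the function name `faddeev_schur` is hypothetical.

```python
from fractions import Fraction


def faddeev_schur(A, B, C, D):
    """Return D - C * A^-1 * B by row reduction of M = [[A, B], [C, D]].

    Assumes A is n x n and invertible; matrices are lists of rows.
    """
    n = len(A)  # number of rows in the A/B block
    p = len(C)  # number of rows in the C/D block
    # Build M row by row with exact rationals (illustration only).
    M = [[Fraction(x) for x in A[i] + B[i]] for i in range(n)]
    M += [[Fraction(x) for x in C[i] + D[i]] for i in range(p)]

    for k in range(n):
        # Find a non-zero pivot among the remaining A-block rows and swap it up.
        # (Raises StopIteration if A is singular.)
        pivot = next(r for r in range(k, n) if M[r][k] != 0)
        M[k], M[pivot] = M[pivot], M[k]
        # Eliminate column k from every row below, including the C/D rows.
        for r in range(k + 1, n + p):
            factor = M[r][k] / M[k][k]
            M[r] = [x - factor * y for x, y in zip(M[r], M[k])]

    # The C block is now zero; the lower-right block is the Schur complement.
    return [row[n:] for row in M[n:]]
```

Choosing the input blocks selects the operation. For example, with B and C set to the identity and D set to zero, the same routine returns $-A^{-1}$.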
Faddeev algorithm
Faddeev algorithm – Single node
Boundary node
Internal node
Mapping to systolic array
Simplify data flow
Mapping to 1-D
Systolic array
Folding to make
systolic array scalable
Architecture details for boundary PE
Details for internal PE are similar
Control flow
Results

Target FPGA – Xilinx Virtex 4 SX35

Test case is derived from [Ronnback-2000]

Performance is compared against a software
implementation on a Virtutech Simics PowerPC 750
simulator (Thanks: Rob Barnes [79])
Performance of proposed PolyFSA
Overall execution time of EKF on the PolyFSA-based system architecture and on the PowerPC
Estimated execution time of the Faddeev algorithm for varying number of PEs and Faddeev parameters
Architecture analysis

At design time, each PE in the proposed PolyFSA is designed for best performance and highest precision.
Question: by allowing some degradation in performance and/or tolerating some precision error, can we reconfigure the existing PE with a set of smaller PEs?
Design parameters that can be varied

Precision of
  Adder unit (madd)
  Multiplier unit (mmul)
  Divider unit (mdiv)
Latency of
  Adder unit (LatAdd)
  Multiplier unit (LatMul)
  Divider unit (LatDiv)
Input rate of the divider (c_rate)
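To give a feel for what varying a precision parameter does to a result, the sketch below rounds a value to a reduced number of significand bits in software. It rests on an assumption not stated in the slides: that madd, mmul and mdiv denote the mantissa width of the corresponding floating-point unit. The helper names `round_to_mantissa_bits` and `relative_error` are hypothetical, and the code is only a software model, not the hardware rounding logic.

```python
import math


def round_to_mantissa_bits(x, m):
    """Round x to the nearest value with an m-bit significand (software model)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    frac, exp = math.frexp(x)         # x = frac * 2**exp with 0.5 <= |frac| < 1
    scaled = round(frac * (1 << m))   # keep m bits of the fraction
    return math.ldexp(scaled, exp - m)


def relative_error(x, m):
    """Relative error introduced by keeping only m significand bits of x."""
    return abs(round_to_mantissa_bits(x, m) - x) / abs(x)
```

Under this model the relative error of a single rounding is bounded by 2**-m, so each bit removed from a unit roughly doubles the worst-case error that unit can inject.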
Area analysis – Adder unit
Area analysis – Multiplier unit
Area analysis – Divider unit
Area analysis – Divider unit
Error analysis – Top-level flow
Faddeev algorithm - Error vs Precision
Error analysis for EKF
EKF – Area Savings vs Error
Performance analysis
Major portion of execution time
Calculation of Tfaddeev

Execution time of Faddeev algorithm on the proposed
PolyFSA is computed using a simulation model

We are interested in observing the impact of performance
degradation on resource utilization

Results are shown for overall execution of EKF
Performance analysis – Vary latency
Performance analysis – Vary c_rate
Area versus Performance
3-D Pareto curves
Summary

An FPGA-based Polymorphic Faddeev Systolic Array (PolyFSA) architecture is proposed to accelerate the compute-intensive kernels of Kalman filters.
A hierarchical analysis of the error introduced in the results of Kalman filter computations by reduced precision is presented.
A simulation model to estimate the overall execution time of the Kalman filter algorithm is proposed.
Results of the architecture analysis are presented in terms of Pareto curves.
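For readers unfamiliar with Pareto curves, the sketch below shows one simple way to extract the non-dominated design points from a set of candidates. It assumes each design is scored as an (area, error, execution time) tuple with all three objectives minimized; the function name `pareto_front` and the sample values are hypothetical and are not taken from the presented results.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimized)."""
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and better in at least one.
        no_worse = all(x <= y for x, y in zip(a, b))
        better = any(x < y for x, y in zip(a, b))
        return no_worse and better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (area, error, execution time) design points, for illustration only.
designs = [(100, 1e-3, 50), (80, 1e-2, 60), (120, 1e-3, 50), (80, 1e-2, 70)]
front = pareto_front(designs)
```

Here the third and fourth designs are dropped because another design matches or beats them on every objective, while the first two remain because each trades area against error.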
Future work

The proposed methodology (architecture design supported by analysis) can be applied to designs for other applications.
Design goals can be extended to incorporate power consumption.
Design parameters can be extended to include other options, such as implementation type and FPGA family.