High Performance Embedded Computing Software Initiative


High Performance Embedded Computing Software Initiative (HPEC-SI)
Dr. Jeremy Kepner
MIT Lincoln Laboratory
This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily
endorsed by the United States Government.
Outline
• Introduction
  – DoD Need
  – Program Structure
• Software Standards
• Parallel VSIPL++
• Future Challenges
• Summary
Overview - High Performance Embedded Computing (HPEC) Initiative

[Figure: the HPEC Software Initiative bridges DARPA applied research and development efforts and demonstration/acquisition programs, transitioning software technology to systems such as the Common Imagery Processor (CIP), the Enhanced Tactical Radar Correlator (ETRAC), and ASARS-2, which run on shared-memory servers and embedded multiprocessors.]

Challenge: Transition advanced software technology and practices into major defense acquisition programs.
Why Is DoD Concerned with Embedded Software?

[Chart: estimated DoD expenditures for embedded signal and image processing hardware and software ($B), by fiscal year, in the $0-3B range. Source: "HPEC Market Study," March 2001.]

• COTS acquisition practices have shifted the burden from "point design" hardware to "point design" software
• Software costs for embedded systems could be reduced by one-third with improved programming models, methodologies, and standards
Issues with Current HPEC Development
Inadequacy of Software Practices & Standards

• High Performance Embedded Computing is pervasive throughout DoD applications
  (e.g., Predator, U-2, Global Hawk, MK-48 Torpedo, JSTARS, MSAT-Air, Rivet Joint, Standard Missile, F-16, NSSN, AEGIS, P-3/APS-137)
  – Airborne radar insertion program: 85% software rewrite for each hardware platform
  – Missile common processor: processor board costs < $100K; software development costs > $100M
  – Torpedo upgrade: two software rewrites required after changes in hardware design

[Figure: system development/acquisition stages (roughly 4 years each) - system technology development, system field demonstration, engineering/manufacturing development, insertion to military asset - against signal processor evolution from 1st through 6th generation.]

Today, embedded software is:
• Not portable
• Not scalable
• Difficult to develop
• Expensive to maintain
Evolution of Software Support Towards "Write Once, Run Anywhere/Anysize"

[Figure: timeline (1990, 2000, 2005) contrasting DoD software development with COTS development. In 1990, applications were built directly on vendor software; by 2000 and 2005, applications sit on middleware and embedded software standards, decoupling them from vendor software.]

• Application software has traditionally been tied to the hardware
• Many acquisition programs are developing stove-piped middleware "standards"
• Open software standards can provide portability, performance, and productivity benefits
• Support "Write Once, Run Anywhere/Anysize"
Overall Initiative Goals & Impact

Program Goals
• Develop and integrate software technologies for embedded parallel systems to address portability, productivity, and performance
• Engage the acquisition community to promote technology insertion
• Deliver quantifiable benefits

Quantified metrics (HPEC Software Initiative: demonstrate interoperable & scalable systems):
• Portability: reduction in lines-of-code needed to change, port, or scale to a new system
• Productivity: reduction in overall lines-of-code
• Performance: computation and communication benchmarks (goal 1.5x)
HPEC-SI Path to Success

The HPEC Software Initiative builds on:
• Proven technology
• Business models
• Better software practices

Benefit to DoD Programs
• Reduces software cost & schedule
• Enables rapid COTS insertion
• Improves cross-program interoperability
• Basis for improved capabilities

Benefit to DoD Contractors
• Reduces software complexity & risk
• Easier comparisons/more competition
• Increased functionality

Benefit to Embedded Vendors
• Lower software barrier to entry
• Reduced software maintenance costs
• Evolution of open standards
Organization

Technical Advisory Board
• Dr. Rich Linderman, AFRL
• Dr. Richard Games, MITRE
• Mr. John Grosh, OSD
• Mr. Bob Graybill, DARPA/ITO
• Dr. Keith Bromley, SPAWAR
• Dr. Mark Richards, GTRI
• Dr. Jeremy Kepner, MIT/LL

Executive Committee
• Dr. Charles Holland, PADUSD(S&T)
• RADM Paul Sullivan, N77

Government Lead
• Dr. Rich Linderman, AFRL

Working Groups
• Demonstration: Dr. Keith Bromley (SPAWAR), Mr. Brian Sroka (MITRE), Mr. Ron Williams (MITRE), ...
• Development: Dr. Richard Games (MITRE), Dr. James Lebak (MIT/LL), Dr. Mark Richards (GTRI), Mr. Dan Campbell (GTRI), Mr. Ken Cain (MERCURY), Mr. Randy Judd (SPAWAR), ...
• Applied Research: Dr. Jeremy Kepner (MIT/LL), Mr. Bob Bond (MIT/LL), Mr. Ken Flowers (MERCURY), Dr. Spaanenburg (PENTUM), Mr. Dennis Cottel (SPAWAR), Capt. Bergmann (AFRL), Dr. Tony Skjellum (MPISoft), ...
• Advanced Research: Mr. Bob Graybill (DARPA)

• Partnership with ODUSD(S&T), Government Labs, FFRDCs, Universities, Contractors, Vendors, and DoD programs
• Over 100 participants from over 20 organizations
Outline
• Introduction
• Software Standards
  – Standards Overview
  – Future Standards
• Parallel VSIPL++
• Future Challenges
• Summary
Emergence of Component Standards

[Figure: a parallel embedded processor (system controller, node controller, processors P0-P3) connected to consoles and other computers. Standards cover three areas - data communication: MPI, MPI/RT, DRI; control communication: CORBA, HP-CORBA; computation: VSIPL, VSIPL++, ||VSIPL++.]

The HPEC Initiative builds on completed research and on existing standards and libraries.

Definitions
• VSIPL = Vector, Signal, and Image Processing Library
• ||VSIPL++ = Parallel Object-Oriented VSIPL
• MPI = Message Passing Interface
• MPI/RT = MPI Real-Time
• DRI = Data Reorganization Interface
• CORBA = Common Object Request Broker Architecture
• HP-CORBA = High Performance CORBA
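To give a sense of the level at which the data-communication standard operates, a minimal MPI sketch is shown below (standard MPI C bindings called from C++; illustrative only, not HPEC-SI code). Rank 0 sends a block of samples to rank 1; higher layers such as DRI and PVL/VSIPL++ hide this kind of explicit message passing from the application.

// Minimal MPI sketch (illustrative only; standard MPI C bindings, not HPEC-SI code).
// Rank 0 sends a block of samples to rank 1, which receives it.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<float> samples(1024, 0.0f);
    if (rank == 0) {
        MPI_Send(samples.data(), static_cast<int>(samples.size()), MPI_FLOAT,
                 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(samples.data(), static_cast<int>(samples.size()), MPI_FLOAT,
                 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %zu samples\n", samples.size());
    }
    MPI_Finalize();
    return 0;
}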
The Path to Parallel VSIPL++
(the world's first parallel object-oriented standard)

[Figure: three-phase roadmap over time and functionality; each phase carries a demonstration, a development, and an applied-research thrust.
Phase 1 - Demonstration: existing standards (VSIPL, MPI); Development: object-oriented standards (VSIPL++); Applied Research: unified computation/communication library prototype.
Phase 2 - Demonstration: object-oriented standards (VSIPL++); Development: unified computation/communication library (Parallel VSIPL++); Applied Research: fault-tolerance prototype.
Phase 3 - Demonstration: unified computation/communication library (Parallel VSIPL++); Development: fault tolerance; Applied Research: self-optimization.]

• Demonstrate insertions into fielded systems (e.g., CIP); first demo successfully completed
• VSIPL++: high-level code abstraction to reduce code size 3x; v0.5 spec completed, v0.1 code available; high-performance C++ demonstrated
• Parallel VSIPL++: unified embedded computation/communication standard; spec in progress; demonstrate scalability and 3x portability
Working Group Technical Scope

Development: VSIPL++
• Mapping (data parallelism)
• Early binding (computations)
• Compatibility (backward/forward)
• Local knowledge (accessing local data)
• Extensibility (adding new functions)
• Remote procedure calls (CORBA)
• C++ compiler support
• Test suite
• Adoption incentives (vendor, integrator)

Applied Research: Parallel VSIPL++
• Mapping (task/pipeline parallelism)
• Reconfiguration (for fault tolerance)
• Threads
• Reliability/availability
• Data permutation (DRI functionality)
• Tools (profiles, timers, ...)
• Quality of service
Overall Technical Tasks and Schedule

[Gantt chart: each task progresses through applied research, development, and demonstration across the near term (FY01-FY02), mid term (FY03-FY05), and long term (FY06-FY08).]
• VSIPL (Vector, Signal, and Image Processing Library): CIP demo, Demo 2
• MPI (Message Passing Interface): CIP demo, Demo 2, Demo 3
• VSIPL++ (object oriented): v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code; Demo 4, Demo 5
• Parallel VSIPL++: v0.1 spec, v0.1 code, v0.5 spec & code, v1.0 spec & code; Demo 6
• Fault tolerance / self-optimizing software
HPEC-SI Goals and First Demo Achievements

First demo achievements:
• Portability: zero code changes required
• Productivity: DRI code 6x smaller than MPI (estimated*)
• Performance: 3x reduced cost or form factor

Goals vs. achieved:
• Portability: goal 3x, achieved 10x+
• Productivity: goal 3x, achieved 6x*
• Performance: goal 1.5x, achieved 2x
Outline
• Introduction
• Software Standards
• Parallel VSIPL++
  – Technical Basis
  – Examples
• Future Challenges
• Summary
Parallel Pipeline

Signal processing algorithm (three stages), mapped onto a parallel computer:
• Filter: XOUT = FIR(XIN)
• Beamform: XOUT = w * XIN
• Detect: XOUT = |XIN| > c

• Data parallelism within stages
• Task/pipeline parallelism across stages
Types of Parallelism

[Figure: an input stage feeds a scheduler, which distributes work across parallel FIR filters, two beamformers, and two detectors, illustrating both data parallelism and task/pipeline parallelism.]
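To make the three stages concrete, here is a minimal serial C++ sketch of the filter/beamform/detect pipeline. It is illustrative only: it uses plain std::vector and std::complex rather than the VSIPL++ API, and the FIR taps, beamform weight, and detection threshold are made-up placeholders. In a real deployment each stage would be mapped across processors; the following slides show how PVL/VSIPL++ keeps that mapping out of the algorithm code.

// Serial sketch of the filter/beamform/detect pipeline (illustrative only).
// Types, weights, and sizes are placeholders, not HPEC-SI code.
#include <cmath>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<float>>;

// Filter: XOUT = FIR(XIN) - simple direct-form FIR convolution
cvec fir(const cvec& x, const cvec& taps) {
    cvec y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            y[n] += taps[k] * x[n - k];
    return y;
}

// Beamform: XOUT = w * XIN - apply a complex weight per sample
cvec beamform(const cvec& x, std::complex<float> w) {
    cvec y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) y[n] = w * x[n];
    return y;
}

// Detect: XOUT = |XIN| > c - threshold the magnitude
std::vector<bool> detect(const cvec& x, float c) {
    std::vector<bool> hits(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) hits[n] = std::abs(x[n]) > c;
    return hits;
}

int main() {
    cvec xin(4096, {1.0f, 0.0f});                               // placeholder input
    cvec taps = {{0.25f, 0.0f}, {0.5f, 0.0f}, {0.25f, 0.0f}};   // placeholder FIR taps
    auto hits = detect(beamform(fir(xin, taps), {0.7f, 0.1f}), 0.5f);
    return hits.empty() ? 1 : 0;
}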
Current Approach to Parallel Code

Algorithm + mapping: [diagram: Stage 1 mapped onto processors 1-2 and Stage 2 onto processors 3-4; scaling Stage 2 out to processors 3-6 requires changing the code, as below.]

Code (4-processor mapping):

while(!done)
{
  if ( rank()==1 || rank()==2 )
    stage1();
  else if ( rank()==3 || rank()==4 )
    stage2();
}

Code (6-processor mapping):

while(!done)
{
  if ( rank()==1 || rank()==2 )
    stage1();
  else if ( rank()==3 || rank()==4 ||
            rank()==5 || rank()==6 )
    stage2();
}

• Algorithm and hardware mapping are linked
• Resulting code is non-scalable and non-portable
Scalable Approach

With the Lincoln Parallel Vector Library (PVL), the same code (A = B + C) runs under a single-processor mapping or a multi-processor mapping:

#include <Vector.h>
#include <AddPvl.h>

void addVectors(aMap, bMap, cMap) {
  Vector< Complex<Float> > a('a', aMap, LENGTH);
  Vector< Complex<Float> > b('b', bMap, LENGTH);
  Vector< Complex<Float> > c('c', cMap, LENGTH);

  b = 1;
  c = 2;
  a = b + c;
}

• Single-processor and multi-processor code are the same
• Maps can be changed without changing the software
• High-level code is compact
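The map objects passed to addVectors carry the processor assignment, so only they change between a serial and a parallel run. The fragment below is a hypothetical illustration of that idea; the actual PVL Map constructor arguments are not shown in the slides and the ones here are made up.

// Hypothetical illustration only: the real PVL Map interface differs.
// The same addVectors() is called; only the map objects change.
Map serialMap({0});             // all elements on processor 0 (made-up constructor)
Map parallelMap({0, 1, 2, 3});  // block-distributed across processors 0-3 (made-up)

addVectors(serialMap, serialMap, serialMap);       // single-processor run
addVectors(parallelMap, parallelMap, parallelMap); // multi-processor run, same code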
C++ Expression Templates and PETE

Expression: A = B + C * D

Expression type (parse tree):
BinaryNode<OpAssign, Vector,
  BinaryNode<OpAdd, Vector,
    BinaryNode<OpMultiply, Vector, Vector>>>

How evaluation proceeds:
1. Pass B and C references to operator +
2. Create the expression parse tree
3. Return the expression parse tree
4. Pass the expression tree reference to operator =
5. Calculate the result and perform the assignment into A

Parse trees, not vectors, are created at each intermediate step.

• Expression templates enhance performance by allowing temporary variables to be avoided
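As a concrete illustration of the mechanism, here is a minimal expression-template sketch in plain C++ (not the PETE implementation; only addition is shown). operator+ returns a lightweight parse-tree node instead of computing a temporary vector, and the single loop runs only when the node is assigned into the destination.

// Minimal expression-template sketch (illustrative only, not PETE itself).
#include <cstddef>
#include <vector>

struct Vec;                              // forward declaration

template<class L, class R>
struct AddNode {                         // parse-tree node representing L + R
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n) : data(n, 0.0) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }

    template<class Expr>
    Vec& operator=(const Expr& e) {      // one fused loop, no temporaries
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Building the parse tree: these operators allocate no vector storage.
inline AddNode<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template<class L, class R>
AddNode<AddNode<L, R>, Vec> operator+(const AddNode<L, R>& l, const Vec& r) { return {l, r}; }

int main() {
    Vec a(4), b(4), c(4), d(4);
    for (std::size_t i = 0; i < 4; ++i) { b[i] = 1; c[i] = 2; d[i] = 3; }
    a = b + c + d;   // builds AddNode<AddNode<Vec,Vec>,Vec>, then a single assignment loop
    return 0;
}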
PETE Linux Cluster Experiments

[Charts: relative execution time vs. vector length for three expressions - A=B+C, A=B+C*D, and A=B+C*D/E+fft(F) - comparing VSIPL, PVL/VSIPL, and PVL/PETE implementations.]

• PVL with VSIPL has a small overhead
• PVL with PETE can surpass VSIPL
PowerPC AltiVec Experiments

Results (for expressions such as A=B+C, A=B+C*D, A=B+C*D+E*F, and A=B+C*D+E/F):
• A hand-coded loop achieves good performance, but is problem-specific and low level
• Optimized VSIPL performs well for simple expressions, worse for more complex expressions
• PETE-style array operators perform almost as well as the hand-coded loop, and are general, composable, and high level

Software technologies compared:
• AltiVec loop: C for loop; direct use of AltiVec extensions; assumes unit stride; assumes vector alignment
• VSIPL (vendor optimized): C; AltiVec-aware VSIPro Core Lite (www.mpi-softtech.com); no multiply-add; cannot assume unit stride; cannot assume vector alignment
• PETE with AltiVec: C++; PETE operators; indirect use of AltiVec extensions; assumes unit stride; assumes vector alignment
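For reference, a hand-coded AltiVec loop for A = B + C*D might look like the sketch below. This is illustrative only, not the benchmarked code, and it relies on the same assumptions listed above: 16-byte-aligned, unit-stride arrays whose length is a multiple of 4.

// Illustrative AltiVec loop for A = B + C*D (not the benchmarked code).
// Assumes 16-byte alignment, unit stride, and n a multiple of 4.
#include <altivec.h>

void madd_altivec(float* A, const float* B, const float* C, const float* D, int n)
{
    for (int i = 0; i < n; i += 4) {
        vector float vb = vec_ld(0, B + i);
        vector float vc = vec_ld(0, C + i);
        vector float vd = vec_ld(0, D + i);
        vector float va = vec_madd(vc, vd, vb);   // C*D + B
        vec_st(va, 0, A + i);
    }
}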
Outline
• Introduction
• Software Standards
• Parallel VSIPL++
  – Technical Basis
  – Examples
• Future Challenges
• Summary
A = sin(A) + 2 * B;

• Generated code (no temporaries):

for (index i = 0; i < A.size(); ++i)
  A.put(i, sin(A.get(i)) + 2 * B.get(i));

• Apply inlining to transform to:

for (index i = 0; i < A.size(); ++i)
  Ablock[i] = sin(Ablock[i]) + 2 * Bblock[i];

• Apply more inlining to transform to:

T* Bp = &(Bblock[0]);
T* Aend = &(Ablock[A.size()]);
for (T* Ap = &(Ablock[0]); Ap < Aend; ++Ap, ++Bp)
  *Ap = fmadd (2, *Bp, sin(*Ap));

• Or apply PowerPC AltiVec extensions

• Each step can be automatically generated
• The optimization level is whatever the vendor desires
BLAS zherk Routine

• BLAS = Basic Linear Algebra Subprograms
• Hermitian matrix M: conjug(M) = M^T
• zherk performs a rank-k update of Hermitian matrix C:
    C <- alpha * A * conjug(A)^T + beta * C

• VSIPL code:

A   = vsip_cmcreate_d(10,15,VSIP_ROW,MEM_NONE);
C   = vsip_cmcreate_d(10,10,VSIP_ROW,MEM_NONE);
tmp = vsip_cmcreate_d(10,10,VSIP_ROW,MEM_NONE);
vsip_cmprodh_d(A,A,tmp);      /* A*conjug(A)^T */
vsip_rscmmul_d(alpha,tmp,tmp);/* alpha*A*conjug(A)^T */
vsip_rscmmul_d(beta,C,C);     /* beta*C */
vsip_cmadd_d(tmp,C,C);        /* alpha*A*conjug(A)^T + beta*C */
vsip_cblockdestroy(vsip_cmdestroy_d(tmp));
vsip_cblockdestroy(vsip_cmdestroy_d(C));
vsip_cblockdestroy(vsip_cmdestroy_d(A));

• VSIPL++ code (also parallel):

Matrix<complex<double> > A(10,15);
Matrix<complex<double> > C(10,10);
C = alpha * prodh(A,A) + beta * C;
Simple Filtering Application

int main () {
  using namespace vsip;
  const length ROWS = 64;
  const length COLS = 4096;
  vsipl v;
  FFT<Matrix, complex<double>, complex<double>, FORWARD, 0, MULTIPLE, alg_hint ()>
    forward_fft (Domain<2>(ROWS,COLS), 1.0);
  FFT<Matrix, complex<double>, complex<double>, INVERSE, 0, MULTIPLE, alg_hint ()>
    inverse_fft (Domain<2>(ROWS,COLS), 1.0);
  const Matrix<complex<double> > weights (load_weights (ROWS, COLS));
  try {
    while (1) output (inverse_fft (forward_fft (input ()) * weights));
  }
  catch (std::runtime_error) {
    // Successfully caught access outside domain.
  }
}
Explicit Parallel Filter

#include <vsiplpp.h>
using namespace VSIPL;
const int ROWS = 64;
const int COLS = 4096;

int main (int argc, char **argv)
{
  Matrix<Complex<Float>> W (ROWS, COLS, "WMap");   // weights matrix
  Matrix<Complex<Float>> X (ROWS, COLS, "WMap");   // input matrix
  load_weights (W);
  try
  {
    while (1)
    {
      input (X);                        // some input function
      Y = IFFT ( mul (FFT(X), W));
      output (Y);                       // some output function
    }
  }
  catch (Exception &e) { cerr << e << endl; }
}
Multi-Stage Filter (main)

using namespace vsip;
const length ROWS = 64;
const length COLS = 4096;

int main (int argc, char **argv) {
  sample_low_pass_filter<complex<float> > LPF;
  sample_beamform<complex<float> >        BF;
  sample_matched_filter<complex<float> >  MF;
  try
  {
    while (1) output (MF(BF(LPF(input ()))));
  }
  catch (std::runtime_error) {
    // Successfully caught access outside domain.
  }
}
Multi-Stage Filter (low pass filter)

template<typename T>
class sample_low_pass_filter
{
public:
  sample_low_pass_filter()
    : FIR1_(load_w1 (W1_LENGTH), FIR1_LENGTH),
      FIR2_(load_w2 (W2_LENGTH), FIR2_LENGTH)
  { }
  Matrix<T> operator () (const Matrix<T>& Input) {
    Matrix<T> output(ROWS, COLS);
    for (index row=0; row<ROWS; row++)
      output.row(row) = FIR2_(FIR1_(Input.row(row)).second).second;
    return output;
  }
private:
  FIR<T, SYMMETRIC_ODD, FIR1_DECIMATION, CONTINUOUS, alg_hint()> FIR1_;
  FIR<T, SYMMETRIC_ODD, FIR2_DECIMATION, CONTINUOUS, alg_hint()> FIR2_;
};
Multi-Stage Filter (beam former)

template<typename T>
class sample_beamform
{
public:
  sample_beamform() : W3_(load_w3 (ROWS,COLS)) { }
  Matrix<T> operator () (const Matrix<T>& Input) const
    { return W3_ * Input; }
private:
  const Matrix<T> W3_;
};
Multi-Stage Filter (matched filter)

template<typename T>
class sample_matched_filter
{
public:
  sample_matched_filter()
    : W4_(load_w4 (ROWS,COLS)),
      forward_fft_ (Domain<2>(ROWS,COLS), 1.0),
      inverse_fft_ (Domain<2>(ROWS,COLS), 1.0)
  {}
  Matrix<T> operator () (const Matrix<T>& Input) const
    { return inverse_fft_ (forward_fft_ (Input) * W4_); }
private:
  const Matrix<T> W4_;
  FFT<Matrix<T>, complex<double>, complex<double>,
      FORWARD, 0, MULTIPLE, alg_hint()> forward_fft_;
  FFT<Matrix<T>, complex<double>, complex<double>,
      INVERSE, 0, MULTIPLE, alg_hint()> inverse_fft_;
};
Outline
• Introduction
• Software Standards
• Parallel VSIPL++
• Future Challenges
  – Fault Tolerance
  – Self Optimization
  – High Level Languages
• Summary
Dynamic Mapping for Fault Tolerance

[Figure: an input task produces XIN on Map0 (nodes 0,1) and an output task consumes XOUT. When a node fails, XOUT is rebound from one map (Map1, nodes 0,2) to another (Map2, nodes 1,3) that uses a spare node of the parallel processor.]

• Switching processors is accomplished by switching maps
• No change to the algorithm is required
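A schematic C++ sketch of that idea follows (hypothetical types and names, not PVL or VSIPL++ code): the computation object holds a map, and fault recovery simply rebinds it to a candidate map that avoids the failed node, leaving the algorithm untouched.

// Hypothetical sketch of map-based fault recovery (not PVL/VSIPL++ code).
#include <set>
#include <vector>

struct Map {
    std::vector<int> nodes;                // processors this map places data on
};

struct Task {
    Map map;                               // current placement
    void rebind(const Map& m) { map = m; } // switching processors == switching maps
};

// Pick the first candidate map that avoids every failed node.
Map select_map(const std::vector<Map>& candidates, const std::set<int>& failed) {
    for (const Map& m : candidates) {
        bool ok = true;
        for (int n : m.nodes) if (failed.count(n)) ok = false;
        if (ok) return m;
    }
    return candidates.back();              // fallback if no viable map is found
}

int main() {
    std::vector<Map> maps = {{{0, 2}}, {{1, 3}}};  // Map1: nodes 0,2; Map2: nodes 1,3
    Task output_task{maps[0]};
    std::set<int> failed = {2};                    // node 2 fails
    output_task.rebind(select_map(maps, failed));  // algorithm itself is unchanged
    return 0;
}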
Dynamic Mapping Performance Results

[Chart: relative time (vs. baseline, log scale) of three recovery strategies - remap, redundant, and rebind - as a function of data size (32 to 2048).]

• Good dynamic mapping performance is possible
Optimal Mapping of Complex Algorithms

[Figure: an application pipeline - input (XIN), low pass filter (FIR1, FIR2 with weights W1, W2), beamform (multiply by W3), matched filter (FFT, multiply by W4, IFFT) - has different optimal maps on different hardware: workstation, Intel cluster, PowerPC cluster, embedded board, and embedded multi-computer.]

• Need to automate the process of mapping the algorithm to the hardware
Self-Optimizing Software for Signal Processing

Goal: for a given number of CPUs, find the mapping that minimizes latency, Min(latency | #CPU), or maximizes throughput, Max(throughput | #CPU).

[Charts: predicted vs. achieved latency (seconds) and throughput (frames/sec) as a function of #CPU (4 to 8) for a small (48x4K) and a large (48x128K) problem size. Each point is labeled with the selected stage mapping, e.g. 1-1-1-1 through 1-3-2-2.]

• S3P selects the correct optimal mapping
• Excellent agreement between S3P-predicted and achieved latencies and throughputs
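A toy sketch of the selection step follows (illustrative only; S3P's actual performance model and search are not shown in the deck, and the predicted latencies below are made-up numbers): given a predicted latency for each candidate stage mapping, pick the best one that fits the CPU budget.

// Toy mapping search (illustrative; not the S3P implementation).
#include <limits>
#include <numeric>
#include <vector>

struct Candidate {
    std::vector<int> cpus_per_stage;   // e.g. {1,2,2,2} for a 4-stage pipeline
    double predicted_latency;          // seconds, from a performance model
};

// Min(latency | #CPU): best predicted latency among mappings using at most cpu_budget CPUs.
Candidate best_mapping(const std::vector<Candidate>& candidates, int cpu_budget) {
    Candidate best{{}, std::numeric_limits<double>::infinity()};
    for (const Candidate& c : candidates) {
        int used = std::accumulate(c.cpus_per_stage.begin(), c.cpus_per_stage.end(), 0);
        if (used <= cpu_budget && c.predicted_latency < best.predicted_latency)
            best = c;
    }
    return best;
}

int main() {
    std::vector<Candidate> candidates = {
        {{1, 1, 1, 1}, 1.40}, {{1, 1, 2, 2}, 0.95}, {{1, 2, 2, 2}, 0.70}  // made-up numbers
    };
    Candidate chosen = best_mapping(candidates, 7);   // 7-CPU budget
    return chosen.cpus_per_stage.empty() ? 1 : 0;
}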
High Level Languages

• The need for a parallel Matlab has been identified (HPCMO, OSU)
• The required user interface has been demonstrated (Matlab*P, MIT/LCS; PVL, MIT/LL)
• The required hardware interface has been demonstrated (MatlabMPI, MIT/LL)

[Figure: high-performance Matlab applications (DoD sensor processing, DoD mission planning, scientific simulation, commercial applications) sit on a Parallel Matlab Toolbox, which combines the user-interface and hardware-interface layers on top of parallel computing hardware.]

• A Parallel Matlab Toolbox can now be realized
MatlabMPI Deployment (speedup)

• Maui
  – Image filtering benchmark (300x on 304 CPUs)
• Lincoln
  – Signal processing (7.8x on 8 CPUs)
  – Radar simulations (7.5x on 8 CPUs)
  – Hyperspectral (2.9x on 3 CPUs)
• MIT
  – LCS Beowulf (11x Gflops on 9 duals)
  – AI Lab face recognition (10x on 8 duals)
• Other
  – Ohio St. EM simulations
  – ARL SAR image enhancement
  – Wash U hearing aid simulations
  – So. Ill. benchmarking
  – JHU digital beamforming
  – ISL radar simulation
  – URI heart modeling

[Chart: image filtering on the IBM SP at the Maui Computing Center - performance (Gigaflops) vs. number of processors (1 to ~1000), MatlabMPI compared against linear speedup.]

• The rapidly growing MatlabMPI user base demonstrates the need for a parallel Matlab
• Demonstrated scaling to 300 processors
Summary
• HPEC-SI Expected benefit
– Open software libraries, programming models, and standards
that provide portability (3x), productivity (3x), and
performance (1.5x) benefits to multiple DoD programs
• Invitation to Participate
– DoD Program offices with Signal/Image Processing needs
– Academic and Government Researchers interested in high
performance embedded computing
– Contact: [email protected]
The Links
High Performance Embedded Computing Workshop
http://www.ll.mit.edu/HPEC
High Performance Embedded Computing Software Initiative
http://www.hpec-si.org/
Vector, Signal, and Image Processing Library
http://www.vsipl.org/
MPI Software Technologies, Inc.
http://www.mpi-softtech.com/
Data Reorganization Initiative
http://www.data-re.org/
CodeSourcery, LLC
http://www.codesourcery.com/
MatlabMPI
http://www.ll.mit.edu/MatlabMPI