Nallatech Powerpoint Template White

Is it time for Von Neumann
and Harvard to Retire?
Presented by :
Allan Cantle – CEO
www.nallatech.com
Commercial In Confidence. Copyright ©2005, Nallatech.
Agenda
» History & Commercial Realities of FPGA Computing
» Thoughts on the future possibilities for FPGA Computing
  » FPGA Coprocessor vs FPGA Main Processor
  » Optimizing Spatial & Temporal Demands of Computing Problems
  » Homogeneous vs Heterogeneous vs Polymorphic Computing
  » Coarse Grain vs Fine Grain Architectures
  » Distributed Parallel Processing vs Clustered Parallel Processing
» Should Von-Neumann and Harvard Architectures be retired
» Summary
Introduction
» FPGAs………a 20 year History!
» From “Glue Logic” beginnings to complete “Systems on a
Chip” today.
» Mathematical functions in FPGAs since 1993
» FPGAs pervasive in virtually all electronics equipment
» Many different perspectives on FPGA capability
» This leads to confusion in the market place and mixed
messages
FPGA Perceptions
[Diagram: programmability spectrum running from "Easy to Program, Flexible, Low Performance" to "Difficult to program, Fixed Function, High Performance". Devices shown along it: Microprocessor, DSP, FPGA, ASIC.]
When an FPGA is viewed as an ASIC, the observer historically saw:
» Low performance
» Wasted Transistors
» Never Use in Production
» Higher Power Consumption
» High Cost
» Good for prototyping
FPGA Perceptions
[Diagram: programmability spectrum running from "Easy to Program, Flexible, Low Performance" to "Difficult to program, Fixed Function, High Performance". Devices shown along it: Microprocessor, DSP, FPGA, ASIC.]
These were the original views of FPGAs and they have STUCK in many people's minds:
» Low performance
» Wasted Transistors
» Never Use in Production
» Higher Power Consumption
» High Cost
» Good for prototyping
FPGA Perceptions
[Diagram: programmability spectrum running from "Easy to Program, Flexible, Low Performance" to "Difficult to program, Fixed Function, High Performance". Devices shown along it: Microprocessor, DSP, FPGA, ASIC.]
When an FPGA is viewed as a DSP, today the observer sees:
» Co-Processor
» Niche
» High Performance
» I/O Interface
FPGA Perceptions
[Diagram: programmability spectrum running from "Easy to Program, Flexible, Low Performance" to "Difficult to program, Fixed Function, High Performance". Devices shown along it: Microprocessor, DSP, FPGA, ASIC.]
When an FPGA is viewed as a processor, Nallatech sees:
» Main Processor
» Lower Power Consumption
» Floating Point
» Immature Tools
» High Performance
» Increased Performance Density
Have It Your Way!
[Diagram: programmability spectrum running from "Easy to Program, Flexible, Low Performance" to "Difficult to program, Fixed Function, High Performance". Devices shown along it: Microprocessor, FPGA, ASIC.]
HPEC Vs HPC
» Earlier FPGA adoption within HPEC Community
» FPGA based HPEC is a volume based commercial reality today
» High Performance Embedded Computing (HPEC)
  » Users have an appreciation of underlying hardware technology
  » Low level of programming abstraction
  » Applications are often severely SWAP (size, weight and power) restricted
» High Performance Computing (HPC)
  » Users are far more software centric
  » Programming is achieved using high level software languages
  » Exclusive use of Floating Point arithmetic
  » Focus on ease of implementation of complex algorithms
1993 – Simple Maths Functions
within FPGA
» Simple Mathematical Functions
  » 2 bit arithmetic function per Logic Slice
  » Effective for 1D Pipelined data paths
  » Very Basic Functions - highly repetitive and Data Intensive
  » Schematic Hardware designed
» XC4006 Series FPGAs
  » 256 Logic Slices
  » 128 max user I/O
Nallatech’s Adoption of FPGAs for HPEC
[Timeline diagram. Years: 1990, 1993. Architecture stages: Microprocessor only; Microprocessor based with FPGA coprocessor. Company stages: Pre Nallatech; Professional Consulting Services.]
Nallatech’s Early FPGA HPEC Computing Experience - 1993
» Real time, ultra low latency, Imaging Simulator
» Embedded Distributed Processing System
» Floating point, matrix transformations, convolution, sensor interfacing etc
[Block diagram: processing stages (Sensor Interface, Gain + Offset Control, Convolution, Image Composition, Background Image Processing, Target Image Generation) built from groups of T9, C80, I860 and FPGA nodes.]
1998 – Virtex FPGA Family
» Revolutionary Xilinx Virtex FPGA family Introduced
» 32 x 4Kbits Block RAMs + other mathematical features introduced
» allowed 2D mathematical algorithms to be implemented
» Excellent for Image Processing Algorithms
» Significant DSP capability
» Virtex
» 12,000 Logic Slices
» 804 max user I/O
» 32 4Kbit Block RAMs
Nallatech’s Adoption of FPGAs for HPEC
[Timeline diagram. Years: 1990, 1993, 1998. Architecture stages: Microprocessor only; Microprocessor based with FPGA coprocessor; FPGA based with microprocessor coprocessor. Company stages: Pre Nallatech; Professional Consulting Services; DIME Product Family (FPGA Centric Computing Architecture).]
2001 – Virtex-II FPGA Family
» Virtex-II FPGA introduced followed by Virtex-II Pro in 2003
» 444 18x18 Multipliers & 18kbit block RAMs introduced
» Gbit Serial I/O Communications & Power PC Processors Introduced
» Complex Floating Point Algorithm Implementation now possible
» Virtex-II / Pro
  » 44,000 Logic Slices
  » 444 18Kbits BRAMs
  » 444 18x18 Multipliers
  » 2 PowerPC Processors
  » 20 Gbit I/O
  » 1164 Max User I/O
What This Means for HPC
                                 Microprocessor (Itanium 2)                        FPGA (Virtex 2VP100)
Technology                       0.13 Micron                                       0.13 Micron
Clock Speed                      1.6GHz                                            180MHz
Internal Memory Bandwidth        102 GBytes per Sec                                7.5 TBytes per Sec
# Processing Units               5 FPU (2 MACs + 1 FPU) + 6 MMU + 6 Integer Units  212 FPU or 300+ Integer Units or ……….
Power Consumption                130 Watts                                         15 Watts
Peak Performance                 8 GFLOPs                                          38 GFLOPS
Sustained Performance            ~2 GFLOPs                                         ~19 GFLOPS
I/O / External Memory Bandwidth  6.4 GBytes/sec                                    67 GBytes/sec
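The comparison above can be reduced to two derived figures: how much of peak each device sustains, and sustained GFLOPS per watt. The sketch below is only arithmetic on the numbers quoted on the slide. The `Device` class and its method names are illustrative, and the approximate values ("~2", "~19") are treated as exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Headline figures for one device, as quoted on the slide."""
    name: str
    peak_gflops: float
    sustained_gflops: float
    watts: float

    def sustained_fraction(self) -> float:
        # Share of peak performance that is actually sustained.
        return self.sustained_gflops / self.peak_gflops

    def sustained_gflops_per_watt(self) -> float:
        # Sustained floating point throughput per watt consumed.
        return self.sustained_gflops / self.watts


ITANIUM_2 = Device("Itanium 2", peak_gflops=8.0, sustained_gflops=2.0, watts=130.0)
VIRTEX_2VP100 = Device("Virtex 2VP100", peak_gflops=38.0, sustained_gflops=19.0, watts=15.0)

for dev in (ITANIUM_2, VIRTEX_2VP100):
    print(f"{dev.name}: sustains {dev.sustained_fraction():.0%} of peak, "
          f"{dev.sustained_gflops_per_watt():.3f} GFLOPS/W sustained")

# Ratio of the two sustained GFLOPS/W figures (FPGA over microprocessor).
advantage = (VIRTEX_2VP100.sustained_gflops_per_watt()
             / ITANIUM_2.sustained_gflops_per_watt())
print(f"FPGA advantage in sustained GFLOPS/W: about {advantage:.0f}x")
```

On these figures the FPGA sustains half of its peak against a quarter for the microprocessor, and the sustained GFLOPS per watt ratio comes out at roughly 82 to 1.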
Nallatech’s Adoption of FPGAs for HPEC
[Timeline diagram. Years: 1990, 1993, 1998, 2001. Architecture stages: Microprocessor only; Microprocessor based with FPGA coprocessor; FPGA based with microprocessor coprocessor; FPGA with embedded microprocessor. Company stages: Pre Nallatech; Professional Consulting Services; DIME Product Family (FPGA Centric Computing Architecture); DIME-II Product Family (FPGA Centric Computing Architecture).]
FPGA Computing
The Whole Solution
Hardware Platform
» COTS Hardware
» Modular
System Software
» Multiple-FPGA Systems
» System Management and control
» APIs
Systems Communications
» Inter-FPGA Communication
» Abstracts Hardware Architecture
FPGA Communications and Tool Support
[Diagram: two FPGAs joined by a physical link, with a PCI host connection, communication nodes N1 to N6 and blocks labelled E, R and B. Shown around them: Memory, Processors, Open 3rd Party Component Support, and design flows (VHDL, MATLAB, Viva block flows, C flows).]
Accelerated Hardware Implementation
[Diagram: two FPGAs joined by a physical link, with a PCI host connection, communication nodes N1 to N6 and blocks labelled E, R and B. Shown around them: Memory, Processors, 3rd Party Component Support, and design flows (VHDL, MATLAB, Viva block flows, C flows).]
Commercial Realities for HPC
» Not a Panacea – As with all parallelisation
» Translation of Legacy Code
[Diagram: a C program on the processor split by a uP / FPGA partition into uP-executed and FPGA-executed parts. Labels: Legacy Code Execution Time (100%), FPGA Translated Execution Time (0%), Bandwidth & Latency Considerations.]
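The "not a panacea" point about partitioning legacy code between uP and FPGA can be illustrated with Amdahl's law, which is brought in here as a standard model and is not named in the presentation. The function name, parameter names and the example fractions below are all hypothetical. The transfer term is a crude stand-in for the bandwidth and latency considerations.

```python
def offload_speedup(offloaded_fraction: float,
                    fpga_speedup: float,
                    transfer_fraction: float = 0.0) -> float:
    """Overall speedup when part of a program moves from the uP to an FPGA.

    offloaded_fraction: share of the original run time moved to the FPGA (0..1).
    fpga_speedup: how many times faster that share runs on the FPGA.
    transfer_fraction: extra time spent moving data between uP and FPGA,
        expressed as a fraction of the original run time (an assumption
        standing in for bandwidth and latency costs).
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if fpga_speedup <= 0.0:
        raise ValueError("fpga_speedup must be positive")
    if transfer_fraction < 0.0:
        raise ValueError("transfer_fraction must not be negative")
    remaining_on_up = 1.0 - offloaded_fraction
    new_time = remaining_on_up + offloaded_fraction / fpga_speedup + transfer_fraction
    return 1.0 / new_time


# Hypothetical partitions: even a 100x kernel gives under 10x overall
# when 10% of the run time stays on the microprocessor.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} offloaded at 100x -> "
          f"{offload_speedup(fraction, 100.0):.1f}x overall")
```

The code left on the microprocessor and the cost of moving data quickly dominate, which is why the position of the uP / FPGA partition matters as much as the raw FPGA speedup.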
Commercial Realities for HPC
» Maturity of High Level Languages
  » Good progress has been made in FPGA compilers
  » Often a trade off between performance and ease of use
» Parallelising of code
  » Fine Grain parallelism is critical
  » Still not automatic
  » Taking implied parallelism from serial code is NOT good enough
  » HPC Software engineers are well qualified to deal with this
» Development & Debug Time
  » Comparable to programming in assembler vs C
  » Biggest hurdle is the synthesis time: hours to days!
  » Where tools can make a significant impact, if they have no bugs!
Commercial Realities for HPC
» No Real Industry Standardisation
  » Requires expertise to “brew your own solutions”
  » Difficulty for Beginners
» Bang for Buck – for Floating Point Implementations
  » NRE today WILL be more expensive - ~5-10 times
  » Can approach 100 times performance for 1/2th the Cost
  » Significantly reduced SWAP, >200 times
  » Results in significant Cost of Ownership savings
Nallatech’s Adoption of FPGAs for HPEC
[Timeline diagram. Years: 1990, 1993, 1998, 2001, 2003. Architecture stages: Microprocessor only; Microprocessor based with FPGA coprocessor; FPGA based with microprocessor coprocessor; FPGA with embedded microprocessor; FPGA. Company stages: Pre Nallatech; Professional Consulting Services; DIME Product Family (FPGA Centric Computing Architecture); DIME-II Product Family (FPGA Centric Computing Architecture); FPGA Based HPC Solutions.]
Algorithm Acceleration
Seismic Processing
- Kirchhoff algorithm
- Single Precision Floating Point
- 64 times faster than a 2GHz Pentium 4
- 200 times less power consumption
Smith-Waterman
- Dynamic Programming Algorithm used in Biological Sequencing
- 155 times faster than SunFIRE 280R processing unit
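For readers unfamiliar with Smith-Waterman, a minimal software sketch of its scoring recurrence is given here. It is a plain serial reference, not the FPGA implementation behind the figures above. The scoring values (match +2, mismatch -1, linear gap -1) and the function name are illustrative assumptions. Each cell depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal are independent of each other, which is the kind of fine grain parallelism a hardware pipeline can exploit.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            # Flooring at zero is what makes the alignment local.
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # identical sequences
print(smith_waterman_score("ACGT", "AGT"))   # one gap needed
```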
Algorithm Acceleration
Real Time Video Processing
- Single Precision Floating Point calculations
- 36 GFlops + 40 GOPs sustained performance on a single PCI card
- >200 times power reduction over Xeon
Gravity Simulation
- N-Body computation
- Single Precision Floating Point
- 20 GFlops sustained performance
- 100 times faster than 2.4GHz Pentium 4 CPU
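The N-body computation behind the gravity simulation can be sketched as a direct pairwise summation. This is a generic textbook formulation in Python, not Nallatech's implementation. The function name, the unit gravitational constant and the small softening term (added to avoid division by zero for coincident bodies) are assumptions for illustration.

```python
import math


def accelerations(positions, masses, g=1.0, softening=1e-9):
    """Gravitational acceleration on every body by direct O(N^2) summation.

    positions: list of (x, y, z) tuples; masses: list of floats.
    g and softening are illustrative parameters, not from the presentation.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += g * masses[j] * d[k] * inv_r3
    return acc


# Two unit masses one unit apart attract each other equally and oppositely.
print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
```

Every pair interaction is an independent run of multiplies, adds and one reciprocal square root, which is why the inner loop maps well onto many parallel floating point pipelines.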
A Young Solar System
Colliding Galaxies
FPGA Coprocessor
Vs
FPGA Main Processor
FPGAs as Coprocessors
» Accelerator / Offload Engine for Microprocessor based solutions
» Advantages
  » Easy to conceptualise
  » Pragmatic Approach
  » Possible large performance improvement for least effort
  » Port small functions of Compute intensive Legacy Code
  » Rest of code remains on Existing Host
  » Benefit from Existing Host interfaces
» Disadvantages
  » Only Applicable to certain functions
  » Need to consider bandwidth / latency requirements
FPGAs as Main Processor
» The FPGA takes on the complete Compute Function
» Advantages
  » Build the Computing Architecture around Algorithmic Problem
  » Can provide another order of magnitude increase in performance
  » Can go back to First principles
  » Don’t have to port Optimised processor code to FPGA
» Disadvantages
  » Rarely Start with a clean sheet of paper
  » Tool Maturity
  » Only practical for relatively straightforward algorithms
Main Vs Co - Recommendations
» Co-processor approach is most applicable Today:
  » For HPC
  » Whenever a Man Machine interface is required
  » Whenever low performance industry standard interfaces are required
  » If you need to work with legacy code
  » Quick wins
» A Main Processor Approach is recommended Today:
  » For Stand alone Embedded applications (HPEC)
  » When starting with a clean sheet of paper
  » If ultimate performance is a pre-requisite
  » The best power/performance ratio is required from your system
  » Relaxed development times
Optimizing Spatial &
Temporal Demands
of Computing Problems
Spatial & Temporal Definitions
» There are several perspectives on the meaning of spatial and temporal
1. Cluster of Microprocessors
  » Temporal = Function runs within one processor
  » Spatial = Function spread across many microprocessor nodes
2. Traditional Embedded Computing Hardware
  » Temporal = using Microprocessor
  » Spatial = Implementing dedicated ASIC accelerators
3. FPGAs
  » Temporal = using same logic resources for multiple Functions
  » Spatial = Paralleling and pipelining a function across the FPGA Fabric
» Ultimate Aim is to ensure that no processor goes idle
» And you utilise all the available resources
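The FPGA sense of temporal versus spatial can be made concrete with an idealised cycle count. The model below is a sketch under stated assumptions: a function is split into equal stages of one clock cycle each, there are no stalls, and the function names are hypothetical. A temporal implementation reuses one block of logic for every stage. A spatial implementation lays each stage out in its own logic as a pipeline.

```python
def temporal_cycles(n_items: int, n_stages: int) -> int:
    """Cycles when one shared logic block executes every stage of every item."""
    return n_items * n_stages


def spatial_cycles(n_items: int, n_stages: int) -> int:
    """Cycles for a fully pipelined layout with one logic block per stage.

    The first result appears after n_stages cycles (pipeline fill),
    then one further result completes every cycle.
    """
    if n_items == 0:
        return 0
    return n_stages + n_items - 1


# Hypothetical workload: 1000 data items through an 8-stage function.
items, stages = 1000, 8
print(f"temporal: {temporal_cycles(items, stages)} cycles using 1 logic block")
print(f"spatial:  {spatial_cycles(items, stages)} cycles using {stages} logic blocks")
```

In this idealised case the spatial layout spends roughly eight times the logic to finish in roughly one eighth of the cycles, and every stage stays busy once the pipeline is full.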
Complexity and Speed
» Any Application will be constructed from several functions
» Each Function will have varying degrees of complexity
» Each Function will also have varying demands on its
execution time
[Diagram: functions placed by complexity (Simple to Complex) and required speed (Must Execute Quickly to Can Execute Slowly), in regions labelled Low Compute, Medium Compute and Compute Intensive. Three implementation styles are compared:]
» Fully spatial implementation: Traditional = ASIC; FPGA = All parallel / pipelined
» Balanced spatial and temporal implementation: Traditional = Vector; FPGA = Partially Parallel, Partial reuse
» Fully Temporal Implementation: Traditional = Microprocessor; FPGA = Soft Microprocessor
Homogeneous Computing
Versus
Heterogeneous Computing
Versus
Polymorphic Computing
Direction of Computing
[Diagram of four examples:]
» Intel – 16 processor per die: Homogeneous on Silicon
» Cell Processor – IBM, Sony, Toshiba (a PPC plus Cell 1 to Cell 8): Heterogeneous Computing on Silicon
» SGI – Heterogeneous Architecture (uP, GP, ASSP and FPGA around Global Shared Memory)
» Polymorphic Computing (FPGA)
[Chart: SWEPT Efficiency (Size, Weight, Energy, Performance, Time) against Application Data Types (Bit Level, Vector/Streaming, Symbolic). Plotted: FPGA, Processor, Polymorphic Computing.]
[Chart: SWEPT Efficiency (Size, Weight, Energy, Performance, Time) against Application Data Types (Bit Level, Vector/Streaming, Symbolic). Plotted: FPGA, Processor, Cell Processor, RP (e.g. Clearspeed, Elixent, Picochip), Polymorphic Computing. Annotations: Support for DSP, Support for FPU?]
What is a Polymorphic Processor?
[Diagram: Microprocessor, DSP Unit, Floating Point Operator, Integer Operator, Logical Operator.]
A Polymorphic FPGA
» FPGA is the closest concept to Polymorphic computing
» It can morph into the different operators
» However it cannot perform them all with equal efficiency
» Is a polymorphic coarse grained FPGA possible?
[Diagram legend: one symbol = Polymorphic processing element, the other = Traditional FPGA Fabric.]
A polymorphic processing element can morph into:
» Microprocessor
» Integer Operator
» DP/SP Floating Point Operator
» Logical Operator
» Text Operator
Coarse Grain Architectures
Versus
Fine Grain Architectures
Coarse vs Fine Grain
» Cluster = Ultimate in coarse grain parallelism
» ASIC = Ultimate in fine grain parallelism
» FPGA = Programmable fine grain parallelism
» The finer the grain, the more you can make the architecture exactly fit the problem.
» However, fine grain programmable FPGAs are a sub-optimal solution as they suffer from
  » An inefficient transistor utilisation on coarser grain operations
  » A slower clock frequency that could be improved with coarser granularity
Distributed Parallel Processing (DPP)
Vs
Cluster Parallel Processing (CPP)
DPP & CPP Definitions
Distributed Parallel Processing
[Block diagram: processing stages (Sensor Interface, Gain + Offset Control, Convolution, Image Composition, Background Image Processing, Target Image Generation) built from groups of T9, C80, I860 and FPGA nodes.]
» Processing Power is distributed to where it is needed
» Direct Communications built as needed
» Computing Architecture designed to fit the Application
Cluster Parallel Processing
[Diagram: identical server nodes attached to a Communications Infrastructure.]
» Regular processor Architecture
» Regular communications Infrastructure
» Application must be designed to fit the computer architecture
Application Implementation
» Example application consisting of 8 algorithms
» Need to map onto hardware for real-time implementation
» Algorithms each have different characteristics
[Diagram: Algorithm A, Algorithm B, Algorithm C, Algorithm D, Algorithm E, Algorithm F, Algorithm G, Algorithm H.]
Fitting Application to Cluster
[Diagram: the application's Algorithms A to H mapped onto a cluster whose nodes are joined by a Communications Infrastructure.]
Fitting Application to
Distributed FPGA Computer
» VME Blade form-factor
» Five high-density platform FPGAs
» High-speed external analog interfaces
» High-speed synchronous SRAM memory
» Gigabit Ethernet interface
Same Application Implemented
on FPGAs
[Diagram: Algorithms A to H spread across the five FPGAs of the VME blade, with a GBit Ethernet connection. Design flows shown: VHDL, Verilog, C (MicroBlaze uP), C (PicoBlaze uP), MATLAB or Simulink.]
Communications network to
connect algorithms
[Diagram: the FPGAs on the blade joined by a communications network. Blocks labelled N, R, B and E link the FPGAs to each other and to VME and GBit Ethernet.]
So, should Von Neumann and Harvard Architectures be retired?
Should they Retire?
» Von Neumann and Harvard provide highly efficient use of silicon real estate whilst still being capable of executing any computational function.
» Therefore perhaps they should still live on
» However this will be less and less in a hard chip implementation
» The intelligent compiler will instantiate a Von Neumann or Harvard like architecture when it is the most efficient way to execute an algorithm
Von Neumann and Harvard will live on as part of the intelligence within tomorrow’s compilers.
Summary
» FPGA computing is not new
  » 12 Years accelerating maths functions
» Floating Point & Tools make FPGAs viable for the HPC Community
» No coherent Industry Standardisation
» Code development WILL take longer
» Significant potential savings
  » Price/Performance
  » SWAP
  » Cost of Ownership
And Finally……………
SGI & Nallatech have formally agreed on a Strategic Collaborative Arrangement
This brings together 12 years of expertise in delivering real FPGA computing solutions from Nallatech with the Global Shared Memory MPP computing from SGI.
Customers now have a path to scale from a commodity cluster with FPGAs all the way up to a massive HPC system with thousands of processors and thousands of FPGAs.
Thank You for your attention
www.nallatech.com
Copyright © 2005 Nallatech Limited. All rights reserved. Nallatech, the Nallatech logo, the triangles device and “The High Performance FPGA Solutions Company” are trademarks of Nallatech Limited. All other trademarks acknowledged.