CSL718 : Architecture of High Performance Systems Introduction 9th January, 2006

CSL718 : Architecture of
High Performance Systems
Introduction
9th January, 2006
High Performance Architectures
• Who needs high performance systems?
• How do you achieve high performance?
• How to analyse or evaluate performance?
Anshul Kumar, CSE IITD
slide 2
Outline
• Classification
• ILP Architectures
• Data Parallel Architectures
• Process level Parallel Architectures
• Issues in parallel architectures
• Cache coherence problem
• Interconnection networks
slide 3
Outline
• Classification
– Flynn’s [66]
– Feng’s [72]
– Händler’s [77]
– Modern (Sima, Fountain & Kacsuk)
• ILP Architectures
• Data Parallel Architectures
• Process level Parallel Architectures
• Issues in parallel architectures
• Cache coherence problem
• Interconnection networks
slide 4
Flynn’s Classification
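Architecture Categories
• SISD: single instruction stream, single data stream
• SIMD: single instruction stream, multiple data streams
• MISD: multiple instruction streams, single data stream
• MIMD: multiple instruction streams, multiple data streams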
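The four categories differ only in whether a machine has one or several instruction streams and one or several data streams. A minimal sketch of that rule follows; the function name `flynn_category` and its parameters are illustrative assumptions, not part of the lecture.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    # "S" for a single stream, "M" for multiple streams.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 64), (4, 1), (4, 4)]:
        print(streams, flynn_category(*streams))
```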
slide 5
SISD
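[Figure: SISD organization. One control unit (C) sends an instruction stream (IS) to one processor (P), which exchanges a data stream (DS) with memory (M).]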
slide 6
SIMD
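[Figure: SIMD organization. One control unit (C) sends a single instruction stream (IS) to several processors (P), each with its own data stream (DS) to memory (M).]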
slide 7
MISD
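[Figure: MISD organization. Several control units (C), each sending its own instruction stream (IS) to a processor (P); the processors are connected to memory (M) by data stream (DS) paths.]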
slide 8
MIMD
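[Figure: MIMD organization. Several control units (C), each sending its own instruction stream (IS) to a processor (P); each processor has its own data stream (DS) to memory (M).]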
slide 9
Feng’s Classification
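[Figure: Feng’s classification. Machines plotted by word length (horizontal axis, ticks at 1, 16, 32, 64) against bit slice length (vertical axis, ticks at 1, 16, 64, 256, 16K). Machines shown: MPP, STARAN, PEPE, Illiac IV, C.mmP, PDP-11, IBM 370, CRAY-1.]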
slide 10
Händler’s Classification
< K x K’ , D x D’ , W x W’ >
K: control, D: data, W: word
The dashed (primed) terms K’, D’, W’ give the degree of pipelining.

Machine     Description
TI-ASC      <1, 4, 64 x 8>
CDC 6600    <1, 1 x 10, 60> x <10, 1, 12> (I/O)
C.mmP       <16, 1, 16> + <1 x 16, 1, 16> + <1, 16, 16>
PEPE        <1 x 3, 288, 32>
Cray-1      <1, 12 x 8, 64 x (1 ~ 14)>
slide 11
Modern Classification
Parallel architectures
• Data-parallel architectures
• Function-parallel architectures
slide 12
Data Parallel Architectures
Data-parallel architectures
• Vector architectures
• Associative and neural architectures
• SIMDs
• Systolic architectures
slide 13
Function Parallel Architectures
Function-parallel architectures
• Instruction level parallel architectures (ILPs)
– Pipelined processors
– VLIWs
– Superscalar processors
• Thread level parallel architectures
• Process level parallel architectures (MIMDs)
– Distributed memory MIMD
– Shared memory MIMD
slide 14
Outline
• Classification
• ILP Architectures
– Pipelining
– VLIW
– Superscalar
• Data Parallel Architectures
• Process level Parallel Architectures
• Issues in parallel architectures
• Cache coherence problem
• Interconnection networks
slide 15
Pipelining
Simple multicycle design:
• resource sharing across cycles
• not all instructions take the same number of cycles
Pipeline stages: IF, D, RF, EX/AG, M, WB
• faster throughput with pipelining
slide 16
Hazards in Pipelining
• Procedural dependencies => Control hazards
– conditional and unconditional branches, calls/returns
• Data dependencies => Data hazards
– RAW (read after write)
– WAR (write after read)
– WAW (write after write)
• Resource conflicts => Structural hazards
– use of same resource in different stages
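The three data hazard types can be told apart by comparing the registers that two instructions read and write. The sketch below assumes a simplified instruction model with one destination register and a tuple of source registers; the names `Instr` and `data_hazards` are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction


def data_hazards(first: Instr, second: Instr) -> list:
    """List the data dependencies of `second` on an earlier instruction `first`."""
    hazards = []
    if first.dest in second.srcs:
        hazards.append("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        hazards.append("WAR")  # second overwrites what first still reads
    if first.dest == second.dest:
        hazards.append("WAW")  # both write the same register
    return hazards


if __name__ == "__main__":
    add = Instr("r1", ("r2", "r3"))   # r1 = r2 + r3
    sub = Instr("r4", ("r1", "r5"))   # r4 = r1 - r5
    print(data_hazards(add, sub))     # ['RAW']
```

Whether a dependency actually stalls the pipeline depends on the pipeline organization; this sketch only detects the dependency itself.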
slide 17
Pipeline Performance
• T: total time of one instruction through the pipeline, split into S stages (cycle time T / S)
• b: frequency of interruptions
CPI = 1 + (S - 1) * b
Time = CPI * T / S
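As a quick numeric check of the two formulas, the sketch below evaluates them for sample values. The function names and the sample numbers (5 stages, b = 0.2, T = 10 ns) are illustrative assumptions, not figures from the lecture.

```python
def pipeline_cpi(stages: int, b: float) -> float:
    """CPI = 1 + (S - 1) * b, where b is the frequency of interruptions."""
    return 1 + (stages - 1) * b


def time_per_instruction(total_time: float, stages: int, b: float) -> float:
    """Time = CPI * T / S, with T / S taken as the cycle time."""
    return pipeline_cpi(stages, b) * total_time / stages


if __name__ == "__main__":
    # Assumed sample values: S = 5 stages, b = 0.2, T = 10 ns.
    print(pipeline_cpi(5, 0.2))                 # about 1.8 cycles per instruction
    print(time_per_instruction(10.0, 5, 0.2))   # about 3.6 ns per instruction
```

With b = 0 the CPI is 1 and the time per instruction falls to T / S, the ideal pipelined case.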
slide 18
ILP in VLIW processors
[Figure: VLIW organization. The fetch unit reads a single multi-operation instruction from cache/memory; its operations go to several functional units (FU) that share one register file.]
slide 19
ILP in Superscalar processors
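[Figure: Superscalar organization. The fetch unit reads a sequential stream of instructions from cache/memory; the decode and issue unit issues multiple instructions to several functional units (FU) that share one register file. Legend: instruction/control paths, data paths, FU = functional unit.]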
slide 20
Why are superscalars popular?
• Binary code compatibility among scalar and superscalar processors of the same family
• The same compiler works for all processors (scalar and superscalar) of the same family
• Assembly programming of VLIWs is tedious
• Code density in VLIWs is very poor - instruction encoding schemes
slide 21
Issues in VLIW Architecture
[Figure: several functional units (FU) sharing one register file.]
• Instruction encoding
• Scalability: access time, area and power consumption increase sharply with the number of register ports
slide 22
Tasks of superscalar processing
• Parallel decoding
• Superscalar instruction issue
• Parallel instruction execution
• Preserving the sequential consistency of execution
• Preserving the sequential consistency of exception processing
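To make the issue task concrete, here is a toy sketch of in-order superscalar issue. It packs consecutive instructions into issue groups of at most `width`, and closes a group early when an instruction reads or writes a register already written in that group. The instruction model (one destination, a tuple of sources) and the name `issue_groups` are assumptions for illustration; real issue logic also handles resource conflicts, branches and exceptions.

```python
def issue_groups(instrs, width):
    """Greedy in-order issue: return the list of per-cycle issue groups."""
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        # RAW or WAW dependency on an instruction already in this group.
        conflict = dest in written or any(s in written for s in srcs)
        if current and (len(current) == width or conflict):
            groups.append(current)          # close the group, start a new cycle
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        ("r1", ("r2", "r3")),
        ("r4", ("r1", "r5")),   # depends on r1, so it cannot issue with the first
        ("r6", ("r7", "r8")),
        ("r9", ("r6", "r4")),
    ]
    for cycle, group in enumerate(issue_groups(program, width=2)):
        print(cycle, [dest for dest, _ in group])
```

Because instructions leave in program order and dependent ones never share a group, the sequential meaning of the program is preserved in this simplified model.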
slide 23
Outline
• Classification
• ILP Architectures
• Data Parallel Architectures
– SIMD Processors
– Vector Processors
– Associative Processors
– Systolic Arrays
• Process level Parallel Architectures
• Issues in parallel architectures
• Cache coherence problem
• Interconnection networks
slide 24
Data Parallel Architectures
• SIMD Processors
– Multiple processing elements driven by a single
instruction stream
• Vector Processors
– Uni-processors with vector instructions
• Associative Processors
– SIMD like processors with associative memory
• Systolic Arrays
– Application specific VLSI structures
slide 25
Systolic Arrays [H.T. Kung 1978]
Simplicity, Regularity, Concurrency, Communication
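Example: band matrix multiplication

\[
C =
\begin{pmatrix}
A_{11} & A_{12} & 0 & 0 & 0 & 0 \\
A_{21} & A_{22} & A_{23} & 0 & 0 & 0 \\
A_{31} & A_{32} & A_{33} & A_{34} & 0 & 0 \\
0 & A_{42} & A_{43} & A_{44} & A_{45} & 0 \\
0 & 0 & A_{53} & A_{54} & A_{55} & A_{56} \\
0 & 0 & 0 & A_{64} & A_{65} & A_{66}
\end{pmatrix}
\times
\begin{pmatrix}
B_{11} & B_{12} & 0 & 0 & 0 & 0 \\
B_{21} & B_{22} & B_{23} & 0 & 0 & 0 \\
B_{31} & B_{32} & B_{33} & B_{34} & 0 & 0 \\
0 & B_{42} & B_{43} & B_{44} & B_{45} & 0 \\
0 & 0 & B_{53} & B_{54} & B_{55} & B_{56} \\
0 & 0 & 0 & B_{64} & B_{65} & B_{66}
\end{pmatrix}
\]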
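The arithmetic behind the band matrix example can be sketched in plain Python. This is not a simulation of the systolic array; it only shows that the product needs the non-zero band entries alone, which is the regularity a systolic design exploits. The helper names and the choice of sample entries (10 * row + column) are assumptions for illustration. The band shape, two sub-diagonals and one super-diagonal, matches the 6 x 6 matrices of the example.

```python
def make_band(n, lower, upper):
    """n x n band matrix with entry 10*(i+1)+(j+1) inside the band, 0 outside."""
    return [[10 * (i + 1) + (j + 1) if -upper <= i - j <= lower else 0
             for j in range(n)] for i in range(n)]


def band_matmul(a, b, lower, upper):
    """Multiply two band matrices of the same shape, visiting band entries only."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        # a[i][k] is non-zero only for i - lower <= k <= i + upper.
        for k in range(max(0, i - lower), min(n, i + upper + 1)):
            # b[k][j] is non-zero only for k - lower <= j <= k + upper.
            for j in range(max(0, k - lower), min(n, k + upper + 1)):
                c[i][j] += a[i][k] * b[k][j]
    return c


def naive_matmul(a, b):
    """Reference product over all entries, used only to check the result."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    a = make_band(6, lower=2, upper=1)
    b = make_band(6, lower=2, upper=1)
    c = band_matmul(a, b, lower=2, upper=1)
    print(c == naive_matmul(a, b))
    for row in c:
        print(row)
```

The product is again a band matrix, with up to four sub-diagonals and two super-diagonals.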
slide 26
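[Figure: systolic array at T=0, showing elements of A (A11, A12, A21, A22, A23, A31) and of B (B11, B12, B21, B31) positioned around the array.]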
Outline
• Classification
• ILP Architectures
• Data Parallel Architectures
• Process level Parallel Architectures
– MIMD Processors (Shared Memory, Distributed Memory)
• Issues in parallel architectures
• Cache coherence problem
• Interconnection networks
slide 28
Why Process level Parallel Architectures?
• Data-parallel architectures
• Function-parallel architectures
– Instruction level PAs
– Thread level PAs
– Process level PAs (MIMDs): Distributed Memory MIMD, Shared Memory MIMD
Built using general purpose processors
slide 29
MIMD Architectures
Design Space
• Extent of address space sharing
• Location of memory modules
• Uniformity of memory access
slide 30
Outline
• Classification
• ILP Architectures
• Data Parallel Architectures
• Process level Parallel Architectures
• Issues in parallel architectures
– User’s perspective
– Architect’s perspective
• Cache coherence problem
• Interconnection networks
slide 31
Issues from user’s perspective
• Specification / Program design
– explicit parallelism or
– implicit parallelism + parallelizing compiler
• Partitioning / mapping to processors
• Scheduling / mapping to time instants
– static or dynamic
• Communication and Synchronization
slide 32
Parallel programming models
• Concurrent control flow
• Functional or logic program
• Vector/array operations
• Concurrent tasks/processes/threads/objects
– with shared variables or message passing
Relationship between programming model and architecture?
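The difference between shared variables and message passing can be sketched with Python threads. Both functions below sum a list of chunks concurrently: the first updates one shared variable under a lock, the second has each worker send its partial sum as a message through a queue. The function names are hypothetical, and threads stand in for the tasks or processes of a real parallel machine.

```python
import queue
import threading


def sum_shared(chunks):
    """Workers add their partial sums into one shared variable, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:              # synchronization protects the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Workers share nothing; each sends its partial sum as a message."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # communication replaces the shared variable

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(sum_shared(data), sum_message_passing(data))  # 21 21
```

The shared-variable version maps naturally onto shared memory machines and the message-passing version onto distributed memory machines, although either model can be implemented on either architecture.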
slide 33
Issues from architect’s perspective
• Coherence problem in shared memory with
caches
• Efficient interconnection networks
slide 34
Outline
• Classification
• ILP Architectures
• Data Parallel Architectures
• Process level Parallel Architectures
• Issues in parallel architectures
• Cache coherence problem
– Coherence Protocols: bus or directory based; invalidate or update; definition of states
• Interconnection networks
slide 35
Cache Coherence Problem
Multiple copies of data may exist
=> Problem of cache coherence
Options for coherence protocols
• What action is taken?
– Invalidate or Update
• Which processors/caches communicate?
– Snoopy (broadcast) or directory based
• Status of each block?
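The invalidate and update options can be contrasted with a toy model. The sketch below assumes write-through caches on a broadcast (snoopy) bus and tracks no per-block states, so it is far simpler than any real protocol; the class name `ToyBus` and its methods are hypothetical.

```python
class ToyBus:
    """Caches on a broadcast bus with write-through to memory (toy model)."""

    def __init__(self, n_caches, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}
        self.caches = [dict() for _ in range(n_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                      # miss: fetch from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        self.memory[addr] = value                  # write-through (assumption)
        self.caches[cpu][addr] = value
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache:
                continue
            if self.policy == "invalidate":
                del cache[addr]                    # other copies are discarded
            else:
                cache[addr] = value                # other copies are refreshed


if __name__ == "__main__":
    for policy in ("invalidate", "update"):
        bus = ToyBus(2, policy)
        bus.read(0, "x")
        bus.read(1, "x")
        bus.write(0, "x", 7)
        print(policy, "x" in bus.caches[1], bus.read(1, "x"))
```

Under either policy the second processor reads the new value; the policies differ in whether its stale copy is dropped and re-fetched on the next read, or refreshed at the time of the write.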
slide 36
Outline
• Classification
• ILP Architectures
• Data Parallel Architectures
• Process level Parallel Architectures
• Issues in parallel architectures
• Cache coherence problem
• Interconnection networks
– Switching and control
– Topology
slide 37
Interconnection Networks
• Architectural Variations:
– Topology
– Direct or Indirect (through switches)
– Static (fixed connections) or Dynamic (connections
established as required)
– Routing type (store and forward / wormhole)
• Efficiency:
– Delay
– Bandwidth
– Cost
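The effect of routing type on delay can be illustrated with a common first-order textbook model, which is an assumption here and not taken from the lecture. Store and forward routing receives the whole packet at every hop, whereas wormhole routing forwards small flits as soon as they arrive. The model ignores routing decisions and contention, and the function and parameter names are hypothetical.

```python
def store_and_forward_latency(hops, packet_bits, bandwidth):
    """Each hop receives the full packet before forwarding it."""
    return hops * packet_bits / bandwidth


def wormhole_latency(hops, packet_bits, flit_bits, bandwidth):
    """The header flit pays the per-hop cost; the rest of the packet pipelines behind it."""
    return hops * flit_bits / bandwidth + packet_bits / bandwidth


if __name__ == "__main__":
    # Assumed sample values: 4 hops, 1024-bit packet, 32-bit flits, 32 bits per cycle.
    print(store_and_forward_latency(4, 1024, 32))   # 128.0 cycles
    print(wormhole_latency(4, 1024, 32, 32))        # 36.0 cycles
```

In this model the store and forward delay grows with the product of distance and packet size, while the wormhole delay grows roughly with their sum.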
slide 38
Books
• D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
• M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
• D.A. Patterson, J.L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 2002.
• K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.
• H.G. Cragon, "Memory Systems and Pipelined Processors", Narosa Publishing House / Jones and Bartlett, 1998.
• D.E. Culler, J.P. Singh, A. Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Harcourt Asia / Morgan Kaufmann Publishers, 2000.
slide 39