Transcript CSE 574 Parallel Processing
Lecture 2: Parallel Computers
Levels of Parallelism
Bit level parallelism
• Within arithmetic logic circuits
Instruction level parallelism
• Multiple instructions execute per clock cycle
Memory system parallelism
• Overlap of memory operations with computation
Operating system parallelism
• Multiple jobs run in parallel on SMP
Loop level parallelism
• Independent iterations of a loop execute in parallel
Procedure level parallelism
• Independent procedures or tasks execute in parallel
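Loop level parallelism is the easiest of these to sketch in code: if the iterations of a loop are independent, they can be farmed out to parallel workers. A minimal Python sketch using a thread pool (the function `square` and the worker count are illustrative choices, not part of the lecture):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Each iteration is independent of the others, so the loop
    # "for x in range(8): square(x)" can run its iterations in parallel.
    return x * x

def parallel_squares(n, workers=4):
    # Distribute the n independent iterations across the worker pool
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(square, range(n)))

print(parallel_squares(8))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note that in CPython the global interpreter lock limits the actual speedup for pure-Python arithmetic; the sketch shows the structure of loop level parallelism, not a performance claim.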
Flynn’s Taxonomy
Single Instruction stream - Single Data stream (SISD)
Single Instruction stream - Multiple Data stream (SIMD)
Multiple Instruction stream - Single Data stream (MISD)
Multiple Instruction stream - Multiple Data stream (MIMD)
Single Instruction stream Single Data stream (SISD)
[Figure: von Neumann architecture: a single processor fetches an instruction stream and a data stream from memory]
Single Instruction stream Multiple Data stream (SIMD)
[Figure: a control unit (CU) broadcasts one instruction stream to multiple processing elements (PEs), each with its own data]
• Instructions of the program are broadcast to more than one processor
• Each processor executes the same instruction synchronously, but using different data
• Used for applications that operate upon arrays of data
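The SIMD pattern, one instruction applied lane by lane across an array, can be mimicked in plain Python (each list element stands in for one PE's local data; real SIMD hardware applies the operation to all lanes in the same clock):

```python
def simd_add(a, b):
    # The same "instruction" (addition) is broadcast to every lane;
    # each lane holds different data, like a SIMD processing element.
    return [x + y for x, y in zip(a, b)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```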
Multiple Instruction stream Multiple Data stream (MIMD)
Each processor has a separate program
An instruction stream is generated for each program on each processor
Each instruction operates upon different data
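A sketch of the MIMD model using Python threads as stand-in processors: each runs a different program (here, a different function) on its own data, and the two execute asynchronously. The task names and data are illustrative:

```python
import threading

results = {}

def sum_task(data):      # "program" for processor 1
    results["sum"] = sum(data)

def max_task(data):      # a different "program" for processor 2
    results["max"] = max(data)

# Each thread has its own instruction stream and its own data,
# and the two run asynchronously with respect to each other.
t1 = threading.Thread(target=sum_task, args=([1, 2, 3],))
t2 = threading.Thread(target=max_task, args=([9, 4, 7],))
t1.start(); t2.start()
t1.join(); t2.join()
print(results["sum"], results["max"])  # 6 9
```

Because the two threads share the `results` dictionary, this particular sketch is the shared-memory flavor of MIMD described next.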
Multiple Instruction stream Multiple Data stream (MIMD)
Shared memory
Distributed memory
Shared vs Distributed Memory
[Figure: shared memory: processors (P) connected by a bus to a common memory; distributed memory: each processor has its own local memory (M) and communicates over a network]
Shared memory
• Single address space
• All processes have access to the pool of shared memory
Distributed memory
• Each processor has its own local memory
• Message-passing is used to exchange data between processors
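The distributed-memory discipline can be sketched with queues acting as the "network": the worker never touches the sender's variables and data arrives only as messages. Threads stand in for processors here; a real distributed-memory system would use a message-passing library such as MPI:

```python
import threading
import queue

def worker(inbox, outbox):
    data = inbox.get()       # receive: the only way data arrives
    outbox.put(sum(data))    # send the result back over the "network"

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
inbox.put([1, 2, 3, 4])      # explicit send to the other "processor"
result = outbox.get()        # explicit receive of its reply
t.join()
print(result)  # 10
```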
[Figure: MIMD: each processing element (PE) has its own control unit (CU), its own instruction stream, and its own data]
Each processor executes different instructions asynchronously, using different data
Shared Memory
[Figure: UMA: four processors (P) sharing one memory over a bus]
Uniform memory access (UMA)
• Each processor has uniform access to memory (symmetric multiprocessor - SMP)
[Figure: NUMA: two bus-based SMP nodes, each with its own memory, connected by a network]
Non-uniform memory access (NUMA)
• Time for memory access depends on the location of data
• Local access is faster than non-local access
• Easier to scale than SMPs
Distributed Memory
[Figure: nodes, each with memory (M), a processor (P), and a network interface (NI), connected by a network]
A processor cannot directly access another processor's memory
Each node has a network interface (NI) for communication and synchronization
Distributed Memory
Massively parallel processors (MPP)
• Tightly integrated
• Single system image (SSI)
Cluster
• Individual computers connected by software
Interconnection Networks
[Figure: topologies: 2-D Mesh, 2-D Torus, 3-D Hypercube]
Interconnection Networks
Latency: How long does it take to start sending a "message"? (in microseconds)
Bandwidth: What data rate can be sustained once the message is started? (in Mbytes/sec)
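These two numbers combine in the usual first-order cost model: message time = latency + message size / bandwidth. A small sketch (the numbers below are illustrative, not measurements of any particular network):

```python
def message_time_us(size_bytes, latency_us, bandwidth_mb_s):
    # Time = startup latency + transmission time at the sustained rate.
    # With bandwidth in Mbytes/sec, 1 Mbyte/sec is exactly 1 byte per
    # microsecond, so size_bytes / bandwidth_mb_s is already microseconds.
    return latency_us + size_bytes / bandwidth_mb_s

# 1 Mbyte message, 10 us latency, 100 Mbytes/sec bandwidth:
print(message_time_us(1_000_000, 10, 100))  # 10010.0 us: bandwidth-dominated
# 100-byte message on the same link:
print(message_time_us(100, 10, 100))        # 11.0 us: latency-dominated
```

The comparison shows why latency matters for short messages and bandwidth for long ones.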
Distributed Shared Memory
Making the main memory of a cluster of computers look as if it is a single memory with a single address space
Shared memory programming techniques can be used
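What "shared memory programming techniques" means in practice: multiple workers read and update the same variable directly, using a lock for synchronization rather than explicit messages. A minimal sketch with Python threads (the counter and the worker count are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:           # synchronize access to the shared variable
            counter += 1

# Four workers all update the same memory location
threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

Without the lock, the interleaved read-modify-write updates could be lost; the lock is what the shared address space makes both necessary and possible.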
Different Architectures
Parallel computing:
single systems with many processors working on the same problem
Distributed computing:
many systems loosely coupled by a scheduler to work on related problems
Grid Computing:
many systems tightly coupled by software, perhaps geographically distributed, to work together on single problems or on related problems