CSE 574 Parallel Processing

Lecture 2: Parallel Computers

Levels of Parallelism

Bit level parallelism

• Within arithmetic logic circuits

Instruction level parallelism

• Multiple instructions execute per clock cycle

Memory system parallelism

• Overlap of memory operations with computation

Operating system parallelism

• Multiple jobs run in parallel on an SMP

Loop level (see the sketch after this list)

Procedure level
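As an illustration of loop-level parallelism (a minimal sketch, not from the slides; it assumes a C compiler with OpenMP support, e.g. gcc -fopenmp), the iterations of a loop are divided among threads:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* Set up the input arrays. */
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Loop-level parallelism: independent iterations are split across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[%d] = %f\n", N - 1, c[N - 1]);
        return 0;
    }

Each iteration is independent of the others, so any number of threads can execute different iterations at the same time.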

Flynn’s Taxonomy

Single Instruction stream - Single Data stream (SISD)

Single Instruction stream - Multiple Data stream (SIMD)

Multiple Instruction stream - Single Data stream (MISD)

Multiple Instruction stream - Multiple Data stream (MIMD)

Single Instruction stream Single Data stream (SISD)

[Diagram: von Neumann architecture, a single processor receiving one instruction stream and one data stream from memory]

Single Instruction stream Multiple Data stream (SIMD)

[Diagram: one control unit (CU) issues a single instruction stream to several processing elements (PEs), each holding its own data]

Instructions of the program are broadcast to more than one processor

Each processor executes the same instruction synchronously, but using different data

Used for applications that operate upon arrays of data
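Modern CPUs expose this style of execution through vector instructions. A minimal sketch (not from the slides; it assumes an x86 processor with SSE support): one add instruction operates on four floats at once, the same operation applied to different data:

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void) {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        float c[4];

        __m128 va = _mm_loadu_ps(a);    /* load four elements of a */
        __m128 vb = _mm_loadu_ps(b);    /* load four elements of b */
        __m128 vc = _mm_add_ps(va, vb); /* one instruction, four additions */
        _mm_storeu_ps(c, vc);

        printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }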

Multiple Instruction stream Multiple Data stream (MIMD)

Each processor has a separate program

An instruction stream is generated for each program on each processor

Each instruction operates upon different data
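A minimal MIMD-style sketch (not from the slides; it uses POSIX threads and is compiled with -pthread): two threads run different code on different data, with no coordination between their instruction streams:

    #include <stdio.h>
    #include <pthread.h>

    /* First instruction stream: sum one array. */
    void *sum_task(void *arg) {
        int *d = arg, s = 0;
        for (int i = 0; i < 4; i++) s += d[i];
        printf("sum = %d\n", s);
        return NULL;
    }

    /* Second instruction stream: find the maximum of a different array. */
    void *max_task(void *arg) {
        int *d = arg, m = d[0];
        for (int i = 1; i < 4; i++) if (d[i] > m) m = d[i];
        printf("max = %d\n", m);
        return NULL;
    }

    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4] = {7, 5, 9, 2};
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_task, a);  /* different program ... */
        pthread_create(&t2, NULL, max_task, b);  /* ... different data    */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }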

Multiple Instruction stream Multiple Data stream (MIMD)

Shared memory

Distributed memory

Shared vs Distributed Memory

[Diagram: shared memory, processors (P) connected to a single memory over a bus; distributed memory, processor-memory (P-M) nodes connected by a network]

Shared memory

Single address space

All processes have access to the pool of shared memory

Distributed memory

Each processor has its own local memory

Message-passing is used to exchange data between processors
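On the shared-memory side, every thread reads and writes the same address space directly. A minimal sketch (not from the slides; OpenMP, compiled with -fopenmp) in which all threads accumulate into one shared variable; the message-passing counterpart appears under Distributed Memory below:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double sum = 0.0;   /* one variable in the single shared address space */

        /* Each thread adds its share of terms; the reduction combines them. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;

        printf("threads = %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }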

Shared Memory

[Diagram: several control unit / processing element (CU-PE) pairs, each with its own instruction stream and its own data]

Each processor executes different instructions asynchronously, using different data

Shared Memory

[Diagram: UMA, processors (P) sharing one memory over a common bus]

Uniform memory access (UMA)

Each processor has uniform access to memory (symmetric multiprocessor - SMP)

[Diagram: NUMA, two bus-based processor-memory groups connected by a network]

Non-uniform memory access (NUMA)

Time for memory access depends on the location of data

Local access is faster than non local access

Easier to scale than SMPs
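On Linux systems that provide the numactl tool, the local vs. non-local difference can be observed by pinning a program's threads to one node and its memory to another, e.g. numactl --cpunodebind=0 --membind=1 ./app (illustrative only; node numbers and the size of the effect depend on the machine).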

Distributed Memory

[Diagram: nodes, each with a processor (P), local memory (M), and a network interface (NI), connected by a network]

Processors cannot directly access another processor’s memory

Each node has a network interface (NI) for communication and synchronization
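A minimal message-passing sketch (not from the slides; it assumes an MPI implementation such as MPICH or Open MPI and at least two processes, e.g. mpirun -np 2 ./a.out). Because each process has only its own local memory, rank 0 must explicitly send the value and rank 1 must explicitly receive it:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* exists only in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);   /* now a copy in rank 1's memory */
        }

        MPI_Finalize();
        return 0;
    }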

Distributed Memory

Massively parallel processors (MPP)

Tightly integrated

Single system image (SSI)

Cluster

Individual computers connected by software

Interconnection Networks

[Diagram: example topologies, 2-D mesh, 2-D torus, 3-D hypercube]

Interconnection Networks

Latency: How long does it take to start sending a "message"? (in microseconds)

Bandwidth: What data rate can be sustained once the message is started? (in Mbytes/sec)
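As a worked example (an illustrative first-order model, not from the slides), transfer time can be estimated as latency + message size / bandwidth. With 10 microseconds of latency and 100 Mbytes/sec of bandwidth, a 1 Mbyte message takes about 10 us + 10,000 us, roughly 10 ms, so bandwidth dominates; a 100 byte message takes about 10 us + 1 us, so latency dominates.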

Distributed Shared Memory

Making the main memory of a cluster of computers look as if it is a single memory with a single address space

Shared memory programming techniques can be used

Different Architectures

Parallel computing:

single systems with many processors working on the same problem 

Distributed computing:

many systems loosely coupled by a scheduler to work on related problems

Grid Computing:

many systems tightly coupled by software, perhaps geographically distributed, to work together on single problems or on related problems