CSE 574 Parallel Processing

Lecture 2: Parallel Computers

Levels of Parallelism

Bit level parallelism

• Within arithmetic logic circuits

Instruction level parallelism

• Multiple instructions execute per clock cycle

Memory system parallelism

• Overlap of memory operations with computation

Operating system parallelism

• Multiple jobs run in parallel on an SMP

Loop level (see the sketch after this list)

Procedure level
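As an illustration of loop-level parallelism (a minimal sketch, not from the slides; it assumes a C compiler with OpenMP support, e.g. gcc -fopenmp), the iterations of a loop are divided among threads:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];

        /* Set up the input arrays. */
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Loop-level parallelism: independent iterations are split across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[%d] = %f\n", N - 1, c[N - 1]);
        return 0;
    }

Each iteration is independent of the others, so any number of threads can execute different iterations at the same time.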

Flynn’s Taxonomy

Single Instruction stream - Single Data stream (SISD)

Single Instruction stream - Multiple Data stream (SIMD)

Multiple Instruction stream - Single Data stream (MISD)

Multiple Instruction stream - Multiple Data stream (MIMD)

Single Instruction stream Single Data stream (SISD)

[Diagram: von Neumann architecture, a single processor receiving one instruction stream and one data stream from memory]

Single Instruction stream Multiple Data stream (SIMD)

[Diagram: one control unit (CU) issues a single instruction stream to several processing elements (PEs), each holding its own data]

Instructions of the program are broadcast to more than one processor

Each processor executes the same instruction synchronously, but using different data

Used for applications that operate upon arrays of data
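Modern CPUs expose this style of execution through vector instructions. A minimal sketch (not from the slides; it assumes an x86 processor with SSE support): one add instruction operates on four floats at once, the same operation applied to different data:

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void) {
        float a[4] = {1, 2, 3, 4};
        float b[4] = {10, 20, 30, 40};
        float c[4];

        __m128 va = _mm_loadu_ps(a);    /* load four elements of a */
        __m128 vb = _mm_loadu_ps(b);    /* load four elements of b */
        __m128 vc = _mm_add_ps(va, vb); /* one instruction, four additions */
        _mm_storeu_ps(c, vc);

        printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);
        return 0;
    }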

Multiple Instruction stream Multiple Data stream (MIMD)

Each processor has a separate program

An instruction stream is generated for each program on each processor

Each instruction operates upon different data
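A minimal MIMD-style sketch (not from the slides; it uses POSIX threads and is compiled with -pthread): two threads run different code on different data, with no coordination between their instruction streams:

    #include <stdio.h>
    #include <pthread.h>

    /* First instruction stream: sum one array. */
    void *sum_task(void *arg) {
        int *d = arg, s = 0;
        for (int i = 0; i < 4; i++) s += d[i];
        printf("sum = %d\n", s);
        return NULL;
    }

    /* Second instruction stream: find the maximum of a different array. */
    void *max_task(void *arg) {
        int *d = arg, m = d[0];
        for (int i = 1; i < 4; i++) if (d[i] > m) m = d[i];
        printf("max = %d\n", m);
        return NULL;
    }

    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4] = {7, 5, 9, 2};
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_task, a);  /* different program ... */
        pthread_create(&t2, NULL, max_task, b);  /* ... different data    */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }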

Multiple Instruction stream Multiple Data stream (MIMD)

Shared memory

Distributed memory

Shared vs Distributed Memory

[Diagram: shared memory, processors (P) connected to a single memory over a bus; distributed memory, processor-memory (P-M) nodes connected by a network]

Shared memory

Single address space

All processes have access to the pool of shared memory

Distributed memory

Each processor has its own local memory

Message-passing is used to exchange data between processors
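On the shared-memory side, every thread reads and writes the same address space directly. A minimal sketch (not from the slides; OpenMP, compiled with -fopenmp) in which all threads accumulate into one shared variable; the message-passing counterpart appears under Distributed Memory below:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double sum = 0.0;   /* one variable in the single shared address space */

        /* Each thread adds its share of terms; the reduction combines them. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;

        printf("threads = %d, sum = %f\n", omp_get_max_threads(), sum);
        return 0;
    }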

Shared Memory

[Diagram: several control unit / processing element (CU-PE) pairs, each with its own instruction stream and its own data]

Each processor executes different instructions asynchronously, using different data

Shared Memory

[Diagram: UMA, processors (P) sharing one memory over a common bus]

Uniform memory access (UMA)

Each processor has uniform access to memory (symmetric multiprocessor - SMP)

[Diagram: NUMA, two bus-based processor-memory groups connected by a network]

Non-uniform memory access (NUMA)

Time for memory access depends on the location of data

Local access is faster than non local access

Easier to scale than SMPs
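On Linux systems that provide the numactl tool, the local vs. non-local difference can be observed by pinning a program's threads to one node and its memory to another, e.g. numactl --cpunodebind=0 --membind=1 ./app (illustrative only; node numbers and the size of the effect depend on the machine).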

Distributed Memory

[Diagram: nodes, each with a processor (P), local memory (M), and a network interface (NI), connected by a network]

Processors cannot directly access another processor’s memory

Each node has a network interface (NI) for communication and synchronization
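A minimal message-passing sketch (not from the slides; it assumes an MPI implementation such as MPICH or Open MPI and at least two processes, e.g. mpirun -np 2 ./a.out). Because each process has only its own local memory, rank 0 must explicitly send the value and rank 1 must explicitly receive it:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* exists only in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);   /* now a copy in rank 1's memory */
        }

        MPI_Finalize();
        return 0;
    }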

Distributed Memory

Massively parallel processors (MPP)

Tightly integrated

Single system image (SSI)

Cluster

Individual computers connected by software

Interconnection Networks

[Diagram: example topologies, 2-D mesh, 2-D torus, 3-D hypercube]

Interconnection Networks

Latency: How long does it take to start sending a "message"? (in microseconds)

Bandwidth: What data rate can be sustained once the message is started? (in Mbytes/sec)
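As a worked example (an illustrative first-order model, not from the slides), transfer time can be estimated as latency + message size / bandwidth. With 10 microseconds of latency and 100 Mbytes/sec of bandwidth, a 1 Mbyte message takes about 10 us + 10,000 us, roughly 10 ms, so bandwidth dominates; a 100 byte message takes about 10 us + 1 us, so latency dominates.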

Distributed Shared Memory

Making the main memory of a cluster of computers look as if it is a single memory with a single address space

Shared memory programming techniques can be used

Different Architectures

Parallel computing:

single systems with many processors working on the same problem 

Distributed computing:

many systems loosely coupled by a scheduler to work on related problems

Grid Computing:

many systems tightly coupled by software, perhaps geographically distributed, to work together on single problems or on related problems