(Superficial!) Review of Uniprocessor Architecture, Parallel Architectures, and Related Concepts
CS 433, Laxmikant Kale, Department of Computer Science, University of Illinois at Urbana-Champaign
Parallel Machines: an abstract introduction
• Our main focus will be on three kinds of machines
  – Bus-based shared memory machines
  – Scalable shared memory machines
    • Cache coherent
    • Hardware support for remote memory access
  – Distributed memory machines
Distributed memory machines: the debate
• Should this machine support a shared address space?
  – If not: coordination by "passing messages"
  – If so: how and whether to keep caches "coherent"?
[Figure: PE0 through PEp-1, each with its own cache and memory (Mem0 through Memp-1), connected by an interconnection network.]
• This debate is also tied to the debate over programming models.
Writing parallel programs
• Programming model
  – How should a programmer view the parallel machine?
  – Sequential programming: the von Neumann model
• Parallel programming models:
  – Shared memory (shared address space) model
  – Message passing model
  – Shared objects model
• Common to all these models:
  – multiple independent entities communicating, synchronizing, and coordinating with each other via specific mechanisms provided by the model
• Special-purpose models:
  – A common case: data-parallel (loop-parallel) models
  – Other "domain-specific" models
Shared Address space model
• Also sometimes called the shared memory model:
  – considered a misnomer by some: shared memory is an architectural concept
• Independent entities are called threads (or processes)
  – All threads use the same common address space
  – When thread i refers to an address A, it is the same location as when thread j refers to address A.
• Advantages:
  – Natural extension of sequential programming model
    • Some people disagree even about this
  – Relatively easy to get a "first parallel version" of an existing sequential code
Shared Address space model:
• Issues:
  – Need hardware support for cache coherence and consistency:
    • But that's not the concern when we are discussing the efficacy of the programming model
  – Data being read by one thread may be being modified by another
    • Need ways of synchronizing access
    • E.g., a producer-consumer relationship between threads
      – The producer is to store the result in shared variable X
      – When can the consumer thread read it?
  – Another example: inconsistent modifications:
    • Suppose two processes are both trying to add 5 to x: x := x + 5
    • In reality, it is not one instruction, but 3: ld r1,x; add r1,r1,5; st r1,x
    • Now, the 6 instructions (3 from each thread)
      – may interleave in many possible ways
      – leading to wrong behavior (see the sketch below)
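A minimal C++ sketch (not from the slides) of this lost update: each thread performs the load, add, and store as separate steps, mirroring the 3-instruction sequence above; std::atomic is used only so that the individual accesses are well defined, while the read-modify-write as a whole remains unprotected.

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0};

    void add5() {
        int r1 = x.load();                       // ld  r1, x
        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // widen the race window
        r1 = r1 + 5;                             // add r1, r1, 5
        x.store(r1);                             // st  r1, x
    }

    int main() {
        std::thread t1(add5), t2(add5);
        t1.join();
        t2.join();
        // Expected 10, but both threads may load the same old value,
        // so one update can be lost, leaving x == 5.
        std::printf("x = %d\n", x.load());
    }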
SAS model: Locks and Barriers
• Solution: locks
  – A lock is a variable
  – You can: create a lock, "lock" a lock, and "unlock" a lock
  – The implementation guarantees that:
    • only one thread can "get" or "lock" a lock at a time
• Using locks:
  – Protect vulnerable shared data using a lock
  – Associate a lock with such a variable
    • Mentally (there is no construct or call to do the association)
  – Before changing the variable, lock its associated lock
    • Unlock it as soon as you have finished using the variable
  – Remember that this is only a convention (see the sketch below)
    • Nothing prevents a thread from inadvertently changing, in another part of the code, a variable that is protected by a lock
    • Analogy: locking a room with a "post-it" on the door
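A minimal sketch of the same update protected by a lock, using C++ std::mutex to stand in for the generic create/lock/unlock calls (the slides do not name a specific lock API); the association between x_lock and x is only a convention, as noted above.

    #include <cstdio>
    #include <mutex>
    #include <thread>

    int x = 0;
    std::mutex x_lock;     // by convention, protects x

    void add5() {
        x_lock.lock();     // only one thread at a time gets past this point
        x = x + 5;         // the whole read-modify-write is now protected
        x_lock.unlock();   // unlock as soon as we are done with x
    }

    int main() {
        std::thread t1(add5), t2(add5);
        t1.join();
        t2.join();
        std::printf("x = %d\n", x);   // always 10
    }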
Matrix multiplication:
• Why people like the SAS model:
    for (i=0; i<M; i++)
      for (j=0; j<N; j++)
        for (k=0; k<K; k++)
          C[i][j] += A[i][k] * B[k][j];
SAS matrix multiply
• Each thread knows its "serial number": myPE() (a runnable sketch follows below)
    size = M / numPEs();
    myStart = myPE() * size;
    for (i=myStart; i<myStart+size; i++)
      for (j=0; j<N; j++)
        for (k=0; k<K; k++)
          C[i][j] += A[i][k] * B[k][j];
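A runnable sketch of the SAS matrix multiply above, using C++ std::thread. The slide's myPE() and numPEs() are modeled by passing each thread its id and the thread count; the matrices are assumed square (N x N) with N divisible by the number of threads, and the names N, worker, and numPEs here are illustrative.

    #include <thread>
    #include <vector>

    const int N = 256;
    std::vector<double> A(N * N, 1.0), B(N * N, 1.0), C(N * N, 0.0);

    // Each thread computes a contiguous block of rows of C.
    // No locks are needed: threads write disjoint rows of C and only read A and B.
    void worker(int myPE, int numPEs) {
        int size = N / numPEs;       // rows per thread
        int myStart = myPE * size;   // this thread's first row
        for (int i = myStart; i < myStart + size; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    C[i * N + j] += A[i * N + k] * B[k * N + j];
    }

    int main() {
        int numPEs = 4;
        std::vector<std::thread> threads;
        for (int pe = 0; pe < numPEs; pe++)
            threads.emplace_back(worker, pe, numPEs);
        for (auto& t : threads) t.join();
    }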
Message passing
• Parallel entities are processes
  – with their own address spaces
• Assume that processors have direct access only to their own memory
• Each processor typically executes the same executable, but may be running a different part of the program at any given time
• Coordination:
  – via sending and receiving "messages": bytes of data
Message passing basics:
• Basic calls: send and recv (see the sketch below)
  – send(int proc, int tag, int size, char *buf);
  – recv(int proc, int tag, int size, char *buf);
• recv may return the actual number of bytes received in some systems
• tag and proc may be wildcarded in a recv:
  – recv(ANY, ANY, 1000, &buf);
• broadcast
• Other global operations (reductions)
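The send/recv signatures above are generic, not those of a particular library; here is a hedged sketch of the same pattern in MPI, including a wildcarded receive and a query for the actual number of bytes received. It assumes two or more ranks (e.g. mpirun -np 2).

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[1000] = {0};
        int tag = 7;
        if (rank == 0) {
            // send(proc=1, tag, size=1000, buf) in the slide's terms
            MPI_Send(buf, 1000, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
        } else if (rank == 1) {
            // recv(ANY, ANY, 1000, buf): source and tag are wildcarded
            MPI_Status status;
            MPI_Recv(buf, 1000, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            int count;
            MPI_Get_count(&status, MPI_CHAR, &count);   // actual bytes received
            std::printf("rank 1 got %d bytes from rank %d\n",
                        count, status.MPI_SOURCE);
        }

        // Global operations: MPI_Bcast for broadcast,
        // MPI_Reduce / MPI_Allreduce for reductions.
        MPI_Finalize();
    }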
Parallel Programming
• Decomposition: what to do in parallel
  – Tasks (loop iterations, functions, ...) that can be done in parallel
• Mapping:
  – Which processor does each task
• Scheduling (sequencing):
  – On each processor
• Machine dependent expression:
  – Express the above decisions for the particular parallel machine
Spectrum of parallel Languages
[Figure: spectrum of parallel languages (parallelizing Fortran compiler, Charm++, MPI/SAS) arranged by level (decomposition, mapping, scheduling/sequencing, machine dependent expression), by specialization, and by what is automated.]
Shared objects model:
• Basic philosophy:
  – Let the programmer decide what to do in parallel
  – Let the system handle the rest:
    • which processor executes what, and when
    • with some override control for the programmer, when needed
• Basic model:
  – The program is a set of communicating objects
  – Objects only know about other objects (not processors)
  – The system maps objects to processors
    • and may remap the objects dynamically, for load balancing etc.
• Shared objects, not shared memory
  – So, in some ways, in between the "shared nothing" of message passing and the "shared everything" of SAS
  – More disciplined sharing
  – Additional information sharing mechanisms
Charm++
• Data driven objects, called chares
• Asynchronous method invocation
• Prioritized scheduling
• Object arrays
• Object groups:
  – a global object with a "representative" on each PE
• Information sharing abstractions:
  – readonly data
  – accumulators
  – distributed tables
Data Driven Execution
[Figure: data driven execution; on each processor, a scheduler picks messages from its message queue and delivers them to the target objects.]
Object Arrays
• A collection of chares,
  – with a single global name for the collection, and
  – each member addressed by an index
  – Mapping of element objects to processors is handled by the system (see the sketch below)
[Figure: user's view: a single logical array A[0], A[1], A[2], A[3], ...; system view: the same elements distributed across processors.]
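A sketch in the spirit of a Charm++ object array (not from the slides): the module, chare, and method names (hello, Main, Hello, sayHi) are hypothetical; the interface would live in a separate .ci file, shown here in a comment, from which the charmc translator generates the CProxy_/CBase_ classes and the .decl.h/.def.h headers. Termination logic is omitted.

    // Hypothetical interface file hello.ci:
    //   mainmodule hello {
    //     mainchare Main { entry Main(CkArgMsg* m); };
    //     array [1D] Hello {
    //       entry Hello();
    //       entry void sayHi(int from);
    //     };
    //   };
    #include "hello.decl.h"

    class Main : public CBase_Main {
     public:
      Main(CkArgMsg* m) {
        // Create a 4-element array; the system decides where elements live.
        CProxy_Hello arr = CProxy_Hello::ckNew(4);
        arr[2].sayHi(17);   // asynchronous invocation on element 2
        arr.sayHi(0);       // broadcast to every element of the array
        // (Clean shutdown via CkExit() after a completion reduction is omitted.)
      }
    };

    class Hello : public CBase_Hello {
     public:
      Hello() {}
      Hello(CkMigrateMessage* m) {}   // migration constructor for array elements
      void sayHi(int from) {
        CkPrintf("Hello from element %d on PE %d (from %d)\n",
                 thisIndex, CkMyPe(), from);
      }
    };

    #include "hello.def.h"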
Object Groups
• A group of objects (chares)
  – with exactly one representative on each processor
  – A single id for the group as a whole
  – Invoke methods on one branch (asynchronously), on all branches (broadcast), or on the local branch (see the sketch below)
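A similar hedged sketch for a group (again with hypothetical names: counter, Counter, add, useGroup; the .ci declaration is shown in a comment): one branch is created per processor, and methods can be invoked on one branch, on all branches, or on the local branch directly.

    // Hypothetical interface file counter.ci:
    //   module counter {
    //     group Counter {
    //       entry Counter();
    //       entry void add(int n);
    //     };
    //   };
    #include "counter.decl.h"

    class Counter : public CBase_Counter {
      int total;
     public:
      Counter() : total(0) {}
      void add(int n) { total += n; }   // runs on this branch's processor
    };

    // Somewhere in the program (e.g. in the mainchare):
    void useGroup() {
      CProxy_Counter grp = CProxy_Counter::ckNew();   // one branch per processor
      grp[1].add(10);                        // asynchronous call on the branch on PE 1
      grp.add(5);                            // broadcast: every branch adds 5
      Counter* local = grp.ckLocalBranch();  // plain pointer to the local branch
      local->add(1);                         // ordinary, synchronous local call
    }

    #include "counter.def.h"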
Information sharing abstractions
• Observation:
  – Information is shared in several specific modes in parallel programs
• Other models support only a limited set of modes:
  – Shared memory: everything is shared: a sledgehammer approach
  – Message passing: messages are the only mechanism
• Charm++ identifies and supports several modes:
  – Readonly / writeonce
  – Tables (hash tables)
  – Accumulators
  – Monotonic variables
Comparing Programming Models
• What are the advantages and disadvantages of the models?
– even at this simple/abstract level of introduction?