Transcript PPT
CS2403 Programming Languages
Concurrency
Chung-Ta King
Department of Computer Science
National Tsing Hua University
(Slides are adapted from Concepts of Programming Languages, R.W. Sebesta)
Outline
Parallel architecture and programming
Language support for concurrency
Controlling concurrent tasks
Sharing data
Synchronizing tasks
1
Sequential Computing
von Neumann arch. with Program Counter (PC)
dictates sequential execution
Traditional programming thus follows a single
thread of control: the sequence of program points
reached as control flows through the program,
tracked by the program counter
(Introduction to Parallel Computing, Blaise Barney)
2
Sequential Programming Dominates
Sequential programming has dominated
throughout computing history
Why?
Why is there no need to change programming style?
3
2 Factors Help to Maintain Perf.
IC technology: ever-shrinking feature size
Moore’s law, faster switching, more functionality
Architectural innovations to remove bottlenecks
in the von Neumann architecture
Memory hierarchy for reducing memory latency:
registers, caches, scratchpad memory
Hide or tolerate memory latency: multithreading,
prefetching, predication, speculation
Executing multiple instructions in parallel: pipelining,
multiple issue (in-/out-of-order, VLIW), SIMD
multimedia extensions (inst.-level parallelism, ILP)
(Prof. Mary Hall, Univ. of Utah)
4
End of Sequential Programming?
It has become infeasible to keep improving the
performance of uniprocessors
Power, clocking, ...
Multicore architecture prevails (homogeneous or
heterogeneous)
Achieve performance gains with simpler processors
Sequential programming still alive!
Why?
Throughput versus execution time
Can we live with sequential prog. forever?
5
Parallel Programming
A programming style that specifies concurrency
(control structure) & interaction (communication
structure) between concurrent subtasks
Still in imperative language style
Concurrency can be expressed at various levels
of granularity
Machine instruction level, high-level language
statement level, unit level, program level
Different models assume different architectural
support
Look at parallel architectures first
(Ananth Grama, Purdue Univ.)
6
An Abstract Parallel Architecture
How is parallelism managed?
Where is the memory physically located?
What is the connectivity of the network?
(Prof. Mary Hall, Univ. of Utah)
7
Flynn’s Taxonomy of Parallel Arch.
Distinguishes parallel architecture by instruction
and data streams
SISD: classical uniprocessor architecture
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
(Introduction to Parallel Computing, Blaise Barney)
8
Parallel Control Mechanisms
(Prof. Mary Hall, Univ. of Utah)
9
2 Classes of Parallel Architecture
Shared memory multiprocessor architectures
Multiple processors can operate independently but
share the same memory system
Share a global address space where each processor
can access every memory location
Changes in a memory location made by one
processor are visible to all other processors,
like a bulletin board
(Introduction to Parallel Computing, Blaise Barney;
Prof. Mary Hall, Univ. of Utah)
10
2 Classes of Parallel Architecture
Distributed memory architectures
Processing units (PEs) connected by an interconnect
Each PE has its own distinct address space; there is
no global address space, and PEs explicitly
communicate to exchange data
Ex.: PC clusters connected by commodity Ethernet
(Introduction to Parallel Computing,
Blaise Barney; Prof. Mary Hall, Univ.
of Utah)
11
Shared Memory Programming
Often organized as a collection of threads of control
Each thread has private data, e.g., local stack, and a
set of shared variables, e.g., global heap
Threads communicate implicitly by writing and
reading shared variables
Threads coordinate through locks and barriers
implemented using shared variables
(Prof. Mary Hall,
Univ. of Utah)
12
Distributed Memory Programming
Organized as named processes
A process is a thread of control plus local address
space -- NO shared data
A process cannot see the memory contents of other
processes, nor can it address and access them
Logically shared data is partitioned over processes
Processes communicate by explicit send/receive, i.e.,
asking the destination process to access its local data
on behalf of the requesting process
Coordination is implicit in communication events
(blocking/non-blocking send and receive)
(Prof. Mary Hall, Univ. of Utah)
13
Distributed Memory Programming
Private memory looks like a mailbox
(Prof. Mary Hall, Univ. of Utah)
14
Specifying Concurrency
What language supports are needed for parallel
programming?
Specifying (parallel) control flows
How to create, start, suspend, resume, stop
processes/threads? How to let one process/thread
explicitly wait for events or another process/thread?
Specifying data flows among parallel flows
How to pass data generated by one process/thread
to another process/thread?
How to let multiple processes/threads access common
resources, e.g., a counter, without conflicts?
15
Specifying Concurrency
Many parallel programming systems provide
libraries and perhaps compiler pre-processors to
extend a traditional imperative language, such
as C, for parallel programming
Examples: Pthreads, OpenMP, MPI, ...
Some languages have parallel constructs built
directly into the language, e.g., Java, C#
So far, the library approach works fine
16
Shared Memory Prog. with Threads
Several thread libraries:
PThreads: the POSIX threading interface
POSIX: Portable Operating System Interface for UNIX
Interface to OS utilities
System calls to create and synchronize threads
OpenMP is a newer standard
Allows a programmer to separate a program into serial
regions and parallel regions
Provides synchronization constructs
The compiler generates the threaded program and
synchronization
Extends Fortran, C, and C++, mainly via directives
(Prof. Mary Hall, Univ. of Utah)
17
Thread Basics
A thread is a program unit that can be in
concurrent execution with other program units
Threads differ from ordinary subprograms:
When a program unit starts the execution of a
thread, it is not necessarily suspended
When a thread’s execution is completed, control may
not return to the caller
All threads run in the same address space but have
their own runtime stacks
18
Message Passing Prog. with MPI
MPI defines a standard library for message passing
that can be used to develop portable
message-passing programs using C or Fortran
Based on Single Program, Multiple Data (SPMD)
All communication and synchronization require
subroutine calls; there are no shared variables
A program runs on a single processor just like any
uniprocessor program, except for calls to the
message-passing library
It is possible to write fully functional message-passing
programs using only six routines: MPI_Init,
MPI_Finalize, MPI_Comm_size, MPI_Comm_rank,
MPI_Send, and MPI_Recv
(Prof. Mary Hall, Univ. of Utah; Prof. Ananth Grama, Purdue Univ. )
19
Message Passing Basics
The computing system consists of p processes,
each with its own exclusive address space
Each data element must belong to one of the
partitions of the space; hence, data must be explicitly
partitioned and placed
All interactions (read-only or read/write) require the
cooperation of two processes: the process that has
the data and the one that wants to access it
All processes execute asynchronously unless they
interact through send/receive synchronizations
(Prof. Ananth Grama, Purdue Univ. )
20
Controlling Concurrent Tasks
Pthreads:
Program starts with a single master thread, from
which other threads are created
errcode = pthread_create(&thread_id,
&thread_attribute,
&thread_fun, &fun_arg);
Each thread executes a specific function,
thread_fun(), representing the thread’s computation
All threads execute in parallel
Function pthread_join() suspends execution of the
calling thread until the target thread terminates
(see the sketch below)
(Prof. Mary Hall, Univ. of Utah)
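As a concrete illustration (a minimal sketch added to this transcript, not from the original slides), the code below creates several threads, passes each one its own argument through the fun_arg slot, and then joins them all; the names worker, NUM_THREADS, and ids are invented for this example.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4                /* hypothetical thread count */

/* Each thread runs this function; the argument identifies the thread. */
void *worker(void *arg) {
    int id = *(int *)arg;
    printf("worker %d running\n", id);
    return NULL;                     /* same effect as pthread_exit(NULL) */
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    int       ids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;                  /* give each thread its own argument */
        if (pthread_create(&tid[i], NULL, worker, &ids[i]) != 0) {
            perror("pthread_create");
            exit(EXIT_FAILURE);
        }
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);  /* wait for every thread to finish */
    return 0;
}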
21
Pthreads “Hello World!”
#include <pthread.h>
#include <stdio.h>

void *thread(void *vargp);

int main() {
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);
    pthread_join(tid, NULL);
    pthread_exit((void *)NULL);
}

void *thread(void *vargp) {
    printf("Hello World from thread!\n");
    pthread_exit((void *)NULL);
}
(http://www.cs.binghamton.edu/~guydosh/cs350/hello.c)
22
Controlling Concurrent Tasks (cont.)
OpenMP:
Begins execution as a single process and forks multiple
threads to work on parallel blocks of code
(single program, multiple data)
Parallel constructs are specified using pragmas
(Prof. Mary Hall, Univ. of Utah)
23
OpenMP Pragma
All pragmas begin with #pragma
For a parallel loop, the compiler calculates the loop
bounds for each thread and manages data partitioning,
as sketched below
Synchronization is also automatic (an implicit barrier)
(Prof. Mary Hall, Univ. of Utah)
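As a minimal sketch of this behavior (added here for illustration, not part of the original slides), the loop below uses a parallel for pragma; the array a and size N are assumptions made for the example.

#include <omp.h>
#include <stdio.h>

#define N 1000                       /* hypothetical problem size */

int main(void) {
    static double a[N];
    /* The compiler divides iterations 0..N-1 among the threads and
       inserts an implicit barrier at the end of the loop. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;
    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}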
24
OpenMP “Hello World!”
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World: %d\n", th_id);
        #pragma omp barrier
        if (th_id == 0) {
            nthreads = omp_get_num_threads();
            printf("%d threads\n", nthreads);
        }
    }
    return EXIT_SUCCESS;
}
(http://en.wikipedia.org/wiki/OpenMP#Hello_World)
25
Controlling Concurrent Tasks (cont.)
Java:
The concurrent units in Java are methods named run
The code of a run method can execute concurrently
with other such methods
The process in which a run method executes is
called a thread

class MyThread extends Thread {
    public void run() {...}
}
...
Thread myTh = new MyThread();
myTh.start();
26
Controlling Concurrent Tasks (cont.)
Java Thread class has several methods to
control the execution of threads
The yield method is a request from the running thread
to voluntarily surrender the processor
The sleep method can be used by the caller of the
method to block the thread
The join method is used to force a method to delay
its execution until the run method of another thread
has completed its execution
27
Controlling Concurrent Tasks (cont.)
Java thread priority:
A thread’s default priority is the same as that of the
thread that created it
If main creates a thread, its default priority is
NORM_PRIORITY
The Thread class defines two other priority constants,
MAX_PRIORITY and MIN_PRIORITY
The priority of a thread can be changed with the
method setPriority
28
Controlling Concurrent Tasks (cont.)
MPI:
The programmer writes the code for a single process,
and the mpicc compiler wrapper links in the necessary libraries
mpicc -g -Wall -o mpi_hello mpi_hello.c
The execution environment starts parallel processes
mpiexec -n 4 ./mpi_hello
(Prof. Mary Hall, Univ. of Utah)
29
MPI “Hello World!”
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello World from process %d of %d\n",
           rank, size);
    MPI_Finalize();
    return 0;
}
(Prof. Mary Hall, Univ. of Utah)
30
Sharing Data
Pthreads:
Variables declared outside of main are shared
Objects allocated on the heap may be shared (if a
pointer is passed, as sketched below)
Variables on the stack are private: passing a pointer to
these to other threads can cause problems
Shared variables can be read and written directly by
all threads, so synchronization is needed to prevent races
Synchronization primitives, e.g., semaphores, locks,
mutexes, and barriers, are used to sequence the executions
of the threads and thereby indirectly sequence the data
passed through shared variables
(Prof. Mary Hall, Univ. of Utah)
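A minimal sketch of these rules (added for illustration; the type shared_t and the function bump are invented): a heap-allocated object is shared by passing its pointer to each thread, the local variable stays private on each thread's stack, and a mutex prevents races on the shared counter.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Heap-allocated object shared by passing its pointer to each thread. */
typedef struct {
    pthread_mutex_t lock;
    long            counter;         /* shared; must be protected */
} shared_t;

void *bump(void *arg) {
    shared_t *sh = arg;              /* shared data, reached via the pointer */
    int local = 1;                   /* private: lives on this thread's stack */
    pthread_mutex_lock(&sh->lock);
    sh->counter += local;            /* race-free update of shared data */
    pthread_mutex_unlock(&sh->lock);
    return NULL;
}

int main(void) {
    shared_t *sh = malloc(sizeof *sh);
    pthread_mutex_init(&sh->lock, NULL);
    sh->counter = 0;

    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, sh);
    pthread_create(&t2, NULL, bump, sh);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("counter = %ld\n", sh->counter);   /* prints 2 */
    free(sh);
    return 0;
}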
31
Sharing Data (cont.)
OpenMP:
Variables in a shared clause are shared; the default
is shared
Variables in a private clause are private
The loop index is private

int bigdata[1024];

void *foo(void *bar) {
    int tid;
    #pragma omp parallel \
            shared(bigdata) private(tid)
    {
        /* Calculation here */
    }
}
(Prof. Mary Hall, Univ. of Utah)
32
Sharing Data (cont.)
MPI:
#include "mpi.h"

int main(int argc, char *argv[]) {
    int rank, buf;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        buf = 123456;
        MPI_Send(&buf, 1, MPI_INT, 1, 0,
                 MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&buf, 1, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, &status);
    }
    MPI_Finalize();
    return 0;
}
(Prof. Mary Hall, Univ. of Utah)
33
Synchronizing Tasks
A mechanism that controls the order in which
tasks execute
Two kinds of synchronization
Cooperation: one task waits for another, e.g., for
passing data
task 1: a = ...            task 2: ... = ... a ...
Competition: tasks compete for exclusive use of a
resource, without a specific order
task 1: sum += local_sum   task 2: sum += local_sum
34
Synchronizing Tasks (cont.)
Pthreads:
Provides various synchronization primitives, e.g.,
mutex, semaphore, barrier
Mutex: protects critical sections -- segments of code
that may be executed by at most one thread at a time
Protects code to indirectly protect shared data
Semaphore: synchronizes two threads using
sem_post() and sem_wait() (see the sketch below)
Barrier: makes threads wait until all have reached the
same point in the code before any goes further
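Since only the mutex is shown on the next slide, here is a minimal sketch of the semaphore style of cooperation (added for illustration; the names ready, producer, and consumer are invented): one thread posts after producing a value, and the other waits before consuming it.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

int   shared_value;                  /* data passed from producer to consumer */
sem_t ready;                         /* counts how many values are available */

void *producer(void *arg) {
    shared_value = 42;               /* produce the data */
    sem_post(&ready);                /* signal: one value is now available */
    return NULL;
}

void *consumer(void *arg) {
    sem_wait(&ready);                /* block until the producer has posted */
    printf("got %d\n", shared_value);
    return NULL;
}

int main(void) {
    sem_init(&ready, 0, 0);          /* initial count 0: nothing produced yet */
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&ready);
    return 0;
}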
35
Pthreads Mutex Example
pthread_mutex_t sum_lock;
int sum;

int main() {
    ...
    pthread_mutex_init(&sum_lock, NULL);
    ...
}

void *find_min(void *list_ptr) {
    int my_sum;
    /* ... compute my_sum from this thread's part of the list ... */
    pthread_mutex_lock(&sum_lock);     /* enter critical section */
    sum += my_sum;                     /* safely update the shared sum */
    pthread_mutex_unlock(&sum_lock);   /* leave critical section */
}
36
Synchronizing Tasks (cont.)
OpenMP:
OpenMP has a reduction clause

sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 100; i++) {
    sum += array[i];
}

OpenMP also has a critical directive; its block is
executed by all threads, but by only one thread at a time

#pragma omp critical [(name)] new-line
    sum = sum + 1;
(Prof. Mary Hall, Univ. of Utah)
37
Synchronizing Tasks (cont.)
Java:
A method that includes the synchronized modifier
disallows any other synchronized method from running
on the object while it is in execution
public synchronized void deposit(int i)
{…}
public synchronized int fetch() {…}
The above two methods are synchronized, which
prevents them from interfering with each other
38
Synchronizing Tasks (cont.)
Java:
Cooperation synchronization is achieved via the wait,
notify, and notifyAll methods
These methods are defined in Object, the root
class in Java, so all objects inherit them
The wait method must be called in a loop that
rechecks the awaited condition
The notify method is called to tell one waiting
thread that the event it was waiting for has occurred
The notifyAll method awakens all of the threads
on the object’s wait list
39
Synchronizing Tasks (cont.)
MPI:
Uses send/receive to accomplish task synchronization,
but the semantics of send/receive can be specialized
Non-blocking send/receive: send() and receive() calls
return regardless of whether the data has arrived
(see the sketch below)
Blocking send/receive:
Unbuffered blocking send() does not return until the
matching receive() is encountered at the receiving process
Buffered blocking send() returns after the sender
has copied the data into the designated buffer
Blocking receive() forces the receiving process to wait
(Prof. Ananth Grama, Purdue Univ. )
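To make the non-blocking variant concrete, the sketch below (added for illustration, not from the original slides) uses the standard MPI_Isend/MPI_Irecv calls, which return immediately, together with MPI_Wait to find out when the transfer has actually completed; run with at least two processes, e.g., mpiexec -n 2.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, buf = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        buf = 123456;
        /* Returns immediately; the data may not have been sent yet. */
        MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... other useful work could overlap the communication ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* now buf may be reused */
    } else if (rank == 1) {
        MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* now buf holds the data */
        printf("process 1 received %d\n", buf);
    }
    MPI_Finalize();
    return 0;
}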
40
Unbuffered Blocking
(Prof. Ananth Grama, Purdue Univ. )
41
Buffered Blocking
(Prof. Ananth Grama, Purdue Univ. )
42
Summary
Concurrent execution can be at the instruction,
statement, subprogram, or program level
Two fundamental programming styles: shared
variables and message passing
Programming languages must provide support
for specifying control and data flows
43