COMP 308 Parallel Efficient Algorithms
Introduction to Parallel Computation
Lecturer: Dr. Igor Potapov
Ashton Building, room 3.15
E-mail: [email protected]
COMP 308 web-page:
http://www.csc.liv.ac.uk/~igor/COMP308
COMP 308
Parallel Efficient Algorithms
Slide 1
Course Description and
Objectives:
• The aim of the module is
– to introduce techniques for the design of
efficient parallel algorithms and
– their implementation.
Slide 2
Learning Outcomes:
At the end of the course you will be:
• familiar with the wide applicability of graph theory
and tree algorithms as an abstraction for the
analysis of many practical problems,
• familiar with efficient parallel algorithms
related to many areas of computer science:
expression computation, sorting, graph-theoretic
problems, computational geometry, algorithmics
of texts, etc.,
• familiar with the basic issues of implementing
parallel algorithms.
You will also acquire knowledge of those
problems which are perceived as
intractable for parallelization.
Slide 3
Teaching method
• Series of 30 lectures (3 hrs per week)
    Lecture: Monday 10.00
    Lecture: Tuesday 10.00
    Lecture: Friday 10.00
Course Assessment
• A two-hour examination: 80%
• Continuous assessment
(written class test + home assignment): 20%
Slide 4
Recommended Course Textbooks
• Introduction to Algorithms
Cormen et al.
• Introduction to Parallel Computing: Design and Analysis of
Algorithms
Vipin Kumar, Ananth Grama, Anshul Gupta, and George Karypis,
Benjamin Cummings 2nd ed. - 2003
• Efficient Parallel Algorithms
A. Gibbons, W. Rytter, Cambridge University Press, 1988.
+
Research papers (will be announced later)
Slide 5
What is Parallel Computing?
(basic idea)
• Consider the problem of stacking
(reshelving) a set of library books.
– A single worker trying to stack all the books in
their proper places cannot accomplish the task
faster than a certain rate.
– We can speed up this process, however, by
employing more than one worker.
Slide 6
Solution 1
• Assume that books are organized into shelves and
that the shelves are grouped into bays
• One simple way to assign the task to the workers is:
– To divide the books equally among them.
– Each worker stacks the books one at a time.
• This division of work may not be the most efficient
way to accomplish the task, since
– The workers must walk all over the library to stack books.
Slide 7
Solution 2
• An alternative way to divide the work is to assign a
fixed and disjoint set of bays to each worker.
(This is an instance of task partitioning.)
• As before, each worker is assigned an equal number
of books arbitrarily.
– If the worker finds a book that belongs to a bay assigned to
him or her, he or she places that book in its assigned spot.
– Otherwise, he or she passes it on to the worker responsible
for the bay it belongs to.
(This is an instance of a communication task.)
• The second approach requires less effort from
individual workers; a sketch of the scheme in code follows below.
Slide 8
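The partition-and-forward scheme can be sketched in a few lines of code. Below is a minimal, sequential Python simulation of the idea, not the course's material: the worker count, bay count, and ownership rule are all illustrative assumptions.

    import random

    # Sketch of "Solution 2": each worker owns a disjoint set of bays;
    # books belonging to other bays are forwarded to the owning worker.
    NUM_WORKERS = 4
    NUM_BAYS = 16          # assumption: bay b is owned by worker b % NUM_WORKERS

    def owner(bay):
        return bay % NUM_WORKERS

    # Each book is identified by the bay it belongs to.
    books = [random.randrange(NUM_BAYS) for _ in range(100)]

    # Divide the books equally (and arbitrarily) among the workers.
    pile = {w: books[w::NUM_WORKERS] for w in range(NUM_WORKERS)}
    shelved = {b: 0 for b in range(NUM_BAYS)}

    # Phase 1: each worker shelves its own books and forwards the rest
    # (the forwarding is the communication step from the slide).
    forwarded = {w: [] for w in range(NUM_WORKERS)}
    for w in range(NUM_WORKERS):
        for bay in pile[w]:
            if owner(bay) == w:
                shelved[bay] += 1
            else:
                forwarded[owner(bay)].append(bay)

    # Phase 2: each worker shelves the books forwarded to it.
    for w in range(NUM_WORKERS):
        for bay in forwarded[w]:
            shelved[bay] += 1

    assert sum(shelved.values()) == len(books)

Each worker only ever walks within its own bays; the price paid is the hand-off traffic between workers.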
Problems are parallelizable to
different degrees
• For some problems, assigning partitions to other
processors might be more time-consuming than
performing the processing locally.
• Other problems may be completely serial.
– For example, consider the task of digging a post hole.
• Although one person can dig a hole in a certain amount of
time,
• employing more people does not reduce this time.
Slide 9
Power of parallel solutions
• Pile collection
– Ants/robots with very limited abilities
(each sees only its own neighbourhood)
– Grid environment
(sticks and robots)

Move():
    move randomly
    until the robot sees a stick in its neighbourhood

Collect():
    Move(); pick up a stick; Move();
    put it down; Collect()
Slide 10
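One way to make the pseudocode above concrete is a tiny simulation. This is a sketch only, assuming a one-dimensional wrap-around grid, a single robot, and a fixed step budget, none of which the slide specifies:

    import random

    # A robot wanders on a ring of cells; whenever it carries a stick and
    # finds another one, it drops its stick there. Repetition clusters the
    # sticks into piles. All sizes below are illustrative assumptions.
    SIZE, STICKS, STEPS = 20, 10, 10_000
    grid = [0] * SIZE                       # grid[i] = sticks in cell i
    for _ in range(STICKS):
        grid[random.randrange(SIZE)] += 1   # scatter the sticks

    pos, carrying = random.randrange(SIZE), False
    for _ in range(STEPS):
        pos = (pos + random.choice((-1, 1))) % SIZE   # Move(): random step
        if not carrying and grid[pos] > 0:
            grid[pos] -= 1                  # pick up a stick
            carrying = True
        elif carrying and grid[pos] > 0:
            grid[pos] += 1                  # put it down beside another stick
            carrying = False
    if carrying:
        grid[pos] += 1                      # drop whatever is still carried

    print(grid)   # the sticks end up concentrated in a few piles

With many such robots acting in parallel, the same local rule still works, which is the point of the slide: very limited agents can jointly produce a global result.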
Sorting in nature
[Figure: an unsorted sequence 6, 2, 1, 3, 5, 7, 4 to be sorted]
Slide 11
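The slide itself is only a picture. As one concrete example of sorting through many simple, local, simultaneous actions, here is odd-even transposition sort in Python; the choice of this particular algorithm is my illustration, not the slide's content:

    def odd_even_transposition_sort(a):
        # n rounds; within a round every compare-swap touches a disjoint
        # pair of neighbours, so on a linear array of processors each
        # round takes O(1) parallel time and the whole sort takes O(n).
        n = len(a)
        for rnd in range(n):
            for i in range(rnd % 2, n - 1, 2):   # independent neighbour pairs
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
        return a

    print(odd_even_transposition_sort([6, 2, 1, 3, 5, 7, 4]))
    # -> [1, 2, 3, 4, 5, 6, 7]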
Parallel Processing
(Several processing elements working
to solve a single problem)
Primary consideration: elapsed time
– NOT: throughput, sharing resources, etc.
• Downside: complexity
– system, algorithm design
• Elapsed Time = computation time +
communication time +
synchronization time
Slide 12
Design of efficient algorithms
A parallel computer is of little use unless
efficient parallel algorithms are available.
– The issues in designing parallel algorithms are very
different from those in designing their sequential
counterparts.
– A significant amount of work is being done to
develop efficient parallel algorithms for a variety
of parallel architectures.
Slide 13
Processor Trends
• Moore’s Law
– performance doubles every 18 months
• Parallelization within processors
– pipelining
– multiple pipelines
Slide 14
Why Parallel Computing
• Practical:
– Moore’s Law cannot hold forever
– Problems must be solved immediately
– Cost-effectiveness
– Scalability
• Theoretical:
– challenging problems
Slide 15
Some Complex Problems
• N-body simulation
• Atmospheric simulation
• Image generation
• Oil exploration
• Financial processing
• Computational biology
Slide 16
Some Complex Problems
• N-body simulation
– O(n log n) time per iteration
– a galaxy of 10^11 stars → approx. one year /
iteration
• Atmospheric simulation
– 3D grid, each element interacts with its neighbours
– 1 x 1 x 1 mile elements → 5 x 10^8 elements
– a 10-day simulation requires approx. 100 days
Slide 17
Some Complex Problems
• Image generation
– animation, special effects
– several minutes of video → 50 days of rendering
• Oil exploration
– large amounts of seismic data to be processed
– months of sequential exploration
Slide 18
Some Complex Problems
• Financial processing
– market prediction, investing
– Cornell Theory Center, Renaissance Tech.
• Computational biology
– drug design
– gene sequencing (Celera)
– structure prediction (Proteomics)
Slide 19
Fundamental Issues
• Is the problem amenable to parallelization?
• How to decompose the problem to exploit
parallelism?
• What machine architecture should be used?
• What parallel resources are available?
• What kind of speedup is desired?
Slide 20
Two Kinds of Parallelism
• Pragmatic
– goal is to speed up a given computation as much
as possible
– problem-specific
– techniques include:
• overlapping instructions (multiple pipelines)
• overlapping I/O operations (RAID systems)
• “traditional” (asymptotic) parallelism techniques
Slide 21
Two Kinds of Parallelism
• Asymptotic
– studies:
• architectures for general parallel computation
• parallel algorithms for fundamental problems
• limits of parallelization
– can be subdivided into three main areas
Slide 22
Asymptotic Parallelism
• Models
– comparing/evaluating different architectures
• Algorithm Design
– utilizing a given architecture to solve a given
problem
• Computational Complexity
– classifying problems according to their
difficulty
Slide 23
Architecture
• Single processor:
– single instruction stream
– single data stream
– von Neumann model
• Multiple processors:
– Flynn’s taxonomy
Slide 24
Flynn’s Taxonomy

                              Data Streams
                              1           Many
    Instruction      1        SISD        SIMD
    Streams          Many     MISD        MIMD
Slide 25
Slide 26
Parallel Architectures
• Multiple processing elements
• Memory:
– shared
– distributed
– hybrid
• Control:
– centralized
– distributed
Slide 27
Parallel vs Distributed Computing
• Parallel:
– several processing elements concurrently
solving a single problem
• Distributed:
– processing elements do not share memory or
system clock
• Which is the subset of which?
– distributed is a subset of parallel
Slide 28
Efficient and optimal parallel
algorithms
• A parallel algorithm is efficient iff
– it is fast (e.g. polynomial time) and
– the product of the parallel time and the number of processors
is close to the time of the best known sequential algorithm:

    T_sequential ≈ T_parallel × N_processors

• A parallel algorithm is optimal iff this product is of the
same order as the best known sequential time.
Slide 29
Metrics
A measure of relative performance between a multiprocessor
system and a single-processor system is the speed-up S(p),
defined as follows:

    S(p) = (execution time using a single processor) /
           (execution time using a multiprocessor with p processors)

    S(p) = T1 / Tp

    Efficiency: E(p) = S(p) / p

    Cost: C(p) = p × Tp
Slide 30
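These three definitions are easy to check numerically. A minimal sketch in Python; the sample timings are made-up numbers, not course data:

    def speedup(t1, tp):          # S(p) = T1 / Tp
        return t1 / tp

    def efficiency(t1, tp, p):    # E(p) = S(p) / p
        return speedup(t1, tp) / p

    def cost(tp, p):              # C(p) = p * Tp
        return p * tp

    # Made-up example: a 100 s sequential job takes 16 s on p = 8 processors.
    t1, tp, p = 100.0, 16.0, 8
    print(speedup(t1, tp))        # 6.25
    print(efficiency(t1, tp, p))  # 0.78125, i.e. about 78%
    print(cost(tp, p))            # 128.0 > T1 = 100, so not cost-optimal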
Metrics
• A parallel algorithm is cost-optimal iff
parallel cost = sequential time, i.e.
    C(p) = T1 (equivalently, E(p) = 100%)
• Critical when down-scaling: a parallel
implementation may become slower than sequential.
    Example: T1 = n^3 and Tp = n^2.5 when p = n^2,
    so C(p) = p × Tp = n^2 × n^2.5 = n^4.5 >> n^3 = T1.
Slide 31
Amdahl’s Law
• f = fraction of the problem that is inherently
sequential
(1 - f) = fraction that can be parallelized
• Parallel time on p processors (normalizing T1 = 1):

    Tp = f + (1 - f)/p

• Speedup with p processors:

    S(p) = 1 / (f + (1 - f)/p)
Slide 32
What kind of speed-up may be achieved?
• Part f is computed by a single processor
• Part (1-f) is computed by p processors, p>1
Basic observation: increasing p cannot speed up part f.
[Figure: execution time split into a sequential part f and a parallelizable part (1 - f)]
Slide 33
Amdahl’s Law
• Upper bound on speedup (p → ∞):

    S(p) = 1 / (f + (1 - f)/p)

  As p → ∞, the term (1 - f)/p converges to 0, so

    S∞ = 1 / f

• Example:
    f = 2%
    S∞ = 1 / 0.02 = 50
Slide 34
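The bound is easy to verify numerically; a small sketch, with processor counts chosen arbitrarily for illustration:

    # Amdahl's law: speedup with sequential fraction f on p processors.
    def amdahl_speedup(f, p):
        return 1.0 / (f + (1.0 - f) / p)

    f = 0.02                          # 2% inherently sequential, as on the slide
    for p in (1, 10, 100, 1000, 10**6):
        print(p, round(amdahl_speedup(f, p), 2))
    # Speedup approaches, but never exceeds, 1/f = 50:
    #       1 -> 1.0
    #      10 -> 8.47
    #     100 -> 33.56
    #    1000 -> 47.66
    # 1000000 -> 50.0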
The main open question
• The basic parallel complexity class is NC.
• NC is the class of problems computable in poly-logarithmic
time (O(log^c n), for a constant c) using a polynomial number of
processors.
• P is the class of problems computable sequentially in
polynomial time.
The main open question in parallel computation is:
NC = P ?
Slide 35