Concurrent Algorithms
Summing the elements of an array
(Diagram: tree of partial sums for the array [7, 3, 15, 10, 13, 18, 6, 4]; each level of the tree can be computed in parallel)

              76
           /      \
         35        41
        /  \      /  \
      10    25  31    10
     / \   / \  / \   / \
    7   3 15 10 13 18 6  4

76 = 35 + 41;  35 = 10 + 25;  41 = 31 + 10
10 = 7 + 3;  25 = 15 + 10;  31 = 13 + 18;  10 = 6 + 4
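The tree can be computed by repeatedly adding adjacent pairs, halving the number of active values on each pass. Here is a minimal Python sketch of that structure (the names and the thread pool are my own additions; in CPython the GIL means the threads illustrate the pattern rather than give a real speedup):

from concurrent.futures import ThreadPoolExecutor

def tree_sum(values):
    # Each pass adds adjacent pairs "in parallel"; n values need only
    # O(log n) passes, using n/2, n/4, ... processors per pass.
    with ThreadPoolExecutor() as pool:
        while len(values) > 1:
            pairs = list(zip(values[0::2], values[1::2]))
            leftover = [values[-1]] if len(values) % 2 else []
            values = list(pool.map(lambda p: p[0] + p[1], pairs)) + leftover
    return values[0]

print(tree_sum([7, 3, 15, 10, 13, 18, 6, 4]))   # 76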
Parallel sum and parallel prefix sum
It's relatively easy to see how to sum the elements of an array in a parallel fashion
Summing is a special case of a reduce operation: combining a number of values into a single value
It's harder to see how to do a prefix (cumulative) sum in parallel
For example, a prefix sum turns the list [3, 1, 4, 1, 6] into [3, 4, 8, 9, 15]
This is a special case of what is sometimes called a scan operation
The algorithm is done in two passes:
The first pass goes "up" the tree, retaining the summands
The second pass goes "down" the tree
An example is shown on the next slide
Note: These two examples are from Principles of Parallel Programming by Calvin Lin and Lawrence Snyder
Parallel prefix sum
Pass 1 goes "up" the tree, computing the same sums as before; each node also retains its left summand:

              76
           /      \
         35        41
        /  \      /  \
      10    25  31    10
     / \   / \  / \   / \
    7   3 15 10 13 18 6  4

Pass 2 goes "down" the tree: each node receives the sum of all the array elements to its left. The root receives 0; a left child receives its parent's value, and a right child receives its parent's value plus its left sibling's retained sum:

    root:     0
    level 2:  0,   35 (0+35)
    level 3:  0,   10 (0+10),   35,   66 (35+31)
    leaves:   0,   7 (0+7),   10,   25 (10+15),   35,   48 (35+13),   66,   72 (66+6)

Each leaf then adds its own value to the value it received, giving the prefix sums:

    7, 10, 25, 35, 48, 66, 72, 76
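Here is a sequential Python sketch of the two-pass scan, written to mirror the diagram (the names are my own; a real parallel implementation would process each tree level with all processors at once):

def prefix_sum(values):
    # Up pass: build the tree of partial sums, level by level
    levels = [values]                 # assumes len(values) is a power of two
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    # Down pass: each node receives the sum of everything to its left
    received = [0]                    # the root receives 0
    for level in reversed(levels[:-1]):
        nxt = []
        for i, r in enumerate(received):
            nxt.append(r)                    # left child: parent's value
            nxt.append(r + level[2 * i])     # right child: parent + left sibling
        received = nxt
    # Each leaf adds its own value to what it received
    return [v + r for v, r in zip(values, received)]

print(prefix_sum([7, 3, 15, 10, 13, 18, 6, 4]))
# [7, 10, 25, 35, 48, 66, 72, 76]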
Batcher’s Bitonic sort
Batcher’s bitonic sort is a sorting algorithm with the
following characteristics:
It’s a variation of MergeSort
It's designed for 2^n processors
It fully occupies all 2^n processors
Unlike array sum, which uses fewer processors on each pass
I’m not going to go through this algorithm—I just want
you to be able to say you’ve heard of it
MapReduce
MapReduce is a patented technique perfected by Google to deal
with huge data sets on clusters of computers
From Wikipedia:
"Map" step: The master node takes the input, chops it up into smaller
sub-problems, and distributes those to worker nodes. A worker node may
do this again in turn, leading to a multi-level tree structure. The worker
node processes that smaller problem, and passes the answer back to its
master node.
"Reduce" step: The master node then takes the answers to all the subproblems and combines them in a way to get the output - the answer to the
problem it was originally trying to solve.
Hadoop is a free Apache version of MapReduce
Basic idea of MapReduce
In MapReduce, the programmer has to write only two functions, and the framework takes care of everything else:
The Map function is applied (in parallel) to each item of data, producing a list of key-value pairs
The framework collects all the lists, and groups the key-value pairs by key
The Reduce function is applied (in parallel) to each group, returning either a single value or nothing
The framework collects all the returned values
Example: Counting words (Python)
The following Python program counts how many times
each word occurs in a set of data, and returns the list of
words and their counts
def mapper(key, value):
    # Emit each word in the input line with a count of 1
    words = key.split()
    for word in words:
        Wmr.emit(word, '1')

def reducer(key, values):
    # Add up the 1s emitted for this word
    total = 0
    for s in values:
        total = total + int(s)
    Wmr.emit(key, str(total))
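To see how the framework fits around these two functions, here is a minimal single-process driver. This is my own sketch, not part of any real framework: the Wmr class below is a hypothetical stand-in for the emit-collector, and run_mapreduce plays the role of the master node.

from collections import defaultdict

class Wmr:
    # Hypothetical collector (an assumption; the real Wmr class is
    # provided by the framework, and only emit() appears on the slides)
    pairs = []
    @staticmethod
    def emit(key, value):
        Wmr.pairs.append((key, value))

def mapper(key, value):           # word-count mapper from above
    for word in key.split():
        Wmr.emit(word, '1')

def reducer(key, values):         # word-count reducer from above
    total = 0
    for s in values:
        total = total + int(s)
    Wmr.emit(key, str(total))

def run_mapreduce(inputs):
    # Map step: apply mapper to every (key, value) input record
    Wmr.pairs = []
    for key, value in inputs:
        mapper(key, value)
    # Shuffle step: group the emitted pairs by key
    groups = defaultdict(list)
    for key, value in Wmr.pairs:
        groups[key].append(value)
    # Reduce step: apply reducer to each group and collect its output
    Wmr.pairs = []
    for key, values in sorted(groups.items()):
        reducer(key, values)
    return Wmr.pairs

print(run_mapreduce([("the cat sat on the mat", ""), ("the dog", "")]))
# [('cat', '1'), ('dog', '1'), ('mat', '1'), ('on', '1'),
#  ('sat', '1'), ('the', '3')]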
Example: Counting words (Java)
/* Mapper for word count */
class Mapper {
    public void mapper(String key, String value) {
        // Emit each word in the input line with a count of "1"
        String[] words = key.split(" ");
        for (int i = 0; i < words.length; i++)
            Wmr.emit(words[i], "1");
    }
}

/* Reducer for word count */
class Reducer {
    public void reducer(String key, WmrIterator iter) {
        // Add up the 1s emitted for this word
        int sum = 0;
        while (iter.hasNext()) {
            sum += Integer.parseInt(iter.next());
        }
        Wmr.emit(key, Integer.toString(sum));
    }
}
Example: Average movie ratings
The mapper bins each movie's average rating to the nearest half star and emits the bin as the key; the reducer counts how many movies fall into each bin

#!/usr/bin/env python
def mapper(key, value):
    # key is a movie id; value is its average rating as a string
    avgRating = float(value)
    binRating = 0.0
    if (0 < avgRating < 1.25):
        binRating = 1.0
    elif (1.25 <= avgRating < 1.75):
        binRating = 1.5
    elif (1.75 <= avgRating < 2.25):
        binRating = 2.0
    elif (2.25 <= avgRating < 2.75):
        binRating = 2.5
    elif (2.75 <= avgRating < 3.25):
        binRating = 3.0
    elif (3.25 <= avgRating < 3.75):
        binRating = 3.5
    elif (3.75 <= avgRating < 4.25):
        binRating = 4.0
    elif (4.25 <= avgRating < 4.75):
        binRating = 4.5
    elif (4.75 <= avgRating <= 5.0):
        binRating = 5.0
    else:
        binRating = 99.0      # out-of-range ratings
    Wmr.emit(str(binRating), key)

#!/usr/bin/env python
def reducer(key, iter):
    # Count the movies that fell into this rating bin
    count = 0
    for _ in iter:
        count = count + 1
    Wmr.emit(key, str(count))
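Fed through the run_mapreduce sketch from the word-count example (with these two functions now bound to the names mapper and reducer), hypothetical (movie id, average rating) input would produce one count per rating bin:

# Hypothetical input records; the (id, rating) format is an assumption
ratings = [("m1", "3.6"), ("m2", "3.4"), ("m3", "4.9"), ("m4", "1.0")]
print(run_mapreduce(ratings))
# [('1.0', '1'), ('3.5', '2'), ('5.0', '1')]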
The End