Concurrent Algorithms
Summing the elements of an array
(Diagram: tree of partial sums for the array [7, 3, 15, 10, 13, 18, 6, 4]; each level of the tree can be computed in parallel)

              76
           /      \
         35        41
        /  \      /  \
      10    25  31    10
     / \   / \  / \   / \
    7   3 15 10 13 18 6  4

76 = 35 + 41;  35 = 10 + 25;  41 = 31 + 10
10 = 7 + 3;  25 = 15 + 10;  31 = 13 + 18;  10 = 6 + 4
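The tree can be computed by repeatedly adding adjacent pairs, halving the number of active values on each pass. Here is a minimal Python sketch of that structure (the names and the thread pool are my own additions; in CPython the GIL means the threads illustrate the pattern rather than give a real speedup):

from concurrent.futures import ThreadPoolExecutor

def tree_sum(values):
    # Each pass adds adjacent pairs "in parallel"; n values need only
    # O(log n) passes, using n/2, n/4, ... processors per pass.
    with ThreadPoolExecutor() as pool:
        while len(values) > 1:
            pairs = list(zip(values[0::2], values[1::2]))
            leftover = [values[-1]] if len(values) % 2 else []
            values = list(pool.map(lambda p: p[0] + p[1], pairs)) + leftover
    return values[0]

print(tree_sum([7, 3, 15, 10, 13, 18, 6, 4]))   # 76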
Parallel sum and parallel prefix sum
It's relatively easy to see how to sum the elements of an array in a parallel fashion
Summing is a special case of a reduce operation: combining a number of values into a single value
It's harder to see how to do a prefix (cumulative) sum in parallel
For example, a prefix sum turns the list [3, 1, 4, 1, 6] into [3, 4, 8, 9, 15]
This is a special case of what is sometimes called a scan operation
The algorithm is done in two passes:
The first pass goes "up" the tree, retaining the summands
The second pass goes "down" the tree
An example is shown on the next slide
Note: These two examples are from Principles of Parallel Programming by Calvin Lin and Lawrence Snyder
Parallel prefix sum
Pass 1 goes "up" the tree, computing the same sums as before; each node also retains its left summand:

              76
           /      \
         35        41
        /  \      /  \
      10    25  31    10
     / \   / \  / \   / \
    7   3 15 10 13 18 6  4

Pass 2 goes "down" the tree: each node receives the sum of all the array elements to its left. The root receives 0; a left child receives its parent's value, and a right child receives its parent's value plus its left sibling's retained sum:

    root:     0
    level 2:  0,   35 (0+35)
    level 3:  0,   10 (0+10),   35,   66 (35+31)
    leaves:   0,   7 (0+7),   10,   25 (10+15),   35,   48 (35+13),   66,   72 (66+6)

Each leaf then adds its own value to the value it received, giving the prefix sums:

    7, 10, 25, 35, 48, 66, 72, 76
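Here is a sequential Python sketch of the two-pass scan, written to mirror the diagram (the names are my own; a real parallel implementation would process each tree level with all processors at once):

def prefix_sum(values):
    # Up pass: build the tree of partial sums, level by level
    levels = [values]                 # assumes len(values) is a power of two
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    # Down pass: each node receives the sum of everything to its left
    received = [0]                    # the root receives 0
    for level in reversed(levels[:-1]):
        nxt = []
        for i, r in enumerate(received):
            nxt.append(r)                    # left child: parent's value
            nxt.append(r + level[2 * i])     # right child: parent + left sibling
        received = nxt
    # Each leaf adds its own value to what it received
    return [v + r for v, r in zip(values, received)]

print(prefix_sum([7, 3, 15, 10, 13, 18, 6, 4]))
# [7, 10, 25, 35, 48, 66, 72, 76]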
Batcher’s Bitonic sort
Batcher’s bitonic sort is a sorting algorithm with the
following characteristics:
It’s a variation of MergeSort
It's designed for 2^n processors
It fully occupies all 2^n processors
Unlike array sum, which uses fewer processors on each pass
I’m not going to go through this algorithm—I just want
you to be able to say you’ve heard of it
MapReduce
MapReduce is a patented technique perfected by Google to deal
with huge data sets on clusters of computers
From Wikipedia:
"Map" step: The master node takes the input, chops it up into smaller
sub-problems, and distributes those to worker nodes. A worker node may
do this again in turn, leading to a multi-level tree structure. The worker
node processes that smaller problem, and passes the answer back to its
master node.
"Reduce" step: The master node then takes the answers to all the subproblems and combines them in a way to get the output - the answer to the
problem it was originally trying to solve.
Hadoop is a free Apache version of MapReduce
Basic idea of MapReduce
In MapReduce, the programmer has to write only two functions, and the framework takes care of everything else:
The Map function is applied (in parallel) to each item of data, producing a list of key-value pairs
The framework collects all the lists, and groups the key-value pairs by key
The Reduce function is applied (in parallel) to each group, returning either a single value or nothing
The framework collects all the returned values
Example: Counting words (Python)
The following Python program counts how many times
each word occurs in a set of data, and returns the list of
words and their counts
def mapper(key, value):
    # Emit each word in the input line with a count of 1
    words = key.split()
    for word in words:
        Wmr.emit(word, '1')

def reducer(key, values):
    # Add up the 1s emitted for this word
    total = 0
    for s in values:
        total = total + int(s)
    Wmr.emit(key, str(total))
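To see how the framework fits around these two functions, here is a minimal single-process driver. This is my own sketch, not part of any real framework: the Wmr class below is a hypothetical stand-in for the emit-collector, and run_mapreduce plays the role of the master node.

from collections import defaultdict

class Wmr:
    # Hypothetical collector (an assumption; the real Wmr class is
    # provided by the framework, and only emit() appears on the slides)
    pairs = []
    @staticmethod
    def emit(key, value):
        Wmr.pairs.append((key, value))

def mapper(key, value):           # word-count mapper from above
    for word in key.split():
        Wmr.emit(word, '1')

def reducer(key, values):         # word-count reducer from above
    total = 0
    for s in values:
        total = total + int(s)
    Wmr.emit(key, str(total))

def run_mapreduce(inputs):
    # Map step: apply mapper to every (key, value) input record
    Wmr.pairs = []
    for key, value in inputs:
        mapper(key, value)
    # Shuffle step: group the emitted pairs by key
    groups = defaultdict(list)
    for key, value in Wmr.pairs:
        groups[key].append(value)
    # Reduce step: apply reducer to each group and collect its output
    Wmr.pairs = []
    for key, values in sorted(groups.items()):
        reducer(key, values)
    return Wmr.pairs

print(run_mapreduce([("the cat sat on the mat", ""), ("the dog", "")]))
# [('cat', '1'), ('dog', '1'), ('mat', '1'), ('on', '1'),
#  ('sat', '1'), ('the', '3')]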
Example: Counting words (Java)
/* Mapper for word count */
class Mapper {
    public void mapper(String key, String value) {
        // Emit each word in the input line with a count of "1"
        String[] words = key.split(" ");
        for (int i = 0; i < words.length; i++)
            Wmr.emit(words[i], "1");
    }
}

/* Reducer for word count */
class Reducer {
    public void reducer(String key, WmrIterator iter) {
        // Add up the 1s emitted for this word
        int sum = 0;
        while (iter.hasNext()) {
            sum += Integer.parseInt(iter.next());
        }
        Wmr.emit(key, Integer.toString(sum));
    }
}
Example: Average movie ratings
The mapper bins each movie's average rating to the nearest half star and emits the bin as the key; the reducer counts how many movies fall into each bin

#!/usr/bin/env python
def mapper(key, value):
    # key is a movie id; value is its average rating as a string
    avgRating = float(value)
    binRating = 0.0
    if (0 < avgRating < 1.25):
        binRating = 1.0
    elif (1.25 <= avgRating < 1.75):
        binRating = 1.5
    elif (1.75 <= avgRating < 2.25):
        binRating = 2.0
    elif (2.25 <= avgRating < 2.75):
        binRating = 2.5
    elif (2.75 <= avgRating < 3.25):
        binRating = 3.0
    elif (3.25 <= avgRating < 3.75):
        binRating = 3.5
    elif (3.75 <= avgRating < 4.25):
        binRating = 4.0
    elif (4.25 <= avgRating < 4.75):
        binRating = 4.5
    elif (4.75 <= avgRating <= 5.0):
        binRating = 5.0
    else:
        binRating = 99.0      # out-of-range ratings
    Wmr.emit(str(binRating), key)

#!/usr/bin/env python
def reducer(key, iter):
    # Count the movies that fell into this rating bin
    count = 0
    for _ in iter:
        count = count + 1
    Wmr.emit(key, str(count))
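Fed through the run_mapreduce sketch from the word-count example (with these two functions now bound to the names mapper and reducer), hypothetical (movie id, average rating) input would produce one count per rating bin:

# Hypothetical input records; the (id, rating) format is an assumption
ratings = [("m1", "3.6"), ("m2", "3.4"), ("m3", "4.9"), ("m4", "1.0")]
print(run_mapreduce(ratings))
# [('1.0', '1'), ('3.5', '2'), ('5.0', '1')]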
The End