CS347: MapReduce
CS347
Motivation for Map-Reduce
Distribution makes simple computations complex:
• Communication
• Load balancing
• Fault tolerance
• …
What if we could write "simple" programs that were automatically parallelized?
Motivation for Map-Reduce
Recall one of our sort strategies:
[Figure: the input is processed and partitioned by keys k0, k1 into fragments R1, R2, R3; each fragment is locally sorted into a run R'1, R'2, R'3; additional processing combines the runs into the final Result.]
Another example: Asymmetric fragment + replicate join
[Figure: R (stored as Ra, Rb) is partitioned via function f into fragments R1, R2, R3; a full copy of S (Sa, Sb, …) is replicated to every site; each site computes a local join of its R fragment with S; additional processing takes the union of the local joins to form the final Result.]
From our point of view…
• What if we didn't have to think too hard about the number of sites or fragments?
• MapReduce goal: a library that hides the distribution from the developer, who can then focus on the fundamental "nuggets" of their computation
Building Text Index - Part I
The original Map-Reduce application…
[Figure: a stream of pages (1: "rat dog", 2: "dog cat", 3: "rat dog") is tokenized into (word, page) pairs — (rat, 1), (dog, 1), (dog, 2), (cat, 2), (rat, 3), (dog, 3) — which are sorted in memory and flushed to disk as intermediate runs.]
Building Text Index - Part II
[Figure: the intermediate runs — e.g. (ant, 5), (cat, 2), (cat, 4), (dog, 1) … (dog, 5), (eel, 6), (rat, 1), (rat, 3) — are merged into the final index: (ant: 5), (cat: 2, 4), (dog: 1, 2, 3, 4, 5), (eel: 6), (rat: 1, 3).]
Generalizing: Map-Reduce
[Figure: the same pipeline as Part I, with the tokenize/sort/flush stage labeled as the Map phase: pages are tokenized into (word, page) pairs, sorted, and flushed to disk as intermediate runs.]
Generalizing: Map-Reduce
[Figure: the same pipeline as Part II, with the merge stage labeled as the Reduce phase: intermediate runs are merged into the final index, e.g. (dog: 1, 2, 3, 4, 5).]
Map Reduce
• Input: R = {r1, r2, …, rn}; functions M, R
  – M(ri) → { [k1, v1], [k2, v2], … }
  – R(ki, valSet) → [ki, valSet′]
• Let S = { [k, v] | [k, v] ∈ M(r) for some r ∈ R }   (S is a bag)
• Let K = { k | [k, v] ∈ S, for any v }
• Let G(k) = { v | [k, v] ∈ S }   (G(k) is a bag)
• Output = { [k, T] | k ∈ K, T = R(k, G(k)) }
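The set-based definitions above can be sketched as a toy, single-process Python function (illustrative only, not the distributed implementation; `map_reduce`, `M`, and `Rfn` are names invented here):

```python
from collections import defaultdict

def map_reduce(R, M, Rfn):
    """Toy, single-process MapReduce following the definitions above."""
    S = [kv for r in R for kv in M(r)]   # S = bag of all [k, v] emitted by M
    G = defaultdict(list)                # G(k) = bag of values for key k
    for k, v in S:
        G[k].append(v)
    # Output = { [k, T] | k in K, T = Rfn(k, G(k)) }
    return {k: Rfn(k, vals) for k, vals in G.items()}

# Example: a tiny inverted index (word -> sorted page numbers).
pages = [(1, "rat dog"), (2, "dog cat")]
M = lambda page: [(w, page[0]) for w in page[1].split()]
Rfn = lambda k, vals: sorted(vals)
index = map_reduce(pages, M, Rfn)
# index == {'rat': [1], 'dog': [1, 2], 'cat': [2]}
```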
References
• MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. Available at http://labs.google.com/papers/mapreduce-osdi04.pdf
Example: Counting Word Occurrences
• map(String doc, String value):
    // doc is document name
    // value is document content
    for each word w in value:
      EmitIntermediate(w, "1");
• Example:
  – map(doc, "cat dog cat bat dog") emits [cat 1], [dog 1], [cat 1], [bat 1], [dog 1]
Example: Counting Word Occurrences
• reduce(String key, Iterator values):
    // key is a word
    // values is a list of counts
    int result = 0;
    for each v in values:
      result += ParseInt(v);
    Emit(AsString(result));
• Example:
  – reduce("dog", "1 1 1 1") emits "4", which becomes ("dog", 4)
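Taken together, the map() and reduce() pseudocode above can be run as a small sketch (Python standing in for the paper's C++; the in-memory dict plays the role of the shuffle):

```python
from collections import defaultdict

def map_fn(doc, value):
    # Emit [w, "1"] for each word, as in the slide's map().
    return [(w, "1") for w in value.split()]

def reduce_fn(key, values):
    # Sum the string counts, as in the slide's reduce().
    return sum(int(v) for v in values)

intermediate = defaultdict(list)
for k, v in map_fn("doc", "cat dog cat bat dog"):
    intermediate[k].append(v)          # shuffle: group values by key

counts = {k: reduce_fn(k, vs) for k, vs in intermediate.items()}
# counts == {'cat': 2, 'dog': 2, 'bat': 1}
```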
Mappers
[Figure, animated over several slides: the source data is split into chunks; map workers pick up splits and process them, one split at a time, until all splits are consumed.]
Shuffle
[Figure: map outputs are shuffled — repartitioned by key — and sent to the reduce workers.]
Reduce
[Figure: each reduce worker processes its shard of the shuffled data to produce its part of the output.]
Another way to think about it:
• Mapper: (some query)
• Shuffle: GROUP BY
• Reducer: SELECT Aggregate()
Doesn't have to be relational
• Mapper: parse data into K, V pairs
• Shuffle: repartition by K
• Reducer: transform the V's for a K into a Vfinal
Process model
[Figure: map and reduce tasks are assigned to worker processes.]
• Can vary the number of mappers to tune performance
• Reduce tasks bounded by number of reduce shards
Implementation Issues
• Combine function
• File system
• Partition of input, keys
• Failures
• Backup tasks
• Ordering of results
Combine Function
Combine is like a local reduce applied before distribution:
[Figure: without combining, a map worker sends [cat 1], [cat 1], [cat 1], … and [dog 1], [dog 1], …; with a combine function the same worker sends [cat 3], … and [dog 2], … instead.]
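A minimal sketch of the idea (helper names invented for illustration): the combiner runs a local reduce on one map worker's output, so far fewer pairs cross the network.

```python
from collections import Counter

def map_word_count(text):
    # Raw map output: one (word, 1) pair per occurrence.
    return [(w, 1) for w in text.split()]

def combine(pairs):
    # Local reduce on one map worker's output before the shuffle.
    return list(Counter(k for k, _ in pairs).items())

pairs = map_word_count("cat cat cat dog dog")
combined = combine(pairs)
# 5 pairs shrink to 2: [('cat', 3), ('dog', 2)]
```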
Data flow
• A reduce worker must be able to access local disks on map workers.
• A worker must be able to access any part of the input file, so the input lives on a distributed file system.
• Any worker must be able to write its part of the answer; the answer is left as a distributed file.
• A high-throughput network is essential.
Partition of input, keys
• How many workers? How many splits of the input file?
• Best to have many splits per worker: it improves load balance, and if a worker fails, its tasks are easier to spread across the remaining workers.
• Should workers be assigned to splits "near" them?
• Similar questions arise for reduce workers.
What takes time?
• Reading input data
• Mapping data
• Shuffling data
• Reducing data
• Map and reduce are separate phases
  – Latency determined by slowest task
  – Reduce shard skew can increase latency
• Map and shuffle can be overlapped
  – But if there is a lot of intermediate data, the shuffle may be slow
Failures
• A distributed implementation should produce the same output as would have been produced by a non-faulty sequential execution of the program.
• General strategy: the master detects worker failures and has the work re-done by another worker.
[Figure: the master pings ("ok?") the worker assigned split j; when that worker fails, the master assigns another worker to redo j.]
Backup Tasks
• A straggler is a machine that takes unusually long to finish its work (e.g., because of a bad disk).
• A straggler can delay final completion.
• When the job is close to finishing, the master schedules backup executions for the remaining tasks.
• Must be able to eliminate redundant results.
Ordering of Results
• Final result (at each node) is in key order
[Figure: the input pairs, e.g. [k1, v1], [k3, v3], are also in key order; the output [k1, T1], [k2, T2], [k3, T3], [k4, T4] is in key order as well.]
Example: Sorting Records
[Figure: workers W1–W3 hold records keyed 5, 3, 6 (e, c, f), 2, 1, 6 (b, a, f*), and 8, 4, 9, 7 (h, d, i, g), each locally sorted; the shuffle range-partitions by key so that W5 receives keys 1, 3, 5, 7, 9 (a, c, e, g, i) and W6 receives keys 2, 4, 6, 6, 8 (b, d, f, f*, h) — note the question: one or two records for k=6?]
Map: extract k, output [k, record]
Reduce: Do nothing!
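Under this slide's assumptions — map emits [k, record], the shuffle range-partitions by key, and reduce is the identity — a toy sketch (function names invented here) shows why concatenating the reduce shards yields a globally sorted result:

```python
def sort_by_mapreduce(records, key, boundaries):
    """Toy MapReduce sort: map emits (k, record), the 'shuffle'
    range-partitions by key, and reduce only sorts locally."""
    shards = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:                          # map + shuffle
        k = key(rec)
        shard = sum(k > b for b in boundaries)   # pick the range partition
        shards[shard].append((k, rec))
    out = []
    for shard in shards:                         # reduce: local sort, no transform
        out.extend(rec for _, rec in sorted(shard))
    return out                                   # shards concatenated = global order

recs = [(5, 'e'), (3, 'c'), (2, 'b'), (1, 'a'), (8, 'h'), (4, 'd')]
result = sort_by_mapreduce(recs, key=lambda r: r[0], boundaries=[4])
# result == [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (8, 'h')]
```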
Other Issues
• Skipping bad records
• Debugging
MR Claimed Advantages
• Model easy to use; hides details of parallelization and fault recovery
• Many problems expressible in the MR framework
• Scales to thousands of machines
MR Possible Disadvantages
• 1-input, 2-stage data flow is rigid, hard to adapt to other scenarios
• Custom code needs to be written even for the most common operations, e.g., projection and filtering
• Opaque nature of the map and reduce functions impedes optimization
Hadoop
• Open-source Map-Reduce system
• Also, a toolkit
  – HDFS – filesystem for Hadoop
  – HBase – database for Hadoop
• Also, an ecosystem
  – Tools
  – Recipes
  – Developer community
MapReduce pipelines
• Output of one MapReduce becomes input for another
• Example:
  – Stage 1: translate data to canonical form
  – Stage 2: term count
  – Stage 3: sort by frequency
  – Stage 4: extract top-k
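Collapsed into in-memory functions (in a real pipeline each stage would be a separate MapReduce job whose output files feed the next; all names here are invented for illustration), the four stages might look like:

```python
from collections import Counter

def stage1_canonical(docs):
    # Stage 1: translate data to canonical form (here: lowercase).
    return [d.lower() for d in docs]

def stage2_term_count(docs):
    # Stage 2: term count (the word-count MapReduce).
    return Counter(w for d in docs for w in d.split())

def stage3_sort_by_freq(counts):
    # Stage 3: sort terms by frequency, descending.
    return sorted(counts.items(), key=lambda kv: -kv[1])

def stage4_top_k(ranked, k):
    # Stage 4: extract the top-k terms.
    return ranked[:k]

docs = ["Dog cat dog", "dog bat"]
top = stage4_top_k(stage3_sort_by_freq(stage2_term_count(stage1_canonical(docs))), 1)
# top == [('dog', 3)]
```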
Make it database-y?
• Simple idea: each operator is a MapReduce
• How to do:
  – Select
  – Project
  – Group by, aggregate
  – Join
Reduce-side join
• Shuffle puts all values for the same key at the same reducer
• Mapper
  – Input: tuples from R; tuples from S
  – Output: (join value, (R|S, tuple))
• Reducer
  – Local join of all R tuples with all S tuples
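A single-process sketch of the reduce-side join described above (names invented here; the dict of per-key groups stands in for the shuffle):

```python
from collections import defaultdict

def reduce_side_join(R, S, r_key, s_key):
    """Mappers tag each tuple with its source relation; the shuffle
    groups by join key; each reducer joins the R tuples with the
    S tuples that share its key."""
    groups = defaultdict(lambda: ([], []))
    for t in R:                         # map R: emit (join value, ('R', tuple))
        groups[r_key(t)][0].append(t)
    for t in S:                         # map S: emit (join value, ('S', tuple))
        groups[s_key(t)][1].append(t)
    out = []
    for k, (rs, ss) in groups.items():  # reduce: local cross product per key
        out.extend((r, s) for r in rs for s in ss)
    return out

R = [(1, 'x'), (2, 'y')]
S = [(1, 'A'), (1, 'B'), (3, 'C')]
joined = reduce_side_join(R, S, r_key=lambda t: t[0], s_key=lambda t: t[0])
# joined == [((1, 'x'), (1, 'A')), ((1, 'x'), (1, 'B'))]
```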
Map-side join
• Like a hash-join, but every mapper has a copy of the hash table
• Mapper:
  – Read in the hash table of R
  – Input: tuples of S
  – Output: tuples of S joined with tuples of R
• Reducer
  – Pass through
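A corresponding sketch of the map-side join (names invented here): the small relation R is hashed once, and tuples of S are streamed through the table.

```python
def map_side_join(R, S, r_key, s_key):
    """Each mapper builds (or receives) a hash table of the small
    relation R, then streams tuples of S through it; the reducer
    is a pass-through."""
    table = {}
    for t in R:                        # hash the small table once
        table.setdefault(r_key(t), []).append(t)
    out = []
    for s in S:                        # map over S: probe the table
        for r in table.get(s_key(s), []):
            out.append((r, s))
    return out

R = [(1, 'x'), (2, 'y')]
S = [(1, 'A'), (2, 'B'), (3, 'C')]
joined = map_side_join(R, S, lambda t: t[0], lambda t: t[0])
# joined == [((1, 'x'), (1, 'A')), ((2, 'y'), (2, 'B'))]
```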
Comparison
• Reduce-side join shuffles all the data
• Map-side join requires one table to be small
Semi-join?
• One idea:
  – MapReduce 1: extract the join keys of R; reduce-side join with S
  – MapReduce 2: map-side join the result of MapReduce 1 with R
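A single-process sketch of this two-job plan (names invented here): job 1 projects R's join keys and filters S down to the matching tuples; job 2 hash-joins that small result back with R.

```python
def semi_join_plan(R, S, key):
    """Toy semi-join: job 1 = project R's keys, keep matching S tuples;
    job 2 = map-side join the (small) result back with R."""
    r_keys = {key(r) for r in R}                      # job 1 map: project keys of R
    s_matching = [s for s in S if key(s) in r_keys]   # job 1 reduce: S tuples that join
    table = {}
    for s in s_matching:                              # job 2: hash the small result
        table.setdefault(key(s), []).append(s)
    return [(r, s) for r in R for s in table.get(key(r), [])]

R = [(1, 'x'), (2, 'y')]
S = [(1, 'A'), (3, 'C')]
result = semi_join_plan(R, S, key=lambda t: t[0])
# result == [((1, 'x'), (1, 'A'))]
```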
Platforms for SQL-like queries
• Pig Latin
• Hive
• MapR
• MemSQL
• …
Why not just use a DBMS?
• Many DBMSs exist and are highly optimized
• A Comparison of Approaches to Large-Scale Data Analysis. Pavlo et al., SIGMOD 2009.
Why not just use a DBMS?
• One reason: loading data into a DBMS is hard
• A Comparison of Approaches to Large-Scale Data Analysis. Pavlo et al., SIGMOD 2009.
Why not just use a DBMS?
• Other possible reasons:
  – MapReduce is more scalable
  – MapReduce is more easily deployed
  – MapReduce is more easily extended
  – MapReduce is more easily optimized
  – MapReduce is free (that is, Hadoop)
  – I already know Java
  – MapReduce is exciting and new
Data store
• Instead of HDFS, data could be stored in a database
• Example: HadoopDB/Hadapt
  – Data store is PostgreSQL
  – Allows for indexing, fast local query processing
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. In Proceedings of VLDB, 2009.
Batch processing?
• MapReduce materializes intermediate results
  – Map output written to disk before reduce starts
• What if we pipelined the data from map to reduce?
  – Reduce could quickly produce approximate answers
  – MapReduce could implement continuous queries
MapReduce Online. Tyson Condie, Neil Conway, Peter Alvaro, Joe Hellerstein, Khaled Elmeleegy, Russell Sears. NSDI 2010.
Not just database queries!
• Any large, partition-able computation
  – Build an inverted index
  – Image thumbnailing
  – Machine translation
  – …
  – Anything expressible in the Map/Reduce functions via a general-purpose language