Join Algorithms In MapReduce


A Comparison of Join Algorithms for Log Processing in MapReduce

Spyros Blanas, Jignesh M. Patel (University of Wisconsin-Madison)
Eugene J. Shekita, Yuanyuan Tian (IBM Almaden Research Center)
SIGMOD 2010
August 1, 2010
Presented by Hyojin Song



Contents

Introduction
Join Algorithms In MapReduce
Experimental Evaluation
Discussion
Conclusion

Introduction(1/3)

• Log Processing
  - Important type of data analysis commonly done with MapReduce
  - A log of events
    - click-stream
    - log of phone call records
    - a sequence of transactions
  - To compute various statistics for business insight
    - Often needs to be joined, filtered, aggregated, and mined for patterns
  - Log data and Reference data (user information)

Log Table

  Call records            Number
  2010.09.24.14:20.30     01191655603
  2010.09.24.14:30.45     01046841397
  2010.09.25.19:11.118    01926540846
  2010.09.28.06:40.97     01098446512
  2010.09.29.08:44.08     01013461655
  ……                      ……

Reference Table

  Number         Name
  01191655603    송효진
  01046841397    안철수
  01926540846    한효주
  01098446512    안인석
  01013461655    마음이
  ……             ……

Introduction(2/3)

• MapReduce Framework
  - Used to analyze large volumes of data
  - The success of MapReduce
    - Simple programming framework
    - The framework manages parallelization, fault tolerance, and load balancing
  - The criticisms of MapReduce
    - lack of a schema
    - lack of a declarative query language
    - lack of indexes
  - Difficult for joins
    - Not originally designed to combine information from several data sources
    - Users fall back on simple but inefficient algorithms to perform joins

Introduction(3/3)

• The benefits of MapReduce for log processing
  - Scalability
    - China Mobile gathers 5-8TB of phone call records per day
    - Facebook collects almost 6TB of new log data every day, with 1.7PB in total
  - Schema free
    - flexibility
    - a log record may also change over time
  - Simple scans preferable (<-> index scans)
  - Time-consuming work
    - graceful fault-tolerance support (<-> parallel RDBMS)
• The goal of this paper
  - the implementation of several well-known join strategies in MapReduce
  - comprehensive experiments to compare these join techniques



Join Algorithms In MapReduce

Problem Statement
1. Repartition Join
2. Improved Repartition Join
3. Directed Join
4. Broadcast Join
5. Semi-Join
6. Per-split Semi-Join



Join Algorithms in MR

Problem Statement

• An equi-join between a log table L and a reference table R on a single column, with |L| >> |R|
• To propose further improving its performance with some preprocessing techniques
  - Well-known in the RDBMS literature
  - Adapting them to MapReduce is not always straightforward
  - Crucial implementation details of these join algorithms
• To implement two additional functions: init() and close()
  - These are called before and after each map or reduce task
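The init() and close() hooks described above can be pictured with a minimal Python sketch. The class name `JoinMapTask`, the driver `run_task`, and the `__records_seen__` summary key are hypothetical names chosen for illustration; they are not part of any MapReduce API.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


class JoinMapTask:
    """Hypothetical map task with init()/close() hooks around map() calls."""

    def init(self) -> None:
        # Called once before the first record: set up per-task state.
        self.seen = 0

    def map(self, key: str, value: str) -> Iterator[Pair]:
        # Called once per input record.
        self.seen += 1
        yield key, value

    def close(self) -> Iterator[Pair]:
        # Called once after the last record: flush or summarize per-task state.
        yield "__records_seen__", str(self.seen)


def run_task(task: JoinMapTask, split: Iterable[Pair]) -> List[Pair]:
    """Hypothetical driver: init() once, map() per record, close() once."""
    out: List[Pair] = []
    task.init()
    for key, value in split:
        out.extend(task.map(key, value))
    out.extend(task.close())
    return out


result = run_task(JoinMapTask(), [("2008-2424", "DB B"), ("2010-8281", "KRR A")])
```

The point of the hooks is that per-task setup (for example, loading a hash table) and per-task teardown happen once, not once per record.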



Join Algorithms in MR

1. Repartition Join

• The most commonly used join strategy in the MapReduce framework
  - L and R are dynamically partitioned on the join key
  - The corresponding pairs of partitions are joined
  - Similar to partitioned sort-merge join in the parallel RDBMS
• Example Tables (Log table & User table)
  - Log table
    - 500,000 records
    - Each log record has a lecture name and a grade
  - User table
    - 10,000 records
  - Join key is the student ID

Log Table

  Log       Student ID
  DB B+     2008-2424
  KRR A     2010-8281
  Opt A     2005-3682
  ML C0     2009-0078
  OS A+     2010-1004
  NL D      2008-0909
  …         …

User Table

  Student ID    Name
  2008-0909     Ahn Jaemin
  2010-1004     Kim Somin
  2009-0078     Song Hyojin
  2005-3682     Lee Taewhi
  2010-8281     An Inseok
  …             …

Join Algorithms in MR

1. Repartition Join

Map Phase
  Input: a split of R or L (Distributed File System)
    L:  DB B    2008-2424
        KRR A   2010-8281
    R:  Song    2009-0078
        An      2010-8281
        …….
    L:  NL D    2008-0909
        ML C    2009-0078
        OPT A   2005-3682
  Intermediate results (Local disk)
    2010-8281 : KRR A
    2008-2424 : DB B

    2010-8281 : An
    2008-0909 : NL D
    2009-0078 : ML C
    2009-0078 : Song
    2005-3682 : OPT A

Reduce Phase
  Buffer
    2010-8281 : An
    2008-0909 : NL D
    2010-8281 : KRR A

    2009-0078 : Song
    2005-3682 : OPT A
    2008-2424 : DB B
    2009-0078 : ML C

Join Algorithms in MR

1. Repartition Join

Reduce Phase
  Local disk
    2010-8281 : KRR A
    2008-2424 : DB B

    2010-8281 : An
    2008-0909 : NL D
    2009-0078 : ML C
    2009-0078 : Song
    2005-3682 : OPT A
  Buffer
    B_R   2010-8281 : An
    B_L   2008-0909 : NL D
          2010-8281 : KRR A

    B_R   2009-0078 : Song
    B_L   2005-3682 : OPT A
          2008-2424 : DB B
          2009-0078 : ML C

Output File (Distributed File System)

  Student ID    Name            Log
  2009-0078     Song Hyo Jin    ML C
  2010-8281     An In Seok      KRR A
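The standard repartition join walked through above can be sketched in plain Python. This is a single-process simulation under stated assumptions, not Hadoop code: `shuffle` stands in for the framework's partition, sort and merge steps, and all function names are hypothetical. The input records are the ones shown in the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (join key, payload)
Tagged = Tuple[str, str]  # (table tag, payload)


def map_phase(tag: str, split: List[Record]) -> List[Tuple[str, Tagged]]:
    """Emit (join key, (table tag, payload)) for every input record."""
    return [(key, (tag, payload)) for key, payload in split]


def shuffle(pairs: List[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Group intermediate pairs by join key (stand-in for the framework)."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(key: str, values: List[Tagged]) -> List[Tuple[str, str, str]]:
    """Buffer both sides (B_R and B_L), then emit their cross product."""
    b_r = [payload for tag, payload in values if tag == "R"]
    b_l = [payload for tag, payload in values if tag == "L"]
    return [(key, r, l) for r in b_r for l in b_l]


def repartition_join(r_table: List[Record], l_table: List[Record]):
    groups = shuffle(map_phase("R", r_table) + map_phase("L", l_table))
    out = []
    for key in sorted(groups):
        out.extend(reduce_phase(key, groups[key]))
    return out


R = [("2009-0078", "Song"), ("2010-8281", "An")]
L = [("2008-2424", "DB B"), ("2010-8281", "KRR A"), ("2008-0909", "NL D"),
     ("2009-0078", "ML C"), ("2005-3682", "OPT A")]
joined = repartition_join(R, L)
# joined == [("2009-0078", "Song", "ML C"), ("2010-8281", "An", "KRR A")]
```

Note that `reduce_phase` holds every record for a key in memory before producing output, which is the buffering cost of this variant.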



Join Algorithms in MR

1. Repartition Join

• Standard Repartition Join
  - Potential problem
    - All records for a given join key have to be buffered
    - They may not fit in memory when the data is highly skewed or the key cardinality is small
  - Variants of the standard repartition join are used in Pig, Hive, and Jaql today
    - They all suffer from the buffering problem
• Improved Repartition Join
  - The output key is changed to a composite of the join key and the table tag
  - The partitioning & grouping function is customized
  - Records from the smaller table R are buffered and L records are streamed to generate the join output

Join Algorithms in MR

2. Improved Repartition Join

Map Phase
  Input: a split of R or L (Distributed File System)
    L:  DB B    2008-2424
        KRR A   2010-8281
    R:  Song    2009-0078
        An      2010-8281
        …….
    L:  NL D    2008-0909
        ML C    2009-0078
        OPT A   2005-3682
  Intermediate results (Local disk), each key tagged with its table
    2010-8281 L : KRR A
    2008-2424 L : DB B

    2010-8281 R : An
    2008-0909 L : NL D
    2009-0078 L : ML C
    2009-0078 R : Song
    2005-3682 L : OPT A

Reduce Phase
  Buffer
    2010-8281 R : An
    2008-0909 L : NL D
    2010-8281 L : KRR A

    2009-0078 R : Song
    2005-3682 L : OPT A
    2008-2424 L : DB B
    2009-0078 L : ML C

Join Algorithms in MR

2. Improved Repartition Join

Reduce Phase
  Local disk
    2010-8281 L : KRR A
    2008-2424 L : DB B

    2010-8281 R : An
    2008-0909 L : NL D
    2009-0078 L : ML C
    2009-0078 R : Song
    2005-3682 L : OPT A
  Buffer
    B_R   2010-8281 : An      (L records are streamed)

    B_R   2009-0078 : Song    (L records are streamed)

Output File (Distributed File System)

  Student ID    Name            Log
  2009-0078     Song Hyo Jin    ML C
  2010-8281     An In Seok      KRR A
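A minimal Python sketch of the improved repartition join follows, assuming a composite key of (join key, table tag), partitioning on the join key alone, and a sort that places R records ahead of L records within each key. The numeric tags, the CRC-based `partition` function and the default of two reducers are illustrative assumptions, not details taken from the slides.

```python
import zlib
from itertools import groupby
from typing import Iterator, List, Tuple

R_TAG, L_TAG = 0, 1  # assumed encoding: R sorts before L within a join key
Record = Tuple[str, str]                 # (join key, payload)
Pair = Tuple[Tuple[str, int], str]       # ((join key, tag), payload)


def map_phase(tag: int, split: List[Record]) -> List[Pair]:
    """Emit a composite key of (join key, table tag) for every record."""
    return [((key, tag), payload) for key, payload in split]


def partition(composite_key: Tuple[str, int], num_reducers: int) -> int:
    """Custom partitioning: hash the join key only and ignore the tag."""
    join_key, _tag = composite_key
    return zlib.crc32(join_key.encode("utf-8")) % num_reducers


def reduce_partition(pairs: List[Pair]) -> Iterator[Tuple[str, str, str]]:
    """Group on the join key; buffer R records, then stream L records."""
    ordered = sorted(pairs, key=lambda kv: kv[0])  # stand-in for the framework sort
    for join_key, group in groupby(ordered, key=lambda kv: kv[0][0]):
        r_buffer: List[str] = []
        for (_key, tag), payload in group:
            if tag == R_TAG:
                r_buffer.append(payload)        # only R is held in memory
            else:
                for r_payload in r_buffer:      # each L record is joined and dropped
                    yield join_key, r_payload, payload


def improved_repartition_join(r_table, l_table, num_reducers: int = 2):
    partitions: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for pair in map_phase(R_TAG, r_table) + map_phase(L_TAG, l_table):
        partitions[partition(pair[0], num_reducers)].append(pair)
    out = []
    for part in partitions:
        out.extend(reduce_partition(part))
    return out


R = [("2009-0078", "Song"), ("2010-8281", "An")]
L = [("2008-2424", "DB B"), ("2010-8281", "KRR A"), ("2008-0909", "NL D"),
     ("2009-0078", "ML C"), ("2005-3682", "OPT A")]
joined = sorted(improved_repartition_join(R, L))
```

Because R records arrive first for every key, the reducer never needs to hold L records, which removes the buffering problem when L is skewed.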



Join Algorithms in MR

3. Directed Join

• Preprocessing for Repartition Join (Directed Join)
  - Both L and R have already been partitioned on the join key
    - Pre-partitioning L on the join key
    - Then at query time, matching partitions from L and R can be directly joined
  - A map-only MapReduce job
    - During the init phase, R_i is retrieved from the DFS, if it's not already in local storage
    - A main-memory hash table is then built on R_i
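The directed join can be sketched as follows, assuming both tables were pre-partitioned with the same function on the join key. `partition_id`, `pre_partition` and `DirectedJoinTask` are hypothetical names, and an in-memory list stands in for the partition R_i that a real task would fetch from the DFS.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, str]  # (join key, payload)


def partition_id(join_key: str, num_partitions: int) -> int:
    return zlib.crc32(join_key.encode("utf-8")) % num_partitions


def pre_partition(table: List[Record], num_partitions: int) -> List[List[Record]]:
    """Preprocessing step: split a table into partitions on the join key."""
    parts: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, payload in table:
        parts[partition_id(key, num_partitions)].append((key, payload))
    return parts


class DirectedJoinTask:
    """Map-only task that joins partition L_i against partition R_i."""

    def __init__(self, r_partition: List[Record]) -> None:
        self.r_partition = r_partition  # stand-in for R_i stored on the DFS

    def init(self) -> None:
        # Build the main-memory hash table on R_i once per task.
        self.hash_table: Dict[str, List[str]] = defaultdict(list)
        for key, payload in self.r_partition:
            self.hash_table[key].append(payload)

    def map(self, key: str, l_payload: str) -> Iterator[Tuple[str, str, str]]:
        # Probe the hash table with each L record.
        for r_payload in self.hash_table.get(key, []):
            yield key, r_payload, l_payload


def directed_join(l_parts: List[List[Record]], r_parts: List[List[Record]]):
    out = []
    for l_i, r_i in zip(l_parts, r_parts):
        task = DirectedJoinTask(r_i)
        task.init()
        for key, l_payload in l_i:
            out.extend(task.map(key, l_payload))
    return out


R = [("2009-0078", "Song"), ("2010-8281", "An")]
L = [("2008-2424", "DB B"), ("2010-8281", "KRR A"), ("2009-0078", "ML C")]
joined = sorted(directed_join(pre_partition(L, 3), pre_partition(R, 3)))
```

No shuffle is needed at query time because matching keys are guaranteed to sit in partitions with the same index.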



Join Algorithms in MR

4. Broadcast Join

• Broadcast Join
  - In most applications, |R| << |L|
  - Instead of moving both R and L across the network,
    - the smaller table R is broadcast, which avoids the network overhead of moving L
    - A map-only job
    - Each map task uses a main-memory hash table for either L or R



Join Algorithms in MR

4. Broadcast Join

• Broadcast Join
  - If R < a split of L
    - To build the hash table on R
  - If R > a split of L
    - To build the hash table on a split of L
• Preprocessing for Broadcast Join
  - Most nodes in the cluster have a local copy of R in advance
  - To avoid retrieving R from the DFS in its init() function
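The choice of which side to hash can be sketched as below. This is a simplified single-task illustration: record counts stand in for the byte sizes a real task would compare, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (join key, payload)


def build_hash_table(records: List[Record]) -> Dict[str, List[str]]:
    table: Dict[str, List[str]] = defaultdict(list)
    for key, payload in records:
        table[key].append(payload)
    return table


def broadcast_join_task(l_split: List[Record], r_table: List[Record]):
    """Join one split of L against the full broadcast copy of R."""
    out = []
    if len(r_table) <= len(l_split):
        # R is the smaller side: hash R, probe with the L records.
        table = build_hash_table(r_table)
        for key, l_payload in l_split:
            for r_payload in table.get(key, []):
                out.append((key, r_payload, l_payload))
    else:
        # The split of L is smaller: hash the split, probe with the R records.
        table = build_hash_table(l_split)
        for key, r_payload in r_table:
            for l_payload in table.get(key, []):
                out.append((key, r_payload, l_payload))
    return out


R = [("2009-0078", "Song"), ("2010-8281", "An")]
big_split = [("2008-2424", "DB B"), ("2010-8281", "KRR A"), ("2009-0078", "ML C")]
small_split = [("2010-8281", "KRR A")]

from_r_hash = sorted(broadcast_join_task(big_split, R))    # hash table on R
from_l_hash = sorted(broadcast_join_task(small_split, R))  # hash table on the L split
```

Either branch produces the same join output for a given split; only the side held in memory changes.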



Join Algorithms in MR

5. Semi-Join

• Semi-Join
  - In some applications, |R| << |L|
    - In Facebook, the user table has hundreds of millions of records
    - A few million unique active users per hour
  - To avoid sending over the network the records in R that will not join with L
• Preprocessing for Semi-Join
  - The first two phases of semi-join can be moved to a preprocessing step
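The slide does not spell out the phases, so the sketch below assumes the usual three-phase formulation: extract the unique join keys of L, filter R down to the referenced records, then join the filtered R with L. Function names are hypothetical, and the third log record is made-up sample data added to show a repeated key.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (join key, payload)


def extract_unique_keys(l_table: List[Record]) -> Set[str]:
    """Phase 1 (assumed): collect the unique join keys that appear in L."""
    return {key for key, _payload in l_table}


def filter_reference(r_table: List[Record], keys: Set[str]) -> List[Record]:
    """Phase 2 (assumed): keep only the R records that L references."""
    return [(key, payload) for key, payload in r_table if key in keys]


def join_filtered(l_table: List[Record], r_filtered: List[Record]):
    """Phase 3 (assumed): join L against the much smaller filtered R."""
    lookup: Dict[str, str] = dict(r_filtered)  # assumes an n-to-1 join
    return [(key, lookup[key], payload) for key, payload in l_table if key in lookup]


R = [("2008-0909", "Ahn Jaemin"), ("2010-1004", "Kim Somin"),
     ("2009-0078", "Song Hyojin"), ("2005-3682", "Lee Taewhi"),
     ("2010-8281", "An Inseok")]
L = [("2010-8281", "KRR A"), ("2009-0078", "ML C"), ("2010-8281", "DB B")]

keys = extract_unique_keys(L)
r_filtered = filter_reference(R, keys)
joined = join_filtered(L, r_filtered)
```

Here only two of the five reference records survive the filter, so only those two would need to travel over the network.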



Join Algorithms in MR

6. Per-Split Semi-Join

• Per-Split Semi-Join
  - The problem of semi-join: not every record of the extracted R will join with a particular split L_i
  - Instead, R_i is extracted for each split L_i, so L_i can be joined with R_i directly
• Preprocessing for Per-split Semi-join
  - Also benefits from moving its first two phases to a preprocessing step
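A per-split variant can be sketched in the same style, assuming the filtering is done once per split L_i so that each split is joined only with its own R_i. The function name and the split boundaries are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (join key, payload)


def per_split_semi_join(l_splits: List[List[Record]], r_table: List[Record]):
    """For each split L_i, extract R_i and join L_i with R_i directly."""
    out = []
    r_i_sizes = []
    for l_split in l_splits:
        split_keys = {key for key, _payload in l_split}            # unique keys of L_i
        r_i: Dict[str, str] = {key: payload for key, payload in r_table
                               if key in split_keys}                # R_i (n-to-1 join assumed)
        r_i_sizes.append(len(r_i))
        for key, l_payload in l_split:                              # join L_i with R_i
            if key in r_i:
                out.append((key, r_i[key], l_payload))
    return out, r_i_sizes


R = [("2008-0909", "Ahn Jaemin"), ("2010-1004", "Kim Somin"),
     ("2009-0078", "Song Hyojin"), ("2005-3682", "Lee Taewhi"),
     ("2010-8281", "An Inseok")]
L_splits = [
    [("2010-8281", "KRR A")],
    [("2009-0078", "ML C"), ("2010-8281", "OS A+")],
]
joined, r_i_sizes = per_split_semi_join(L_splits, R)
```

The trade-off visible in `r_i_sizes` is that a reference record used by several splits (here 2010-8281) is extracted once per split, so the R_i sets together can be larger than the single filtered R of the plain semi-join.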



Experimental Evaluation

1. Environment
2. Datasets
3. MapReduce Time Breakdown
4. Experimental Results



Experimental Evaluation

1. Environment

• System Specification
  - All experiments run on a 100-node cluster
  - Single 2.4GHz Intel Core 2 Duo processor
  - 4GB of DRAM and two SATA disks
  - Red Hat Enterprise Server 5.2 running Linux 2.6.18
• Network Specification
  - The 100 nodes were spread across two racks
  - Each rack had its own gigabit Ethernet switch
  - The rack level bandwidth is 32Gb/s
  - Under full load, 35MB/s cross-rack node-to-node bandwidth
• Hadoop version 0.19.0, HDFS (128MB block size)
  - Each node can execute two map and two reduce tasks concurrently



Experimental Evaluation

2. Datasets

                      Event Log (L)          User Info (R)
  Join column size    10 bytes               5 bytes
  Record size         100 bytes (average)    100 bytes (exactly)
  Total size          500GB                  10MB~100GB

• Join result is a 10-byte join key
• n-to-1 join
• Many users are inactive
• All the records in L always appear in the result
• To fix the fraction of R that was referenced by L to be 0.1%, 1%, or 10%
• To simulate some active users, a Zipf distribution was used
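One way to picture the dataset setup is the sketch below, which fixes the referenced fraction of R and then draws log keys with Zipf-like weights. The skew exponent, the table sizes, the key format and the seed are all assumptions for illustration; the slides do not give the generator's actual parameters.

```python
import random
from collections import Counter
from typing import List


def zipf_keys(referenced_keys: List[str], num_log_records: int,
              skew: float = 1.0, seed: int = 0) -> List[str]:
    """Draw join keys for L so that low-rank keys are referenced most often."""
    rng = random.Random(seed)
    # Zipf-like weight for the key at rank r is 1 / r**skew.
    weights = [1.0 / (rank ** skew) for rank in range(1, len(referenced_keys) + 1)]
    return rng.choices(referenced_keys, weights=weights, k=num_log_records)


r_keys = [f"{i:05d}" for i in range(1000)]      # hypothetical 5-byte keys of R
referenced = r_keys[: len(r_keys) // 10]        # 10% of R is referenced by L
log_keys = zipf_keys(referenced, 10_000)        # join keys of the generated log
counts = Counter(log_keys)
```

With this setup every log record has a match in R (so all of L appears in the result), while 90% of R is never referenced.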

Experimental Evaluation

3. MapReduce Time Breakdown



Experimental Evaluation

3. MapReduce Time Breakdown

• MapReduce Time Breakdown
  - What transpires during the execution of a MapReduce job
  - The overhead of various execution components of MapReduce
  - System Environment
    - The standard repartition join algorithm
    - 500GB log table and 30MB reference table
    - 1% of the reference table actually referenced by the log records
    - 4000 map tasks and 200 reduce tasks
    - A node was assigned 40 map and 2 reduce tasks
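The task counts above follow from the cluster figures given earlier. The arithmetic below is a rough check that assumes binary units (1GB = 1024MB), one map task per 128MB block of the log table, and ignores the small reference table.

```python
MB_PER_GB = 1024                      # assumption: binary units

log_size_mb = 500 * MB_PER_GB         # 500GB log table
block_size_mb = 128                   # HDFS block size
nodes = 100                           # cluster size
reduce_tasks = 200                    # stated on the slide
map_slots_per_node = 2                # concurrent map tasks per node

map_tasks = log_size_mb // block_size_mb               # one map task per block
maps_per_node = map_tasks // nodes
reduces_per_node = reduce_tasks // nodes
map_waves = maps_per_node // map_slots_per_node        # rounds of map tasks per node

print(map_tasks, maps_per_node, reduces_per_node, map_waves)
```

Under these assumptions the numbers match the slide: 4000 map tasks, 40 per node, and 2 reduce tasks per node, with the map tasks on each node running in about 20 waves.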



Experimental Evaluation

3. MapReduce Time Breakdown

• Interesting Observations on MapReduce
  - The map phase was clearly CPU-bound
  - The reduce phase was limited by the network bandwidth
    - Writing the three copies of the join result to HDFS
  - The disk and the network activities were moderate and periodic during the map phase
    - The peaks were related to the output generation in the map task
    - The shuffle phase in the reduce task
  - Almost idle for about 30 seconds, between the 9 min and 10 min mark
    - Waiting for the slowest map task
  - By enabling independent and concurrent map tasks, almost all CPU, disk and network activities can be overlapped

Experimental Evaluation

4. Experimental Results

▣ No preprocessing
▣ Preprocessing

Experimental Evaluation

4. Experimental Results




Discussion

• Choosing the Right Strategy
  - To determine what is the right join strategy for a given circumstance
  - To provide an important first step for query optimization




Conclusion

• Joining log data with reference data in MapReduce has emerged as an important part of
  - Analytic operations for enterprise customers
  - Web 2.0 companies
• To design a series of join algorithms on top of MapReduce
  - Without requiring any modification to the actual framework
  - To propose many details for efficient implementation
    - Two additional functions: init(), close()
    - Practical preprocessing techniques
• Future work
  - Multi-way joins
  - Indexing methods to speed up join queries
  - Optimization module (selecting appropriate join algorithms)
  - New programming models to extend the MapReduce framework
