Adaptive Load Balancing in a MapReduce Environment - Balakrishnan - Rozemary

Adaptive Load Balancing in a
MapReduce Environment
- Balakrishnan
- Rozemary
- Rohit
Approach
• Run X% of input as a prejob to get the
statistics of the map outputs
• Decide the number of reducers based on the
statistics obtained
• Run the actual Job with the recommended
number of reducers
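The three steps above can be sketched as a small driver. This is a minimal illustration, not the authors' code: `prejob_statistics`, `adaptive_run`, `word_mapper`, and the `decide_reducers` / `run_job` callbacks are hypothetical names, and the decision rule passed in below is only a placeholder.

```python
import random
from collections import Counter

def prejob_statistics(records, mapper, sample_percent, seed=0):
    """Step 1: map a random sample_percent% of the records, count values per key."""
    k = max(1, len(records) * sample_percent // 100)
    sample = random.Random(seed).sample(records, k)
    counts = Counter()
    for record in sample:
        for key, _value in mapper(record):
            counts[key] += 1
    return counts

def adaptive_run(records, mapper, sample_percent, decide_reducers, run_job):
    stats = prejob_statistics(records, mapper, sample_percent)  # step 1: prejob
    num_reducers = decide_reducers(stats)                       # step 2: decide
    return run_job(records, mapper, num_reducers)               # step 3: actual job

def word_mapper(line):
    # Same mapper for the prejob and the actual job.
    for word in line.split():
        yield word, 1

records = ["x y", "y z", "z x"] * 10
chosen = adaptive_run(
    records, word_mapper, sample_percent=20,
    # Placeholder rule (assumption): distinct keys, capped at 2.
    decide_reducers=lambda stats: min(len(stats), 2),
    # Stand-in for submitting the real job: just report the reducer count.
    run_job=lambda recs, mapper, n: n,
)
```

Passing the decision rule and the job launcher as callbacks keeps the sketch independent of any particular cluster API.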
Requirements for Prejob
• A Random Sampler
• Mapper: Same as the Mapper of the actual
Job
• Reducer: Takes the output of the mapper and
computes the number of reducers using the
formula below.
- |Reducers| = min(keys, …)
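A possible shape for the prejob reducer is sketched below. The second argument of the min in the formula is not recoverable from the transcript, so `max_reducers` here is a hypothetical stand-in (for example, a cluster-wide cap) and not the authors' term; the function name `prejob_reducer` and the floor of one reducer are also assumptions.

```python
from itertools import groupby

def prejob_reducer(mapped_pairs, max_reducers):
    """Count distinct keys in the sampled map output and cap the result.

    max_reducers is an ASSUMED second argument to min(); the original
    formula's second term is missing from the source.
    """
    # Emulate the shuffle: sort by key so equal keys are adjacent.
    sorted_pairs = sorted(mapped_pairs, key=lambda kv: kv[0])
    num_keys = sum(1 for _key, _group in groupby(sorted_pairs, key=lambda kv: kv[0]))
    # Assumption: never recommend fewer than one reducer.
    return max(1, min(num_keys, max_reducers))

sampled_output = [("a", 1), ("b", 1), ("a", 1), ("c", 1)]
recommended = prejob_reducer(sampled_output, max_reducers=10)
```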
Code Snippet
Experiments
Drawbacks
• Too general: the reducer count depends only on the
number of keys and ignores how values are distributed across them.
– Example: 10,000 keys with 2 values each and 2
keys with 20,000 values each
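The example above can be worked through numerically. In this sketch, round-robin assignment by key index stands in for hash partitioning, and the choice of 10 reducers is an assumption made only for illustration.

```python
def reducer_loads(key_sizes, num_reducers):
    """Values per reducer when all values of a key go to one reducer."""
    loads = [0] * num_reducers
    for key_id, size in enumerate(key_sizes):
        # Stand-in for hash partitioning: key index modulo reducer count.
        loads[key_id % num_reducers] += size
    return loads

# 10,000 keys with 2 values each, plus 2 keys with 20,000 values each.
key_sizes = [2] * 10_000 + [20_000] * 2
loads = reducer_loads(key_sizes, num_reducers=10)

ideal = sum(loads) / len(loads)   # perfectly even share per reducer
imbalance = max(loads) / ideal    # how far the busiest reducer is from ideal
```

Here the total is 60,000 values, so the even share is 6,000 per reducer, yet the two reducers that receive a heavy key handle 22,000 values each while the other eight handle 2,000. Adding reducers does not help: a heavy key's 20,000 values still land on a single reducer.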
Future Work
• Dynamic hashing approach
– Partition tuning after X% of the map job is done.
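The slides give no details on how partition tuning would work. One possible reading, sketched here purely as an assumption, is to take the per-key counts observed after X% of the map phase and assign keys to reducers greedily, largest key first, always to the currently least-loaded reducer. The name `tune_partitions` is hypothetical.

```python
import heapq

def tune_partitions(key_counts, num_reducers):
    """Greedy key-to-reducer assignment from observed per-key value counts.

    ASSUMPTION: this largest-first heuristic is an illustration only;
    the source does not specify the tuning algorithm.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Largest keys first; ties broken by key name for determinism.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, reducer = heapq.heappop(heap)       # least-loaded reducer
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return assignment

observed = {"hot1": 20_000, "hot2": 20_000, "a": 2, "b": 2, "c": 2}
assignment = tune_partitions(observed, num_reducers=3)
```

Keys that first appear after the tuning point would still need a fallback rule, such as the default hash partitioner.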