HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment
IEEE 2009
Sangwon Seo (KAIST), Ingook Jang, Kyungchang Woo, Inkyo Kim, Jin-Soo Kim, Seungryoul Maeng

Presented 2013.04.25, Advanced Topics in File Processing
Presenter: Taehoon Kim
Contents
1. Introduction
2. Related Work
3. Design
4. Implementation
5. Evaluations
6. Conclusion
Introduction
- It is difficult to operate Internet services at scale
  - Enormous volumes of data: such services generate large amounts of data that must be processed every day
- To solve this problem, the MapReduce programming model is used
  - Supports distributed and parallel processing for large-scale data-intensive applications
  - e.g., data mining, scientific simulation
Introduction
- Hadoop
  - Based on MapReduce
  - Since Hadoop is a distributed system, its file system is called HDFS (Hadoop Distributed File System)
- An HDFS cluster consists of:
  - A single NameNode: the master server that manages the namespace of the file system and regulates clients' access to files
  - A number of DataNodes: each manages the storage directly attached to it
- Placement policy
  - Each of the three replicas is placed on a node in the local rack
  - Advantage: improves write performance by cutting down inter-rack write traffic
Introduction

[Figure: MapReduce dataflow across two nodes (Node1, Node2). Files loaded from HDFS are divided into splits; an InputFormat with RecordReaders (RR) feeds each split into a map task, followed by a Combiner and a Partitioner; the "shuffling" process then moves the intermediate key-value pairs over the network to the sort and reduce stages, and the OutputFormat writes the results back to the local HDFS store.]

- It is essential to reduce the shuffling overhead to improve the overall performance of the MapReduce computation
- The network bandwidth between nodes is also an important factor in the shuffling overhead
Introduction
- Hadoop's basic principle: moving computation is better
  - It is better to migrate the computation closer to the data
  - Used when the size of the data set is huge
  - Migrating the computation minimizes network congestion and increases the overall throughput¹ of the system

1) Throughput: the amount of data processed within a given time
Introduction
- HOD (Hadoop On Demand, developed by Yahoo!)
  - A management system for provisioning virtual Hadoop clusters over a large physical cluster
  - All physical nodes are shared by more than one Yahoo! engineer
  - Increases the utilization of physical resources
- When the computing resources are shared by multiple users, Hadoop's "moving computation" policy is not effective
  - Because the resources (e.g., compute, network, and other hardware resources) are shared
Introduction
- To solve this problem, two optimization schemes are proposed:
  - Prefetching
    - Intra-block prefetching
    - Inter-block prefetching
  - Pre-shuffling
Related Work
- J. Dean and S. Ghemawat
  - Traditional prefetching techniques
- V. Padmanabhan and J. Mogul; T. Kroeger and D. Long; P. Cao, E. Felten et al.
  - Prefetching methods to reduce I/O latency
Related Work
- Zaharia et al.
  - LATE (Longest Approximate Time to End)
  - Works more efficiently in the shared environment
- Dryad (Microsoft)
  - A job can be expressed as a directed acyclic graph
- The degree of data locality is highly related to MapReduce performance
Design (Prefetching Scheme)
- Intra-block prefetching
  - A simple prefetching technique that prefetches data within a single block while performing a complex computation
  - Bi-directional processing

[Fig. 1. Intra-block prefetching in the map phase: computation proceeds on the assigned input split while prefetching is in progress in parallel.]
[Fig. 2. Intra-block prefetching in the reduce phase: computation proceeds while the expected data for the reduce task are prefetched.]
Design (Prefetching Scheme)
- While a complex job is performed on one side, the soon-to-be-required data are prefetched and assigned in parallel to the corresponding task
- Advantages of intra-block prefetching:
  1. Uses the concept of a processing bar that monitors the current status of each side and raises a signal if synchronization is about to be broken
  2. Tries to find the appropriate prefetching rate at which performance is maximized while the prefetching overhead is minimized
- The network overhead can thereby be minimized
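The producer/consumer structure of intra-block prefetching can be illustrated with a minimal Python sketch. This is not the paper's implementation: a bounded queue stands in for the "processing bar", and blocking on a full or empty queue plays the role of the synchronization signal between the prefetching side and the computing side.

```python
import threading
import queue

def intra_block_prefetch(block, compute, buffer_size=64):
    """Prefetch records within one block while computing in parallel.

    The bounded queue acts as a simplified processing bar: if the
    consumer (compute) catches up with the producer (prefetch),
    q.get() blocks, which models the "synchronization about to be
    broken" signal from the slide.
    """
    q = queue.Queue(maxsize=buffer_size)
    DONE = object()  # sentinel marking the end of the block

    def prefetcher():
        for record in block:   # simulates reading ahead from the block
            q.put(record)      # blocks when the prefetch buffer is full
        q.put(DONE)

    threading.Thread(target=prefetcher, daemon=True).start()

    results = []
    while True:
        record = q.get()       # blocks if prefetching falls behind
        if record is DONE:
            break
        results.append(compute(record))
    return results

# Toy usage: "compute" stands in for the map function.
out = intra_block_prefetch(range(10), compute=lambda r: r * r)
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Tuning `buffer_size` corresponds to choosing the prefetching rate: too small and computation stalls waiting for data, too large and prefetching wastes memory and bandwidth.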
Design (Prefetching Scheme)
- Inter-block prefetching
  - Runs at the block level, prefetching the expected block replica to a local rack
  - A2, A3, and A4 prefetch the required blocks

[Figure: three nodes (n1, n2, n3), each holding blocks, with distances D=1, D=5, and D=8 to the required replicas (D = distance).]
Design(Prefetching Scheme)
Inter-block prefetching
runs in block level, by prefetching the expected block replica4) to a
local rack
4)replica : 복제본
• A2, A3, A4 is prefetching the required blocks
14 /27
Design (Prefetching Scheme)
- Inter-block prefetching algorithm:
  1. Assign the map task to the node that is nearest to the required blocks
  2. The predictor generates the list of data blocks, B, to be prefetched for the target task t
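The two steps above can be sketched as follows. This is a simplified model, not the paper's code: node and block names are illustrative, and "distance" collapses the rack topology into the hop counts shown in the figure.

```python
def assign_and_prefetch(required_blocks, distance):
    """Sketch of inter-block prefetching scheduling.

    required_blocks: block ids needed by the target task t
    distance[node][block]: hop distance from that node to the nearest
                           replica of the block (the D values in the figure)
    """
    # Step 1: assign the map task to the node nearest to the blocks.
    best = min(distance,
               key=lambda n: sum(distance[n][b] for b in required_blocks))
    # Step 2: the predictor emits the list B of blocks whose nearest
    # replica is not already local (distance > 1); these should be
    # prefetched to the chosen node's local rack.
    B = [b for b in required_blocks if distance[best][b] > 1]
    return best, B

# Illustrative topology: two candidate nodes, three required blocks.
dist = {
    "n1": {"A2": 1, "A3": 5, "A4": 8},
    "n2": {"A2": 4, "A3": 1, "A4": 6},
}
print(assign_and_prefetch(["A2", "A3", "A4"], dist))
# ('n2', ['A2', 'A4'])
```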
Design (Pre-Shuffling Scheme)
- Pre-shuffling process
  - The pre-shuffling module in the task scheduler looks over the input splits (candidate data) in the map phase and predicts which reducer the key-value pairs will be partitioned into
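Why the prediction is possible can be seen from how Hadoop's default partitioner works: the reducer index depends only on the key and the number of reducers, so it can be computed by scanning the input split before the map task runs. A minimal sketch, using a deterministic string hash as a stand-in for Java's `String.hashCode`:

```python
def stable_hash(key):
    """Deterministic string hash, a stand-in for Java's String.hashCode,
    masked non-negative the way Hadoop's HashPartitioner does."""
    h = 0
    for ch in str(key):
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h

def predict_reducer(key, num_reducers):
    """The reducer index depends only on the key, so it can be
    computed in advance, before the map task actually runs."""
    return stable_hash(key) % num_reducers

def pre_shuffle_plan(candidate_keys, num_reducers):
    """Group candidate keys by predicted reducer; the scheduler can
    then try to run map tasks near the reducers their output feeds."""
    plan = {}
    for k in candidate_keys:
        plan.setdefault(predict_reducer(k, num_reducers), []).append(k)
    return plan

print(pre_shuffle_plan(["1949", "1950", "1951"], 2))
# {1: ['1949', '1950'], 0: ['1951']}
```

With such a plan, the key-value pairs can be steered toward their target reducer early, which is what reduces the shuffling traffic.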
Design (Optimization)
- LATE (Longest Approximate Time to End) algorithm
  - Robustly performs speculative execution to maximize performance in a heterogeneous environment
  - Does not consider data locality, which could accelerate the MapReduce computation further
- D-LATE (Data-aware LATE) algorithm
  - Almost the same as LATE, except that a task is assigned as near as possible to the location where the needed data are present
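The difference between LATE and D-LATE can be made concrete with a small sketch. All field and node names here are illustrative, and the progress-rate heuristic is the standard LATE estimate, not the paper's exact formula.

```python
def d_late_pick(tasks, now, node_distance):
    """Pick the task to speculate on and where to run the copy.

    tasks: dicts with 'id', 'progress' (0..1), 'start' time, and
           'blocks' (the data blocks the task reads).
    node_distance: {node: {block: hop distance}}.
    """
    def time_to_end(t):
        # LATE heuristic: remaining work divided by observed rate
        rate = t["progress"] / max(now - t["start"], 1e-9)
        return (1.0 - t["progress"]) / max(rate, 1e-9)

    # LATE: speculate on the task with the longest estimated time to end.
    straggler = max(tasks, key=time_to_end)
    # D-LATE addition: run the speculative copy on the node nearest
    # to the straggler's input data.
    node = min(node_distance,
               key=lambda n: sum(node_distance[n][b]
                                 for b in straggler["blocks"]))
    return straggler["id"], node

tasks = [{"id": "t1", "progress": 0.9, "start": 0, "blocks": ["b1"]},
         {"id": "t2", "progress": 0.2, "start": 0, "blocks": ["b2"]}]
dist = {"n1": {"b1": 1, "b2": 5}, "n2": {"b1": 5, "b2": 1}}
print(d_late_pick(tasks, now=10, node_distance=dist))
# ('t2', 'n2')
```

Plain LATE would stop after choosing the straggler; the final `min` over `node_distance` is the data-aware placement step.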
Implementation (Optimized Scheduler)
- Predictor module
  - Not only finds stragglers, but also predicts the candidate data blocks and the reducers into which the key-value pairs will be partitioned
- D-LATE
  - Based on these predictions, the optimized scheduler performs the D-LATE algorithm
Implementation (Optimized Scheduler)
- Prefetcher
  - Monitors the status of worker threads and manages prefetching synchronization with the processing bar
- Load balancer
  - Checks the logs (which include disk usage per node and current network traffic per data block)
  - Is invoked to maintain load balancing based on disk usage and network traffic
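The load balancer's check can be sketched as a simple threshold scan over the per-node statistics taken from the logs. The thresholds and the statistics format are illustrative assumptions; the slides do not give concrete values.

```python
def overloaded_nodes(node_stats, disk_limit=0.85, net_limit=0.90):
    """Flag nodes whose disk usage or network traffic exceeds a limit.

    node_stats: {node: {"disk": fraction of disk used,
                        "net":  fraction of link capacity in use}}
    Returns the nodes that should shed prefetching load.
    Thresholds are illustrative, not from the paper.
    """
    return sorted(n for n, s in node_stats.items()
                  if s["disk"] > disk_limit or s["net"] > net_limit)

stats = {"n1": {"disk": 0.50, "net": 0.20},
         "n2": {"disk": 0.92, "net": 0.10},
         "n3": {"disk": 0.40, "net": 0.95}}
print(overloaded_nodes(stats))
# ['n2', 'n3']
```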
Evaluation
- Testbed: Yahoo! Grid, which consists of 1670 nodes
  - Two dual-core 2.0 GHz AMD processors and 4 GB main memory per node
  - 400 GB ATA hard disk drives
  - Gigabit Ethernet network interface card
  - The nodes are divided into 40 racks connected with L3 routers
- All tests are configured so that HDFS maintains four replicas of each data block, whose size is 128 MB
- Three types of workload: wordcount, search log aggregator, similarity calculator
Evaluation
- Fig. 7: HPMR shows significantly better performance than native Hadoop for all test sets
- Fig. 8: test set #1 has the smallest ratio of the number of nodes to the number of map tasks; test set #5 benefits the most, due to a significant reduction in shuffling overhead
Evaluation
- The prefetching latency is affected by disk overhead or network congestion
  - A long prefetching latency therefore indicates that the corresponding node is heavily loaded
- Prefetching rate increases beyond 60%
Evaluation
This means that HPMR assures
consistent performance even in the
shared environment such as
Yahoo!Grid where the available
bandwidth fluctuates severely.
4Kbps ~ 128Kbps
23 /27
Conclusion
- Two innovative schemes
  - The prefetching scheme exploits data locality
  - The pre-shuffling scheme reduces the network overhead required to shuffle key-value pairs
- HPMR is implemented as a plug-in type component for Hadoop
- HPMR improves the overall performance by up to 73% compared to native Hadoop
- As a next step, the authors plan to evaluate more complicated workloads such as HAMA (an open-source Apache incubator project)
Appendix: MapReduce Example
- MapReduce example: analyzing a weather data set
  - Each record is stored as one line, in ASCII
  - Within a file, each field is stored at a fixed width, with no delimiters
  - Example record: 0057332130999991950010103004+51317+028783FM12+017199999V0203201N00721004501CN0100001N9-01281-01391102681
- Query: from the NCDC data files written between 1901 and 2001, find the highest temperature (F) for each year

Pipeline:
- Input: data files in 64 MB chunks
- 1st Map: extract <offset, record> pairs from each file
- 2nd Map: extract <year, temperature> from each record
- Shuffle: organize the data into per-year groups
- Reduce: merge and return the final result
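The pipeline above can be sketched in plain Python. This is a toy model: the NCDC fixed-width record format is replaced by an illustrative "year,temp" string, and the offsets come from the first-map stage as on the slides.

```python
from collections import defaultdict

def map_phase(records):
    """2nd Map: extract <year, temp> from each (offset, record) pair.
    Record parsing is simplified; the real NCDC format is fixed-width."""
    for offset, rec in records:
        year, temp = rec.split(",")
        yield year, int(temp)

def shuffle(pairs):
    """Shuffle: group temperatures by year."""
    groups = defaultdict(list)
    for year, temp in pairs:
        groups[year].append(temp)
    return groups

def reduce_phase(groups):
    """Reduce: the highest temperature per year."""
    return {year: max(temps) for year, temps in groups.items()}

records = [(0, "1950,0"), (106, "1950,22"), (212, "1950,-11"),
           (318, "1949,111"), (424, "1949,78")]
print(reduce_phase(shuffle(map_phase(records))))
# {'1950': 22, '1949': 111}
```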
Appendix: MapReduce Example

1st Map: extract <Offset, Record> pairs from the file
- <Key_1, Value> = <offset, record>

  <0,   0067011990999991950051507004...9999999N9+00001+99999999999...>
  <106, 0043011990999991950051512004...9999999N9+00221+99999999999...>
  <212, 0043011990999991950051518004...9999999N9-00111+99999999999...>
  <318, 0043012650999991949032412004...0500001N9+01111+99999999999...>
  <424, 0043012650999991949032418004...0500001N9+00781+99999999999...>
  ...

2nd Map: extract <Year, Temp> from each record
- <Key_2, Value> = <year, temp>

  <1950, 0>
  <1950, 22>
  <1950, −11>
  <1949, 111>
  <1949, 78>
  ...
Appendix: MapReduce Example

Shuffle
- Because the 2nd Map produces too many outputs, they are reorganized into per-year data groups
- This reduces the processing cost when merging in the Reduce step

  2nd Map output:      Shuffle output:
  <1950, 0>            <1949, [111, 78]>
  <1950, 22>           <1950, [0, 22, −11]>
  <1950, −11>
  <1949, 111>
  <1949, 78>

Reduce: merge the candidate sets from all Maps and return the final result

  Mapper_1: (1950, [0, 22, −11]), (1949, [111, 78])
  Mapper_2: (1950, [25, 15]),     (1949, [30, 45])
  Reducer:  (1950, [0, 22, −11, 25, 15]) → (1950, 25)
            (1949, [111, 78, 30, 45])    → (1949, 111)
Appendix: Hadoop: The Definitive Guide, pp. 19-20

[Four figures from the book, omitted.]