Transcript of Talk Slides

Online Balancing of
Range-Partitioned Data
with Applications to P2P Systems
Prasanna Ganesan
Mayank Bawa
Hector Garcia-Molina
Stanford University
1
Motivation

Parallel databases use range partitioning
[Figure: key range 0–100 partitioned across nodes at boundaries 20, 35, 60, 80]
Advantages: Inter-query parallelism
– Data locality → low-cost range queries → high throughput
2
The Problem

How to achieve load balance?
– Partition boundaries have to change over time
– Cost: Data Movement

Goal: Guarantee load balance at low cost
– Assumption: Load balance beneficial !!

Contribution
– Online balancing -- self-tuning system
– Slows down updates by small constant factor
3
Roadmap

Model and Definitions
Load Balancing Operations
The Algorithms
Extension to P2P Setting
Experimental Results
4
Model and Definitions (1)

Nodes maintain range partition (on a key)
– Load of a node = # tuples in its partition
– Load imbalance σ = largest load / smallest load

Arbitrary sequence of tuple inserts and deletes
– Queries not relevant
– Automatically directed to relevant node
5
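The definitions above can be made concrete with a minimal sketch (illustrative names; the boundary values are borrowed from the motivation slide's example, and loads are hypothetical):

```python
import bisect

# boundaries[i] is the lower key bound of node i's range;
# node i owns [boundaries[i], boundaries[i+1]).
boundaries = [0, 20, 35, 60, 80]
loads = [120, 80, 200, 50, 100]   # load of a node = # tuples in its partition

def node_for_key(key):
    """Route an insert/delete/query to the node whose range contains key."""
    return bisect.bisect_right(boundaries, key) - 1

def imbalance(loads):
    """sigma = largest load / smallest load."""
    return max(loads) / min(loads)
```

With these loads, σ = 200/50 = 4; routing key 35 lands on the node owning [35, 60).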
Model and Definitions (2)

After each insert/delete:
– Potentially fix “imbalance” by modifying partitioning
– Cost= # tuples moved

Assume no inserts/deletes during balancing
– Non-critical simplification

Goal: σ < constant always
– Constant amortized cost per insert/delete
– Implication: Faster queries, slower updates
6
Load Balancing Operations (1)

NbrAdjust: Transfer data between “neighbors”
[Example: A:[0,50), B:[50,100) → after NbrAdjust, A:[0,35), B:[35,100)]
7
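A rough sketch of what NbrAdjust does to the stored tuples (assumed representation: each partition is a sorted list of keys; `nbr_adjust` is an illustrative name, not the paper's API):

```python
import bisect

def nbr_adjust(left, right, new_boundary):
    """Shift the boundary between two neighboring partitions.

    left/right are sorted key lists; returns the number of tuples
    moved, which is exactly the cost charged by the model.
    """
    i = bisect.bisect_left(left, new_boundary)
    to_right = left[i:]           # keys now belonging to the right neighbor
    del left[i:]
    j = bisect.bisect_left(right, new_boundary)
    to_left = right[:j]           # keys now belonging to the left neighbor
    del right[:j]
    right[:0] = to_right
    left.extend(to_left)
    return len(to_right) + len(to_left)
```

Moving the A/B boundary from 50 down to 35, as on the slide, costs exactly the number of tuples with keys in [35, 50).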
Is NbrAdjust good enough?

Can be highly inefficient
– Ω(n) amortized cost per insert/delete (n = #nodes)
[Figure: worst-case chain of nodes A–F, where NbrAdjust must cascade data through every node]
8
Load Balancing Operations (2)

Reorder: Hand over data to neighbor and split load of some other node
[Example: starting from A:[0,10) B:[10,20) C:[20,30) D:[30,40) E:[40,50) F:[50,60), node F hands its range to E (E becomes [40,60)) and splits A, leaving A:[0,5) and F:[5,10)]
9
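The Reorder operation can be sketched as follows (assumed layout: a dict from node name to its sorted key list; the donor's data is absorbed by a neighbor, and the donor re-enters to split the heaviest node):

```python
def reorder(parts, donor, nbr, heavy):
    """Donor hands its data to a neighbor, then splits the heavy node.

    Returns the number of tuples moved (the cost of the operation).
    """
    moved = len(parts[donor])
    parts[nbr] = sorted(parts[nbr] + parts[donor])  # neighbor absorbs donor's range
    half = len(parts[heavy]) // 2
    parts[donor] = parts[heavy][half:]              # donor takes the upper half
    parts[heavy] = parts[heavy][:half]
    return moved + len(parts[donor])
```

This mirrors the slide's example: F gives [50,60) to E, then re-enters to split A.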
Roadmap
✓ Model and Definitions
✓ Load Balancing Operations
The Algorithms
Extension to P2P Setting
Experimental Results
10
The Doubling Algorithm

Geometrically divide loads into levels
– Level i ⇔ load in (2^i, 2^(i+1)]
– Will try balancing on level change

Two Invariants
– Neighbors tightly balanced: max 1 level apart
– All nodes within 3 levels: guarantees σ ≤ 8

[Diagram: load scale marked 1, 2, 4, 8 with level 0 = (1,2], level 1 = (2,4], level 2 = (4,8], and level i = (2^i, 2^(i+1)] in general]
11
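The level scheme and the two invariants can be written down directly (a sketch; `level` assumes loads ≥ 1, and the function names are illustrative):

```python
def level(load):
    """Level i holds loads in (2**i, 2**(i+1)]."""
    return max(0, (load - 1).bit_length() - 1)

def invariants_hold(loads):
    """Doubling invariants: neighbors at most 1 level apart,
    and all nodes within 3 levels of each other."""
    lv = [level(x) for x in loads]
    neighbors_ok = all(abs(a - b) <= 1 for a, b in zip(lv, lv[1:]))
    spread_ok = max(lv) - min(lv) <= 2
    return neighbors_ok and spread_ok
```

When the spread stays within 3 levels, the extreme loads differ by a factor below 2^3, which is where the σ ≤ 8 guarantee comes from.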
The Doubling Algorithm (2)
[Animation, three slides: an example run on nodes A–F whose loads drift across level boundaries, triggering balancing with neighbors]
12–14
The Doubling Algorithm: Case 2

Search for a blue node
– If none, do nothing!
[Animation, two slides: nodes A–F; the lightly loaded (“blue”) node E hands off its data and reorders itself to split the overloaded node, giving order A, B, E, C, D, F]
15–16
The Doubling Algorithm (3)

Similar operations when load goes down a level
– Try balancing with neighbor
– Otherwise, find a red node and reorder yourself

Costs and Guarantees
– σ ≤ 8
– Constant amortized cost per insert/delete
17
From Doubling to Fibbing

Change thresholds to Fibonacci numbers
– σ ≤ 3  4.2
– Can also use other geometric sequences
– Costs are still constant
F(i+2) = F(i+1) + F(i)
18
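Swapping the thresholds for Fibonacci numbers only changes the level function (a sketch with illustrative indexing: level i holds loads in (F(i), F(i+1)], with the thresholds starting 1, 2, 3, 5, 8, …):

```python
def fib_level(load):
    """Level under Fibonacci thresholds: level i holds loads in (F(i), F(i+1)]."""
    a, b, i = 1, 2, 0
    while load > b:
        a, b = b, a + b    # F(i+2) = F(i+1) + F(i)
        i += 1
    return i
```

Consecutive thresholds now differ by roughly the golden ratio φ ≈ 1.62 instead of 2, which is why the imbalance guarantee tightens from 8 to about 4.2.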
More Generalizations

Improve σ to (1+ε) for any ε > 0 [BG04]
– Generalize neighbors to c-neighbors
– Still constant cost, O(1/ε)

Dealing with concurrent inserts/deletes
– Allow multiple balancing actions in parallel
– Paper claims it is ok
19
Application to P2P Systems

Goal: Construct P2P system supporting efficient range queries
– Provide asymptotic performance à la DHTs

What is a P2P system? A parallel DB with
– Nodes joining and leaving at will
– No centralized components
– Limited communication primitives

Enhance load-balancing algorithms to
– Allow dynamic node joins/leaves
– Decentralize implementation
20
Experiments

Goal: Study cost of balancing for different workloads
– Compare to periodic re-balancing algorithms (Paper)
– Trade-off between cost and imbalance ratio (Paper)


Results presented on Fibbing Algorithm (n=256)
Three-phase Workload
– (1) Inserts (2) Alternating inserts and deletes (3) Deletes

Workload 1: Zipf
– Random draws from Zipf-like distribution

Workload 2: HotSpot
– Think key=timestamp

Workload 3: ShearStress
– Insert at most-loaded, delete from least-loaded
21
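The ShearStress adversary is simple enough to state in two lines (a sketch over per-node load counts; one step inserts a tuple at the currently most-loaded node and deletes one from the least-loaded):

```python
def shear_stress_step(loads):
    """One step of the ShearStress workload: stress the balancer maximally."""
    loads[loads.index(max(loads))] += 1   # insert at the most-loaded node
    loads[loads.index(min(loads))] -= 1   # delete from the least-loaded node
```

Run without balancing, repeated steps drive σ toward infinity, which is what makes this the worst-case workload in the experiments.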
Load Imbalance (Zipf)
[Plot: load imbalance over time (×1000 operations, 0–3000), y-axis 0–4.5, annotated with growing, steady, and shrinking phases]
22
Load Imbalance (ShearStress)
[Plot: load imbalance over time (×1000 operations, 0–3000), y-axis 0–4.5, annotated with growing, steady, and shrinking phases]
23
Cost of Load Balancing
[Plot: cumulative cost (×1000 tuples moved, 0–6000) over time (×1000 operations, 0–3000), annotated with growing, steady, and shrinking phases]
24
Related Work

Karger & Ruhl [SPAA 04]
– Dynamic model, weaker guarantees

Load balancing in DBs
– Partitioning static relations, e.g., [GD92, RZML02, SMR00]
– Migrating fragments across disks, e.g., [SWZ93]
– Intra-node data structures, e.g., [LKOTM00]

Litwin et al.: SDDS (scalable distributed data structures)
25
Conclusions

Indeed possible to maintain well-balanced range partitions
– Range partitions competitive with hashing

Generalize to more complex load functions
– Allow tuples to have dynamic weights
– Change load definition in algorithms!*
– Range partitioning is powerful

Enables P2P system supporting range queries
– Generalizes DHTs with same asymptotic guarantees
*Lots of caveats apply. Need load to be evenly divisible. No guarantees offered on costs. This offer not valid with any other offers. Etc, etc. etc.
26