Transcript Slides

CoPhy: A Scalable, Portable, and
Interactive Index Advisor
for Large Workloads
Debabrata Dash, Anastasia Ailamaki, Neoklis Polyzotis
1
High Cost of DB Tuning
• Enterprises spend a lot on DBMS
DBMS Cost
DBA Task Breakdown
1000 $/TB/yr
20
15
Installation
Backup
Other
Tuning & Admin
10
5
0
Disk
HW/SW
DBA
Sources: MS Azure, Forrester Research; 2010
oracle.com/us/products/database/039433.pdf
Need to reduce administration and tuning cost
2
A New Approach to Index Tuning
Index Tuning: Select indexes that maximize performance
Existing Approaches
CoPhy
Portability
No
Yes
Scalability
Sampling/pruning
Yes
Generality
No
Yes
Quality Feedback
Not with constraints
Yes
Interactivity
No
Yes
CoPhy: Convert to a compact Binary Integer Program (BIP).
Solve using mature solvers.
− BIP: Minimize f ( x) s.t. g ( x)  0 where f and g are linear, x {0,1}n
Outline
• Introduction
• BIP formulation
–
–
–
–
Existing formulation
Discovering structure
Exploiting the structure
Benefits
• Experimental Results
• Conclusion
4
Index Tuning Problem
T1 Join T2
Workload
Candidates
T1
T2
I1 I2
Index
Tuning
I3 I4
Constraints
Optimal
Indexes
? ?
Existing Approaches
Index
Advisor
Fast What-If
Optimizer
[INUM07,C-PQO08]
• Greedy approaches
• Bottom-up [CN97, VZ+00]
• Top-down [BC05]
• BIP-based approach [CS96, PA07]
6
What-If
DBMS
Optimizer
Existing BIP
T1
t1
T2
t2
I1 I2
I3 I4
x 1x 2
x3 x4
{0,1}
A1 , A2 , ... are atomic configurations
C1 , C2 , ... are corresponding costs
{0,1}
minimize 1C1   2C2  ...
Min. Cost
Sucht that :
1   2   1 Select one atomic conf.
Index presence
11t11,
1 
t1 t2 1, t2  1
...
Program size = O(# T1 Indexes x # T2 Indexes)
7
CoPhy vs. Existing Approaches
Index
Advisor
Fast What-If
Optimizer
What-If
DBMS
Optimizer
[INUM07,C-PQO08]
CoPhy
Index
Advisor
Fast What-If
Optimizer
8
What-If
DBMS
Optimizer
Fast What-If: INUM
Plan
What-If
Optimizer
T1 Join T2
Template Plan
Place Holder
Place Holder
I1 I3
I1 I4
Instantiated
Plan
I1
Instantiated
Plan
I1
I3
I4
A template plan can be reused for many index combinations
9
Cost Structure
Cost of template plan under A


i[1, n ], a  A[ i ]
 ia
Atomic
Configuration
A
Cost of optimal plan under A
min{k 

i[1,n ],a  A[i ]
 kia }
Linear Composability of Query Costs
Linear Composability is exhibited by both INUM, C-PQO
10
Exploiting Linear Composability
minimize   t1 11  x1 12  x2 13
 t2 21  x3 22  x4 23
t1
T1
T2
t2
I1 I2
I3 I4
x 1x 2
x3 x4
Such that :
t1  x1  x2  1
t2  x3  x4  1
Program size = O(# T1 Indexes + # T2 Indexes)
BIP Solver explores the index combinations
with the knowledge of the objective
Exposing the cost model leads to linearly growing BIPs
11
More Complex BIPs
We extend the BIP to handle:
• Complex queries
• Update costs
• Complex Constraints [Bruno08]:
–
–
–
–
–
Storage constraint
Index constraints
Column constraints
Generators
Soft constraints
BIP formulation does not restrict the expressive power of the DBA
12
CoPhy’s Architecture
Workload
Candidate
Generator
BIP
Generator
BIP Solver
INUM
What-If
DBMS
Optimizer
Constraints
CoPhy
Selected
Bounds
Indexes
Theorem: CoPhy computes an optimal index configuration
13
Unique Features Enabled by the BIP
• Portability: No change to the optimizer
– Requires only the what-if APIs
• Scalability: By solving large BIPs in seconds
– No need to select workload, candidate indexes
• Generality: The formulation can be reused
• Quality feedback: All modern BIP solvers provide this
– Can stop at near-optimal values
• Interactive tuning: By solving BIPs incrementally
– Interactively add/drop candidate indexes
– Enables efficient multi-objective optimization
14
Outline
• Introduction
• BIP formulation
–
–
–
–
Existing formulation
Discovering structure
Exploiting the structure
Benefits
• Experimental Results
• Conclusion
15
Experimental Setup
•
•
•
•
Two commercial DBMS -- SystemA, SystemB
1 GB TPC-H database
1 GB index size constraint
Algorithms:
– ToolA, ToolB – the commercial designers
– ILP – The state of the art BIP [PA07]
– CoPhyA, CoPhyB – Our approach on the systems
• Queries generated using 15 TPC-H templates
• Metric:
Cost with Suggested Indexes
Savings  1 
Original Cost
16
Better
Speedup Comparison
ToolA
ToolB
CoPhyA
94
100
97
94
CophyB
97
94
97
% Savings
80
61
60
40
35
61
32
61
29
20
0
250
500
1000
Workload Size on SystemA
250
500
1000
Workload size on SystemB
# of queries
SystemA
# of queries
SystemB
# Candidates: CoPhy ~2000, ToolA ~200, ToolB ~50
• Replacing heuristic algorithms improves savings
• Using larger set of candidates also helps
17
Tool Execution Time Comparison
ToolA
CoPhyA
Execution time (minutes)
1000
ToolB
419
ILP
CoPhy
CophyB
66.1
100
39.9
Better
23.1
11.8
10
6.2
6.1
8.3
4.8
3.2
2
2.2
1.2
1
8.3
4.8
2
1
250
500
1000
Workload on SystemA
# of queries
SystemA
250
500
1000
250
500
1000
Workload on SystemB
Workload on SystemA
# of queries
SystemB
# of queries
SystemA
Scalable index tuning eliminates the workload selection problem
18
Conclusion
• Index tuning using a novel compact BIP
–
–
–
–
Generic, scalable, efficient, and high quality
Quality feedback
Incremental index selection
Multi-objective optimization
• Future Work:
– Incorporating other workload types
– Applying the approach to other tuning problems
19
Backup Sildes
20
BIP for Multiple Plans
C1  x11 111  x12 112  x13 113  x14 114  25 y1
x11  x12  y1
x13  x14  y1
y1
25
Matching logic
C1  x23 211  x24 212  x23 213  x14 214  25 y2
x23  x24  y2
T1
T2
I1 I2
I3 I4
x25  x26  y2
y1  y2  ...  1
One plan per query
Minimize C1  C2
y2
15
Minimize cost
x23x24
21
x25x26
More Complex BIPs
• Storage constraint
x11  x1
Build indexes when used
.....
s x  S
i i
Size under a fixed constant
22
CoPhy vs. FLP
2399
Better
Tool execution time
(seconds)
2500
2000
Solving
Build Time
1379
1500
1000
INUM
710
500
499
293
123
0
FLP
CoPhy
250
FLP
CoPhy
500
# of queries
FLP
CoPhy
1000
Offloading the search process to the solver improves both the
problem construction and solving times
23