Transcript slides

dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Query Task Model (QTM):
Modeling Query Execution with
Tasks
Steffen Zeuch and Johann-Christoph Freytag
1
Motivation
✤
✤
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Different DBMS execute the same QEP using different schedules
✤
Run-time execution not query optimization
✤
No uniform scheduling format
✤
Query execution in different DBMS are not comparable
Major differences between DBMS:
✤
Chunk Size: Size of operator’s input
Execution
model vs.comparable
run-time schedulerto
HowScheduling
to makeStrategy:
different
schedules
explain
why one schedule performs better than another
?
✤
2
Outline
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
1.Parallel Query Execution
2.QTM: Query Task Model
3.Evaluation
4.Outlook
3
dbis
Chunk Size
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Tupleat-a-time
Selection
t1
t2
t3
t4
t5
t6
Columnat-a-time
Bufferat-a-time
t1
t1,t2,t3
t4, t5, t6
t1,t2, t3
Chunk Size
DBMS
1 Tuple
System R, MySQL,
(PostgreSQL)
“Fit into Cache”
Monet X100, DB2 with
BLU
Fix number of tuples
Hyper
Fix Block Size
C-Store
Column
MonetDB MIL
4
dbis
Scheduling Strategie
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Hash
Probe (R)
Hash
Probe (S)
Hash
Build
Hash
Build
Selection
R
S
T
5
Volcano Execution Model
(Open-Next-Close Iterator)
Tuple
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Hash
Probe (R)
Next
Hash
Probe (S)
Tuple
Hash
Build
Hash
Build
Next
Selection
Next
Tuple
R
S
T
6
dbis
(Run-time) Scheduler
Hash
Probe (R)
Hash
Probe (S)
Selection
T
Time
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Spatial
Locality
Temporal
Locality
Prob_R(t2)
Prob_R(t2)
Prob_R(t1)
Prob_S(t2)
Prob_S(t2)
Sel(t2)
Prob_S(t1)
Prob_R(t1)
Sel(t2)
Prob_S(t1)
Sel(t1)
Sel(t1)
Further Optimiziation Criteria:
I/O, NUMA or Memory Usage
7
dbis
Dynamic Load Balancing
⋈
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
T2
T1
⋈
T3
T4
T5
σ
σ
R
S
T
CPU1
CPU2
T1
T2
T3
T4
T5
8
dbis
Chunk Size
DBMS Landscape
Column-at-a
time
Buffer-at-a
time
Tuple-at-a
time
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
MonetDB MIL
MonetDB X100
DB2
DB2 BLU
PostgreSQL
StagedDB
Hyper
System R
MySQL
PostgreSQL
Volcano
Execution
Model
(Run-time)
Scheduler
SAP HANA
Dynamic
Load
Balancing
Scheduling Strategy
9
Outline
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
1.Parallel Query Execution
2.QTM: Query Task Model
3.Evaluation
4.Outlook
10
QTM: Query Task Model


dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Idea: A model that describes parallel query execution with
tasks

QEP: Queue of tasks

Task: Encapsulate a piece of work on some data
Goals:

Open a design space for DBMS schedules

Make main aspects of query scheduling comparable:

Execution order, degree of parallelism and thread
coordination, and partitioning
11
dbis
Query Task Model
Work
Data
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Task Queue
T1
T2
T3
Processing
Strategies
Table
t1
t2
t3
Data Queue
t1
t2
t3
12
QTM Transformation: Input
QEP
Hardware Architecture
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Table Format
13
QTM Transformation
QEP
Choosing
Hash Join
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Max. Pipelines
+
Dependency Graph 14
QTM: Task Configuration
Max. Pipelines
+
Dependency Graph
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Task Configurations
(Task Blueprints)
15
dbis
QTM: Tasks
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Instantiation
Task Configuration
(Task Blueprints)
Set of Tasks
(TC Instantiation)
16
QTM: Implementation
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Compile-time
Run-time
17
Outline
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
1.Parallel Query Execution
2.QTM: Query Task Model
3.Evaluation
4.Outlook
18
Evaluation: Scenario
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Schedule
Workload
Tuples per
Relation
30M
Selection
< 25M
S1 Values
0,1,2 …
S2 Values
0,2,4,…
S3 Values
0,4,8,…
19
dbis
Evaluation: Configuration
Schedule
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Buffer
Size
1
Tasks per
Op
30M
Total Tasks
1
30M
150M
1
30M
150M
4
7.5M
22.5M
5) Buf – L1
2,048
14,649
43,947
6) Buf – L2
16,384
1,832
5,496
7) Buf – L3
491,520
62
186
8) Op - Mat
7.5M
4
20
9) Op - Seq
7.5M
4
20
1) Tup –
Pipe
2) Tup –
Mat
3) Tup –
Seq
4) Buf - CL
90M
20
Evaluation: Runtimes
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
21
Evaluation: Sampling
Data-related Misses
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Instruction-related Misses
22
Evaluation: Miss Distribution dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
23
Evaluation: Scalability
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
24
Evaluation: Insights
✤
✤
✤
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Tradeoff between data and instruction cache performance
✤
✤
dbis
I NSTITUT FÜR I NFORMATIK
Sweet spot: Largest private cache size vs. slightly larger buffer
Medium sized tasks are data-efficient:
✤
Pros: Buffer fits entirely into cache, high data locality
✤
Cons: High number of tasks and instructions
Large tasks are instruction-efficient:
✤
Pros: Decrease number of instructions and tasks, high instruction
locality
✤
Cons: More data cache misses if cache size is exceeded
QTM: Cache-performance can be adjusted by buffer size
25
Outline
dbis
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
1.Parallel Query Execution
2.QTM: Query Task Model
3.Evaluation
4.Outlook
26
dbis
Outlook
✤
✤
I NSTITUT FÜR I NFORMATIK
HUMBOLDT− UNIVERSITÄT ZU BERLIN
Contributions:
✤
QTM: A model for parallel query execution using tasks
✤
Open a design space for DBMS schedules
✤
Make different schedules present in different DBMS
comparable
Future Work:
Thanks!
✤
Cost Model
✤
Transformation process for an arbitrary QEP
27