Two opposite forces that are interconnected
and interdependent in the natural world
The Yin and Yang of Processing Data
Warehousing Queries on GPUs
Yuan Yuan, Rubao Lee, Xiaodong Zhang
The Ohio State University
7/7/2015
GPUs: Powerful and Programmable
After 10 years of R&D, GPUs have evolved from dedicated
graphics processors into high-performance, general-purpose
computing devices
[Figure: GPU performance (GFLOPS), 2002 to 2012]
GPUs In High Performance Computing
No. 1 Titan:
261,632 NVIDIA Cores
GPUs: massively parallel computing units
DW workloads: rich data parallelism
A decade of research efforts in the database community
[SIGMOD03] [VLDB04] [SIGMOD04] [SIGMOD06] [SIGMOD08]
[SIGMOD10] [VLDB10] [VLDB11] [VLDB12]
GPUs in DW Production Systems?
None!
Why have such general-purpose GPUs not been adopted
for critical query processing?
• Query Characteristics
• Software Techniques
• Hardware Advancement
Query Processing on GPUs
[Diagram: data sits in host memory next to a multi-core CPU. It is copied over PCIe into GPU device memory (PCIe data transfer, the Yin) and then processed on the GPU device (kernel execution, the Yang).]
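The two phases in the diagram can be put into a back-of-the-envelope cost model. The sketch below is not from the slides: the function name, the column sizes, the bandwidth figure and the kernel time are all hypothetical, and it assumes that transfer and kernel execution run one after the other with no overlap.

```python
def query_time_ms(bytes_to_transfer, pcie_gb_per_s, kernel_ms):
    """Return (transfer_ms, total_ms, transfer_share) for a non-overlapped query.

    Assumes time = PCIe transfer + kernel execution, with 1 GB = 1e9 bytes.
    """
    transfer_ms = bytes_to_transfer / (pcie_gb_per_s * 1e9) * 1e3
    total_ms = transfer_ms + kernel_ms
    return transfer_ms, total_ms, transfer_ms / total_ms


# Hypothetical inputs: four 4-byte columns of 60 million rows,
# an effective PCIe rate of 6 GB/s, and 40 ms of kernel time.
transfer_ms, total_ms, share = query_time_ms(4 * 4 * 60_000_000, 6.0, 40.0)
print(f"transfer={transfer_ms:.0f} ms total={total_ms:.0f} ms share={share:.0%}")
```

With these made-up inputs the transfer phase accounts for most of the total, which is the kind of imbalance the talk goes on to measure.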
Experimental Environment
• Hardware
– CPU: Intel Core i7-3770K
– GPU: NVIDIA GTX 480, 580 and 680
• GPU Query Engine Prototype
– Automatic translator from SQL to highly optimized CUDA programs (based on YSmart)
– Column store
• Workload
– Star Schema Benchmark
Unbalanced Yin and Yang of SSBM
• Most queries are dominated by PCIe data transfer
• Kernel execution time varies greatly
[Figure: execution time (ms) of SSBM queries 1.1 to 4.3, split into Transfer and Kernel]
• Understand “Where does time go?”
• How do different query characteristics
affect query performance?
• How does GPU hardware advancement
affect the performance?
• How do software optimizations affect
the performance?
GPU Hardware Parameters
                    GTX 480    GTX 580    GTX 680
Year                2010       2011       2012
Architecture        Fermi      Fermi      Kepler
# of Cores          480        512        1536
GFLOPS              1345       1581.1     3090.4
Memory BW (GB/s)    177.4      192.4      192.3
PCIe                2.0        2.0        3.0
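A quick way to read the hardware parameters listed above is to compare how much each figure grew between the oldest and the newest card. The sketch below only does that arithmetic; the dictionary layout and the `growth` helper are illustrative names, not part of the authors' prototype.

```python
# Values copied from the hardware parameters listed above.
specs = {
    "GTX 480": {"gflops": 1345.0, "mem_bw": 177.4},
    "GTX 580": {"gflops": 1581.1, "mem_bw": 192.4},
    "GTX 680": {"gflops": 3090.4, "mem_bw": 192.3},
}


def growth(metric, old="GTX 480", new="GTX 680"):
    """Ratio of one metric between two cards."""
    return specs[new][metric] / specs[old][metric]


compute_growth = growth("gflops")
bandwidth_growth = growth("mem_bw")
print(f"GFLOPS: x{compute_growth:.2f}, memory bandwidth: x{bandwidth_growth:.2f}")
```

Peak compute more than doubles across the three cards while device memory bandwidth grows by under 10%, so a workload limited by memory bandwidth sees little of the extra compute.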
Limited Performance improvement by GPU Arch
[Figure: bar chart comparing GTX 480, GTX 580 and GTX 680 on SSBM queries 1.1 to 4.3; y-axis from 0 to 1.2]
More Performance Improvement by PCIe Bandwidth
[Figure: bar chart comparing PCIe 2.0 and PCIe 3.0 on SSBM queries 1.1 to 4.3; y-axis from 0 to 2]
Performance Prediction
Limited benefits from near-future GPU advancement
[Figure: predicted execution time (ms) of SSBM queries 1.1 to 4.3 for three configurations: Base (today's GPU), Kernel (x2) and Transfer (x2); chart annotations: "10%-15% improvement" and "30%-35% improvement"]
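The prediction can be reproduced in spirit with a simple two-term model. The sketch below assumes total time is transfer plus kernel with no overlap, and the 140 ms and 60 ms inputs are hypothetical values chosen for illustration, not measurements from the slides.

```python
def improvement(transfer_ms, kernel_ms, transfer_speedup=1.0, kernel_speedup=1.0):
    """Fractional reduction in total time when one or both phases get faster.

    Assumes total = transfer + kernel with no overlap between the two.
    """
    base = transfer_ms + kernel_ms
    new = transfer_ms / transfer_speedup + kernel_ms / kernel_speedup
    return (base - new) / base


# Hypothetical query: 140 ms of PCIe transfer and 60 ms of kernel time.
kernel_x2 = improvement(140.0, 60.0, kernel_speedup=2.0)
transfer_x2 = improvement(140.0, 60.0, transfer_speedup=2.0)
print(f"kernel x2: {kernel_x2:.0%}, transfer x2: {transfer_x2:.0%}")
```

Doubling the speed of one phase can never remove more than half of that phase's share of the total, which is why neither change alone transforms a query split across both.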
Software Optimization
• Techniques
– Data Compression
– Invisible Join
– Transfer Overlapping
The Impact of Compression on SSBM
Effective: high selectivity and fewer projected columns
Reduced PCIe transfer time
[Figure: execution time (ms) of SSBM queries 1.1 to 4.3, comparing baseTransfer, baseKernel, compressTransfer and compressKernel; chart annotations: "Ineffective: high selectivity, and with projected dim columns" and "Ineffective: extreme low selectivity"]
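The slide does not say which compression scheme the prototype uses. As one possible illustration of why compression shrinks the PCIe transfer, the sketch below applies run-length encoding to a sorted column; the choice of run-length encoding, the function names and the sample data are all assumptions.

```python
def rle_encode(column):
    """Run-length encode a list into [value, run_length] pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


# Hypothetical sorted column of 4-byte integers.
column = [1992] * 5 + [1993] * 3 + [1994] * 4
runs = rle_encode(column)
raw_bytes = 4 * len(column)   # one 4-byte value per row
rle_bytes = 8 * len(runs)     # 4-byte value plus 4-byte length per run
print(runs, raw_bytes, rle_bytes)
```

The saving depends entirely on run lengths: a column with few repeated neighbours would grow under this encoding rather than shrink.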
The Impact of Invisible Join on SSBM
Effective: more projected columns from dimension table, with high selectivity
No effect on PCIe transfer
[Figure: execution time (ms) of SSBM queries 1.1 to 4.3, comparing baseTransfer, baseKernel, inviTransfer and inviKernel; chart annotations: "Effective: high selectivity, projected dim columns" and "Ineffective: extreme low selectivity"]
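For readers unfamiliar with the technique, the general idea of an invisible join on a column store can be sketched in three phases: filter each dimension table to a set of keys, probe only the fact table's foreign-key columns, then fetch the remaining columns at the surviving positions. The code below is a simplified CPU-side illustration with hypothetical table and column names, not the authors' GPU implementation.

```python
# Toy column-store tables; all names and values are hypothetical.
date_dim = {"d_key": [1, 2, 3], "d_year": [1992, 1993, 1993]}
supp_dim = {"s_key": [10, 20], "s_region": ["ASIA", "EUROPE"]}
fact = {
    "f_datekey": [1, 2, 3, 2, 1],
    "f_suppkey": [10, 10, 20, 20, 10],
    "f_revenue": [100, 200, 300, 400, 500],
}


def matching_keys(dim, key_col, pred_col, predicate):
    """Phase 1: keys of dimension rows that satisfy the predicate."""
    return {k for k, v in zip(dim[key_col], dim[pred_col]) if predicate(v)}


date_keys = matching_keys(date_dim, "d_key", "d_year", lambda y: y == 1993)
supp_keys = matching_keys(supp_dim, "s_key", "s_region", lambda r: r == "ASIA")

# Phase 2: scan only the foreign-key columns and keep qualifying positions.
positions = [
    i
    for i, (d, s) in enumerate(zip(fact["f_datekey"], fact["f_suppkey"]))
    if d in date_keys and s in supp_keys
]

# Phase 3: fetch the measure column only at those positions.
revenue = sum(fact["f_revenue"][i] for i in positions)
print(positions, revenue)
```

The same columns are still read from the fact table, which is consistent with the slide's note that the technique leaves PCIe transfer unchanged; the saving is in the work done per row.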
The Impact of Transfer Overlapping on SSBM
Effective: low selectivity, more projected columns from the fact table
[Figure: execution time (ms) of SSBM queries 1.1 to 4.3, comparing base and Overlapping; chart annotations: "Ineffective: high selectivity" and "Effective: extreme low selectivity"]
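The slides do not describe how the prototype overlaps transfer with computation. A rough way to see the potential gain is an idealized two-stage pipeline in which the data is split into equal chunks and each chunk is processed while the next one is being transferred. The model and its numbers below are assumptions for illustration only.

```python
def serial_ms(transfer_ms, kernel_ms):
    """Transfer everything first, then run the kernel."""
    return transfer_ms + kernel_ms


def overlapped_ms(transfer_ms, kernel_ms, chunks):
    """Idealized two-stage pipeline over equally sized chunks.

    Chunk i is processed while chunk i+1 is transferred, so only the first
    transfer and the last kernel step are not hidden by the slower stage.
    """
    t = transfer_ms / chunks
    k = kernel_ms / chunks
    return t + (chunks - 1) * max(t, k) + k


# Hypothetical query: 120 ms of transfer and 80 ms of kernel time.
base = serial_ms(120.0, 80.0)
piped = overlapped_ms(120.0, 80.0, chunks=4)
print(base, piped)
```

In this model the total can never drop below the longer of the two phases, so overlapping helps most when transfer and kernel times are comparable and least when one of them is negligible.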
So, why are GPUs not adopted in DWs?
1. Complicated and subtle choices of query
optimization techniques.
2. Limited usage of GPU hardware resources, and
little likely benefit from GPU advancement, due
to unbalanced Yin and Yang.
3. Lack of efficient system software support for
memory management and task concurrency.
Take Home Messages
• Data
– Schema design should take into account GPU features
and avoid data alignment issues.
• Software
– Compression, invisible join and transfer overlapping are
the most effective techniques for GPU query processing,
but they favor different kinds of queries.
• Hardware
– Query performance is bounded by PCIe bandwidth and
GPU device memory bandwidth, and gains only limited
benefits from the advancement of GPU hardware.
Thank you!
Questions?