Repairing Write Performance on Flash Devices

Radu Stoica‡, Manos Athanassoulis‡, Ryan Johnson‡§, Anastasia Ailamaki‡
‡École Polytechnique Fédérale de Lausanne   §Carnegie Mellon

Tape is Dead, Disk is Tape, Flash is Disk*

- Slowly replacing HDDs (price ↓, capacity ↑)
- Fast, reliable, efficient
- Potentially huge impact
- Slow random writes
- Read/write asymmetry
  -> not an HDD drop-in replacement

*Jim Gray, CIDR 2007

DBMS I/O today

[Diagram: the DBMS issues requests with its data requirements and an
HDD-optimized I/O pattern; these pass through the block-device API to the
flash device, which wants a flash-optimized I/O pattern for its flash-memory
accesses.]

Inadequate device abstraction: flash devices are not HDD drop-in replacements.

Random Writes – Fusion ioDrive

Microbenchmark – 8 KiB random writes

[Plot: throughput (MiB/s, 0-350) over 20 hours, shown as 1-second averages
and a moving average; an inset zooms into seconds 80,000-80,200, where
throughput drops below 20 MiB/s.]

94% performance drop, and unpredictable throughput.

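For concreteness, a minimal sketch of such a microbenchmark, assuming Linux;
the device path, single thread, and use of Python with O_DIRECT are our own
illustrative choices, not the authors' harness:

    # Sketch of the 8 KiB random-write microbenchmark (illustrative only).
    # /dev/fioa is a hypothetical device node.
    import mmap, os, random, time

    IO_SIZE  = 8 * 1024                  # 8 KiB writes, as on the slide
    DEV_SIZE = 160 * 10**9               # 160 GB device
    fd = os.open("/dev/fioa", os.O_RDWR | os.O_DIRECT)

    buf = mmap.mmap(-1, IO_SIZE)         # page-aligned buffer (O_DIRECT needs it)
    buf.write(os.urandom(IO_SIZE))

    done, last = 0, time.monotonic()
    while True:
        # One 8 KiB write at a uniformly random, block-aligned offset.
        off = random.randrange(DEV_SIZE // IO_SIZE) * IO_SIZE
        os.pwrite(fd, buf, off)
        done += IO_SIZE
        now = time.monotonic()
        if now - last >= 1.0:            # report a 1-second throughput average
            print(f"{done / (now - last) / 2**20:.1f} MiB/s")
            done, last = 0, now
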
Stabilizing Random Writes

- Change data placement:
  - flash-friendly I/O pattern
  - avoid all random writes
- Minimal changes to the database engine
- 6-9x speedup for OLTP-like access patterns

Overview

- Random writes: how big of a problem?
- Random writes: why still a problem?
- Append-Pack data placement
- Experimental results

Related work

[Diagram: the same I/O stack, annotated with where prior work plugs in:
flash-optimized DB algorithms inside the DBMS, flash file systems at the
block-device API, and data placement / FTLs inside the flash device.]

No solution for OLTP workloads.

Random Write – Other devices

Vendor-advertised performance:

[Bar chart: random-read vs. random-write kIOPS (log scale, 0.1-1000) for the
Intel X25-E, Memoright GT, Solidware, and Fusion ioDrive; random writes are
far slower than random reads on every device.]

[Plot, from uFlip*: Mtron SSD response time (ms, log scale) over ~13,000
I/Os, in the sequence seq. reads, random writes, seq. reads; the random
writes introduce long pauses.]

Random writes cause unpredictability.

*Graph from uFlip, Bouganim et al., CIDR 2009

Random Writes – Fusion ioDrive

Microbenchmark – 8 KiB random writes

[Plot repeated from earlier: throughput (MiB/s, 0-350) over 20 hours,
1-second averages and a moving average.]

Sequential Writes – Fusion ioDrive

Microbenchmark – 128 KiB sequential writes

[Plot: throughput (MiB/s, 0-350) over 1,200 seconds, 1-second averages and a
moving average; the curve stays flat.]

Sequential writing: good and stable performance.

Idea – Change Data Placement

- Flash-friendly I/O pattern:
  - avoid all random writes
  - write in big chunks
- Tradeoffs – additional work:
  - give up sequential reads (SR and RR have similar performance)
  - more sequential writing
  - other overheads

Overview

- Random writes: how big of a problem?
- Random writes: why still a problem?
- Append-Pack data placement
  - theoretical model
- Experimental results

Append-Pack Algorithm

[Diagram: pages are never updated in place. Each update invalidates the old
copy and appends the new version sequentially at the log end (the hot
dataset). When there is no more space, space is reclaimed from the log start:
invalid pages are discarded, still-valid pages are written forward
sequentially, and cold pages are filtered out and written separately as a
cold dataset.]

How much additional work does this cost? (See the sketch below.)

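A minimal sketch of the idea, assuming a flat array of physical page slots;
the class, its names, and the whole-log compaction are our own illustration
(cold-page filtering is omitted), not the authors' implementation:

    # Illustrative append-pack placement (a sketch, not the authors' code).
    class AppendPack:
        def __init__(self, capacity):
            self.capacity = capacity   # physical pages on the device
            self.log = []              # append-only log; None marks invalid pages
            self.where = {}            # logical page id -> position in the log

        def update(self, page):
            # No in-place updates: invalidate the old copy, append the new one.
            pos = self.where.pop(page, None)
            if pos is not None:
                self.log[pos] = None
            if len(self.log) == self.capacity:   # no more space: reclaim
                self._reclaim()
            self.where[page] = len(self.log)
            self.log.append(page)                # one sequential write

        def _reclaim(self):
            # Reclaim from the log start: drop invalid pages, copy valid pages
            # forward in order (each survivor costs one read + one seq. write).
            self.log = [p for p in self.log if p is not None]
            self.where = {p: i for i, p in enumerate(self.log)}
            assert len(self.log) < self.capacity, "no invalid pages to reclaim"

The fraction of pages still valid at reclaim time is exactly the prob(valid)
analyzed on the next slide.
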
Theoretical Page Reclaiming Overhead

- Pages are updated uniformly (equal probability of replacing any page)
- How many pages are still valid when space is reclaimed?

    α = sizeof(disk) / sizeof(hotset)
    prob(valid) = f(α) → e^(−α)

[Plot: prob(valid) in percent (0-40) against α = 1..7, decaying
exponentially.]

Worst case: 36%. Easily achievable: 6-11%.

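A quick numeric check of the model above; the values follow directly from
e^(−α):

    # prob(valid) = e^(-alpha): the fraction of reclaimed pages that must be
    # copied forward, as a function of over-provisioning alpha.
    import math

    for alpha in range(1, 8):
        print(f"alpha = {alpha}: prob(valid) = {math.exp(-alpha):.1%}")
    # alpha = 1 gives ~36.8% (the worst case above); alpha ~ 2.2-2.8 gives
    # the "easily achievable" 6-11% range.
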
Theoretical Speedup

- Traditional random-write I/O latency: TRW
- New latency: TSW + prob(valid)∙(TRR + TSW)
- Conservative assumption: TRW = 10∙TSW

[Plot: speedup (0-8) vs. α = sizeof(device) / sizeof(data), one curve per
write fraction (RW = 50%, 25%, 10%, 5%, 1%, 0%); speedup grows with α and
with the write fraction.]

Up to 7x speedup.

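The model is easy to evaluate; a sketch, assuming TRR ≈ TSW (our own
simplifying choice, motivated by the talk's point that random reads perform
like sequential reads), so these numbers will not match the slide's curves
exactly:

    # Speedup of append-pack over in-place random writes, per the slide's
    # model. T_SW is the unit of time; T_RW = 10*T_SW is the slide's
    # conservative assumption; T_RR = T_SW is our own assumption.
    import math

    T_SW, T_RW, T_RR = 1.0, 10.0, 1.0

    def speedup(write_frac, alpha):
        new_write = T_SW + math.exp(-alpha) * (T_RR + T_SW)  # append + relocation
        old = (1 - write_frac) * T_RR + write_frac * T_RW
        new = (1 - write_frac) * T_RR + write_frac * new_write
        return old / new

    for rw in (0.50, 0.25, 0.10):
        print(f"RW = {rw:.0%}, alpha = 2: speedup = {speedup(rw, 2):.1f}x")
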
Overview

- RW: how big of a problem?
- RW: why still a problem?
- Append-Pack data layout
- Experimental results

Experimental setup

- 4x quad-core Opteron
- x86_64 Linux, kernel 2.6.18
- Fusion ioDrive, 160 GB, PCIe
- 8 KiB I/Os, direct I/O
- ≥ 16 parallel threads
- Firmware runs on the host
- Append-Pack implemented as a shim library (sketched below)

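To make the "shim library" idea concrete, a hedged sketch of how such a layer
could sit between the engine and the device, reusing the AppendPack sketch
from earlier; the interface is hypothetical, not the authors' library:

    # Hypothetical shim mapping logical page I/O onto the append-pack log.
    import os

    PAGE = 8 * 1024                      # 8 KiB I/Os, as in the setup

    class AppendPackShim:
        def __init__(self, fd, placement):
            self.fd = fd                 # device fd, opened with O_DIRECT
            self.placement = placement   # AppendPack instance (earlier sketch)

        def write_page(self, page_id, buf):
            self.placement.update(page_id)           # choose a log-end slot
            slot = self.placement.where[page_id]
            os.pwrite(self.fd, buf, slot * PAGE)     # sequential write

        def read_page(self, page_id):
            slot = self.placement.where[page_id]     # random read: cheap on flash
            return os.pread(self.fd, PAGE, slot * PAGE)

(Page relocation during reclaiming would issue the corresponding device
copies; that bookkeeping is omitted here.)
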
OLTP microbenchmark

Microbenchmark – 50% random writes / 50% random reads

[Plots: throughput (MiB/s, 0-500) over ~4,000 s, 1-second averages and a
moving average; baseline on the left (with an unexplained dip: the FTL?),
Append-Pack on the right.]

9x improvement.

OLTP Microbenchmark Overview

[Bar chart (α = 2): throughput (MiB/s, 0-600) for 50/50, 75/25, and 90/10
random read/write mixes, comparing initial, predicted, and actual.]

Performance is better than predicted.

What to remember

- Flash ≠ HDD
- We leverage:
  - sequential writing, to avoid random writing
  - random reading, which is as good as sequential reading
- Append-Pack eliminates random writes
  - 6-9x speedup

Thank you!
http://dias.epfl.ch
Backup
FTLs

- Fully-associative sector translation [Lee et al. '07]
- Superblock FTL [Kang et al. '06]
- Locality-Aware Sector Translation [Lee et al. '08]

No solution for all workloads:
- static tradeoffs and workload independence
- lack of semantic knowledge
- wrong I/O patterns
-> complicated software layers destroy predictability

Other Flash Devices - Backup

Vendor-advertised performance:

Device           RR (IOPS)   RW (IOPS)            SW (MB/s)   SR (MB/s)
Intel X25-E      35,000      3,300                170         250
Memoright GT     10,000      500                  130         120
Solidware        10,000      1,000                110         110
Fusion ioDrive   116,046     93,199 (75/25 mix)   750         670

Experimental Results - Backup

RR/RW    Baseline    Append-Pack   Speedup   Predicted
50/50    38 MiB/s    349 MiB/s     9.1x      6.2x
75/25    48 MiB/s    397 MiB/s     8.3x      4.3x
90/10    131 MiB/s   541 MiB/s     4.1x      2.5x

(α = 2 in all experiments)

OLTP microbenchmark - Backup

[Plots: 50% RW / 50% RR throughput traces, before and after.]

OLTP Microbenchmark - Backup

[Plot: throughput trace with traditional I/O.]

OLTP Microbenchmark - Backup

[Plot: throughput trace with Append-Pack.]