Freakish Database Performance With Flash Storage
[email protected]
763.228.6463
IBM Internal Use Only
© 2011 IBM Corporation
Agenda
• Share some experience with using solid state/flash storage for database workloads:
  • OLTP (2TB)
  • Warehouse (76TB)
• Which workload characteristics can best leverage flash storage?
• What are some best practices?
OLTP Workload
The Challenge
• Goal: 1.5M trans in 5.5 hours
• Stretch goal: 1.5M trans in 2 hours
• Improve backup time
• What is possible if CPU, memory, storage, and network are not constrained?
Initial profile
• A brokerage house package
• Batch cycle comprised of five Java programs (only one can be parallelized)
• 1.5M transactions in 8 hours after extensive application and SQL tuning
• 1.68TB uncompressed
• Online backup time (backup, then gzip): 36 hours
The Setup
No holds barred
• 2 x 64 cores, 3.86 GHz, 1TB RAM
• 86TB HDD, 256GB cache (2 ms average response time)
• 1TB SSD
• 10GbE
Approaches
• Enabled compression
• No database tuning
• All-HDD
• Mixed: SSD (logs & temp), HDD (data & indexes)
• All-SSD
Accept all default database settings out of the box
• STMM
• Auto runstats
• Auto online table reorg
Results (Mixed: SSD for logs & temp, HDD for data & indexes vs. All-SSD)
• Disk utilization: 14% / 26% (Mixed SSD/HDD tiers), < 1% busy (All-SSD)
• Average IOPS: 20
• Throughput: 450KB/s
• Application engines: 30
• Uncompressed offline backup: 30 – 40 min
• Compressed online/offline backup (SSD to HDD): 18 min
Application Engines Performance
• More application engines generating transactions to reduce batch elapsed time
• Most improvements resulted from more CPUs for the application
  • CPU intensive
• Verbose application logging
  • Application logs generated more IOs than the database!
• Low database IO profile
Final Results
• Goal: 1.5M trans in 5.5 hours ✔
• Stretch goal: 1.5M trans in 2 hours ✔
• Improve backup time: 18 minutes vs. 36 hours ✔
• Best result: 1.5M trans in 1.1 hours (All-SSD) ✔
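The backup improvement above works out to a large constant factor; a quick sanity check of the arithmetic, using only the times stated on the slide:

```python
# Backup time improvement: 36 hours (backup, then gzip) down to 18 minutes.
old_backup_min = 36 * 60   # 2160 minutes
new_backup_min = 18
speedup = old_backup_min / new_backup_min
print(speedup)  # 120.0, i.e. the backup is 120x faster
```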
Warehouse Workload
The Challenge
• Aging servers and storage
• Data center floor space, cooling, and power consumption constraints
• Same or better performance
Initial profile
• Servers and storage running 100% all day long
• Maxed out at around 30 – 40 active users
• Half-stroked disks to get performance and throughput
The Setup
Approach
• Replacement will be very fast, very small, very simple
Database IO Improvement for Warehouse Workload
76TB IBM SSD vs. Old HDD
Sub-millisecond IO response time, sustained
• Synchronous reads: 21.8x
• Synchronous writes: 13.6x
• Asynchronous reads: 17.6x
• Asynchronous writes: 18.34x
• Data pages per asynchronous request: 1.8x
Note: Asynchronous IOs are ~18x faster, and each asynchronous request is ~2x more effective due to the 32K page size, for a combined ~36x improvement.
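The combined improvement in the note above is simply the product of the two measured factors. Using the unrounded numbers from this slide:

```python
# Measured improvement factors (SSD vs. old HDD) from the slide.
async_write_speedup = 18.34   # asynchronous writes are ~18x faster
pages_per_request = 1.8       # each async request moves ~2x the data (32K pages)

# Effective asynchronous throughput gain is the product of the two factors.
combined = async_write_speedup * pages_per_request
print(round(combined, 1))  # 33.0; the slide rounds ~18x * ~2x up to ~36x
```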
Benchmark Queries Improvement for Warehouse Workload
76TB IBM SSD vs. Old HDD
Benchmark details
• Actual IO- and CPU-intensive queries captured from business users
• Runs weekly to monitor any performance degradation with respect to new and organic growth in the warehouse over time
• Noise queries (75) + benchmark queries (25) = 100

                          All-SSD                  Old
Noise queries completed   85%                      32%
BM queries completed      100% (first time ever)   64% (historically never reached 100%)
CPU utilization           30%                      100%
Benchmark Queries Speed-Up Factor for Warehouse Workload (Plotted on Logarithmic Scale)
76TB IBM SSD vs. Old HDD
Speed-up details
• Average: 2.21 (log) or 163.96x faster
• Median: 1.48 (log) or 29.96x faster (50% of queries are at least ~30x faster)
• Low: 0.56 (log) or 3.59x faster
• High: 3.05 (log) or 1,113.56x faster
• Time is measured as elapsed time (prepare + execute + fetch)
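The (log) figures above are base-10 logarithms of the raw speed-up factors. A small helper to convert between the two (round-tripping the rounded log values lands close to, but not exactly on, the slide's factors, which were computed from unrounded data):

```python
import math

def log_to_factor(log10_speedup):
    """Convert a base-10 log speed-up back to a raw factor."""
    return 10 ** log10_speedup

def factor_to_log(speedup):
    """Convert a raw speed-up factor to its base-10 log."""
    return math.log10(speedup)

print(round(log_to_factor(1.48), 1))     # 30.2 (slide: 29.96x)
print(round(factor_to_log(1113.56), 2))  # 3.05
```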
CPU Utilization
• About 30% busy
• Note: we are also using disk-level encryption (self-encrypting drives, SED)
EXP30 Ultra SSD IO Specifications
• Each drive: SFF (1.8”), 1/5 of 1U, 387GB
• IO drawer: 30 drives (6 x 5). Total raw capacity: 11.6TB (30 x 387GB). Cache: 3.1GB
• IOPS: 400K (100% read) / 280K (70/30 R/W) / 165K (100% write)
• Two POWER 740 servers connected to one IO drawer
• PCIe attached via GX++ adapter (8Gb/s)
• Configured as 5+P LUNs (130GB each)
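A quick sanity check of the drawer capacity figure above:

```python
drives_per_drawer = 30     # 6 x 5 layout
drive_capacity_gb = 387    # per 1.8" SFF drive
raw_gb = drives_per_drawer * drive_capacity_gb
print(raw_gb)              # 11610 GB, i.e. ~11.6TB raw per drawer
```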
Deployment Considerations
• IO adapter card (HBA)
  • At 120K – 400K IOPS per IO drawer and a 32K IO size, it is possible to saturate the HBA
  • Plan for an adequate number of HBAs
  • If using SAN, be sure the bandwidth to the storage server is consistent along the whole path, for example, 8Gb/s
  • Balance IOs across HBAs and front-end ports for even utilization
  • Be cautious about mixing flash storage and HDD drives on one HBA
• Fewer, larger LUNs (500GB – 700GB)
  • LUNs do take up available system memory and CPU cycles on the server
  • Multiple logical volumes per LUN; no reason to stripe an LV across LUNs
• Use a large page size (32K) and extent size, but ensure that the database bufferpool(s) are adequately sized to accept big reads
• Optimize data movement with fewer IOPS; it is not about driving up IOPS
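A rough illustration of the saturation point mentioned above. This back-of-the-envelope sketch ignores link encoding and protocol overhead, so treat the result as a lower bound on the number of 8Gb/s paths needed:

```python
import math

iops = 120_000                   # low end of the drawer's IOPS range
io_size_bytes = 32 * 1024        # 32K IO size
link_bytes_per_sec = 8e9 / 8     # one 8Gb/s path, ~1 GB/s raw

required_bytes_per_sec = iops * io_size_bytes             # ~3.9 GB/s
paths_needed = math.ceil(required_bytes_per_sec / link_bytes_per_sec)
print(paths_needed)  # 4 paths, even at the low end of the IOPS range
```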
Candidate Application Considerations
• High IO profile
  • Indexes, data
  • Database logs and temp spaces can already take advantage of cache write-through, so they may not be the best candidates
• Applications that can parallelize well to take advantage of higher IO throughput
  • Before we can process more transactions per second, the applications need to be able to generate more transactions per second
  • For example, we needed to increase the number of application engines from 3 to 30 in order to generate 8x throughput in transaction rate
• Applications that spend more time fetching result sets across a network, rather than executing complex queries in the database, will likely see less improvement (slow consumers)
  • client_idle_wait_time (ms): time spent waiting for the client/application to send its next request
  • If the database spends more time waiting for the client/application to send work, then improving database response time alone will not improve throughput
    • Increase application parallelism
    • Look for network congestion issues
  • call monreport.dbsummary(600) and examine client_idle_wait_time
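A minimal sketch of the slow-consumer check described above. The inputs would come from the database monitor (e.g. the client_idle_wait_time element surfaced by the monreport.dbsummary call mentioned on the slide); the 50% threshold is an illustrative assumption, not a product recommendation:

```python
def is_client_bound(client_idle_wait_ms, interval_ms, threshold=0.5):
    """Flag a workload whose database mostly waits on the application.

    client_idle_wait_ms: time the database spent waiting for the client
    to send its next request (the client_idle_wait_time monitor element).
    threshold: illustrative cutoff; 0.5 means "idle more than half the time".
    """
    return client_idle_wait_ms / interval_ms > threshold

# Example: over a 600-second monitoring interval the database sat idle
# waiting on clients for 450 seconds. Faster storage alone won't help;
# look at application parallelism and the network instead.
print(is_client_bound(450_000, 600_000))  # True
```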
Why Consider Flash Storage
• Greatly beneficial for high-IO workloads
• Much smaller footprint, much more energy efficient
  • Servers (11), IO drawers (7), and power supply all fit in one rack!
• Achieve high performance and throughput quickly, without tuning
• Performance, reliability, price