Transcript Slide 0

RAC be Nimble, RAC be Quick

Bert Scalzo, Domain Expert, Oracle Solutions [email protected]

Copyright © 2006 Quest Software

About the Author …

Domain Expert & Product Architect for Quest Software

• • Oracle Background: Worked with Oracle databases for over two decades (starting with version 4) Work history includes time at both “Oracle Education” and “Oracle Consulting” • • • • Academic Background: Several Oracle Masters certifications BS, MS and PhD in Computer Science MBA (general business) Several insurance industry designations • • • • • Key Interests: Data Modeling Database Benchmarking Database Tuning & Optimization "Star Schema" Data Warehouses Oracle on Linux – and specifically: RAC on Linux • • • • Articles for: Oracle’s Technology Network (OTN) Oracle Magazine, Oracle Informant PC Week (eWeek) • • • • Articles for: Dell Power Solutions Magazine The Linux Journal www.linux.com

www.orafaq.com

1

Books by Bert …

Coming in 2009 …

Out Now … 2 nd Edition Coming Soon

• • •

Agenda

RAC Challenges

“RAC in the Box”

 syndrome – Must Optimize every subsystem

RAC Optimization Approach

– Optimization Approach Universe – Why a True Top-Down Approach – Top-Down Process & Tools (

but NOT a sales pitch

)

Real-world Scenario

– Dell’s use of Quest’s Approach & Tools for Oracle RAC –

“Best Practices”

applied incrementally & results 3

Oracle RAC is Great, but …

• Too often people expect RAC to “auto-magically” function out of the box with little to no optimization • During RAC optimization attempts, people far too often concentrate on just a single dimension – typically the Oracle database “stuff” (i.e. ratios, waits, parms, etc…) • • During RAC optimization attempts, Oracle is often too heavily weighted as the primary source and reason for most, if not all, of the performance bottleneck issues Not enough true “application nature” is identified and accounted for during the overall optimization process •

Result: Too many people achieve sub-par results!

4

• • • • • •

I call it “RAC in the BOX” syndrome

 That’s just my stupid name for it (hope it catches) But nearly half the RAC sites I visit are suffering from RAC performance issues related to this!

Still far too RAC experts among the general DBA population (although improving each & every day) Really no simple, single button tools yet to make RAC “auto-magically” “fire on all cylinders” Many people too often and readily bail on RAC, and fall back to “big SMP boxes” (the evil that they know)

But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free

 5

RAC a is System, Must Tune its Entirety

6

RAC Performance = Sum of its Parts

• • • • • • •

Application Nature (affects everything else below) Public Network Storage Network Storage Sub-System Oracle Instance Configuration Oracle Cluster Configuration Private Network (i.e. Interconnect)

Traditional Focus 7

Optimization Approaches (i.e. Focus)

Top-Down

Pro-Active

Bottom-Up Side-Ways (Ad-Hoc)

Re-Active

8

Optimization Approach Techniques/Tools

Top-Down

DBA_, V$ and X$ Aggregate Information

•Oracle OEM Diagnostics & Tuning Packs •Confio Ignite for Oracle •BMC DBXray for Oracle •Quest Spotlight for Oracle & Spotlight for RAC Bottom-Up Trace/Instrumentation Side-Ways (Ad-Hoc) V$, STATS$, DBA_HIST_

Response Time ~= Wait Events

•Cary Millsap (Hotsos) calls this

“Method R”

•Anjo Kolk & et al Oracle) call this

“YAPP Method”

•Kyle Haily (PerfVision) paper on

“Waits Defined”

9

True Top-Down Optimization Focus

• • • •

“Low Hanging Fruit”

Obvious yet Overlooked Subtle yet Highly Critical “Dumb Question” Method Radically Different than other RAC tuning sessions:

Not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations,

Just easy stuff that makes a big difference

 10

Why Top Down Works (now don’t laugh…) Drive Dallas to Austin:

Initial route:

•45 south to Houston, 610 west to 290, and then 290 west to Austin •Drive Time = 7 Hours!

Improved Route:

•45 south, cut across 79 west half way to Houston, and then 35 south to Austin •Drive Time = 4.5 Hours •36% Reduction •

Top-Down Route:

•Dumb Question: Why Houston?

•35 South •Drive Time = 3.5 Hours •50% Reduction 11

RAC Performance Testing Process (using tools) 1. Benchmark Factory Industry standard benchmark: TPC-C Key Metric = Avg Response Time Apply Top-Down Analysis & Revision 2. Spotlight on RAC Record before & after results Confirm improvements 3. TOAD with DBA Module AWR/ADDM & Stats Pack Again record before & after for improvements confirmation

12

Tool #1 – Load Generator

• • • •

Benchmark Factory ®

Test Oracle RAC environments rapidly and reliably Perform database “scalability” or “goal” testing to determine the most optimal RAC configuration Tests: – TPC-C – TPC-H – Trace File playback –

etc, etc, etc …

Let’s DBA concentrate on task at hand -

Optimization

13

14

Tool #2 – Ad-Hoc Monitor (to confirm results)

• • • •

Spotlight ® on RAC

Monitor Oracle RAC environments rapidly and reliably Diagnose Oracle RAC environment health levels at – Node – Cluster – ASM – Instance – Interconnect Intelligent performance alerts plus market-leading GUI for entire RAC to instances architecture & bottlenecks Let’s DBA concentrate on task at hand -

Diagnosing

15

16

17

18

Tool #3 – Ad-Hoc Monitor (to confirm results)

• • • •

TOAD ®

with DBA Module Expedite typical DBA management & tuning tasks Great Productivity Enhancing Features – Database Health Check – Database Probe – Database Monitor –

AWR/ADDM Reports

– UNIX Monitor –

Stats Pack Reports

See Toad World paper – Title: “Maximize Database Performance Via Toad for Oracle” – http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx

Let’s DBA concentrate on task at hand –

Correcting (i.e. Fixing)

19

20

21

Real-world Scenario

Quest strategic partner & customer

Dell

uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell Power Edge servers and EMC Clarion SAN & iSCSI Disk Arrays

22

DELL Success Story www.Quest.com/success_stories/Dell-Quest.pdf

23

Database Configuration used at DELL for the RAC test environment

DELL Success Story

24

Step-By-Step Example

Apply Methodology, “Best Practices”, and Quest’s RAC tools to optimize and quantify the approximate percentage of the improvements

Note – will quote some specific examples for a given RAC setup, your mileage will surely vary Test = TPC-C (OLTP) for 200-2000 users, 10 GB

25

Remember – must tune RAC as system

Start Here Ask lots of Questions …

26

Step 1 - Application Nature

• • • •

Know Your Application Demands

(this info flows downstream) OLTP vs. Data Warehousing – Primarily Read vs. Write – Average Transaction Size – Likelihood of “Dead Lock” – Logging, Flashback and Recovery Requirements –

etc, etc, etc …

Concurrent User Load Profile (i.e. user load over time) Focus on User Response Time Requirements – For example, TPC-C must run each transaction <= 2 seconds – Response Time

~=

Wait Events • Cary Millsap (Hotsos) calls this

“Method R”

• Anjo Kolk & et al (Oracle) call this

“YAPP Method”

• Kyle Haily (PerfVision) paper

“Waits Defined” Don’t skip this step – cost can be enormous – and that no network, OS, or database tuning can compensate for !!!!!

27

Application “Best Practices”

• • • Well known rules: Write efficient SQL and/or PL/SQL code (explain plans) Use bind variables to reduce unnecessary “re-parsing” Often the underlying Application Code (e.g. Benchmark)

NOT changeable

, so you can’t do anything • • • • Deeper TPC-C Analysis

(remember across all next steps)

: Primarily Read with Some Writes Small Average Transaction Size High Concurrency with Potential Deadlocks Logging for ACID compliance and no flashback 28

Public Network “Best Practices”

• • • Well known rules: Isolate Network (for single or related applications only) Use Gigabit Ethernet (consider “bonding” multiple cards) Use Layer 2 or 3 Switches and verify Gigabit throughput • • • • TPC-C Analysis Ramifications: Primarily Reads = Nothing Small Transaction = No jumbo frames, Standard SDU/TCU High Concurrency = Multiple Ethernet Segments (collisions) No Logging, etc… = Nothing 29

Storage Network “Best Practices”

• • • • Well known rules: Isolate Network (for single or related applications only) Use Fiber Channel for SAN, 10GB Ethernet for NAS/iSCSI Consider multiple pathways per storage controller and HBA Consider TCP/IP offload engine (TOE) NIC’s or iSCSI HBA’s • • • • TPC-C Analysis Ramifications: Primarily Reads = Nothing Small Transaction = Jumbo frames since “Block Level” IO High Concurrency = Fiber Channel and Multiple Pathways (if budget) No Logging, etc… = Nothing 30

Storage Sub System “Best Practices”

• • • • • • • • • Well known rules: More Smaller Disks generally higher overall throughput More memory cache generally higher overall throughput Avoid “write-back” mode if no backup power source (e.g. battery) Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc Stripe Depth (i.e. size) from 256 KB to 1 MB Stripe Width (i.e. # disks) between 4 and 16 Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request Average I/O <= Stripe Width X Stripe Depth Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered) • • • • TPC-C Analysis Ramifications: Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count High Concurrency = Low Deadlock, so spread DB objects and partitions across LUN’s No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK 31

Oracle Instance “Best Practices”

• • • • Well known rules: Size & Tune the SGA appropriately for application nature Choose reasonable block size based on application nature (

?8K?

) Partition large objects & indexes across storage devices & spindles Don’t assume any

“golden rules”

– i.e. test all assumptions!

• • • • TPC-C Analysis Ramifications: Primarily Reads = opt_index_caching=80, opt_index_adj_cost=20 Small Transaction = Size redo logs correctly for small size X high load High Concurrency = cursor_space_for_time=t, cursor_sharng=similar No Logging, etc… = Turn off

“Recycle Bin”

but keep

“LOGGING”

32

Oracle Cluster “Best Practices”

• • • • Well known rules: Increase default SGA size for all ASM instances (64M too small) Interconnect is the most important bottleneck – many bonded NIC’s Consider hash partitions & reverse indexes to spread IO across nodes Don’t assume any

“golden rules”

– i.e.

really

test all assumptions!!!

• • • • TPC-C Analysis Ramifications: Primarily Reads = Nothing Small Transaction = Decrease db_file_multiblock_read_count High Concurrency = Decrease block size (

?4K?

) – see next slide No Logging, etc… = Nothing 33

Single Instance – No Block Contention

34

Cluster – Block Contention Costs

35

Private Network “Best Practices”

• • • • Well known rules: Isolate Network (for single cluster only – and 0% public) Use 10GB Ethernet or Inifini-Band (Dell found 15% RTI) Consider multiple pathways per HBA and storage controller Jumbo frames since high “Block Level” IO between nodes • • • • TPC-C Analysis Ramifications: Primarily Reads = Nothing Small Transaction = Nothing High Concurrency = Lower block size until interconnect traffic OK – Consider increasing the OS priority of the global cache cluster services No Logging, etc… = Nothing 36

Let’s Apply the Recommendations

1 2 3 4

Read Count Block Size Cursor Space Cursor Share Index Cache Index Cost

5

Jumbo Test 1 16 8 False Exact 0 100 False Test 2

2

Test 3 2 Test 4 2 Test 5 2 8 4 4 4 False Exact False Exact True True 80 True True 80 0 0 100 False 100 False 20 False 20 True 37

Results – TPS (lesser interest here)

Transactions / Second

35.00

30.00

25.00

20.00

15.00

10.00

5.00

0.00

50 100 150 200 250 300 350 400 450 500 Run 1 Run 2 Run 3 Run 4 Run 5 38

Results – Average Response Time

Average Response Time

6.00

5.00

4.00

3.00

2.00

1.00

0.00

50 100 150 200 250 300 350 400 450 500 Run 1 Run 2 Run 3 Run 4 Run 5

Sub Second

39

Thank you

Please offer any questions or comments Remember:

• Eliminate “RAC in the Box” syndrome – eat low hanging fruit  • Example was for TPC-C or OLTP type application • TPC-H or Data Warehouse would NOT be the same • Your mileage may well vary (especially percentages)

Toad World Article: “Maximize Database Performance Via Toad for Oracle” http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx

Dell Power Solutions article: http://www.quest.com/success_stories/Dell-Quest.pdf

40