OpenVMS Solutions Center Lab Project - Spring 2004 : Oracle 9i RAC DT/HA in a distributed OpenVMS Environment Phase I – Failover.


RAC DT/HA – Goals – Phase I

- First: Demonstrate that Oracle 9i RAC continues to run during a simulated network failure when using LAN Failover and failSAFE IP configurations.

- Second: Measure the latency effect of failover when RAC instances are connected over a long distance (100 km).
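As a rough sense of scale for the 100 km goal, the propagation delay of the fiber itself can be estimated. This is a back-of-the-envelope sketch, not a measurement from the lab: it assumes a refractive index of about 1.47 (a typical figure for single-mode fiber) and ignores switch, NIC and protocol overhead.

```python
# Rough estimate of signal propagation delay over a fiber link.
# Assumption (not from the lab report): refractive index of ~1.47,
# a typical figure for single-mode fiber. Switch, NIC and protocol
# overheads are ignored.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47          # assumed typical value


def one_way_delay_ms(distance_km: float) -> float:
    """Return the one-way propagation delay in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber * 1000.0


def round_trip_delay_ms(distance_km: float) -> float:
    """Return the round-trip propagation delay in milliseconds."""
    return 2.0 * one_way_delay_ms(distance_km)


if __name__ == "__main__":
    print(f"one-way:    {one_way_delay_ms(100):.2f} ms")
    print(f"round trip: {round_trip_delay_ms(100):.2f} ms")
```

Under these assumptions a 100 km link adds roughly half a millisecond each way, so about one millisecond per request/response exchange between the instances.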

RAC DT/HA – What is Failover?

- Oracle RAC failover: The ability to resume work on an alternate instance upon instance failure
- Oracle TAF (Transparent Application Failover): Runtime failover which enables client applications to automatically reconnect to the database if the connection fails
- LAN Failover: Hardware failover from a failed network interface card (NIC) to another NIC configured as part of the LAN failover set
- failSAFE IP: Address failover to alternate interfaces

RAC DT/HA – Hardware Config

- Two 4-CPU GS160 systems, with a shared cluster system disk and a shared Oracle install disk on an Enterprise Storage Array, connected via the Fibre SAN A switch
- DE602-AA (EIA) NICs, using twisted pair on a 100 Mbit LAN (Extreme Summit4 switch)
- Five DEGPA-SA and one DEGXA-SA (EWA-D) NICs, 1 Gbit fiber on a 1 Gbit LAN (Digital Networks DNSwitch 800)
- 100 km cable: Gbit SCS (Extreme Summit 7i switch)

RAC DT/HA – Server Config

- OpenVMS 7.3-2, TCPIP 5.4
- Oracle Server 9.2.0.4, with the Oracle patch for bug 3026720: excessive CPU and BUFIO for the LMD0 and SMON processes when more than 2 CPUs are present
- Running 2 RAC instances in a 2-node cluster
- Requires the INIT.ORA parameter CLUSTER_INTERCONNECTS to specify an alternate network interface for RAC communication
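To make the CLUSTER_INTERCONNECTS requirement concrete, here is a minimal sketch that renders one entry per instance for an init.ora file. It assumes the SID-prefixed parameter form and colon-separated addresses; the instance names RAC1 and RAC2 are hypothetical, and the addresses are the LLA0 addresses from the LAN Failover network diagram. The actual values used in the lab are not given in the slides.

```python
from typing import Dict, List

# Sketch only: builds init.ora lines of the assumed form
#   <sid>.CLUSTER_INTERCONNECTS = "ip[:ip...]"
# The instance names used in the example below are hypothetical.


def interconnect_lines(instances: Dict[str, List[str]]) -> List[str]:
    """Return one CLUSTER_INTERCONNECTS line per instance."""
    lines = []
    for sid, addresses in instances.items():
        if not addresses:
            raise ValueError(f"instance {sid} has no interconnect address")
        value = ":".join(addresses)  # several addresses are colon-separated
        lines.append(f'{sid}.CLUSTER_INTERCONNECTS = "{value}"')
    return lines


if __name__ == "__main__":
    # Hypothetical SIDs; addresses taken from the LLA0 devices in the diagram.
    for line in interconnect_lines({"RAC1": ["10.3.3.1"], "RAC2": ["10.3.3.2"]}):
        print(line)
```

Each instance points at its own local interface address, which is why the parameter has to be set per instance rather than once for the whole cluster.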

RAC DT/HA – Client Config

- 9.2 SQLNet client, on a PC running Windows 2000
- Benchmark/load-generating software:
  • Swingbench 2.1f: an ‘unofficial’, Java-based client load-generating tool from Oracle, which allows a ‘load’ to be generated and the transactions/response times to be charted
  • Configured to connect 100 clients, load balanced between the 2 instances, and run 50,000 ‘typical’ Order Entry transactions

RAC DT/HA – Test Plan

- Restore from disk backup before each test run to ensure the same starting point
- Ensure the RAC instances are communicating over the specified network interface
- Run 3 iterations of the same benchmark load while collecting data:
  • Run benchmark load, no failures
  • Run benchmark load, fail instance
  • Run benchmark load, fail network connection between instances

RAC DT/HA – Data collection

- T4 running on both nodes, 10-second sampling interval
- Saved Swingbench data results after each run
- Executed and ‘saved’ the output of VMS commands during network failures to see the status of network devices and Oracle processes:

$ MC LANCP SHOW DEVICE/CHARACTERISTICS LLA0
$ TCPIP SHOW INTERFACES/FULL
$ PIPE SHO SYS|SEA TT: ORA_CPU

Tabular Timeline Tracking Tool – T4

- Created by OpenVMS Sustaining Engineers to help diagnose OS functionality. Uses OpenVMS Monitor data, stored in comma-separated value (.csv) file format, which can then be used by a variety of applications (spreadsheets, TLViz, etc.)
- Download from the web. Shipped with OpenVMS 7.3-2, in the SYS$ETC directory
  http://h71000.www7.hp.com/openvms/products/t4/index.html
- Users are able to queue data collection and configure the data collection frequency
- Helpful in establishing a baseline performance footprint, which can then be used in before-and-after comparisons of system changes
- T4 ‘hooks’ for Oracle and Rdb Server are being created
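Because T4 writes its samples as CSV, the moment traffic moves from one NIC to another can be picked out with a few lines of scripting. The sketch below assumes a simplified layout (one timestamp column named "Sample Time" plus one column per metric); real T4 files carry extra header rows and many more columns, and the sample values here are invented for illustration, not taken from the lab runs.

```python
import csv
import io

# Synthetic sample in a simplified T4-like layout. The column name
# "Sample Time" and all values are assumptions made for this sketch.
SAMPLE_CSV = """\
Sample Time,[NET.EWA0:]Bytes Recv/Sec,[NET.EWB0:]Bytes Recv/Sec
17:10:00,1500000,0
17:10:10,1480000,0
17:10:20,0,1490000
17:10:30,0,1510000
"""


def first_handover(csv_text, from_col, to_col):
    """Return the first timestamp where from_col is idle and to_col carries traffic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if float(row[from_col]) == 0 and float(row[to_col]) > 0:
            return row["Sample Time"]
    return None


if __name__ == "__main__":
    print(first_handover(SAMPLE_CSV,
                         "[NET.EWA0:]Bytes Recv/Sec",
                         "[NET.EWB0:]Bytes Recv/Sec"))
```

With a 10-second sampling interval, a handover located this way is only accurate to within one sample.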

RAC DT/HA – EIA Network

[Diagram: Oracle RAC network connection using the EIA device]

- GS160 - QBB0: EIA0 161.114.69.7
- GS160 - QBB3: EIA0 161.114.69.8
- 100 Mbit LAN: Extreme Summit 4 switch, with PC Swingbench clients 1 through 100
- EVA: common system disk and shared Oracle 9i database, via the Fiber SAN A switch

RAC DT/HA – T4 data - EIA

[Chart: EIA0 - Baseline. Series: [NET.EIA0:]Bytes Recv/Sec (# 1), node QBB0. X-axis: 16:20 to 16:45, 26-Mar-2004. Y-axis: 100,000 to 1,400,000.]

RAC DT/HA - LAN Failover Network

[Diagram: Oracle RAC connection using the LLA0 device for LAN Failover]

- GS160 - QBB0: EIA0 161.114.69.7; LLA0 10.3.3.1 (EWA0, EWB0)
- GS160 - QBB3: EIA0 161.114.69.8; LLA0 10.3.3.2 (EWA0, EWB0)
- G-bit LAN: Digital Networks DNswitch 800
- 100 Mbit LAN: Extreme Summit 4 switch, with PC Swingbench clients 1 through 100
- EVA: common system disk and shared Oracle 9i database, via the Fiber SAN A switch

RAC DT/HA – LAN Failover DCL

$ MCR LANCP SHOW DEVICE/CHAR LLA0

Before NIC ‘fails’

Device Characteristics LLA0:
    Value  Characteristic
    -----  --------------
      256  Max receive buffers
      Yes  Full duplex enable
        .
        .
     1000  Line speed (mbps)
   "EWB0"  Failover device
   "EWA0"  Failover device (active)
        .
        .
        0  Failover priority

After NIC ‘fails’

Device Characteristics LLA0:
    Value  Characteristic
    -----  --------------
      256  Max receive buffers
      Yes  Full duplex enable
        .
        .
     1000  Line speed (mbps)
   "EWB0"  Failover device (active)
   "EWA0"  Failover device
        .
        .
        0  Failover priority
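The only difference between the two listings is which member of the failover set carries the "(active)" marker. A minimal sketch for pulling that out of saved LANCP output follows; the embedded text is an excerpt of the "after" listing, and the parsing relies only on the line shapes shown on this slide, so other LANCP versions may need adjustments.

```python
import re

# Excerpt of the LANCP output shown above (after the NIC 'fails').
LANCP_OUTPUT = '''
Device Characteristics LLA0:
    Value  Characteristic
    -----  --------------
     1000  Line speed (mbps)
   "EWB0"  Failover device (active)
   "EWA0"  Failover device
        0  Failover priority
'''

# Matches: "<device>"  Failover device [(active)]
_FAILOVER_LINE = re.compile(
    r'"(?P<device>\w+)"\s+Failover device(?P<active>\s+\(active\))?'
)


def failover_members(output):
    """Return (list of member devices, active device or None)."""
    members, active = [], None
    for match in _FAILOVER_LINE.finditer(output):
        members.append(match.group("device"))
        if match.group("active"):
            active = match.group("device")
    return members, active


if __name__ == "__main__":
    print(failover_members(LANCP_OUTPUT))
```

Running this against output captured before and after pulling a cable gives a simple scripted check that the active device actually changed.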

RAC DT/HA-T4 LAN Failover EWA/B

[Chart: LAN Failover - Pull Cable. Series: [NET.EWA0:]Bytes Recv/Sec (# 1) and [NET.EWB0:]Bytes Recv/Sec (# 1), node QBB0. X-axis: 17:05 to 17:30, 7-Apr-2004. Y-axis: 0 to 1,900,000. Annotations: EWA0 cable pulled; EWB0 cable pulled.]

RAC DT/HA-T4 LAN Failover LLA0

[Chart: LAN Failover - Pull Cable. Series: [NET.LLA0:]Bytes Recv/Sec (# 1), node QBB0. X-axis: 17:05 to 17:30, 7-Apr-2004. Y-axis: 100,000 to 1,900,000.]

RAC DT/HA-T4 Overlay of EWA/LLA0

[Chart: LAN Failover - Pull Cable (overlay). Series: [NET.EWA0:]Bytes Recv/Sec (# 1), [NET.EWB0:]Bytes Recv/Sec (# 1) and [NET.LLA0:]Bytes Recv/Sec (# 1), node QBB0. X-axis: 17:05 to 17:30, 7-Apr-2004. Y-axis: 0 to 1,900,000.]

RAC DT/HA – failSAFE IP Network

[Diagram: Oracle RAC connection using the EWD0/EWE0 devices with failSAFE IP]

- GS160 - QBB0: EIA0 161.114.69.7; EWA0, EWB0; 10.4.4.1
- GS160 - QBB3: EIA0 161.114.69.8; EWD0 10.4.4.2; EWE0 10.4.4.3
- 10.4.4.2 and 10.4.4.3 are configured for failSAFE IP
- G-bit LAN: Digital Networks DNswitch 800
- 100 Mbit LAN: Extreme Summit 4 switch, with PC Swingbench clients 1 through 100
- EVA: common system disk and shared Oracle 9i database, via the Fiber SAN A switch

RAC DT/HA – failSAFE IP DCL

$ TCPIP SHOW INTERFACE/FULL

Route Tree for Protocol Family 2:
default     161.114.69.1  UGS    0     7999  IE0
10.4.4/24   10.4.4.2      U    274   408185  WE3
10.4.4/24   10.4.4.3      U    274   445714  WE4
10.4.4.2    10.4.4.2      UHL    0        0  WE3
10.4.4.3    10.4.4.3      UHL    0       14  WE4

WE3: flags=c43
     failSAFE IP Addresses:
       inet 10.4.4.3 netmask ffffff00 broadcast 161.114.69.63 (on QBB3 WE4)
      *inet 10.4.4.2 netmask ffffff00 broadcast 10.4.4.255 ipmtu 1500
WE4: flags=c43
     failSAFE IP Addresses:
       inet 10.4.4.2 netmask ffffff00 broadcast 161.114.69.63 (on QBB3 WE3)
      *inet 10.4.4.3 netmask ffffff00 broadcast 10.4.4.255 ipmtu 1500

RAC DT/HA – failSAFE IP DCL Failed 1

$ TCPIP SHOW INTERFACE/FULL

Route Tree for Protocol Family 2:
default     161.114.69.1  UGS    0     7999  IE0
10.4.4/24   10.4.4.2      U    274   408185  WE3
10.4.4/24   10.4.4.3      U    274   445714  WE4
10.4.4.2    10.4.4.2      UHL    0        0  WE3
10.4.4.3    10.4.4.3      UHL    0       14  WE4

WE3: flags=c43
     *failSAFE IP - interface is in a failed state
     failSAFE IP Addresses:
       inet 10.4.4.3 netmask ffffff00 broadcast 161.114.69.63 (on QBB3 WE4)
      *inet 10.4.4.2 netmask ffffff00 broadcast 10.4.4.255 (on QBB3 WE4)
WE4: flags=c43
      *inet 10.4.4.3 netmask ffffff00 broadcast 10.4.4.255 ipmtu 1500
       inet 10.4.4.2 netmask ffffff00 broadcast 161.114.69.63 ipmtu 1500

RAC DT/HA – failSAFE IP DCL Failed 2

$ TCPIP SHOW INTERFACE/FULL

Route Tree for Protocol Family 2:
default     161.114.69.1  UGS    0     7999  IE0
10.4.4/24   10.4.4.2      U    274   408185  WE3
10.4.4/24   10.4.4.3      U    274   445714  WE4
10.4.4.2    10.4.4.2      UHL    0        0  WE3
10.4.4.3    10.4.4.3      UHL    0       14  WE4

WE3: flags=c43
      *inet 10.4.4.2 netmask ffffff00 broadcast 10.4.4.255 ipmtu 1500
       inet 10.4.4.3 netmask ffffff00 broadcast 161.114.69.63 ipmtu 1500
WE4: flags=c43
     *failSAFE IP - interface is in a failed state.
     failSAFE IP Addresses:
       inet 10.4.4.2 netmask ffffff00 broadcast 161.114.69.63 (on QBB3 WE3)
      *inet 10.4.4.3 netmask ffffff00 broadcast 10.4.4.255 (on QBB3 WE3)
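When the interface listing is saved during each test, the failed interface can be picked out mechanically. The following is a minimal sketch that assumes the saved output has one item per line, as it would appear on a terminal; the embedded sample is the interface portion of the "Failed 2" output above, and the parsing depends only on the "flags=" header lines and the "failed state" message shown in these slides.

```python
import re

# Interface portion of the "Failed 2" output, one item per line.
SAVED_OUTPUT = """\
WE3: flags=c43
 *inet 10.4.4.2 netmask ffffff00 broadcast 10.4.4.255 ipmtu 1500
 inet 10.4.4.3 netmask ffffff00 broadcast 161.114.69.63 ipmtu 1500
WE4: flags=c43
 *failSAFE IP - interface is in a failed state.
 failSAFE IP Addresses:
 inet 10.4.4.2 netmask ffffff00 broadcast 161.114.69.63 (on QBB3 WE3)
 *inet 10.4.4.3 netmask ffffff00 broadcast 10.4.4.255 (on QBB3 WE3)
"""

_INTERFACE_HEADER = re.compile(r"\s*(\w+): flags=")


def failed_interfaces(output):
    """Return the names of interfaces reported as being in a failed state."""
    failed = []
    current = None
    for line in output.splitlines():
        header = _INTERFACE_HEADER.match(line)
        if header:
            current = header.group(1)  # start of a new interface section
        elif "interface is in a failed state" in line and current:
            failed.append(current)
    return failed


if __name__ == "__main__":
    print(failed_interfaces(SAVED_OUTPUT))
```

In the sample, both 10.4.4.2 and 10.4.4.3 are listed as "(on QBB3 WE3)" under the failed WE4, which is the address failover the test set out to show.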

RAC DT/HA – T4 data failSAFE IP

[Chart: FailSafeIP - Pull Cable. Series: [NET.EWD0:]Bytes Recv/Sec (# 1) and [NET.EWE0:]Bytes Recv/Sec (# 1), node QBB3. X-axis: 14:40 to 15:05, 14-Apr-2004. Y-axis: 0 to 1,600,000. Annotations: EWD0 cable pulled; EWE0 cable pulled.]

RAC DT/HA – 100km cable Network

[Diagram: Oracle RAC connection using the EWA0 device, nodes separated by 100 km]

- GS160 - QBB0: EIA0 161.114.69.7; EWA0
- GS160 - QBB3: EIA0 161.114.69.8; EWA0
- G-bit SCS: an Extreme Summit 7i switch at each end of the 100 km fiber cable
- 100 Mbit LAN: Extreme Summit 4 switch, with PC Swingbench clients 1 through 100
- EVA: common system disk and shared Oracle 9i database, via the Fiber SAN A switch

RAC DT/HA – T4 EWA0 w/100km cable

[Chart: EWA0 with 100km Fiber cable between instances. Series: [NET.EWA0:]Bytes Recv/Sec (# 1), node QBB0. X-axis: 12:25 to 12:55, 16-Apr-2004. Y-axis: 100,000 to 1,700,000.]

RAC DT/HA – T4 EIA compared w/ EWA

[Chart: Bytes/sec of EIA NIC over UTP and EWA NIC over 100km. Series: [NET.EWA0:]Bytes Recv/Sec (# 1) and [NET.EWA0:]Bytes Recv/Sec (# 2). X-axis: 16:20 to 16:40, 26-Mar-2004. Y-axis: 0 to 1,700,000.]

Note: The red graph says [NET.EWA0], but this is really [NET.EIA0].

RAC DT/HA – Load Generation Data

50k Transactions, no RAC or Network Failure

Network Interface                 Total duration   TPS
Baseline (EIA 161.114.69.x)       30:08            27.8
Lan Failover (EWA 10.3.3.x)       30:02            27.9
FailSafe IP (EWD 10.4.4.x)        30:02            27.9
100 km Baseline (EWA 10.3.3.x)    29:52            28.0

RAC DT/HA – Load Generation Data

50k Transactions, Network failover

Network Interface                 Total duration   TPS
Baseline (EIA 161.114.69.x)       N/A              N/A
Lan Failover (EWA 10.3.3.x)       30:02            27.9
FailSafe IP (EWD 10.4.4.x)        30:13            27.7
100 km Baseline (EWA 10.3.3.x)    N/A              N/A

RAC DT/HA – Load Generation Data

50k Transactions, 1 RAC instance failed

Network Interface                 Total duration   TPS    50-client failover
Baseline (EIA 161.114.69.x)       33:25            25.0   00:37
Lan Failover (EWA 10.3.3.x)       29:54            28.0   00:39
FailSafe IP (EWD 10.4.4.x)        30:02            27.7   00:39
100 km Baseline (EWA 10.3.3.x)    29:39            28.0   00:43
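As a sanity check on the three result tables, the reported TPS can be compared with the simple quotient of 50,000 transactions over the total duration. This is a sketch, not a description of how Swingbench computes TPS: the reported figures presumably come from the tool itself, and they differ from the plain quotient by up to about 0.15 TPS.

```python
TRANSACTIONS = 50_000

# (scenario, network interface, total duration mm:ss, reported TPS),
# transcribed from the three Load Generation Data tables.
RUNS = [
    ("no failure", "Baseline (EIA)", "30:08", 27.8),
    ("no failure", "Lan Failover (EWA)", "30:02", 27.9),
    ("no failure", "FailSafe IP (EWD)", "30:02", 27.9),
    ("no failure", "100 km Baseline (EWA)", "29:52", 28.0),
    ("network failover", "Lan Failover (EWA)", "30:02", 27.9),
    ("network failover", "FailSafe IP (EWD)", "30:13", 27.7),
    ("instance failed", "Baseline (EIA)", "33:25", 25.0),
    ("instance failed", "Lan Failover (EWA)", "29:54", 28.0),
    ("instance failed", "FailSafe IP (EWD)", "30:02", 27.7),
    ("instance failed", "100 km Baseline (EWA)", "29:39", 28.0),
]


def to_seconds(mmss):
    """Convert a 'mm:ss' duration string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def average_tps(mmss, transactions=TRANSACTIONS):
    """Average transactions per second over the whole run."""
    return transactions / to_seconds(mmss)


if __name__ == "__main__":
    for scenario, interface, duration, reported in RUNS:
        computed = average_tps(duration)
        print(f"{scenario:17} {interface:22} {duration}  "
              f"reported {reported:.1f}  computed {computed:.2f}")
```

The one run that stands apart is the EIA baseline with a failed instance: 33:25 against 30:08 without a failure, a difference of 3 minutes 17 seconds, while the other three configurations finish in about the same time as their no-failure runs.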

RAC DT/HA – Conclusions

- RAC appeared to have no problems when running with the network configured to use LAN Failover or failSAFE IP (on the same node).

- Network traffic appears to be clearly redistributed when the Oracle init.ora parameter CLUSTER_INTERCONNECTS is used.

RAC DT/HA – Phase II and III

- Phase II: Configure an Oracle 9i RAC 2-node cluster using RAID-1 shadow sets for the database and log files, and test the recently released Host Based Mini-Merge (HBMM) functionality in a variety of configurations.
  Refer to: http://h71000.www7.hp.com/news/hbmm.htm

- Phase III: Distribute the nodes in the cluster over a 100 km+ distance and test failover and HBMM functionality

RAC DT/HA - References

OpenVMS Technical Journal:
- Matt Muggeridge, July 2003, V2 article: Configuring TCP/IP for High Availability
  http://h71000.www7.hp.com/openvms/journal/v2/articles/tcpip.pdf

- Steve Lieman, January 2004, V3 article: TimeLine-Driven Collaboration with T4 & Friends: A Time-saving Approach to OpenVMS Performance
  http://h71000.www7.hp.com/openvms/journal/v3/t4.pdf

RAC DT/HA – References (cont’d)

- TCPIP docs: http://h71000.www7.hp.com/doc/tcpip54.html

- OpenVMS docs: http://h71000.www7.hp.com/doc/os732_index.html

- HP TCP/IP Services for OpenVMS Management: Chapter 5, Configuring and Managing FailSAFE IP
  http://h71000.www7.hp.com/doc/732final/documentation/pdf/aa-lu50m-te.pdf

RAC DT/HA – References (cont’d)

- HP OpenVMS System Management Utilities Reference Manual: Chapter 13, LAN Control Program (LANCP) Utility
  http://h71000.www7.hp.com/doc/732FINAL/DOCUMENTATION/PDF/aa-pv5ph-tk.PDF

- HP OpenVMS System Manager’s Manual, Volume 2: Tuning, Monitoring, and Complex Systems: Chapter 10, Managing the Local Area Network (LAN) Software
  http://h71000.www7.hp.com/doc/732FINAL/aa-pv5nh-tk/aa-pv5nh-tk.pdf

RAC DT/HA – References (cont’d)

Oracle References:
- Swingbench: an ‘unofficial’ load-generating benchmarking tool, developed in Java, which allows a load to be generated and the transactions/response times to be charted
  http://www.dominicgiles.com/swingbench.php

- OTN: otn.oracle.com
  Real 24/7: Use Oracle9i RAC and TAF to guarantee availability.
  http://otn.oracle.com/oramag/oracle/02-may/o32clusters.html

RAC DT/HA – References (cont’d)

Oracle Metalink articles: metalink.oracle.com

- Note 183340.1 - Frequently Asked Questions About the CLUSTER_INTERCONNECTS Parameter in 9i.

- Note 220970.1 - “Which network is Oracle using for RAC traffic?”

- Note 162725.1 - OPS/RAC VMS: Using alternate TCP Interconnects on 8i OPS and 9i RAC on OpenVMS.

- Note 226880.1 - Configuration of Load Balancing and Transparent Application Failover.

OpenVMS Solutions Lab

- Available to customers to test new hardware, software, and applications
- Alpha and Integrity systems available for use
- To get the most benefit from the Lab, the customer is expected to be prepared with an exact list of hardware and software requirements, a test plan, and goals