RAC Best Practices on Linux


Transcript: RAC Best Practices on Linux

Session id: 40136

RAC Best Practices on Linux

Kirk McGowan Technical Director – RAC Pack Server Technologies Oracle Corporation Roland Knapp Principal Member Technical Staff – RAC Pack Server Technologies Oracle Corporation

Agenda

• Planning Best Practices
  – Architecture
  – Expectation setting
  – Objectives and success criteria
  – Project plan
• Implementation Best Practices
  – Infrastructure considerations
  – Installation/configuration
  – Database creation
  – Application considerations
• Operational Best Practices
  – Backup & Recovery
  – Performance Monitoring and Tuning
  – Production Migration

Planning

• Understand the Architecture
  – Cluster terminology
  – Functional basics
    • HA by eliminating the node and Oracle as SPOFs
    • Scalability by making additional processing capacity available incrementally
  – Hardware components
    • Private interconnect/network switch
    • Shared storage/concurrent access/storage switch
  – Software components
    • OS, Cluster Manager, DBMS/RAC, Application
    • Differences between cluster managers

RAC Hardware Architecture

[Diagram: clustered database servers joined by a high-speed switch or low-latency interconnect (e.g. VIA or proprietary), users reaching the cluster over the network through a hub or switch fabric, a mirrored disk subsystem on a Storage Area Network, and a centralized management console – no single point of failure.]

RAC Software Architecture

Shared Data Model

[Diagram: the shared data model – each instance has its own Shared Memory/Global Area (shared SQL, log buffer), with GES & GCS coordinating between the instances, and all instances accessing a single shared-disk database.]

RAC on Linux HW & SW Components

[Diagram: two nodes (Node1, Node2), each running an Oracle9i RAC instance with its own DB cache and ORACM on Unbreakable Linux, joined by a cluster interconnect for cache-to-cache transfers and by a public network. Shared storage holds each instance's redo logs, the control files, and the database files. Concurrent access from every node = "scale out"; more nodes (N3, N4 … Nn) = higher availability.]

Linux Cluster Hardware

• Cluster interconnects
  – Fast Ethernet, Gigabit Ethernet
• Public networks
  – Ethernet, Fast Ethernet, Gigabit Ethernet
• Memory, swap & CPU recommendations
  – Each server should have a minimum of 512 MB of memory, at least 1 GB of swap space, and two CPUs
• Fibre Channel, SCSI, or NAS storage connectivity

Unbreakable Linux Distributions

• Red Hat Enterprise Linux AS and ES
• United Linux 1.0
  – SuSE Linux Enterprise Server 8 (SuSE Linux AG)
  – Conectiva Linux Enterprise Edition (Conectiva S.A.)
  – SCO Linux Server 4.0 (The SCO Group)
  – Turbolinux Enterprise Server 8 (Turbolinux)
• Oracle will support Oracle products running with other distributions but will not support the operating system.

RAC Certification for Unbreakable Linux

• Certification
  – Enterprise class OS distribution (e.g. RH AS, United Linux 1.0)
  – Clusterware (Oracle Cluster Manager only)
  – Network Attached Storage (e.g. Network Appliance filers)
  – Most SCSI and SAN storage are compatible
  – 32-bit and 64-bit (Itanium 2) Intel-based servers are certified
• For more details on software certification:
  http://technet.oracle.com/support/metalink/content.html
• Discuss hardware configuration with your HW vendor

Linux IA64 requirements

• Operating System Requirements
  – Red Hat Linux Advanced Server 2.1 with kernel 2.4.18-e.14.ia64.rpm
  – glibc 2.2.4-29
  – Gnu gcc 2.96.0 release
  – Linux Header Patch 2.4.18 (available from Intel)
  – asynch I/O libraries libaio-0.3.92-1
    (Oracle9i Release Notes Release 2 (9.2.0.2.0) for Linux Intel on Itanium (64-bit), Part No. B10567-02)

Set Expectations Appropriately

If your application will scale transparently on SMP, then it is realistic to expect it to scale well on RAC, without having to make any changes to the application code.

RAC eliminates the database instance, and the node itself, as a single point of failure, and ensures database integrity in the case of such failures.

Planning: Define Objectives

• Objectives need to be quantified/measurable
  – HA objectives
    • Planned vs unplanned downtime
    • Technology failures vs site failures vs human errors
  – Scalability objectives
    • Speedup vs scaleup
    • Response time, throughput, other measurements
  – Server consolidation objectives
    • Often tied to TCO
    • Often subjective

Build your Project Plan

• Partner with your vendors
  – Multiple stakeholders, shared success
• Build detailed test plans
  – Confirm application scalability on SMP before going to RAC; optimize first for single instance
• Address knowledge gaps and training
  – Clusters, RAC, HA, scalability, systems management
  – Leverage external resources as required
• Establish strict system and application change control
  – Apply changes to one system element at a time
  – Apply changes first to the test environment
  – Monitor the impact of application changes on underlying system components
• Define support mechanisms and escalation procedures

Agenda

• Planning Best Practices
  – Architecture
  – Expectation setting
  – Objectives and success criteria
  – Project plan
• Implementation Best Practices
  – Infrastructure considerations
  – Installation/configuration
  – Database creation
  – Application considerations
• Operational Best Practices
  – Backup & Recovery
  – Performance Monitoring and Tuning
  – Production Migration

Infrastructure Considerations

• Architecture/Design
  – Eliminate SPOFs (Single Points of Failure)
  – Workload distribution (load balancing) strategy
  – Systems management framework for monitoring and managing to SLAs
• Hardware/Software
  – Processing nodes – sufficient CPU to accommodate failure
  – Scalable I/O subsystem
    • Use S.A.M.E.
  – Private interconnect
    • GigE, UDP, switched
  – Patch levels and certification

Implementation Flowchart

1. Configure HW
2. Install Unbreakable Linux
3. Configure the private interconnect
4. Configure storage and install OCFS
5. Install the 9.2.0.1 cluster manager
6. Install the 9.2.0.3 cluster manager
7. Install Oracle 9.2.0.1
8. Install Oracle 9.2.0.3
9. Create the database

Installation Flowchart for Red Hat Linux AS 2.1

1. Boot
2. Choose Language
3. Select Keyboard & Mouse
4. Choose the Advanced Server option
5. Use DRUID for partition setup
6. Select Boot Loader
7. Configure Network
8. Configure Timezone
9. Account Configuration
10. Select Graphic Mode
11. Boot Floppy Creation
12. Installation Complete / Reboot

Install tips for Red Hat Linux AS 2.1

• As documented in "Tips and Techniques: Install and Configure Oracle9i on Red Hat Linux Advanced Server" by Deepak Patel, Oracle
  http://otn.oracle.com/tech/linux/pdf/installtips_final.pdf
• Boot options
  – Always use the Advanced Server install. Install required packages as needed: CDs 1 to 3 hold all rpm packages, CDs 3 and 4 hold source packages, and CD 5 includes the docs.
• Memory
  – The smp or enterprise kernel is installed based on the machine's physical memory (<= 4 GB: smp kernel; > 4 GB: enterprise kernel).
• Post installation
  – Add users, configure the network, and perform other administrative tasks after installation.
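The memory-based kernel selection above can be sketched as a small shell helper; the function name is ours, and the Red Hat installer performs this choice automatically:

```shell
# Hypothetical helper mirroring the installer's rule: machines with up to
# 4 GB of RAM get the smp kernel, larger machines get the enterprise kernel.
choose_kernel() {
  mem_gb=$1   # physical memory in GB
  if [ "$mem_gb" -le 4 ]; then
    echo smp
  else
    echo enterprise
  fi
}

choose_kernel 2   # smp
choose_kernel 8   # enterprise
```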

Install tips for United Linux 1.0

• You must install the latest UnitedLinux kernel update! Oracle was certified against an update kernel; the original UL 1.0 kernel is NOT certified!
• After installing United Linux 1.0, install Service Pack 2a from:
  ftp://suse.us.oracle.com/pub/suse/i386/unitedlinux-1.0-iso/
• You will also need the basic development tools installed, such as make, gcc_old (2.95.3), and the binutils package.
• Full installation instructions:
  ftp://ftp.suse.com/pub/suse/i386/supplementary/commercial/Oracle/docs/920_sles8_install.pdf

Install tips for United Linux 1.0

• Install the orarun.rpm package either from the SP2 CD
  – /UnitedLinux/i586/orarun-1.8-18.i586.rpm
  or from
  – ftp://ftp.suse.com/pub/suse/i386/supplementary/commercial/Oracle/sles-8/orarun.rpm
• orarun.rpm
  – Updates kernel parameters (e.g. shmmax, shmmin)
  – Sets the UDP buffer sizes (256K)
  – Installs and configures the hangcheck-timer

Prepare Linux Environment

• Follow these steps on EACH node of the cluster
  – Set kernel parameters in /etc/sysctl.conf
  – Add hostnames to the /etc/hosts file
  – Establish a file system or location for ORACLE_HOME (writable by the oracle userid)
  – Set up host equivalence for the oracle userid (.rhosts)
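A minimal sketch of the per-node preparation above, staged into files under /tmp rather than written to the live /etc files; the kernel parameter values, hostnames, and addresses below are illustrative assumptions, not values mandated by the slides:

```shell
# Stage the kernel parameters destined for /etc/sysctl.conf. Values are
# typical Oracle9i-on-Linux settings (assumed; verify against your platform
# documentation). Copy over /etc/sysctl.conf as root when satisfied.
cat > /tmp/sysctl.conf.rac <<'EOF'
kernel.shmmax = 2147483648
kernel.sem = 250 32000 100 128
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
EOF

# Stage the /etc/hosts entries for the public and private node names
# (hostnames and addresses are hypothetical).
cat > /tmp/hosts.rac <<'EOF'
192.168.1.1   node1
192.168.1.2   node2
10.0.0.1      int-node1
10.0.0.2      int-node2
EOF
```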

Installation Flowchart for OCFS

1. Download the latest OCFS rpm's from www.ocfs.org
2. Install the rpm's on all nodes
3. Run ocfstool as root on all nodes (configures /etc/ocfs.conf)
4. Run load_ocfs on all nodes (insmod loads ocfs.o)
5. Create a partition on the primary node
6. Run ocfstool to format and mount your new filesystem
7. Mount the new filesystem on all nodes
8. Edit rc.local or equivalent: add load_ocfs and 'mount –t ocfs

OCFS and Unbreakable Linux

• Red Hat
  – Currently ships 4 flavors of the AS 2.1 kernel: UP, SMP, Enterprise, and Summit (IBM x440)
  – Oracle provides a separate OCFS module for each of the kernel flavors
  – Minor revisions of the kernel do not need a fresh build of OCFS; e.g., OCFS built for e.12 will work for e.16, e.18, etc.
• United Linux
  – Ships 3 flavors of its kernel: the 2.4.19-64GB SMP, the 2.4.19-4GB, and the 2.4.19-4GB-SMP kernel
  – OCFS 1.0.9 is supported on UL 1.0 Service Pack 2a or higher
  – The OCFS build is not currently upward compatible with the kernel (pre SP3); you must ensure an OCFS build exists for each new kernel version before upgrading the kernel

OCFS and RAC

• Maintains cache coherency across nodes for the filesystem metadata only
  – Does not synchronize the data cache buffers across nodes; lets RAC handle that
• OCFS journals filesystem metadata changes only
  – File data changes are journalled by RAC (log files)
• Overcomes some limitations of raw devices on Linux
  – No limit on the number of files
  – Allows for very large files (max 2 TB)
  – Max volume size 32 GB (4 KB block) to 8 TB (1 MB block)
• Oracle DB performance is comparable to raw devices
• Kernel e.25 is strongly recommended for use with OCFS 1.0.9 (remove old kernel tuning parameters)
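The volume limits quoted above are consistent with a single fixed block count; a quick derivation (ours, not taken from the OCFS documentation):

```shell
# 32 GB at a 4 KB block and 8 TB at a 1 MB block both imply the same maximum
# of 8M (8 * 1024 * 1024) blocks per volume, so the limit scales linearly
# with block size.
ocfs_max_vol_gb() {
  block_kb=$1
  echo $(( block_kb * 8 ))   # 8M blocks * block_kb KB = block_kb * 8 GB
}

ocfs_max_vol_gb 4      # 32   (4 KB blocks -> 32 GB)
ocfs_max_vol_gb 1024   # 8192 (1 MB blocks -> 8 TB)
```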

Install Tips for OCFS

• Ensure the OCFS rpm corresponds to the kernel version: uname -r (e.g. 2.4.19-4GB)
• Remember to also download the rpm's for the OCFS "Support Tools" and "Additional Tools"
• Download the dd/tar/cp rpm that supports o_direct
• Use rpm -Uv to install all 4 rpm's on all nodes
• Use OCFS for Oracle DB files only, not Oracle binaries (OCFS 1.0.x was not designed as a general-purpose filesystem)
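The uname -r check above can be scripted; a sketch, where the function name and rpm filenames are our own illustration:

```shell
# Report whether an OCFS rpm filename embeds the running kernel's version
# string, per the "rpm must correspond to the kernel version" tip.
ocfs_rpm_matches_kernel() {
  rpm_name=$1 kernel_ver=$2   # in real use: kernel_ver=$(uname -r)
  case "$rpm_name" in
    *"$kernel_ver"*) echo match ;;
    *)               echo mismatch ;;
  esac
}

ocfs_rpm_matches_kernel "ocfs-2.4.19-4GB-1.0.9-4.i586.rpm" "2.4.19-4GB"   # match
ocfs_rpm_matches_kernel "ocfs-2.4.19-4GB-1.0.9-4.i586.rpm" "2.4.21-4GB"   # mismatch
```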

Installation Flowchart for oracm and Oracle

1. Configure the private interconnect and quorum device
2. Install the oracm from the 9.2.0.1 CD-ROM
3. Configure ocmargs.ora and cmcfg.ora
4. Load the softdog module and start the cluster manager on both nodes with ./ocmstart.sh
5. Install the 9.2.0.1 software with the RAC option
6. Kill the oracm and watchdog processes
7. Install the 9.2.0.3 patchset
8. Install the oracm from the 9.2.0.3 patchset
9. Modify ocmargs.ora and cmcfg.ora (remove the watchdog)
10. Load the hangcheck-timer module (verify with lsmod)
11. Fix the empty directory bug
12. Start the cluster manager with ./ocmstart.sh

Hangcheck NM, and CM Flow (After V9.2.0.2)

[Diagram: the Oracle instance talks to the user-mode Cluster Manager (oracm, including the Node Monitor), which maintains both the node status view and the instance status view; the kernel-mode hangcheck-timer monitors the kernel for hangs and resets the node if needed.]

Post Installation

• To enable asynchronous I/O, re-link Oracle to use skgaioi.o
• Adjust the UDP send/receive buffer sizes to 256K
• Larger buffer cache
  – Create an in-memory file system on /dev/shm (mount -t shm shmfs -o size=8g /dev/shm)
  – To enable the extended buffer cache feature, set the init.ora parameter USE_INDIRECT_DATA_BUFFERS=true
• Increasing address space
  – By default, Oracle has 1.7 GB of address space for its SGA
  – See Metalink Note 200266.1 for details and a sample program

Create RAC database using DBCA

• Create the database
  – Use DBCA to simplify DB creation
  – Start gsd (the global services daemon) on all nodes, if it is not already running
• Set MAXINSTANCES, MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, MAXDATAFILES (automatic with DBCA)
• Create tablespaces as locally managed (automatic with DBCA)
• Create all tablespaces with ASSM (automatic with DBCA)
• Configure automatic UNDO management (automatic with DBCA)
• Use an SPFILE instead of multiple init.ora's (automatic with DBCA)

Validate RAC Configuration

• Instances running on all nodes:
  SQL> select * from gv$instance
• RAC communicating over the private interconnect:
  SQL> oradebug setmypid
  SQL> oradebug ipc
  SQL> oradebug tracefile_name
  /home/oracle/admin/RAC92_1/udump/rac92_1_ora_1343841.trc
  – Check the trace file in the user_dump_dest:
    SSKGXPT 0x2ab25bc flags
    info for network 0
      socket no 10 IP 204.152.65.33 UDP 49197 sflags SSKGXPT_UP
    info for network 1
      socket no 0 IP 0.0.0.0 UDP 0 sflags SSKGXPT_DOWN
• RAC is using the desired IPC protocol – check the alert log:
    cluster interconnect IPC version: Oracle UDP/IP
    IPC Vendor 1 proto 2 Version 1.0
    PMON started with pid=2
• Use cluster_interconnects only if necessary

Configure srvconfig / srvctl

• SRVCTL uses information from srvconfig
  – Reads $ORACLE_HOME/srvm/config/srvConfig.loc
  – The file can be a raw device or an OCFS file
• Run srvconfig -init to initialize (gsd must be running)
• Add the ORACLE_HOME:
  $ srvctl add database -d db_name -o oracle_home [-m domain_name] [-s spfile]
• Add the instances (enter the command for each instance):
  $ srvctl add instance -d db_name -i sid -n node
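A dry-run sketch of the srvctl registration sequence above for a hypothetical two-node database; the database name, ORACLE_HOME, and node names are ours, and the commands are only printed, not executed (pipe the output to sh, with gsd running, to register for real):

```shell
# Emit the srvctl commands that register a RAC database and its instances.
# Names below are illustrative assumptions, not from the session.
emit_srvctl_cmds() {
  db=$1; oh=$2; shift 2          # remaining args: node names
  echo "srvctl add database -d $db -o $oh"
  i=1
  for node in "$@"; do
    echo "srvctl add instance -d $db -i ${db}${i} -n $node"
    i=$((i + 1))
  done
}

emit_srvctl_cmds rac92 /u01/app/oracle/product/9.2.0 node1 node2
```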

Application Deployment

• Same guidelines as single instance
  – SQL tuning
  – Sequence caching
  – Partition large objects
  – Use different block sizes
  – Tune instance recovery
  – Avoid DDL
  – Use LMTs and ASSM as noted earlier

Agenda

• Planning Best Practices
  – Architecture
  – Expectation setting
  – Objectives and success criteria
  – Project plan
• Implementation Best Practices
  – Infrastructure considerations
  – Installation/configuration
  – Database creation
  – Application considerations
• Operational Best Practices
  – Backup & Recovery
  – Performance Monitoring and Tuning
  – Production Migration

Operations

• Same DBA procedures as single instance, with some minor, mostly mechanical differences
• Managing the Oracle environment
  – Starting/stopping cluster services (ocmstart.sh)
  – Starting/stopping gsd
  – Managing multiple redo log threads
• Startup and shutdown of the database
  – Use srvctl
• Backup and recovery
• Performance monitoring and tuning
• Production migration

Operations: srvconfig / srvctl

• Use SRVCTL to administer your RAC database environment
  – OEM and the Oracle Intelligent Agent use the configuration information that SRVCTL generates to discover and monitor nodes in your cluster
• The Global Services Daemon (GSD) receives requests from SRVCTL to execute administrative tasks, such as startup or shutdown
  – GSD must be started on all the nodes in your RAC environment so that the manageability features and tools operate properly (GSDCTL)

Operations: Backup & Recovery

• RMAN is the most efficient option for backup & recovery
  – Manage the snapshot control file location
  – Manage the control file autobackup feature
  – Manage archived logs in RAC – choose a proper archiving scheme
  – Node affinity awareness
• RMAN and Oracle Net restrictions in RAC apply – you cannot specify a net service name that uses Oracle Net features to distribute RMAN connections to more than one instance
• Oracle Enterprise Manager – GUI interface to Recovery Manager

Performance Monitoring and Tuning

• Tune first for single instance 9i
• Use Statspack:
  – Separate 1 GB tablespace for Statspack
  – Snapshots at 10-20 minute intervals during stress testing, hourly during normal operations
  – Run on all instances, staggered
• Supplement with scripts/tracing
  – Monitor V$SESSION_WAIT to see which blocks are involved in wait events
  – Trace events like 10046/8 can provide additional wait event details
  – Monitor alert logs and trace files, as on single instance
• Oracle Performance Manager RAC-specific views
• Supplement with system-level monitoring
  – CPU utilization never 100%
  – I/O service times never above acceptable thresholds
  – CPU run queues at optimal levels

Performance Monitoring and Tuning

• An obvious application deficiency on a single node can't be solved by multiple nodes
  – Single points of contention
  – Not scalable on SMP
  – I/O bound on a single instance DB
• Tune on a single instance DB to ensure the application is scalable first
  – Identify/tune contention, using v$segment_statistics to identify the objects involved
  – Concentrate on the top 5 Statspack timed events if the majority of time is spent waiting
  – Concentrate on bad SQL if CPU bound
• Maintain a balanced load on the underlying systems (DB, OS, storage subsystem, etc.)
  – Excessive load on individual components can invoke aberrant behaviour

Performance Monitoring and Tuning

• Deciding if RAC is the performance bottleneck
  – Amount of cross-instance traffic
    • Type of requests
    • Type of blocks
  – Latency
    • Block receive time
    • Buffer size factor
    • Bandwidth factor

Production Migration

• Adhere to strong systems life cycle disciplines
  – Comprehensive test plans (functional and stress)
  – Rehearsed production migration plan
  – Change control
    • Separate environments for Dev, Test, QA/UAT, Production
    • System AND application change control
    • Log changes to the spfile
  – Backup and recovery procedures
  – Security controls
  – Support procedures

Next Steps….


See Your Business in Our Software – Visit the DEMOgrounds for a customized architectural review, see a customized demo with Solutions Factory, or receive a personalized proposal.


Reminder – please complete the OracleWorld online session survey. Thank you.

Resources

• Red Hat Linux
  – http://www.redhat.com/oracle/
• Linux Center – Technical White Papers & Documentation
  – http://otn.oracle.com/tech/linux/tech_wp.html
• "Tips and Techniques: Install and Configure Oracle9i on Red Hat Linux Advanced Server" by Deepak Patel, Oracle Corporation
  – http://otn.oracle.com/tech/linux/pdf/installtips_final.pdf
• "Tips and Techniques: Install and Configure Oracle9i on SLES8 / United Linux 1.0"
  – http://www.suse.com/en/business/certifications/certified_software/oracle/db/9iR2_sles8.html

United Linux 1.0 Resources

• United Linux
  – http://www.unitedlinux.com
• SuSE
  – http://www.suse.com/us/business/products/server/sles/index.html
• SCO Group (formerly Caldera Systems)
  – http://www.ebizenterprises.com/page1.asp?p=463
• Conectiva
  – http://www.connectiva.com
• Turbolinux
  – http://www.turbolinux.com/

Recommended one-off patches

• Bug 2820871 – ORA-29740 node eviction design algorithm and abrupt time change
  – ARU 4161735 completed for 9.2.0.3 on Linux Intel
• Bug 2420930 – ORA-600 [KSXPMPRP1] during startup in RAC mode with larger buffers
  – This fix was mysteriously included in 9.2.0.2, but not in 9.2.0.3; Bug 2875050 was opened for this issue
  – ARU 4202164 completed for 9.2.0.3 on Linux Intel
• Bug 2922471 – Fractured

Recommended one-off patches

• Bug 2844009 – missing libcxa.so.3 library issue in PSR 9203
  – ARU 4046387 completed for 9.2.0.3 on Linux Intel
• Bug 2779294 – node_list is not populated into oraInventory/ContentsXML/inventory.xml, so an opatch install will only apply to the local node
  – Workaround is editing inventory.xml, documented in bug 2742686
• Bugs 2646914, 2675090, 2706220 and 2695783 – ORA-600 [KCCSBCK_FIRST], [2] on the Linux and W2K platforms after installing 9.2.0.2
  – Very important patch, missing from 9.2.0.3
  – ARU 4110670 completed for 9.2.0.3 on Linux

Hangcheck-timer and Oracle Cluster Manager

• Download Patch 2594820 from Metalink
  – # rpm -ivh
• Detaching watchdogd from the Cluster Manager (Bug 2495915)
  – The removal of watchdogd changes $ORACLE_HOME/oracm/admin/cmcfg.ora:
    • WatchdogTimerMargin and WatchdogSafetyMargin are removed
    • KernelModuleName=hangcheck-timer is added
    • CMDiskFile changes from optional to mandatory – the CM quorum partition is now required for cluster participation

Hangcheck-timer and Oracle Cluster Manager

• Remove or comment out from the /etc/rc.local file:
  /sbin/insmod softdog nowayout=0 soft_noboot=1 soft_margin=60
• Add to rc.local (execute as root to load):
  /sbin/insmod hangcheck-timer.o hangcheck_tick=30 hangcheck_margin=180

Hangcheck-timer and Oracle Cluster Manager

• Inclusion of the hangcheck-timer kernel module

  Parameter          Service           Value
  ----------------   ---------------   -----------------------------------------
  hangcheck_tick     hangcheck-timer   30 seconds
  hangcheck_margin   hangcheck-timer   180 seconds
  KernelModuleName   oracm             hangcheck-timer
  MissCount          oracm             > hangcheck_tick + hangcheck_margin (> 210)
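The MissCount requirement on this slide (MissCount greater than hangcheck_tick plus hangcheck_margin) can be checked mechanically; a sketch, with a function name of our own choosing:

```shell
# Verify the rule MissCount > hangcheck_tick + hangcheck_margin (all in
# seconds). With the slide's values, 30 + 180 = 210, so MissCount must
# exceed 210; the cmcfg.ora example on a later slide uses 215.
misscount_ok() {
  tick=$1 margin=$2 misscount=$3
  if [ "$misscount" -gt $((tick + margin)) ]; then
    echo ok
  else
    echo too-low
  fi
}

misscount_ok 30 180 215   # ok
misscount_ok 30 180 200   # too-low
```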

Hangcheck-timer and Oracle Cluster Manager

• cmcfg.ora example:

  HeartBeat=15000
  ClusterName=Oracle Cluster Manager, version 9i
  KernelModuleName=hangcheck-timer
  PollInterval=1000
  MissCount=215
  PrivateNodeNames=int-node1 int-node2
  PublicNodeNames=node1 node2
  ServicePort=9998
  CmDiskFile=/ocfsdisk1/quorum/quorumfile
  HostName=int-node1

Hangcheck-timer and Oracle Cluster Manager

• Parameters for ocmargs.ora:

  oracm
  norestart 1800

Linux Monitoring and Configuration Tools

  Overall tools             sar, vmstat
  CPU                       /proc/cpuinfo, mpstat, top
  Memory                    /proc/meminfo, /proc/slabinfo, free
  Disk I/O                  iostat
  Network                   /proc/net/dev, netstat, mii-tool
  Types of I/O cards        lspci -vv
  Kernel version and rel.   cat /proc/version
  Kernel modules loaded     lsmod, cat /proc/modules
  List all PCI devices (HW) lspci -v
  Startup changes           /etc/sysctl.conf, /etc/rc.local
  Kernel messages           /var/log/messages, /var/log/dmesg
  OS error codes            /usr/src/linux/include/asm/errno.h
  OS calls                  /usr/sbin/strace -p

Post Installation

Increasing Address Space
• By default, Oracle has 1.7 GB of address space for its SGA
• Shutdown all instances of Oracle
• cd $ORACLE_HOME/lib
• cp -a libserver9.a libserver9.a.org      (make a backup copy)
• cd $ORACLE_HOME/rdbms/lib
• genksms -s 0x15000000 > ksms.s           (lower the SGA base to 0x15000000)
• make -f ins_rdbms.mk ksms.o              (compile in the new SGA base address)
• make -f ins_rdbms.mk ioracle             (relink)

Post Installation

Increasing Address Space (cont.)
• sysctl -w kernel.shmmax=3000000000
• Lower the process mapped base
  – Find the pid of the process (shell) from which oracle will be started, using ps (or echo $$)
  – Change /proc/$pid/mapped_base to 0x10000000 and restart oracle
• See Metalink Note 200266.1
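The address-space figures on these Post Installation slides follow from simple arithmetic: the SGA can occupy the range from its attach address up to the kernel boundary at 0xC0000000 on 32-bit Linux. A sketch of the sums (our derivation, consistent with the 1.7 GB default quoted earlier and with Metalink Note 200266.1):

```shell
# MB of address space between an SGA attach address and the 0xC0000000
# kernel boundary on 32-bit Linux.
sga_span_mb() {
  printf '%d\n' $(( (0xC0000000 - $1) / 1024 / 1024 ))
}

sga_span_mb 0x50000000   # 1792 MB (~1.7 GB) with the default SGA base
sga_span_mb 0x15000000   # 2736 MB (~2.7 GB) after the genksms relink
```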

Post Installation

[Diagram: 32-bit process address maps. Default: kernel reserved from 0xC0000000 to 0xFFFFFFFF, variable SGA at 0x50000000, DB buffers (SGA) at 0x40000000, code etc. from 0x00000000. After relinking Oracle (sga_base = 0x15000000) and lowering the mapped base (/proc/<pid>/mapped_base = 0x10000000), the SGA attaches much lower, leaving more address space for DB buffers.]

Post Installation

Larger Buffer Cache (growing the buffer cache beyond the default SGA address space)
• Create an in-memory file system on /dev/shm
  – mount -t shm shmfs -o size=8g /dev/shm
• To enable the extended buffer cache feature, set the init.ora parameter
  – USE_INDIRECT_DATA_BUFFERS=true
• Don't use the dynamic cache parameters
  – DB_CACHE_SIZE
  – DB_#K_CACHE_SIZE
• Limitations of the extended buffer cache feature on Linux:
  – You cannot change the size of the buffer cache while the instance is running
  – You cannot create or use tablespaces with non-standard block sizes

Post Installation

Adjust the send/receive buffer sizes to 256K

• Tuning the default and maximum window sizes:
  /proc/sys/net/core/rmem_default – default receive window
  /proc/sys/net/core/rmem_max     – maximum receive window
  /proc/sys/net/core/wmem_default – default send window
  /proc/sys/net/core/wmem_max     – maximum send window

  sysctl -w net.core.rmem_max=262144
  sysctl -w net.core.wmem_max=262144
  sysctl -w net.core.rmem_default=262144
  sysctl -w net.core.wmem_default=262144

Post Installation

• To enable asynchronous I/O, re-link Oracle to use skgaioi.o
  – cd $ORACLE_HOME/rdbms/lib
  – make -f ins_rdbms.mk async_on
  – make -f ins_rdbms.mk ioracle
• Then:
  – set disk_asynch_io=true (the default value is true)
  – set filesystemio_options=asynch (raw only)