
2009 Second-Half SMA Seminar
HACMP Best Practices
2009. 10. 15.
백진훈 ([email protected])
MTS, GTS, IBM Korea
© 2009 IBM Corporation
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
© 2009 IBM Corporation
PowerHA Message
• HACMP is now PowerHA for AIX
Renamed as part of Power software initiative
Publications and product (binaries) continue to use HACMP
© 2009 IBM Corporation
Overview and Issues
HACMP :
• Released in 1991; the latest release, 5.5, is the 20th
• More than 60,000 references in production environments worldwide
• A powerful and stable high-availability product based on AIX
• Supports a wide variety of flexible configurations
• A cluster can pass verification during configuration and run in production, yet still not be optimally configured from the standpoint of actually providing high availability
• What must be considered to achieve a high level of availability?
© 2009 IBM Corporation
Why do good clusters turn bad?
Common reasons why HACMP does not work properly:
• Poor cluster design and insufficient planning
  - design, planning, test
• Basic TCP/IP and LVM configuration problems
• HACMP cluster topology and resource configuration problems
• Lack of change-management discipline (control) in operating the cluster
• Insufficient training of cluster administrators
• Performance/capacity problems
© 2009 IBM Corporation
Designing High Availability
•
“…A fundamental design goal of (successful) cluster design is the elimination
of single points of failure (SPOFs) through appropriate design, planning,
selection of hardware, configuration of software, and carefully controlled
change management discipline.…”
© 2009 IBM Corporation
Fault resilience vs. Fault tolerance
High Availability does not mean no interruption to the application;
thus we say fault resilient rather than fault tolerant.
© 2009 IBM Corporation
Eliminating single points of failure
Cluster Object      Eliminated as a single point of failure by . . .
Node                Using multiple nodes
Power source        Using multiple circuits or uninterruptible power supplies
Network adapter     Using redundant network adapters
Network             Using multiple networks to connect nodes
TCP/IP subsystem    Using non-IP networks to connect adjoining nodes and clients
Disk adapter        Using redundant disk adapters or multipath hardware
Disk                Using multiple disks with mirroring or RAID
Application         Adding a node for takeover; configuring an application monitor
Administrator       Adding a backup administrator or a very detailed operations guide
• “…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs).…”
© 2009 IBM Corporation
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
© 2009 IBM Corporation
HACMP Cluster Configuration – Step by Step
[Diagram: cluster "Cluster 1" with NODE A and NODE B on IP network net_ether0, plus serial networks rs232_net and diskhb_net and a shared storage subsystem. RG1 (NodeA, NodeB) contains a Service IP, a Volume Group and the Production App; RG2 (NodeB) contains a Service IP, a Volume Group and the Development App.]
Topology Components:
• Cluster name
• Node Names
• IP Network
• Interfaces
• Serial Network
Types of Resources
• Service IP
• Volume Group/s
• Application Server
Resource Components:
• Resource Group/s
• Policies
- startup
- fallover
- fallback
• Dependencies
• Parent / Child
• Location
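Once a cluster is built, the topology and resource definitions above can be reviewed with the standard HACMP/ES utilities. A minimal sketch (paths are the HACMP/ES defaults; output format varies by release):
# show cluster name, nodes, networks and interfaces
/usr/es/sbin/cluster/utilities/cltopinfo
# show resource groups, their policies and current state
/usr/es/sbin/cluster/utilities/clRGinfo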
© 2009 IBM Corporation
HACMP Location Dependencies
[Diagram: the same two-node "Cluster 1"; RG1 (NodeA, NodeB) and RG2 (NodeB), each with a Service IP, a Volume Group and an application, on net_ether0 with rs232_net, diskhb_net and a shared storage subsystem.]
Location Dependencies:
- RGs can coexist on the same node
- RGs can coexist on different nodes
- Priorities can also be set: High / Intermediate / Low
Diagram assumptions: RG1 has High priority, RG2 has Low priority
On fallover: RG2 goes offline and RG1 moves to Node B
© 2009 IBM Corporation
HACMP Fallover Scenarios Mutual Fallover
[Diagram: "Cluster 1" with RG1 (NodeA, NodeB) online on Node A and RG2 (NodeB, NodeA) online on Node B; each RG has a Service IP, a Volume Group and its application (Application 1 / Application 2); net_ether0, rs232_net, diskhb_net and a shared storage subsystem.]
Environment:
- Each machine runs its own production application
- Node A falls over to Node B
- Node B falls over to Node A
Fallover behavior:
- On fallover the target machine needs enough CPU and memory resources to handle the load of both applications
© 2009 IBM Corporation
Resource Group Tips
RG decisions beyond startup, fallover & fallback behavior
[Diagram: NODE A hosts RG1 (NodeA, NodeB) and RG3 (NodeA, NodeB); NODE B hosts RG2 (NodeB, NodeA) and RG4 (NodeB, NodeA); each RG contains a Service IP, a VG and an application server.]
Further Options
• 1 RG vs. Multiple RGs
– Selective Fallover behavior (VG / IP)
• RG Processing
– Parallel vs. Sequential
• Delayed Fallback Timer
•
RG Dependencies
– Parent / Child
– Location Dependencies
Best Practice:
Always try to keep it simple, but stay current with new features and take
advantage of existing functionality to avoid added manual customization.
© 2009 IBM Corporation
Cluster Components
Nodes
• Up to 32 nodes, in any combination of active and standby nodes
• A configuration where all nodes are active is possible (mutual takeover)
• Reliable clusters have at least one standby node
• Nodes must not share a common power supply, e.g. power supplies located in a single rack
• Cluster nodes should not be created as LPARs within a single frame
• There must be enough I/O slots to install redundant network and disk adapters
• That is, twice the number of slots a single node would need
• Every cluster resource must have a backup; each node's rootvg must be mirrored or placed on a RAID device
© 2009 IBM Corporation
Cluster Components(Cont.)
Nodes
• There must be enough CPU cycles and I/O bandwidth for HACMP to keep working normally even while the production application is running at peak
• With takeover in mind, resource utilization should be kept below 40%
• If a single standby node backs up multiple active nodes, it must have enough capacity to run all of the possible workloads
• On DLPAR-capable hardware, HACMP must be able to acquire and configure processors and memory before the application is started on the takeover node; all resources (CPU, memory) must be available, or obtainable through CoD
© 2009 IBM Corporation
DLPAR/CoD configuration
• HACMP detects a failure on the primary node
• Running in a partition on another server, HACMP grows the backup partition, activates the required inactive processors and restarts the application
[Diagram: a production database server fails over to a DLPAR/CoD server that is already running Web Server and Order Entry partitions on its active processors; HACMP grows the backup database partition onto the inactive (CoD) processors and recovers the database from the shared disk.]
© 2009 IBM Corporation
Configuration Requirements
• The HMC IP addresses and the managed system names must be configured for each DLPAR node
• All DLPAR nodes must be able to communicate with the HMC using SSH (see the check sketched below)
  - clverify checks the SSH connectivity
• When CoD is used, the key must be activated manually through the HMC
• The maximum resources HACMP can configure are equal to or less than the maximum values in the DLPAR profile
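A quick way to confirm the SSH requirement is to run an HMC query from each DLPAR node. A minimal sketch (the HMC hostname is an example; hscroot is the usual HMC account):
# each cluster node should reach the HMC without a password prompt
ssh hscroot@hmc01 lssyscfg -r sys -F name     # lists the managed systems defined on the HMC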
© 2009 IBM Corporation
HACMP with Micro-Partitions
• Takeover adds workload, so more CPU resource is needed
• The hypervisor monitors each partition's CPU utilization and allocates more CPU resource to the partitions that need it
[Diagram: CPU utilization graphs (0-100%) for a WAS server, a development server, Test server 1, Test server 2 and two production servers sharing a disk; Production server 1 (active) fails and its workload falls over to Production server 2. In a micro-partitioned environment the hypervisor automatically and dynamically shifts CPU resources to balance the load.]
© 2009 IBM Corporation
Infrastructure Considerations
• Power Redundancy
• I/O Drawers
• SCSI Backplane
• SAN HBAs
• Virtualized Environments
• Application Fallover Protection
Real customer scenarios:
Example 1: SCSI adapters for rootvg on the same bus
Example 2: Two nodes sharing an I/O drawer
[Diagram: a frame with I/O drawers 1 through 10, illustrating the two scenarios.]
Moral of the Story:
* High Availability goes beyond just installing the cluster software
© 2009 IBM Corporation
Single Point of Failure on internal SCSI disks
# lsdev -Ccdisk
hdisk0  Available  0A-08-00-8,0   16 Bit LVD SCSI Disk Drive
hdisk1  Available  0A-08-00-9,0   16 Bit LVD SCSI Disk Drive
hdisk2  Available  0A-08-00-10,0  16 Bit LVD SCSI Disk Drive
#
# lsdev -Cc adapter
...
scsi0   Available  0A-08          Wide/Ultra-3 SCSI I/O Controller   <- Single Point of Failure
...
All three internal disks hang off the single scsi0 adapter, so that adapter is a single point of failure.
© 2009 IBM Corporation
Single Point of Failure example (I/O drawer)
[Diagram: physical layout of the I/O drawers. The front view shows the internal disk drives holding rootvg; the rear view shows the PCI adapters (Ethernet "ent", Fibre Channel "fcs" and SAS adapters) spread across slots 1-10 of Planar 1 and Planar 2. When the rootvg disks and their adapters are concentrated in one drawer or on one planar, that drawer or planar becomes a single point of failure.]
© 2009 IBM Corporation
Benefit of Boot from SAN
• Configuration flexibility
  - Overcomes the LPAR configuration limits imposed by the internal adapters and bays of Power systems
• High performance
  - Exploits the far better I/O performance of external storage compared with internal disks
  - Uses the external storage's disk cache
• High reliability
  - System persistence regardless of DDM failures
  - Exploits the high availability of external storage
  - Internal and remote replication solutions can be used
• Capacity efficiency
  - Multiple OS and backup images can be placed on the external storage
  - rootvg can be allocated flexibly (e.g. 20 GB, 40 GB) instead of in whole-DDM-size units
• Easy AIX backup: FlashCopy replaces "mksysb"
  - Multiple point-in-time OS backups can be kept
  - An OS backup (FlashCopy target) is portable and can be used directly on another system
[Diagram: p595 (16 cores, 2 I/O drawers, 60 GB internal disk) booting from SAN LUNs: rootvg 20 GB x 1, datavg 30 GB x 4, appvg 30 GB x 3.]
© 2009 IBM Corporation
Networks
• The cluster consists of IP and non-IP networks; nodes communicate and exchange heartbeats over both
• A non-IP network must be configured so that a network failure can be distinguished from a node failure
• HACMP handles only the following three types of failure:
  - Network interface card (NIC) failures
  - Node failures
  - Network failures
© 2009 IBM Corporation
Partitioned Cluster (Split Brain)
• When an IP network problem prevents heartbeat checks from completing, each node concludes that the other node is down even though both are alive, and both attempt a takeover.
• As a result the RG is brought online on both nodes at the same time.
• Because the applications using the shared disks are started simultaneously, data corruption can occur.
• Proving that node isolation caused the problem:
  – /tmp/clstrmgr.debug file
  – AIX error log entry GS_DOM_MERGE_ER
© 2009 IBM Corporation
Disk Heartbeating vs. RS232 Links
• Introduced in HACMP 5.1: uses the SAN environment for non-IP heartbeating
• Reasons to use it:
  - Distance limitations of RS232 cables
  - Shortage of integrated serial ports
  - Some models have restrictions on using the integrated ports for heartbeating
  - Clusters with more than two nodes may require an async adapter with a RAND
• Requirements (a setup and test sketch follows below):
  - A small disk or LUN (e.g. 1 GB)
  - The bos.clvm.enh fileset installed
  - An Enhanced Concurrent Mode VG
    (it does not need to be defined in an RG)
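A minimal sketch of preparing and testing a disk heartbeat device, assuming hdisk2 is the small shared LUN (the dhb_read test utility ships with RSCT):
# create an enhanced concurrent capable VG on the shared LUN (needs bos.clvm.enh)
mkvg -C -n -y diskhbvg hdisk2
# test the path in both directions: run receive on one node, transmit on the other
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r      # node A: wait for heartbeats
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t      # node B: transmit heartbeats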
© 2009 IBM Corporation
Checking for Fast Failure Detection (FFD)
# lssrc -ls topsvcs | grep Fast
Fast Failure Detection enabled
# odmget -q name=diskhb HACMPnim
HACMPnim:
    name = "diskhb"
    desc = "Disk Heartbeat Serial protocol"
    addrtype = 1
    path = "/usr/sbin/rsct/bin/hats_diskhb_nim"
    para = "FFD_ON"
    grace = 60
    hbrate = 3000000
    cycle = 8
    gratarp = 0
    entry_type = "adapter_type"
    next_generic_type = "transport"
    next_generic_name = ""
    src_routing = 0
Reason to set this option:
In the event of a crash this option
will allow for the takeover to start
up immediately instead of waiting
for the Failure Detection Rate
timeout to pass.
Note: new feature starting with HACMP 5.4
This change will not take effect until the NIM is recycled during a cluster restart
© 2009 IBM Corporation
Important network best practices for high availability
• Be careful with network-level changes such as IP addresses, subnet masks, switch port settings and VLANs
  - Failure detection works only when at least two physical adapters per node are in the same physical network/VLAN
• Configure at least one non-IP network
• Configuring EtherChannel under HACMP is useful for network availability
• Include a backup adapter connected to a secondary switch in the configuration
• HACMP treats an EtherChannel configuration as a single-adapter network. To help it diagnose adapter failures, configure a separate netmon.cf file; adapter failure is then judged by sending ICMP echo requests (ping) to interfaces outside the cluster
• Configure a persistent IP on each node
  - Useful for remote administration and monitoring
© 2009 IBM Corporation
HACMP Topology Considerations
•
IPAT via Replacement vs. IPAT via Aliasing *
Considerations:
- Max number service IPs within HACMP network
- Hardware Address Takeover (HWAT)
- Speed of Takeover
- Firewall Issues
IPAT via Replacement (network net_ether_0):
Node A: en0 – 9.19.10.1 (boot), replaced by the service IP 9.19.10.28 when the resource group is online; en1 – 192.168.11.1 (standby)
Node B: en0 – 9.19.10.2 (boot); en1 – 192.168.11.2 (standby)
© 2009 IBM Corporation
HACMP Topology Considerations
•
Contrast between Replacement & Aliasing
Considerations:
- Max number service IPs within HACMP network
- Speed of swap
- Hardware Address Takeover (HWAT)
- Firewall Issues
IPAT via Aliasing (network net_ether_0):
Node A: en0 – 192.168.10.1 (base1), en1 – 192.168.11.1 (base2); aliases 9.19.10.28 (persistent a) and 9.19.10.51 (service IP)
Node B: en0 – 192.168.10.2 (base1), en1 – 192.168.11.2 (base2); aliases 9.19.10.29 (persistent b) and 9.19.10.50 (service IP)
© 2009 IBM Corporation
Etherchannel with HACMP
•
Example of an EtherChannel (Backup) Configuration
[Diagram: Node A and Node B each aggregate ent0 and ent1 into an EtherChannel (ent2) connected to a primary and a secondary network switch on network net_ether_0. Node A: en2 with boot address 10.10.100.1, persistent address 192.168.10.1 and service IP 192.168.10.2; Node B: en2 with boot address 10.10.101.1 and persistent address 192.168.10.4.]
Verification Messages:
For nodes with a single Network Interface Card per logical network configured, it is recommended to include the file '/usr/es/sbin/cluster/netmon.cf' with a "pingable" IP address as described in the 'HACMP Planning Guide'.
WARNING: File 'netmon.cf' is missing or empty on the following nodes: Node A Node B
© 2009 IBM Corporation
Netmon.cf
• It can be difficult for HACMP to correctly detect the failure of a single adapter such as an EtherChannel
• This is because RSCT Topology Services cannot force packets onto the network to confirm that a single adapter is working
• For single-adapter networks such as EtherChannel, create a netmon.cf file
  - The default gateway IP address is typically used
  - If no heartbeat packets are received from the other node, a ping is attempted to the pre-configured gateway
• /usr/es/sbin/cluster/netmon.cf
  Ex) 180.146.181.119
      steamer
      chowder
      180.146.181.121
© 2009 IBM Corporation
Topology environments with a Firewall
If multiple IPs on the same subnet are configured on the same interface, AIX will use the first one configured for outbound traffic.
LAN
Node X
en0 – 10.10.10.1 boot
9.19.51.1 persistent IP
9.19.51.2 service IP1
Clients can talk to the cluster nodes, but node-initiated traffic from the 9.19.51.X network will look like it is coming from the persistent IP, not the service IP
en1 – 10.10.11.1 boot
Network set to use IPAT via Aliasing
Firewall
Tip:
If you only need to manage one service IP per HACMP network consider using
IPAT via Replacement to avoid having multiple IPs on the same interface.
© 2009 IBM Corporation
Topology environments with a Firewall
Overriding default AIX behavior sometimes requires some creativity
LAN
Node X
en0 – 10.10.10.1 boot
9.19.51.2 service IP
9.19.51.1 persistent IP
Clients still talk to the cluster nodes via both IPs, but node-initiated traffic from the 9.19.51.X network now looks like it is coming from the service IP
en1 – 10.10.11.1 boot
Network set to use IPAT via Aliasing
Firewall
Work around 1:
Perform an ifconfig down of the persistent alias within the application start script
followed by an immediate ifconfig up to make it the second IP on the list
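A minimal sketch of that work-around, using the addresses from the diagram (exact alias syntax can vary by AIX level, so verify on your system):
# inside the application start script: drop and re-add the persistent alias
# so the service IP becomes the first (outbound) address on en0
ifconfig en0 delete 9.19.51.1
ifconfig en0 alias 9.19.51.1 netmask 255.255.255.0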
© 2009 IBM Corporation
Topology environments with a Firewall
IPAT via Replacement will only host one IP on an interface at any given time, hence avoiding multiple IPs within the same subnet
LAN
Node X
When HACMP is activated, the base address will be replaced by the new service IP address that clients use to connect
en0 – 9.19.51.1
9.19.51.2 boot
service IP1
en1 – 10.10.11.1 standby
Network set to use IPAT via Replacement
Firewall
Work Around 2:
If you only need to manage one service IP per HACMP network consider using
IPAT via Replacement to avoid having multiple IPs on the same interface.
© 2009 IBM Corporation
Enhanced Concurrent VG
• First introduced in AIX 5L V5.1
• Can be implemented on any disk usable by HACMP
• JFS and JFS2 filesystems are supported
  – File systems are mounted on only one node at a time
• Replaces classic concurrent volume groups
• Enhanced concurrent VGs are required for the following functions:
  – Heartbeat over disk for a non-IP network
  – Fast disk takeover
© 2009 IBM Corporation
Converting VG to ECM
• Stop the cluster
• On one node at a time, perform the following steps (sketched below):
  - varyonvg
  - chvg -C
  - varyoffvg
• On every node, run lsattr -El <VGNAME> and check that the VG attributes are correct:
  auto_on        n   N/A   True
  conc_auto_on   n   N/A   True
  conc_capable   y   N/A   True
• Verification and synchronization
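A minimal sketch of the per-node conversion steps above (the VG name is an example):
varyonvg sharedvg       # bring the VG online on this node only
chvg -C sharedvg        # convert it to an enhanced concurrent capable VG
varyoffvg sharedvg      # release it before repeating on the next node
lsattr -El sharedvg     # verify conc_capable = y and auto_on / conc_auto_on = n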
© 2009 IBM Corporation
Converting VG to ECM(cont.)
© 2009 IBM Corporation
Active varyon vs. passive varyon
•
Active varyon
© 2009 IBM Corporation
Active varyon vs. passive varyon(Cont.)
•
Passive varyon
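As a quick check of which mode a node currently holds the VG in, one hedged option (the lsvg field labels can vary slightly by AIX level):
lsvg sharedvg | grep -i permission    # shows read/write for an active varyon, passive-only for a passive varyon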
© 2009 IBM Corporation
Adapters
• When using a multi-port adapter, do not build the backup path on a spare port of the same adapter
• When using a built-in Ethernet adapter, there must be a separate backup adapter in the node
• Where possible, place the redundant adapters in separate I/O drawers or on different backplanes and buses
© 2009 IBM Corporation
Applications & application Scripts
• Automation
  - No manual intervention by administrators
• Some applications tend to be tightly coupled to particular OS characteristics such as uname, serial number or IP address (ex. SAP)
• Check whether the application is already running
  - When an RG is in the unmanaged state, the default startup option makes HACMP re-run the application start script
• Check the state of the data. Is recovery needed?
• Correct coding (see the sketch after this list):
  - Start by declaring a shell (ex. #!/usr/bin/ksh)
  - Exit with RC=0
  - Include a check that the application has really stopped
  - fuser
• Smart Assist : DB2, Websphere, Oracle
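A minimal sketch of a stop script that follows these rules; the application name, user and filesystem are placeholders:
#!/usr/bin/ksh
# ask the application to shut down cleanly
su - appuser -c "/opt/myapp/bin/stopapp"
# confirm it has really stopped; as a last resort free the shared filesystem
if ps -ef | grep "[m]yapp_daemon" >/dev/null 2>&1
then
    fuser -kuc /appdata        # kill any process still holding the filesystem open
fi
exit 0                          # a non-zero exit would be treated as a script failure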
© 2009 IBM Corporation
Application Monitoring
Three types of monitoring:
• Startup Monitors – run one time
• Process Monitors – check the specified process instances in the process table
• Custom Monitors – run your specified script at the configured interval
Resource monitors can be configured either to run a simple notify event when a problem occurs, or to fail the service over to the healthy partner node.
Don't stop at just the base configuration - with thorough testing these can be great tools to automate recovery and save admin time (a minimal monitor sketch follows).
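As an illustration, a custom monitor can be as small as the sketch below (the process name is a placeholder); HACMP runs it at the configured interval and treats a non-zero exit code as an application failure:
#!/usr/bin/ksh
# report healthy (0) if the application daemon is in the process table
if ps -ef | grep "[m]yapp_daemon" >/dev/null 2>&1
then
    exit 0
fi
exit 1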
© 2009 IBM Corporation
Application Monitoring(Cont.)
•
Configure Process Application Monitors
ex)
© 2009 IBM Corporation
Application Monitoring(Cont.)
•
Configure Custom Application Monitors
ex)
© 2009 IBM Corporation
Testing Best Practices
• Thoroughly test application scripts and application monitoring before moving to production
• Test fallover in every direction
• Test cluster
  - LPARs within the same frame
  - Virtual resources
• Use the available tools – Cluster Test Tool
  - Further customization enhancements in HACMP 5.4
• Establish a regular test plan – e.g. node fallover and fallback tests
  - At least once every half year
© 2009 IBM Corporation
Maintenance
• A controlled management environment
  - The most important factors for high availability: strict change control, managed change procedures, and testing
• Take an HACMP snapshot before making changes to a cluster node
• Generate an HTML report with OLPW (Online Planning Worksheets)
  - http://www-03.ibm.com/systems/power/software/availability/aix/apps/download.html
• C-SPOC operations
  - Cluster Single Point of Control
  - No TIPing (Testing In Production): never test in the production environment
• Maintain a test cluster identical to production – e.g. a test environment built with PowerVM
• Documented recovery procedures
  - Use testing to document the recovery procedures and the expected results
© 2009 IBM Corporation
Documenting the Environment with OLPW
•
Online Planning Worksheets: HTML report file
© 2009 IBM Corporation
What about PowerHA and distance?
Two options: Remote PowerHA or PowerHA/XD
Remote PowerHA
– Systems see the same copy of data (shared access to the same LUNs)
– LVM mirroring (with the optional but recommended Cross-site LVM Mirroring configured) keeps copies in sync across sites
– A system-down condition is a site-down condition
– Cross-site LVM Mirroring facilitates integration/reintegration, managing LVM mirror copy consistency between sites
PowerHA/XD
– Systems see their "own" copy of data, which is replicated
– A data replication mechanism keeps copies in sync across sites
  • GLVM (Geographic Logical Volume Manager)
  • Metro Mirror in DS8000 or SVC
– A local fallover option can be employed to keep a system-down condition local to the site
– PowerHA/XD facilitates integration/reintegration, including replication role reversal, between sites
Again, networking (IP/non-IP) is used for monitoring in both cases
© 2009 IBM Corporation
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
© 2009 IBM Corporation
Case #1 – Company A
Cluster Name: Company A
2-node cluster (A + S)
Network: net_ether0, IPAT via Aliasing
[Diagram: NODE A and NODE B; RG1 (NodeA, NodeB) with Service IP, Volume Group and Production App; Node B is the standby node; rs232_net and diskhb_net serial networks; shared storage subsystem.]
EtherChannel: (A + B)
RS232/MNDHB: both used
Application Monitoring: used
Takeover test: not performed
The application monitoring timeout was so short that a fallback was attempted while a fallover was still in progress (fixed by increasing the duration time).
© 2009 IBM Corporation
Case #2 – Company B
Cluster Name: Company B
3-node cluster: (A + A + A)
Network: net_ether0, IPAT via Aliasing
[Diagram: NODE A, NODE B and NODE C; RG1 (NodeA, NodeB), RG2 (NodeB, NodeA) and RG3 (NodeC, NodeB), each with Service IP, Volume Group and Production App; rs232_net and diskhb_net serial networks; shared storage subsystem.]
EtherChannel: (A + B)
RS232/MNDHB: both used
Application Monitoring: not used
Takeover test: periodic
© 2009 IBM Corporation
Case #3 – Company C
Cluster Name: Company C
4-node cluster: (A + A + A + A)
Network: net_ether0, IPAT via Aliasing
[Diagram: NODE A, NODE B, NODE C and NODE D; RG1 through RG4, each with Service IP, Volume Group and Production App; rs232_net serial network; shared storage subsystem.]
EtherChannel: (A + B), with 2 active adapters
RS232: used
Application Monitoring: not used
Takeover test: irregular
© 2009 IBM Corporation
Case #4 – Company D
1,400 business applications are planned to run!
[Diagram: two sites, Site 1 (IDC) and Site 2 (Juneau), 40 miles apart, linked by DWDM (FC extenders); HACMP/XD with SVC Metro Mirror (synchronous PPRC) replicates between a TotalStorage DS8300 at Site 1 and a DS8100 at Site 2. One 64-way IBM System p5 595 and one 64-way System p6 595 host the production partitions (13 dedicated partitions for DB, 9 and 19 dedicated partitions for App, 13 dedicated partitions for failover, 8 VIOS with production clients on each frame, plus NIM and NIM/CSM partitions). Two 16-way System p6 570s each run 2 VIOS, one with 20 VIO clients for Tivoli applications and one with 13 VIO clients for Dev/QA servers. Redundant ThinkCentre-based HMCs manage the frames. No physical adapters except in the VIOS!]
© 2009 IBM Corporation
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
© 2009 IBM Corporation
VSCSI General Diagram
Two ways to export a LUN to a VIO client:
• Logical volumes within VG
• Whole Disk mapping – required when sharing across VIO Servers
(Enhanced Concurrent VGs required on VIO Clients)
FRAME 1
VIO Server
hdisk0
(5GB)
rootvg
AIX Client LPAR 1
client_lunsvg
(5GB)
client1_rootvg_lv
vhost0
(10GB) client1_datavg_lv
(36GB)
(21GB)
hdisk2
(72GB)
free partitions
Whole
Disk
Mapping
vhost1
rootvg
hdisk1
datavg1
vscsi0
Hypervisor
hdisk1
hdisk0
AIX Client LPAR 2
scsi0
hdisk0
rootvg
vscsi0
hdisk1
datavg2
(no_reserve)
© 2009 IBM Corporation
VSCSI General Diagram – (Mapping entire Disk)
FRAME 1
VIO Server 1
hdisk0
(5GB)
hdisk1
(72GB)
rootvg
Whole
Disk
Mapping
vhost0
AIX Client LPAR 2
(no_reserve)
Hypervisor
Storage
Subsystem
VIO Server 2
hdisk0
(5GB)
hdisk1
(72GB)
hdisk0
rootvg
hdisk1
datavg
vscsi0
MPIO
vscsi1
rootvg
Whole
Disk
Mapping
scsi0
vhost0
(no_reserve)
© 2009 IBM Corporation
Virtual SCSI & HACMP Configuration Procedure
• On the storage device
  - Assign the LUNs to both of the corresponding VIO servers
• On the HMC
  - Define the mappings (vhost & vscsi)
• On VIO Server 1
  - Set the "no_reserve" attribute
    chdev -l <hdisk#> -a reserve_policy=no_reserve -a algorithm=round_robin
  - Export the LUNs to each client
    mkvdev -vdev hdisk# -vadapter vhost0
    mkvdev -f -vdev hdisk# -vadapter vhost1
• On VIO Server 2
  - Set the "no_reserve" attribute
    chdev -l <hdisk#> -a reserve_policy=no_reserve
  - Export the LUNs to each client
    mkvdev -vdev hdisk# -vadapter vhost0
    mkvdev -f -vdev hdisk# -vadapter vhost1
© 2009 IBM Corporation
Virtual SCSI & HACMP Configuration Procedure (Cont.)
• On the clients (see the sketch below)
  - Install MPIO SDDPCM
  - On the first client, create the shared VG (volume group) as an ECVG (Enhanced Concurrent VG)
    (the bos.clvm.enh fileset is required)
  - varyoffvg on Client 1
  - importvg on Client 2
  - Define to HACMP as a shared resource
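A minimal sketch of those client-side steps (disk and VG names are examples):
# on client 1: create the shared VG as enhanced concurrent capable (needs bos.clvm.enh)
mkvg -C -n -y sharedvg hdisk1
varyoffvg sharedvg
# on client 2: import the same VG from the mapped disk (matched by PVID)
importvg -y sharedvg hdisk1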
© 2009 IBM Corporation
VSCSI Disks & HACMP (Same Frame - Single HBA on VIO Servers)
FRAME 1
HACMP Node A
hdisk0
hdisk0
vscsi0
Hypervisor
STORAGE
SUBSYSTEM
VIOS 1
no_reserve
hdisk0
vhost0
vhost0
hdisk1
}
sharedvg
HACMP Node B
vscsi0
hdisk1
}
sharedvg
VIOS 2
What is wrong with this picture?
This configuration works initially because HACMP has visibility to the shared disk from both servers. However, it does NOT provide VIO Server redundancy. If VIO Server 1 were to fail or go down for maintenance, HACMP Node A would not be able to see or utilize the disks.
© 2009 IBM Corporation
VSCSI Disks & HACMP (Same Frame - Single HBA on VIO Servers)
FRAME 1
HACMP Node A
vhost0
vscsi0
MPIO
hdisk0
STORAGE
SUBSYSTEM
no_reserve
VIOS 1
hdisk0
vhost0
Hypervisor
vhost1
}
sharedvg
HACMP Node B
vscsi0
MPIO
hdisk0
vhost1
hdisk1
vscsi1
vscsi1
hdisk1
}
sharedvg
VIOS 2
For HACMP clients within the same frame you need an additional path
connection from each VIO Server. This additional VSCSI lun export is
accomplished by using the mkvdev command with the “-f” flag option.
© 2009 IBM Corporation
VSCSI Disks & HACMP (2 Frames - Single HBA on VIO Servers)
FRAME 1
hdisk0
hdisk0
HACMP Node A
Hypervisor
no_reserve
VIOS 1
vhost0
vscsi0
MPIO
hdisk1
vscsi1
}
sharedvg
vhost0
VIOS 2
hdisk0
FRAME 2
STORAGE
SUBSYSTEM
hdisk0
hdisk0
HACMP Node B
Hypervisor
no_reserve
VIOS 1
vhost0
vscsi0
MPIO
vscsi1
hdisk1
}
sharedvg
vhost0
VIOS 2
© 2009 IBM Corporation
VSCSI Disks & HACMP (2 Frames - Dual HBAs on VIO Servers)
FRAME 1
VIOS 1
HBA
MPIO
hdisk0
vhost0
no_reserve
VIOS 2
HBA
HBA
MPIO
hdisk0
HACMP Node A
Hypervisor
HBA
vscsi0
MPIO
hdisk1
vscsi1
}
sharedvg
vhost0
hdisk0
FRAME 1
VIOS 1
HBA
MPIO
vhost0
no_reserve
VIOS 2
HBA
HBA
MPIO
hdisk0
HACMP Node B
Hypervisor
HBA
hdisk0
vscsi0
MPIO
vscsi1
hdisk1
}
sharedvg
vhost0
© 2009 IBM Corporation
HACMP View of Virtual SCSI
Node A
Node B
Resource Group 2
Resource Group 1
sharedvg
sharedvg
Resource Group 2
Resource Group 1
sharedvg
sharedvg
hdisk0
PVID
VSCSI
Shared disk
hdisk0
PVID
Enhanced
Concurrent
Enhanced
Concurrent
FRAME X
hdiskx
hdiskx
HACMP Node A
Hypervisor
hdiskx
no_reserve
VIOS 1
vhost0
vscsi0
MPIO
vscsi1
hdiskx
}
sharedvg
vhost0
VIOS 2
© 2009 IBM Corporation
Virtual SCSI Summary
• LUNs are assigned to the VIO Servers
  - Use multiple (at least 2) VIO Servers (two per frame)
• The "no_reserve" attribute must be set
• ECM VGs are used on the HACMP clients
© 2009 IBM Corporation
Virtual I/O Network Terms
Shared
Ethernet
Adapter
(Acts as a
layer 2 bridge)
Virtual I/O Server (VIOS)
AIX Client LPAR 1
AIX Client LPAR 2
Interface
Link
Aggregation
Adapter
(Combines
physical
adapters)
ent3
(LA)
ent0
(phy)
ent1
(phy)
ent4
(SEA)
en5
ent2
(virt)
ent5
(virt)
en0
en1
en0
en1
ent0
(virt)
ent1
(virt)
ent0
(virt)
ent1
(virt)
PVID=10
PVID=1
PVID=10
PVID=1
Virtual
Ethernet
Adapter
Physical
Ethernet
Adapter
Hypervisor
Channel
Group
IEEE 802.3ad Link Aggregation (LA)
Cisco EtherChannel (EC)
Virtual Ethernet
Ethernet Switch
Port VLAN ID (PVID)
IEEE 802.1Q Virtual LAN (VLAN)
© 2009 IBM Corporation
Virtual Ethernet & HACMP (No Link Aggregation / Same Frame)
Virtual I/O Server (VIOS1)
ent4
(SEA)
ent0
(phy)
ent2
(virt)
AIX Client LPAR 1
AIX Client LPAR 2
en6
en0
en0
ent6
(virt)
ent0
(virt)
Virtual I/O Server (VIOS2)
Control
Channel
Control
Channel
ent5
(virt)
ent4
(SEA)
en6
ent6
(virt)
ent0
(virt)
ent5
(virt)
ent2
(virt)
ent0
(phy)
PVID 99
Hypervisor
PVID 10
Frame 1
Ethernet Switch
Ethernet Switch
This is a diagram of the configuration required for SEA fallover across VIO Servers. Note
that Ethernet traffic will not be load balanced across the VIO Servers. The lower trunk
priority on the “ent2” virtual adapter would designate the primary VIO Server to use.
© 2009 IBM Corporation
Virtual Ethernet & HACMP (Link Aggregation / Same Frame)
Virtual I/O Server (VIOS1)
ent3
(LA)
ent0
(phy)
ent1
(phy)
ent4
(SEA)
ent2
(virt)
AIX Client LPAR 1
AIX Client LPAR 2
en0
en0
en1
ent6
(virt)
ent0
(virt)
Virtual I/O Server (VIOS2)
ent3
(LA)
Control
Channel
Control
Channel
ent5
(virt)
ent4
(SEA)
en0
ent6
(virt)
ent1
(virt)
ent5
(virt)
ent2
(virt)
ent1
(phy)
ent0
(phy)
PVID 99
Hypervisor
PVID 10
Frame 1
Ethernet Switch
Ethernet Switch
Note that Ethernet traffic will not be load balanced across the VIO Servers. The lower trunk
priority on the “ent2” virtual adapter would designate the primary VIO Server to use.
© 2009 IBM Corporation
Virtual Ethernet & HACMP (Independent Frames)
Virtual I/O Server (VIOS1)
ent3
(LA)
Frame1
ent1
(phy)
ent0
(phy)
ent4
(SEA)
ent2
(virt)
AIX Client LPAR 1
Virtual I/O Server (VIOS2)
en0
Control
Channel
Control
Channel
ent5
(virt)
ent0
(virt)
ent5
(virt)
ent4
(SEA)
ent2
(virt)
ent3
(LA)
ent1
(phy)
ent0
(phy)
Hypervisor
Ethernet Switch
Ethernet Switch
Hypervisor
Frame2
ent1
(phy)
ent0
(phy)
ent2
(virt)
ent5
(virt)
ent0
(virt)
Control
Channel
ent3
(LA)
ent4
(SEA)
Virtual I/O Server (VIOS1)
en0
AIX Client LPAR 2
ent5
(virt)
Control
Channel
ent2
(virt)
ent4
(SEA)
ent1
(phy)
ent0
(phy)
ent3
(LA)
Virtual I/O Server (VIOS2)
© 2009 IBM Corporation
HACMP View of Virtul
Ethernet (IPAT vi Alising)
net_ether_0
9.19.1.20
9.19.1.10
(service IP)
(service IP)
(persistent IP)
(persistent IP)
9.19.1.21
9.19.1.11
192.168.100.1
192.168.100.2
( bse ress)Topsvcs hertbeting( bse ress)
seril_net1
en0
en0
HACMP Noe 1
HACMP Noe
FRAME 1
2
FRAME 2
Hypervisor
ent1ent0 ent2 ent
(phy)
(phy)
(virt)
(virt)
FRAME X
ent0
(virt)
Control
Chnnel
ent3
(LA)
ent4
(SEA)
en0
ent ent2 ent1ent0
(virt)
(virt)
(phy)
(phy)
Control
Chnnel
ent4
(SEA)
ent3
(LA)
Virtul I/O Server (VIOS1)
AIX ClientVirtul
LPAR
I/O Server (VIOS2)
© 2009 IBM Corporation
Virtual Ethernet – Additional considerations
• Always configure a netmon.cf file for single-adapter networks
  : /usr/es/sbin/cluster/netmon.cf
Typical file:
9.12.4.11
9.12.4.13
In virtualized environments:
9.12.4.11
!REQD en2 100.12.7.9
9.12.4.13
!REQD en2 100.12.7.10
Most adapters will use netmon in the traditional manner, pinging 9.12.4.11 and 9.12.4.13 along with other local adapters or known remote adapters, and will only care about the interface's inbound byte count for results.
Interface en2 will only be considered up if it can ping either 100.12.7.9 or 100.12.7.10.
Note:
There are additional !REQD formats that may be used within the netmon.cf file, outlined in the description of APAR IZ01332.
© 2009 IBM Corporation
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
© 2009 IBM Corporation
Do’s
• Use IPAT via Aliasing and enhanced concurrent VGs where possible
• Create all shared LVM components with unique names
• Take regular cluster snapshots and system backups
• Plan thoroughly and test thoroughly (ex: application scripts)
• Configure application monitoring to increase availability and support self-healing (recovery)
• Build a test environment so that changes can be tested adequately
• Build a reliable heartbeat network that includes at least one non-IP network
• Configure alerting via SNMP, SMS or e-mail for problems in the cluster
• Exploit the available HACMP features: application monitoring, extended cluster verification methods, 'automated' cluster testing (in TEST only), file collections, fast disk takeover, fast failure detection.
© 2009 IBM Corporation
Don’ts
• Making a change on one node and not synchronizing it to the other nodes
  - Always synchronize immediately after every change
  - If one node is up and the others are down, make and synchronize the change from the live (active) node
• Making changes outside the HACMP menus; use C-SPOC
• Overly complex HACMP configurations that are hard to test
• Rudimentary application start and stop scripts
  - That do not include prerequisite checks and error-recovery routines
  - Always write the scripts so they log stdout/stderr in detail
• Filesystem layouts that stretch out failover time
  - Creating interdependencies, extra processing steps and wait times
© 2009 IBM Corporation
Don’ts
• Giving root access to untrained administrators who do not know the cluster
• Changing the failure detection rate of a network without careful consideration
• Stopping an application with "# kill `ps -ef | grep appname | awk '{print $2}'`": this can also kill the HACMP application monitor (see the sketch after this list)
• Using standard AIX VGs (Original VGs) when the database uses raw LVs
  - Configure Big or Scalable VGs instead
  - A copy of the LVCB is then kept in the VGDA, so the DB can safely write to the first 4 KB block that AIX reserves even if the system halts; otherwise (with an Original VG) the DB can be corrupted on a halt
• Managing application high availability with manual procedures
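A hedged sketch of a more selective stop, assuming the application writes its own PID file (path and name are placeholders), so that the HACMP application monitor process is never matched:
#!/usr/bin/ksh
# stop only the application's own process, identified by its PID file
if [ -f /var/run/myapp.pid ]
then
    kill $(cat /var/run/myapp.pid)
fi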
© 2009 IBM Corporation
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
© 2009 IBM Corporation
Parameter Tuning - Network
• Failure Detection Parameters
• The time needed to detect a failure: (heartbeat rate) x (cycles to fail) x 2
  - Cycles to fail (cycle): the number of heartbeats missed before detecting a failure
  - Heartbeat rate (hbrate): the number of seconds between heartbeats
  For example, the Normal setting for an IP network gives 1 second x 10 cycles x 2 = 20 seconds.
• For IP networks
  Setting   Seconds between heartbeats   Failure cycle   Failure detection rate
  Slow      2                            12              48
  Normal    1                            10              20
  Fast      1                            5               10
• For non-IP networks
  Setting   Seconds between heartbeats   Failure cycle   Failure detection rate
  Slow      3                            8               48
  Normal    2                            5               20
  Fast      1                            5               10
© 2009 IBM Corporation
Parameter Tuning – Network(Cont.)
• The default setting for the failure detection rate is usually optimal
• Be careful when changing the setting; Fast or too-low custom values can cause false takeovers to occur
ex)
© 2009 IBM Corporation
Parameter Tuning – DMS checklist
• To prevent false takeovers, the following settings are recommended (see the checks sketched below).

Guideline: Check I/O pacing
How to check: # lsattr -El sys0 (check the maxpout & minpout values)
Recommendation: P4 – high water mark 33, low water mark 24; P5 or later (AIX 5.3 or 6.1) – high water mark 8193, low water mark 4096

Guideline: Increase the syncd interval
How to check: # ps -ef | grep syncd and # pg /sbin/rc.boot
Recommendation: the AIX default is 60; 10 is recommended for most clusters (syncd frequency: 10)

Guideline: Failure Detection Rate
Recommendation: if your network is always busy, SLOW is recommended to prevent DMS; the FDR for ether networks is set to SLOW
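The checks in the table can be run directly; a minimal sketch (the chdev line is only an example of applying the POWER5-and-later I/O pacing recommendation):
lsattr -El sys0 -a maxpout -a minpout     # current I/O pacing high/low water marks
ps -ef | grep syncd                       # the number after syncd is the current interval
# example: apply the POWER5+ recommendation
chdev -l sys0 -a maxpout=8193 -a minpout=4096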
© 2009 IBM Corporation
Pre-change checklist
• Is the change necessary?
• How urgent is the change?
• How important is the change?
• What impact does the change have on the cluster?
• What is the impact if the change is not allowed?
• Are all the steps required for the change clearly understood and documented?
• How has the change been tested?
• Is there a plan to back the change out if necessary?
• When is the change scheduled?
• Have the users been notified?
• Does the planned maintenance window include time for a full backup before the change and enough time to recover if the change fails?
© 2009 IBM Corporation
Test your cluster before going live! (Checklist)
•
Careful testing of your production cluster before going live reduces the risk of
problems later.
Test Item                          How to test   Checked
Node Fallover
Network Adapter Swap
IP Network Failure
Storage Adapter Failure
Disk Failure
Clstrmgr Killed
Serial Network Failure
SCSI Adapter for rootvg Failure
© 2009 IBM Corporation
Lifecycle of HACMP
• Support life cycle for HACMP (typically a 3-year lifecycle)
Version       Release Date    End of Support Date
HACMP 5.1     July 11, 2003   Sep 1, 2006
HACMP 5.2     July 13, 2004   Sep 30, 2007
HACMP 5.3     Aug 12, 2005    Sep 30, 2009
HACMP 5.4     Nov 09, 2007    N/A
PowerHA 5.5   Nov 14, 2008    N/A
© 2009 IBM Corporation
HACMP Version Compatibility Matrix
                  AIX 4.3.3   AIX 5.1   AIX 5.1 (64-bit)   AIX 5.2   AIX 5.3   AIX 6.1
HACMP 4.5         No          Yes       No                 Yes       No        No
HACMP/ES 4.5      No          Yes       Yes                Yes       No        No
HACMP/ES 5.1      No          Yes       Yes                Yes       Yes       No
HACMP/ES 5.2      No          Yes       Yes                Yes       Yes       No
HACMP/ES 5.3      No          No        No                 Yes       Yes       Yes
HACMP/ES 5.4.0    No          No        No                 TL8+      TL4+      No
HACMP/ES 5.4.1    No          No        No                 TL8+      TL4+      Yes
© 2009 IBM Corporation