SanBoot MPIO SDD SDDPCM Presentation



Path Management and SAN Boot with MPIO on AIX
John Hock [email protected]
Dan Braden [email protected]
Power Systems Advanced Technical Skills
Materials may not be reproduced in whole or in part without the prior written permission of IBM.
IBM Power Systems Technical Symposium 2011
Agenda
• Correctly Configuring Your Disks
  ► Filesets for disks and multipath code
• Multi-path basics
• Multi Path I/O (MPIO)
  ► Useful MPIO Commands
  ► Path priorities
  ► Failed Path Recovery and path health checking
  ► MPIO path management
• SDD and SDDPCM
• Multi-path code choices for DS4000, DS5000 and DS3950
• XIV & Nseries
• SAN Boot
Disk configuration
Vendor ODM update sites:
  Hitachi: https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
  EMC: ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
• The disk vendor…
  ► Dictates what multi-path code can be used
  ► Supplies the filesets for the disks and multipath code
  ► Supports the components that they supply
• A fileset is loaded to update the ODM to support the storage
  ► AIX then recognizes and appropriately configures the disk
  ► Without this, disks are configured using a generic ODM definition
  ► Performance and error handling may suffer as a result
• # lsdev -Pc disk displays supported storage
• The multi-path code will be a different fileset
  ► Unless using the MPIO that's included with AIX
Beware of the generic "Other" disk definition: no command queuing, poor performance and error handling.
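For illustration (hypothetical hdisk names and location codes), a LUN picked up by the generic definition appears as "Other FC SCSI Disk Drive" in lsdev output, while a LUN with the proper vendor ODM fileset shows a storage-specific type:

# lsdev -Cc disk
hdisk0 Available 01-08-00 SAS Disk Drive
hdisk2 Available 02-00-01 Other FC SCSI Disk Drive      <-- generic definition, vendor ODM fileset missing
hdisk3 Available 02-00-01 IBM MPIO FC 2107              <-- storage-specific definition (DS8000)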
How many paths for a LUN?
(Diagram: Server, FC Switch, Storage)
• Paths = (# of paths from server to switch) x (# of paths from storage to switch)
  …Here there are potentially 6 paths per LUN
  …But reduced via:
• LUN masking at the storage
  Assign LUNs to specific FC adapters at the host, and thru specific ports on the storage
• Zoning
  WWPN or SAN switch port zoning
• Dual SAN fabrics
  Divides potential paths by two
• 4 paths per LUN are sufficient for availability and reduce CPU overhead for choosing the path
• Path selection overhead is relatively low, usually negligible
• MPIO has no practical limit to the number of paths
• Other products have path limits
  SDDPCM is limited to 32 paths per LUN
How many paths for a LUN?, cont'd
Dual SAN Fabric Reduces Potential Paths
(Diagram: Server, FC Switch, Storage)
• Single fabric: 4 x 4 = 16 potential paths per LUN
• Dual fabric: (2 x 2) + (2 x 2) = 8 potential paths per LUN
Path selection benefits and costs
• Path selection algorithms choose a path to hopefully minimize the latency added to an IO to send it over the SAN to the storage
• Latency to send a 4 KB IO over an 8 Gbps SAN link is
  4 KB / (8 Gb/s x 0.1 B/b x 1,048,576 KB/GB) = 0.0048 ms
  ► Multiple links may be involved, and IOs are round trip
  ► As compared to fastest IO service times around 1 ms
• If the links aren't busy, there likely won't be much, if any, savings from use of sophisticated path selection algorithms vs. round robin
  ► Generally utilization of links is low
• Costs of path selection algorithms
  ► CPU cycles to choose the best path
  ► Memory to keep track of in-flight IOs down each path, or
  ► Memory to keep track of IO service times down each path
  ► Latency added to the IO to choose the best path
Multi-path IO with VIO and VSCSI LUNs
(Diagram: VIO Client running MPIO, two VIO Servers each running multi-path code, Disk Subsystem)
• Two layers of multi-path code: VIOC and VIOS
• VSCSI disks always use AIX default MPIO, and all IO for a LUN normally goes to one VIOS
  ► algorithm = fail_over only
• VIOS uses the multi-path code specified for the disk subsystem
• Set the path priorities for the VSCSI hdisks so half use one VIOS, and half use the other
Multi-path IO with VIO and NPIV
(Diagram: VIO Client with multi-path code and virtual FC adapters, two VIO Servers with virtual FC adapters, Disk Subsystem)
• VIOC has virtual FC adapters (vFC)
  ► Potentially one vFC adapter for every real FC adapter in each VIOC
  ► Maximum of 64 vFC adapters per real FC adapter recommended
• VIOC uses multi-path code that the disk subsystem supports
• IOs for a LUN can go thru both VIOSs
• One layer of multi-path code
What is MPIO?
• MPIO is an architecture designed by AIX development (released in AIX V5.2)
• MPIO is also a commonly used acronym for Multi-Path IO
  ► In this presentation MPIO refers explicitly to the architecture, not the acronym
• Why was the MPIO architecture developed?
  ► With the advent of SANs, each disk subsystem vendor wrote their own multi-path code
  ► These multi-path code sets were usually incompatible
    ● Mixing disk subsystems on the same system was usually not supported, and if it was, each code set usually required its own FC adapters
  ► Integration with AIX IO error handling and recovery
    ● Several levels of IO timeouts: basic IO timeout, FC path timeout, etc.
• MPIO architecture details are available to disk subsystem vendors
  ► Compliant code requires a Path Control Module (PCM) for each disk subsystem
  ► Default PCMs for SCSI and FC exist in AIX and are often used by the vendors
  ► Capabilities exist for different path selection algorithms
  ► Disk vendors have been moving towards MPIO compliant code
(Diagram: MPIO Common Interface)
Overview of MPIO Architecture
• LUNs show up as an hdisk
  ► Architected for 32 K paths
  ► No more than 16 paths are necessary
• PCM: Path Control Module
  ► Default PCMs exist for FC, SCSI
  ► Vendors may write optional PCMs
  ► May provide commands to manage paths
• Allows various algorithms to balance use of paths
• Full support for multiple paths to rootvg
Tip: to keep paths <= 16, group sets of 4 host ports and 4 storage ports and balance LUNs across them
• Hdisks can be Available, Defined or non-existent
• Paths can also be Available, Defined, Missing or non-existent
• Path status can be enabled, disabled or failed if the path is Available (use the chpath command to change status)
• Add a path: e.g. after installing a new adapter and cable to the disk, run cfgmgr (or cfgmgr -l <adapter>)
• One must get the device layer correct before working with the path status layer
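A minimal illustration with hypothetical device names: lspath shows the device-level and path-level states, and cfgmgr/chpath move paths between them.

# lspath -l hdisk2
Enabled hdisk2 fscsi0
Missing hdisk2 fscsi1
# cfgmgr -l fscsi1                       # rediscover the Missing path after the hardware is repaired
# chpath -l hdisk2 -p fscsi1 -s enable   # re-enable the path if it was administratively disabled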
MPIO support

Storage Subsystem Family                 | MPIO code                               | Multi-path algorithm
-----------------------------------------|-----------------------------------------|----------------------------------------------
IBM ESS, DS6000, DS8000, DS3950,         | IBM Subsystem Device Driver Path        | fail_over, round_robin, load balance,
DS4000, DS5000, SVC, V7000               | Control Module (SDDPCM)                 | load balance port
DS3/4/5000 in VIOS                       | Default FC PCM recommended              | fail_over, round_robin
IBM XIV Storage System                   | Default FC PCM                          | fail_over, round_robin
IBM System Storage N Series              | Default FC PCM                          | fail_over, round_robin
EMC Symmetrix                            | Default FC PCM                          | fail_over, round_robin
HP & HDS (varies by model)               | Hitachi Dynamic Link Manager (HDLM)     | fail_over, round robin, extended round robin
                                         | Default FC PCM                          | fail_over, round_robin
SCSI                                     | Default SCSI PCM                        | fail_over, round_robin
VIO VSCSI                                | Default SCSI PCM                        | fail_over
Non-MPIO multi-path code

Storage subsystem family     | Multi-path code
-----------------------------|----------------------------------------
IBM DS4000                   | Redundant Disk Array Controller (RDAC)
EMC                          | PowerPath
HP                           | AutoPath
HDS                          | HDLM (older versions)
Veritas supported storage    | Dynamic MultiPathing (DMP)
Mixing multi-path code sets
• The disk subsystem vendor specifies what multi-path code is supported for their storage
  ► The disk subsystem vendor supports their storage; the server vendor generally doesn't
• You can mix multi-path code sets compliant with MPIO and even share adapters
  ► There may be exceptions. Contact the vendor for the latest updates.
    HP example: "Connection to a common server with different HBAs requires separate HBA zones for XP, VA, and EVA"
• Generally one non-MPIO compliant code set can coexist with other MPIO compliant code sets
  ► Except that SDD and RDAC can be mixed on the same LPAR
  ► The non-MPIO compliant code must be using its own adapters
• Devices of a given type use only one multi-path code set
  ► e.g., you can't use SDDPCM for one DS8000 and SDD for another DS8000 on the same AIX instance
Sharing Fibre Channel Adapter ports
• Disks using MPIO compliant code sets can share adapter ports
• It's recommended that disk and tape use separate ports
  ► Disk IO (typically small block random) and tape IO (typically large block sequential) are different, and stability issues have been seen at high IO rates
MPIO Command Set
• lspath - list paths, path status and path attributes for a disk
• chpath - change path status or path attributes
  ► Enable or disable paths
• rmpath - delete a path or change its state
  ► Putting a path into the Defined state means it won't be used (from Available to Defined)
  ► One cannot define/delete the last path of a device
• mkpath - add another path to a device or make a Defined path Available
  ► Generally cfgmgr is used to add new paths
• chdev - change a device's attributes (not specific to MPIO)
• cfgmgr - add new paths to an hdisk or make Defined paths Available (not specific to MPIO)
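For example (hypothetical device names), a path can be taken out of use and brought back without deleting it:

# rmpath -l hdisk2 -p fscsi0              # path moves from Available to Defined and is no longer used
# mkpath -l hdisk2 -p fscsi0              # make the Defined path Available again
# chpath -l hdisk2 -p fscsi0 -s disable   # alternatively, disable it while leaving it Available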
Useful MPIO Commands
• List status of the paths and the parent device (or adapter)
  # lspath -Hl <hdisk#>
• List connection information for a path
  # lspath -l hdisk2 -F"status parent connection path_status path_id"
  Enabled fscsi0 203900a0b8478dda,f000000000000 Available 0
  Enabled fscsi0 201800a0b8478dda,f000000000000 Available 1
  Enabled fscsi1 201900a0b8478dda,f000000000000 Available 2
  Enabled fscsi1 203800a0b8478dda,f000000000000 Available 3
• The connection field contains the storage port WWPN
  ► In the case above, the paths go to four storage ports with WWPNs 203900a0b8478dda, 201800a0b8478dda, 201900a0b8478dda and 203800a0b8478dda
• List a specific path's attributes
  # lspath -AEl hdisk2 -p fscsi0 -w "203900a0b8478dda,f000000000000"
  scsi_id   0x30400             SCSI ID       False
  node_name 0x200800a0b8478dda  FC Node Name  False
  priority  1                   Priority      True
Path priorities
• A priority attribute for paths can be used to specify a preference for path IOs. How it works depends on whether the hdisk's algorithm attribute is set to fail_over or round_robin.
  ► The value specified is inverse to priority, i.e. "1" is the highest priority
• algorithm=fail_over
  ► The path with the highest priority (lowest priority value) handles all the IOs unless there's a path failure.
  ► The other path(s) will only be used when there is a path failure.
  ► Set the primary path to be used by setting its priority value to 1, the next path's priority (in case of path failure) to 2, and so on.
  ► If the path priorities are the same and algorithm=fail_over, the primary path will be the first one listed for the hdisk in the CuPath ODM, as shown by # odmget CuPath
• algorithm=round_robin
  ► If the priority attributes are the same, then IOs go down each path equally.
  ► In the case of two paths, if you set path A's priority to 1 and path B's to 255, then for every IO going down path A, there will be 255 IOs sent down path B.
• To change the path priority of an MPIO device on a VIO client:
  # chpath -l hdisk0 -p vscsi1 -a priority=2
  ► Set path priorities for VSCSI disks to balance use of the VIOSs
Path priorities
# lsattr -El hdisk9
PCM             PCM/friend/otherapdisk  Path Control Module     False
algorithm       fail_over               Algorithm               True
hcheck_interval 60                      Health Check Interval   True
hcheck_mode     nonactive               Health Check Mode       True
lun_id          0x5000000000000         Logical Unit Number ID  False
node_name       0x20060080e517b6ba      FC Node Name            False
queue_depth     10                      Queue DEPTH             True
reserve_policy  single_path             Reserve Policy          True
ww_name         0x20160080e517b6ba      FC World Wide Name      False
…

# lspath -l hdisk9 -F"parent connection status path_status"
fscsi1 20160080e517b6ba,5000000000000 Enabled Available
fscsi1 20170080e517b6ba,5000000000000 Enabled Available

# lspath -AEl hdisk9 -p fscsi1 -w"20160080e517b6ba,5000000000000"
scsi_id   0x10a00             SCSI ID       False
node_name 0x20060080e517b6ba  FC Node Name  False
priority  1                   Priority      True

Note: whether or not path priorities apply depends on the PCM.
With SDDPCM, path priorities only apply when the algorithm used is fail over (fo). Otherwise, they aren't used.
Path priorities – why change them?
• With VIOCs, send the IOs for half the LUNs to one VIOS and half to the other (a scripted sketch follows below)
  ► Set priorities for half the LUNs to use VIOSa/vscsi0 and half to use VIOSb/vscsi1
  ► Uses both VIOSs' CPU and virtual adapters
  ► algorithm=fail_over is the only option at the VIOC for VSCSI disks
• With Nseries, have the IOs go to the primary controller for the LUN
  ► Set via the dotpaths utility that comes with the Nseries filesets
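A minimal ksh sketch of the VSCSI balancing idea, assuming the client's two virtual SCSI adapters are vscsi0 and vscsi1 and that all VSCSI hdisks should simply alternate between them (names and selection logic are illustrative; adapt to your environment):

i=0
for d in $(lsdev -Cc disk -s vscsi -F name); do
    if [ $((i % 2)) -eq 0 ]; then
        chpath -l $d -p vscsi0 -a priority=1    # even-numbered disks prefer the VIOS behind vscsi0
        chpath -l $d -p vscsi1 -a priority=2
    else
        chpath -l $d -p vscsi0 -a priority=2    # odd-numbered disks prefer the VIOS behind vscsi1
        chpath -l $d -p vscsi1 -a priority=1
    fi
    i=$((i + 1))
done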
Path Health Checking and Recovery
• Validate that a path is working
• Automate recovery of paths
• For SDDPCM and MPIO compliant disks, two hdisk attributes apply:
  # lsattr -El hdisk26
  hcheck_interval 0          Health Check Interval  True
  hcheck_mode     nonactive  Health Check Mode      True
• hcheck_interval
  ► Defines how often the health check is performed on the paths for a device. The attribute supports a range from 0 to 3600 seconds. When a value of 0 is selected (the default), health checking is disabled.
  ► Preferably set to at least 2X the IO timeout value
• hcheck_mode
  ► Determines which paths should be checked when the health check capability is used:
    ● enabled: Sends the healthcheck command down paths with a state of enabled
    ● failed: Sends the healthcheck command down paths with a state of failed
    ● nonactive: (Default) Sends the healthcheck command down paths that have no active I/O, including paths with a state of failed. If the algorithm selected is failover, then the healthcheck command is also sent on each of the paths that have a state of enabled but have no active IO. If the algorithm selected is round_robin, then the healthcheck command is only sent on paths with a state of failed, because the round_robin algorithm keeps all enabled paths active with IO.
• Consider setting up error notification for path failures (later slide)
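The attributes above can be changed with chdev; the values here are illustrative, and -P defers the change to the next reboot if the disk is open:

# chdev -l hdisk26 -a hcheck_interval=60 -a hcheck_mode=nonactive -P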
Path Recovery
• MPIO will recover failed paths if path health checking is enabled with hcheck_mode=nonactive or failed, and the device has been opened
• Trade-offs exist:
  ► Lots of path health checking can create a lot of SAN traffic
  ► Automatic recovery requires turning on path health checking for each LUN
  ► Lots of time between health checks means paths will take longer to recover after repair
  ► Health checking for a single LUN is often sufficient to monitor all the physical paths, but not to recover them
• SDD and SDDPCM also recover failed paths automatically
• In addition, SDDPCM provides a health check daemon as an automated method of reclaiming failed paths to a closed device.
• To manually enable a failed path after repair, or to re-enable a disabled path:
  # chpath -l hdisk1 -p <parent> -w <connection> -s enable
• To disable an hdisk's paths through a specific FC port (parent adapter) on the host:
  # chpath -l hdisk1 -p <parent> -s disable
Path Health Checking and Recovery – Notification!
• One should also set up error notification for path failure, so that someone knows about it and can correct it before something else fails.
• This is accomplished by determining the error that shows up in the error log when a path fails (via testing), and then
• Adding an entry to the errnotify ODM class for that error which calls a script (that you write) that notifies someone that a path has failed.
Hint: You can use # odmget errnotify to see what the entries (or stanzas) look like, then create a stanza and use the odmadd command to add it to the errnotify class.
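A sketch of such a stanza; the stanza name, error label and script path are placeholders (substitute the label you actually observe in errpt when you fail a path during testing), and the $1 passed to the method is the error log sequence number:

# cat /tmp/pathfail.add
errnotify:
        en_name = "path_fail_notify"
        en_persistenceflg = 1
        en_class = "H"
        en_type = "PERM"
        en_label = "SC_DISK_ERR7"
        en_method = "/usr/local/bin/path_alert.sh $1"
# odmadd /tmp/pathfail.add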
Path management with MPIO
• Includes examining, adding, removing, enabling and disabling paths
  ► Adapter failure/replacement or addition
  ► VIOS upgrades (VIOS or multi-path code)
  ► Cable failure and replacement
  ► Storage controller/port failure and repair
• Adapter replacement (a worked example follows below)
  ► Paths will not be in use if the adapter has failed; the paths will be in the failed state
  1. Remove the paths with # rmpath -l <hdisk> -p <parent> -w <connection> [-d]
     (-d removes the path; without it the path is changed to Defined)
  2. Remove the adapter with # rmdev -Rdl <fcs#>
  3. Replace the adapter
  4. Run cfgmgr
  5. Check the paths with lspath
• It's better to stop using a path before you know the path will disappear
  ► Avoid timeouts, application delays or performance impacts, and potential error recovery bugs
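For example, replacing a hypothetical failed adapter fcs1 (parent fscsi1) that carries paths for hdisk4:

# rmpath -dl hdisk4 -p fscsi1      # delete hdisk4's paths through the failed adapter
# rmdev -Rdl fcs1                  # remove the adapter and its child devices from the ODM
  (physically replace the adapter)
# cfgmgr -l fcs1                   # rediscover the adapter and rebuild the paths
# lspath -l hdisk4                 # verify the paths are Enabled again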
Active/Active vs. Active/Passive Disk Subsystem Controllers
• IOs for a LUN can be sent to any storage port with Active/Active controllers
• LUNs are balanced across controllers for Active/Passive disk subsystems
  ► So a controller is active for some LUNs, but passive for the others
• IOs for a LUN are only sent to the Active controller's port for disk subsystems with Active/Passive controllers
  ► ESS, DS6000, DS8000, and XIV have active/active controllers
  ► DS4000, DS5000, DS3950, Nseries, V7000 have active/passive controllers
    ● The Nseries passive controller can accept IOs, but IO latency is affected
  ► The passive controller takes over in the event the active controller, or all paths to it, fail
• MPIO recognizes Active/Passive disk subsystems and sends IOs only to the primary controller
  ► Except under failure conditions; then the active/passive role switches for the affected LUNs
• Terminology regarding active/active and active/passive varies considerably
Example: Active/Passive Paths
SDD: An Overview
• SDD = Subsystem Device Driver, a pre-MPIO architecture
• Used with IBM ESS, DS6000, DS8000 and the SAN Volume Controller, but is not MPIO compliant
  ► A "host attachment" fileset (provides subsystem-specific support code and populates the ODM) and the SDD fileset are both installed
  ► Host attachment: ibm2105.rte
  ► SDD: devices.sdd.<sdd_version>.rte
• LUNs show up as vpaths, with an hdisk device for each path
  ► 32 paths maximum per LUN, but fewer are recommended with more than 600 LUNs
• One installs SDDPCM or SDD, not both.
• No support for rootvg, dump or paging devices
  ► One can exclude disks from SDD control using the excludesddcfg command
  ► Mirror rootvg across two separate LUNs on different adapters for availability
SDD
• Load balancing algorithms
  ► fo: failover
  ► rr: round robin
  ► lb: load balancing (aka df, the default); chooses the adapter with the fewest in-flight IOs
  ► lbs: load balancing sequential, optimized for sequential IO
  ► rrs: round robin sequential, optimized for sequential IO
• The datapath command is used to examine vpaths, adapters, paths, vpath statistics, path statistics and adapter statistics, dynamically change the load balancing algorithm, and perform other administrative tasks such as adapter replacement, disabling paths, etc. (a few illustrative invocations follow below)
• mkvg4vp is used instead of mkvg, and extendvg4vp is used instead of extendvg
• SDD automatically recovers failed paths that have been repaired, via the sddsrv daemon
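Some illustrative datapath invocations (the device number is hypothetical):

# datapath query adapter              # adapter status and per-adapter path counts
# datapath query device 0             # paths and statistics for vpath0
# datapath set device 0 policy rr     # dynamically switch vpath0 to the round robin policy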
Does Load Balancing Improve Performance?
• Load balancing tries to reduce latency by picking a less active path
  ► …but adds latency to choose the best path
• These latencies are typically < 1% of typical IO service times
• Load balancing is more likely to be of benefit in SANs with heavy utilizations or with intermittent errors that slow IOs on some path
• A round_robin algorithm is usually equivalent
Conclusion: Load balancing is unlikely to improve performance, especially when compared to other strategies like algorithm=round_robin or approaches that balance IO with algorithm=fail_over
Balancing IOs with algorithm=fail_over
• A fail_over algorithm can be used efficiently to balance IOs!
  ► Any load_balancing algorithm must consume CPU and memory resources to determine the best path to use.
  ► It's possible to set up fail_over LUNs so that the loads are balanced across the available FC adapters.
  ► Let's use an example with 2 FC adapters. Assume we correctly lay out our data so that the IOs are balanced across the LUNs (this is usually a best practice). Then if we assign half the LUNs to FC adapterA and half to FC adapterB, the IOs are evenly balanced across the adapters.
  ► A question to ask is, "If one adapter is handling more IO than another, will this have a significant impact on IO latency?"
  ► Since the FC adapters are capable of handling more than 35,000 IOPS, we're unlikely to bottleneck at the adapter and add significant latency to the IO.
SDDPCM: An Overview
• SDDPCM = Subsystem Device Driver Path Control Module
• SDDPCM is MPIO compliant and can be used with IBM ESS, DS6000, DS8000, DS4000 (most models), DS5000, DS3950, V7000 and the SVC
  ► A "host attachment" fileset (populates the ODM) and the SDDPCM fileset are both installed
  ► Host attachment: devices.fcp.disk.ibm.mpio.rte
  ► SDDPCM: devices.sddpcm.<version>.rte
• LUNs show up as hdisks; paths are shown with the pcmpath or lspath commands
  ► 16 paths per LUN supported
• Provides a PCM per the MPIO architecture
• One installs SDDPCM or SDD, not both.
SDDPCM is recommended and strategic
Comparing AIX Default MPIO PCMs & SDDPCM

How obtained
  MPIO PCMs: Provided as an integrated part of the base VIOS POWERVM firmware and AIX operating system product distribution
  SDDPCM:    Provided by most IBM storage products for subsequent installation on the various server OS's that the device supports

Supported Devices
  MPIO PCMs: Supports most disk devices that the AIX operating system and VIOS POWERVM firmware support, including selected third-party devices
  SDDPCM:    Supports specific IBM devices and is referenced by the particular device support statement. The supported devices differ between AIX and POWERVM VIOS

OS Integration Considerations
  MPIO PCMs: Update levels are provided and are updated and migrated as a mainline part of the normal AIX and VIOS service strategy and upgrade/migration paths
  SDDPCM:    Add-on software entity that has its own update strategy and process for obtaining fixes. The customer must manage coexistence levels between the mix of devices, operating system levels and VIOS levels. NOT a licensed program product.

Path Selection Algorithms
  MPIO PCMs: Fail over (default), Round Robin (excluding VSCSI disks)
  SDDPCM:    Fail over, Round Robin, Load Balancing (default), Load Balancing Port

Dynamic Algorithm Selection
  MPIO PCMs: Disk access must be stopped in order to change the algorithm
  SDDPCM:    Algorithm can be changed dynamically with the pcmpath command

SAN boot, dump, paging support
  MPIO PCMs: Yes
  SDDPCM:    Yes. Restart required if SDDPCM is installed after the MPIO PCM and SDDPCM boot is desired.

PowerHA & GPFS Support
  MPIO PCMs: Yes
  SDDPCM:    Yes

Utilities
  MPIO PCMs: Standard AIX performance monitoring tools such as iostat and fcstat
  SDDPCM:    Enhanced utilities (pcmpath commands) to show mappings from adapters, paths and devices, as well as performance and error statistics
SDDPCM
• Load balancing algorithms
  ► rr - round robin
  ► lb - load balancing, based on in-flight IOs per adapter
  ► fo - failover policy
  ► lbp - load balancing port (for ESS, DS6000, DS8000, V7000 and SVC only), based on in-flight IOs per adapter and per storage port
• The pcmpath command is used to examine hdisks, adapters, paths, hdisk statistics, path statistics and adapter statistics, dynamically change the load balancing algorithm, and perform other administrative tasks such as adapter replacement and disabling paths
• SDDPCM automatically recovers failed paths that have been repaired, via the pcmsrv daemon
  ► MPIO health checking can also be used, and can be set dynamically via the pcmpath command. This is recommended. Set the hc_interval to a non-zero value for an appropriate number of LUNs to check the physical paths.
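For example (device numbers and values are illustrative):

# pcmpath set device 2 algorithm lb          # change hdisk2's path selection algorithm on the fly
# pcmpath set device 0 66 hc_interval 60     # enable health checking on devices 0 through 66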
Path management with SDDPCM and the pcmpath command
# pcmpath query adapter          List adapters and status
# pcmpath query device           List hdisks and paths
# pcmpath query port             List DS8000/DS6000/SVC… ports
# pcmpath query devstats         List hdisk/path IO statistics
# pcmpath query adaptstats       List adapter IO statistics
# pcmpath query portstats        List DS8000/DS6000/SVC port statistics
# pcmpath query essmap           List rank, LUN ID and more for each hdisk
# pcmpath set adapter …          Disable/enable paths to an adapter
# pcmpath set device path …      Disable/enable paths to a hdisk
# pcmpath set device algorithm   Dynamically change the path algorithm
# pcmpath set device hc_interval Dynamically change the health check interval
# pcmpath disable/enable ports … Disable/enable paths to a disk port
# pcmpath query wwpn             Display all FC adapter WWPNs
And more

• SDD offers the similar datapath command
Path management with SDDPCM and the pcmpath command
# pcmpath query device
…
DEV#:   2  DEVICE NAME: hdisk2  TYPE: 2145  ALGORITHM: Load Balance
SERIAL: 600507680190013250000000000000F4
==========================================================================
Path#    Adapter/Path Name    State     Mode        Select    Errors
  0      fscsi0/path0         OPEN      NORMAL    40928736         0
  1*     fscsi0/path1         OPEN      NORMAL          16         0
  2      fscsi2/path4         OPEN      NORMAL    43927751         0
  3*     fscsi2/path5         OPEN      NORMAL          15         0
  4      fscsi1/path2         OPEN      NORMAL    44357912         0
  5*     fscsi1/path3         OPEN      NORMAL          14         0
  6      fscsi3/path6         OPEN      NORMAL    43050237         0
  7*     fscsi3/path7         OPEN      NORMAL          14         0
…
• * indicates a path to the passive controller
• 2145 is an SVC, which has active/passive nodes for a LUN
• DS4000, DS5000, V7000 and DS3950 also have active/passive controllers
• IOs will be balanced across paths to the active controller
Path management with SDDPCM and the pcmpath command
# pcmpath query devstats
Total Dual Active and Active/Asymmetric Devices : 67

DEV#:   2  DEVICE NAME: hdisk2
===============================
           Total Read   Total Write   Active Read   Active Write   Maximum
 I/O:       169415657       2849038             0              0        20
 SECTOR:   2446703617     318507176             0              0      5888

Transfer Size:   <= 512       <= 4k      <= 16K      <= 64K       > 64K
                 183162    67388759    35609487    46379563    22703724
…
• The Maximum value is useful for tuning hdisk queue depths
• "20" is the maximum number of in-flight requests for the IOs shown
• Increase the queue depth until the queue is not filling up, or until IO service times suffer (the bottleneck is pushed to the subsystem)
  ► writes > 3 ms
  ► reads > 15-20 ms
• See References for queue depth tuning whitepaper
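For example (illustrative values; review the whitepaper before changing production systems):

# lsattr -El hdisk2 -a queue_depth        # show the current queue_depth
# chdev -l hdisk2 -a queue_depth=32 -P    # new value takes effect at the next reboot or device reconfigure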
SDD & SDDPCM: Getting Disks configured correctly
• Install the appropriate filesets
  ► SDD or SDDPCM for the required disks (and the host attachment fileset)
  ► If you are using SDDPCM, also install the MPIO fileset that comes with AIX
    ● devices.common.IBM.mpio.rte
  ► Host attachment scripts
    ● http://www.ibm.com/support/dlsearch.wss?rs=540&q=host+scripts&tc=ST52G7&dc=D410
• Reboot or start the sddsrv/pcmsrv daemon
• smitty disk -> List All Supported Disk
  ► Displays disk types for which software support has been installed
• Or # lsdev -Pc disk | grep MPIO
  disk mpioosdisk fcp MPIO Other FC SCSI Disk Drive
  disk 1750       fcp IBM MPIO FC 1750             …DS6000
  disk 2105       fcp IBM MPIO FC 2105             …ESS
  disk 2107       fcp IBM MPIO FC 2107             …DS8000
  disk 2145       fcp MPIO FC 2145                 …SVC
  disk DS3950     fcp IBM MPIO DS3950 Array Disk
  disk DS4100     fcp IBM MPIO DS4100 Array Disk
  disk DS4200     fcp IBM MPIO DS4200 Array Disk
  disk DS4300     fcp IBM MPIO DS4300 Array Disk
  disk DS4500     fcp IBM MPIO DS4500 Array Disk
  disk DS4700     fcp IBM MPIO DS4700 Array Disk
  disk DS4800     fcp IBM MPIO DS4800 Array Disk
  disk DS5000     fcp IBM MPIO DS5000 Array Disk
  disk DS5020     fcp IBM MPIO DS5020 Array Disk
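One way to verify which filesets are installed and that the path server daemon is running (fileset names abbreviated; adjust to the code you actually installed):

# lslpp -l "devices.sddpcm*" devices.fcp.disk.ibm.mpio.rte devices.common.IBM.mpio.rte
# lssrc -s pcmsrv                  # or: lssrc -s sddsrv for SDD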
SDD and SDDPCM support matrix (screen shots): www-01.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350#AIXSDDPCM
Migration from SDD to SDDPCM
• Migration from SDD to SDDPCM is fairly straightforward and doesn't require a lot of time. The procedure is documented in the manual:
  1. Varyoff your SDD VGs
  2. Stop the sddsrv daemon via stopsrc -s sddsrv
  3. Remove the SDD devices (both vpaths and hdisks) via the instructions below
  4. Remove the dpo device
  5. Uninstall SDD and the host attachment fileset for SDD
  6. Install the host attachment fileset for SDDPCM, and SDDPCM
  7. Configure the new disks (if you rebooted it's done, else run cfgmgr and startsrc -s pcmsrv)
  8. Varyon your VGs - you're back in business
• To remove the vpaths and hdisks, use:
  # rmdev -Rdl dpo
• No exportvg/importvg is needed because LVM keeps track of PVs via PVID
• Effective queue depths change (and changes to queue_depth will be lost):
  ► SDD effective queue depth = # paths for a LUN x queue_depth
  ► SDDPCM effective queue depth = queue_depth
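A hypothetical worked example: with 4 paths per LUN and queue_depth=20, SDD allowed 80 IOs in flight per LUN; to keep the same effective depth under SDDPCM you would raise the hdisk attribute, e.g.:

# chdev -l hdisk4 -a queue_depth=80 -P    # match the old SDD effective depth (4 paths x 20)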
Multi-path code choices for DS4000/DS5000/DS3950
• These disk subsystems might use RDAC, MPIO or SDDPCM
  ► Choices depend on model and AIX level
• MPIO is strategic
  ► SDDPCM uses MPIO and is recommended
  ► SDDPCM is not yet supported on VIOS for these disk subsystems, so use MPIO there
• SAN cabling/zoning is more flexible with MPIO/SDDPCM than with RDAC
  ► RDAC requires fcsA be connected to controllerA and fcsB connected to controllerB, with no cross connections
• These disk subsystems have active/passive controllers
  ► All IO for a LUN goes to its primary controller
    ● Unless the paths to it fail, or the controller fails; then the other controller takes over the LUN
  ► The storage administrator assigns half the LUNs to each controller
• The manage_disk_drivers command is used to choose the multi-path code
  ► Choices vary among models and AIX levels
• DS3950, DS5020, DS5100, DS5300 use MPIO or SDDPCM
Multi-path code choices for DS3950, DS4000 and DS5000
# manage_disk_drivers -l
Device           Present Driver   Driver Options
2810XIV          AIX_AAPCM        AIX_AAPCM,AIX_non_MPIO
DS4100           AIX_SDDAPPCM     AIX_APPCM,AIX_fcparray
DS4200           AIX_SDDAPPCM     AIX_APPCM,AIX_fcparray
DS4300           AIX_SDDAPPCM     AIX_APPCM,AIX_fcparray
DS4500           AIX_SDDAPPCM     AIX_APPCM,AIX_fcparray
DS4700           AIX_SDDAPPCM     AIX_APPCM,AIX_fcparray
DS4800           AIX_SDDAPPCM     AIX_APPCM,AIX_fcparray
DS3950           AIX_SDDAPPCM     AIX_APPCM
DS5020           AIX_SDDAPPCM     AIX_APPCM
DS5100/DS5300    AIX_SDDAPPCM     AIX_APPCM
DS3500           AIX_AAPCM        AIX_APPCM

• To set the driver for use:
  # manage_disk_drivers -d <device> -o <driver_option>
• AIX_AAPCM - MPIO with active/active controllers
• AIX_APPCM - MPIO with active/passive controllers
• AIX_SDDAPPCM - SDDPCM
• AIX_fcparray - RDAC
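For example, switching a hypothetical DS4700 from RDAC to the MPIO active/passive PCM (the change takes effect once the disks are reconfigured, typically via a reboot):

# manage_disk_drivers -d DS4700 -o AIX_APPCM
# shutdown -Fr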
Other MPIO commands for DS3/4/5000
# mpio_get_config -Av
Frame id 0:
  Storage Subsystem worldwide name: 608e50017be8800004bbc4c7e
  Controller count: 2
  Partition count: 1
  Partition 0:
    Storage Subsystem Name = 'DS-5020'
    hdisk     LUN #   Ownership       User Label
    hdisk4      0     A (preferred)   Array1_LUN1
    hdisk5      1     B (preferred)   Array2_LUN1
    hdisk6      2     A (preferred)   Array3_LUN1
    hdisk7      3     B (preferred)   Array4_LUN1
    hdisk8      4     A (preferred)   Array5_LUN1
    hdisk9      5     B (preferred)   Array6_LUN1

# sddpcm_get_config -Av
  (output is the same as above)
XIV
• Host Attachment Kit for AIX
  http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000802
• # lsdev -Pc disk | grep xiv
  disk 2810xiv fcp N/A
• XIV support has moved from fileset support to support within AIX
  ► Installing the Host Attachment Kit is still recommended
    ● Provides diagnostic and other commands
• Disks are configured as 2810xiv devices
• ODM entries for XIV are included with AIX 5.3 TL10, AIX 6.1 TL3, VIOS 2.1.2.x and AIX 7
Nseries/NetApp
• Nseries/NetApp has a preferred storage controller for each LUN
• Not exactly an active/passive disk subsystem, as the non-preferred controller can accept IO requests
  ► I/O requests have to be passed to the preferred controller, which impacts latency
• Install the SAN Toolkit: Ontap.mpio_attach_kit.*
  ► Provides the dotpaths utility and sanlun commands
  ► dotpaths sets hdisk path priorities to favor the primary controller
    …for every IO going down the secondary path, there will be 255 IOs sent down the primary path
Storage Area Network (SAN) Boot
Boot from an SVC
  • Storage is zoned directly to the client
  • HBAs used for boot and/or data access
  • SDDPCM runs in the client (to support boot)
Boot Directly from SAN
  • Storage is zoned directly to the client
  • HBAs used for boot and/or data access
  • Multi-path code for the storage runs in the client
SAN Sourced Boot Disks
  • Affected LUNs are zoned to the VIOS(s) and assigned to clients via VIOS definitions
  • Multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS
Boot from SVC via VIO Server
  • Affected LUNs are zoned to the VIOS(s) and assigned to clients via VIOS definitions
  • Multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS
Storage Area Network (SAN) Boot
• Requirements for SAN booting
  ► System with FC boot capability
  ► Appropriate microcode (system, FC adapter, disk subsystem and FC switch)
  ► Disk subsystem supporting AIX FC boot
  ► Some older systems don't support FC boot; if in doubt, check the sales manual
• SAN disk configuration
  ► Create the SAN LUNs and assign them to the system's FC adapters' WWPNs prior to installing the system
  ► For non-MPIO configurations, assign one LUN to one WWPN to keep it simple
• AIX installation
  ► Boot from the installation CD or NIM; this runs the install program
  ► When you do the installation you'll get a list of disks that will be on the SAN for the system
  ► Choose the disks for installing rootvg
  ► Be aware of disk SCSI reservation policies
    ● Avoid policies that limit access to a single path or adapter (a hedged example follows below)
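For example (hypothetical hdisk), checking the reservation policy and relaxing it so the LUN is not locked to a single path or adapter:

# lsattr -El hdisk0 -a reserve_policy
# chdev -l hdisk0 -a reserve_policy=no_reserve -P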
How to assure you install to the right SAN disk
• Only assign the rootvg LUN to the host prior to install and assign data LUNs later, or
• Create a LUN for rootvg with a size different than the other LUNs, or
• Write down the LUN ID and storage WWN, or
• Use a disk with an existing PVID
These criteria can be used to select the LUN from the AIX install program (shown in the following screen shots) or via a bosinst_data file for NIM
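For NIM, a bosinst_data resource can pin the installation to a specific LUN via a target_disk_data stanza. A sketch with placeholder values (only the fields you want matched need to be filled in; the CONNECTION format here is an assumption of parent//connection):

target_disk_data:
    PVID = 00c8d6f0a1b2c3d4
    CONNECTION = fscsi0//50050768012017c2,1000000000000
    LOCATION =
    SIZE_MB =
    HDISKNAME =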
Choose via Location Code
1 hdisk2 U8234.EMA.06EF634-V5-C22-T1-W50050768012017C2-L1000000000000
2 hdisk3 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L2000000000000
3 hdisk5 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L3000000000000
(The W field of the location code is the storage WWN; the L field is the LUN ID)
Choose via Size
Choose via PVID
Storage Area Network Booting: Pros & Cons
• The main benefits of SAN rootvg
  ► Performance: < 2 ms writes and 5-10 ms reads due to cache; higher IOPS
  ► Availability with built-in RAID protection
  ► Ability to easily redeploy disk
  ► Ability to FlashCopy/MetroMirror the rootvg for backup/DR
  ► Fewer hardware resources
• SAN rootvg disadvantages
  ► SAN problems can cause loss of access to rootvg ~ not an issue as application data is on the SAN anyway
  ► Potential loss of system dump and diagnosis if loss of access to the SAN is caused by a kernel bug
  ► Difficult to change multi-path IO code
    ● Not an issue with dual VIOSs: one can take down one VIOS at a time and change the multi-path code
• SAN boot thru VIO with NPIV is like SAN boot
Changing multi-path IO code for rootvg – not so easy
How do you change/update rootvg multi-path code when it's in use?
• Changing from SDD to SDDPCM (or vice versa) requires contacting support if booting from SAN, or:
  ► Move rootvg to internal SAS disks, e.g., using extendvg, migratepv, reducevg, bosboot and bootlist, or use alt_disk_install (a command sketch follows below)
  ► Change the multi-path code
  ► Move rootvg back to SAN
  ► Newer versions of AIX require a newer version of SDD or SDDPCM
• Follow the procedures in the SDD and SDDPCM manual for upgrades of AIX and/or the multi-path code
• Not an issue when using VIO with dual VIOSs
If one has many LPARs booting from SAN, one SAS adapter with a SAS disk or two can be used to migrate SDD to SDDPCM, one LPAR at a time
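A hedged sketch of the "move rootvg to internal disk" step, assuming rootvg currently lives on SAN LUN hdisk0 and hdisk1 is a free internal SAS disk:

# extendvg rootvg hdisk1           # add the internal disk to rootvg
# migratepv hdisk0 hdisk1          # move all rootvg partitions off the SAN LUN
# reducevg rootvg hdisk0           # remove the SAN LUN from rootvg
# bosboot -ad /dev/hdisk1          # rebuild the boot image on the internal disk
# bootlist -m normal hdisk1        # boot from the internal disk
  (change the multi-path code, then reverse the steps to move rootvg back to SAN)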
Documentation & References
• Infocenter "Multiple Path IO"
  http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/dm_mpio.htm
• SDD and SDDPCM Support matrix:
  www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350
• Downloads and documentation for SDD:
  www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000065&loc=en_US&cs=utf-8&lang=en
• Downloads and documentation for SDDPCM:
  www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000201&loc=en_US&cs=utf-8&lang=en
• IBM System Storage Interoperation Center (SSIC)
  http://www-03.ibm.com/systems/support/storage/ssic/interoperability.wss
• Guide to selecting a multipathing path control module for AIX or VIOS
  http://www.ibm.com/developerworks/aix/library/au-multipathing/index.html
• AIX disk queue depth tuning techdoc:
  http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105745
Documentation & References
• Hitachi MPIO Support Site
  https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
• EMC MPIO Support Site
  ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
• HP Support Site
  http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02619876&jumpid=reg_R1002_USEN
• HP StorageWorks for IBM AIX
  http://h18006.www1.hp.com/storage/aix.html
Session Evaluations
Session Number – SE39
Session Name – Working with San Boot…
Dates - Thursday, April 28, 14:30, Lake Down B; Friday, April 29, 13:00, Lake Hart B