Oracle Clustered File System (OCFS)


Automatic Storage Management
Julian Dyke
Independent Consultant
Web Version - December 2008
© 2008 Julian Dyke
juliandyke.com
Objectives
1. Understand how Oracle database files are stored in ASM
2. Calculate how long ASM rebalance operations will take
Agenda


ASM Instances
ASM Disk Groups
  - Metadata
  - Extent Distribution
  - Rebalancing
  - Redundancy
ASM Instances
ASM Single Instance Architecture
[Diagram: a single server running Oracle Clusterware (OCSSD daemon only), an ASM instance and an RDBMS instance, attached to dedicated storage]
ASM Single Instance Background Processes
Oracle 11.1
[Diagram: ASM instance SGA (Fixed Area, Variable Area, ASM Cache) with background processes DBW0, LGWR, VKTM, PSP0, CKPT, MMAN, RBAL, GMON, X000, SMON, PMON, DIAG and DIA0]
ASM RAC Architecture
[Diagram: four nodes (Node 1 to Node 4), each running Oracle Clusterware, an ASM instance and an RDBMS instance; the nodes are connected by a public network and a private network, and reach shared storage over a storage network]
ASM RAC Architecture
[Diagram: two nodes, each running Clusterware, one ASM instance (+ASM1, +ASM2) and two RDBMS instances (PROD1 and TEST1, PROD2 and TEST2); the ASM instances manage the database files of the PROD and TEST databases]
ASM RAC Instance Background Processes
Oracle 11.1
[Diagram: ASM RAC instance SGA (Fixed Area, Variable Area, ASM Cache) with background processes LMON, LMS0, LMD0, LCK0, DIAG, DIA0, DBW0, LGWR, VKTM, PSP0, MMAN, RBAL, CKPT, SMON, PMON, X000, GMON, MARK and KATE]
ASM Disk Groups
ASM Disk Groups and Disks
[Diagram: three disk groups (Disk Group 1 to Disk Group 3) and seven disks (Disk 1 to Disk 7)]
ASM Disk Groups and Disks
[Diagram: the seven disks assigned to the three disk groups]
ASM Disk Groups, Disks and Database Files
[Diagram: database files (File 1 to File 6) stored in the three disk groups, which are built from Disk 1 to Disk 7]
File Extents versus Allocation Units
File Extent
  - Logical unit of an ASM file
  - Maps to allocation units
  - One-to-many mapping

Allocation Unit
  - Physical unit of an ASM disk
  - Oracle 10.2 and below
      - 1MB by default
      - Can be increased using _asm_ausize
  - Oracle 11.1 and above
      - Variable size
      - 1MB, 2MB, 4MB, 8MB, 16MB, 32MB or 64MB
X$KFFXP



Maps file extents to allocation units
Only populated in an ASM instance
Columns include:

Column Name      Description
GROUP_KFFXP      Disk Group Number
NUMBER_KFFXP     File Number
COMPOUND_KFFXP   Disk Group Number || File Number
INCARN_KFFXP     Incarnation Number
PXN_KFFXP        Physical Extent Number (within file)
XNUM_KFFXP       Logical Extent Number (within file)
LXN_KFFXP        0=primary, 1=first mirror, 2=second mirror
DISK_KFFXP       Disk Number
AU_KFFXP         Allocation Unit Number (within disk)
SIZE_KFFXP       Size (# allocation units)
ASM Metadata
Metadata is stored in the first 256 files in an ASM disk group
  - Space is initially allocated when the disk group is created
  - Can be subsequently extended

Metadata allocation units are divided into blocks
  - Each block is 4096 bytes
  - Block size is specified using _asm_blksize

Metadata files include:

File#  Description
0      Metadata Header
1      File Directory
2      Disk Directory
3      Active Change Directory
4      Continuing Operations Directory
5      Template Directory
6      Alias Directory
9      Attribute Directory (optional)
12     Staleness Registry (optional)
ASM Metadata
[Diagram: the Metadata Header comprises the Disk Header, Partner Status Table, Free Space Table and Allocation Table; it is followed by the File Directory, Disk Directory, Active Change Directory, Continuing Operations Directory, Template Directory and Alias Directory]
ASM Metadata

Initial Allocation (Single Instance)

File#  AU  Description                                       # AUs
0      0   Disk Header, Free Space Table, Allocation Table   1
       1   Partner Status Table                              1
1          File Directory                                    1
2          Disk Directory                                    1
3          Active Change Directory                           42
4          Continuing Operations Directory                   2
5          Template Directory                                1
6          Alias Directory                                   1

Active Change Directory
  - Records changes to metadata
  - Used during recovery of instance or operation failures

Continuing Operations Directory
  - Maintains state of active operations
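If the metadata files are laid out contiguously in the order listed (an assumption, though it agrees with the kfed example later in this deck, where the alias directory is found at allocation unit 49), the starting allocation unit of each one can be derived by accumulating the # AUs column. A rough shell sketch follows; the function name `metadata_start_aus` and the shortened row labels are purely illustrative.

```shell
# Illustrative only: derive starting AUs by summing the "# AUs" column
# of the Initial Allocation table, assuming a contiguous layout.
metadata_start_aus() (
    start=0
    while IFS='|' read -r name aus; do
        printf '%-32s starts at AU %d\n' "$name" "$start"
        start=$(( start + aus ))
    done <<'EOF'
Disk Header, Free Space, Alloc|1
Partner Status Table|1
File Directory|1
Disk Directory|1
Active Change Directory|42
Continuing Operations Directory|2
Template Directory|1
Alias Directory|1
EOF
)

metadata_start_aus
```

The last line printed gives AU 49 for the alias directory, which is the value returned by the X$KFFXP query in the kfed example.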
ASM Metadata Block Types
Type  Description
1     KFBTYP_DISKHEAD
2     KFBTYP_FREESPC
3     KFBTYP_ALLOCTBL
4     KFBTYP_FILEDIR
5     KFBTYP_LISTHEAD
6     KFBTYP_DISKDIR
7     KFBTYP_ACDC
8     KFBTYP_CHNGDIR
9     KFBTYP_COD_BGO
10    KFBTYP_TMPLTDIR
11    KFBTYP_ALIASDIR
12    KFBTYP_INDIRECT
13    KFBTYP_PST_NONE
14    KFBTYP_HASHNODE
15    KFBTYP_COD_RBO
16    KFBTYP_COD_DATA
17    KFBTYP_PST_META
18    KFBTYP_PST_DTA
19    KFBTYP_HBEAT
20    KFBTYP_SR
21    KFBTYP_STALEDIR
22    KFBTYP_VOLUMEDIR
23    KFBTYP_ATTRDIR
KFED Utility

In Oracle 10.2 and above, the kfed utility can be used to inspect and edit the contents of ASM blocks:

[oracle@server3 ~]$ $ORACLE_HOME/bin/kfed -h
as/mlib     ASM Library [asmlib='lib']
aun/um      AU number to examine or update [AUNUM=number]
aus/z       Allocation Unit size in bytes [AUSZ=number]
blkn/um     Block number to examine or update [BLKNUM=number]
blks/z      Metadata block size in bytes [BLKSZ=number]
ch/ksum     Update checksum before each write [CHKSUM=YES/NO]
cn/t        Count of AUs to process [CNT=number]
d/ev        ASM device to examine or update [DEV=string]
o/p         KFED operation type [OP=READ/WRITE/MERGE/NEW/FORM/FIND/STRUCT]
p/rovnm     Name for provisioning purposes [PROVNM=string]
s/eek       AU number to seek to [SEEK=number]
te/xt       File name for translated block text [TEXT=string]
ty/pe       ASM metadata block type number [TYPE=number]

This utility should only be used under the guidance of Oracle Support.
KFED Utility

For example, to dump the blocks of the alias directory in DISKGROUP1:

Find the group number:

SELECT group_number FROM v$asm_diskgroup
WHERE name = 'DISKGROUP1';

The alias directory is stored in file number 6. Find the disk and allocation unit of its primary extent:

SELECT disk_kffxp, au_kffxp FROM x$kffxp
WHERE group_kffxp = 1
AND number_kffxp = 6
AND lxn_kffxp = 0;

Disk  Allocation Unit
0     49

Find the disk path:

SELECT path FROM v$asm_disk
WHERE group_number = 1
AND disk_number = 0;

Path
/dev/oracleasm/disks/VOL1
KFED Utility

Example (continued)
  - Allocation unit is 1MB
  - Block size is 4096 bytes
  - Therefore there are 256 blocks per allocation unit
  - Starting block offset = 256 * 49 = 12544

# Dump each of the 256 blocks of allocation unit 49 to its own file
for (( f = 12544 ; f < 12544 + 256 ; f++ ))
do
    kfed op=read blkn=$f dev='/dev/oracleasm/disks/VOL1' > blk${f}
done
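The same arithmetic applies to other allocation unit and block sizes. The sketch below is a small helper that prints the first and last block numbers covered by one allocation unit; the function name `au_block_range` is made up for illustration and is not part of kfed.

```shell
# Illustrative helper (not part of kfed): print the first and last
# block numbers that fall inside one allocation unit.
#   $1 = allocation unit number
#   $2 = allocation unit size in bytes
#   $3 = metadata block size in bytes
au_block_range() (
    au=$1; au_size=$2; blk_size=$3
    per_au=$(( au_size / blk_size ))    # blocks per allocation unit
    first=$(( au * per_au ))            # starting block offset
    echo "$first $(( first + per_au - 1 ))"
)

# Values from the example: AU 49, 1MB allocation units, 4096-byte blocks
au_block_range 49 1048576 4096
```

For the example values this prints `12544 12799`, matching the bounds of the loop above.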
Extent Distribution
Extent Distribution

Creating a disk group:

CREATE DISKGROUP diskgroup1
EXTERNAL REDUNDANCY
DISK '/dev/oracleasm/disks/VOL1';

Dropping a disk group:

DROP DISKGROUP diskgroup1
INCLUDING CONTENTS;
Extent Distribution
1 disk
[Diagram: metadata plus data extents, with the data extents distributed as follows]
Disk 0: 0, 1, 2, 3, 4, 5, 6, 7
Extent Distribution
2 disks
[Diagram: metadata plus data extents, with the data extents distributed as follows]
Disk 0: 0, 2, 4, 6, 8, 10, 12, 14
Disk 1: 1, 3, 5, 7, 9, 11, 13, 15
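The two-disk layout above is consistent with simple round-robin placement across disks of equal size: extent n lands on disk n modulo the number of disks. The sketch below only reproduces that pattern. It is an inference from the diagram, not a description of ASM's actual allocation algorithm, and `disk_for_extent` is a hypothetical name.

```shell
# Illustrative only: round-robin placement for equal-sized disks.
#   $1 = data extent number, $2 = number of disks in the group
disk_for_extent() (
    echo $(( $1 % $2 ))
)

# Reproduce the start of the two-disk layout
e=0
while [ "$e" -lt 8 ]; do
    echo "extent $e -> disk $(disk_for_extent "$e" 2)"
    e=$(( e + 1 ))
done
```

Disks of different sizes, shown in the later slides, receive extents in proportion to their size, so this simple modulo rule does not apply to them.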
Extent Distribution
4 disks
[Diagram: metadata plus data extents, with the data extents distributed as follows]
Disk 0: 0, 4, 8, 12, 16, 20, 24, 28
Disk 1: 1, 5, 9, 13, 17, 21, 25, 29
Disk 2: 2, 6, 10, 14, 18, 22, 26, 30
Disk 3: 3, 7, 11, 15, 19, 23, 27, 31
Extent Distribution
1 large disk - 1 small disk
[Diagram: metadata plus data extents, with the data extents distributed as follows]
Disk 0 (large): 0, 1, 3, 4, 6, 7, 9, 10
Disk 1 (small): 2, 5, 8, 11
Extent Distribution
1 large disk - 3 small disks
[Diagram: metadata plus data extents, with the data extents distributed as follows]
Disk 0 (large): 0, 3, 5, 8, 10, 13, 15, 18
Small disks (Disk 1, Disk 2, Disk 4), one column each: 2, 7, 12, 17 | 1, 6, 11, 16 | 4, 9, 14, 19
Extent Distribution
2 large disks - 2 small disks
[Diagram: metadata plus data extents, with the data extents distributed as follows]
Disk 0 (large): 0, 2, 6, 8, 12, 14, 18, 20
Disk 1 (large): 1, 4, 7, 10, 13, 16, 19, 22
Disk 2 (small): 3, 9, 15, 21
Disk 4 (small): 5, 11, 17, 23
Rebalancing
Extent Distribution

Adding a disk:

ALTER DISKGROUP diskgroup1
ADD DISK '/dev/oracleasm/disks/VOL2'
REBALANCE POWER 0;

Dropping a disk:

ALTER DISKGROUP diskgroup1
DROP DISK 'DISKGROUP1_0002'
REBALANCE POWER 0;

Rebalancing a disk group:

ALTER DISKGROUP diskgroup1
REBALANCE POWER 1;
Rebalancing
Adding disks - 1 disk to 2 disks
[Animated diagram: Disk 0 initially holds data extents 0 to 7; the rebalance moves extents 1, 3, 5 and 7 to the new Disk 1]
Rebalancing
Adding disks - 1 disk to 4 disks
[Animated diagram: Disk 0 initially holds data extents 0 to 7; the rebalance moves extents 0, 1, 2, 4, 5 and 6 to the new Disks 1, 2 and 3, two extents to each]
Rebalancing
Adding disks - 2 disks to 3 disks
[Animated diagram: data extents 0 to 17, initially striped across Disk 0 and Disk 1, are rebalanced across Disk 0, Disk 1 and the new Disk 2]
Rebalancing
Adding disks - 2 disks to 4 disks
[Animated diagram: data extents 0 to 15, initially striped across two disks, are rebalanced across four disks; extents 0, 4, 8, 12 and 1, 5, 9, 13 move to the two new disks]
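The "adding disks" diagrams suggest a rule of thumb for how much data a rebalance has to move: with equal-sized disks, roughly the share of extents that will end up on the new disks. The sketch below encodes that estimate. It assumes equal-sized disks and an ideal rebalance that moves only what it must, and `extents_moved_on_add` is a hypothetical name.

```shell
# Illustrative estimate: extents moved when growing a disk group.
#   $1 = total data extents, $2 = disks before, $3 = disks after
extents_moved_on_add() (
    total=$1; old_disks=$2; new_disks=$3
    echo $(( total * (new_disks - old_disks) / new_disks ))
)

extents_moved_on_add 8 1 2     # 1 disk to 2 disks
extents_moved_on_add 8 1 4     # 1 disk to 4 disks
extents_moved_on_add 16 2 4    # 2 disks to 4 disks
```

These calls print 4, 6 and 8, the number of extents shown moving in the corresponding diagrams.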
Rebalancing
Dropping disks - 3 disks to 1 disk
[Animated diagram: data extents 0 to 8 are striped across Disks 0, 1 and 2; when Disks 1 and 2 are dropped, extents 1, 2, 4, 5, 7 and 8 move to Disk 0]
Rebalancing
Moving disks - 2 disks to 2 disks
[Animated diagram: data extents 0 to 15 are striped across two disks; one disk is replaced by a new disk, and the even-numbered extents 0 to 14 move to the new disk while the odd-numbered extents 1 to 15 stay in place]
Rebalancing
V$ASM_OPERATION

Contains details of ongoing rebalance operations

Column Name     Data Type
GROUP_NUMBER    NUMBER
OPERATION       CHAR(5)
STATE           VARCHAR2(4)
POWER           NUMBER
ACTUAL          NUMBER
SOFAR           NUMBER
EST_WORK        NUMBER
EST_RATE        NUMBER
EST_MINUTES     NUMBER          (estimate of remaining time)
ERROR_CODE      VARCHAR2(44)
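The remaining-time estimate can be reproduced approximately from the other columns. SOFAR is the number of allocation units moved so far, EST_WORK the estimated number to be moved, and EST_RATE the estimated number moved per minute. The sketch below assumes the estimate is simply the outstanding work divided by the rate; the function name `remaining_minutes` and the sample figures are invented for illustration.

```shell
# Illustrative only: approximate the remaining minutes of a rebalance.
#   $1 = SOFAR, $2 = EST_WORK, $3 = EST_RATE (allocation units per minute)
remaining_minutes() (
    sofar=$1; est_work=$2; est_rate=$3
    if [ "$est_rate" -le 0 ]; then
        echo "unknown"          # no rate has been measured yet
        return
    fi
    # Round up so that a partial minute still counts as a minute
    echo $(( (est_work - sofar + est_rate - 1) / est_rate ))
)

# Invented figures: 12000 of 60000 AUs moved at 800 AUs per minute
remaining_minutes 12000 60000 800
```

With these figures the result is 60 minutes. The rate changes as the operation proceeds, so the calculation is only as good as the most recent sample.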
Rebalancing
Power Limit
Power limit can be 0 to 11
  - 0 disables the rebalance operation
  - 1 to 11 specifies the number of ARBn background processes used for the rebalance

In Oracle 10.2
  - RBAL manages the rebalance operation
  - Each ARBn background process is allocated a range of 128 allocation units to rebalance
  - When complete, another range is requested
  - AD lock is taken while an allocation unit is being rebalanced
  - Rebalance operations take much longer than theoretically necessary. Possible reasons include:
      - Locking
      - GES updates with other ASM instances
      - Updates to the RDBMS instance
Rebalancing
Summary
The EST_MINUTES column of V$ASM_OPERATION is reasonably accurate
  - Allow a few minutes for the SAN cache to stabilize
  - Check regularly for changes to the estimate

ASM rebalance operations do not block the workload
  - Locks are only taken briefly
  - The lock mechanism has changed in Oracle 11.1
  - SAN cache and I/O performance will be affected

In Oracle 10.2 rebalancing is fastest if
  - Other ASM instances are shut down
  - The RDBMS instance is shut down

Estimated completion time will be affected by:
  - Use of SAN cache and I/O by the rest of the workload
  - Rate of change by applications to blocks in ASM files being rebalanced
Redundancy
Redundancy

ASM supports three levels of redundancy

External Redundancy
  - Implemented externally using the storage layer
  - Most common configuration in production

Normal Redundancy
  - Two copies of each extent maintained in separate failure groups
  - Used with extended clusters
  - Used occasionally in production, e.g. CERN
  - Increases CPU overhead on servers

High Redundancy
  - Three copies of each extent maintained in separate failure groups
  - Very rare in production
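Because normal and high redundancy keep two and three copies of each extent, the usable capacity of a disk group shrinks accordingly. The sketch below is a back-of-the-envelope calculation only. It ignores metadata overhead and the free space needed to restore redundancy after a disk failure, and the function name `usable_mb` is hypothetical.

```shell
# Illustrative only: rough usable capacity for a given redundancy level.
#   $1 = raw capacity in MB, $2 = external | normal | high
usable_mb() (
    raw_mb=$1
    case $2 in
        external) copies=1 ;;   # mirroring is left to the storage layer
        normal)   copies=2 ;;   # two copies of each extent
        high)     copies=3 ;;   # three copies of each extent
        *) echo "unknown redundancy: $2" >&2; exit 1 ;;
    esac
    echo $(( raw_mb / copies ))
)

usable_mb 600000 normal
```

For 600000 MB of raw disk this gives 300000 MB with normal redundancy and 200000 MB with high redundancy.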
ASM Failure Groups - External Redundancy
[Diagram: a disk group containing Disk 1, Disk 2 and Disk 3]
ASM Failure Groups - Normal Redundancy
[Diagram: a disk group with two failure groups; Failure Group 1 contains Disk 1, Disk 2 and Disk 3, and Failure Group 2 contains Disk 4, Disk 5 and Disk 6]
ASM Failure Groups - High Redundancy
[Diagram: a disk group with three failure groups (Failure Group 1 to Failure Group 3), each containing two disks labelled Disk 1 and Disk 2]
Normal Redundancy
1 Disk per Failure Group
[Diagram: Disk 0 (Failure Group 1) and Disk 1 (Failure Group 2) each hold a copy of metadata extents 0 to 3 and data extents 0 to 11; each extent has its primary copy on one disk and its secondary copy on the other]
Normal Redundancy
2 Disks per Failure Group
[Diagram: Disk 0 to Disk 3 arranged in two failure groups of two disks each. Within each failure group one disk holds extents 0, 3, 4, 7, 8, 11, ... and the other holds extents 1, 2, 5, 6, 9, 10, ..., so every metadata and data extent has a primary copy in one failure group and a secondary copy in the other]
High Redundancy
1 Disk per Failure Group
[Diagram: Disk 0, Disk 1 and Disk 2, one in each of Failure Groups 1 to 3, each hold a copy of every metadata and data extent; each extent has a primary, a secondary and a tertiary copy on different disks]
References
Oracle Automatic Storage Management (Oracle Press)
  - Nitin Vengurlekar
  - Murali Vallath
  - Rich Long

What ASM and ZFS Can Do For You
  - Jason Arneil - Nominet

A Closer Look Inside Oracle ASM
  - Luca Canali - CERN

Implementing ASM Without HW Raid
  - Luca Canali - CERN
Thank you for listening