Case Study: A Time Variant Multi

Download Report

Transcript Case Study: A Time Variant Multi

“You Can Do It”
Datawarehouse:
Beginner to Advanced
In Two Hours
by
Arup Nanda
Manager – Database Systems
Starwood Hotels & Resorts International
White Plains, NY
Objectives
• Exploring DW Techniques in Oracle
• Case Study
• Oracle 10G Additions
DB1
Cust1
DB2
Cust11
?
Cust10
Cust2
Cust9
Cust3
Cust8
Cust4
DB3
Datawarehouse
Cust7
Cust5
Cust6
DB4
DB5
DB6
A Real Life Case
•
•
•
•
•
•
•
Claims Datawarehouse
Several Customers/Sources
Several Quarters
Data Volume Was High
Irregular Frequency
Data Comes Often Late
Near Real Time Requirements
Problem of Irregular Data
Detail
Table
Summary
Table
Detail
Table
DBMS_MVIEW.REFRESH (…)
CUST2
Problems
•
•
•
•
•
Incoming Data Irregular
Summary Tables Need Refreshing
Quarters Added Continuously
Archival Requirements Vary Across Customers
Quick Retrieval of Archival Needed
Problems contd.
• Summary on Summary Tables as Materialized
Views
• Need Refresh Whenever New Data Arrives
• Or When Data is Purged/Reinstated
• Customers Added and Deleted Frequently
Objective
• To Minimize Downtime for Refreshes
– Incrementally Refresh
– Partitioning Techniques
•
•
•
•
To Add Customers Easily
To Add Quarters Easily
To Archive Off and Purge Easily and Atomically
To Restore Archives Quickly
Objective contd.
• To have an ETL Setup for Easy Addition of
Objects Such As Tables, Indexes, Mat Views.
• Use Only Available Oracle and Unix Tools
– PL/SQL
– Unix Shell Scripts
– SQL*Plus
Design
• Varying Dimensions –
– Customer
– Quarter
• Composite Partitioning
– Range (for Quarters)
– List (for Customers)
• Local Indexes
Partitioning
• Partitioned on CLAIM_DATE
– RANGE
– Partitioned named YyyQq
– Storage Clauses Not Defined
• Supartitioned on CUST_NAME
– LIST
– Named YyyQq_CustName, e.g. Y03Q3_CUST1
Indexing
• All Indexes Local
CREATE INDEX IN_CLAIM_SUM_01
LOCAL
ON SUMTAB1 (COL1, COL2)…
• No Indexes UNIQUE and GLOBAL
Storage
Each Subpartition – of Index or Table is kept in
separate tablespaces named in the format
Y<Year>Q<Qtr>_<CustName>_DATA
e.g. Y02Q2_CUST1_DATA
Y02Q2_CUST2_DATA
Y03Q3_CUST1_DATA
Table
Index
Customers
Cust3
Y03 Q3
Cust3
Y03 Q3
Quarter
In Tablespace
Y03Q3_CUST3_DATA
In Tablespace
Y03Q3_CUST3_INDX
Tablespace
create tablespace y03q3_cust1_data
datafile
‘/oradata/y03q3_cust1_data_01.dbf’
size 500m
autoextend on next 500m
extent management local
segment space management auto
Table DDL
CREATE TABLE TAB1
( … )
PARTITION BY RANGE (CLAIM_DATE)
SUBPARTITION BY LIST (CUST_NAME)
(
PARTITION Y03Q1 VALUES LESS THAN (TO_DATE(‘2003/04/01’,’YYYY/MM/DD’)),
(
SUBPARTITION Y03Q1_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q1_CUST1_DATA,
SUBPARTITION Y03Q1_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q1_CUST2_DATA,
… and so on for all subpartitions …
SUBPARTITION Y03Q1_DEF VALUES (DEFAULT) TABLESPACE USER_DATA
),
PARTITION Y03Q2 VALUES LESS THAN (TO_DATE(‘2003/07/01’,’YYYY/MM/DD’)),
(
SUBPARTITION Y03Q2_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q2_CUST1_DATA,
SUBPARTITION Y03Q2_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q2_CUST2_DATA,
… and so on for all subpartitions …
SUBPARTITION Y03Q2_DEF VALUES (DEFAULT) TABLESPACE USER_DATA
),
… and so on for all the partitions …
PARTITION DEF VALUES LESS THAN (MAXVALUE),
(
SUBPARTITION DEF_CUST1 VALUES (‘CUST1’) TABLESPACE USER_DATA,
SUBPARTITION DEF_CUST2 VALUES (‘CUST2’) TABLESPACE USER_DATA,
… and so on for all subpartitions …
SUBPARTITION DEF_DEF VALUES (DEFAULT) TABLESPACE USER_DATA
)
)
Index DDL
CREATE INDEX IN_TAB1_01
ON TAB1 (COL1)
LOCAL NOLOGGING
(
PARTITION Y03Q1
(
SUBPARTITION Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_INDX,
SUBPARTITION Y03Q1_CUST2 TABLESPACE Y03Q1_CUST2_INDX,
… and so on for all subpartitions …
SUBPARTITION Y03Q1_DEF TABLESPACE USER_DATA
),
PARTITION Y03Q2
(
SUBPARTITION Y03Q2_CUST1 TABLESPACE Y03Q2_CUST1_INDX,
SUBPARTITION Y03Q2_CUST2 TABLESPACE Y03Q2_CUST2_INDX,
… and so on for all subpartitions …
SUBPARTITION Y03Q2_DEF TABLESPACE USER_DATA
),
… and so on for all the partitions …
PARTITION DEF
(
SUBPARTITION DEF_CUST1 TABLESPACE USER_DATA,
SUBPARTITION DEF_CUST2 TABLESPACE USER_DATA,
… and so on for all subpartitions …
SUBPARTITION DEF_DEF TABLESPACE USER_DATA
)
)
Creating DDLs
Static
Part
create table tab1
(………)
DDL to
Create
Table
Variable
Part
partition y03q1 (
subpartition
y03q1_cust1
tablespace …)
Constraints
Constraints defined as
DISABLE NOVALIDATE RELY
ALTER TABLE … ADD CONSTRAINT …
RELY DISABLE NOVALIDATE;
Constraint
• VALIDATE/NOVALIDATE
– Table TAB1 (Column: STATUS)
– Current Values A, I, F
– Check Constraint: STATUS IN (‘A’,’I’)
• ENABLE/DISABLE
– New Value ‘F’
• RELY
RELY
Reasons
• To Include Relation Information to the
Metadata
• To Enable Query Rewrite
Summary Tab and View
On DW
Summary Table View
CUST_NAME
CLAIM_DATE
PROVIDER_ID
NUM_CLAIMS
NUM_LINES
On
Source
SELECT
‘CUST1’
AS CUST_NAME,
CLAIM_DATE,
PROVIDER_ID,
COUNT(DISTINCT CLAIM_ID) AS NUM_CLAIMS,
COUNT(*)
AS NUM_LINES
FROM ….
GROUP BY …
Casting
SELECT
CAST (CUST_NAME AS VARCHAR2(20))
AS CUST_NAME
FROM <viewname>
CAST (column_name AS datatype (precision))
cust1
Owned by
Cust Schema
Index of
Temporary
Table
INDEX
View
Filter:
Where CLAIM_DATE is
in that quarter
Temporary
Table
Massaging
Analyzing
TABLE
Summary
Table
For Customer
Cust1 and
Quarter Q1
DW
cust
View
Old Sub
Partition
Old Sub
Partition
INDEX
TABLE
ALTER TABLE …
EXCHANGE SUBPARTITION subpartname
WITH TEMPTABLE
INCLUDING INDEXES
DW
Technique
• Not Using DBMS_MVIEW.REFRESH
• MV is always STALE
Temp Table
CREATE TABLE T1_Y03Q1_CUST1
TABLESPACE Y03Q1_CUST1_DATA
PARALLEL 8 NOLOGGING
AS
SELECT …
FROM CUST1.VIEW1@DB1
WHERE CLAIM_DATE >=
add_months(trunc(to_date(‘03','RR'),'YYYY'),
3*(to_number(‘1')-1))
and batch_date <
last_day(add_months(trunc(
to_date(‘03','RR'),'YYYY'),
3*(to_number(‘1')) - 1 )) + 1
Script
CREATE TABLE T1_Y&&YY.Q&&Q._&&CUST
TABLESPACE Y&&YY.Q&&Q._&&CUST._DATA
PARALLEL 8 NOLOGGING
AS
SELECT …
FROM &&CUST..VIEW1@&&DBLINK
WHERE CLAIM_DATE >=
ADD_MONTHS(TRUNC(TO_DATE('&&YY','RR'),'YYYY'),
3*(TO_NUMBER('&&Q')-1))
AND BATCH_DATE <
LAST_DAY(ADD_MONTHS(TRUNC(
TO_DATE('&&YY','RR'),'YYYY'),
3*(TO_NUMBER('&&Q')) -1 )) + 1
External Table
Reason
Source is a non-Oracle DB, e.g. DB2
Source is External, no DB Link Allowed
Fixed Format –vs- Delimited
Fixed Format
Faster, Easier
More Space
Delimited
Less Space
Slower, Slightly More Complex
Massaging
• Removing NOT NULL Constraints
• Making Datatypes Consistent
– The CAST operation converts NUMBER(m,n) to
NUMBER
– cast(col1 as number(10,2)) as
col1_m
– COL1
NUMBER(5,2)
– COL1_M NUMBER
Analyzing
• Using DBMS_STATS.GATHER_TABLE_STATS
• PARALLEL Degree
dbms_stats.gather_table_stats (
ownname => ‘DWOWNER',
tabname => '&&TABNAME',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
degree => dbms_stats.default_degree,
cascade => TRUE );
Mat Views
MVs Created as Tables
CREATE TABLE MV_SUMMTAB1
Storage clauses just like the underlying table
CREATE MATERIALIZED VIEW
MV_SUMMTAB1
ON PREBUILT TABLE
AS SELECT ……
http://www.proligence.com/painless_alter.pdf
Query Rewrite
Table SUM_CLAIMS
PROVIDER_ID, STATE, TYPE, TOT_AMT
Table MV_SUM_CLAIMS
PROVIDER_ID, STATE, SUM(TOT_AMT) TOT_AMT
GROUP BY PROVIDER_ID, STATE
SELECT SUM(TOT_AMT) FROM SUM_CLAIMS
SELECT SUM(TOT_AMT) FROM
MV_SUM_CLAIMS
Query Rewrite
Init.ora Parameters
query_rewrite_enabled='TRUE'
query_rewrite_integrity='STALE_TOLERATED‘
ENFORCED – Rewrite only if guaranteed
TRUSTED – Uses only if RELY
STALE_TOLERATED – Even if not RELY
Checking QR
dbms_mview.explain_rewrite (
‘select cust_name, count(*) from summtab1 group by
cust_name’ );
select message from rewrite_table;
QSM-01033: query rewritten with materialized view,
MV_SUMMTAB1
QSM-01101: rollup(s) took place on mv, MV_SUMMTAB1
Design …
MV_* subpartitions are on the same tablespace as
the parents.
Subparts of MV_SUMMTAB1_0? are in the same
TS as SUMMTAB1
Subparts of MV_SUMMTAB2_0? in SUMMTAB2
Quarter
MV2
MV1
PARENT
TableSpace1 TableSpace2
MV and Parents
• Partition Pruning
• Partition-wise Joins
• Partition Independence
Adding Quarters/Customers
• Partition
– Default Partition – VALUES LESS THAN
(MAXVALUE)
• Subpartition
– Default Subpartition – VALUES (DEFAULT)
Cust1
Cust2
Cust3
DEF
Cust1
Cust2
Cust3
DEF
Cust1
Cust2
Cust3
Cust4
DEF
alter table … split subpartition
Cust1
Cust2
Cust3
DEF
Cust1
Cust2
Cust3
DEF
alter table … split partition
Backup/Restore
• Backup
– ALTER TABLESPACE <TSName> READ ONLY
– Copy the files to tape/CD.
• Restore
– Copy the file back into the directory
– ALTER TABLESPACE <TSName> RECOVER
Archival/Purge
Table
SP1
SP2
SP3
SP4
Table4
Table
SP1
SP2
SP3
SP4
Table4
Table
SP1
SP2
SP3
Table4
Archival/Purge
CREATE TABLE S1_Y<yy>Q<q>_<CustName>
TABLESPACE Y<yy>Q<q>_<CustName>_<TSType>
AS SELECT * FROM SUMMTAB1 WHERE 1=2
/
CREATE INDEXES, CONSTRAINTS, etc.
/
ALTER TABLE SUMMTAB1 EXCHANGE
SUBPARTITION Y<yy>Q<q>_<CustName>
WITH TABLE Y<yy>Q<q>_<CustName>
INCLUDING INDEXES
/
Check TTS
ALTER TABLESPACE
Y<yy>Q<q>_<CustName>_<TSType>
READ ONLY;
DBMS_TTS.TRANSPORT_SET_CHECK (
<DataTS>,<IndexTS>) ;
SELECT * FROM
TRANSPORT_SET_VIOLATIONS;
Transport TS
Export Parameter File
TRANSPORT_TABLESPACE=y
TTS_FULLCHECK=Y
FILE=‘<FileLocation>/exp<TS>.dmp’
TABLESPACES=(<DataTS>, <IndexTS>)
Copy the exp.dmp and Datafiles to tape/CD.
Purge
Drop Subpartition
Drop the Tablespace
DROP TABLESPACE <TSName> INCLUDING
CONTENTS AND DATAFILES;
Restore
• ALTER TABLE SPLIT SUBPARTITION
<DefaultSP>
• Copy Datafiles & Export Dump Files from
CD/Tape
• Import Parameter File
TRANSPORT_TABLESPACES=Y
TABLESPACES=(<DataTS>,<IndexTS>)
DATAFILES=(…)
Minimizing Refresh Unit
• Months – instead of quarters refreshed at a time.
• Last Quarter Split into a Subpartition per Month
• Naming Convention
– YyyQqMmm
– Y03Q3M09
• Merge Subpartition
Merging Subpartitions
• Index Subpartitions Created in User’s Default
Tablespace
• Subpartition Template
ALTER TABLE SUMTAB1 ADD SUBPARTITION
TEMPLATE
Resumable Statement
• When?
– Running Large Report Jobs
– Creating Large Indexes
• ALTER SESSION ENABLE RESUMABLE NAME ‘Job1’;
• View DBA_RESUMABLE
– NAME – Name specified in ALTER SESSION
– COORD_SESSION_ID – Coord Session in PQ
– SQL_TEXT – The text of the SQL
– STATUS - RUNNING, SUSPENDED, ABORTED,
ABORTING, TIMEOUT
– ERROR_NUMBER/ERROR_MSG
Objectives Revisited
• To Minimize Downtime for Refreshes
– Incrementally Refresh
– Partitioning Techniques
•
•
•
•
To Add Customers Easily
To Add Quarters Easily
To Archive Off and Purge Easily and Atomically
To Restore Archives Quickly
Oracle 10G
• Transportable Tablespaces Can Be Reinstated
At a Different Operating System
– Can be used for Restoring to a Different OS
• Tablespaces Can Be Renamed
– Restoring Tablespace of the Same Name
• Multiple Temporary Tablespace
– For Large Index Creation, Sorting, etc.
Oracle 10G contd.
• Partition Change Tracking Support for List
Partitioning
• Query Rewrites Can Use Multiple MVs
• OEM Shows All Partitioning Features
• Data Pump
– Export/Import on Steroids
– Parallel Operation
Oracle 10G contd.
• External Table Download
– A Utility to Create File from Table Data
CREATE TABLE …
ORGANIZATION EXTERNAL
AS SELECT * FROM <a query>
– Platform Independent File
– Can Be Used In External Tables
Thank You!
Updated Copy Can Be Found In
www.proligence.com
[email protected]