Lex7DataWarehouse

Download Report

Transcript Lex7DataWarehouse

Oracle Utility: SQL*Loader

Input Data Files Loader Control File SQL*Loader Discard Files Log File Bad Files Data Base Tables Indexes

KO OK Discard Data Base Record SQL*Loader Field Processing SQL*Loader When Clause SQL*Loader DML Inserting Bad

sqlldr ...

Valid Keywords: Userid -- ORACLE username/password control -- Control file name log -- Log file name bad -- Bad file name data -- Data file name discard -- Discard file name discardmax -- Number of discards to allow (Default all) skip -- Number of logical records to skip (Default 0) load -- Number of logical records to load (Default all) errors -- Number of errors to allow (Default 50) rows -- Number of rows in conventional path bind array or between direct path data saves (Default: Conventional path 64, Direct path all) silent -- Suppress messages during run (header,feedback,errors,discards,partitions) direct -- use direct path (Default FALSE) parfile -- parameter file: name of file that contains parameter specifications

-- Loads EMP records from first 23 characters -- Creates and loads PROJ records for each PROJNO listed -- for each employee LOAD DATA INFILE ’ulcase5.dat’ BADFILE ’ulcase5.bad’ DISCARDFILE ’ulcase5.dsc’ REPLACE INTO TABLE emp (empno POSITION( 1: 4) INTEGER EXTERNAL, ename POSITION( 6:15) CHAR, deptno POSITION(17:18) CHAR, mgr POSITION(20:23) INTEGER EXTERNAL) INTO TABLE proj -- PROJ has two columns, both not null: EMPNO and PROJNO WHEN projno != ’ ’ (empno POSITION( 1: 4) INTEGER EXTERNAL, projno POSITION(25:27) INTEGER EXTERNAL) -- 1st proj INTO TABLE proj WHEN projno != ’ ’ (empno POSITION( 1: 4) INTEGER EXTERNAL, projno POSITION(29:31) INTEGER EXTERNAL) -- 2nd proj INTO TABLE proj WHEN projno != ’ ’ (empno POSITION( 1: 4) INTEGER EXTERNAL, projno POSITION(33:35) INTEGER EXTERNAL) -- 3rd proj

External Tables Concepts

The

Oracle9i

external tables feature is a complement to existing SQL*Loader functionality. It allows you to access data in external sources as if it were in a table in the database.

External tables are

read-only

.

No data manipulation language (DML) operations or index creation is allowed on an external table.

Therefore, SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table.

To use the external tables feature, you must have some knowledge of the file format and record format of the datafiles on your platform.

You must also know enough about SQL to be able to create an external table and execute DML statements that access the external table.

CREATE TABLE empxt (empno NUMBER(4), ( ( ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4) ) ORGANIZATION EXTERNAL TYPE ORACLE_LOADER DEFAULT DIRECTORY dat_dir ACCESS PARAMETERS records delimited by newline badfile dat_dir:'empxt%a_%p.bad' logfile dat_dir:'empxt%a_%p.log' fields terminated by ',' missing field values are null ( empno, ename, job, mgr ) ) LOCATION ('empxt1.dat') ) REJECT LIMIT UNLIMITED;

Data Warehousing

Discoverer Forms& Reports Pro*C S.Q.L.

PL/SQL Development Tool DataWarehouse

fasi del processo di integrazione...

Integrazione di piattaforme e applicazioni:

Intranet

OLAP su Web

Data Warehouse Definition

Un DWH raccoglie i dati da diverse sorgenti trasformandoli in un formato consistente ed omogeneo.

Ogni Punto nel “ Cubo ” contiene “Fact Data” per una particolare combinazione di n “Dimension Data”. Nel caso a 3 dimensioni riportato a lato in particolare troviamo dati statistici organizzati per Product, Market, Time.

data entry data entry interrogazioni e reports

Sistema transazionale

O.L.T.P.

Data Mart

Data Warehouse

E xtraction, T ransformation L oading reports integrati ed analisi on-line

Ambiente di reporting e Decision Support System

O.L.A.P.

Strumenti per l’analisi on-line dei dati con caratteristiche di efficienza, facilità d’uso ed elaborazione grafica

Sistema transazionale

Database operazionale

Decision Support System Reporting tools

Data Warehouse

Estrazione

Meta Data

Trasporto Trasformazione

Architettura tipica di un DWH

Quanto abbiamo venduto l’anno scorso?

Chi ha contribuito all’aumento delle vendite?

Quanto si prevede di vendere quest’anno in base al trend attuale?

IT Sviluppa Utenti Consultano Cosa è successo?

Reporting Standard Oracle Reports Utenti evoluti Analisti Perchè è successo?

Analisi e Domande ad hoc Oracle Discoverer Cosa accadrebbe se?

Analisi Avanzate Oracle Express

Differenze di un data warehouse rispetto ad un database operazionale:

• differenti requisiti di hardware, storage e tuning • differente modellazione dei dati • diversi strumenti di interrogazione e reporting

Data Mart

• Schema composto da una o più tabelle dei

FATTI

contenenti l’elemento quantitativo e diverse tabelle delle gerarchia.

DIMESIONI

rappresentanti un tipo di • Schema a stella (

star schema

) • Più stelle collegate prendono il nome di costellazioni

Star queries

Il processo di join di Oracle per N tabelle è quello di creare dei risultati intermedi considerando in join due tabelle per volta (pair-wise).

Con l’ottimizzazione delle star query Oracle si riserva di mettere in join la tabella dei fatti per ultimo essendo quella di maggiori dimensioni.

Schema a Stella (

Star Schema

)

Oracle9i

Star Schema Example

Data Warehousing con ORACLE

• Viste materializzate / Snapshots • Tabelle partizionate • Tabelle Organizzate ad Indice • Cluster • Star query • Indici Function Based • Read only tablespaces

DataBase Owner Oracle Architectural Components (Logical) Tablespace DataFile Oracle Block O.S. Block Partition Schema Segment Table Index Cluster Snapshot Extent

( invoice_no NUMBER NOT NULL, sale_year NUMBER NOT NULL, sale_month NUMBER NOT NULL, sale_day NUMBER NOT NULL ) PARTITION BY RANGE (sale_year, sale_month, sale_day) ( PARTITION 01) sales_q1 VALUES LESS THAN (1998, 04, PARTITION 01) TABLESPACE tsa STORAGE (……….), sales_q2 VALUES LESS THAN (1998, 07, PARTITION 01) TABLESPACE tsb STORAGE (……….), sales_q3 VALUES LESS THAN (1998, 10, PARTITION 01) TABLESPACE tsc STORAGE (……….), sales_q4 VALUES LESS THAN (1999, 01, TABLESPACE tsd STORAGE (……….) );

Maintaining Partitions This section describes how to perform the following specific partition maintenance operations: 1.

2.

3.

4.

5.

6.

7.

8.

9.

Moving Partitions Adding Partitions Dropping Partitions Coalescing Partitions Modifying Partition Default Attributes Truncating Partitions Splitting Partitions Merging Partitions Exchanging Table Partitions

Index-Organized Tables

Regular table access IOT access ROWID Non-key columns Key column Row header

Index-Organized Tables Compared with Regular Tables

 Faster key-based access to table data  Reduced storage requirements  Main restrictions: Must have a primary key Cannot use unique constraints Cannot be clustered

Creating Index-Organized Tables

SQL> create table sales 2 (office_cd 3 ,qtr_end 4 ,revenue number(3) date number(10,2) 5 ,constraint sales_pk 6 PRIMARY KEY (office_cd,qtr_end) 7 ) 8 ORGANIZATION INDEX 9 tablespace indx 10 storage (…);

Cluster

Use clusters to store one or more tables that: 1) Are primarily queried 2) Not predominantly inserted into or updated 3) Which the queries often join data of multiple tables in the cluster.

CREATE CLUSTER emp_dept (deptno NUMBER(3)) PCTFREE 5 TABLESPACE users STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0 ); CREATE TABLE dept (deptno NUMBER(3) PRIMARY KEY, . . ) CLUSTER emp_dept (deptno); CREATE TABLE emp (empno NUMBER(5) PRIMARY KEY, ename VARCHAR2(15) NOT NULL, . . .

deptno NUMBER(3) REFERENCES dept) CLUSTER emp_dept (deptno);

CREATE INDEX emp_dept_index ON CLUSTER emp_dept TABLESPACE users STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0) PCTFREE 5;

Index Function Based

Features of Function-Based Indexes Function-based indexes allow you to: 1) More powerful 2) Precompute the value of a computationally intensive function and store it in the index 3) Increase the number of situations where the optimizer can perform a range scan instead of a full table scan You must have the following initialization parameters defined to create a function-based index: *) QUERY_REWRITE_ENABLED set to TRUE *) COMPATIBLE set to 8.1.0.0.0 or a greater value

Index Function Based

Example: Function-Based Index for Case Insensitive Searches The following statement creates function-based index idx on table emp based on an uppercase evaluation of the ename column: CREATE INDEX idx ON emp (UPPER(ename)); Now the SELECT statement uses the function-based index on UPPER(ename) To retrieve all employees with names that start with JOH: SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%'; Example: Precomputing Arithmetic Expressions with a Function-Based Index This statement creates a function-based index on an expression: CREATE INDEX idx ON t (a + b * (c - 1), a, b); SELECT statements can use either an index range scan (in the following SELECT statement the expression is a prefix of the index) or index full scan (preferable when the index specifies a high degree of parallelism).

SELECT a FROM t WHERE a + b * (c - 1) < 100;

Discoverer Architecture

Viewer edition User edition Administration edition E nd U ser L ayer Business Area Database (OLTP, Data Warehouse, Data Mart) Database Complexity is Hidden From Users

Data Base DWH

Data Warehouse Tools

Low-end OLAP Tools per Simple Queries e Reports ad hoc

Adeguati per rispondere a domande della tipologia “che cosa?” Powerful Multi-Dimensional Analysis OLAP Tools

Supportano “drill down” in “detail data” per rispondere a domande della tipologia “perche?” e “come?”

OLAP Tools

Slicing del “Cubo”

Pivot

Ruotare l’approccio visuale di un punto del cubo stesso al fine di ottenere una nuova prospettiva 

Drill-Down

Data Reconciliation Steps

Data Reconciliation Process

Capture

Static - initial load

Incremental - ongoing update Scrub or data cleansing

Pattern recognition and other artificial intelligence techniques Transform

Convert the data format from the source to the target system

Record-Level Functions

Selection  Joining Aggregation (for data marts)

Field-Level Functions

Single-field transformation Multi-field transformationn