Do You Need an ETL Tool - New Zealand Oracle Users Group
Download
Report
Transcript Do You Need an ETL Tool - New Zealand Oracle Users Group
Do You Need an
ETL Tool?
Ben Bor
NZ Ministry of Health
1
Ben Bor
Over 20 years in IT, most of it in Information Management
Oracle specialist since version 5
Involved in Business Intelligence for over 10 years
Consulted the world’s largest corporations
Presents regularly on Information Management
Was annual Guest Lecturer at Sussex University
2
Contents
What is ETL
ETL tools vs. ‘handcraft’ code
PL/SQL techniques
3
What is ETL
ETL = Extract, Transform and Load:
Any source, target ;
Built-in complex transformations
Point-to-point vs. hub-and-spoke
4
Traditional ETL
5
Our Own ETL Requirements
Flat
Files
PL/SQL
Data Quality
6
PL/SQL
Travel Company Example
Estimated Volumes
150,000 Travel Agencies
500 Groups
50 Consortia
500,000 Consultants
3 million Bookings
1 million Brochure Requests
Key:
Aurora
Progress
Report
Tropics
future system
existing system
feed back
CRM
400,000 Questionnaires
Oracle Staging Area
CTQ
FileMaker
1 million
35,000
Brochure Reqs
Website
Others
Supplier
Supplier Types
Employees
3rd Party
Data
Load
Area
DQE
Cleansed
Data
No existing
process
Oracle
Financials
Business
Rules
Cleansed
250,000
Australian
Reservations
data
Calypso
7
Manual
Data
Effort
Audit
Report
Business Group
3rd Party
Marketing
Tools or Handcraft?
ETL Advantages:
Handcrafting Advantages:
Graphic User Interface
No limitation
Automatic documentation
reuse existing code & non
ETL
Off-the-shelf set of ready-touse transformations
No specific methodology
Built-in scheduler
No license cost
Database Agnostic
No impact on infrastructure
Transportable
Release & CodeManagement by script
8
Oracle ETL Facilities
External Tables
Merge
SQL Loader
PL/SQL
Database links
9
Why Use PL/SQL
Integrated environment (no installation required)
Available resources
Reuse code ‘snippets’
Good performance
Integration with and control of the database
10
PL/SQL Tips and Techniques
1. Quality
2. Techniques
3. Tricks
11
Quality
12
What is Quality?
[1] “Totality of characteristics of an entity that bears on its
ability to satisfy stated and implied needs.“
[The ISO 8204 definition for quality]
13
Quality 2
[2] Quality is a collection of “ilities”:
Reliability
- operate error free
Modifiability
- have enhancement changes made easily
Understandability - understand the software readily
Efficiency
- the speed of the software
Usability
- use the software easily
Testability
- construct and execute test cases easily
Portability
- transport the software easily
14
Quality 3
[3] “All the things you do today in your software
development, in order to bear fruit in the future.”
15
Standards & Conventions
Use meaningful names
V_Number_Of_Items_In_Array vs. i or no_itms
Distinguish between types:
V_
Variable
a_
Parameter
C_
Constant
G_
Global constant
16
Using Packages
Central package with utilities and all output
All error messages and numbers
All common constants (date format etc’)
Global variables
Statistics data
Other packages encapsulate related logic
Within package:
Procedures & functions have:
Meaningful name
A99_ prefix. A is the level (A highest). 99 unique ID
17
Example: procedure and variable naming
XXX_Write_Flat_File.U03_Write_Record_To_CSV(
a_File_Handle,
C_Field_Delim,
C_Field_Separ,
C_Record_Separ,
RM_REFERENCE_rec.REFTYPE,
RM_REFERENCE_rec.CODE,
RM_REFERENCE_rec.DESCRIPTION,
To_Char(RM_REFERENCE_rec.ISDEFAULT , '9')
);
18
Techniques
Error logging
Autonomous Transaction
Run statistics
Release mechanism
Overloading
19
Error Logging Technique
Global variables keep key information:
Record ID
Run ID
Location in code
Local error trapping decides severity and error code.
All error trapping passed up.
20
Error Logging Structure
TABLE ERROR_LOG(
ERR_TIME
DATE,
ERR_NUM
INTEGER,
SOURCE_URN
VARCHAR2(20),
SOURCE_SYSTEM_ID
VARCHAR2(5),
PLACE_IN_CODE
VARCHAR2(64),
ERR_LOCATION
VARCHAR2(255),
ERR_DESCRIPTION
VARCHAR2(512),
SEVERITY
NUMBER(6)
)
ERR_TIME
18-OCT-02 10:04:52
ERR_NUM
1001
SOURCE_URN
223010913
SOURCE_SYSTEM
CRS
PLACE_IN_CODE
In FLIP_PKG B06 ; 6(utils A08)
ERR_LOCATION
A08_Lookup_Type
ERR_DESCRIPTION
No match found for [Plan_Code] value [C3]
SEVERITY
10
21
Autonomous Transaction
-PROCEDURE
--
===================
E00_write_error_log(
===================
a_err_num
IN
a_Severity
IN
a_err_location
IN
a_err_description
IN
integer
Integer
VarChar
VarChar
,
,
,
)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
V_Place_In_Code
DW_Process.Error_Log.Place_In_Code%Type;
BEGIN
-END
--
V_Place_In_Code := G_Place_In_Code || '(utils ' || G_Place_In_UTILS_Code || ')' ;
INSERT INTO DW_Process.Error_Log
(
err_time,
err_num,
Severity,
BOROUGH_ID,
SOURCE_URN,
SOURCE_SYSTEM_ID,
Place_In_Code,
err_location,
err_description
)
VALUES
(
sysdate,
a_err_num,
a_Severity,
G_BOROUGH_ID,
G_SOURCE_URN,
G_SOURCE_SYSTEM_ID,
V_Place_In_Code,
a_err_location,
a_err_description
) ;
COMMIT ; -- commit the autonomous transaction, outside transaction is unaffected.
G_Stats_Rec.TOTAL_NO_OF_ERRORS := G_Stats_Rec.TOTAL_NO_OF_ERRORS + 1 ;
===================
E00_Write_Error_Log ;
===================
22
Run Statistics
G_Stats_Rec is a record with all the statistics fields
Defined in the central package (therefore resident in memory)
It is updated by the writing procedures (all central)
It is written out at the end of the run
23
Release Mechanism
Table of ‘release notes’
Each package has C_Version constant updated each
release
‘Show_Version’ scripts display versions and notes
Results shipped with each release
24
Remove Spaces
-===================
FUNCTION A04_Remove_Spaces(
-===================
a_Instring
IN
Varchar )
Return Varchar
IS
/*
** Removes all the spaces from a string, leaving the rest of the printable characters
*/
BEGIN
G_place_in_UTILS_code := 'A04' ; -- For use by the error trapping routine
RETURN TRANSLATE( a_Instring,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890’ ||
'\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*() ',
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890’ ||
'\|,<.>/?#~@;:[{]}=+-_`¬!"£$%^&*()' ) ;
-===================
END
A04_Remove_Spaces ;
-===================
25
Strip Leading non-numerics
-============================
FUNCTION
F09_Strip_Leading_non_digits(
-============================
a_String
IN
VARCHAR2 )
RETURN VARCHAR2
IS
/*
**
Remove leading non-digits from the input.
**
Example: Input string: 'abcde12345edcba'
**
Output string: '12345edcba'
*/
v_string
Varchar2(4000) ;
v_first_digit_pos
Integer ;
BEGIN
-- Replace all digits by 0
v_string := Translate(a_String, '1234567890' , '0000000000') ;
v_first_digit_pos := instr(v_string,'0') ;
RETURN F01_Right(a_String, v_first_digit_pos ) ;
-============================
END
F09_Strip_Leading_non_digits;
-============================
26
Overloading
-PROCEDURE
--
=======================
U03_Write_Record_To_CSV(
=======================
a_File_Handle
IN
utl_file.file_type ,
a_Field_Delim IN
VarChar
a_Field_Separ
IN
VarChar
a_Record_Separ IN
VarChar
a_String1
IN
VarChar :=
a_String2
IN
VarChar :=
a_String3
IN
VarChar :=
.
.
.
)
, -- the quotes, for CSV
, -- the comma , for CSV
, -- the Carriage Return + Line feed , for CSV
G_default_Value
,
G_default_Value
,
G_default_Value
,
IS
BEGIN
IF a_String1 = G_default_Value THEN
GOTO End_Of_Record ; END IF ;
U02_Write(a_File_Handle, a_Field_Delim || a_String1 || a_Field_Delim) ;
IF a_String2 = G_default_Value THEN
GOTO End_Of_Record ; END IF ;
U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String2 || a_Field_Delim ) ;
IF a_String3 = G_default_Value THEN
GOTO End_Of_Record ; END IF ;
U02_Write(a_File_Handle, a_Field_Separ || a_Field_Delim || a_String3 || a_Field_Delim ) ;
.
.
.
<<End_Of_Record>>
U01_Write_Line(a_File_Handle, a_Record_Separ) ;
-=======================
END
U03_Write_Record_To_CSV ;
---------------------------------------------------------------------------------------------------------------------------------------------------------------=======================
27
Summary
ETL or PL/SQL? Your choice.
Consider:
Overall cost
‘Politics’
Convenience
Portability
Speed of development
Reusability
IF PL/SQL : ensure Quality
28
Thank you !
29
30
Thank you !
I can be contacted at [email protected]
31