Transcript Document

Data Warehousing and Business Intelligence
On IBM i
Alan Jordan
Coglin Mill, Rochester, MN
[email protected]
STATUS User Group Meeting, November 9, 2010
Copyright, Coglin Mill, 2010
What is Business Intelligence?
REPORTING
WHAT HAPPENED?
ANALYSIS
WHY DID IT HAPPEN?
PREDICT
WHAT WILL HAPPEN?
Query/
Reporting
OLAP
Data Mining
Historical Data (Data Warehouses/Marts)
Trending/OLAP
MONITOR
WHAT JUST
HAPPENED?
Dashboards/
Scorecards
Real-Time Data (EAI)
Data Mining
(Predictive Analytics)
Business Performance
Management
Cleansed, Consolidated Data
Source: The Data Warehousing Institute, Smart Companies in the 21st Century, July 2003
OS/EAI-Operation Systems/Enterprise Application Integrations
Copyright, Coglin Mill, 2010
Enterprise Data Warehouse Architecture
Operational System(s)
Data Propagation
Extraction, Transformation and Loading
Data Staging Area
l
na
o
ti
ra ort
e
p p
l o sup
a
ic
ct sion
a
T ci
de
ODS
Data Warehouse
Data
Mart
Data
Mart
Data
Mart
Mfg
Finance
Sales
PC or Browser Web Visualization Products
Copyright, Coglin Mill, 2010
Cleansed,
Transformed
Data
BI
Applications
What will most BI tools do for you?
They allow you to visually represent your data:
< This
can be replaced by this >
Copyright, Coglin Mill, 2010
What will most BI tools do for you?
< or this
< or this
or this >
In other words, they give you powerful data presentation capabilities
Copyright, Coglin Mill, 2010
What will most BI tools do for you?
They often provide OLAP (On-line Analytical Processing) capabilities:
Copyright, Coglin Mill, 2010
Audience Participation
1.
Do you have a Data Warehouse?
2.
Do you have a front end BI tool?
3.
Has the tool dramatically transformed your BI
Reporting? What are the major benefits you
have enjoyed?
4.
What challenges do you still face?
Copyright, Coglin Mill, 2010
What issues do they usually not solve?
1
Do not understand where/how the data is stored
2
Operational data is in many tables – hard to join
3
Cryptic data values (codes, flags, embedded values)
4
Poor/unknown data quality
5
Multiple apps/systems – difficult to merge data
6
Query Performance/Impact on production system
7
Constantly changing requirements
Copyright, Coglin Mill, 2010
Issues
1
Do not understand where/how the data is stored

The database was not designed for YOU to access



There is probably no ‘user manual’ for the tables


The user manual/documentation is for the application
There is probably no consistency of design





It was designed for the software application (ERP, MRP)
Column names are not meaningful
The application grew over many years
Different developers
Merger of different applications
Redundant fields reused for different purpose
Significant number of tables/columns

Many are not useful for BI
Copyright, Coglin Mill, 2010
Issues
2
Operational Data is in many tables – hard to join

Principle of 3rd Normal Form



Join may require data manipulation


Avoid redundancy of data
Results in many tables
Character to numeric
May not be exact join

Join by ‘closest to’ value
Copyright, Coglin Mill, 2010
Issues
3
Cryptic/difficult data values

Legacy of costly disk



Embedded values and conditional rules





Why use 10 bytes when 1 or 2 will do
Maybe even use bits instead of bytes
2nd character of column X means ..
If column Y = ‘S’, value Z must be multiplied by -1
If record type is ‘1’, there must be a matching record in table B.
If type is ‘2, there may be a record.
Dates



Your dates are just numbers: how do you report by quarter or add a
month to a date?
How do you handle a date of zero, or all 9s?
Converting numeric dates to date data type in a query is very costly
Copyright, Coglin Mill, 2010
Example
T01045P
R02126P
AGFRCA
AGAC3EE
AGRRWA
AG6RYHA
AGR22PQ
AGDGSF
AGVYHA
AGVFSS
AGVGRE
AGVUY2
AGMPR2
AGVR2B
AGVR3B
AGACBEE
AGACP2R
AGRQAU1
AGGSBB
AGGDMB
AGMAR2
AGAC3EA
AG6TTHA
AGRSAPQ
AGHISF
3P 0
6P 0
5A
1A
9P 0
6P 0
14A
12A
3P 0
5P 2
2A
2A
2A
1A
10A
5A
1A
8A
1A
50A
1A
6P 0
6P 0
DSFTCA
DSRTBB
DSRTTB
DSMNTI
DSVB1B
DSVB2B
DSYT1LO
DSYT1LR
DSRRWA
DS6TYHA
DSTIIPQ
DSDRTF
DSVBHA
DSVBSS
DSVBPE
DSVBYI
DSMNTI
DSVR2B
DSVR3B
DSYT2WL
DSYTWLT
DSRRYUQ
T00032P
3P 0
5A
5A
1A
1A
1A
50A
12A
5A
1A
3P 0
6P 0
1A
2A
3A
5P 2
25A
25A
25A
12A
12A
6A
6 character field names
(RPG III legacy)
T05001P
FPPTWLT
FPLLPUQ
FPFTCA
FPLTTB
FPTNTP
FPYB1B
FPTNTP
FPYL2B
FPYLPB
12A
6A
1P 0
5A
1A
1A
25A
1P 0
25A
The database is designed to support
transactions, not query access!
Use of 3rd normal form to avoid
redundancy results in many tables.
Copyright, Coglin Mill, 2010
DFFTCA
DFRTBB
DFRTTB
DFMNTI
DFTG1B
DFTG2B
DFTG3B
DFTG4B
DFMNEE
DFMNEF
DFRERP
DFWELF
DFWILF
DFWILR
DFWILS
DFWILT
DFQI1W
DFQ2IW
DFTRES
DFYT1LL
DFYT1LO
DFYT1LR
DFRRWA
DF6TYHA
DFTIIPQ
DFDRTF
DFDRTG
DFDRTH
DFTPPL
DFTINM
DFTIR2
DFTIGL
DFTTDT
DFTTED
DFHHIJ
DFHHIK
DFTYHI
DFTYIA
DFTYKN
DFTTWK
DFTGHA
DFTGSS
DFTGPE
DFTGYI
T01046P
3P 0
5A
5A
1A
1A
1A
1A
1A
25A
11P 2
11P 2
11P 2
11P 2
11P 2
11P 2
11P 2
5A
3A
10A
45A
12A
12A
5A
1A
1P 0
6P 0
6P 0
6P 0
1P 0
3P 0
30A
12A
6P 0
6P 0
4P 2
4P 2
5P 2
1A
1A
1A
1A
2A
3A
5P 2
KSFTCA
KSGSBB
KSGDMB
KSMARI
KSYT3LA
KSYT3LE
KSRRWA
KS6TYHA
KSTIIPQ
KSDGSF
KSVYHA
KSVFSS
KSVGTE
KSVUYI
KSMPTI
KSVR2B
KSVR3B
KSYTBEL
KSYTPIT
KSRQAU1
First 2 characters are file
prefix – so only 4
characters left for the
actual field name!
3P 0
5A
5A
1A
50A
6P 0
5A
1A
9P 0
6P 0
2A
2A
3P 0
5P 2
2A
2A
2A
10A
10A
5A
T03140P
TLFTCA
TLRTBB
TLRTTB
TLTNT3
TLKB1B
TLKB2B
TLTNT3
TLKR2B
TLKR3B
TLPT2WL
TLPTWLT
TLRRPUQ
3P 0
5A
5A
1A
1A
1A
25A
25A
25A
12A
12A
6A
Example
T01045P
R02126P
AGFRCA
AGAC3EE
AGRRWA
AG6RYHA
AGR22PQ
AGDGSF
AGVYHA
AGVFSS
AGVGRE
AGVUY2
AGMPR2
AGVR2B
AGVR3B
AGACBEE
AGACP2R
AGRQAU1
AGGSBB
AGGDMB
AGMAR2
AGAC3EA
AG6TTHA
AGRSAPQ
AGHISF
3P 0
6P 0
5A
1A
9P 0
6P 0
14A
12A
3P 0
5P 2
2A
2A
2A
1A
10A
5A
1A
8A
1A
50A
1A
6P 0
6P 0
The 1st character of this
column indicates the
Sales Region
Copyright, Coglin Mill, 2010
DSFTCA
DSRTBB
DSRTTB
DSMNTI
DSVB1B
DSVB2B
DSYT1LO
DSYT1LR
DSRRWA
DS6TYHA
DSTIIPQ
DSDRTF
DSVBHA
DSVBSS
DSVBPE
DSVBYI
DSMNTI
DSVR2B
DSVR3B
DSYT2WL
DSYTWLT
DSRRYUQ
T00032P
3P 0
5A
5A
1A
1A
1A
50A
12A
5A
1A
3P 0
6P 0
1A
2A
3A
5P 2
25A
25A
25A
12A
12A
6A
FPPTWLT
FPLLPUQ
FPFTCA
FPLTTB
FPTNTP
FPYB1B
FPTNTP
FPYL2B
FPYLPB
12A
6A
1P 0
5A
1A
1A
25A
1P 0
25A
T01046P
3P 0
5A
5A
1A
1A
1A
1A
1A
25A
11P 2
11P 2
11P 2
11P 2
11P 2
11P 2
11P 2
5A
3A
10A
45A
12A
12A
5A
1A
1P 0
6P 0
6P 0
6P 0
1P 0
3P 0
30A
12A
6P 0
6P 0
4P 2
4P 2
5P 2
1A
1A
1A
1A
2A
3A
5P 2
These two columns hold the
same ‘value’ – but only one
of them is reliable
These are dates in
*CJUL format
T05001P
DFFTCA
DFRTBB
DFRTTB
DFMNTI
DFTG1B
DFTG2B
DFTG3B
DFTG4B
DFMNEE
DFMNEF
DFRERP
DFWELF
DFWILF
DFWILR
DFWILS
DFWILT
DFQI1W
DFQ2IW
DFTRES
DFYT1LL
DFYT1LO
DFYT1LR
DFRRWA
DF6TYHA
DFTIIPQ
DFDRTF
DFDRTG
DFDRTH
DFTPPL
DFTINM
DFTIR2
DFTIGL
DFTTDT
DFTTED
DFHHIJ
DFHHIK
DFTYHI
DFTYIA
DFTYKN
DFTTWK
DFTGHA
DFTGSS
DFTGPE
DFTGYI
KSFTCA
KSGSBB
KSGDMB
KSMARI
KSYT3LA
KSYT3LE
KSRRWA
KS6TYHA
KSTIIPQ
KSDGSF
KSVYHA
KSVFSS
KSVGTE
KSVUYI
KSMPTI
KSVR2B
KSVR3B
KSYTBEL
KSYTPIT
KSRQAU1
3P 0
5A
5A
1A
50A
6P 0
5A
1A
9P 0
6P 0
2A
2A
3P 0
5P 2
2A
2A
2A
10A
10A
5A
This date is in *YYMMDD
format. It often has a
value of 999999
T03140P
TLFTCA
TLRTBB
TLRTTB
TLTNT3
TLKB1B
TLKB2B
TLTNT3
TLKR2B
TLKR3B
TLPT2WL
TLPTWLT
TLRRPUQ
3P 0
5A
5A
1A
1A
1A
25A
25A
25A
12A
12A
6A
Example
T01045P
R02126P
AGFRCA
AGAC3EE
AGRRWA
AG6RYHA
AGR22PQ
AGDGSF
AGVYHA
AGVFSS
AGVGRE
AGVUY2
AGMPR2
AGVR2B
AGVR3B
AGACBEE
AGACP2R
AGRQAU1
AGGSBB
AGGDMB
AGMAR2
AGAC3EA
AG6TTHA
AGRSAPQ
AGHISF
3P 0
6P 0
5A
1A
9P 0
6P 0
14A
12A
3P 0
5P 2
2A
2A
2A
1A
10A
5A
1A
8A
1A
50A
1A
6P 0
6P 0
DSFTCA
DSRTBB
DSRTTB
DSMNTI
DSVB1B
DSVB2B
DSYT1LO
DSYT1LR
DSRRWA
DS6TYHA
DSTIIPQ
DSDRTF
DSVBHA
DSVBSS
DSVBPE
DSVBYI
DSMNTI
DSVR2B
DSVR3B
DSYT2WL
DSYTWLT
DSRRYUQ
3P 0
5A
5A
1A
1A
1A
50A
12A
5A
1A
3P 0
6P 0
1A
2A
3A
5P 2
25A
25A
25A
12A
12A
6A
These are the only
columns we care about
T05001P
FPPTWLT
FPLLPUQ
FPFTCA
FPLTTB
FPTNTP
FPYB1B
FPTNTP
FPYL2B
FPYLPB
Often fewer than 10% of
the available columns are
useful for BI reporting!
Copyright, Coglin Mill, 2010
T00032P
12A
6A
1P 0
5A
1A
1A
25A
1P 0
25A
DFFTCA
DFRTBB
DFRTTB
DFMNTI
DFTG1B
DFTG2B
DFTG3B
DFTG4B
DFMNEE
DFMNEF
DFRERP
DFWELF
DFWILF
DFWILR
DFWILS
DFWILT
DFQI1W
DFQ2IW
DFTRES
DFYT1LL
DFYT1LO
DFYT1LR
DFRRWA
DF6TYHA
DFTIIPQ
DFDRTF
DFDRTG
DFDRTH
DFTPPL
DFTINM
DFTIR2
DFTIGL
DFTTDT
DFTTED
DFHHIJ
DFHHIK
DFTYHI
DFTYIA
DFTYKN
DFTTWK
DFTGHA
DFTGSS
DFTGPE
DFTGYI
T01046P
3P 0
5A
5A
1A
1A
1A
1A
1A
25A
11P 2
11P 2
11P 2
11P 2
11P 2
11P 2
11P 2
5A
3A
10A
45A
12A
12A
5A
1A
1P 0
6P 0
6P 0
6P 0
1P 0
3P 0
30A
12A
6P 0
6P 0
4P 2
4P 2
5P 2
1A
1A
1A
1A
2A
3A
5P 2
KSFTCA
KSGSBB
KSGDMB
KSMARI
KSYT3LA
KSYT3LE
KSRRWA
KS6TYHA
KSTIIPQ
KSDGSF
KSVYHA
KSVFSS
KSVGTE
KSVUYI
KSMPTI
KSVR2B
KSVR3B
KSYTBEL
KSYTPIT
KSRQAU1
3P 0
5A
5A
1A
50A
6P 0
5A
1A
9P 0
6P 0
2A
2A
3P 0
5P 2
2A
2A
2A
10A
10A
5A
T03140P
TLFTCA
TLRTBB
TLRTTB
TLTNT3
TLKB1B
TLKB2B
TLTNT3
TLKR2B
TLKR3B
TLPT2WL
TLPTWLT
TLRRPUQ
3P 0
5A
5A
1A
1A
1A
25A
25A
25A
12A
12A
6A
Issues
4
Poor/unknown data quality

Missing/incorrect Values





Inadequate edit checks in application
Software errors
Changing business rules
Incorrect data conversions
Resulting in



Failed joins
Incorrect reports
Bad decisions
Copyright, Coglin Mill, 2010
Issues

Data Quality is one of YOUR MOST IMPORTANT
ISSUES!


your company will be making strategic decisions based on the data
used in reports. You hope it is accurate!
poor data quality is the most common reason for failure of business
intelligence initiatives
A 2002 study by TDWI found that
poor data quality costs US businesses
$600 BILLON per year!
Conservative estimates suggest that
poor data quality costs the typical organization
Up to 10% of potential revenue and
Up to 20% of potential earnings
Copyright, Coglin Mill, 2010
Issues
5
Different applications/databases

Totally different structures – but related information
Sales (DB2 for i5/OS)
Financials (DB2 for i5/OS)
BI Reporting
POS System (SQL Server)




Purchasing (Oracle)
Very difficult, if not impossible to join tables across databases
Different security, availability etc
Adds significant complexity
Poor performance
Copyright, Coglin Mill, 2010
The common “solution”

These issues are often solved in an ad-hoc way

Create extract files and write RPG programs to load them




As each reporting problem occurs, a new extract is written
No consistent approach
No documentation produced
Frustrated users create their own “solutions”


Copyright, Coglin Mill, 2010
Download data to Excel and manipulate it
Decide on their own rules
The usual result
A chaotic reporting environment!
These
These
reports
don’t
reports
balance
don’t
balance
with each
with
each
other.
other.
No one trusts
this report
~~~~~~~
~~~~~~~~
~~~~~~~
~~~~~~~~
100
100
80
60
East
40
West
20
North
0
1st Qtr
2nd Qtr
3rd Qtr
4th Qtr
Purchasing
extract
(MS Access)
Joe downloads this manually
via Client Access every
Monday
.. except when he’s on
vacation or out with the flu!
Purchasing
Copyright, Coglin Mill, 2010
1st Qtr
2nd Qtr
3rd Qtr
4th Qtr
~~~~~~~
~~~~~~~~
~~~~~~~
~~~~~~~~
~~~~~~~
~~~~~~~~
~~~~~~~
~~~~~~~~
Summary Sales
by Region
Summary Sales by
Customer/Brand
Mary wrote this
extract. She left last
year and no-one
knows how it works.
Sales
The Net Sales
calculation in this
extract is different to
Mary’s
80
60
East
40
West
John spends 5 days
every month generating
this and massaging the
numbers until he thinks
it is correct
North
20
0
1st Qtr
2nd Qtr
3rd Qtr
4th Qtr
Profitability
Extract
No one has yet realized
that this is loaded
incorrectly. The auditors
will be the first to discover
the problem
Financials
GL Summary
(Excel)
Enterprise Data Warehouse Architecture
Operational System(s)
Data Propagation
Extraction, Transformation and Loading
Data Staging Area
l
na
o
ti
ra ort
e
p p
l o sup
a
ic
ct sion
a
T ci
de
ODS
Data Warehouse
Data
Mart
Data
Mart
Data
Mart
Mfg
Finance
Sales
PC or Browser Web Visualization Products
Copyright, Coglin Mill, 2010
Cleansed,
Transformed
Data
BI
Applications
Extract Requirements

Identify and understand the required source data




Filtering/Selection




all rows or by transaction date
just new or changed records
exclude certain records
Decide on access mechanism



Copyright, Coglin Mill, 2010
local or remote
DB2 for i or other database
non-database (legacy structures, text files etc)
direct access
change data capture
staging of remote data
Extract /Stage Options

Change Data Capture



Non-intrusive on source systems
Can be based on remote journals
Possibly piggy-back on HA strategy
Capture – occurring constantly
Apply – at point in time
Capture
Detail
ed
Detailed
Journal Receivers
Apply
Detail
ed
Detailed
A
B
Journal
C
Application
Tables
Copyright, Coglin Mill, 2010
CDC
Consolidated
Messaging
Consolidate
d
ETL
Provides real-time ETL support
Extract /Stage Options

Direct Connection to remote DB2 for i data



DDM – performs well, especially if another LPAR on same system
Remote SQL connection
No need to stage data
BI System
Production System
DDM
SQL
Copyright, Coglin Mill, 2010
ETL
ETL
Data
Warehouse
Extract /Stage Options

Staging is often best approach for non IBM i data



Non IBM Family databases do not support full DRDA architecture
Often need to access same table in several ETL processes
Can simplify error recovery in event of a failure
BI System
Remote System
ODBC
JDBC
Copyright, Coglin Mill, 2010
ETL
ETL
Data Warehouse
Validation

Data Quality rules imply errors – which implies error reporting




stage the ‘bad’ data
do not load it, but just as importantly do not ignore it
provide a means to correct it and re-process
keep audit trail of errors
Source Tables
Good Data
ETL
Bad Data
~~~~~~~~~~
~~~~~~~~~~
~~~~~~~~~~
~~~
Error
Log/Report
Copyright, Coglin Mill, 2010
Transformations

The obvious (easy) transformations




calculations
string operations
lookup/replace
data conversion



Handling exceptions


if, then, else
Surrogate Keys


Copyright, Coglin Mill, 2010
numeric to character or vice versa
numeric dates & times to real date/time fields
artificial key, replacing the natural keys
give support for slowly changing dimensions
Surrogate Keys

multiple instances of same table, with duplicate key values
CUSTNO
1001
1002
1003
1004

Customer File - US
CUSTNAME
John Smith
Mary Jones
Chris Anderson
David Perry
Customer File - Canada
CUSTNO CUSTNAME
1001 Harry Potter
1002 Jeremy Carr
1003 Penny Hayes
1004 Debbie Thornton
or different versions of same entity


Incompatible data types
Duplicates
CUSTNO
1001
1002
1003
1004
Copyright, Coglin Mill, 2010
Customer File - US
CUSTNAME
John Smith
Mary Jones
Chris Anderson
David Perry
Customer File - Canada
CUSTID CUSTNAM
AA234
Julie Johnson
AA235
Fred Hunter
AB670
John Smith
BD309
Alan Jordan
Surrogate Keys
CUSTNO
1001
1002
1003
1004
Customer File - US
CUSTNAME
John Smith
Mary Jones
Chris Anderson
David Perry
Surrogate key is a
sequential number
with no correlation to
replaced value(s)
Customer File - Data Warehouse
CUSTNUMBER CUSTNAME
REGION OLDNUM
1
John Smith
US
1001
2
Mary Jones
US
1002
3
Chris Anderson
US
1003
4
David Perry
US
1004
5
Harry Potter
CANADA
1001
6
Jeremy Carr
CANADA
1002
7
Penny Hayes
CANADA
1003
8
Debbie Thornton
CANADA
1004
PK
Copyright, Coglin Mill, 2010
Customer File - Canada
CUSTNO CUSTNAME
1001 Harry Potter
1002 Jeremy Carr
1003 Penny Hayes
1004 Debbie Thornton
Secondary Index
Metadata

What is metadata?

descriptive information about the data sources, ETL
processes and target database (data warehouse, data
marts etc)




Technical metadata – column names, data types, keys etc
Business metadata – describes the use and meaning of pieces of
data, documents data quality rules etc
Administrative metadata – change history, authorities, load
statistics, usage statistics etc
Why is it important?


Copyright, Coglin Mill, 2010
it is the ‘roadmap’ to the information stored in the data
warehouse or data mart tables.
unlike the majority of end-user applications where access is
via program interfaces, some of your users will access the
tables directly. They need to know what the data means.
ETL Architecture
Production: IBM i
Remote
Journaling
Local
Production
data
Change Data
Capture
BI System or LPAR (IBM i)
Data Warehouse
ETL
Production
data
DDM/SQL
Data Marts
ETL
Other DB2 Family
DRDA
Production
data
ODS
Pre-stage
Other Databases, Sources
ETL
data
Metadata
Copyright, Coglin Mill, 2010
ETL - Build or Buy ?

Should I build my own ETL processes, or buy a tool?

do I have the resources?
do my people resources have all the necessary skills?
do I understand the size of the effort?
do I have the time to write a significant amount of code?

the answers to these questions is usually NO




The right ETL tool will




dramatically reduce the total cost
significantly reduce development time
allow you to concentrate on the business requirements
provide the environmental support for long term success
Copyright, Coglin Mill, 2010
ETL Tool Requirements

The tool should :

Be native to your platform (System i)




Provide remote data access
Support time and effort saving concepts such as change data capture from
journal images
Be highly productive






5 to 10 times increase in productivity compared to hand coding is not uncommon
Provide comprehensive error handling and reporting
Provide comprehensive auditing
Have comprehensive metadata support


critical for good performance
automatic capture
allow export to other tools
Include impact analysis tools
Provide a visual development environment
Copyright, Coglin Mill, 2010
RODIN ETL
Operational Data
Describe data
sources
Visual extract
builder
Data Warehouse &
Rule engine forData
business rules
Transformations
and data
Business Rules
quality rules
Extract
Mart Tables
Data Quality Rules
Detail
Define and build
data mart tables
Summary
Errors/Rejects
Manage rejects
and print error
reports
~~~~~~~~~~~
~~~~~~~~~~~
~~~~~~~~~~~
Error Report
Generate and store
extensive metadata
Metadata integrated
with DB2 Web Query
Copyright, Coglin Mill, 2010
Audit Data
Capture audit
data and print
~~~~~~~~~~~
reports
~~~~~~~~~~~
~~~~~~~~~~~
Audit Report
METADATA
Complete
development and
deployment
environment
RODIN Development Client
Visual
representations
used where
appropriate to aid
understanding
Copyright, Coglin Mill, 2010
RODIN Development Client
Powerful, easy to
use visual
development
environment
Copyright, Coglin Mill, 2010
RODIN Development Client
Sophisticated point
and click business
rules editor
Copyright, Coglin Mill, 2010
RODIN Development Client
Extensive set of
audit reports, error
reports and
metadata reports
available
Copyright, Coglin Mill, 2010
ETL Error management
Source Tables
Data Set
ETL
Process
Error suspense table.
Rejected data is written to
this table. After correction,
this data can be reprocessed by the extract in
error recovery mode.
Copyright, Coglin Mill, 2010
Error log.
Reason for error
is written out to
this table.
An Error report
is produced at
end of processing
if any errors
occurred.
~~~~~~~~~~~
~~~~~~~~~~~
~~~~~~~~~~~
ETL Auditing
Source Tables
Data Set
ETL
Process
Audit information includes
Number of source rows read,
omitted and rejected.
Hash totals for up to 3 numeric
source columns.
Number of data sets updated and
inserts/updates to each.
Elapsed time for all jobs.
etc…..
Audit Tables
~~~~~~~~~
~~~~~~~~~
~~~~~~~~~
Audit report produced at
end of extract.
Audit information retained in meta-data indefinitely
Copyright, Coglin Mill, 2010
Change Data Capture
Server Change Data Capture (Real-time)
Capture
Tables
Source
Tables
Journal
Receivers
Apply
Tables
Detailed
Detailed
Detailed
Detailed
Consolid
Consolidate
ated
Consolid
Consolidate
ated
Replica
Replica
Replica
Replica
A
B
Journal
CDC
Server
d
d
C
Both Local and
Remote Journals
supported
Copyright, Coglin Mill, 2010
Remote Data Access
Remote DB2 for i5/OS data
IBM
IBM
System i
System i
Remote Journaling
and Change Data
Capture
Real-Time connection
via DDM
IBM
RODIN Data Warehouse,
running on System i
Copyright, Coglin Mill, 2010
RODIN Remote Data Access
Direct DRDA connectivity
for IBM Family databases (DB2 UDB and Informix)
Mainframe
IBM
Unix/Linux/Wintel
IBM
Real-Time connections
to specified sources
via DRDA/CLI
Copyright, Coglin Mill, 2010
RODIN Data Warehouse,
running on System i
RODIN Remote Data Access
RODIN eGateway
for DB2, Oracle, MS SQL Server,
MS Access, Sybase and MySQL
Unix/Linux/Wintel
JDBC connections to
source databases
IBM
Pre-stages data on
System i in local DB2
tables
Copyright, Coglin Mill, 2010
RODIN Data Warehouse,
running on System i
RODIN ETL Architecture
Production System i
Change Data
Capture
DDM
Local
Production
data
Change Data
Capture
Data Warehouse
Remote
Journaling
Production
data
BI System or LPAR (System i)
RODIN
ETL
DDM
Data Marts
RODIN
ETL
Other DB2 or Informix
DRDA
Production
data
JDBC
ODS
Pre-stage
RODIN
ETL
data
Other Sources
JDBC
Production
data
Copyright, Coglin Mill, 2010
RODIN Metadata
Local Customers
The Tire Rack, South Bend
1st Source Bank, South Bend
Alro Steel, Jackson Michigan
Copyright, Coglin Mill, 2010
RODIN Success Stories
Copyright, Coglin Mill, 2010
For more information:
call 1-866-RODIN-DW
visit www.thinkrodin.com
(evaluation software available for download)
Copyright, Coglin Mill, 2010