click here for slides

Download Report

Transcript click here for slides

MIS5101:
Business Intelligence
Data Integration and Extract,
Transform, Load
Sunil Wattal
Where we left off…
Structured Data
vs
Unstructured Data
The Information Architecture of an
Organization
We’re here
now…
Data
entry
Data
extraction
Transactional
Database
Data
analysis
Analytical Data
Store
Stores real-time
transactional data
Stores historical
transactional and
summary data
Called OLTP:
Called OLAP:
Online
transaction
processing
Online
analytical
processing
How they all relate
The data in
the
operational
database…
…is put into
a data
warehouse…
…which
feeds the
data mart…
Now let’s address this part…
…and is
analyzed as
a cube.
Extract, Transform, Load - ETL
Copying data from the
transactional database
to a format where it
can be analyzed
Selecting and resolving
inconsistencies in data
to fill the analytical
data store
ETL Defined in a “relational” world
Extract
Transform
Load
from the
operational
data store
it into a
consistent,
analysis-ready
format
it into the
analytical
database
(the relational
database)
(the
dimensional
database)
Extract
Transform
Load
Query
Database 1
Query
Data conversion
How do these technologies
affect this process?
• In-memory analytics?
• NoSQL
Database 2
Query
Data conversion
Data Mart
Query
On-Demand
Reporting
Main ETL Issues: Conversion Stage
Data
Consistency
• What if the data is in different
formats?
Data
Quality
• How do we know it’s correct?
• What if there is missing data?
• What if the data we need isn’t
there?
Give examples of data inconsistences
in retail
in
healthcare
in finance
How do you resolve them?
Conflicts abound…
Why might there be
resistance to this type of
aggregation?
Is it an option to just “fix”
the transactional (source)
databases?
If two data elements
conflict, who’s standard
“wins?”