PowerPoint Template

Download Report

Transcript PowerPoint Template

Business Intelligence
Outline
 Data , Information & Knowledge
 Business Intelligence
– Definition
– Important
– History
– Architecture
 Data Warehouse
– Definition
– Characteristics
– OLAP
– Conceptual Modeling
– Cube
Page  2
Data , Information & Knowledge
 Data: symbols
 Information: data that are processed to be useful; provides answers
to "who", "what", "where", and "when" questions
 3. Knowledge: application of data and information; answers "how"
questions (Gene, 2004)
Page  3
Data , Information & Knowledge
Wisdom
Knowledge
Information
Data
Noisy data
Page  4
What is Business Intelligence?
The gathering, storing, analysis, and
dissemination of data/information to
facilitate informed business decisionmaking.
Page  5
Page  6
Page  7
How Important is BI?
Top 10 Business and Technology Priorities for 2011:
1. Cloud computing
2. Virtualization
3. Mobile technologies
4. IT Management
5. Business Intelligence
6. Networking, voice and data communications
7. Enterprise applications
8. Collaboration technologies
9. Infrastructure
10. Web 2.0
Source: Gartner’s 2011 CIO Agenda (aka “Reimagining IT: The 2011 CIO
Agenda”)
Page  8
History of Business Intelligence
Previous Names
 Decision Support Systems
 Executive Information
Systems
 Online Analytic Processing
Related Concepts
 Competitive Intelligence
 Market Intelligence
 Customer Intelligence
 Strategic Intelligence
 Technical Intelligence
• Knowledge Management
• Data Miming
Page  9
BI Architecture
DATA SOURCES
STORAGE AREA
DATA WAREHOUSE
DECISION SUPPORT
Application
Databases
_________
_________
_________
_________
_________
_________
Packaged
application/ERP
Data
Desktop Data
DATA
MARTS
EXTRACTION
TRANSFORMING
Loading
(ETL)
Reports
DATA
WAREHOUSE
OLAP
External Data
Web-based Data
Page  10
Statistical & Financial
Analysis
Extraction, Transformation, and Loading (ETL)
The process of data consolidation is often called
Extraction, Transformation, and Loading (ETL)
– The ETL process extracts data from the various source
systems
– Data is then transformed to make it consistent and improve
data quality
– The consolidated, consistent, and cleaned data is then
loaded into a data repository
Developing the ETL process often consumes 80% of
the development time
Page  11
Page  12
Page  13
Page  14
(ETL) Tools
Some ETL Tools
– Oracle Data Integrator (ODI)
– Informatica
– IBM Ascential
– SSIS (Microsoft SQL SERVER)
Page  15
What’s a Data Warehouse?
 A single, integrated source of decision support information
formed by collecting data from multiple sources, internal to
the organization as well as external, and transforming and
summarizing this information to enable improved decision
making.
It is designed for easy access by users
to large amounts of information,
and data access is typically
supported by specialized analytical
tools and applications.
Page  16
OLAP Tools
facilitates
accessing to a
Data warehouse
Data Warehouse Characteristics
Key Characteristics of a Data Warehouse
 Integrated
 Time-variant
 Non-volatile
Page  17
Integrated
• Data is stored once in a single integrated location
(e.g. insurance company)
Auto Policy
Processing
System
Customer
data
stored
in several
databases
Page  18
Data Warehouse
Database
Fire Policy
Processing
System
FACTS, LIFE
Commercial, Accounting
Applications
Subject = Customer
Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly
longer than that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical perspective (e.g., past 5-10
years)
Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain “time element”
Page  19
Non-Volatile
• Existing data in the warehouse is not overwritten or
updated.
External
Sources
Production
Databases
Data
Warehouse
Environment
Production
Applications
• Update
• Insert
• Delete
Page  20
• Load
Data
Warehouse
Database
• Read-Only
Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing)



Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)

Major task of data warehouse system

Data analysis and decision making
Page  21
What is OLAP?
 Online analytical processing (OLAP) refers to the general activity of
querying and presenting text and number data from data warehouses
and/or data marts for analytical purposes.
Page  22
OLTP vs. OLAP
OLTP
OLAP
users
clerk, IT professional
knowledge worker
function
day to day operations
decision support
DB design
application-oriented
subject-oriented
data
current, up-to-date
detailed, flat relational
isolated
repetitive
historical,
summarized, multidimensional
integrated, consolidated
ad-hoc
lots of scans
unit of work
read/write
index/hash on prim. key
short, simple transaction
# records accessed
tens
millions
#users
thousands
hundreds
DB size
100MB-GB
100GB-TB
metric
transaction throughput
query throughput, response
usage
access
Page  23
complex query
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
– Star schema: A fact table in the middle connected to a set of dimension
tables
– Snowflake schema: A refinement of star schema where some
dimensional hierarchy is normalized into a set of smaller dimension
tables, forming a shape similar to snowflake
Page  24
Example of Star Schema
time
item
time_key
day
day_of_the_week
month
quarter
year
Sales Fact Table
time_key
item_key
branch_key
branch
location_key
branch_key
branch_name
branch_type
units_sold
dollars_sold
avg_sales
Measures
Page  25
item_key
item_name
brand
type
supplier_type
location
location_key
street
city
state_or_province
country
Example of Snowflake Schema
time
time_key
day
day_of_the_week
month
quarter
year
item
Sales Fact Table
time_key
item_key
branch_key
branch
location_key
branch_key
branch_name
branch_type
units_sold
dollars_sold
avg_sales
Measures
Page  26
item_key
item_name
brand
type
supplier_key
supplier
supplier_key
supplier_type
location
location_key
street
city_key
city
city_key
city
state_or_province
country
Dimensional Modeling
Dimensional modeling = data warehouse modeling technique
2 types of tables: facts and dimensions.
A fact table contains one or more measures (usually numerical)
of a subject that is being modeled for analysis.
Dimension tables contain various descriptive attributes (usually
textual) that are related to the subject depicted by the fact table.
The intent of the dimensional model is to represent relevant
questions whose answers enable appropriate decision making in a
specific business area
Page  27
From Tables to Data Cubes
 A data warehouse is based on a multidimensional data model which views
data in the form of a data cube
 A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions
 Dimension tables, such as item (item_name, brand, type), or
time(day, week, month, quarter, year)
 Fact table contains measures (such as dollars_sold) and keys to each of
the related dimension tables
 In data warehousing literature, an n-D base cube is called a base cuboid.
The top most 0-D cuboid, which holds the highest-level of summarization,
is called the apex cuboid. The lattice of cuboids forms a data cube.
Page  28
Cube: A Lattice of Cuboids
all
time
0-D(apex) cuboid
item
time,location
location
supplier
item,location
time,supplier
1-D cuboids
location,supplier
2-D cuboids
item,supplier
time,location,supplier
3-D cuboids
time,item,supplier
item,location,supplier
4-D(base) cuboid
Page  29
Typical OLAP Operations
 Roll up (drill-up): summarize data
 by climbing up hierarchy or by dimension reduction
 Drill down (roll down): reverse of roll-up
 from higher level summary to lower level summary or detailed data, or
introducing new dimensions
 Slice and dice: project
and select
 Pivot (rotate):
 reorient the cube, visualization, 3D to series of 2D planes
 Other operations
 drill across: involving (across) more than one fact table
 drill through: through the bottom level of the cube to its back-end relational
tables (using SQL)
Page  30
Page  31
Thank You