Transcript Step8-9-10

Business Intelligence
Dr. Mahdi Esmaeili
1
2
Step 8: Database Design
Operational Databases
BI Target Databases
•Geared toward eliminating redundancy, coordinating
updates, and repeating the same types of operations
many times a day, every day (for example, airline
reservations, deposits and withdrawals from bank
accounts, hotel room reservations).
•Geared toward supporting a wide range of queries and
reports. Queries and reports may vary from one business
analyst to another or from one department to another. All
of the queries and reports may not run on the same day
and may not run every day (for example, quarterly trend
analysis reports on regional sales, monthly order
fulfillment report).
•Most of the transactional systems require subsecond •Although response time is important, subseconds cannot
response time.
be expected. Typical response times are seconds, minutes,
or hours.
•Highly normalized to support consistent updates and •Highly denormalized to provide quick retrieval of a wide
maintenance of referential integrity.
range and a large amount of data. Data that belongs
together from an analytical reporting perspective is
usually stored together.
•Store very little derived data. Data is usually derived •Store large amounts of derived data. This saves time for
dynamically when needed.
the queries and reports.
•Do not store historical data. Historical records are
archived.
•Store large amounts of historical data, often at some level
of summarization, but just as often at a detailed level.
•Lightly summarized, mostly for reporting purposes.
•Many levels of precalculated, summarized data, from
lightly summarized to highly summarized.
3
Logical Database Design
• The Star Schema
• The Snowflake Schema
• The Constellation Schema
Physical Database Design
•
•
•
•
•
•
•
•
Implementation Options
Physical Dataset Placement
Partitioning
Clustering
Indexing
Reorganizations
Backup and Recovery
Parallel Query Execution
4
5
Deliverable Resulting
• Physical data model
• Physical design of the BI target databases
• Data definition language
• Data control language
• Physical BI target databases
• Database maintenance procedures
6
Roles Involved in This Step
• Application lead developer
• Data administrator
• Database administrator
• ETL lead developer
7
Step 9: Extract/Transform/Load Design
8
Integrated BI Implementation Strategy
9
Sets of ETL Programs
1
Initial Load
Initial population of BI target databases with current operational data
2
Historical Load
Initial population of BI target databases with archived historical data
3
Incremental Load
Ongoing population of BI target databases with current operational data
10
ETL Processes
11
Designing the Transformation Programs
Using the 80/20 rule, 80 percent of ETL work occurs
in the "T" (transform) portion when extensive data
integration and data cleansing are required, while
extracting and loading represent only 20 percent of
the ETL process
12
Source Data Problems
• Inconsistent primary keys
•
•
•
•
•
Inconsistent data values
Different data formats
Inaccurate data values
Synonyms and homonyms
Embedded process logic
13
Common Use of ETL Tools in the ETL Process
14
Evaluating ETL Tools
1. Perform a cost-benefit analysis to compare licensing (buying) an ETL
product with building the ETL process in-house
2. Compile a list of ETL products and vendors that are likely to meet your
requirements
3. Compare the ETL products and vendors to your weighted data
transformation requirements
4. Evaluate each ETL product objectively and prepare a scorecard that
compares the product features and their effectiveness
5. Check the vendors' client references by talking with people at organizations
that already use the tools you are considering
6. Narrow the list of ETL products and vendors to a short list of two or three
candidates
7. Arrange for product demos since "seeing is believing."
8. Test the vendor products even though it takes away time from your project
schedule
15
16
Deliverable Resulting
• Source-to-target mapping document
• ETL process flow diagram
• ETL program design document
• Staging area
17
Roles Involved in This Step
• Data quality analyst
• Database administrator
• ETL lead developer
• Subject matter expert
18
Step 10: Meta Data Repository Design
19
Centralized Meta Data Repository
20
Custom-Built Centralized Meta Data Repository
Advantages
Disadvantages
•A customized database design incorporates all
meta data requirements.
•Full-time staff is needed to maintain the meta
data repository database and the meta data
reports.
•The front end for access and the interfaces to
•The front end for access and the interfaces to
tools (ETL, OLAP, and so on) are custom designed tools must be programmed and maintained, both
to meet all requirements.
of which are time-consuming processes.
•Reports as well as help functions are designed
exactly as desired.
•The meta data repository would have to be
enhanced periodically (sometimes redesigned)
because it cannot be built with all functionality
from the start.
•Technicians have full control over the design and •Content may become out of synch with the
functionality of the meta data repository.
proprietary dictionaries of the tools and the
DBMS
21
Licensed Centralized Meta Data Repository
Advantages
Disadvantages
•Time is saved by not having to design and •The "plain vanilla" version of the licensed
build a meta data repository database,
product will probably not satisfy all meta
interfaces, front end, and reports.
data requirements. Therefore, a full-time
administrator is needed to maintain and
enhance the licensed product.
•Most licensed meta data repository
products come with interfaces, and most
come with a full set of APIs.
•There will be a learning curve to become
familiar with the product's architecture,
interfaces, and APIs.
•If the meta data repository product is
certified for the tools where the meta
data resides, it will provide the tool
interfaces.
•The more sophisticated the meta data
repository product is, the more expensive
it is, and the more skills the technicians
need to maintain it.
22
Decentralized Meta Data Repository
23
Decentralized Meta Data Repository
Advantages
Disadvantages
•Various owners can maintain and manage their
own sets of meta data separately.
•Controlling redundancy across multiple meta data
repositories and keeping the meta data consistent is
difficult.
•Meta data repository databases are smaller and
easier to use because each database contains only
those meta data components that are of interest to
a specific group of business people.
•It will take longer to maintain and manage multiple
databases on multiple platforms. There could also
be synchronization problems with new DBMS
releases.
•Each meta data repository can have its own meta
model, that is, its own customized design.
•Communication among the custodians of the
various meta data repositories will have to increase.
Plus, it will require maintaining a meta-meta model,
which is an integrated (merged) overall architecture
of multiple meta models.
•Reports can be customized for each individual meta •Relating meta data across various databases may
data repository.
be difficult. For example, business meta data is not
automatically linked to technical meta data if they
reside on different databases.
•A gateway makes the name and location of the
meta data repository transparent to the person
accessing it.
•The architecture of this solution is more
complicated and the learning curve to use multiple
databases with potentially different designs may be
high.
24
Distributed XML-Enabled Meta Data Solution
25
Distributed XML-Enabled Meta Data Solution
Advantages
Disadvantages
•XML tags enable access of meta data across any type of data
•The initial tagging of all meta data with XML
storage through standardized categorization and tagging of meta tags is a manual and laborious process. Plus,
data components.
XML tagging cannot be used for all meta
data.
•Meta data never has to be duplicated or moved from its original •XML tags add to the storage requirements
source (except for reporting purposes).
for the dictionary databases that store meta
data (DBMS and tool dictionaries).
•A gateway makes the location of the meta data transparent to
the person accessing it.
•A meta-meta model has to be created as a
map of all the various types of meta data
storage, each of which is designed according
to its own unique meta model.
•Standard Web search engines should be able to locate any meta •DBMS and tool vendors must follow
data anywhere.
industry standards[*] for meta data XML tags
in order to enable seamless meta data
access across all products. Multiple
standards need to be supported.
•Meta data and business data can be coupled and transmitted
simultaneously.
•Not all DBMSs and tools are XML-enabled.
This is a bleeding-edge and unproven
technology.
26
Designing a Meta Data Repository
• Entity-Relationship Design
• Object-Oriented Design
27
Entity-Relationship Design
Advantages
Disadvantages
•E-R designs are easy to read and easy to •Changes and enhancements may require a
understand.
database redesign, as well as unload-ing
and reloading the meta data repository.
•Because of the intuitive and explicit
nature of the design, queries can be
written with relatively simple SQL
statements.
•The physical meta model is fairly large,
with many objects and many relation-ships,
which makes the architecture somewhat
complex.
•E-R designs are easy to implement as
relational database structures.
•Meta data objects and their relationships
must be very well defined and under-stood
for the physical meta model to be accurate
28
Object-Oriented Design
Advantages
Disadvantages
•OO designs are extremely flexible; they
will not need any database redesigns
when changes are necessary.
•Since the object named Object contains
all instances (rows) of meta data, this
table will become very large. This may
affect access performance.
•OO designs are simplistic and therefore
easy to maintain and to enhance.
•Queries are much more difficult to write
and will require many recursive JOINs.
Advanced SQL knowledge is required.
•OO designs are easy to implement as
object-oriented database structures.
•OO designs require a high learning
curve. The very abstract physical data
model is difficult to comprehend, and the
extensive rules take time to understand
29
Licensing (Buying) a Meta Data Repository
(Expansion capabilities)
•
•
•
•
•
•
•
•
•
Adding meta data objects
Adding relationships
Changing inappropriate relationships
Adding meta-meta data attributes to the meta data objects
Changing the size and length of meta-meta data components
Customizing vendor-provided reports
Creating and storing code for additional reports
Importing meta data from other tools
Exporting meta data to other tools
30
Example of Product Evaluation Criteria with Weights
Criterion #
Product Evaluation Criteria
Weight
1
Product closely matches our logical meta model
10
2
Product can satisfy our important meta data requirements
6
3
Product can satisfy our optional meta data requirements
1
4
Product can be expanded
8
5
Product has interfaces
9
6
Product has a Web front end
4
7
Product has APIs
9
31
Example of Product Ratings
Criterion #
Product
1
2
3
4
5
6
7
Autumn
Dictionary
3
7
6
0
9
9
10
Helixor
9
0
2
8
6
0
5
Leeches
Repository
6
6
1
6
4
0
7
Springrep
8
2
0
10
10
2
10
Tamroller MDR
7
5
5
0
6
2
7
32
Example of Weighted Product Ratings
Product
1
2
3
4
5
Springrep
80
12
0
80
90
Autumn
Dictionary
30
42
6
0
Helixor
90
0
2
Leeches
Repository
60
36
Tamroller
MDR
70
30
6
7
Total
Score
8
90
360
81
36
90
285
64
54
0
45
255
1
48
36
0
63
244
5
0
54
8
63
230
33
34
Deliverable Resulting
• Physical meta model
• Data definition language for the meta data repository
• Data control language for the meta data repository
• Meta data repository programming specifications
35
Roles Involved in This Step
• BI infrastructure architect
• Data administrator
• Meta data administrator
36