Transcript Berg_EDW_v4

What you need to know to
get the most out of your
Enterprise Data Warehouse
Dr. Bjarne Berg
Comerit
© 2014 Wellesley Information Services. All rights reserved.
What We’ll Cover …
• Introduction
• EDW architectural options



Federated Data Warehouse
Centralized Data Warehouse
Distributed Data Warehouse
• Data Integration challenges
Masterdata
 Transaction data conversion
 Data cleansing

• The IT "Jail"
• Creating The Support Organization
• The top 10 EDW pitfalls
• Wrap-up
2
In this session.
 We will take a look at the pros and cons of EDW architectural
options, including federated, centralized, and distributed
EDW models, and explore when each approach is
appropriate.
 See options for consolidating different master and
transactional data.
 Weigh your options for building a centralized or a
decentralized EDW support organization.
 Examine the top 10 pitfalls companies face when
implementing SAP NetWeaver BW as their EDW and how to
overcome them.
3
A Quick Definition: BI Vs. Data Warehousing
Data warehousing is the act of extracting, transferring,
transforming, storing and retrieval of data for reporting
and analytical purposes.
Business Intelligence (BI) is a terminology for applications
that uses data stores for analytical purposes.
KEY CONCEPT:
BI applications are not required to run
on top of data warehouses, but the
majority does
4
Before you start: Know your DW Governance model
Many EDW efforts fail, due to the
IT governance changes needed to
be successful.
EDWs rarely succeeds in businesses
modeled as federal, feudal, IT
duopoly or anarchy.
Know your organization
before attempting a
centralized EDW effort - do
you have enough sponsorship
to make real changes? 5
What We’ll Cover …
• Introduction
• EDW architectural options
Federated Data Warehouse
 Centralized Data Warehouse
 Distributed Data Warehouse

•Data Integration challenges
Masterdata
Transaction
data conversion
Data cleansing
•The IT "Jail"
•Creating The Support Organization
•The top 10 EDW pitfalls
•Wrap-up
6
A Logical Enterprise DW Architecture
Metadata
Source Data
Extract
Operational
Data Store
Transform
Data
Warehouse
Functional Area
Invoicing
Systems
Purchasing
Systems
General
Ledger
Other Internal
Systems
External Data
Sources
Custom
Developed
Applications
Purchasing
Data
Extraction
Integration
and
Cleansing
Processes
Marketing
and Sales
Corporate
Information
Data
Mining
Translate
Attribute
Summation
Calculate
Product Line
Derive
Location
BI Applications
Summarize
Segmented
Data Subsets
Summarized
Data
Synchronize
Statistical
Programs
Query Access
Tools
Data Resource Management and Quality Assurance
Source: Bjarne Berg, “Introduction to Data Warehousing”
The Federated Data Warehouse (FDW) Architecture
Metadata
IT Driven Data
Warehouses
SAP BW(s)
SAP
BW InfoCubes
SAP DSOs
SAP BWA
Data Warehouse(s)
DW
Star-schemas
DW ODSs
IT Developed
Semantic Layer
SAP BOBJ
OLAP
Universes
SAP BOBJ
SQL
Universes
Direct
Connections
SAP BOBJ
Data Services
IT Support &
Development
Business Driven
BI Applications
Security
Ad-Hoc
Webi
Training
OLAP
Analysis
User Support
Projects
Ad-hoc
Synchronization
Dashboards
Xcelcius
Batch reports
Crystal
BEx
Explorer
Custom and
3rd party
BPC
External
Applications
Enterprise Portal
Users
Financial
Report center
Sales
Report center
Employees
Manufacturing
Report center
HR
Report center
Customers
Partner facing
Report center
Customer facing
Report center
Ad-Hoc
Report center
Data Resource Management and Quality Assurance
Partners
8
Federated Data Warehouse (FDW) Architecture
• Federated Data Warehouses are best in very large organization where
development is separated by geography, organizational boundaries, or
where multiple data warehouses exists due to mergers & acquisitions.
• To make FDWs successful, there needs to be a rapid convergence to
standardized technologies. This include:
 Same type of databases and support pack levels (costs and compatibility)
 Same technical platforms Hardware, Backups and Archiving (costs)
 Shared Portal and user interface strategy (reduced training and support)
 Shared security design and centralized administration (risk management)
If the data is federated you gain faster response time to business needs,
can execute multiple projects in parallel, and work 24/7 across the
globe. But without any standardization, it can also be very costly.
9
The Centralized Data Warehouse (CDW) Architecture
Metadata
IT Driven Data
Warehouses
IT Developed
Semantic Layer
OLTP sources
SAP ECC
Siebel, JDE
Oracle
Others
SAP BW
SAP
BW InfoCubes
SAP DSOs
SAP BWA
SAP BOBJ
OLAP
Universes
SAP BOBJ
SQL
Universes
Direct
Connections
SAP BOBJ
Data Services
IT Support &
Development
Business Driven
BI Applications
Security
Ad-Hoc
Webi
Training
OLAP
Analysis
User Support
Projects
Ad-hoc
Synchronization
Dashboards
Xcelcius
Batch reports
Crystal
BEx
Explorer
Custom and
3rd party
BPC
External
Applications
Enterprise Portal
Users
Financial
Report center
Sales
Report center
Employees
Manufacturing
Report center
HR
Report center
Customers
Partner facing
Report center
Customer facing
Report center
Ad-Hoc
Report center
Data Resource Management and Quality Assurance
Partners
10
Centralized Data Warehouse (CDW) Architecture
• Centralized Data Warehouses are great for small and mid-size data
warehouses (less than 15-40Tb). There are great benefits in terms of
the ease to mange upgrades, support packs, enforcing development
standards, transport control, master data management and the overall
total cost of ownership
• To make CDWs successful, there needs to be:






Adequate funding of hardware, application servers, database servers
Serious consideration should be made to move BI and reporting to BWA
Focus on using the database capacity on storage and data loads-- not queries
No direct reporting from DSOs (takes too much system resources)
Broadcasting , caching and performance tuning is a dedicated support effort
A plan for data partitioning and archiving needs to be in-place as soon as the
system exceeds 5-8 TB.
If the data is centralized it is faster to develop new solutions for the
business and merging from different data sources are easier
11
The De-centralized Data Warehouse (DDW) Architecture
Metadata
IT Driven Data
Warehouses
IT Developed
Semantic Layer
SAP BW(s)
SAP
BW InfoCubes
SAP DSOs
SAP BWA
SAP BW(s)
SAP
BW InfoCubes
SAP DSOs
SAP BOBJ
OLAP
Universes
SAP BOBJ
SQL
Universes
Direct
Connections
SAP BOBJ
Data Services
IT Support &
Development
Business Driven
BI Applications
Security
Ad-Hoc
Webi
Training
OLAP
Analysis
User Support
Projects
Ad-hoc
Synchronization
Dashboards
Xcelcius
Batch reports
Crystal
BEx
Explorer
Custom and
3rd party
BPC
External
Applications
Enterprise Portal
Users
Financial
Report center
Sales
Report center
Employees
Manufacturing
Report center
HR
Report center
Customers
Partner facing
Report center
Customer facing
Report center
Ad-Hoc
Report center
Data Resource Management and Quality Assurance
Partners
12
De-centralized Data Warehouse (DDW) Architecture
• A Decentralized Data Warehouses makes sense if there are logical
division between business units, geographies and little shared reporting
I.e. in a conglomerate organization with diverse business units.
• The benefits of DDWs include the flexibility of the FDW with the
technology standardization and lower cost of ownership of the CDW. To
make DDWs successful, there needs to be:
A formal Masterdata Management (MDM) strategy with clearly defined standards
A rule based data cleaning and data integration plan for centralized reporting
A shared hardware location to keep costs lower
Tight integration with upgrades, support packs and interface standards
With DDWs there is a risk of creating stove-pipe data marts that
cannot be integrated at the corporate level without very high costs.
13
Recommendations CDW, FDW and DDW Architectures
Federated Data Warehouse
(FDW)
Best for very large organization
where development is separated
Organization by geography, organizational
boundaries, or where multiple
data warehouses exists due to
mergers & acquisitions.
Max. Size
Success
factors
Centralized Data
Warehouse (CDW)
De-centralized Data
Warehouse (DDW)
Best for small and mid-size data
warehouses in organizations.
If there are logical division
between business units,
geographies and little shared
reporting I.e. in a conglomerate
organization with diverse
business units.
Virtually unlimited
40+ Tb
Virtually unlimited
Use same type of databases, ETL
tools and support levels (costs &
compatibility)
Use the same O/S, Hardware,
Backups and Archiving systems
(costs)
Shared Portal and user interface
strategy (reduced training and
support)
Shared security design and
centralized administration
(information risk management)
Adequate funding of hardware,
application servers, database
servers
Implement BWA
A formal Masterdata
Management (MDM) strategy
with clearly defined standards
A rule based data cleaning and
data integration plan for
centralized reporting
Use the database capacity on data Use a shared hardware location
loads not queries
to keep support costs lower
Direct reporting from DSOs
should not be allowed
Tight integration with upgrades,
support packs and interface
standards
Performance tuning should be a
dedicated support team effort
Without any standardization, it
can be very costly.
Issues
Performance can be poor.
An archiving plan is essential
when the system exceeds 5-8 Tb.
There is a risk of creating stovepipe data marts that cannot be
integrated at the corporate level
without very high costs.
14
What We’ll Cover …
• Introduction
• EDW architectural options
Federated Data Warehouse
 Centralized Data Warehouse
 Distributed Data Warehouse

•Data Integration challenges
Masterdata
Transaction
data conversion
Data cleansing
•The IT "Jail"
•Creating The Support Organization
•The top 10 EDW pitfalls
•Wrap-up
15
The 3-Tiers of Information Management
For all data warehouses 60-80% of the effort is to move, store, retrieve
and integrate data from various source systems.
Information management is six distinct efforts. Therefore, several tools
exists with different capabilities
Applications
ERP, SCM,
CRM
Information
Management
Structured
Data
Business
Intelligence
Performance
Management
Data Synchronization &
Migration
Data Federation
Text Analysis
Masterdata Mgmt.
Data Integration
Data Quality
Metadata Mgmt.
RDBMS
RDBMS
ERP
ERP
Unstructured
Data
Notes
Web
Email
Docs
1616
The BI Data Services Architecture
Data integration in an EDW can be done with ETL tools like SAP BOBJ
Data Services. The tool architectural can be illustrated in terms of source
data, process and target data.
Target
Data
PeopleSoft
Siebel
Oracle DB
SQL DB
XML
Mainframe
Excel
Oracle
Apps
SAP R/3
SAP ECC
SAP BI
NetWeaver
DB2
Files
Others
Process
Data
Validation
Data
Cleansing
Data
Auditing
Data Services Engine
Data Profiling
Source
Data
PeopleSoft
Siebel
Oracle DB
SQL DB
XML
Mainframe
Excel
Oracle
Apps
SAP R/3
SAP ECC
SAP BI
NetWeaver
DB2
Files
Others
17
Reconciliation Between Systems
The majority of time
spent on maintaining a
complex EDW is the
time spent on reconciliation
of the data
You have to prove that the data in the warehouse is equal to the data
you extracted, or your financial reporting systems will have no
credibility.
You are also legally required to have a reconciliation process that can
be tracked, if you use the warehouse for financial reporting to external
entities.
Reconciliation Between Systems- Dashboards
Many companies invest in developing manual control queries, while
others use reconciliation products that are powered by SAP NetWeaver
An example of a reconciliation Dashboard built
on SAP BW. In this example:
1. A reconciliation memo was written on Feb. 1st
2. PCA reconciliation between BW and R/3 failed
on Feb. 16th
Interesting use for SAP NetWeaver BI
Using an ETL Tool like BOBJ Data Services you can
consolidate data from many source systems, cleanse and
integrate them before you send it to the EDW. This avoids
complex logic.
Source systems
- Oracle
- JDE
- Peoplesoft
- Baan
- Siebel
- Custom
- Hyperion
- Other.
2020
Data Cleansing Capabilities
The Validation
Validation allows you to create rules for cleaning data prior to
loading it to the system. You can have a pass rule and an 'Action on
Failure' that can provide complex logic.
21
Data Cleansing Capabilities
The Audit
The Auditing selection allows
you to take complex actions
when the data quality is poor.
You can:
1.
Send an email to an administrator
2.
Load the data to a table for later
correction
3.
Modify the data through scripts
4.
Create custom functions for your
own processing logic
22
Universal Data Cleansing: Example of Enhanced Party Masterdata
You can also add new items such as geocodes for visualization in SAP
BI I.e. maps
You can add new
characteristics to the data
such as:
1) Legal tax jurisdictions
2) Census track ID
3) Block group ID
4) Insurance rating territories
5) Tax authority name
6) Tax authority FIPS codes
7) Longitude & Latitude
8) City type
9)...
GREAT FEATURE: The Census track ID allows
you to analyze your customers and partners using
government census information
Source: SAP AG, 2009
Universal Data Cleansing: Customer Aggregating & Discovery
A common way to look at
customer data is by Households
instead of single records.
BOBJ DQ allows you to look at customer's addresses and create
shared master records, customer mapping keys, aggregating data
(i.e. aggregated sales data for the household), check "no-call" lists,
examining churn (apparent customer turn-over).
You can also integrating all master data from many records into a
single "super record" that contains all the unique master data you
have about a single customer or partner.
Universal Data Cleansing: Data integration & BAS
The Business Address Service (BAS) feature can:
1) Use Postal reference files from 190 countries to clean address,
including suggestion lists
2) Data scans and searches in SAP for duplicate records using partial
user input.
What We’ll Cover …
• Introduction
• EDW architectural options
Federated Data Warehouse
 Centralized Data Warehouse
 Distributed Data Warehouse

•Data Integration challenges
Masterdata
Transaction
data conversion
Data cleansing
•The IT "Jail"
•Creating The Support Organization
•The top 10 EDW pitfalls
•Wrap-up
26
Separate the Data Warehouse from the BI solutions
IT cannot hold BI ‘hostage’ with long delivery times
and slow responses to changing user demands.
The only way to be successful is to provide flexible data structures
and cleansed, integrated data to the business and let the business
groups take over the BI development.
So what is needed is a stronger emphasis on scalable, fast IT
solutions and a ramp up of BI capabilities of the business units.
Keeping BI front-end solutions such as Webi, Visual Composer
and Analysis in the hands of IT instead of the business will create
inflexible systems that are unlikely to succeed.
27
What We’ll Cover …
• Introduction
• EDW architectural options
Federated Data Warehouse
 Centralized Data Warehouse
 Distributed Data Warehouse

•Data Integration challenges
Masterdata
Transaction
data conversion
Data cleansing
•The IT "Jail"
•Creating The Support Organization
•The top 10 EDW pitfalls
•Wrap-up
28
BI Support Organization — Big Picture
You need to separate the operations of BI systems from the project work
If there is no support organization, the BI system quickly becomes an
orphan when the project ends
Without a
support org. there
is a risk that
future BI projects
are delayed since
the project team
has to support
previous
projects
The BI Help Desk — Level 1 Support
The first level support
should be done by
Power Users in the
organization
You will have to train
these resources,
empower them to make
changes, and leverage
them as much as
possible, even when it
is easy to “jump to
solutions”
Query related support tickets from a central
location/Web site should be routed to the
power users in each department.
The power user can escalate the ticket to Level2 support if he/she is unable to resolve it.
The BI Help Desk — Level 2 Support
The second level support
is used for issues that are
not related to queries,
presentations, reports,
and formatting
This include data loads,
performance, security,
availability, training
schedules, etc.
Some support ticket types are always
routed to Level 2 support.
This is addressed by the
central support team
It is important to have a generic email
address for Level 2 support that is not
related to an individual. Emails to this
address should not be deleted.
Break-Fix - Splitting Projects & Support Environments
Break fix and Production stack
BWB
Project Stack
BWD
BWS
BWQ
BWP
The Break-Fix and
production stack as well
as the training
environment is owned
by the support team.
Training
BWT
The project teams own
the development and
Sandbox environments
(BWS and BWD).
By Introducing a Break-Fix (BWB) environment, the support team can
correct break-fixes and move code into the Testing environment (BWQ) and
Production environment (BWP) without impacting the project team
Transports can be captured in the buffer and moved to the Development
environment (BWD) on a periodic basis
What We’ll Cover …
• Introduction
• EDW architectural options
Federated Data Warehouse
 Centralized Data Warehouse
 Distributed Data Warehouse

•Data Integration challenges
Masterdata
Transaction
data conversion
Data cleansing
•The IT "Jail"
•Creating The Support Organization
•The top 10 EDW pitfalls
•Wrap-up
33
Pitfall #1: Lack of Reasonable SLA with EDW Support Team
Some examples of reasonable performance include:
1.
2.
3.
4.
90% of all queries run under 20 seconds
System is available 98% of the time
Data loads are available at 8am — 99% of the time
User support tickets are answered within 30 minutes
(first response)
5. User support tickets are closed within 48 hours — 95% of the time.
6. System is never unavailable for more than 72 hrs — including
upgrades, service packs, and disaster recovery
7. Delta backups are done each 24 cycle and system backups are
done every weekend
More EDW Pitfalls….
Pitfall #2: Jack-of-all-trades  Master of none….
 BI is complex with many different tools and technologies. Don’t rely
on a single person with no specialized skills.
 Make each person responsible for a focused technology/task.
Pitfall #3: An army of ‘Architects’ who don’t understand Technology.
 Have one ‘architect’ – quality is more important than quantity
 Architecture is technical by nature. PowerPoints only gets you a
small part of the way.
 The BI architect should know the technology better than anyone in
the room and be able to design solutions.
More EDW Pitfalls….
Pitfall #4: Not separating the Support Team from the Project team
 Keeping the ‘lights-on’ is a core focus area.
 Many EDWs fail because of lack of training, production and user
support, and by having nobody around to do continuous
improvements.
Pitfall #5: A Firm Belief in Monolithic Data Warehouses
 Google runs on over 500,000 servers, why must your data warehouse
run on one?
 Divide and concur when the performance becomes a too-large
problem.
 You don’t need a monolithic castle, but storage
& performance
More EDW Pitfalls….
Pitfall #6: Analysis Paralysis.
 You will never have perfect EDW requirements – get over it….
 The business will change and so will the BI system. Change is a sign
of success not failures (people who cares wants to make it better).
 Not moving forward and keep analyzing is a costly decision…
Pitfall #7: A Single User Interface will solve all my EDW problems..
 There are no magic bullets. Most companies need 2-3 end user tools.
 Start with OLAP, then continue with ad-hoc querying, and finalize with
dashboards. All other tools are great, but not a starting point.
 Remember you first crawled and walked before you ran.
More EDW Pitfalls….
Pitfall #8: Enforce EDW Standards
 Standards are not a word document buried in a file cabinet
 If you allow ‘exceptions’ the standards quickly become meaningless.
 It costs to keep your house clean, but data management and data
integration will benefit greatly from it. Remember: “the road to hell is paved
with good intentions” - unknown.
Pitfall #9: Keep Your EDW Support Team motivated
 The average application developer stays on the job for 47 months, the
average support person is only there for 25 months!
 It is very expensive to use the support team as a training ground for
technical staff and it hurts performance.
 Make the support team a ‘cool’ place to work with
flexible hours and defined career paths.
Final EDW Pitfall.
Pitfall #10: Not Creating a ‘BI Technology Advisory Board’ for the EDW

Use ad-hoc best practice advise from external experts on an
periodic basis.

If you are struggling with something, there are many others who
have ‘cracked the nut’ already – leverage their experiences.

Attend BI conferences, take good notes and leverage the many
experts at the booths, the speakers and the forums.

You are not alone, but your team needs to get ‘plugged into’ the
many EDW and Technology on-line communities.
What We’ll Cover …
• Introduction
• EDW architectural options
Federated Data Warehouse
 Centralized Data Warehouse
 Distributed Data Warehouse

•Data Integration challenges
Masterdata
Transaction
data conversion
Data cleansing
•The IT "Jail"
•Creating The Support Organization
•The top 10 EDW pitfalls
•Wrap-up
40
Resources
• Fundamentals of Data Warehouses

by Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and
Panos Vassiliadis
• Implementing Enterprise Data Warehousing: A Guide for
Executives

by Alan Schlukbier
41
7 Key Points to Take Home
• There are more than one way to architect an EDW. However, you
need to make sure your BI solution is designed, not evolutionary
• Consider FDW and DDWs when data volumes are extremely high or
your company just underwent a merger or acquisition
• Make the front-end independent from the backend
• Formalize a data integration strategy with MDM and Reconsolidation
as key focus areas
• Invest in people, not just technology –Great support staff is key to
EDW success
• Create a BI technology advisory board and have periodic meetings
42
Your Turn!
How to contact me:
Dr. Bjarne Berg
[email protected]
43
Disclaimer
SAP, R/3, mySAP, mySAP.com, SAP NetWeaver®, Duet™, PartnerEdge, and other SAP products and services
mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany
and in several other countries all over the world. All other product and service names mentioned are the trademarks
of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP.