Transcript Document

Overcoming Challenges to
Implementing DW and BI
Applications in DoD & the Federal
Government
Tom McCullough
DAMA-NCR Presentastion
13 May 2003
Tom McCullough, CCP
Sr. Enterprise Data Architect, SI International, Inc.
•
Quick resume:
•
B.S. Mathematics, university of Pittsburgh, Summa cum Laude, Phi Beta Kappa
•
MBA (concentrations: finance & CIS), University of Rochester, Wm. E. Simon Graduate School of
Business Administration
•
Eastman Kodak co., Management Services Division, Kodak Park, Rochester, NY: systems designer, systems
analyst, project manager, DBA, chief corporate manufacturing software package analyst
– Part time adjunct instructor State University of New York at Brockport, Dept of Comp Sci
•
The Riggs National Bank of Washington, D.C. Lead data base analyst
•
Emhart/ATI, Baltimore, Md.: Principal consultant, Data Administration
•
CSX Technology: Technical Manager and consultant to: CSX Transportation, Inc.; Westinghouse ESG
(now Northrop Grumman); T. Rowe Price Investment Technologies
•
Zurich financial services inc: VP of business intelligence & data warehousing
•
SI Consulting, Inc./A subsidiary of SI international, Inc. Reston, Va.: Sr. Enterprise Data Architect. Client
list includes:
– US Dept of State, Bureau of Consular Affairs, Consular Systems Division
– US Dept of Interior, National Park Service
– US Dept of Energy
– US Army National Guard, National Guard Bureau
What We’ll Cover Today
•
What are some key industry challenges today in data
warehousing and BI?
•
What challenges are specific to the federal
government space?
•
SI International’s experience delivering integrated
data architecture solutions: A case study.
•
Modeling tips from SI International, Inc.
•
SI International’s top 10 key ingredients to achieve
success in data warehousing and BI for Uncle Sam.
What Are Some of Today’s Key Industry Challenges
Today in Data Warehousing and BI?
---
How Risky Are Data Warehouse Projects?
• Only 15% of companies in a 2002 cutter survey of 142
companies call their data warehousing efforts to date a
major success.
• Data warehouse systems are complex and expensive.
• 60% to 90% of all data warehouse projects either fail to
meet expectations or are abandoned.
How Risky Are Data Warehouse Projects?
• Technology remains part of the problem: only 27% of
survey respondents were confident about the current
state of data warehouse technology, while 58% were
‘cautious.’
• Stovepipe data marts and web silos!!
– Dealing with users with clout who must have their
“cubes” by tomorrow or else!!
–
Source: InformationWeek, January 9, 2003, “data warehouses are still high risk.”
Why Have There Been So Many Failures?
Take a Look at Just a Few…
•
Failing to manage user expectations.
– “The greatest cause of disappointment is expectations.” - Benjamin
Franklin.
•
Poor data quality in the source systems or an entire lack thereof.
•
Over focus on the technology; Under focus on the business problem.
•
Lack of aligning objectives with the business’s strategy.
•
A lack of understanding of fundamental temporal relational modeling.
– Normalization requires knowledge of the inherent structure of the
data.
•
Attempting to model the whole enterprise first before delivering.
Why Have There Been So Many Failures?
Take a Look at Just a Few…(cont’d)
•
Lack of sufficient sponsorship at the enterprise level.
•
Lack of adequate training and education in the specialized tools.
•
Inadequate data stewardship.
– Not tied to management by objectives.
– No incentives in place to foster it.
•
Attempting to manage decision support projects like operational
transaction processing system projects.
•
Inadequate attention to user security concerns.
•
Failing to get the source system subject data experts recognized and
fully on board.
Why Were DW’s Created in the First Place?
• Long before data warehouses were popularized by
Bill Inmon, there was a concept known as the
corporate information center.
– Technical “chauffeurs” existed to get at the data
from upper management.
• Attempts by new management to make an “end run”
around the unwieldy legacy stovepipe systems.
• Trying to address the needs of true ad hoc users.
What Are the Practical Reasons for Why You
Need a Data Warehouse?
• It functions as
– A repository for historical snapshots to support
strategic analysis and decision support
– A flexible source of data for use by mining,
reporting and OLAP tools
– The ultimate detailed historical data source for
drill downs
• It eliminates
– Massively redundant data extractions
– Conflicting and confusing technical meta data
– The need for redundant, uncoordinated data
cleansing processes
»
Source: Claudia Imhoff, “building a BI environment on a shoestring” presentation to Dama international 2003 symposium
Where Do the ODS & DW Fit into The
“Information Supply Chain”?
What Is the Principal Technical Challenge?
The Extract, Transformation, Load, & Metadata Problem
Modeling
Tool
(2) Source
Definitions
(1)
Warehouse
Model
Metadata Repository & Transfer
Mechanism
Physical
Model
Logical
Model
Source
Defs.
Source
Systems
(3) Table Definitions
(5) Map & Transform Info
(6) Extracted Data
SourceTarget Maps
Load
Stats
(4) Source to
Target Maps
Data
Staging
Tool
(8) Load Stats
(5a) Physical Info
(table spaces, etc.)
(7) Transformed
Data
Data
Warehouse
Is there a difference between Inmon’s normalized
data warehouse and Kimball’s “staging area”?
• Challenge: how to get the data into one place where
it’s:
– Consolidated from various sources,
– Arriving at different times.
– Kept in abeyance before updating production data
and publishing it to users.
• Reality: if the data warehouse = the staging area,
there will be a need for a pre-staging area.
9-May-03
Proprietary and Confidential
7
What Are the Benefits of Dimensional
Modeling?
• Star schema’s are easy for users to understand.
– Measures.
– Dimensions.
– What’s not in the model is obvious.
• The process facilitates requirements validation.
• The process is a good way to quickly formulate a
preliminary target data model.
What Are Some Problems with Dimensional
Modeling?
• The focus of a pure star schema (ie., Dimensional
data model) is primarily on user requirements and
ease of data access, not data update nor the inherent
structure of the data.
• Fabian will have more to say. Stay tuned.
This Has Given Rise to the Great Debate
• Bill Inmon (father of the data warehouse).
– Advocates normalized enterprise data warehouse.
– Time variant data structures.
– Don’t be concerned about requirements too much up front.
» “Build it and they will come.”
• Ralph Kimball (dimensional modeling):
– Start with clearly defined user requirements.
– Build a subject area at a time based on a “star schema.”
– The data warehouse is the union of all the data marts but
only if the dimensions conform across all the fact tables.
– Before loading the facts and dimensions, first concentrate on
the “staging area.”
What Other More Fundamental Issues Are at
Stake?
•
Tension between operational databases and warehouses.
•
Dr. Joseph firestone coined the more fundamental problem of “dynamic
integration”:
– Data sources and data targets must be identified, designed, and maintained.
– Both the source and target designs are moving targets thus complicating the
process of configuration management.
– Get the source system support people and their users on your side when
implementing and maintaining any DW solution.
•
Data warehousing is an attempt at data integration.
What Are the Sociological Challenges to
Achieving Integration?
• Data integration requires business integration.
• Political and “Ricebowl” obstacles to overcome:
– “Attempts to integrate will expose
inconsistencies/incompatibilities that create human
insecurities and power issues.”
»
Source: Information Management Directions: The Integration Challenge, Section 5.7.7 on “sociological implications”, NIST
Special Publication 500-167.
• Every database represents a budget.
• Every action will create an equal and opposite reaction.
• Incumbent contractors / obfuscation hurdles.
How Are Enterprise Application Portals
Affecting the Industry?
• Portals make data staleness painfully obvious
and more visible to all.
• Portals are driving organizations to respond
more quickly to information, but…
• Processing and analyzing data in near real time
are still recognized as the holy grail of
enterprise data management, yet are still
difficult to achieve.
• This has also given rise to the notion of web
services.
Are “Web Services” a True Panacea or
Simply the Latest Silver Bullet?
• Beware: numerous vendors and authors are
promising web services as the ultimate answer
to real-time data analysis.
• The “fly in the ointment” remains the challenges
of the working out the semantics of the
underlying data elements.
• Attend Fabian’s presentation tomorrow.
What About the Debates about ETL vs. EAI
vs. EII to Achieving Integration?
• High volumes argue in favor of ETL tools.
• EAI tools work better with low volumes.
• Some ETL vendors have more data cleansing
and transformation technology in their products
and pull from wide variety of sources (ex., Data
junction).
• Enterprise Application Integration is more about
integrating processes (ex., Biztalk).
• Enterprise Information Integration tools only
move data when the application or user
requests it.
What Challenges Are Specific
to the Federal Government Space?
---
What Are Uncle Sam’s DW and BI
Challenges?
• Achieving stated but elusive goals:
– Government Performance and Results Act of 1993
– Clinger/Cohen act of 1996
– HIPAA
– Internal regulations, policy & procedure memos
– Available funding
• Contract vehicles: contractors must overcome legal
hurdles to get into the game to begin with
Army Regulations, Directives and
Requirements
•
AR-25-1 (information management)
•
AR 380-19 (information systems security)
•
•
AR 380-5 (army information security program)
•
•
AR 380-53 (information systems security monitoring)
•
DII COE configuration management plan
•
AR 71-9 (materiel requirements)
•
DII COE user interface specifications v 4.0
•
AR CADM (army core architecture data model)
•
DoD 5200.28 (security requirements for automated information
•
Army enterprise architecture guidance document
(AEAGD)
•
Army enterprise vision (20 July 1993)
•
CMM (capabilities maturity model)
•
Computer security act of 1987
•
DA Pam 26-6-1 (army acquisition planning for
information systems)
•
DA Pam 73-7 (software test and evaluation guidelines)
•
DII COE (release 3.3) security features users guide
(SFUG) for windows NT 4.0
•
•
DII COE (release 3.3) security functional designers guide
(SFDG) for HP-UX 10.20
DII COE (release 3.3) security functional designers guide
(SFDG) for Solaris 2.5.1 manual)
•
DII COE (release 3.3) security functional designers guide (SFDG) for windows NT
4.0
SYSTEMS (AISS) 21 mar 88)
•
DoD 8320.1-DOD data administration
•
DoD 8320.1-M data administration procedures
•
DoD 8320.1-m-1 (data standardization procedures)
•
DoD 8510.1-M (department of defense information technology security certification
•
And accreditation process (DITSCAP) application
•
DoD 8910.1-M (DoD procedures for management of information requirements)
•
FIPS 184 (integration definition for information modeling (Idef1x))
•
IEEE 12207 (standard for information technology)
•
IEEE std 829-1983 (standard for testing documentation)
•
Implementing DOD standard data elements
•
JTA (joint technical architecture)
•
JTA-A (joint technical architecture-army V6.0)
•
Mil-HNBK-61 (configuration management guidance)
DoD’s Information Management Challenges
• Literally thousands of stove-piped systems.
• Existing heterogeneous infrastructure with which to
integrate.
• Data Mgmt Driver: The the net-centric environment
movement toward “content” that users demand.
– From a recent RFP: “For effective “content,”
there must be an approach to make data visible,
accessible,
trusted,
understandable,
and
interoperable.”
–
Source: Solicitation DCA100-03-T-4029.
The ARNG ODS and DW
A Case Study
Tom McCullough
Current ARNG Information Technology Environment
16 Databases
10 Platforms
54 States &
Territories
140+ Applications
6,000 DA Civilians
24,700 Military
Technicians
352,000 Total
23,000 AGR
12 Languages
Application Access
AKO
Entry Point
Access to AKO
Administered by
AKO Staff
GuardNet Web
Entry Point
Access to GuardNet Web
Administered by ARNG
Guardnet Staff
Application Specific
Web pages
Access to Application Specific
Web pages and Applications
Administered by Application
Administrator
NGB-AIS-AP Development Process
Functional
has an Idea
AIS/CIO
Review
ITPD
ITROIRE
BPR
CAPR
APB
Wrksht
Alpha
Test
Test Plan
Test Scripts
Test Cases
Beta
Test
All Final Docs
Legend:
•Into CM
•Assign PM
•Assign
Work Order
IT RCB
Design &
Code
ASEIG
Review
PMP
IT RCB
Analysis
Signed SRS
DM, DD
Models
IT ROI
Source Code
Final DM
Final DD
Accredit
Release
and turn over to
Maintenance
Functionally
Technically
Security
BPR – Business Process Re-engineering
PMP – Project Management Plan
CM – Configuration Management
PM – Project Manager
DM – Data Model
DD – Data Dictionary
ITROIRE – Information Technology Return on Investment Rationale Estimate
SRS – Software Requirements Specification
ITPD – Information Technology Project Description
What Is SI International's Approach to Delivering
Integrated Enterprise Information Asset Management
for the US ARNG
• Use and adaptation of SI’s data warehousing methodology
• What Effect Does the Web Have on the DW?
• High Level Components of the ARNG DW Architecture
• DW and ODS Tips from SI International’s Consulting Practice
• General Modeling Tips from SI International’s Consulting
Practice
• Lessons Learned: Top 10 Ingredients for Success in DW and BI
7-Jul-15
Proprietary and Confidential
31
What Effect Does the Web Have on the DW?
National Guard Bureau Web Enabled Data Warehouse Vision
Operational
Readiness
DMOSQ
Funding
Budgetary
Equipment
Readiness
Additional
Areas
Extract
Extract
Extract
Extract
Extract
Extract
Supply
STAGING AREA
Preparation and Transformation
Prepare
PREPARED DATA
Load
Logical Data Model
Operational MetaData (Technical)
Integrate
ENTERPRISE DATA WAREHOUSE
Informational MetaData (Business)
Load
Drill Through
Operational MetaData
Publish
PRE-AGGREGATED DATA MARTS
Informational MetaData
Gateway
Create
MDDB Server
Data
Server
API
Web
Server
R/OLAP Server
Web Browser
OLAP Tool
NGB IS Admin Users
R/OLAP Tool
Data Mining
Query
NGB IS Admin Users
NGB Strategic
Power User
Push Button NGB
Knowledge Workers
Consume
NGB User
Components of NGB DW High-level Architecture
Local
Application
Server(s)
Metadata
ARC
ARC
ART
ART
Data &
Metadata
Extraction
ARP
ARP
AR.x
AR.x
Staging
Load /
Update
Data
Cleansing &
Transformation
Facts and
Dimensions
Local
Reports
(Standard and
Ad-hoc Queries)
Data Warehousing Tips from SI
International
• The data warehouse serves as a staging area for the data marts.
• Load the incoming data into a “pre-staging” area.
• All fact tables should be appropriately “time-variant relational
data structures.”
• Re-generate the OLAP cubes from the data warehouse.
• Use regulations and policy directives as an aid to gleaning the
inherent structure of the data and the business requirements.
• Use the star schema approach when the true normalized
structure is not known and insufficient time exists to discover it.
Data Warehousing Tips from SI
International
• Adhere to normalization for the sake of integrity and ease of
updates to the data warehouse when possible.
• Add a physical creation date & time column to all tables.
– Default it to the system date with a column constraint of
NOT NULL.
• Look for features in the BI tool suite that address the problem of
“slowly changing dimensions.”
– Incorporate them into modeling efforts.
Tips and Challenges on Modeling
Tom McCullough
Tips on Modeling from Experience
•
Data modeling is about classification and the challenges involved.
–
Level of generalization in the data model drives the functional design, but not all
notational formats support generalization on the process side.
–
Critical decisions are involved in determining what level of abstraction to choose.
–
“Use of subtypes is single most constructive creativity technique in data modeling
because they make different levels of generalization possible” (source: Graeme
Simsion, university of Melbourne, April 2003).
•
Decide whether the modeling process or the modeling product is the objective.
•
How do you obtain consensus for enterprise level definitions?
•
Dealing with alternative designs.
More on Tips and Challenges for Modeling
• Challenge of getting time to validate the model w/users.
• Challenge of making the model understandable for
programmers and discussing the implications.
• Look for unwarranted duplicate relationship paths.
• Managers: beware of letting the developers do the data
modeling.
More on Modeling Challenges
• Avoiding being perceived as the bottleneck.
• Model management and collision analysis.
• Synchronizing the conceptual model with changes
made to the logical model and physical model after
the forward engineering has been done.
More Modeling Issues and Challenges
•
Need clear distinction of the responsibility between the data modeler
and the DBA.
•
Tradeoff between suites and best of breed tools.
•
Integrating the meta data between multiple tools in a collaborative
development.
•
Adhere to a notational format and the rules inherent to it.
Which Modeling Notational Format to Use?
•
IDEF1x – relatively simple; Still in use although on the way out
•
UML
–
In vogue;
–
In use by major software vendors;
–
Single toughest challenge with UML: for a given problem, which UML
diagram(s) applies and which diagram symbols should be used to model the
solution?
•
Oracle Designer: unique to the tool; Can easily be confusing
•
Information Engineering – relatively simple; No longer “in vogue”; Perhaps easiest for users
to understand
Stick to Fundamentals
•
“He who ignores the fundamentals is destined to design an overcomplicated
solution!”
•
ANSI 3 Schema Model for managing information: internal; Conceptual; And
external schema.
•
Adopt and adhere to consistent naming conventions:
– Classification of data elements:
» prime words; Class words; And qualifying words.
» Object class term; property term; qualifier term; representation term.
Remember the Fundamentals
•
The questions asked in the user/client fact finding interview are paramount.
•
“If you listen carefully, the patient will reveal the diagnosis.”
•
Don’t re-invent the wheel. Look at other sources that have tried to solve the
same problem.
– Institute for Defense Analysis models.
•
Creative data modeling requires creative process and activity modeling.
Lessons Learned: What Are SI
International’s Top 10 Ingredients
for Success in DW and BI as of
April 2003?
Tom McCullough
Top 10 Ingredients for Success in the BI
Space
• #10: The data warehouse is only one component of the
organization’s enterprise architecture  a DW is not the solution
to every IT problem.
• #9: IT is not the solution to every problem. Remember the
lessons from industrial engineering, “streamline first, then
automate!”
• #8: It takes perseverance to succeed in data warehousing.
» “To persevere, one must first experience frustration…”
–
Source unknown.
Top 10 Ingredients for Success in the BI
Space
• #7: Know and apply the fundamentals of data management.
• #6: Identify The Acceptance Criteria.
– “It All Comes Down to a 17 Inch Screen and a Mouse.”
»
Source: Dr. Claudia Imhoff, 2003 Dama International Symposium, Orlando, Florida.
• #5: Software Engineering is a team sport and data warehousing
is a specialized version of software engineering.
–  One person cannot do it all. Look for team players.
Top 10 Ingredients for Success in the BI
Space
• #4: In addition to ability, you must have your mind
and heart into the job. When deciding upon
recruiting for your team, remember the words of Bela
Carosi (coach of Mary Lou Retton), “You musta’
looka’ for the fire in the eyes.”
• #3: Learn from others; our patterns are what we
bring to the table.
• #2: Study the tenets & methods of other professions:
– Building architecture & construction many
analogies here.
– The doctor / patient relationship.
– Journalists: They’re experts at asking follow-up
questions on the fly.
Top 10 Ingredients for Success in the BI
Space
• #1: “What you have just heard was actually performed by
professionals. Do not attempt this at home.”
– Lesson: if you want world class results, hire/contract world
class professionals. You always tend to get what you pay for.
– Remember: when buying medical, dental, information asset
management products and services, parachutes, or life
preservers beware of vendors offering price discounts or
other incentives and gimmicks to get you to buy their
products and services.
Suggestions for Further Reading
on Critical Thinking for Data Architects
•
Temporal Data and the Relational Model, by Date, Darwen, and Lorentzos
•
Introduction to Database Systems, 7th edition, ISBN: 0201385902
•
The Data Webhouse Toolkit: Building the Web-enabled Data Warehouse, ISBN 0471-37680-9
•
The Data Warehouse Toolkit, 2nd Edition, ISBN 0471200247
•
The Common Warehouse Metamodel, ISBN: 0-471-20052-2
•
The Federal Enterprise Architecture Framework
•
“The DoD framework” (previously “the C4ISR framework”)
•
Oracle 8i and Microsoft SQL server 2000 integration, ISBN: 0-7645-4699-6
Suggestions for Further Reading
on Critical Thinking for Data Architects
•
Data Warehousing: Using the Wal-Mart Model, ISBN 155860684X
•
Information Management Directions: The Integration Challenge, NIST special publication
500-167
•
Guidelines to Implementing Data Resource Management, Version 4, ISBN: 0-967-6674-1-0
•
Guide on Data Entity Naming Conventions, NBS special publication 500-149
•
Information Engineering for the Practitioner, ISBN: 0-13-464579
•
Larry Greenfield’s website: http://www.dwinfocenter.org
•
Dr. Joe Firestone’s website: http://www.kmci.org/
•
Fabian Pascal’s website: http://www.firstsql.com/dbdebunk/
Suggestions for Further Reading
on Critical Thinking for Data Architects
•
Building, Using, and Managing the Data Warehouse, ISBN 0-13-534355-0
•
The Balanced Scorecard, ISBN: 0-87584-651-3
•
The Strategy Focused Organization, ISBN: 1-57851-250-6
•
Practical Issues in Database Management, ISBN: 0471585386
•
Out of the Crisis, ISBN: 0-911379-01-0
•
Data Quality, The Accuracy Dimension, ISBN: 1-55860-891-5