Architecture Processes & CPM

Building A Data Quality Program From Scratch
DAMA Chicago
October 19, 2011
John Grage – Sr. Mgr. Discover Financial Services
Agenda
• Company Introduction
• Card Acceptance
• Data Quality Defined
• The Six Factors of Data Quality
• Best Practices for Improving Data Quality
• Origins of Poor Data Quality
• Benefits of High Data Quality
• Who is Responsible for Data Quality?
• Let’s Get Started
• Celebrate the Wins
• Recommendations
• Core Functional Requirements of a Data Quality Tool
• Q&A
Company Introduction
• Discover Financial Services (NYSE: DFS)
– Direct Banking and Payment Services Company
– Founded in 1986
– We Offer Many Consumer Products
• Credit Card (One of the Largest Credit Card Issuers in the U.S.)
• ATM/Debit Card
• Loans (Student, Credit Card, and Personal)
• Banking (Online Savings Accts, CDs, and Money Market Accts)
– We Own Three Payments Networks
• Discover Network: has millions of merchants and cash access locations
• PULSE: one of the nation’s leading ATM/debit networks
• Diners Club International: a global payments network with acceptance in 185 countries and territories
– Riverwoods, IL Headquarters
– Approximately 10,500 Employees
– Approximately 50 Million Card Holders
– Sites Include: www.discovercard.com and www.discoverbank.com
Card Acceptance
• Discover Card
– North America – U.S. / Canada / Mexico
– Central America – Costa Rica / El Salvador / Panama and others
– South America – Brazil / Ecuador
– Caribbean – Bahamas / BVI / Jamaica / Puerto Rico and others
– Europe – Austria / Finland / Poland / Turkey and others
– Asia – Mainland China / Japan / South Korea
– Africa – South Africa
– Many Other Countries Coming Soon
– See http://www.discovercard.com and select ‘International Acceptance’ under ‘Help and Support’ for an up-to-date list
Data Quality Defined
• Many Definitions
– The degree of excellence exhibited by the data in relation to the portrayal of
the actual scenario.
– The state of completeness, validity, consistency, timeliness and accuracy
that makes data appropriate for a specific use.
– The people, processes and technologies involved in ensuring the
conformance of data values to business requirements and acceptance
criteria.
– People (must), Process (must), Technology (tools needed at some point)
• Myths and Misconceptions
– More than defect correction
– Not a one time action
– Seldom about perfection
The Six Factors of Data Quality
• Context
– The purpose for which it is used
• Storage
– Where the data resides
• Data Flow
– How the data enters and moves through the organization
• Work Flow
– How work activities interact with and use the data
• Stewardship
– People responsible for managing the data
• Continuous Monitoring
– Processes for regularly validating the data
Best Practices for Improving Data Quality
• Every Data Quality Effort Starts with Data Profiling
• Tool-Based Data Profiling is More Effective Than Manual Methods
• Data Profiling is Not a One Time Task
• Data Profiling, Integration and Quality are Closely Related
• Proactive Order Can Reduce Reactive Chaos
• Improving Data When It’s Created or Changed is Easier Than
Fixing It Later
– Garbage in, garbage out
– An ounce of prevention is worth a pound of cure
– Data quality needs to move upstream
Origins of Poor Data Quality
• Inconsistent Definitions for Common Terms
• Any Manual Intervention in the Data Flow Process
(employees/customers)
• Data Migration or Conversion Projects
• External Data
• Customer, Product and Financial Data are More Prone to Data
Quality Problems Compared to Other Types of Data
Benefits of High Data Quality
• Greater Confidence in Analytic Systems
• Less Time Spent Reconciling Data and/or Fixing Problems
• Single Version of the Truth
• Increased Customer Satisfaction
• Reduced Costs
• Increased Revenues
• Compliance
– Compliance can drive your DQ program if you can’t sell the other benefits
– Make friends with your audit staff
– HIPAA, GLBA, SOX, Basel II, FDIC, Federal Reserve and others
Who is Responsible for Data Quality?
• Information Technology
• Business Analysts
• Business
• Front-Line Workers
• DQ Analysts
• Data Steward
• Corporate Executives
• Board of Directors
• No One
• All of Us Are – We just play different roles
Let’s Get Started (Metadata)
• Information = Data (content) + Metadata (context)
• Your DQ Program Needs to Address Both Data and Metadata
• “Don’t Boil the Ocean”
• Start with a Focus on Structured Data (get this right before tackling others)
• Start With Selecting a Handful of Business Attributes From:
– Customer
– Product
– Vendor / Supplier
– Employee
– Financial
– Master Reference Data
– or an attribute(s) someone brings to you. Don’t turn away this opportunity
• Find Data Steward / SME / or Someone with Business
Knowledge About Attribute Who is Willing to Work With You
• Find Published Metadata About Those Attributes
– Verify Metadata is current and accurate with your SME
– If Metadata does not exist then that is your first step
Let’s Keep Going (Discovery)
• Update and/or Publish Your Metadata on These Attributes
– Great if you already have a single metadata repository tool
– If not, that should be one goal of your data governance program
– Document and train individuals on how to find and use this metadata
– Enterprise LDM should be in your repository
• Business subject areas, critical entities, attributes and relationships
• Metadata about these attributes is your golden record
• Discovery (where do these attributes reside?)
– Almost impossible to get 100% coverage without a tool
– Could write lots of SQL and interrogate lots of programs and copybooks
– Either way you will have something to work with – just how complete is it?
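The "write lots of SQL" route for discovery can be sketched as a search of a database catalog for column names that look like the attribute. This is a minimal illustration that assumes a SQLite database; the table and column names are hypothetical, and other database engines expose their catalogs differently.

```python
import sqlite3

def find_attribute(conn, fragment):
    """Return (table, column) pairs whose column name contains `fragment`."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if fragment.lower() in col[1].lower():
                hits.append((table, col[1]))
    return hits

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card_account (acct_id INTEGER, postal_code TEXT)")
conn.execute("CREATE TABLE loan_application (app_id INTEGER, zip_postal_cd TEXT)")
conn.execute("CREATE TABLE merchant (merchant_id INTEGER, name TEXT)")

discovered = find_attribute(conn, "postal")
print(discovered)
```

A name search like this only finds columns whose names hint at their content, which is one reason full coverage is hard to reach without a tool.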
Let’s Keep Going (POC)
• Start With a POC Within One LOB
– 1-2 week effort
– Examine a small number of attributes
– Gather a small set of business rules
– Profile the data
– Share findings with SME
– This is your chance to show value within a LOB that a DQ program can bring
Let’s Keep Going (Project)
• Expand to Data Quality Project for That LOB
– 1-6 month effort
– Expand to full set of attributes
– Expand to full set of business rules
– Profile the data
– Share findings with SME and LOB
– Build action plan to address DQ issues
– Fix DQ issues
– Build in monitoring and reporting activities
– Start looking upstream
– Publish results – gain corporate awareness of what you have accomplished
– May need to do more than one LOB before proceeding to next step
Let’s Keep Going (Enterprise)
• Expand to Data Quality Project Across the Enterprise
– 6-12+ month effort
– This is where you start to enter into MDM
– Look at critical business entities / attributes that span the enterprise
• May be some of the same attributes that you looked at individually
within their LOB
– Look at full set of business rules across the enterprise
– Profile the data across multiple LOBs
– Share findings with enterprise SME and Data Governance Council
– Work with DGC to prioritize next steps
– Build action plan to address DQ issues
– Fix DQ issues
– Build in monitoring and reporting activities
– Focus upstream - need to address DQ issues in operational systems
– Publish results – gain corporate awareness of what you have accomplished
Let’s Keep Going (6 Key DQ Dimensions)
• Completeness
– Are data values missing or in an unusable state?
– Nullability
• Conformity
– Should data conform to specified formats?
• Consistency
– Do distinct data instances provide conflicting information?
– Are values consistent across data sets?
• Accuracy
– Does the data accurately represent the “real-world” values it is expected to
model? Examples of inaccuracy: incorrect spellings and out-of-date data
• Duplication
– Are there multiple, unnecessary representations of the same data?
• Integrity
– What data is missing important relationship links? The inability to link
related records together may introduce duplication across your enterprise
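Three of these dimensions (completeness, conformity and duplication) can be sketched as simple checks. This is an illustration only: the records, the field names and the postal code format rule are assumptions, not rules from the presentation.

```python
import re
from collections import Counter

# Hypothetical customer records; None or "" marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com", "postal_code": "60015"},
    {"id": 2, "email": None,              "postal_code": "60015-1234"},
    {"id": 3, "email": "ann@example.com", "postal_code": "6O015"},  # letter O, not zero
    {"id": 4, "email": "bob@example.com", "postal_code": ""},
]

# Assumed business rule: U.S. ZIP or ZIP+4.
POSTAL_FORMAT = re.compile(r"^\d{5}(-\d{4})?$")

def is_missing(value):
    return value is None or value == ""

def completeness(rows, field):
    """Fraction of rows where the field is present and non-empty."""
    return sum(1 for r in rows if not is_missing(r[field])) / len(rows)

def nonconforming(rows, field, pattern):
    """Ids of rows whose non-empty value does not match the expected format."""
    return [r["id"] for r in rows
            if not is_missing(r[field]) and not pattern.match(r[field])]

def duplicates(rows, field):
    """Values that appear on more than one row."""
    counts = Counter(r[field] for r in rows if not is_missing(r[field]))
    return sorted(value for value, n in counts.items() if n > 1)

print(completeness(records, "email"))                          # 0.75
print(nonconforming(records, "postal_code", POSTAL_FORMAT))    # [3]
print(duplicates(records, "email"))                            # ['ann@example.com']
```

Accuracy and integrity are harder to check in isolation, since they need a trusted reference or the related records to compare against.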
Let’s Keep Going (Profile)
• Run Data Profiling Against Your Attribute(s)
– A DQ tool makes your life much simpler
– Report on
• Source system
• Entity name
• Attribute name
• Data type and length
• Nullability
• Identify if attribute is a PK or FK
• Total number of rows (or %) examined (may not want/need to look at all
rows)
• Cardinality
• Min and max values for the attribute
• Classification (SS#, postal code, name, address, etc.) DQ tools good at
this
• Number of data quality issues (attributes not in-line with business rules)
• Provide explanations and examples for each exception
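Several of the report items above can be computed by hand for a single attribute. The sketch below is a minimal stand-in for what a DQ tool reports; the function name, the keys of the result and the sample values are all hypothetical.

```python
def profile(values):
    """Basic profile of one attribute; None represents a null."""
    present = [v for v in values if v is not None]
    return {
        "rows_examined": len(values),
        "null_count": len(values) - len(present),
        "nullable": len(present) < len(values),   # nulls were observed
        "cardinality": len(set(present)),          # number of distinct values
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "max_length": max((len(str(v)) for v in present), default=0),
    }

# Hypothetical sample of a credit limit attribute.
credit_limits = [5000, 12000, None, 5000, 800]
report = profile(credit_limits)
for item, value in report.items():
    print(f"{item}: {value}")
```

Classification and business rule exceptions are left out here; those are the parts where a tool saves the most effort.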
Let’s Keep Going (Analyze / Fix)
• Analyze Your Results
– Review the results of your analysis for each DQ dimension examined
– Identify data quality issues
– Determine with SME the impact to LOB or company these exceptions bring
– $ is the best message to bring
– Compliance is equally effective
– Build action plan to fix
– Determine cost to fix
– Take action to fix if cost effective (remember it’s not about perfection)
– Save results
Let’s Keep Going (Swim Upstream)
• Trace Data Flow in Reverse from Data Quality Issue
• Data was Corrupted Somewhere Along Data Flow
– Right off the bat – as data entered the company
• Bad vendor file
• Bad data entry from customer service rep (telephone call)
• Bad data entry from customer (online application)
– Programming error in operational system
– Data Transformation processes as data moves along
– ???
• Find Where Corruption is Occurring and Fix It
• Beware: Corruption May be Occurring in Multiple Places
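Tracing in reverse can be sketched as applying the same business rule to snapshots of one record at each stage of the flow, starting from where the issue was found. The stage names, the values and the rule below are hypothetical.

```python
def find_corruption_origin(stages, is_valid):
    """Walk the data flow in reverse and return the most upstream stage of
    the unbroken run of failures that ends at the last stage.
    Returns None if the value is valid at the end of the flow."""
    origin = None
    for name, value in reversed(stages):
        if is_valid(value):
            break          # value was still good here; stop walking upstream
        origin = name
    return origin

def valid_zip(value):
    # Assumed rule: a 5-digit U.S. ZIP code.
    return len(value) == 5 and value.isdigit()

# Snapshots of the same record, listed in flow order (entry point first).
stages = [
    ("online_application", "60015"),
    ("operational_system", "60015"),
    ("etl_transform", "6001"),      # truncated during transformation
    ("data_warehouse", "6001"),
]

print(find_corruption_origin(stages, valid_zip))   # etl_transform
```

If the first stage is returned, the data was bad as it entered the company. The sketch follows one record, so it will not reveal corruption that occurs in multiple places; that takes repeating the trace across a sample of failing records.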
Let’s Keep Going (Monitoring)
• Build Monitoring Process to Audit Your Fix
• Monitoring Process Should be a Scheduled Automated Process
• Need to Review Results to Determine if Data is No Longer
Being Corrupted
• Take Action if Data Quality is Being Compromised
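A monitoring run can be sketched as re-applying a business rule and comparing the failure rate to a tolerance. The function name, the threshold and the sample data below are assumptions; scheduling is left to whatever job scheduler is already in use.

```python
from datetime import date

def run_monitor(rows, rule, max_failure_rate, run_date):
    """Apply one rule to every row and flag the run if too many rows fail."""
    failures = [row for row in rows if not rule(row)]
    rate = len(failures) / len(rows) if rows else 0.0
    return {
        "run_date": run_date.isoformat(),
        "rows_checked": len(rows),
        "failure_rate": rate,
        "failing_examples": failures[:5],           # a few examples for review
        "action_required": rate > max_failure_rate,
    }

def valid_zip(value):
    # Assumed rule: a 5-digit U.S. ZIP code.
    return len(value) == 5 and value.isdigit()

# Hypothetical postal codes loaded since the fix went in.
todays_values = ["60015", "6001", "60614", "60601"]
result = run_monitor(todays_values, valid_zip, max_failure_rate=0.10,
                     run_date=date(2011, 10, 19))
print(result)
```

Saving each run's result gives a history to review, which shows whether the fix held or the data is being corrupted again.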
Let’s Keep Going (Non-Compliance)
• Use Pie Charts, Bar Graphs, etc. to Illustrate the Effect of
Not Addressing Discovered DQ Issues
• Tie to Regulatory Compliance if Helpful. Refer to HIPAA, Basel
II, SOX, FDIC, Federal Reserve.
• Tie to $
– Increased cost
– Decreased revenue
• Present to Data Governance Council
Celebrate The Wins
• Celebrate
• Publish Wins on Scorecard
• Show $ Saved or Revenue Increased
• Constantly Remind Enterprise of What You are Doing and Value You are Providing
Recommendations
• Start Small (POCs)
• Show Some Quick Wins - $
• Grow From There
• Focus on What You Have to Work With, Not What You Don’t Have to Work With
• Profile Data More Deeply and More Often
• Find Solutions in Tools
• Establish Both Proactive and Reactive Processes
• Take Data Quality Upstream
• Use Regulatory Compliance to Drive Data Quality
• Use MetaData to Drive Quality
• Address Enterprise Data Quality
• Derive EDQ Org Structure and Support Through Data Governance or other Executive Support
Core Functional Requirements of a DQ Tool
• Profiling
– Capture statistics (metadata) that provide insight into the quality of the data
and help to identify data quality issues
• Parsing and Standardization
– Decomposition of text fields into component parts and the formatting of
values into consistent layouts based on industry standards, local standards,
user defined business rules and knowledge bases of values and patterns
• Generalized “Cleansing”
– The modification of data values to meet domain restrictions, integrity
constraints or other business rules that define when the quality of data is
sufficient for the organization
• Matching
– Identifying, linking or merging related entries within or across sets of data
• Monitoring
– Deploying controls ensuring data continues to conform to business rules
that define data quality for the organization
• Enrichment
– Enhancing the value of internally held data by appending related attributes
from external sources (e.g. consumer demographic attributes or geographic
descriptors)
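Two of these functions, standardization and matching, can be sketched together: values are first formatted into a consistent layout, then records from different data sets are linked on the standardized value. The phone number rule, the record layouts and the ids below are hypothetical, and commercial tools use far richer rules and fuzzy matching.

```python
import re

def standardize_phone(raw):
    """Reduce a U.S. phone number to 10 digits; return None if it cannot be."""
    digits = re.sub(r"\D", "", raw)            # drop punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                    # drop the leading country code
    return digits if len(digits) == 10 else None

def match_on_phone(left, right):
    """Link records across two data sets that share a standardized phone."""
    index = {}
    for rec in right:
        key = standardize_phone(rec["phone"])
        if key:
            index.setdefault(key, []).append(rec["id"])
    pairs = []
    for rec in left:
        key = standardize_phone(rec["phone"])
        for other_id in index.get(key, []):
            pairs.append((rec["id"], other_id))
    return pairs

# Hypothetical records from two systems.
crm = [{"id": "C1", "phone": "(847) 555-0100"}, {"id": "C2", "phone": "555-0199"}]
billing = [{"id": "B7", "phone": "1-847-555-0100"}, {"id": "B8", "phone": "847.555.0142"}]

print(match_on_phone(crm, billing))   # [('C1', 'B7')]
```

Record C2 has no area code, so it cannot be standardized and is left unmatched rather than guessed at.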
The End
Thank You!
Questions?