Transcript Document

Using MDM as a Practical Approach to Get Started in Data Governance

Todd Goldman: VP Products and Marketing

Precursor to Data Governance is Data Management

Data Governance Manifesto

• Data should be
  1. Understood
  2. Secure
  3. Consistent
  4. Accessible
  5. Managed

Data Governance Management

(People, Processes, Procedures)

Security Consistency Accessibility Understanding

(Discover, Validate)

Governance Requires a Foundation of Understanding

Management

(People, Processes, Procedures)

Understanding

(Discover, Validate)

• If you don’t know how data in different systems is related
  • How can you make sure they are consistent (MDM)?
  • How can you measure overall data quality?
  • How can you measure the quality of business rules?
• If you don’t know where the sensitive data is
  • How can you protect it?
• If your data is not secure, consistent and accessible
  • What does it mean to manage it?

Even if you understand your data landscape, starting data governance is difficult.

It is a new religion.

Current Data Religion – Mystery Cult

• Data is shrouded in mystery
• Its meaning is only accessible through data priests
• Data Priests use omens (metadata) and personal experience to divine the meaning of data in return for (financial) sacrifices
• Meaning is often obscure, misleading, incomplete and wrong
• Data Priests are often blockers to data governance programs
  • More paperwork
  • Slows us down
  • Don’t need it

A Common Myth: “We know our data”

I’m a professional. Of course I know my data!

But, once it leaves my hands, it is someone else’s problem!

Wow, that transformation is complex. Are you sure that is in my data?

I’m going to start my own consulting firm

• Subject matter experts (SMEs) only know their own systems
• But they can’t tell you how data changes and is transformed as it moves from system to system
• Relationships between systems are complex
• SMEs sometimes change jobs!

More of the Myth: “Our Data is Consistent”

All of my data follows the business rules for this system!

I can’t keep up with all the acquisitions and reorganizations. They mess up the way systems work together. It is very inconvenient.

• Business rules are broken all the time as data crosses business and system boundaries:
  • 83 year old man in system A is a “youthful driver” in system B
  • Bond yield is listed as 5% in system X and 5.3% in system Y
• Exceptions result in lost revenue, customer dissatisfaction, and regulatory fines
• Business rules change as organizations change
  • Mergers and Acquisitions
  • New products or services
  • Products/services are retired
  • Reorganizations
  • New IT systems are added
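The “youthful driver” case can be caught mechanically once two systems are compared on a shared key. The following is a minimal sketch only: the customer IDs, the 25-year cutoff, and the function name are hypothetical illustrations, not anything taken from the systems in the talk.

```python
from dataclasses import dataclass

# Assumption: system B should only flag drivers under this age as "youthful".
YOUTHFUL_MAX_AGE = 25


@dataclass
class Mismatch:
    customer_id: str
    age: int
    flag: str


def find_youthful_mismatches(system_a, system_b):
    """Return customers whose age in A contradicts the driver class in B."""
    mismatches = []
    for customer_id, age in system_a.items():
        flag = system_b.get(customer_id)
        if flag is None:
            continue  # customer not present in B, nothing to compare
        expected = "youthful" if age < YOUTHFUL_MAX_AGE else "standard"
        if flag != expected:
            mismatches.append(Mismatch(customer_id, age, flag))
    return mismatches


# Hypothetical sample data: customer ID -> age (A) and driver class (B).
system_a = {"C001": 83, "C002": 19, "C003": 45}
system_b = {"C001": "youthful", "C002": "youthful", "C003": "standard"}

for m in find_youthful_mismatches(system_a, system_b):
    print(f"{m.customer_id}: age {m.age} in A, but '{m.flag}' in B")
```

The check itself is trivial. The hard part, as the rest of the talk argues, is knowing that the two columns are related in the first place.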

Data Ecosystem

Data Priests (SMEs)

• “Know” the data
• Assist with data problems

Warriors

• Have immediate, acute data problem
• Focus on feasibility and time

Tradespeople

• Have a short term (project) business problem
• Focus on value

Reformers

• Have a long term business problem
• Focus on control and scalability

Data Governance = New Data Religion

• “Governed Data”:
  • Data is documented, consistent and secure
  • Governance is a must for MDM projects
• To succeed, a new religion needs
  • Reformers or prophets – people who believe in it and sponsor it
  • Priests – people to educate, explain and promote it
  • New Miracles – successful projects and spectacular results

Successful Data Governance Program Roll Out

1. Help Warriors win battles
2. Use victories to convince Reformers to change religions
3. Convert the priests
4. Have priests drive adoption to tradespeople

Key = Winning the First Battle!

• Pick your battles
  • Find appropriate initial project
• Achieve quick win
  • Early success
• Build a Trojan Horse
  • Govern the data without calling it that

Picking The First Battle

• Appropriate Project
  • Immediate Business ROI
  • Project success directly linked to Data Governance practice and methodology
  • Cross-Silo
  • Must have for the company
• Examples
  • Basel II
  • Master Data Management
  • Application migration
  • Cross-BU Reporting and Analytics

Avoid False Starts

• Projects to Avoid
  • Future ROI – next project will benefit
  • Boil the ocean
• False Start Examples
  • Refactoring
  • Metadata repository
  • Enterprise (fill in the blank)

Quick Win

• Iterative Approach
  • Agile Development
  • Immediate results
• Automation is Critical
  • Data Discovery tools
  • Repeatability of results
  • Validation and consistency tools
  • Incident/Exception workflow
• Visual and Intuitive Presentation of Results
  • Business oriented
  • Graphically presented

Barriers to the Quick Win

Data Governance Gap

[Figure labels: The Peaks of Data Understanding; Data Nightmare; Data Governance]

“Design specifications get lost or outdated, subject matter experts leave companies, databases and business rules get changed without updating documentation, mergers and acquisitions wreak havoc on databases, all leading to a company not knowing exactly what they have... The end result is inconsistent data.”
Fern Halper, Hurwitz & Associates

“70% or more of the time and effort involved in completing most data integration projects is consumed by defining and implementing the business rules by which data will be mapped, transformed, integrated, and cleansed.”
Ted Friedman, Vice President, Gartner Group

Current Tools Weren’t Developed to Discover a Distributed Data Landscape

• ETL, EAI, Cleansing
  • Not discovery solutions. They depend on discovery
• Metadata matching
  • Doesn’t work in a real environment
• Profiling
  • Focused on a single data source
• Today’s tools weren’t created to analyze a distributed data landscape
• Data analysts manually examine data values to figure out the business rules in the distributed data landscape
• The most sophisticated tools commonly used today are:

Most Widely Used Business Rule Discovery Tools

Case Study: Asset Master

Reminder for Todd: Show Dawn’s slides

Vice President

Charlotte, NC based Commercial bank

Project: IT Asset Master

• Consolidating 8 asset management systems to a single asset master

“We had 9 subject matter experts spend 9 months and we still didn’t know enough to be able to consolidate our data into a master.”

Data Analysis: The Lack of Understanding A Case Study (Note: This is NOT what you want to do!)

Data Analyst Case Study

• The story you are about to hear is true
• Only the names have been changed to protect the innocent

This is Denise

Data Analyst: Denise

• Experienced Data Analyst
• Extremely successful career working for data software companies
• Very Personable
• Very Intelligent
• Impeccable references
• Bills at $2000/day
• Hired by a dental insurance company for a 3 week data analysis/MDM integration project
• Tools used:
  • Profiling
  • TOAD
  • SQL
  • Highlighter

Manual Data Discovery Timeline

Data Analyst: Denise

• Get metadata specs and begin to check business rules for one table with six columns against the first of three source systems
• Expected result: 3 Weeks

Manual Data Discovery Timeline

• Get initial results from unit test with inconsistent data for 1st column
• So far, so good

Data Analyst: Denise

Manual Data Discovery Timeline

• Retest and debug
• Still on track

Data Analyst: Denise

Manual Data Discovery Timeline

• Go to data architect to question
• Architect pings owner of application (SME).

• NOTE: Data analyst not allowed to consult with SME directly.

Data Analyst: Denise Data Architect

SME

Manual Data Discovery Timeline

• Meeting with architect and SME to review.

• Initial answer received .

Data Analyst: Denise Data Architect

SME

Manual Data Discovery Timeline

• Rewrite business rules and test.

• Find second column with inconsistent data.

• Retest and debug.

Data Analyst: Denise

Manual Data Discovery Timeline

• Go to data architect to question
• Architect pings owner of application (SME).

• SME asks upstream application owner

Data Analyst: Denise Data Architect

SME Application Owner

Manual Data Discovery Timeline

• Flurry of emails between the 4 players, as the upstream app owner is in a different time zone.

• Decision on how to proceed agreed upon

Data Analyst: Denise

Data Architect

SME Application Owner

Manual Data Discovery Timeline

• Rewrite business rules in SQL and test.

• Find more inconsistent data.

• Retest and debug.

Data Analyst: Denise

Manual Data Discovery Timeline

• Go to data architect to question
• Architect pings owner of application (SME).

Data Analyst: Denise Data Architect

SME

Manual Data Discovery Timeline

• Meeting with architect and SME to review.

• Decision made to review specs with a larger group

Data Analyst: Denise Data Architect

SME

Manual Data Discovery Timeline

• Meeting with larger group.

• Original specs validated and corrected

Data Analyst: Denise

Manual Data Discovery Timeline

• At weekly status meeting, project manager asks, “why have 17 days passed when this phase was to be completed in 3 weeks?”

Data Analyst: Denise

Manual Data Discovery Timeline

• Rewrite SQL and test.

Data Analyst: Denise

Manual Data Discovery Timeline

• Pass first source system SQL to ETL developers for coding and QA

Data Analyst: Denise

Manual Data Discovery Timeline

• Get specs and begin to verify relationships with second of three source systems – an outside feed

Data Analyst: Denise

Manual Data Discovery Timeline

Data Analyst: Denise

• Go to data architect to question
• Architect pings owner of application (SME).
• SME asks upstream application owner
• Feed vendor liaison is consulted

Feed Vendor Liaison Data Architect

SME Application Owner

Manual Data Discovery Timeline

• Flurry of emails between the 4 players, plus vendor liaison.

• Involving more people consumes even more time
• Decision on how to proceed agreed upon

Data Analyst: Denise Feed Vendor Liaison

Data Architect

SME Application Owner

Manual Data Discovery Timeline

• Recode SQL and test.
• Repeat experience of days 7-16, with new inconsistent data

Data Analyst: Denise

Manual Data Discovery Timeline

• The project is now 18 days overdue, with no clue as to how long it will take to complete the remaining work.
• Repeat variations of days 21-37 several times

Data Analyst: Denise

Manual Data Discovery Timeline

Data Analyst: Denise

• Pass 2nd source system business rules to ETL developer and QA.
• Project phase is now 70 days overdue, with one entire source system still to code.
• Red flags being raised
• Search for sacrificial lambs.

Manual Data Discovery Timeline

• Go on preplanned, and much overdue vacation

Data Analyst: Denise

Manual Data Discovery Timeline

• Get specs and begin to check business rules with third of three source systems.

• Repeat variation of days 20-89.

Data Analyst: Denise

Manual Data Discovery Timeline

Data Analyst: Denise

• Pass 3rd source system code to ETL developer and QA.
• Project is 152 days late
  • = 30 weeks
  • = 7 months
• Company paid for 30 weeks more consulting time than expected
  • $300K overrun
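The overrun figures follow from Denise's day rate. A small worked check, assuming the 152 days are working days, a five-day week, and roughly 4.33 weeks per month (these three assumptions are not stated on the slides):

```python
DAILY_RATE_USD = 2000        # from the slides: "Bills at $2000/day"
DAYS_LATE = 152              # from the slides
WORKING_DAYS_PER_WEEK = 5    # assumption: delay counted in working days
WEEKS_PER_MONTH = 4.33       # assumption: average weeks in a month

weeks_late = DAYS_LATE // WORKING_DAYS_PER_WEEK          # whole weeks late
months_late = weeks_late / WEEKS_PER_MONTH               # approximate months
overrun_usd = weeks_late * WORKING_DAYS_PER_WEEK * DAILY_RATE_USD

print(weeks_late, round(months_late), overrun_usd)       # 30 7 300000
```

Under those assumptions the numbers agree with the slide: 30 weeks, about 7 months, and a $300K consulting overrun for a single analyst.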

What does this mean for your Data Governance Project?

MDM deployment: $10MM in Services for Every $1MM in Software

[Chart labels: Services; Software]

MDM Hub

• Merge
• Purge
• Match

According to the Experts:

“70% or more of the time and effort involved in completing most data integration projects is consumed by defining and implementing the business rules by which data will be mapped, transformed, integrated, and cleansed.”

Ted Friedman, Vice President, Gartner Group

70% of Services Are for Data Analysis, 30% of Services Are for Deploying the “Data Hub”

[Chart labels: Data Analysis Services (Discover, Map, Validate); MDM Deployment Services; Software (MDM Hub)]
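Putting the two ratios together gives a rough budget split. This is a sketch only: it assumes the 10:1 services-to-software ratio and the 70/30 split apply to the same services budget, and the function name is hypothetical.

```python
def mdm_cost_breakdown(software_usd):
    """Split an MDM budget using the ratios quoted on the slides."""
    services = 10 * software_usd          # $10MM services per $1MM software
    analysis = services * 70 // 100       # 70% of services: data analysis
    deployment = services - analysis      # remaining 30%: deploying the hub
    return {
        "software": software_usd,
        "data_analysis_services": analysis,
        "deployment_services": deployment,
        "total": software_usd + services,
    }


print(mdm_cost_breakdown(1_000_000))
```

For $1MM of software this gives $7MM of data analysis services and $3MM of deployment services, so data analysis is the largest single line item in an $11MM project.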

According to the Experts:

“MDM won’t ever provide a positive return on investment to businesses if the cost and risk of the data analysis and mapping component is not reduced by an order of magnitude… you have to automate the process”

Malcolm Chisholm, MDM Industry Expert, AskGet, Inc.

Recap So Far:

You must overcome big hurdles to implement Data Governance

Technical:

• Presumes data understanding
• Requires automated data discovery, validation, remediation for a distributed data landscape

Financial:

• Cost of deployment must justify the project

Cultural:

• You may be changing religions

Data Governance Epic

[Figure labels: The Peaks of Data Understanding; Data Nightmare; Data Governance]

Data Governance Epic

• But you know the alternatives are unthinkable, so you and your team of data governance warriors boldly go where no man has gone before.

Data Governance Epic

• Scale the cliffs of data relationship discovery
• Pick your way through data inconsistency glaciers
• Battle Data Priests for budget and mindshare

Data Governance Epic

• And eventually, if you are very, very persistent and very, very lucky, you may even get there

Case Study: Potential for the False Start

Manufacturing Firm:

• Corporate mandate to improve data quality (the CEO demanded a new religion)
• Created their initial identity master

Initial Identity Master:

• Required 5 analysts to map 4 data sources
• Merge purge match process is governed
• But… Quality of data in the master is suspect
• Result: No downstream users

Next project:

• 16 more sources to map
• Will require 20 more data analysts

The Problem:

• Hiring 20 data analysts is not financially feasible
• Data mapping and analysis is the critical path
• Millions of dollars have been spent on software and services already

There’s got to be an easier way!

Need a Quick Win

What if you could Automate Cross System Data Understanding?

Automating cross system data discovery would change the economics of governance from this:

[Chart labels: Data Analysis Services; MDM Deployment Services; Software (MDM Hub)]

Automating cross system data discovery would change the economics of governance to this:

• Provides the foundation of good data management
• Automates understanding of the current data landscape
• Replace services with software (10x differential)
• Creates repeatability
• Makes data governance projects financially feasible
• Accelerate deployment
• Reduce project risk
• Turn negative NPV into positive NPV
• Provides the “Trojan Horse”

[Chart labels: Analysis Services; MDM Deployment Services; Software (Analysis, MDM Hub)]

Case Study: The Trojan Horse

Truck Manufacturer

• Migrating from one finance application to another
• Data must be mapped and migrated as part of the process

The Trojan Horse:

• Some data in the finance application is master data
• Using automated tools to map the data and will leverage the map to create a master
  • Did a pilot project where automation took 3 days vs 6 months for manual mapping
• Planned savings from automation are being rerouted to purchase an MDM system

Critical Factors:

• Governance processes will be required to clean up the data as part of the migration
  • They are not calling this governance… they are just doing it
• All mapping efforts will be leverageable because they are repeatable and verifiable
  • Repeatable and verifiable are good words

Future Challenges:

• They must execute

Data Discovery Automation Technology: A Primer

Automated Cross System Data Discovery: What is it?

• New data analysis methodology and tools
• Arms the warrior with a new weapon
• Allows you to quickly understand your current data landscape
• Establishes data understanding within data sources and between data sources
• Automates discovery of business rules, lineage, transformations and data inconsistencies across data sources
• Goes well beyond profiling
• Examines:
  • Data Values
  • Data Values
  • Data Values
• Establishes a methodology for cross system data analysis
• Each data project becomes a building block, not a “one-off”

Data-Driven Approach: Aligns Rows Across Datasets

Data-Driven Discovery Engine

Step 1: Discovery Engine analyzes the data values to automatically discover the key that aligns rows across disparate datasets:

• Works for hundreds of tables
• Works for millions of rows

Member = ID (Table 25)

Row      Member     SS #         Age  Phone           Sex
1        595846226  123-45-6789  15   (123) 456-7890  M
2        567472596  138-27-1604  8    (138) 271-6037  F
3        540450091  154-86-4196  22   (154) 864-1961  M
4        514714372  173-44-7900  55   (173) 447-8996  F
5        490204164  194-26-1648  4    (194) 261-6476  F
6        466861109  217-57-3046  66   (217) 573-0453  M
...
987,623  444629628  243-68-1812  25   (243) 681-8107  F
987,624  423456789  272-92-3629  87   (272) 923-6280  M

Table 25

ID         Demo1
595846226  0
567472596  1
540450091  2
514714372  3
490204164  1
466861109  0
...
444629628  3
423456789  2
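One simple way to picture value-based key discovery is to compare every column in one table with every column in the other and keep the pair whose values overlap most. The sketch below uses Jaccard overlap on the sample values from the slide. It is an illustration of the idea only, not the Discovery Engine's actual algorithm, and the function name is hypothetical.

```python
def discover_join_key(left, right):
    """Return (left_col, right_col, overlap) for the best-overlapping pair.

    `left` and `right` map column names to lists of values. Overlap is
    the Jaccard similarity of the two columns' distinct value sets.
    """
    best = None
    for lcol, lvals in left.items():
        lset = set(lvals)
        for rcol, rvals in right.items():
            rset = set(rvals)
            union = lset | rset
            overlap = len(lset & rset) / len(union) if union else 0.0
            if best is None or overlap > best[2]:
                best = (lcol, rcol, overlap)
    return best


# Sample values from the slide (column names are never consulted).
members = {
    "Member": [595846226, 567472596, 540450091, 514714372,
               490204164, 466861109, 444629628, 423456789],
    "Age": [15, 8, 22, 55, 4, 66, 25, 87],
    "Sex": ["M", "F", "M", "F", "F", "M", "F", "M"],
}
table_25 = {
    "ID": [595846226, 567472596, 540450091, 514714372,
           490204164, 466861109, 444629628, 423456789],
    "Demo1": [0, 1, 2, 3, 1, 0, 3, 2],
}

print(discover_join_key(members, table_25))  # ('Member', 'ID', 1.0)
```

Because only values are compared, the match is found even though the columns are named differently. A brute-force comparison like this would need sampling or hashing to cope with hundreds of tables and millions of rows.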

Data-Driven Approach: Discovers Business Rules & Sensitive Data

Data-Driven Discovery Engine

Step 2: With rows now aligned, analyzes the data values to automatically discover:

• Forgotten Business Rules
• Data Lineage
• Hidden Sensitive Data

CASE:
  If age<18 and Sex=M then 0
  If age<18 and Sex=F then 1
  If age>=18 and Sex=M then 2
  If age>=18 and Sex=F then 3
= Demo1

Row      Member     SS #         Age  Phone           Sex
1        595846226  123-45-6789  15   (123) 456-7890  M
2        567472596  138-27-1604  8    (138) 271-6037  F
3        540450091  154-86-4196  22   (154) 864-1961  M
4        514714372  173-44-7900  55   (173) 447-8996  F
5        490204164  194-26-1648  4    (194) 261-6476  F
6        466861109  217-57-3046  66   (217) 573-0453  M
...
987,623  444629628  243-68-1812  25   (243) 681-8107  F
987,624  423456789  272-92-3629  87   (272) 923-6280  M

Table 25

ID         Demo1
595846226  0
567472596  1
540450091  2
514714372  3
490204164  1
466861109  0
...
444629628  3
423456789  2
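Finding hidden sensitive data from values rather than column names can be pictured as pattern scanning. A minimal sketch, assuming a US Social Security number format of three, two and four digits separated by hyphens, with a hypothetical 90% match threshold and function name:

```python
import re

# Assumption: SSNs are stored as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def find_ssn_like_columns(table, threshold=0.9):
    """Return names of columns where most values look like SSNs."""
    flagged = []
    for column, values in table.items():
        if not values:
            continue  # an empty column cannot be classified
        hits = sum(1 for v in values if SSN_PATTERN.match(str(v)))
        if hits / len(values) >= threshold:
            flagged.append(column)
    return flagged


# Sample values from the slide. The scan looks only at the values, so it
# would flag the same column under any name.
table = {
    "Member": [595846226, 567472596, 540450091],
    "SS #": ["123-45-6789", "138-27-1604", "154-86-4196"],
    "Phone": ["(123) 456-7890", "(138) 271-6037", "(154) 864-1961"],
}

print(find_ssn_like_columns(table))  # ['SS #']
```

The threshold lets the scan tolerate a few malformed values in a column that otherwise holds sensitive data.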

Data-Driven Approach: Discovers Business Rules & Sensitive Data

Data-Driven Discovery Engine

Step 3: With business rules now discovered, analyzes the data values to automatically discover:

• Unknown Data Inconsistencies

Hit Rate: 98%

CASE:
  If age<18 and Sex=M then 0
  If age<18 and Sex=F then 1
  If age>=18 and Sex=M then 2
  If age>=18 and Sex=F then 3
= Demo1

Row      Member     SS #         Age  Phone           Sex
1        595846226  123-45-6789  15   (123) 456-7890  M
2        567472596  138-27-1604  8    (138) 271-6037  F
3        540450091  154-86-4196  22   (154) 864-1961  M
4        514714372  173-44-7900  55   (173) 447-8996  F
5        490204164  194-26-1648  4    (194) 261-6476  F
6        466861109  217-57-3046  66   (217) 573-0453  M
...
987,623  444629628  243-68-1812  25   (243) 681-8107  F
987,624  423456789  272-92-3629  87   (272) 923-6280  M

Table 25

ID         Demo1
595846226  0
567472596  1
540450091  2
514714372  3
490204164  1
466861109  0
...
444629628  3
423456789  2
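Once a rule is known, the hit rate and the exceptions fall out of a single pass over the aligned rows. The sketch below applies the CASE rule from the slide to the eight sample rows shown. The function and variable names are hypothetical.

```python
def demo1_rule(age, sex):
    """The CASE rule from the slide, mapping (age, sex) to Demo1."""
    if age < 18:
        return 0 if sex == "M" else 1
    return 2 if sex == "M" else 3


# (Member, Age, Sex) rows and Table 25 (ID -> Demo1), as on the slide.
members = [
    (595846226, 15, "M"), (567472596, 8, "F"), (540450091, 22, "M"),
    (514714372, 55, "F"), (490204164, 4, "F"), (466861109, 66, "M"),
    (444629628, 25, "F"), (423456789, 87, "M"),
]
table_25 = {
    595846226: 0, 567472596: 1, 540450091: 2, 514714372: 3,
    490204164: 1, 466861109: 0, 444629628: 3, 423456789: 2,
}

# Collect rows where the stored Demo1 disagrees with the rule.
exceptions = []
for member, age, sex in members:
    expected = demo1_rule(age, sex)
    actual = table_25[member]
    if expected != actual:
        exceptions.append((member, expected, actual))

hit_rate = 1 - len(exceptions) / len(members)
print(f"hit rate: {hit_rate:.1%}")   # hit rate: 87.5%
print(exceptions)                    # [(466861109, 2, 0)]
```

In the rows as transcribed, member 466861109 (age 66, male) carries Demo1 = 0 where the rule gives 2, which is the kind of unknown inconsistency this step is meant to surface. The 98% on the slide presumably refers to the full dataset rather than these eight rows.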

What Complex Business Rules are Discovered from the Data?

• Scalar
  • One to one
  • Substring
  • Concatenation
  • Constants
  • Tokens
• Conditional logic
  • Case statements
  • Equality/Inequality
  • Null conditions
  • In/Not In
  • Conjunctions
• Joins
  • Inner
  • Left Outer
• Aggregation
  • Sum
  • Average
  • Minimum
  • Maximum
• Column Arithmetic
  • Add
  • Subtract
  • Multiply
  • Divide
• Reverse Pivot
• Cross-Reference
• Custom Data Rules
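For the scalar rule types, discovery can be thought of as testing candidate transformations against aligned rows and keeping those that reproduce the target values. A toy sketch with three hypothetical candidates and made-up sample names follows. A real tool would generate and score far more candidates than this.

```python
# Hypothetical candidate rules: each maps a source row to a target value.
CANDIDATES = {
    "one-to-one (first)": lambda row: row["first"],
    "substring (first 3 chars of last)": lambda row: row["last"][:3],
    "concatenation (first + ' ' + last)":
        lambda row: f"{row['first']} {row['last']}",
}


def matching_rules(source_rows, target_values):
    """Return the names of candidate rules that explain every target value."""
    return [
        name
        for name, rule in CANDIDATES.items()
        if all(rule(row) == target
               for row, target in zip(source_rows, target_values))
    ]


# Made-up source rows, already aligned with the target column.
source_rows = [
    {"first": "Ann", "last": "Lee"},
    {"first": "Raj", "last": "Patel"},
]

print(matching_rules(source_rows, ["Ann Lee", "Raj Patel"]))
print(matching_rules(source_rows, ["Lee", "Pat"]))
```

The first call matches only the concatenation rule and the second only the substring rule. Relaxing `all(...)` to a match percentage would give a hit rate like the one shown in Step 3.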

Case Study: Worldwide Financial Institution

Financial Services Firm:

• Integration of legacy system with reference master system.
• First of 40 to be integrated

Manual results for first dataset:

• Estimated to take 6 months elapsed

Data-Driven Mapping results:

• 2.5 weeks of elapsed time
• Also centralized data analysis expertise

Benefits:

• Significant time to market savings: 5 months+
• Significant project risk reduction
• Data inconsistencies found as part of process

[Chart: Master Data Management deployment time in months (axis 0 to 6), Manual Mapping vs. Data Driven Discovery]

What does this all mean?

• Makes it much easier and cheaper to map your distributed data landscape
• This is the foundation upon which the rest is built
• The economics of governance will look very different
• Faster, repeatable victories
• Turns point projects into governance building blocks
• “Undoable” projects become “doable”
• Turns data governance projects on their heads

Recap I

Culture: use victories to build the case for better data governance and quality

Trojan Horse: Start governing your data without calling it governance

Financial: Use better data management to deliver positive ROI

Technical: Automate data discovery and management

One more point about Culture Change

[Diagram labels: Strategy; Organizational Structure; Culture; Communications; Training/Skills; Rewards]

You can’t just change culture. You have to turn other knobs that affect culture.

Recap II

1. Pick the right battles, arm the Warriors with 21st century cross system discovery and help them achieve quick victories
2. Use victories to convince Reformers to change religions
3. Convert the priests
4. Have priests drive adoption to tradespeople

Data Governance Success

Happy CXO Management Team

Questions and Answers

Thank You for Attending!

For more information, contact: Todd Goldman

Web: www.exeros.com

Email: [email protected]

Phone: +1.408.213.8910

Or stop by Exeros Booth in the exhibit hall