Transcript Slide 1

Lineage: A Necessity or an Exaggerated Benefit
April 16th, 2014
Saad Yacu
© Allstate Insurance Company Proprietary and Confidential
Agenda
• Allstate at a Glance
• Introduction
• What is Lineage
• Lineage Benefits
• Types of Lineage
• Presenting Lineage
• Building Lineage
• Integrating With Glossary
• Lineage Design Questions
• Final Thoughts
• Questions
Allstate at a glance
• The Allstate Corporation (NYSE: ALL) is the nation's largest
publicly held personal lines insurer, serving approximately 16
million households through its Allstate, Encompass, Esurance
and Answer Financial brand names and Allstate Financial
business segment.
• Allstate branded insurance products (auto, home, life and
retirement) and services are offered through Allstate agencies,
independent agencies, and Allstate exclusive financial
representatives, as well as via www.allstate.com,
www.allstate.com/financial and 1-800 Allstate®, and are widely
known through the slogan "You're In Good Hands With
Allstate®."
What is the presentation about?
• Explain what lineage means to technology and business users
• Explain lineage concepts
• Not based on any specific vendor implementation
What is Lineage
According to techopedia.com
Data lineage is generally defined as a kind of data
life cycle that includes the data's origins and where it
moves over time. This term can also describe what
happens to data as it goes through diverse
processes.
Posted by: Cory Janssen
Lineage Benefits
• Understand where the data is
• Understand how the data moves
• Understand what happens to the data as it moves
• Impact analysis (what downstream fields and reports a change affects)
• Dependency analysis (what upstream sources a field depends on)
• Pictorial view of the whole process
[Figure: example lineage across Front End, Back End, and Reports, with Fields 1 to 4, Operations 1 and 2, and Reports 1 to 3]
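Impact analysis and dependency analysis are both traversals of the same lineage graph, one following edges downstream and the other upstream. The sketch below illustrates that idea. It is a minimal in-memory sketch, not any vendor's implementation, and the class, method, and node names (`LineageGraph`, `impact`, `dependencies`, `front_end.field_1`, and so on) are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage graph: an edge source -> target means data flows from source into target."""

    def __init__(self):
        self.downstream = defaultdict(set)  # node -> nodes it feeds
        self.upstream = defaultdict(set)    # node -> nodes that feed it

    def add_flow(self, source, target):
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _reachable(self, start, adjacency):
        # Breadth-first traversal; the `seen` set keeps cycles from looping forever.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, node):
        """Impact analysis: everything downstream that a change to `node` could affect."""
        return self._reachable(node, self.downstream)

    def dependencies(self, node):
        """Dependency analysis: everything upstream that `node` depends on."""
        return self._reachable(node, self.upstream)


# Hypothetical flow loosely modeled on the Front End -> Back End -> Reports diagram.
g = LineageGraph()
g.add_flow("front_end.field_1", "operation_1")
g.add_flow("front_end.field_2", "operation_1")
g.add_flow("operation_1", "back_end.field_3")
g.add_flow("back_end.field_3", "report_1")
g.add_flow("back_end.field_3", "operation_2")
g.add_flow("operation_2", "report_2")

print(sorted(g.impact("front_end.field_1")))
# ['back_end.field_3', 'operation_1', 'operation_2', 'report_1', 'report_2']
print(sorted(g.dependencies("report_2")))
# ['back_end.field_3', 'front_end.field_1', 'front_end.field_2', 'operation_1', 'operation_2']
```

Keeping both adjacency maps costs extra memory but makes upstream and downstream queries equally cheap.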
Types of Lineage
• Technical Lineage
  • Traces the data as it moves through the physical columns
• Business Lineage
  • Provides a business-friendly view of how attributes move across the various applications
• System Lineage
  • Provides a high-level view of how data moves between systems
• Process Lineage
  • Provides a view of the various business processes acting on the data
[Figure: one example per lineage type: technical lineage (Front End, Back End, and Reports with Fields 1 to 4, Operations 1 and 2, and Reports 1 to 3), business lineage (Address, Street, City, Zip, Full Address), system lineage (Systems 1 to 4 and a Warehouse), and process lineage (Processes 1 to 4 acting on DATA 1 to 3)]
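One way to see how the lineage types relate is that a high-level view can be derived from a detailed one. The following sketch collapses column-level (technical) edges into system-level edges by mapping each column to its owning system. It assumes that ownership mapping is available; the function name `roll_up` and the column names are hypothetical.

```python
def roll_up(column_edges, system_of):
    """Collapse column-level (technical) edges into system-level edges.

    column_edges: iterable of (source_column, target_column) pairs
    system_of: dict mapping each column to the system that owns it
    """
    system_edges = set()
    for src, tgt in column_edges:
        s, t = system_of[src], system_of[tgt]
        if s != t:  # movement inside a single system is hidden at this level
            system_edges.add((s, t))
    return system_edges


# Hypothetical column names; the prefix before the first dot stands in for the owning system.
edges = [
    ("crm.addr.street", "crm.addr.full_address"),
    ("crm.addr.city", "crm.addr.full_address"),
    ("crm.addr.full_address", "dw.customer.address"),
    ("dw.customer.address", "rpt.mailing.address"),
]
system_of = {col: col.split(".")[0] for edge in edges for col in edge}
print(sorted(roll_up(edges, system_of)))  # [('crm', 'dw'), ('dw', 'rpt')]
```

The same pattern works for other groupings, for example mapping columns to business processes instead of systems.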
Presenting Lineage
• Graphical
• Textual
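A textual presentation can be as simple as listing every path through the lineage graph. The sketch below renders each path from a starting node to an end point as one "A -> B -> C" line. It is an illustration only; the function name `text_paths` and the edge list are hypothetical.

```python
def text_paths(edges, start):
    """Render every path from `start` to an end point as an 'A -> B -> C' line."""
    children = {}
    for src, tgt in edges:
        children.setdefault(src, []).append(tgt)

    lines = []

    def walk(node, path):
        # Skip nodes already on this path so a cycle cannot recurse forever.
        nexts = [n for n in children.get(node, []) if n not in path]
        if not nexts:
            lines.append(" -> ".join(path))
            return
        for n in nexts:
            walk(n, path + [n])

    walk(start, [start])
    return lines


edges = [
    ("Field 1", "Operation 1"),
    ("Operation 1", "Field 3"),
    ("Field 3", "Report 1"),
    ("Field 3", "Report 2"),
]
for line in text_paths(edges, "Field 1"):
    print(line)
# Field 1 -> Operation 1 -> Field 3 -> Report 1
# Field 1 -> Operation 1 -> Field 3 -> Report 2
```

The number of paths can grow quickly in heavily branching graphs, which is one reason a graphical view is often preferred for large flows.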
Building Lineage
• As Built
• As Designed
• Hybrid
Building Lineage – As Built
Lineage is built from the ETL graphs that move or transform the data.
Pros
• Most accurate form of lineage, as it represents what the ETL is doing
to the data
• Most third-party tools can generate this lineage, especially from their own ETL graphs
• Most metadata tools can read ETL graph metadata from other vendors to generate a single lineage map
Cons
• Not easy to trace lineage of data flowing through non-ETL applications, such as custom program code
• Not easy to follow data moving through disconnected services such as web services or message queues
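As-Built lineage is harvested from metadata that the ETL tool already holds. The sketch below turns a job definition into lineage edges. Real tools export their own vendor-specific formats, so the `job` dictionary layout, its keys, and the function name `edges_from_etl` are all hypothetical assumptions made for illustration.

```python
# Hypothetical ETL job metadata; real tools export their own (vendor-specific) formats.
job = {
    "name": "load_customer",
    "mappings": [
        {
            "target": "dw.customer.full_address",
            "sources": ["stg.customer.street", "stg.customer.city", "stg.customer.zip"],
            "transform": "concat",
        },
        {
            "target": "dw.customer.zip",
            "sources": ["stg.customer.zip"],
            "transform": "copy",
        },
    ],
}


def edges_from_etl(job):
    """Emit one (source, target, job, transform) edge per source column of each mapping."""
    return [
        (src, mapping["target"], job["name"], mapping["transform"])
        for mapping in job["mappings"]
        for src in mapping["sources"]
    ]


for edge in edges_from_etl(job):
    print(edge)
```

Because the edges come straight from the job definition, they change whenever the job changes, which is what makes this form of lineage accurate.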
Building Lineage – As Designed
Lineage is generated from mapping design documents. It is created by “stitching” together the occurrences of the same column across different mapping documents, giving a holistic picture of how the data moves between columns.
Pros
• Lineage can be provided for any system, not only ETL processes
• Lineage can be customized to the required level of detail
Cons
• Lineage might not reflect how the data movement was actually implemented
• Lineage will not automatically update as processes change
• It is a manual process that is expensive and requires discipline to maintain
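Stitching can be sketched as merging the rows of several mapping documents into one edge set, joining them wherever a column that one document writes is read by another. Because the documents are hand-written, column names are normalized before matching; that normalization rule, the function names `normalize` and `stitch`, and the document contents are hypothetical.

```python
def normalize(column):
    # Mapping documents are written by hand, so compare names case- and space-insensitively.
    return column.strip().lower()


def stitch(documents):
    """Stitch mapping documents into one edge set.

    documents: dict of document name -> list of (source_column, target_column) rows
    Returns (edges, stitch_points); a stitch point is a column that is a target in
    one document and a source in another, i.e. where two documents join up.
    """
    edges = set()
    targeted_by, sourced_by = {}, {}
    for doc_name, rows in documents.items():
        for source, target in rows:
            s, t = normalize(source), normalize(target)
            edges.add((s, t))
            sourced_by.setdefault(s, set()).add(doc_name)
            targeted_by.setdefault(t, set()).add(doc_name)
    stitch_points = {
        col
        for col in targeted_by.keys() & sourced_by.keys()
        if any(a != b for a in targeted_by[col] for b in sourced_by[col])
    }
    return edges, stitch_points


# Hypothetical mapping documents; note the inconsistent spelling of the shared column.
documents = {
    "source_to_staging": [("CRM.ADDR.STREET", "STG.CUSTOMER.STREET")],
    "staging_to_warehouse": [("stg.customer.street ", "DW.CUSTOMER.ADDRESS")],
}
edges, stitch_points = stitch(documents)
print(sorted(edges))
# [('crm.addr.street', 'stg.customer.street'), ('stg.customer.street', 'dw.customer.address')]
print(stitch_points)  # {'stg.customer.street'}
```

If a column name is misspelled beyond what normalization catches, the stitch silently fails, which is one concrete form of the maintenance cost listed above.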
Building Lineage – Hybrid
Lineage is generated mainly from the “As Built” lineage, with the missing sections of the flow completed using “As Designed” lineage.
Pros
• Most complete system lineage view, as it shows the end-to-end data movement
• Many vendors now allow “patching” the As-Built lineage with the As-Designed lineage
Cons
• Not very easy to implement
• Some lineage sections have to be manually maintained
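A hybrid merge needs a rule for when the designed lineage is allowed to fill a gap. The sketch below assumes one simple rule: As-Built edges always win, and an As-Designed edge is used only when no As-Built edge feeds the same target. That rule and the function name `hybrid` are assumptions for illustration, not a vendor feature. Each edge keeps a provenance tag so the manually maintained sections stay visible.

```python
def hybrid(as_built, as_designed):
    """Prefer As-Built edges; use As-Designed edges only for targets As-Built does not cover.

    Both inputs are sets of (source, target) pairs. Returns a dict edge -> provenance
    ("as_built" or "as_designed") so manually maintained sections stay visible.
    """
    covered_targets = {t for _, t in as_built}
    merged = {edge: "as_built" for edge in as_built}
    for source, target in as_designed:
        if target not in covered_targets:
            merged[(source, target)] = "as_designed"
    return merged


# Hypothetical names: the message-queue hop is invisible to the ETL tool, so only the design covers it.
as_built = {("stg.street", "dw.address")}
as_designed = {("mq.order.addr", "stg.street"), ("stg.street", "dw.address")}
print(sorted(hybrid(as_built, as_designed).items()))
# [(('mq.order.addr', 'stg.street'), 'as_designed'), (('stg.street', 'dw.address'), 'as_built')]
```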
Lineage Landscape
What should the lineage cover?
• Reports & Report Fields
• Database Tables & Columns
• Database Views, Materialized Views
• Database Packages, Functions, Triggers, and Stored Procedures
• Flat Files
• Big Data Stores
• Applications and Systems
• Hierarchical Structure Elements like XML, JSON, BSON, Avro
• Legacy Copybook Files & Fields
• ETL Transformations and Graphs
• Programming modules – COBOL, Java, .NET
• Messaging Services & Message Queues
• Web services
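Covering this landscape means one lineage graph has to hold very different kinds of assets. A minimal way to model that is to tag every node with its asset kind, as sketched below. The enum members, the `LineageNode` type, and the qualified-name scheme are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AssetKind(Enum):
    # One member per item in the landscape list above (hypothetical naming).
    REPORT_FIELD = "report or report field"
    TABLE_COLUMN = "table or column"
    VIEW_COLUMN = "view or materialized view"
    DB_ROUTINE = "package, function, trigger, or stored procedure"
    FLAT_FILE_FIELD = "flat file"
    BIG_DATA_FIELD = "big data store"
    SYSTEM = "application or system"
    HIERARCHICAL_ELEMENT = "XML, JSON, BSON, or Avro element"
    COPYBOOK_FIELD = "legacy copybook file or field"
    ETL_TRANSFORM = "ETL transformation or graph"
    PROGRAM_MODULE = "COBOL, Java, or .NET module"
    MESSAGE = "messaging service or message queue"
    WEB_SERVICE = "web service"


@dataclass(frozen=True)  # frozen makes nodes hashable, so they can be set members and dict keys
class LineageNode:
    kind: AssetKind
    qualified_name: str  # naming scheme is an assumption, e.g. "schema.table.column"


# A single hypothetical edge crossing from a legacy copybook into a warehouse table.
flow = (
    LineageNode(AssetKind.COPYBOOK_FIELD, "POLREC.PREM-AMT"),
    LineageNode(AssetKind.TABLE_COLUMN, "dw.policy.premium_amount"),
)
print(flow[0].kind.value, "->", flow[1].kind.value)
```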
Enterprise Business Glossary
What does the lineage not cover?
• Business name
• Business definition
• Specific notes about usage
• Classification
• Sensitivity
• Stewards/owners/custodians
• Auditing information
• Operational information
• Quality information
• Super/subtypes
• Related items/fields
• Other implementations
Integrating Lineage With Business Glossary
• Lineage Without Glossary Integration
• Lineage With Glossary Integration
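Integration means keeping a link from physical columns to glossary terms, so a technical lineage path can be shown with business context. The sketch below is a minimal illustration. It models only a few of the glossary attributes listed earlier, and the `GlossaryTerm` type, the `annotate` function, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GlossaryTerm:
    # A few of the glossary attributes; a real glossary entry carries many more.
    business_name: str
    definition: str
    sensitivity: str
    steward: str


def annotate(path, term_for_column, glossary):
    """Pair each column on a technical lineage path with its linked glossary term.

    term_for_column: column name -> glossary key (the link between lineage and glossary)
    glossary: glossary key -> GlossaryTerm
    Columns without a linked term get None, i.e. no business context.
    """
    result = []
    for column in path:
        key = term_for_column.get(column)
        result.append((column, glossary.get(key) if key is not None else None))
    return result


# Hypothetical glossary entry and column links.
glossary = {
    "mailing_address": GlossaryTerm(
        "Mailing Address", "Address used for correspondence", "Confidential", "Customer data steward"
    )
}
term_for_column = {
    "stg.cust.addr_txt": "mailing_address",
    "dw.customer.full_address": "mailing_address",
}
path = ["crm.addr.street", "stg.cust.addr_txt", "dw.customer.full_address"]
for column, term in annotate(path, term_for_column, glossary):
    label = f"{term.business_name} ({term.sensitivity})" if term else "(no glossary term)"
    print(f"{column}: {label}")
```

Without the link, the same path shows only physical names such as `stg.cust.addr_txt`, which is the "lineage without glossary integration" case.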
Lineage Design Questions
• Versioning
• Variation by Context
• Keeping Current
• Identifying Breakage
• Variation Between Design & Build
• Too Detailed or Not Detailed Enough
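For versioning, one common approach is to give each lineage edge a validity period and then ask what the lineage looked like on a given date. The sketch below follows that approach. The field layout, the exclusive end date, the function name `lineage_as_of`, and the dates themselves are assumptions for illustration.

```python
from datetime import date


def lineage_as_of(versioned_edges, when):
    """Return the (source, target) edges that were in effect on `when`.

    versioned_edges: iterable of (source, target, valid_from, valid_to), where
    valid_to is exclusive and None means the edge is still current.
    """
    return {
        (src, tgt)
        for src, tgt, start, end in versioned_edges
        if start <= when and (end is None or when < end)
    }


# Hypothetical history: Field 1 fed Report 1 directly until 2011, then went through Operation 2.
history = [
    ("front_end.field_1", "report_1", date(2005, 1, 1), date(2011, 1, 1)),
    ("front_end.field_1", "operation_2", date(2011, 1, 1), None),
    ("operation_2", "report_2", date(2011, 1, 1), None),
]
print(lineage_as_of(history, date(2010, 6, 30)))  # {('front_end.field_1', 'report_1')}
print(sorted(lineage_as_of(history, date(2012, 1, 1))))
# [('front_end.field_1', 'operation_2'), ('operation_2', 'report_2')]
```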
[Figure: illustrations of the design questions, each drawn across Front End, Back End, and Reports, with panels labeled Before 2011 and After 2011, Home and Auto, As Designed and As Built, and some links marked X]
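Variation between design and build, and breakage, can both be detected by set comparisons once the two lineage views share node names. The sketch below assumes that shared naming and a current catalog of existing nodes (for example, from a metadata scan). The function name `compare_design_and_build` and the sample data are hypothetical.

```python
def compare_design_and_build(as_designed, as_built, catalog):
    """Compare designed and implemented lineage, and flag broken links.

    as_designed, as_built: sets of (source, target) pairs
    catalog: set of nodes that currently exist
    """
    return {
        # Designed flows that were never implemented, or were dropped later.
        "missing_in_build": as_designed - as_built,
        # Implemented flows the design does not document.
        "undocumented": as_built - as_designed,
        # Edges pointing at a node that no longer exists: the lineage is broken here.
        "broken": {
            (s, t) for s, t in as_designed | as_built
            if s not in catalog or t not in catalog
        },
    }


as_designed = {("front_end.field_1", "report_1"), ("front_end.field_2", "report_2")}
as_built = {("front_end.field_1", "report_1"), ("front_end.field_2", "report_3")}
catalog = {"front_end.field_1", "front_end.field_2", "report_1", "report_3"}
for kind, found in compare_design_and_build(as_designed, as_built, catalog).items():
    print(kind, sorted(found))
```

Running such a check on a schedule is one way to address "Keeping Current", since drift shows up as soon as either side changes.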
Final Thoughts
• Lineage is an important and necessary item in the suite of data
management & data governance utilities
• To add context, especially for business users, lineage should be tightly coupled with the enterprise business glossary
• Properly built lineage is a major asset for improving data quality in the enterprise, as it gives insight into what happens to the data as it moves between systems
• Lineage helps enterprises understand where the data is, and is therefore helpful in identifying the locations that hold sensitive data that needs to be secured
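Locating sensitive data with lineage can be sketched as propagating classification tags downstream from the columns where they were first assigned. The version below iterates to a fixed point so every reachable node inherits its sources' tags. It is a deliberately conservative assumption: pure propagation ignores transformations such as hashing or aggregation that may remove sensitivity. The function name `propagate_tags` and the sample columns are hypothetical.

```python
def propagate_tags(edges, seed_tags):
    """Push classification tags (e.g. "PII") downstream until nothing changes.

    edges: iterable of (source, target) pairs
    seed_tags: dict node -> set of tags assigned where the data originates
    Returns dict node -> set of tags, including inherited ones.
    """
    tags = {node: set(t) for node, t in seed_tags.items()}
    edges = list(edges)
    changed = True
    while changed:  # terminates because tag sets only grow and are finite
        changed = False
        for src, tgt in edges:
            incoming = tags.get(src, set())
            current = tags.setdefault(tgt, set())
            if not incoming <= current:
                current |= incoming
                changed = True
    return tags


edges = [
    ("policy.ssn", "stg.ssn"),
    ("stg.ssn", "dw.customer_key"),
    ("dw.customer_key", "rpt.summary"),
    ("policy.zip", "dw.region"),
]
tags = propagate_tags(edges, {"policy.ssn": {"PII"}})
print(sorted(node for node, t in tags.items() if "PII" in t))
# ['dw.customer_key', 'policy.ssn', 'rpt.summary', 'stg.ssn']
```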
Questions
Saad Yacu