Defensive ETL - Tim Mitchell

Download Report

Transcript Defensive ETL - Tim Mitchell

The World’s Largest Community of SQL Server Professionals

Defensive ETL

Tim Mitchell Artis Consulting

Session Objectives • • • What is Defensive ETL?

Review threats and challenges Discuss countermeasures in SSIS

The World’s Largest Community of SQL Server Professionals

Tim Mitchell • • • • • Business Intelligence Consultant – Artis Consulting, Dallas TX Contributing author, MVP Deep Dives 2 Coauthor, SSIS Design Patterns SQL Server MVP TimMitchell.net | Twitter.com/Tim_Mitchell

The World’s Largest Community of SQL Server Professionals

• • Questions Surveys Housekeeping

The World’s Largest Community of SQL Server Professionals

Common Data Challenges • • Accidental – Data corruption – Data loss – Transactional inconsistency Malicious – Data theft – SQL injection

The World’s Largest Community of SQL Server Professionals

Common Data Challenges • System – Hardware/software failures unrelated to the operation of the ETL – Unexpected downtime – Resource contention

The World’s Largest Community of SQL Server Professionals

ETL Challenges • Data sources/destination challenges – Inconsistent types – Inconsistent lengths – Unreliable connections – Just plain bad data

The World’s Largest Community of SQL Server Professionals

ETL Challenges • Transformation challenges – Business logic errors – Generated duplicates – “Lost” data – Time allotment overrun

The World’s Largest Community of SQL Server Professionals

What Is Defensive ETL?

• • Two universal truths: I’m doing it right.

Everybody else is crazy.

The World’s Largest Community of SQL Server Professionals

What Is Defensive ETL?

• • • Assumptions of a Defensive ETL professional: Any input/output over which I do not have complete control cannot be trusted.

Any process that relies on either hardware or software cannot be trusted.

Murphy’s Law is inescapable

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • • • • • • Multifaceted approach: Package Sources/destinations Transformations Error/event handling Restartability Validation

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS

Package-Level Defenses

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • Transactions Package, container, and/or task

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • • Checkpoints Task or container level restart Avoid use with transactions

The World’s Largest Community of SQL Server Professionals

Logging Defensive ETL in SSIS

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • • • ETL Framework Structured approach Shared logging and error handling Restartability

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS Event Handling

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS

Control Flow Defenses

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • Precedence Constraints Alternate paths

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • • Script Task Wait/pause Retry

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS

Data Flow Defenses

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • Connections Metadata is configurable for some connection types

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • • Data Sources Error or Truncation Column by column

The World’s Largest Community of SQL Server Professionals

Defensive ETL in SSIS • • Inline Cleansing Conditional Split Derived Column transform

The World’s Largest Community of SQL Server Professionals

Defensive strategies • Incoming data – Inline cleansing – Post-ETL cleansing – Third-party cleansing • DQS in SQL Server Denali

The World’s Largest Community of SQL Server Professionals

Defensive strategies • Incoming data – Data type/length compensation – Loops/retry

The World’s Largest Community of SQL Server Professionals

Defensive strategies • Transformations – Error/lookup outputs – Branching – Rowcount validations

The World’s Largest Community of SQL Server Professionals

The World’s Largest Community of SQL Server Professionals

Thanks for Attending

Visit www.sqlservercentral.com for free SQL Server eBooks, articles, videos, blogs, news, and more.

Please Don’t Forget to Turn in Your Evaluations