November 10th, 2011
DQS BOOTCAMP
Microsoft
SQL Server 2012
DAVID FAIBISH, SENIOR PROGRAM MANAGER
SQL SERVER DATA QUALITY SERVICES
Our Day Together …
DATA QUALITY 101
Top 3 impediments
Source: Information Week Reports, 2011
Top Barrier for BI
Source: Information Week Reports, 2011
DQ is MDM top driver
Source: Information Week Reports, 2011
Demand is on the rise.
Overall market size for DQ software in 2010 was $800M, a 12.6% increase over 2009.
Forecasted 16% yearly growth over the next five years.
- Gartner, 2011
It’s not only the breadth of functional capabilities.
Focus on the business user.
Leverage your business resources.
Business process: for data quality (and MDM) initiatives to succeed, they need to support integration with the existing business processes.
- Gartner, 2011
[Pie chart of vendor shares. Vendors: SAS Institute, IBM, Informatica, SAP, QAS, Other Vendors. Segment values (listed in no particular order): 20.1%, 30.4%, 15.9%, 5.3%, 13.0%, 15.2%]
Data Integration market ($2.6B in 2009)
Source: Gartner
Data Quality Issue | Description | Sample Data Problem
Standard | Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
Complete | Is all necessary data present? | 20% of customers’ last name is blank, 50% of zip-codes are 99999
Accurate | Does the data accurately represent reality or a verifiable source? | A Supplier is listed as ‘Active’ but went out of business six years ago
Valid | Do data values fall within acceptable ranges? | Salary values should be between 60,000-120,000
Unique | Data appears several times | Both John Ryan and Jack Ryan appear in the system. Are they the same person?
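The checks in the table above can be sketched in a few lines of Python. This is an illustration only, not how DQS implements them: the `Customer` record, the gender code mapping between the two systems, and the rule names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping that brings both systems' gender codes to one standard.
# Which number corresponds to which letter is hypothetical.
GENDER_STANDARD = {"M": "M", "F": "F", "U": "U", "0": "M", "1": "F", "2": "U"}
PLACEHOLDER_ZIPS = {"99999"}          # values that only pretend to be data
SALARY_RANGE = (60_000, 120_000)      # acceptable range from the table


@dataclass
class Customer:
    last_name: Optional[str]
    zip_code: Optional[str]
    gender_code: str
    salary: float


def check(record: Customer) -> list[str]:
    """Return the data quality issues found in one record."""
    issues = []
    # Standard: is the code one of the agreed definitions?
    if record.gender_code not in GENDER_STANDARD:
        issues.append("standard")
    # Complete: blank names and placeholder zip codes count as missing.
    missing_name = not (record.last_name or "").strip()
    missing_zip = not record.zip_code or record.zip_code in PLACEHOLDER_ZIPS
    if missing_name or missing_zip:
        issues.append("complete")
    # Valid: does the value fall inside the acceptable range?
    low, high = SALARY_RANGE
    if not low <= record.salary <= high:
        issues.append("valid")
    return issues


records = [
    Customer("Ryan", "10023", "M", 75_000),
    Customer("", "99999", "1", 130_000),
    Customer("Doe", "10022", "X", 60_000),
]
report = [check(r) for r in records]
```

Accuracy and uniqueness are left out on purpose: the first needs an external source of truth, the second needs record matching rather than a per-record rule.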
DQ Issues and DQ Dimensions
Before
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | (blank) | New York | New York | 08/12/64
Jane Doe | Male | Jonathan ln | 36 | 10023 | Poughkeepsy | NY | 21-dec-1954

After
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | 36 | 10023 | Poughkeepsie | NY | 12/21/54

Dimensions illustrated: Completeness, Accuracy, Consistency, Conformity

Before
Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York
Maggie Smith | 545 S Valley View Dr | 34253 | Anytown | New York
John Smith | 545 Valley Drive St. | (blank) | NY | NY

After
Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | 34253 | Anytown | New York | 1
John Smith | 545 Valley Drive St. | (blank) | NY | NY | 2

Dimension illustrated: Uniqueness
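Two of the Before/After changes above, the state "New York" becoming "NY" and the date "21-dec-1954" becoming "12/21/54", are simple normalizations. A minimal sketch of them, assuming a tiny hand-written state lookup and a fixed list of input date layouts (neither comes from DQS), might look like this:

```python
from datetime import datetime

# Assumed lookup table; a real one would cover every state.
STATE_CODES = {"new york": "NY"}

# Input layouts tried in order. Month names are matched case-insensitively
# by strptime, so "dec" and "Dec" both parse (in an English locale).
DATE_FORMATS = ("%m/%d/%y", "%d-%b-%Y")
OUTPUT_FORMAT = "%m/%d/%y"


def normalize_state(value: str) -> str:
    """Map a full state name to its code; pass codes through unchanged."""
    cleaned = value.strip()
    return STATE_CODES.get(cleaned.lower(), cleaned.upper())


def normalize_dob(value: str) -> str:
    """Rewrite a date of birth into one agreed layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(OUTPUT_FORMAT)
        except ValueError:
            continue
    return value  # leave unrecognized values untouched for manual review


states = [normalize_state(s) for s in ("New York", "NY")]
dates = [normalize_dob(d) for d in ("08/12/64", "21-dec-1954")]
```

Returning unrecognized dates unchanged rather than guessing keeps the sketch from silently corrupting values it does not understand.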
Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, enrichment and standardization.
Profiling: Analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.
Matching: Identifying, linking or merging related entries within or across sets of data.
Monitoring: Tracking and monitoring the state of Quality activities and Quality of Data.
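To make the idea of matching concrete, here is a small sketch that groups the four sample customer records from the uniqueness example into clusters. It is not the DQS matching algorithm: the token-overlap (Jaccard) similarity, the abbreviation list, and the 0.5 threshold are assumptions picked for illustration.

```python
import re

# Assumed abbreviation list used to compare addresses on equal terms.
ABBREVIATIONS = {"dr": "drive", "ave": "avenue", "st": "street"}


def tokens(*fields: str) -> set[str]:
    """Lower-case the fields, drop punctuation, expand abbreviations."""
    words = re.findall(r"[a-z0-9]+", " ".join(fields).lower())
    return {ABBREVIATIONS.get(w, w) for w in words}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared tokens divided by all tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(records, threshold=0.5):
    """Assign each record the number of the first cluster it resembles."""
    clusters = []  # each cluster is a list of token sets
    labels = []
    for record in records:
        toks = tokens(*record)
        for number, members in enumerate(clusters, start=1):
            if any(similarity(toks, m) >= threshold for m in members):
                members.append(toks)
                labels.append(number)
                break
        else:
            clusters.append([toks])  # nothing similar enough: new cluster
            labels.append(len(clusters))
    return labels


records = [
    ("John Smith", "545 S Valley View Drive # 136", "Anytown"),
    ("Margaret & John smith", "545 Valley View ave unit 136", "Anytown"),
    ("Maggie Smith", "545 S Valley View Dr", "Anytown"),
    ("John Smith", "545 Valley Drive St.", "NY"),
]
labels = cluster(records)
```

With these inputs the first three records land in cluster 1 and the last one in cluster 2, even though it shares a name with the first record. The threshold is the knob that trades missed duplicates against false merges.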
INTRODUCE DQS
Contained Database Authentication
AlwaysOn
ColumnStore Index
Unstructured Data Performance
Multiple Secondaries
Flexible Failover Policy
Data Quality Services
Reporting Alerts
Power View
Distributed Replay
Availability Groups
SharePoint Active Directory Support
T-SQL
Data Quality Services (DQS) is a Knowledge-Driven data quality solution enabling data stewards to easily improve the quality of their data.
High quality data is critical to effective business intelligence and to business activities.
DQS is an on-premises Data Quality product in SQL Server 2012, extendible with knowledge from multiple parties through Azure DataMarket.
Richer DQ knowledge and capabilities in the cloud will make it even easier to provide high quality data.
Knowledge-Driven: Based on a Data Quality Knowledge Base (DQKB)
Semantics: Data Domains capture the semantics of your data
Knowledge Discovery: Acquires additional knowledge the more you use it
Open and Extendible: Add user-generated knowledge & 3rd party reference data providers
Easy to use: User experience designed for increased productivity
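One way to picture a knowledge base domain that captures semantics and gains knowledge with use is the sketch below. It is a hypothetical stand-in, not the DQS API: the `Domain` class, its `cleanse` and `learn` methods, and the status names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Domain:
    """Hypothetical knowledge base domain: valid values plus synonyms."""
    name: str
    valid_values: set[str] = field(default_factory=set)
    synonyms: dict[str, str] = field(default_factory=dict)

    def cleanse(self, value: str) -> tuple[str, str]:
        """Return (output value, status) for one input value."""
        candidate = value.strip()
        if candidate in self.valid_values:
            return candidate, "correct"
        if candidate in self.synonyms:
            return self.synonyms[candidate], "corrected"
        return candidate, "new"  # unknown: left for a data steward to review

    def learn(self, value: str, leading_value: Optional[str] = None) -> None:
        """Add a new valid value, or a synonym of an existing leading value."""
        if leading_value is None:
            self.valid_values.add(value)
        else:
            self.synonyms[value] = leading_value


state = Domain("State", valid_values={"NY"}, synonyms={"New York": "NY"})
first_pass = state.cleanse("N.Y.")        # not yet known to the domain
state.learn("N.Y.", leading_value="NY")   # steward approves it as a synonym
second_pass = state.cleanse("N.Y.")       # now corrected automatically
```

The point of the sketch is the loop: values the domain cannot explain are surfaced, a person decides what they mean, and the next run benefits from that decision.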
Knowledge Management
Build
Integrated Profiling
Connect
Knowledge Base
Use
DQ Projects
DQ Clients
DQS UI
Knowledge Discovery and Management
Interactive DQ Projects
Data Exploration
SSIS DQ Component
MDS Excel Add-in
Future Clients: Excel, Dynamics
Azure Market Place
Categorized Reference Data Services
DQ Server
MS DQ Domains Store
Categorized Reference Data
RD Services API (Browse, Set, Validate…)
Reference Data API (Browse, Get, Update…)
3rd Party / Internal
DQ Engine
Knowledge Discovery
DQ Projects Store
DQ Active Projects
Data Profiling & Exploration
Cleansing
Matching
Common Knowledge Store
MS Data Domains
Reference Data
Knowledge Base Store
Local Data Domains
Published KBs
Reference Data Services
Reference Data Sets

Define
Coordinate
Measure
Continuously Improve
Control and Monitor

With DQS, the information worker (IW) / data expert can get actively involved in data quality initiatives.
Knowledge-Driven
Open and Extendible
Easy to use
• Rich semantic Knowledge Base
• Continuous improvement as knowledge is discovered
• Build once, reuse for multiple DQ improvements
• Focus on cloud-based Reference Data
• User-generated knowledge
• Integration with SSIS and MDS
• Focus on productivity and user experience
• Designed for business users
• Out-of-the-box knowledge (DQ content)
http://northamerica.msteched.com
www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn
DQS Blog: Tips, tricks and guidance on best practices for using DQS, courtesy of the DQS team (blogs.msdn.com/b/dqs)
DQS Movies: A set of getting started movies for an easy introduction to DQS
DQS Forum: Come participate in DQS related discussions in our DQS forum on MSDN