Data Warehouse ROI for Showcase Corp.

Download Report

Transcript Data Warehouse ROI for Showcase Corp.

Comparison of Enterprise
Data Platforms
1
WILLIAM MCKNIGHT
PRESIDENT
MCKNIGHT CONSULTING
GROUP, LLC
William McKnight
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Advisor, Architect, Strategist, Project Manager
20 years of information management experience
Ran a Best Practices Business Intelligence Program as VP/IT
Many clients have gone public with their success case study where
William was advisor or architect of the solution
International keynote speaker, 100+ talks
Monthly columnist in Information Management Magazine, 9 years,
80+ articles
www.b-eye-network.com – blog, video blogs, channel, radio shows
12+ White papers
Searchdatamanagement.com ‘ask the expert’
Worked on over 50 client Information Management programs
Author: “90 Days to Success in Consulting”
Best Practices Judge, Expert Witness
DB2 Version 1 Developer
MBA
© McKnight Consulting Group, 2010
Outline
3
 DBMS Market
 The Data Warehouse Appliance
 Columnar Data Storage
 Open Source BI
 On-Demand BI Platforms
 Virtualization
 Next Steps
© McKnight Consulting Group, 2010
What I See In Large Enterprise Customers
 Common Objectives
Get leaner; Be more agile
 Become more real-time
 Significantly reduce costs
 Similar Situations
 Extreme data complexity and volumes
 BI is becoming a commodity
 New economics
 commoditization, consolidation, cloud…
 New opportunities
 analytic appliances, virtualization, …
 Tipping Points
 Enterprise data warehouse necessary, not sufficient
 Pragmatic buyers asking practical questions
 Enterprise software model becoming increasingly untenable
 Maintenance 45% of revenues

DBMS Market
5
© McKnight Consulting Group, 2010
Platform



Many components linked together
Determines
 Processing power
 I/O bandwidth
 High number of users
 High complexity of data access
 How much data can be stored
Must be adaptable to growing requirements over time
The data warehouse platform must be high performance, so much each component.
© McKnight Consulting Group, 2010
Data Volume Explosion








Historical
All customer touch-point data
Granular data
Clickstream data
Operationally-needed data
Data democratization
RFID, call center data
Increased usage as success has beget success
© McKnight Consulting Group, 2010
Uniprocessor
RAM
I/O
Bottleneck
© McKnight Consulting Group, 2010
Parallel Processing


Single-CPU Limitations
 Signal speed limited by speed of light
 Circuit size limited
 Higher performing processors cost exponentially
more than their performance take-up
Three types
 Functional specialization
 Data
 Workflow
© McKnight Consulting Group, 2010
SMP


Added Processors
Same programming paradigm as Uniprocessor
RAM
I/O
Bottleneck
© McKnight Consulting Group, 2010
Clusters



Multiple buses
CPU, memory and bus
make up a node
A set of nodes is a cluster
 Shared-nothing vs.
shared-disk
RAM
I/O
Interconnect
RAM
I/O
Bottleneck
© McKnight Consulting Group, 2010
RAM
I/O
MPP






Large cluster with more I/O
bandwidth
Up to thousands of processors
SMP Nodes
 Shared-Nothing versus
Shared-Disk
“Mesh” Interconnect
Faster Interconnect
Variants
 Remote Memory Cache
(NUMA)
© McKnight Consulting Group, 2010
RAM
I/O
RAM
I/O
RAM
I/O
RAM
I/O
RAM
I/O
RAM
I/O
RAM
I/O
RAM
I/O
RAM
I/O
Row-based MPP examples
 Teradata
 DB2 Open Systems
 Netezza
 Oracle Exadata
 DATAllegro/Microsoft Madison
 Greenplum
 Aster Data
 Kognitio
 HP Neoview
© McKnight Consulting Group, 2010
Typical design choices in row-based MPP
 “Random” (hashed or round-robin) data
distribution among nodes
 Large block sizes

Suitable for scans rather than random accesses
 Balanced hardware
 High-end interconnect
© McKnight Consulting Group, 2010
Multidimensional Data Storage
 Separate physical file structure
 Data navigation path predefined
 Access is mainly through desktop tools
© McKnight Consulting Group, 2010
Database Specialization is upon us
16
 General purpose DBMS
 Multidimensional OLAP
 Data Warehouse Appliance
 Memory Resident DBMS
 Columnar DBMS
The field is changing
© McKnight Consulting Group, 2010
Analytic Specialist DBMS
 Aster Data
 Netezza
 Dataupia
 Oracle Exadata
 Exasol
 ParAccel
 Greenplum
 Sybase IQ
 HP Neoview
 Teradata
 IBM DB2 BCUs
 Vertica
 Infobright/MySQL
 Kickfire/MySQL
 Kognitio
© McKnight Consulting Group, 2010
The Data Warehouse Appliance
© McKnight Consulting Group, 2010
Definition of
Data Warehouse Appliance





Hardware
OS
DBMS
Storage
Proprietary Software
It is all of these on a preconfigured platform.
© McKnight Consulting Group, 2010
Components:
Traditional vs. Appliance
Traditional
RAM
Appliance
I/O
RAM
Cost savings from commodity components and reduced personnel cost.
© McKnight Consulting Group, 2010
I/O
Node Configuration:
Traditional vs. Appliance
Traditional
Appliance
I/O
I/O
Physical proximity of processing power and disk.
Disk is on direct attach to processing module, no disk arrays.
© McKnight Consulting Group, 2010
Appliance Workload Distribution
Example: Select … Where State = “CA”
Selection and Projection are done at the FPGA level (Netezza)
© McKnight Consulting Group, 2010
Appliance Challenges
Low cost, low power components
 All table scans (Netezza), no indexes
 FPGA limitations on query context management may
limit concurrent processing
 No disk options
 Mean time to repair nodes
 ODBC tool integration
 Pre-fetch memory limitations

© McKnight Consulting Group, 2010
Columnar Data Storage
24
© McKnight Consulting Group, 2010
Moore’s Law at Work
25
 In last 30 years
 Transistors per chip > 100,000
 Disk density > 100,000,000
 Disk Speed > 12.5
© McKnight Consulting Group, 2010
DBMS Design over the years
26
 RDBMS design is virtually unchanged, except for
parallelism
 Hardware, however:



Disk capacity has increased tremendously (and got far
cheaper)
CPU performance has improved too, but…
Transfer rates and seek times have increased modestly
© McKnight Consulting Group, 2010
Making OLTP Analytic
 Specialized indexes
 Star indexes
 Materialized views
 Other indexes
 OLAP cubing
 Summary tables
 Partitioning
 Packaged analytics
© McKnight Consulting Group, 2010
Transaction Processing
28
 SQL Server on modern spec Intel box can do 20,000
TPS



That’s 100,000 disk I/Os per second it COULD do
If not that I/Os per second can only do far less
You’d need hundreds of drives per CPU today
 Disk drives are slower
 Makes random disk I/O relatively slow
© McKnight Consulting Group, 2010
Cache/Memory
29
 Multiple levels of cache (L1, L2)
 CPU in wait mode… a lot
 L2 Cache misses
© McKnight Consulting Group, 2010
Columnar DBMS examples
 Sybase IQ
 SAND
 Vertica
 ParAccel
 InfoBright
 Kickfire
 Exasol
 MonetDB
 Microsoft SQL Server 2008 R2 Gemini/Vertipaq
 Oracle – hybrid columnar (future)
Mix of storage, compression and materialization
Row-Wise DBMS Stores Data in Rows
31
CustomerIDCompanyName
1119 m4ii
1120 Aris
1121 Stolt Offshore MS Ltd
1122 Medtronic, Inc.
1123 Beckman Coulter
1124 Banco de Bogotá
1126 The Boeing Company
1127 IT/1 Consulting
1128 Banco de Bogotá
1133 The HArtford
1134 CGI Group
1135 Metavante Corporation
1138 CP Associates
1142 PRSB
1143 aft
1144 Zamba Solutions
1146 MR Consultancy
1147 Intellor Group
1148 Banco de Bogotá
© McKnight Consulting Group, 2010
ContactFirstName
dhamotharan
Doug
Craig
Mark
Tim
José Alfredo
Mike
Leif B.
JOSE ALFREDO
Jimmy
Terry
Ron
Wilson
Ming Long
greg
Jeff
Mukesh
Robin
José Alfredo
ContactLastName
achaiyan
Johnson
Lennox
Kohls
Parsons
López Arias
Roberts
Soerensen
LOPEZ ARIAS
Chen
Petherick
Kundinger
Mak
Wu
tanner
McCall
Rughani
Martin
López Arias
ContactTitle
solutions architect
Practice Director
Mr
Principle Database Administrator
Business Systems Manager
Administrador DWH
Senior Business Process Architect
Data Warehouse Consultant
Administrador DWH
Business System Analyst
Senior Consultant
Assistant Vice President
Consultant
Assistant Administrator
cto
Executive Vice President
Mr
Project Coordinator
Administrador DWH
PhoneNumber
91222507176
206-676-5636
+66 1226 712519
763.516.2557
+61 22 996 0963
5713320032
(206)655-7155
+65 26236691
5713320032
215-653-2662
613-236-2155
616-577-9227
252-92593731
226-2-23931261 ext 719
303.233.6122
602-626-6125
+66 (0)1379 662219
301-202-6766
5713320032
Data Page Layout
32
Page Header
1120Aris
Director
Doug
206-676-5636
[email protected]
Johnson
Practice
Records
1121Stolt Offshore MS Ltd Craig
Lennox
+66 1226 71269
[email protected]
Mr
1122Medtronic, Inc.
Mark
Kohls
Database Administrator
763.516.2557
[email protected]
Principle
© McKnight Consulting Group, 2010
Row IDs
Page
Footer
Columnar DBMS Stores Data in Columns
33
CustomerID
1119
1120
1121
1122
1123
1124
1126
1127
1128
1133
1134
CompanyName m4ii
Aris
Stolt Offshore MS Ltd
Medtronic, Inc.
Beckman Coulter Banco de Bogotá The Boeing Company
IT/1 Consulting
Banco de Bogotá The HArtford
CGI Group
ContactFirstName dhamotharan
Doug
Craig
Mark
Tim
José Alfredo
Mike
Leif B.
JOSE ALFREDO Jimmy
Terry
ContactLastName achaiyan
Johnson
Lennox
Kohls
Parsons
López Arias
Roberts
Soerensen
LOPEZ ARIAS
Petherick
ContactTitle
solutions architect Practice Director Mr
PhoneNumber
91222507176
206-676-5636
Principle DatabaseBusiness
Administrator
Systems Administrador
Manager DWHSenior Business Process
Data Warehouse
Architect Consultant
Administrador DWHBusiness System Analyst
Senior Consultant
+66 1226 712519 763.516.2557
© McKnight Consulting Group, 2010
Chen
+61 22 996 0963 5713320032
(206)655-7155
+65 26236691
5713320032
215-653-2662
613-236-2155
Data Page Layout
34
Page Header
Records
1120
1121
1122
1123
1124
1125
…
Page
Footer
© McKnight Consulting Group, 2010
Bitmapped Indexes
Bitmapped Representation for STATE
Row ID
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
AK
AL
AR
CA
CO
DC
FL
GA
HI
ID
IN
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
Columnar only stores
the 1’s
Makes the index
very small...
Handling Not Equal To Query
Bitmapped Representation for STATE
Row ID
AK
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
0
0
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
AL
AR
CA
CO
DC
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
1
0
1
0
1
0
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
Columnar engine uses the
same ‘CA’ Bit Map
FL
GA
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
HI
ID
IN
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
WHERE state != ‘CA’
The Proof Is In The Speed
“How many MALES are NOT INSURED in CALIFORNIA?
RDBMS
800 Bytes x 10M = 500,000 I/Os
Gender
State
Insured
M
NY
Y
16K Page
M
CA
Y
10M
ROWS
F
M
M
-
CT
MA
CA
-
N
Y
N
800 Bytes/Row
Gender Insured State
1
2
3
4
M
M
F
M
Y
N
Y
N
CA
CA
NY
CA

Process large amounts of
unused data

Often requires full
table scan
10M Bits x 3 col / 8
16K Page
10M
Bits
1
1
0
1
+
0
1
0
1
+
= 235 I/Os
1
1
0
1
=
2
Run Length Encoding (RLE)
38
Qtr
Store#
Sales
Qtr
Q1
Q1
Q1
Q1
Q1
Q1
Q2
Q2
Q2
Q2
Q2
32
35
36
39
42
43
32
35
36
39
42
6
9
11
8
5
14
31
4
16
7
9
Q1
Q2
Q3
Store#
32
35
36
1
501
1000
500
999
1498
1
2
3
1
2
3
(Value, StartPosition, Count)
© McKnight Consulting Group, 2010
Dictionary Encoding
39
Q1
0
Q2
1
Q3
2
Q4
3
Dictionary Map
© McKnight Consulting Group, 2010
Qtr
0
0
0
0
0
0
1
1
1
1
1
Dictionary Encoding Example
Original data value
Orig.
Size*
Dictionary
Entry
Compressed
Value
New size
(bytes)
England
30
val[0]=England
0
1
England
30
In dictionary
0
1
United States of America
30
val[1]=United States of America
1
1
United States of America
30
In dictionary
1
1
Japan
30
val[2]=Japan
2
1
Argentina
30
val[3]=Argentina
3
1
Sri Lanka
30
val[4]=Sri Lanka
4
1
Japan
30
In dictionary
2
1
United States of America
30
In dictionary
1
1
Totals
270
9
* Fixed length, 30 bytes per value
© 2009 ParAccel. All Rights Reserved.
40
Open Source BI
41
© McKnight Consulting Group, 2010
What is Open Source BI?
42
 Access to source code, ability to share/modify code
 Free and open source vs. Commercial open source
 All categories of BI covered:
 Reporting
 OLAP
 Data mining
 Data visualization
 GIS
 ETL
 Data Quality
© McKnight Consulting Group, 2010
Who’s Adopting Open Source BI
43
 Packaged applications/ISV market
 And Data Warehouse Appliances
 Proof of Concepts
 Edge uses (non EDW)
 New needs
 When “good enough” is
 Price conscious
 Academic
 < 50 GB applications
© McKnight Consulting Group, 2010
Challenges to Open Source BI
44
 Software maturity
 Commercial version needed for needed functionality
 Service and support
© McKnight Consulting Group, 2010
On-Demand BI Platforms
45
© McKnight Consulting Group, 2010
BI Software as a Service/Cloud
46
 Delivery model with hosted software, paid for on a







subscription basis
Multi-tenant vs. single-tenant
Data security concerns
Stovepipe architecture concerns
Regulations may prevent storage arrangement
Customization is a must
Fault tolerance, high availability and on-demand
capacity
Vendor viability
© McKnight Consulting Group, 2010
SaaS/Cloud BIBenefits
47
 Lowered initial cost
 MTM by per-user/enterprise/concurrent users/volumes of
data
 Fees: enhancements, customization, cancellation
 Speed to some development
 Business focused
 Installation, maintenance taken care of
© McKnight Consulting Group, 2010
SaaS/Cloud BI Challenges
48
 Some vendors small
 Long-term cost
 Understanding limitations
 Integration with in-house and other hosted systems
© McKnight Consulting Group, 2010
SaaS/Cloud BI Fit
49
 Limited knowledge of BI
 Web-based environment
 Customization capabilities
 Speed is essential
 SMB EDW, Large co. mart
© McKnight Consulting Group, 2010
Virtualization
50
© McKnight Consulting Group, 2010
When Should I Use Data Virtualization?
 Industry Business Driver or Key Initiative…
 Cost reduction, customer care, competitive response, compliance,
consolidation (M&A)
 Business Function….
 Marketing, Sales, Research, etc.
 Consuming Applications…
 BI, Portal, Dashboards, Composite Applications, etc.
 Technology Trend…
 SOA, Cloud, Mashups, etc.
 Adjacent Technology…
 BI, MDM, Data Warehouse, ETL, Data Quality, etc.
 Specific IT Project…
 Project X, Project Y, Project Z in the upcoming portfolio
Five Data Virtualization Patterns
Solve Most Needs
DV Pattern
Image
For More Information
Data Federation
Optimize queries across your disparate data sources
Data
Warehouse
Extension
Extend data breadth to enrich your reporting and analytics
Enterprise Data
Sharing
Overcome data complexity and simplify your data architecture
Real-time
Enterprise Data
Infrastructure
When real-time is the right time
Cloud Data
Services
Prepare your data for the cloud
Next Steps
53
© McKnight Consulting Group, 2010
Future Trends in Data Management
54
 Confusion, opportunities
 Master data management and other movement to







operational BI
Appliance acceptance, even for EDW
Selective columnar
Virtualization
Selective on-demand BI
Data clouds
Terminology battles!
Open source, MapReduce – wild cards
© McKnight Consulting Group, 2010
Next Steps
 Stay informed
 Develop information use cases (function,
performance)



Current
Anticipated Future
Probable Future/Vision
 Short-List gap based on
 Database storage format (row, column, MOLAP)
 Storage model (appliance, cloud, SaaS)
 Pricing model (open source) and budget
 Run Proofs-of-concept
© McKnight Consulting Group, 2010
Summary
56
 Adopt columnar, Data Warehouse Appliances (or




both) for edge applications
Consider Data Warehouse Appliances for EDW
POC Open Source
Selectively add on-demand BI (or be prepared for it)
Add Operational BI capabilities
© McKnight Consulting Group, 2010
Remember!
 This is never a total plug-and-play
 Data modeling is important
 You have to staff DW expertise
 The business does have to be involved
 Data quality is important
 What is today will not always be
 TCO includes hardware, software, people, reality
 All BI does not happen in one place
 Architecture is important
 No excuse for not understanding your business well
 Get a readiness assessment before making big moves
© McKnight Consulting Group, 2010
Comparison of Enterprise
Data Platforms
58
Presented by:
William McKnight
President
McKnight Consulting Group LLC
(214) 514-1444
[email protected]
www.williammcknight.com