Teradata - ISG - University of California, Irvine

Teradata Past, Present and Future
Todd Walter
CTO – Teradata Labs
2 > 09/2009
Copyright Teradata © 2007-2009 – All rights
Teradata Company Highlights
• Founded 1979 – West LA
• First product to market – 1984
• First Terabyte system – 1987
• Acquired by AT&T and merged with acquired NCR – 1992
• Tri-vested as part of NCR - 1997
• Teradata Corporation – (re)Launched October 1, 2007
> Global Leader in Enterprise Data Warehousing
– EDW/ADW Database Technology
– Analytic Solutions
– Consulting Services
> Positioned in Gartner’s Leaders Quadrant in data warehousing since 1999
• Top 10 U.S. publicly-traded software company
> S&P 500 Member
> Listed NYSE: “TDC”
> NYSE Arca Tech 100
> 2007 - $1.7B revenue
• Global presence and world-class customer list
> More than 850 customers
> More than 2,000 installations
• 5,500+ associates
Continuous (R)evolution
+ Analytic applications
+ Data models and reports
+ Consulting
+ Database
Hardware
Continuous (R)evolution
Sell applications with consulting, SW and HW inside
Sell solving business problems – and technology to solve them
Sell the SW with some HW to run on
Sell the HW, give everything else away
Continuous (R)evolution
Xeon Quad Core: 10% R&D, 90% integration
Pentium: 20% R&D, 80% integration
i486: 70% R&D, 30% integration
80286: 90% R&D, 10% integration
[Figure: timeline of company logos, 1901 to 1997, including the “An AT&T Company” and “Global Information Solutions” marks]
Scale
• Every dimension of the technology must scale to meet today’s requirements
> Data, Data model complexity, Users, Performance, queries, Data loading, …
• What is a big Data Warehouse?
• Total spinning disk?
> 2.5 Petabytes
• Big table?
> 150 billion rows
• Number of tables?
> 300,000
• Insert/Update per day?
> 5 billion records
• Identified users?
> 100,000
• Queries per day?
> 5 million
• Data Turnover rate?
> 1TB per 5 seconds
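To put the daily figures above on a per-second footing, here is a small arithmetic sketch. It assumes the load is spread evenly over a 24-hour day and that 1TB means 1000 GB; both are simplifying assumptions, not statements from the slide, and the figures may come from different systems.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day):
    """Average per-second rate, assuming an even spread across the day."""
    return per_day / SECONDS_PER_DAY


writes_per_sec = per_second(5_000_000_000)  # "5 billion records" per day
queries_per_sec = per_second(5_000_000)     # "5 million" queries per day
turnover_gb_per_sec = 1000 / 5              # "1TB per 5 seconds", 1 TB = 1000 GB

print(f"{writes_per_sec:,.0f} inserts/updates per second")
print(f"{queries_per_sec:,.0f} queries per second")
print(f"{turnover_gb_per_sec:,.0f} GB per second of data turnover")
```

Real workloads peak well above these averages, so the numbers are a floor for what the system must sustain rather than a sizing target.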
The Problem
Operational Systems
Decision Makers
Accts. Payable
Marketing
Accts. Receivable
Invoicing
Supply Chain
Sales/Orders
Finance
Finance G/L
Risk Management
Customer Support
Maintenance
HR
Payroll
Sales
Purchasing
Operations
Order Fulfillment
Inventory
Manufacturing
Call Center …
Inventory …
Proliferation of Data Marts has resulted in
fragmented data, higher costs, poor decisions
The EDW Solution
Operational Systems
Decision Makers
Accts. Payable
Marketing
Accts. Receivable
Invoicing
Supply Chain
Sales/Orders
Finance
Finance G/L
Customer Support
HR
Enterprise
Data
Warehouse
(EDW)
Risk Management
Maintenance
Payroll
Sales
Purchasing
Operations
Order Fulfillment
Inventory
Manufacturing
Call Center …
Inventory …
Integrated data provides consistency of data,
lower costs, better decisions
Active Enterprise Intelligence™
An Obvious Trend: More Speed, More Users
Seconds
Days
Strategic Intelligence
Operational Intelligence
Enterprise Data Warehouse
BI Tools & reports
Analysis & visualization
Predictive Analytics
EDW Enterprise Integration
Mixed workload management
SOA, BPMS, IDEs
Portals/composite applications
Active Enterprise Intelligence™ enabled by an
Active Data Warehouse™
[Diagram: Teradata Warehouse at the center, with OPERATIONAL INTELLIGENCE (Workflow & Applications) and STRATEGIC INTELLIGENCE (Business Intelligence Tools and Applications). Active elements shown: Active Access, Active Load, Active Events, Active Workload Management, Active Enterprise Integration, Active Availability. Users shown: Suppliers, Customers, Call Center, Logistics, Executive, Marketing, Finance, Product/Services]
Active Enterprise Intelligence™ in Retail
Detecting Retail Fraud
Situation
Thieves make copies of cash register
receipts, walk into the store, pick up
merchandise, and return items for cash.
Problem
Associates in returns department did not
have historical POS receipt retrieval access
to verify against previously “returned”
receipts or to do returns without receipts.
Solution
Associates query Teradata to quickly check
if a return has already occurred on that
receipt number. Also used by analysts to
understand and prevent excessive returns.
Impact
(for 500-store chain)
• 100% ROI in 5 months
• Stopped a crime ring on
the first day of rollout
• “Cost savings have been
huge”
Active Enterprise Intelligence™ in Retail
Single View of the Customer Across All Channels
Situation
Needed to add Web channel for selling
shoes.
Problem
Too much time and cost to keep multiple
customer systems synchronized. Realized
they needed just one customer database,
not one more for the Web, in addition to
Call Center, and POS/Store databases.
Solution
Adopted an ADW strategy, moved all
customer data to one Teradata system,
revised data models to cover all channels,
added web channel for commerce, used
web services, added TASM to handle
multiple workload types
Impact
• 1M tactical hits to the
EDW per day from the
POS, Call Center, and
Web with 0.11 sec
response time
• Runs simultaneously
with back-office BI,
reports, and ETL
workloads
• Eliminated all other
customer data systems
Change is Fast and Getting Faster
New Challenges for Database Technology
What is the Measure of a Great Architecture?
Handle huge changes of underlying
technologies and dependent components
while continuing to deliver the key value
proposition.
Processor Roadmap
CPU power radically increasing
[Chart: SPECInt2000 performance, 2000 to 2008+, from single-core performance through Hyper-Threading and dual core to multi-core performance (5X); process nodes 90nm, 65nm, 45nm, 32nm, 22nm across 2003, 2005, 2007, 2009, 2011. Source – Intel Corporation]
What Does Shared Nothing Mean?
• 1985 – Every hardware part, every line of software –
“pure” shared nothing
• 1995 – Multiple units of parallelism sharing CPU, memory
• 2004 – Multiple units of parallelism sharing multiple
cores, memory
• 2009 – Multiple units of parallelism sharing same
physical spindles – but still not sharing data
• Future – Multiple units of parallelism in Virtual
machines/cloud, not even knowing what physical machine
they are on or what they are sharing
Teradata MPP Server Architecture
• Nodes
> Incrementally scalable to 1024 nodes
• Operating System
> Linux, Windows, Unix
• Storage
> Independent I/O
> Scales per node
• BYNET Interconnect
> Fully scalable bandwidth
• Connectivity
> Fully scalable
> Channel – ESCON/FICON
> LAN, WAN
• Server Management
> One console to view the entire system
[Diagram: SMP Node1 to SMP Node4, each with its own operating system, CPU1, CPU2, and memory, connected by Dual BYNET Interconnects, with Server Management alongside]
Shared Nothing - Dividing the Work
• “Virtual processors” (vprocs) do the work
• Two types
> AMP: owns and operates on the data
> PE: handles SQL and external interaction
• Configure multiple vprocs per hardware node
> Take full advantage of SMP CPU and memory
• Each vproc has many threads of execution
> Many operations executing concurrently
> Each thread can do work for any user, transaction
• Software is equivalent regardless of configuration
> No user changes as system grows from small SMP to huge
MPP
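The division of labor between the two vproc types can be illustrated with a toy model. This is a minimal sketch, not Teradata code: the `Amp` and `ParsingEngine` classes, their methods, and the use of a thread pool are all hypothetical stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Hypothetical AMP: owns a private slice of rows and is the only one to read it."""

    def __init__(self, rows):
        self._rows = list(rows)

    def local_sum(self, column):
        # Each AMP aggregates only the rows it owns.
        return sum(row[column] for row in self._rows)


class ParsingEngine:
    """Hypothetical PE: takes the request, fans it out to every AMP, merges the answers."""

    def __init__(self, amps):
        self._amps = amps

    def total(self, column):
        with ThreadPoolExecutor(max_workers=len(self._amps)) as pool:
            partials = list(pool.map(lambda amp: amp.local_sum(column), self._amps))
        return sum(partials)


rows = [{"store": i % 7, "amount": i} for i in range(1000)]
amps = [Amp(rows[i::4]) for i in range(4)]  # 4 AMPs, disjoint slices
pe = ParsingEngine(amps)
print(pe.total("amount"))
```

Rebuilding `amps` with 8 slices instead of 4 changes nothing for the caller of `pe.total`, which mirrors the slide's point that users see no change as the configuration grows.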
Shared Nothing - Dividing the Work
• Basis of Teradata scalability
> Each AMP owns an equal slice of the disk
> Only that AMP reads that slice
• No single point of control for any operation
> I/O, Buffers, Locking, Logging, Dictionary
> Nothing centralized
> Exponential communication costs avoided
[Chart: coordination cost versus # nodes, with a curve labeled Teradata; diagram labels: Logs, AMPs, Locks, Buffers, I/O]
Teradata Data Distribution
• Rows automatically distributed evenly by hash partitioning
> Even distribution results in scalable performance
> Done in real-time as data are loaded, appended, or changed.
> Hash map defined and maintained by the system
– 2**32 hash codes, 64K buckets distributed to AMPs
> Primary Index (PI) column(s) are hashed
> Hash is always the same - for the same values
> No reorgs, repartitioning, space management
[Diagram: rows of Table A, Table B, and Table C, each with a Primary Index and Data Fields, pass through the Teradata Parallel Hash Function to a RowHash (Hash Bucket) and are spread across AMP1, AMP2, AMP3, AMP4 … AMPn]
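The hash distribution described on this slide can be sketched as follows. The slide gives the 2**32 hash codes and 64K buckets; everything else here is an assumption made for illustration: truncated MD5 stands in for Teradata's hash function, the bucket is taken from the high 16 bits of the code, and buckets are dealt to AMPs round-robin.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 64 * 1024  # "64K buckets" from the slide


def row_hash(pi_values):
    """Return a 32-bit hash code for the Primary Index value(s).

    Stand-in only: truncated MD5, not Teradata's actual hash function.
    """
    key = "\x1f".join(str(v) for v in pi_values).encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big")


def hash_bucket(code):
    """Assumption: the bucket is the high 16 bits of the 32-bit hash code."""
    return code >> 16


def build_hash_map(num_amps):
    """Assumption: buckets are dealt to AMPs round-robin."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for_row(pi_values, hash_map):
    """The same PI values always hash to the same bucket, hence the same AMP."""
    return hash_map[hash_bucket(row_hash(pi_values))]


hash_map = build_hash_map(num_amps=4)
rows_per_amp = Counter(
    amp_for_row((customer_id,), hash_map) for customer_id in range(20_000)
)
print(sorted(rows_per_amp.items()))
```

In this sketch the hash of a row never changes; only the bucket-to-AMP map would be rebuilt when the number of AMPs changes.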
Disk Capacity Exploding
with Little Increase in Performance
[Chart: Disk Drive Bandwidth (MB / Sec; values 5.5, 6.4, 6.0) and Performance per Capacity (MB/Sec/GB; values .155, .080, .044) by Disk Drive Capacity (36 GB, 73 GB, 146 GB). Random I/O; 48K block; 80% read]
Platform Change
• Focus used to be
> Optimization of expensive CPU cycles
> Micro-management of precious disk space
• Now
> Manage I/O
> Balance CPU power to the I/O capacity
> Find new ways to optimize I/O, trading for CPU use as
necessary
> Pulling 2.5GB/sec per node continuous
• Discontinuity coming
> SSDs become price competitive and reliable
File System
• Teradata wrote a new rule book
> Old one written by IBM 35 years ago, used by all mainstream
DBMSs today - except Teradata
• File system built of raw slices
• Rows stored in blocks
> Variable length
> Grow and shrink on demand
> Rows located dynamically
– May be moved to reclaim space, defrag
> Maximum block size is configurable
– System default or per table
– 8K to 128K
– Change dynamically
• Indexes are just rows in tables
• Has evolved from direct management of single spindles to
completely virtualized storage, not even knowing spindle
location
Workload Management Evolution
• 1984 – pure timeshare
• 1987 – 4 priorities, defined by user
• 1995 – multiple priorities in multiple partitions
• 2000 – weighted workload groups
• 2004 – queuing, reserved resources, focus on tactical
work
• 2009 – Visualization and detailed workgroup
management
• Future – Set service level goals, our job to deliver
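As an illustration of the "weighted workload groups" step in this timeline, here is a minimal sketch of weight-proportional sharing. It is not the TASM interface: the function, the group names, and the weights are hypothetical, and the rule that idle groups give up their share is an assumption.

```python
def cpu_shares(weights, active):
    """Split CPU among the active groups in proportion to their weights.

    Assumption: groups with no work waiting take no share, so the
    remaining groups absorb it.
    """
    total = sum(weights[group] for group in active)
    if total == 0:
        return {}
    return {group: weights[group] / total for group in active}


# Hypothetical groups and weights, for illustration only.
weights = {"tactical": 60, "reporting": 25, "load": 10, "ad_hoc": 5}

print(cpu_shares(weights, active=["tactical", "reporting", "load", "ad_hoc"]))
print(cpu_shares(weights, active=["reporting", "load"]))
```

With every group busy, the tactical group gets 60% of the CPU; when only reporting and load are running, they split the machine 25:10.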
Active Workload Management
• Manage workloads
> Reduce server congestion
• Dynamically adjust in-flight task priority
> Turn the dial – change priorities
> Performance, performance, performance
• Get maximum throughput
• Fast active access queries
[Diagram: Active Data Warehouse with Speed dials for Active Load, Active Access, Active Events, and Query and Reporting; dial readings shown: 60, 75, 10, 25]
TASM Reporting/Monitoring - 13.10
Availability Requirements
[Chart: number of Users (10 to 1,000,000) across the range from Strategic Intelligence to Operational Intelligence, with availability levels Business Critical, Mission Critical, and Dual Active. User groups: IT, Finance, Planners, Power Users, Data Miners; Executives, Middle Managers, Marketing; Category Mgr, Line Managers, Service Managers; Operational Employees; Consumers, Suppliers, B2B]
“Always ON” – An Elusive Challenge
• Unplanned downtime
> Hardware faults
> Software faults
> Hangs
• Planned downtime
> Software upgrade
> Hardware upgrade
> Data center maintenance
• “Disasters”
> Multi-component failures
> Building disasters
> Area disasters
• And optimize resource value to the business
• And avoid hidden costs and surprises
> E.g., major performance variations
• Major opportunity for research – but must be holistic
> Reaches far beyond core database
Real time Operational Actions
1. Customer makes multi-segment travel reservation
2. Flight rerouted causing missed connections.
3. What is each customer’s flying history?
4. How profitable is each customer?
5. Which customers experienced delays or other problems in the last 6 months?
6. Customer rebooked and notified.
7. Airport operations adjusted
[Diagram labels: Strategic Intelligence; Operational Intelligence; “Active” Enterprise Data Warehouse; WebSphere MQ, Oracle AQ, Microsoft MSMQ]
Real Time Customer Management
1. Customer inserts Total Rewards Card at Slot Machine
2. What is the customer’s past spending history in all our casinos?
3. What is a significant loss for this person based on market segment, past and predicted behavior?
4. Is this customer approaching the predicted loss rate for their segment?
5. What offers are available for this customer?
6. Message sent to floor Luck Ambassador with customer offer to prevent additional losses.
[Diagram labels: Strategic Intelligence; Operational Intelligence; “Active” Enterprise Data Warehouse; TIBCO]
That’s a Wrap!
• Business requires a new level of decision making
> Many more decisions by many more people much faster
> Current representation of the state of the enterprise
• Data Warehouse must evolve to support the
requirements of Active Enterprise Intelligence
• Technology must evolve to deal with the new
requirements
> Rich area for research and innovation
> Change view of what data warehouse/BI means
• Teradata driving an aggressive roadmap to meet real
business requirements