Commodity Database Servers
Jim Gray
Microsoft Research
[email protected]
http://Research.Microsoft.com/~Gray/talks
1
Outline
• Status report on Commodity Server
Performance
• Why Most VLDBs will be Multi-Media
Servers
• Preview of Microsoft’s SQL Server 7
2
Status Report on Commodity
Server Performance
• Standards:
– TPC,
– SpecWeb, ...
• Product benchmarks: e.g.
– SAP,
– PeopleSoft,…
• Both indicate that
– NT is 18 months behind Unix-SMP performance
– but clusters can make up the difference
3
TPC-C
Cluster
• IBM SP2 12x8 cpu
Oracle 8.2
57 ktpmC, 148$/tpmc
• Predict:
large & inexpensive NT
cluster number this year.
SMP
• HP 9000 16 cpu,
Sybase 11
52.1 ktpmC, 82$/tpmC
• NEC 8 cpu
SQL Server
14.9 ktpmC, 60$/tpmC
Diseconomy of Scale:
Big systems are Expensive
27$/tpmC vs 148$/tpmC
[Chart: tpmC per k$ (vertical axis, 0 to 40) against tpmC (horizontal axis) for results at 11,007, 16,101, 52,117, and 57,053 tpmC]
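The chart's vertical axis is just the reciprocal of the quoted price/performance figure. A minimal sketch of that conversion, using the $/tpmC values quoted on this slide (the 57 ktpmC figure is rounded as on the slide, and the implied system prices are derived here, not quoted in the talk):

```python
# (tpmC, $/tpmC) as quoted on the slide; 57 ktpmC is the slide's rounded value.
systems = {
    "IBM SP2 12x8 cpu cluster, Oracle": (57_000, 148),
    "HP 9000 16 cpu, Sybase 11": (52_100, 82),
    "NEC 8 cpu, SQL Server": (14_900, 60),
}


def tpmc_per_kilodollar(dollars_per_tpmc: float) -> float:
    """Convert $/tpmC into tpmC per k$ (the chart's vertical axis)."""
    return 1000.0 / dollars_per_tpmc


for name, (tpmc, price) in systems.items():
    implied_price_m = tpmc * price / 1e6  # derived: total system price in M$
    print(f"{name}: {tpmc_per_kilodollar(price):.1f} tpmC/k$, "
          f"about {implied_price_m:.1f} M$")

# The two extremes named in the "Diseconomy of Scale" caption.
cheapest = tpmc_per_kilodollar(27)
priciest = tpmc_per_kilodollar(148)
print(f"27 $/tpmC -> {cheapest:.1f} tpmC/k$; 148 $/tpmC -> {priciest:.1f} tpmC/k$")
```

At 27 $/tpmC a buyer gets roughly five times the throughput per dollar of the 148 $/tpmC cluster.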
4
TPC-D
• Performance Champions: NCR/Teradata
– 1 TB: 32x4 node cluster
– 300 GB: 24x4 node cluster
– 100 GB: 8x4 node cluster
• All use Teradata software on NCR
WorldMark Intel-based hardware

Database size   System                  QppD     QthD
1,000 GB        NCR WorldMark Server     3069     1205
300 GB          NCR WorldMark Server     9260     3117
100 GB          NCR WorldMark Server    12149     3912
5
Outline
• Status report on Commodity Server
Performance
• Why Most VLDBs will be Multi-Media
Servers
• Preview of Microsoft’s SQL Server 7
6
VLDB Reality Test
• California DMV
– ~20 million cars, drivers, doctors, barbers,...
– Some drivers have moving violations
– DMV knows about 1.5 KB on each one
– 30 GB total.
• Microsoft: too big says DoJ
– 40 B$ revenue (in company lifetime)
– ~1 billion unit sales: @ 100 B = 100 GB
– ~100 M customers: @ 1 KB = 100 GB
• Wal-Mart (no one bigger!)
– Sells 10 B items per year
– 100 bytes/item => 1 TB
• AT&T
– 300 M calls per day (peak day)
– 10 B calls per year
– 100 B/call = 1 TB
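The sizing on this slide is record count times bytes per record. A small sketch that reproduces the figures above, assuming decimal units (1 KB = 1,000 bytes), which is what makes the slide's round numbers come out exactly:

```python
KB, GB, TB = 10**3, 10**9, 10**12  # decimal units, an assumption


def db_size(records: int, bytes_per_record: int) -> int:
    """Raw data volume in bytes: record count times record size."""
    return records * bytes_per_record


estimates = {
    "California DMV": db_size(20_000_000, 1_500),
    "Microsoft unit sales": db_size(1_000_000_000, 100),
    "Microsoft customers": db_size(100_000_000, 1 * KB),
    "Wal-Mart items per year": db_size(10_000_000_000, 100),
    "AT&T calls per year": db_size(10_000_000_000, 100),
}

for name, size in estimates.items():
    print(f"{name}: {size / GB:,.0f} GB")
```

Even the largest of these transaction stores only just reaches 1 TB.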
7
VLDB Reality Test
• It's HARD to find 1 TB
of transaction data
– 100 M web hits/day
– 250 B/hit
– 1 TB/year
• It's HARD to find 1 TB
of text data
– 100 M web pages
– 10 KB/page
– = 1 TB
• How do they do it?
• Lots of indices?
– No: that is only 3x
• Precomputed Aggregates?
– Yes: OLAP benchmark
• Start at 30 MB
• Use 2.7 GB or 6GB database
– But: this is dumb
• Email?
– Microsoft: 6 TB
– Hotmail: 3.5 TB
– AOL?
8
Data Tidal Wave
• Seagate 47GB drive @ 3k$
– 100 GB penny per MB drive coming in 2000
• 10 $/GB = 10 k$/ Terabyte! (in y2k)
– Everyone can afford one
• What’s a terror bite?
– If you sell ten billion items a year (e.g., Wal-Mart)
– And you record 100 bytes on each one
– Then you got a Terror Bite
• Where will the terror bytes come from?
– Multimedia (like the TerraServer) and...
9
Multi Media: Very Large DBs
• Photo is 100 KB, not 100 B
– So, photo DBs are 1,000x larger
• Examples:
– Scanned documents
– Photo records of products/people/places
– Surveillance
– Scientific monitoring
10
Some TerrorByte Databases
• EOS/DIS (picture of planet each week)
– 15 PB by 2007
• Federal Reserve Clearing house: images of checks
– 15 PB by 2006 (7 year history)
• Sloan Digital Sky Survey:
– 40 TB raw, 2 TB cooked
• TerraServer:
11
Scaleup - Big Database
• Build a 1 TB SQL Server database
– Show off Windows NT and
SQL Server scalability
– Stress test the product
• Data must be
– 1 TB
– Unencumbered
– Interesting to everyone everywhere
– And not offensive to anyone anywhere
• Loaded
– 1.1 M place names from Encarta World Atlas
– 1 M Sq Km from USGS (1 meter resolution)
– 2 M Sq Km from Russian Space agency (2 m)
• Will be on web (world’s largest atlas)
• Sell images with commerce server.
12
TerraServer
World’s Largest PC!
• 324 disks (2.9 terabytes)
• NT EE & SQL 7.0
• 8 x 440 MHz Alpha CPUs
• Photo of the planet
USGS and Russian
images
• 10 GB DRAM
13
Background
• Earth is 500 Tera-meters square
– USA is 10 tm2
• 100 TM2 land in 70ºN to 70ºS
• We have pictures of 6% of it
– 3 tsm from USGS
– 2 tsm from Russian Space Agency
• Compress 5:1 (JPEG) to 1.5 TB.
• Slice into 10 KB chunks
• Store chunks in DB
• Navigate with
– Encarta™ Atlas
• globe
• gazetteer
– StreetsPlus™ in the USA
• Someday
– multi-spectral image
– of everywhere
– once a day / hour
Image sizes:
1.8x1.2 km2 tile
10x15 km2 thumbnail
20x30 km2 browse image
40x60 km2 jump image
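The compress-and-slice step implies a rough record count for the tile store. A back-of-envelope sketch using only the 5:1 ratio, the 1.5 TB compressed size, and the 10 KB chunk size from this slide (decimal units are an assumption, and the uncompressed size is derived rather than quoted):

```python
KB, TB = 10**3, 10**12  # decimal units, an assumption

compressed_bytes = 15 * TB // 10  # 1.5 TB after JPEG compression
compression_ratio = 5             # the slide's 5:1
chunk_bytes = 10 * KB             # the slide's 10 KB chunks

raw_bytes = compressed_bytes * compression_ratio  # implied uncompressed size
chunk_count = compressed_bytes // chunk_bytes     # rows needed to hold the imagery

print(f"implied raw imagery: {raw_bytes / TB:.1f} TB")
print(f"10 KB chunks to store: {chunk_count:,}")
```

That is on the order of 150 million image rows before any thumbnail, browse, or jump images are counted.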
14
USGS Digital Ortho Quads (DOQ)
• US Geological Survey
• 3 TeraBytes
• Most data not yet published
• Based on a CRADA
– TerraServer makes data available.
[Figure labels: USGS “DOQ”; 1x1 meter; 4 TB; Continental US; New Data Coming]
15
Russian Space Agency (Sovinformsputnik)
SPIN-2 (Aerial Images is Worldwide Distributor)
• 1.5 Meter Geo Rectified imagery of (almost) anywhere
• Almost equal-area projection
• De-classified satellite photos (from 200 KM)
• More data coming (1 m)
• Want to sell imagery on Internet.
• Putting 2 tm2 onto TerraServer.
SPIN-2
16
Demo
http://www.TerraServer.com
Microsoft
BackOffice
SPIN-2
17
Hardware
1TB Database Server
AlphaServer 8400 4x400. 10 GB RAM
324 StorageWorks disks
10 drive tape library (STC Timber Wolf DLT7000 )
SPIN-2
18
Software
[Architecture diagram, labels grouped by tier]
Web Client: browser (HTML), Java Viewer
The Internet
Terra-Server Web Site: Internet Information Server 4.0, Active Server Pages, MTS, Image Server, Terra-Server Stored Procedures, Terra-Server DB on Sphinx (SQL Server)
Automap Server: Internet Info Server 4.0, Microsoft Automap ActiveX Server
Image Provider Site(s): Internet Information Server 4.0, Microsoft Site Server EE, Image Delivery Application, SQL Server 7
19
Image Provider Site(s)
System Management & Maintenance
• Backup and Recovery
– STC 9717 Tape robot
– Legato NetWorker™
– Sphinx Backup/Restore Utility
– Clocked at 80 MBps!!
• SQL Server Enterprise Mgr
– DBA Maintenance
– SQL Performance Monitor
20
TerraServer File Group Layout
• Convert 324 disks to 28 RAID5 sets
plus 28 spare drives
• Make 4 NT volumes (RAID 50)
595 GB per volume
• Build 30 20GB files on each volume
• DB is File Group of 120 files
E:
F:
G:
H:
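The layout arithmetic can be checked in a few lines. The sketch below uses the slide's counts; the file names are hypothetical and only illustrate how 30 files on each of the four volumes make up the 120-file group:

```python
# Figures from the slide.
total_disks = 324
spare_disks = 28
volumes = ["E:", "F:", "G:", "H:"]
files_per_volume = 30
file_size_gb = 20
volume_size_gb = 595

# Hypothetical file names, for illustration only.
file_group = [
    f"{drive}\\terra_{n:02d}.ndf"
    for drive in volumes
    for n in range(1, files_per_volume + 1)
]

data_disks = total_disks - spare_disks                # disks inside the RAID5 sets
nominal_file_gb = len(file_group) * file_size_gb      # space the files ask for
volume_capacity_gb = len(volumes) * volume_size_gb    # space the volumes offer

print(f"{len(file_group)} files, {nominal_file_gb} GB nominal, "
      f"{volume_capacity_gb} GB of volume capacity, {data_disks} data disks")
```

The nominal 2,400 GB of files slightly exceeds the 2,380 GB of stated volume capacity, which suggests the per-file or per-volume sizes on the slide are rounded.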
21
Gazetteer Design
• Classic Snowflake Schema
• Fast First hint to Optimizer

Table (rows): columns
CountrySearch (1148): AlternateName, CountryID, GazSrcID
Country (264): CountryID, CountryName, UNcode
StateSearch (3776): AlternateName, CountryID, StateID, FeatureID, GazSrcID
State (1083): StateID, CountryID, StateName
Place (1,089,897): PlaceID, ImageFlag, AlternateName, Name, CountryID, StateID, TypeID, GazSrcID, Latitude, Longitude, UGridID, ZGridID, DOQdate, SPIN2date
PlaceGrid (50,000,000): ZGridID, BestPlaceName, XDistance, YDistance
FeatureType (13): TypeID, Description
GazetteerSource (1): GazSrcID, Description
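To make the snowflake shape concrete, here is a minimal sketch of a place-name lookup that joins the Place table to its State, Country, and FeatureType dimensions. It uses SQLite from the Python standard library rather than SQL Server, keeps only a subset of the columns named on the slide, and assumes the column types and the single sample row, none of which come from the talk:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (CountryID INTEGER PRIMARY KEY, CountryName TEXT, UNcode TEXT);
CREATE TABLE State (StateID INTEGER PRIMARY KEY,
                    CountryID INTEGER REFERENCES Country(CountryID),
                    StateName TEXT);
CREATE TABLE FeatureType (TypeID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Place (PlaceID INTEGER PRIMARY KEY, Name TEXT,
                    CountryID INTEGER REFERENCES Country(CountryID),
                    StateID INTEGER REFERENCES State(StateID),
                    TypeID INTEGER REFERENCES FeatureType(TypeID),
                    Latitude REAL, Longitude REAL);
-- Sample data, invented for this sketch.
INSERT INTO Country VALUES (1, 'United States', 'US');
INSERT INTO State VALUES (1, 1, 'Washington');
INSERT INTO FeatureType VALUES (1, 'City');
INSERT INTO Place VALUES (1, 'Seattle', 1, 1, 1, 47.6, -122.3);
""")


def find_place(connection, name):
    """Resolve a place name through the State, Country, and FeatureType dimensions."""
    return connection.execute(
        """
        SELECT p.Name, s.StateName, c.CountryName, f.Description
        FROM Place AS p
        JOIN State AS s ON s.StateID = p.StateID
        JOIN Country AS c ON c.CountryID = p.CountryID
        JOIN FeatureType AS f ON f.TypeID = p.TypeID
        WHERE p.Name = ?
        """,
        (name,),
    ).fetchall()


rows = find_place(conn, "Seattle")
print(rows)
```

The "Fast First" optimizer hint mentioned on the slide is specific to SQL Server and has no counterpart in this sketch.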
22
Image Data Design
• Image pyramid stored in DBMS (250 M recs)
[Schema diagram: tables, row counts, and columns as labeled]
OriginalMetaData (650 k SPIN2, 2 M USGS): OrigMetaID, SrcID, ImageSource, Agency, SourcePhotoID, SourcePhotoDate, SourceDEMDate, MetaDataDate, ProductionSystem, ProductionDate, DataFileSize, Compression, HeaderBytes, … 80 other fields
ImageMeta (650 k SPIN2, 2 M USGS): ImgMetaID, OrigMetaID, ImgStatus, ImgDate, ImgTypeID, JumpPixHeight, JumpPixWidth, BrowsePixHeight, BrowsePixWidth, ThumbPixWidth, ThumbPixHeight, CutCol, CutRow, MidLat, MidLong, NELat, NELong, NWLat, NWLong, SELat, SELong, SWLat, SWLong, UGridID, UTMZone, XUtmID, YUtmID, XGridID, YGridID, ZGridID
TileMeta (16 M SPIN2, 96 M USGS): ImgMetaID, OrigMetaID, SrcID, ImgStatus, ImgDate, ImgTypeID, TilePixHeight, TilePixWidth, CutCol, CutRow, MidLat, MidLong, NELat, NELong, NWLat, NWLong, SELat, SELong, SWLat, SWLong, UGridID, UTMZone, XUtmID, YUtmID, XGridID, YGridID, ZGridID
Jump, Browse, Thumb (.65 M SPIN2, 1.5 M USGS each) and Tile (16 M SPIN2, 96 M USGS): UGridID, ZGridID, ZTileGridID, ImgData, ImgDate, ImgTypeID, ImgMetaID, SrcID, EncryptKey, File Name
ImgSource: SrcID, SrcName, SrcTblName, SrcDescription, GridSysID, ImgTypeID
ImgType: ImgTypeID, ImgFileDesc, ImgFileExt, MimeStr
Pick, Log, UGridHits: Name, Description, Link, PickDate, URL, Time, <extensive list of action parameters>, URL, UGridID, ZTileGridID, count
23
Image Delivery and Load
[Diagram: image delivery and load pipeline, labels grouped]
Input: DLT tapes, read with “tar” on NT into \Drop’N’
Processing: ImgCutter writes to \Images; DoJob; LoadMgr instances (Wait 4 Load) load the DB; Backup
LoadMgr steps: 10: ImgCutter; 20: Partition; 30: ThumbImg; 40: BrowseImg; 45: JumpImg; 50: TileImg; 55: Meta Data; 60: Tile Meta; 70: Img Meta; 80: Update Place
Hardware: two AlphaServer 4100 systems with 60 4.3 GB drives; 100mbit EtherSwitch; AlphaServer 8400 with Enterprise Storage Array (ESA), 3 x 108 9.1 GB drives; STC DLT Tape Library
24
SQL 7 Testimonial
• We started using it March 4, 1997
– SQL 7 Pre-Alpha
– SQL 7 Alpha
– SQL 7 Beta 1
– SQL 7 Beta
• Loaded the DB twice
– (we made application mistakes)
• Now doing it “right”
• Reliability: Great! SQL 7 never lost data
• Ease of use: Great!
• Functionality: Great!
25
Outline
• Status report on Commodity Server Performance
• Why Most VLDBs will be Multi-Media Servers
• Preview of Microsoft’s SQL Server 7
26
SQL 7: Easy & Functional
Sections: Easy, Scalability, Data Warehousing
Easy
• Dynamic self management
• Multi-site management
• Alert/response management
• Job scheduling and execution
• Scriptable management
• Profiling/tuning tools
• Fully Unicode
• English Language Query
• Integrated text search engine
27
Made It Easier!
(fewer knobs)
• Desktop & Workgroups
– Auto Configure Engine / Dynamic Disk/memory
– Reduce Learning Curve, Increase Productivity
– Self-Managing SQLAgent, Wizards, “Task Pads”
• Large Organizations
– Deploy/manage hundreds of SQL Servers
– Lower TCO for Large Environments
– Multi-Server Operations/ “Lights-out” Environment
28
Multi-Site Management
• Admin servers from one place
• Automate simple stuff
• Wizards for common stuff
• Manage arrays of servers
– operations, security,…
– Replication
– Import/export
• Interface is scriptable
– COM object model
– Script with Java, VB, ...
• Scheduling and Multi-step jobs
29
DBA and Developer Tools
• Built-in GUI
– data/schema design
– data query & edit
– integrated with programming tools
• SQL Server Profiler
– Selected server events and trace criteria
– “Capture” output to screen or replay
• SQL Server Expert
– Analyzes actual server usage history
– Makes recommendations to improve performance
– Recommends Index design
– Recommends operations procedures
30
Wizards and GUIs
• Wizards galore (over 50 at last count)
• MS Access as a query interface
• Built-in data access tools (integrated with tools)
• Graphical show plan
31
Many New Wizards...
• Create a Database
• Scheduled Backup
• Create a Maintenance Plan
• Create a Scheduled Job
• Create an Alert
• Security Wizard
• Import Data to SQL Server
• Export Data From SQL Server
• Clustering (Wolfpack)
• Index Tuning Wizard
• Web Assistant
• Register Servers
• Configure Replication
• Create Publication
• Create Pull Subscription
• Create Push Subscription
• Replication Partitioning
• Create an Index
• Create a Stored Procedure
• Create a View
• More to come...
32
Distributed Management Objects
(SQL-DMO)
• COM Interfaces for administering SQL Server
– Embedded Administration (no UI)
• All Administration Functions Supported
– Server, Database Configurations, Settings
– Object Creation, Security, Replication, Scripting,..
– 40+ Objects, 1000+ properties and methods
• Integration Interface for ISV Administration
– E.g., Baan using DMO for Scripted App Install
• Scripting Via VBA and JScript + DCOM
33
DMO: Object Model (Overview)
SQL Server
SQLAgent
Databases
Jobs
Users
Tasks
DB Options
Alerts
Transaction Logs
Operators
Publications
FileGroups
Files
Logins
Configurations
Linked Servers
Remote Login
Table
View
Columns
Stored Procs
Indexes
Rules
Keys (PK/FK)
Defaults
Triggers
34
DMO Scripting
• Backup a Database
Set MyServer = CreateObject("SQLDMO.SQLServer")   ' Create Server Object
Set MyBackup = CreateObject("SQLDMO.Backup")      ' Create Backup Object

MyServer.Name = "MSSALES"                         ' Identify Server
MyServer.LoginSecure = True                       ' Windows NT Auth
MyServer.Connect                                  ' Connect

MyBackup.Database = "SALESII"                     ' Database to backup
' Backup Location, then Name Backup File
MyBackup.Files = "\\MyServer\Backups\" _
    + MyBackup.Database + ".bak"
MyBackup.SQLBackup MyServer                       ' Back it Up

MyServer.Disconnect                               ' We're Done!
35
Scalability
Sections: Easy, Scalability, Data Warehousing
• Win9x/NTW version
• Dynamic row-level locking
• Improved query optimizer
• Intra-query parallelism
• 64-bit support
• Replication
• Distributed query
• High Availability Clusters
36
Scale Down to Windows 95-98
• Full function (same as NTW)
• Self managing
• Many tools
• Integration with Next MS Access
• Great for embedded apps
37
Replication
• Transactional and Merge
• Remote update
• ODBC and OLE DB subscribers
• Wizards
• Performance
[Diagram labels: Publisher; Distributor; Subscribers, including an Updating Subscriber (immediate updates) using 2PC and RPC; OS 390 platforms shown: DB2, VSAM, CICS]
38
Parallel Query
SMP & Disk Parallelism
• Plus Distributed
• Plus Hash Join (fanciest on the planet)
• Plus Optimized Partitioned views
[Diagram: 50,000 rows on disks are scanned in parallel; each local aggregation computes # of emp. per group and total inc. per group, giving 4 x 50 rows; a global aggregation combines them into a result of 50 rows]
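The local-then-global aggregation pattern on this slide can be sketched as follows. This is an illustration of the plan shape only, not SQL Server's executor: the row data is synthetic, the four partitions stand in for four disks, and Python threads show the structure without giving real CPU parallelism.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(rows):
    """Per-partition step: count of employees and total income per group."""
    acc = defaultdict(lambda: [0, 0])
    for group, income in rows:
        acc[group][0] += 1
        acc[group][1] += income
    return {group: tuple(pair) for group, pair in acc.items()}


def global_aggregate(partials):
    """Final step: merge the small per-partition results."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (count, income) in partial.items():
            total[group][0] += count
            total[group][1] += income
    return {group: tuple(pair) for group, pair in total.items()}


# Synthetic input: 50,000 (group, income) rows spread over 50 groups.
rows = [(i % 50, 1000 + i) for i in range(50_000)]

# Four contiguous partitions, one per hypothetical disk.
size = len(rows) // 4
partitions = [rows[p * size:(p + 1) * size] for p in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(local_aggregate, partitions))  # 4 x 50 rows

result = global_aggregate(partials)  # 50 rows
print(len(partials), [len(p) for p in partials], len(result))
```

Only the 4 x 50 partial rows cross to the final step, which is why the global aggregation is cheap compared with scanning the 50,000 input rows.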
39
Distributed Heterogeneous Queries
Data Fusion / Integration
• Join spread sheets, databases, directories, Text DBs etc.
• Any source that exposes OLE DB interfaces
• SQL Server as gateway, even on the desktop
[Diagram: the SQL 7.0 Query Processor reaches a Directory Service, Database (DB2, VSAM, Oracle, …), Spreadsheet, Photos, Mail, Maps, and Documents]
40
and the Web
Utilities
The Key to LARGE Databases
• Backup
– Fuzzy
– Parallel
– Incremental
– Restartable
• Recovery
– Fast
– File granularity
• Reorganize
– shrinks file
– reclusters file
• Auto-repair
41
Data Warehousing
Sections: Easy, Scalability, Data Warehousing
• Warehousing Framework
• Visual data modeler
• Microsoft repository
• Data transformation services (DTS)
• Plato & Dcube - Multi Dimensional Data Cubes
• English query 2.0
• Built-in text-index engine
42
Key Microsoft Data Warehouse Programs
• Data Warehouse Framework (DWF)
– Process -- for building, using and managing
– Pipeline -- for metadata flow
– Protocols -- to integrate components
• Data Warehouse Alliance (DWA)
– Partners -- ISVs pledged to the framework and its parts
– Products -- complete spectrum from Microsoft and
third-parties
43
Microsoft Data Warehousing Framework
[Framework diagram; ** available in SQL Server 7 (* partially)]
Phases: Building, Using, Managing
Design: Data Warehouse Design (logical/physical schema*/ data flow**); Data Mart Design** (Cubes/Star schema)
Data Flow: Operational Data (OLE-DB**); Data Transformations (DTS**); Data Marts (SQL Server** & OLAP Server**); End-User Tools (Excel**, Access, English Query)
Meta-Data Flow: DB Schema**, Transformation**, Scheduling, OLAP, shared through Microsoft Repository** (Persistent Shared Meta-Data)
Data Warehouse Management (Console*, Scheduling**, Events**, Topology*)
44
Alliance for Data Warehousing
Technical and marketing relationship
Supports SQL Server storage engine
Third-party products tested with BackOffice
DW Build
BMC
Data Mirror
Execusoft
Informatica
Microsoft
Platinum Technology
Praxis
Prism
Sagent
SAS
Sterling
V-Mark
DW Access
Andyne
Business Objects
Cognos
IQ Software
Microsoft
NCR Data Mining
Pilot
Platinum Technology
Sagent
SAS
Seagate
Wall Data
45
DW Alliance Milestones
• 9/96 - Launched with 8 founding members
• 3/97 - Design review
• 1/97 - 6/97 - Expanded to 21 members
• 7/97 - Repository design review
– Team development of shared metadata
• 9/97 - OLE DB for OLAP API specification
• 1H’98 - Integration development with Sphinx DTS and Replication APIs
46
Microsoft Repository
• Based on joint Sterling/Microsoft design (Shipped 97Q2)
• Wide distribution: VB, Visual Studio and Third-Parties
• Designed with over 60 vendors
• Extended to support DB schema, transformations, OLAP
– Key element of the DW Framework
• UML is abstract model
• Everything viewable in UML terms
[Diagram legend]
UML  Unified Modeling Language
UMX  UML Extensions
CDE  Component Descriptions
DTM  Data Type Model
COM  Component Object Model
GEN  Generic
DBM  Database Model
SQL  Microsoft SQL Server
OCL  Oracle
47
Repository & Data Warehousing
• Common infrastructure -- the meta-data pipeline
• Supports interoperability between data
warehousing tools and products
• Process:
– Initial spec developed with 12 vendors
– Gathering feedback now
– Final spec review in Redmond, 2/98
48
Data Transformation
• Workflow system manages Data Pump
– Pre-defined transforms using the DTS GUI
– Procedural VB Script, JavaScript, VBA, any COM
• Multi-stream in, Multi-stream out
Repository
Metadata
Transforms
Oracle > SQL Server
IDTSDataPump IUnknown
Transformation
Objects
ActiveX Scripts
Data Pump
Function Example()
  ' Map the source credit rating to a risk label; skip unknown ratings.
  If DTSSource("CreditRating") = "1" Then
    DTSDestination("Risk") = "Good"
  ElseIf DTSSource("CreditRating") = "2" Then
    DTSDestination("Risk") = "Average"
  ElseIf DTSSource("CreditRating") = "3" Then
    DTSDestination("Risk") = "Bad"
  Else
    Example = DTS_SkipRow
  End If
End Function
SQLAgent
Multiserver
Operations
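The row-by-row behavior of the transform script on this slide can be mimicked outside DTS. The sketch below is plain Python, not the DTS object model: `SKIP_ROW`, `transform`, and `pump` are hypothetical names standing in for the skip status, the script function, and the Data Pump loop.

```python
SKIP_ROW = object()  # hypothetical stand-in for the slide's skip-row status

RISK_BY_RATING = {"1": "Good", "2": "Average", "3": "Bad"}


def transform(source_row):
    """Map a source row's CreditRating to a destination row, or ask to skip it."""
    risk = RISK_BY_RATING.get(source_row.get("CreditRating"))
    if risk is None:
        return SKIP_ROW
    return {"Risk": risk}


def pump(source_rows):
    """Apply the transform to every source row and keep the rows it emits."""
    destination = []
    for row in source_rows:
        result = transform(row)
        if result is not SKIP_ROW:
            destination.append(result)
    return destination


loaded = pump([
    {"CreditRating": "1"},
    {"CreditRating": "9"},  # unknown rating, so the row is skipped
    {"CreditRating": "3"},
])
print(loaded)
```

A lookup table replaces the If/ElseIf chain here; the observable behavior, three known ratings mapped and everything else skipped, is the same.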
49
Transformations
• Data quality and validation
– Missing values, scrubbing, exception handling
• Data integration
– Heterogeneous query, join keys, elim. dups
• Transforms
– Combine/decompose multiple columns to one
• Aggregation
• Central metadata
– Business rules, data lineage
50
Flexible Architecture
• Debates between MOLAP and ROLAP vendors obscure customer needs
• Plato is the product that best supports MOLAP, ROLAP and Hybrid and offers the most seamless integration of all three
• Users & apps only see cubes
[Diagram: MOLAP, ROLAP, and Hybrid storage modes, with labels Data load, Persistent Store, Data access, MD Cache, and User View]
51
Plato and Dcube and HOLAP
[Diagram labels: Source table; aggregations Sum, By Year, By Make, By Make & Year, By Color, By Color & Year over RED, WHITE, BLUE; Partition 1 (Europe), Partition 2 (USA), Partition 3 (Asia), stored as ROLAP or Dcube; “Plato” server; Designer; SQL and MD SQL; Client apps for User 1 and User 2]
52
How Plato
Handles Data Explosion
[Diagram: aggregation levels built from the Fact Table, combining Product and Product Family with Month and Quarter]
Aggregation Wizard finds the aggregations that feed the most other aggregations
53
How Plato Handles Data Explosion
• Aggregation Wizard finds the “80-20” rule in the data
– The 20 percent of all possible pre-aggregations that provide 80 percent of the performance gain
– Analyzes level counts for each dimension and parent-child ratios for each level
• Independent of OLAP data model
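A toy illustration of why the number of possible pre-aggregations explodes, and of ranking them by how many other aggregations each one can feed. This is not Plato's algorithm, which the talk does not describe in detail; the two dimensions follow the slide's Product/Family and Month/Quarter example, and the top "All" level in each dimension is an assumption.

```python
from itertools import product

# Levels listed from finest to coarsest; "All" is an assumed top level.
dimensions = {
    "Product": ["Product", "Product Family", "All"],
    "Time": ["Month", "Quarter", "All"],
}

level_counts = [len(levels) for levels in dimensions.values()]

# Every combination of one level per dimension is a candidate aggregation.
lattice = list(product(*(range(n) for n in level_counts)))


def feeds(a, b):
    """Aggregation a can feed b if a is at least as fine as b in every dimension."""
    return a != b and all(x <= y for x, y in zip(a, b))


fan_out = {a: sum(feeds(a, b) for b in lattice) for a in lattice}
ranked = sorted(lattice, key=lambda a: fan_out[a], reverse=True)


def label(a):
    names = [levels[i] for levels, i in zip(dimensions.values(), a)]
    return " x ".join(names)


print(f"{len(lattice)} candidate aggregations")
for a in ranked[:4]:
    print(f"{label(a)} feeds {fan_out[a]} others")
```

With two three-level dimensions there are only 9 candidates, but the count is the product of the level counts, so adding dimensions and levels grows it multiplicatively. That growth is the "data explosion" a selective choice of pre-aggregations is meant to contain.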
54
OLE DB For OLAP
• OLE DB extensions to access MD data
– Part of OLE DB 2.0
• One new object: Dataset
• Enhancements to existing objects
• Heavily leverages OLE DB
55
OLE DB For OLAP
Objects And Interfaces
CoCreateInstance
Command
Enumerator
Flattened Rowset
Data source
Dataset
Session
Range Rowset
Schema Rowsets
56
English Query
57
OBJECT RELATIONAL
The Next Great DBMS Wave
• All the DB vendors are adding objects
• Microsoft is adding DBs to Objects
• Integration with COM+
• Gives user-defined types and objects
• Plug-ins will be Billion dollar industry
– Blades for SQL Server razor
58
Outline
• Status report on Commodity Server Performance
• Why Most VLDBs will be Multi-Media Servers
• Preview of Microsoft’s SQL Server 7
59