
Microsoft Research
Directions
Jim Gray
Senior Researcher
Microsoft Corporation
[email protected]
http://www.Research.Microsoft.com/~Gray
Microsoft Research






Goal: pursue strategic technologies
for Microsoft
Founded in 1991
200 researchers in 12 areas
Redmond, San Francisco,
Cambridge England
Growing to 600 by 2001
Internationally recognized research teams



Many publications, conference presentations
Leadership roles in professional societies, journals,
conferences
Direct involvement with product and service
groups at Microsoft
Microsoft Research Themes

Programming tools, methodologies and techniques:
  Basic block tool, program analysis, IP
Advanced interactivity and intelligence:
  Speech, natural language, vision
  Decision theory, 3D graphics, UI
Systems and architecture:
  OS, databases, scalable servers
Advanced Development Tools

Analysis of executables



Dynamic analysis driven by user scenarios
Instrumented code
Automatic reorganization of executables



Reduction of code working set size
Branch straightening
Boot ordering for boot-time reduction
Initial Results


Reduced code working sets up to 50%
Improved throughput by 10%
Delivered to ~35 clients
[Figure: Windows NT working set size. Pages referenced (0 to 500) versus seconds (0 to 100), original and optimized.]
Speech Technology

Speech recognition:
  Speaker-independent command and control
  Dictation:
    Speaker-independent, large vocabulary
    Discrete and continuous speech
Trainable speech synthesis:
  Prosody and concatenative speech units learned from corpus
Download from MS Research web site
Natural Language



Broad-coverage syntax analyzes
unrestricted text
Dictionary-based semantic network
provides growing knowledge base
Flexible underlying system for
multiple languages
[Figure: natural language technology diagram. Labels: Robotics, Vision, Machine learning, University, Discourse/pragmatics, Library, UI, 2nd F, Dictionary, SR, Concept normalizing, Sense choosing, Logical form, Interactive movies, Info highway cruiser, Advanced summary, Interactive games, Peedy+, NL query, Improved IR, Bob+, Enhanced help, Improved SR, Semantic critiques, Probs (DT), Revised syntax, Initial syntax, Morphology, Auto indexing, Syntactic critiques, Phrase spacing, Find and replace.]
Levels Of Writing Critiques




We scheduled the next
meeting for noon.
Each of the products are
designed to help.
I saw the Grand Canyon
flying to Arizona.
Ladies are requested not
to have children in the bar.
(From a sign in a Norwegian
cocktail lounge)
Comic Chat




Comic panels
based on chat input
Users control
character's emotions
Comic strip acts as
compelling record
of the conversation
Automated:






Character placement
Balloon construction
Balloon layout
Camera zoom
Panel breaks
Etc.
3D Graphics Research

Bring very high-performance,
high-quality graphics to PCs



Modeling



Interactive
Uniform treatment of multimedia
Representation of 3D models
Automatic simplification
Animation
Simplification Problem
[Figure: example models at several levels of detail, labeled 70,000, 8,700, 34,100, 4,200, 2,600, 2,300]
Competing goals: accuracy and conciseness
Vision Projects

3D reconstruction from video
and images



Motion analysis for video compression
Model acquisition for rendering
Visual human/computer interaction


Communication by gestures
and expressions
Multimodal speech/vision interfaces
Motion Analysis

Convert masked images into a
background sprite for content-based coding [Scrunch]

Working with Softimage
on motion tracking
Video-Based 3D Modeling

Convert a video sequence into a solid
3D model based on object silhouettes

Being used in Lumigraph project
Systems Research Areas

Scalable, fault-tolerant servers and
services


Most of this talk is about scalable servers
Other OS projects






Video & Audio servers - NetShow
Real time OS for set-top boxes
WindowsCE grew from an AT project
High-performance distributed computing
Zero Admin Windows
IPv6
1987: 256 Tps Benchmark



$14 million computer (Tandem)
A dozen people
False floor, two rooms of machines
A 32 node processor array
Simulate 25,600 clients
A 40 GB disk array (80 drives)
People: manager, admin expert, hardware experts, network expert, performance expert, DB expert, OS expert, auditor
1997: 10 Years Later
One person and one box = 1250 tps




One breadbox ~ 5x 1987 machine room
23 GB is hand-held
One person does all the work
Cost/tps is 1,000x less
1 micro dollar per transaction
Roles: hardware expert, OS expert, net expert, DB expert, app expert
Hardware: 4x200 MHz cpu, 1/2 GB DRAM, 12x4 GB disk, 3x7x4 GB disk arrays
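The 1987 and 1997 figures can be compared with a little arithmetic. The sketch below uses only the numbers quoted on the two slides. The function names are illustrative, and the implied 1997 cost per tps is derived here from the slide's "1,000x less" claim rather than stated in the talk.

```python
# Numbers quoted on the 1987 and 1997 slides.
TPS_1987 = 256             # transactions per second
COST_1987 = 14_000_000     # dollars (Tandem system)
TPS_1997 = 1250            # transactions per second, one box


def throughput_ratio(new_tps: float, old_tps: float) -> float:
    """How many times more throughput the new system delivers."""
    return new_tps / old_tps


def cost_per_tps(system_cost: float, tps: float) -> float:
    """Dollars of system cost per transaction-per-second of capacity."""
    return system_cost / tps


speedup = throughput_ratio(TPS_1997, TPS_1987)
cost_1987 = cost_per_tps(COST_1987, TPS_1987)
# Derived, not stated in the talk: apply the slide's "1,000x less" claim.
implied_cost_1997 = cost_1987 / 1000

print(f"speedup: {speedup:.1f}x")
print(f"1987 cost: ${cost_1987:,.0f} per tps")
print(f"implied 1997 cost: ${implied_cost_1997:,.2f} per tps")
```

The speedup comes out a little under 5, which is consistent with "One breadbox ~ 5x 1987 machine room".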
Thesis
Many little beat few big

[Figure: price and disk form factor by class: $1 million mainframe, $100,000 mini, $10,000 micro, nano, pico; 14", 9", 5.25", 3.5", 2.5", 1.8"]

Pico Processor: 1 MM^3
  1 MB of 10 picosecond ram
  100 MB of 10 nanosecond ram
  10 GB of 10 microsecond ram
  1 TB of 10 millisecond disc
  100 TB of 10 second tape archive
Smoking, hairy golf ball
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?
1 M SPECmarks, 1 TFLOP
10^6 clocks to bulk ram
Event-horizon on chip
VM reincarnated
Multiprogram cache, on-chip SMP
Future Super Server:
4T Machine

Array of 1,000 4B machines:
  1 Bips processors
  1 BB DRAM
  10 BB disks
  1 Bbps comm lines
  1 TB tape robot
A few megabucks
Challenge:
  Manageability
  Programmability
  Security
  Availability
  Scalability
  Affordability
As easy as a single system
[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]
Future servers are CLUSTERS
of processors, discs
Distributed database techniques
make clusters work
The Hardware Is In Place
And then a miracle occurs

?

SNAP: scalable
network and platforms
Commodity-distributed
OS built on:



Commodity platforms
Commodity network
interconnect
Enables parallel
applications
Scalable Computers
BOTH SMP And Cluster
SMP
Super Server
Departmental
Server
Personal
System
Grow up with SMP
4xP6 is now standard
Grow out with cluster
Cluster has inexpensive parts
Cluster
of PCs
What TPC-Benchmarks Say

PC technology 2.5x cheaper than high-end SMPs
PC performance is 1/4 high-end SMPs
4xP6
vs
24x UltraSparc
9.1k tpmC @ 49$/tpmC vs
31 ktpmC @ 109$/tpmc
6x more cpus, 3.5x more thruput.
NT 2.3 ktpmC/cpu vs Solaris 1.3 ktpmC/cpu
Still, UltraSparc performance IS impressive
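The ratios on this slide can be recomputed from the two quoted TPC-C results. This is a small sketch using only the slide's numbers; the `TpcResult` class and its field names are illustrative, not part of any TPC tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TpcResult:
    name: str
    cpus: int
    ktpmc: float             # thousands of tpmC
    dollars_per_tpmc: float

    @property
    def ktpmc_per_cpu(self) -> float:
        return self.ktpmc / self.cpus


# Figures quoted on the slide.
pc = TpcResult("4xP6 (Windows NT)", cpus=4, ktpmc=9.1, dollars_per_tpmc=49)
smp = TpcResult("24x UltraSparc (Solaris)", cpus=24, ktpmc=31, dollars_per_tpmc=109)

cpu_ratio = smp.cpus / pc.cpus
throughput_ratio = smp.ktpmc / pc.ktpmc
price_ratio = smp.dollars_per_tpmc / pc.dollars_per_tpmc

print(f"cpus: {cpu_ratio:.0f}x, throughput: {throughput_ratio:.1f}x")
print(f"$/tpmC: {price_ratio:.1f}x more on the SMP")
print(f"per cpu: {pc.ktpmc_per_cpu:.1f} vs {smp.ktpmc_per_cpu:.1f} ktpmC")
```

The per-cpu figures reproduce the slide's 2.3 and 1.3 ktpmC/cpu. The throughput and price ratios come out slightly below the slide's rounded "3.5x" and "2.5x".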





Commodity solutions will come
[Figure: MS SQL Server tpmC vs time, 5/95 to 8/97 (200%/year growth)]
[Figure: MS SQL Server $/tpmC vs time, 5/95 to 8/97 (200%/year better)]
HP’s New TPC-C Result
[Figure: TPC price/tpmC by component (processor, disk, software, net, total/10) for Oracle on UltraSPARC, 31 k tpmC, versus Microsoft on HP, 9.1 k tpmC. Bar labels: 38, 34, 29, 24, 12, 11, 9, 8, 6, 5.]
How Big Are Windows NT SQL Servers?

Study found:
  Several at 50 GB to 100 GB nodes
  A few multi-node up to one TB
  None beyond 100 GB per node
  http://131.107.1.182/research/barc/gray/SQL Server Scaleability.doc
A survey shows relatively few operational DBs
beyond 1 TB (1 TB ~ 500K$ of disk!)
  http://www.wintercorp.com/topten.html
Want to “pioneer” large DBs on Windows NT
Goal

Build a 1 TB SQL Server database



Demo it on the Internet


Show off Windows NT and
SQL Server scalability
Stress test the product
WWW accessible by anyone
So data must be




1 TB
Unencumbered
Interesting to everyone everywhere
And not offensive
to anyone anywhere
The Hardware






DEC Alpha +
324 StorageWorks
Drives (1.4 TB)
SQL Server 7.0
USGS data
Russian Space data
Two meter
resolution
images
SPIN-2
Image Data Sources
300 GB
Src: USGS
and UCSB
UCSB
missing
some
DOQs
DOQ
Spin-2
500 GB
Worldwide
LOB app
New data
coming
Demo
Cluster Advantages

Clients and servers made from
the same stuff


Fault tolerance:


Spare modules mask failures
Modular growth


Inexpensive: built with
commodity components
Grow by adding small modules
Parallel data search

Use multiple processors and disks
Cluster: Shared What?

Shared memory multiprocessor:
  Multiple processors, one memory
  All devices are local
  DEC, SG, Sun, Sequent 16..64 nodes
  Easy to program, not commodity
Shared disk cluster:
  An array of nodes
  All shared common disks
  VAXcluster + Oracle
Shared nothing cluster:
  Each device local to a node
  Ownership may change
  Tandem, SP2, “Wolfpack”
Clusters Being Built

Teradata 500 nodes (50K$/slice)
Tandem, VMScluster 150 nodes (100K$/slice)
Intel, 9,000 nodes @ $55 million (6K$/slice)
Teradata, Tandem, DEC moving
to Windows NT + low slice price
IBM: 512 nodes @ $100 million (200K$/slice)
PC clusters (bare handed) at dozens of nodes
Web servers (msn, PointCast...), DB servers
Key technology is the applications:
  Applications distribute data
  Applications distribute execution
“It’s the applications STUPID!”
Billion Transactions per Day Project







Built a 45-node Windows NT Cluster
(with help from Intel & Compaq)
> 900 disks
All off-the-shelf parts
Using SQL Server &
DTC distributed transactions
DebitCredit Transaction
Each node has 1/20th of the DB
Each node does 1/20th of the work
15% of the transactions are “distributed”
Billion Transactions Per Day Hardware



45 nodes (Compaq Proliant)
Clustered with 100 Mbps Switched Ethernet
140 cpu, 13 GB, 3 TB.
Type                                 nodes                     CPUs   DRAM     ctlrs  disks                     RAID space
Workflow MTS                         20 Compaq Proliant 2500   20x 2  20x 128  20x 1  20x 1                     20x 2 GB
SQL Server                           20 Compaq Proliant 5000   20x 4  20x 512  20x 4  20x (36x4.2GB, 7x9.1GB)   20x 130 GB
Distributed Transaction Coordinator  5 Compaq Proliant 5000    5x 4   5x 256   5x 1   5x 3                      5x 8 GB
TOTAL                                45                        140    13 GB    105    895                       3 TB
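The TOTAL row of the hardware table can be cross-checked from the per-node rows. The sketch below assumes the per-node DRAM figures (128, 512, 256) are in MB, which the slide does not state; the tuple layout and function name are illustrative.

```python
# Per-node configuration as read from the hardware table.
# Assumption: per-node DRAM figures are in MB (the slide gives no unit).
# Tuple layout: (type, nodes, cpus, dram_mb, ctlrs, disks, raid_gb), per node.
NODE_TYPES = [
    ("Workflow (MTS)", 20, 2, 128, 1, 1, 2),
    ("SQL Server", 20, 4, 512, 4, 36 + 7, 130),
    ("Distributed Transaction Coordinator", 5, 4, 256, 1, 3, 8),
]


def cluster_totals(node_types):
    """Multiply each per-node figure by the node count and sum over types."""
    totals = {"nodes": 0, "cpus": 0, "dram_mb": 0,
              "ctlrs": 0, "disks": 0, "raid_gb": 0}
    for _type, nodes, cpus, dram_mb, ctlrs, disks, raid_gb in node_types:
        totals["nodes"] += nodes
        totals["cpus"] += nodes * cpus
        totals["dram_mb"] += nodes * dram_mb
        totals["ctlrs"] += nodes * ctlrs
        totals["disks"] += nodes * disks
        totals["raid_gb"] += nodes * raid_gb
    return totals


totals = cluster_totals(NODE_TYPES)
print(totals)
print(f"DRAM: {totals['dram_mb'] / 1024:.2f} GB, "
      f"RAID: {totals['raid_gb'] / 1000:.2f} TB")
```

The node, CPU, controller, and disk counts match the slide exactly (45, 140, 105, 895). DRAM and RAID space come out near 13.75 GB and 2.7 TB, which the slide states as 13 GB and 3 TB.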
1.2 B tpd







1 B tpd ran for 24 hrs.
Sized for 30 days
Linear growth
5 micro-dollars per
transaction
Out-of-the-box software
Off-the-shelf hardware
AMAZING!
How Much Is 1 Billion Tpd?

1 billion tpd = 11,574 tps (transactions per second)
~ 700,000 tpm (transactions/minute)
ATT:
  185 million calls per peak day (worldwide)
Visa does ~20 million tpd:
  400 million customers
  250K ATMs worldwide
  7 billion transactions (card+cheque) in 1994
New York Stock Exchange:
  600,000 tpd
Bank of America:
  20 million tpd checks cleared (more than any other bank)
  1.4 million tpd ATM transactions
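The conversion from transactions per day to per-second and per-minute rates is simple division. The sketch below redoes it for the 1 billion tpd figure and for the workloads quoted on this slide; the function and dictionary names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def tpd_to_tps(tpd: float) -> float:
    """Transactions per day to transactions per second."""
    return tpd / SECONDS_PER_DAY


def tpd_to_tpm(tpd: float) -> float:
    """Transactions per day to transactions per minute."""
    return tpd / MINUTES_PER_DAY


# Daily volumes quoted on the slide.
WORKLOADS_TPD = {
    "1 Btpd cluster": 1_000_000_000,
    "ATT calls (peak day)": 185_000_000,
    "Visa": 20_000_000,
    "BofA checks": 20_000_000,
    "BofA ATM": 1_400_000,
    "NYSE": 600_000,
}

for name, tpd in WORKLOADS_TPD.items():
    print(f"{name}: {tpd_to_tps(tpd):,.0f} tps, {tpd_to_tpm(tpd):,.0f} tpm")
```

For 1 billion tpd this gives about 11,574 tps and about 694,000 tpm, which the slide rounds to 700,000.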
Millions of Transactions Per Day
[Figure: millions of transactions per day (Mtpd), on a linear scale (0 to 1,000) and a log scale (0.1 to 1,000), for 1 Btpd, Visa, ATT, BofA, NYSE]
Worldwide Airlines Reservations: 250 Mtpd
Clusters (Plumbing)

Single-system image




Fault tolerance


Naming
Protection/security
Management/load balance
“Wolfpack” demo
Hot pluggable hardware
and software
So, What’s New?




When slices cost $50,000, you buy 10 or 20
When slices cost $5,000 you buy 100 or 200
Manageability, programmability, usability
become key issues (total cost of ownership)
PCs are much easier to use and program
MPP vicious cycle: No customers!
[Figure: repeated boxes labeled "New MPP and NewOS" and "New app"]
CP/commodity virtuous cycle:
Standards allow progress
and investment protection
[Figure: cycle of standard OS and hardware, apps, customers]
Windows NT Server Clustering
High availability on standard hardware

Standard API for clusters on many platforms
No special hardware required
Resource Group is unit of failover
Typical resources:
  Shared disk, printer...
  IP address, NetName
  Service (Web, SQL, File, Print, Mail, MTS …)
API to define:
  Resource groups
  Dependencies
  Resources
GUI administrative interface
A consortium of 60 HW and SW vendors (everybody who is anybody)
2-node cluster in beta test now
Available H1 ’97
>2 node is next
SQL Server and Oracle
Demo on it today
Key concepts:
  System: a node
  Cluster: systems working together
  Resource: HW/SW module
  Resource dependency: resource needs another
  Resource group: fails over as a unit
  Dependencies: do not cross group boundaries
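The key concepts (resources, dependencies, and resource groups that fail over as a unit) can be modeled in a few lines. This is a hypothetical sketch of the ideas only, not the Windows NT cluster API: the class names, method names, and node names are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Resource:
    """A hardware or software module, e.g. a disk, an IP address, a service."""
    name: str
    depends_on: List[str] = field(default_factory=list)


@dataclass
class ResourceGroup:
    """The unit of failover: all resources in the group move together."""
    name: str
    owner: str                                   # node currently hosting the group
    resources: Dict[str, Resource] = field(default_factory=dict)

    def add(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    def validate(self) -> None:
        """Reject dependencies that cross the group boundary."""
        for res in self.resources.values():
            for dep in res.depends_on:
                if dep not in self.resources:
                    raise ValueError(
                        f"{res.name} depends on {dep}, outside group {self.name}")

    def start_order(self) -> List[str]:
        """Order resources so each starts after everything it depends on."""
        self.validate()
        ordered: List[str] = []
        in_progress: Set[str] = set()

        def visit(name: str) -> None:
            if name in ordered:
                return
            if name in in_progress:
                raise ValueError(f"dependency cycle at {name}")
            in_progress.add(name)
            for dep in self.resources[name].depends_on:
                visit(dep)
            in_progress.discard(name)
            ordered.append(name)

        for name in self.resources:
            visit(name)
        return ordered

    def fail_over(self, new_owner: str) -> List[str]:
        """Move the whole group to another node; return the restart order."""
        order = self.start_order()
        self.owner = new_owner
        return order


group = ResourceGroup(name="sql_group", owner="node_a")
group.add(Resource("sql_service", depends_on=["shared_disk", "net_name"]))
group.add(Resource("net_name", depends_on=["ip_address"]))
group.add(Resource("ip_address"))
group.add(Resource("shared_disk"))

restart_order = group.fail_over("node_b")
print(group.owner, restart_order)
```

Keeping dependencies inside one group is what lets the group move as a unit: everything a resource needs travels with it, so the restart order can be computed from the group alone.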
Where We Are Today

Clusters moving fast



Technology ahead of schedule




OLTP
“Wolfpack”
CPUs, disks, tapes, wires...
OR databases are evolving
Parallel DBMSs are evolving
HSM still immature
Metcalfe’s Law
Network Utility = Users^2

How many connections
can it make?





One user: no utility
100,000 users: a few contacts
1 million users: many on Net
1 billion users: everyone on Net
That is why the Internet is so “hot”

Exponential benefit
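A short numeric sketch of the law as stated on the slide (utility equals users squared), next to the count of distinct user-to-user connections, n(n-1)/2, which answers the slide's question "How many connections can it make?". The function names are illustrative.

```python
def network_utility(users: int) -> int:
    """Utility as stated on the slide: users squared."""
    return users ** 2


def pairwise_connections(users: int) -> int:
    """Number of distinct pairs of users that could connect."""
    return users * (users - 1) // 2


for n in (1, 100_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} users: utility {network_utility(n):.1e}, "
          f"pairs {pairwise_connections(n):.1e}")
```

Both measures grow with the square of the user count, so doubling the users roughly quadruples the value. The pair count is zero for a single user, matching "One user: no utility".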
Moore’s First Law

XXX doubles every 18 months
60% increase per year




Micro processor speeds
Chip density
Magnetic disk density
Communications bandwidth
WAN bandwidth approaching
LAN speeds
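The step from "doubles every 18 months" to "60% increase per year" is a compounding calculation. A minimal sketch, with illustrative function names:

```python
def annual_growth(doubling_months: float) -> float:
    """Fractional growth per year for a quantity that doubles every doubling_months."""
    return 2 ** (12 / doubling_months) - 1


def growth_factor(doubling_months: float, years: float) -> float:
    """Total multiplication over the given number of years."""
    return 2 ** (12 * years / doubling_months)


per_year = annual_growth(18)
per_decade = growth_factor(18, 10)

print(f"per year: {per_year:.1%}")
print(f"per decade: {per_decade:.0f}x")
```

Doubling every 18 months works out to a little under 59% per year, which the slide rounds to 60%, and to roughly 100x per decade.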
[Figure: 1 chip memory size (2 MB to 32 MB), from 8KB to 1GB, 1970 to 2000; bits per chip: 1K 4K 16K 64K 256K 1M 4M 16M 64M 256M]
Exponential growth:
  The past does not matter
  10x here, 10x there, soon you’re talking REAL change
PC costs decline faster than any other platform:
  Volume and learning curves
  PCs will be the building bricks of all future systems

Bumps In The Moore’s
Law Road
DRAM:
  1988: United States antidumping rules
  1993-1995: ?price flat
Magnetic disk:
  1965-1989: 10x/decade
  1989-1996: 4x/3year! (100X/decade)
[Figure: $/MB of DRAM, 1970 to 2000, log scale from 1 to 1,000,000]
[Figure: $/MB of DISK, 1970 to 2000, log scale from .01 to 10,000]
Gordon Bell’s Seven
Price Tiers
10$:          wrist watch computers
100$:         pocket/palm computers
1,000$:       portable computers
10,000$:      personal computers (desktop)
100,000$:     departmental computers (closet)
1,000,000$:   site computers (glass house)
10,000,000$:  regional computers (glass castle)
Super server: costs more than $100,000
“Mainframe”: costs more than $1 million
Must be an array of processors, disks, tapes, comm ports
Bell’s Evolution Of
Computer Classes
Technology enables two evolutionary paths:
1. Constant performance, decreasing cost
2. Constant price, increasing performance
[Figure: log price versus time for mainframes (central), minis (dept.), WSs, PCs (personals), and ??]
1.26 = 2x/3 yrs - 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs - 100x/decade; 1/1.6 = .62
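The two annual factors quoted above (1.26 and 1.6) can be compounded to check the 3-year and 10-year multiples. A small sketch; the function name and dictionary are illustrative.

```python
def compound(annual_factor: float, years: int) -> float:
    """Total improvement after compounding an annual factor for some years."""
    return annual_factor ** years


# Annual factors quoted on the slide.
ANNUAL_FACTORS = {"2x per 3 years": 1.26, "4x per 3 years": 1.6}

for label, factor in ANNUAL_FACTORS.items():
    print(f"{label}: {compound(factor, 3):.2f}x in 3 years, "
          f"{compound(factor, 10):.0f}x per decade, "
          f"price ratio per year {1 / factor:.3f}")
```

With 1.26 per year the result is about 2x in 3 years and 10x per decade. With 1.6 per year it is about 4x in 3 years and about 110x per decade, which the slide rounds to 100x.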
Software Economics



An engineer costs about
$150,000/year
R&D gets [5%...15%]
of budget
Need [$3 million…
$1 million] revenue
per engineer
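The revenue-per-engineer range follows from dividing the engineer's cost by the R&D share of the budget. A minimal sketch using the slide's figures; the function name is illustrative.

```python
ENGINEER_COST = 150_000  # dollars per year, from the slide


def revenue_needed_per_engineer(engineer_cost: float, rd_percent: float) -> float:
    """Revenue required to fund one engineer when R&D gets rd_percent of budget."""
    return engineer_cost * 100 / rd_percent


for rd_percent in (5, 15):
    needed = revenue_needed_per_engineer(ENGINEER_COST, rd_percent)
    print(f"R&D at {rd_percent}%: ${needed:,.0f} revenue per engineer")
```

At 5% the company needs $3 million of revenue per engineer, and at 15% it needs $1 million, which is the bracketed range on the slide.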
Company     Revenue       Profit  Tax   R&D   SG&A  P&S
Intel       $16 billion   22%     12%   8%    11%   47%
Microsoft   $9 billion    24%     13%   16%   34%   13%
IBM         $72 billion   6%      5%    8%    22%   59%
Oracle      $3 billion    15%     7%    9%    43%   26%
(P&S: product and service)
Software Economics:
Bill’s Law
Price = Fixed_Cost / Units + Marginal_Cost

Bill Joy’s law (Sun): don’t write software for less
than 100,000 platforms @ $10 million engineering
expense, $1,000 price
Bill Gates’s law: don’t write software for less than 1,000,000
platforms @ $10 million engineering expense, $100 price
Examples:
  UNIX versus Windows NT: $3,500 versus $500
  Oracle versus SQL Server: $100,000 versus $6,000
  No spreadsheet or presentation pack on UNIX/VMS/...
Commoditization of base software and hardware
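The pricing rule (fixed cost spread over units, plus marginal cost) can be applied to the two "laws". The sketch below assumes a $10 million engineering expense in both cases; for the Gates case that is an interpretation of the slide's figure, not a stated fact. Function names are illustrative.

```python
def fixed_cost_per_unit(engineering_cost: float, units: int) -> float:
    """Engineering expense amortized over the installed platforms."""
    return engineering_cost / units


def price(engineering_cost: float, units: int, marginal_cost: float) -> float:
    """Price = Fixed_Cost / Units + Marginal_Cost."""
    return fixed_cost_per_unit(engineering_cost, units) + marginal_cost


# Assumption: $10 million engineering expense in both cases.
ENGINEERING_COST = 10_000_000

joy_fixed = fixed_cost_per_unit(ENGINEERING_COST, 100_000)      # $1,000 price
gates_fixed = fixed_cost_per_unit(ENGINEERING_COST, 1_000_000)  # $100 price

print(f"Joy: ${joy_fixed:,.0f} per unit, {joy_fixed / 1000:.0%} of price")
print(f"Gates: ${gates_fixed:,.0f} per unit, {gates_fixed / 100:.0%} of price")
```

Under that assumption both rules keep engineering at about 10% of the price; the tenfold larger platform count is what allows the tenfold lower price.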
Gordon Bell’s
Platform Economics


Traditional computers: custom or
semi-custom, high-tech and high-touch
New computers: high-tech and no-touch
[Figure: price (K$), volume (K), and application price by computer type (mainframe, WS, browser), log scale from 0.01 to 100,000]
Grove’s Law
The New Computer Industry



Horizontal
integration
is new structure
Each layer
picks best from
lower layer
Desktop (C/S)
market


1991: 50%
1995: 75%
Function            Example
Operation           AT&T
Integration         EDS
Applications        SAP
Middleware          Oracle
Baseware            Microsoft
Systems             Compaq
Silicon and Oxide   Intel and Seagate